A practical guide to ZIP, 7Z, RAR, and TAR

Four archive formats that look interchangeable until you ship one across platforms. Here's when to use each, and how to tell them apart from the bytes.

There are dozens of archive formats. In practice, four account for the vast majority of files anyone outside specialized fields will ever encounter: ZIP, 7Z, RAR, and TAR. They look interchangeable in your file manager. They’re not.

This post is the comparison I wish I’d had the first time I had to argue with a Windows user about why I’d sent them a .tar.gz.

ZIP: the lowest common denominator

ZIP is the archive format every operating system understands without installing anything extra. macOS, Windows, every mainstream desktop Linux, and the major mobile OSes can all extract a ZIP without help.

That ubiquity comes from age and licensing. PKWARE published the ZIP spec in 1989 and made the core format royalty-free. The result is that ZIP support is now baked into operating systems, browsers, file managers, and roughly every programming language’s standard library. If you’re shipping an archive to someone whose technical environment you don’t know, ZIP is the safe bet.

The cost of ubiquity is that ZIP’s compression is mediocre - it uses Deflate, which dates from the early 1990s and loses on ratio to modern codecs such as LZMA. ZIP also has historical baggage: many tools still encrypt with ZipCrypto, a famously broken cipher, unless told otherwise; the central directory is at the end of the file (which makes streaming awkward); and Unicode filename support depends on flag bits that not every implementation respects.
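Those per-entry details are easy to inspect. Here is a minimal sketch using Python’s standard zipfile module; the helper name describe_zip and the sample file names are made up for illustration, and the archive is built in memory so the example stands alone.

```python
import io
import zipfile

def describe_zip(data: bytes) -> list[dict]:
    """Summarize each entry listed in a ZIP's central directory."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            rows.append({
                "name": info.filename,
                "deflate": info.compress_type == zipfile.ZIP_DEFLATED,
                # General-purpose flag bit 0: the entry is encrypted.
                "encrypted": bool(info.flag_bits & 0x1),
                # General-purpose flag bit 11: the name is stored as UTF-8.
                "utf8_name": bool(info.flag_bits & 0x800),
            })
    return rows

# Build a small archive in memory (hypothetical file names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("readme.txt", "hello " * 100)
    zf.writestr("r\u00e9sum\u00e9.txt", "non-ASCII name")

for row in describe_zip(buf.getvalue()):
    print(row)
```

Python only sets the UTF-8 flag when a name actually needs it, so the ASCII entry reports False and the accented one reports True. A reader that ignores that bit is how you end up with mangled filenames.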

Use ZIP when you don’t control the recipient’s environment and you don’t need top-tier compression.

7Z: best compression, narrower support

7Z is 7-Zip’s native format, designed in 1999 specifically to outperform ZIP. The compression is genuinely better - typically 30-70% smaller than ZIP for the same content - because it uses LZMA2 by default and supports large dictionary sizes that find long-range redundancy.
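The dictionary-size point can be seen without a 7Z tool at all. The sketch below is an approximation, not a 7Z benchmark: Python’s standard library has no 7Z writer, so it compares zlib (Deflate, 32 KiB window) against the lzma module (LZMA2 in an .xz container) on synthetic data whose repeats are 64 KiB apart.

```python
import lzma
import random
import zlib

# 64 KiB of incompressible bytes, repeated four times. The repeats sit
# farther apart than Deflate's 32 KiB window, so only a codec with a
# larger dictionary can match them.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
data = block * 4

deflate_size = len(zlib.compress(data, 9))
lzma_size = len(lzma.compress(data, preset=6))  # .xz container, LZMA2 inside

print(f"original: {len(data)} bytes")
print(f"Deflate:  {deflate_size} bytes")
print(f"LZMA2:    {lzma_size} bytes")
```

Deflate ends up roughly the size of the input, while LZMA2 lands near the size of a single block. Real files are less extreme, but this is the mechanism behind the gap.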

7Z also fixes most of ZIP’s structural problems: solid compression (compress related files as a single stream rather than separately), proper AES-256 encryption that includes filename encryption as an option, and Unicode filenames as a baseline rather than an afterthought.
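Solid compression is easiest to appreciate with numbers. This is a rough sketch under stated assumptions: it uses Python’s lzma module (.xz streams, not real 7Z archives) and twenty made-up files that share a 4 KiB body, compressed first one by one and then as a single concatenated stream.

```python
import lzma
import random

# Twenty hypothetical files: a unique first line plus a shared 4 KiB body
# that cannot be compressed on its own.
rng = random.Random(0)
shared = bytes(rng.getrandbits(8) for _ in range(4096))
files = [f"file {i}\n".encode() + shared for i in range(20)]

# Non-solid: each file is compressed separately, so the shared body is
# stored again for every file.
per_file = sum(len(lzma.compress(body)) for body in files)

# Solid: one stream over all files, so the shared body is stored once and
# later copies become back-references.
solid = len(lzma.compress(b"".join(files)))

print(f"per-file: {per_file} bytes")
print(f"solid:    {solid} bytes")
```

The trade-off is the one TAR has as well: pulling a single file out of a solid block means decompressing everything stored before it.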

The catch is install base. macOS ships without 7Z support. Windows didn’t have it built into Explorer until very recent versions, and even now many users don’t know that’s a thing. Most mobile OSes still don’t open 7Z files natively. If you send a 7Z to a non-technical recipient, expect a phone call.

Use 7Z when you control the recipient’s environment (or know they’re technical), and the size savings matter - large code drops, dataset distributions, software installers.

RAR: legacy compression, proprietary

RAR has been around almost as long as ZIP and was for a while the format-of-choice in software piracy circles, which gave it a strange cultural footprint. Compression is competitive with 7Z, sometimes better on specific file types (RAR has good redundant-file handling and recovery records).

The problem is licensing. The compression algorithm is proprietary - RARLAB, the company behind WinRAR, publishes decoder source (unrar) under a license that forbids using it to recreate the compressor, and keeps the encoder closed. That means Free Software implementations can extract RAR archives but not create them. That’s a real friction point: you can’t pipe tar | rar on Linux without licensing RARLAB’s proprietary rar tool. You also have to think about whether you really want to ship recipients a format that costs them a license fee as soon as they want to create archives of their own.

RAR’s recovery records are genuinely useful. A RAR archive can be configured to survive partial corruption - if a few bytes go bad in transit or storage, the archive can still be repaired. ZIP and 7Z don’t offer this.

Use RAR when you specifically need recovery records or you’re working in an ecosystem (anime fansubs, certain academic disciplines) where it’s already the convention. Otherwise, prefer 7Z.

TAR: the Unix archive, almost never alone

TAR is fundamentally different from the others. It’s not a compression format; it’s a concatenation format. A .tar file is just a sequence of file headers and payloads stuck together end-to-end, with no compression at all.

That’s why you almost never see a bare .tar in the wild. You see .tar.gz (TAR compressed with gzip), .tar.bz2 (compressed with bzip2), or .tar.xz (compressed with xz/LZMA). The TAR step preserves Unix file metadata - permissions, symlinks, ownership, timestamps - and the compression step compresses the whole archive as a single stream.

The single-stream design is TAR’s superpower for the use case it was built for: streaming. You can pipe a TAR archive over the network and start extracting it before the whole thing has arrived, because each file’s header announces the file’s size. ZIP can’t do this reliably; its authoritative index, the central directory, is at the end, so a streaming reader has to wait for the whole archive to be sure what’s in it.
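Python’s tarfile module exposes this directly through its stream modes. A minimal sketch, with an in-memory buffer standing in for a network socket; the helpers build_tar_gz and stream_members and the file names are invented for the example.

```python
import io
import tarfile

def build_tar_gz(files: dict[str, bytes]) -> bytes:
    """Create a small .tar.gz in memory (stand-in for data from a socket)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, body in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(body)  # the header announces the payload size
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(body))
    return buf.getvalue()

def stream_members(stream) -> list[tuple[str, int]]:
    """Process members in arrival order without ever seeking backwards."""
    seen = []
    # "r|gz" makes tarfile treat the file object as a forward-only stream.
    with tarfile.open(fileobj=stream, mode="r|gz") as tar:
        for member in tar:
            body = tar.extractfile(member).read() if member.isfile() else b""
            seen.append((member.name, len(body)))
    return seen

archive = build_tar_gz({"a.txt": b"alpha", "dir/b.txt": b"bravo!"})
print(stream_members(io.BytesIO(archive)))
```

In stream mode each member has to be consumed before moving on to the next, which is exactly the constraint a pipe imposes.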

The flip side: TAR is awful for random access. Want to extract just one file from a 50 GB tarball? You have to read until you find it, which means decompressing everything before it. ZIP is much better for this because of its central directory.
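For contrast, here is what random access looks like on the ZIP side. A small sketch with Python’s zipfile module and a synthetic thousand-entry archive: the reader looks the entry up in the central directory and seeks straight to its recorded offset.

```python
import io
import zipfile

# Build an archive with 1000 small entries (synthetic names and payloads).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for i in range(1000):
        zf.writestr(f"file-{i:04d}.txt", f"payload {i}\n" * 50)

with zipfile.ZipFile(buf) as zf:
    # The central directory records where each entry's local header starts.
    info = zf.getinfo("file-0999.txt")
    print(f"last entry starts at byte {info.header_offset}")
    # Only this one entry is decompressed; the other 999 are never touched.
    data = zf.read(info)

print(len(data), "bytes extracted")
```

Doing the same with a .tar.gz means inflating the stream from the start until the wanted header shows up.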

Use TAR (with gzip or xz) when you’re shipping Unix-flavored content - source distributions, system backups, container images. Avoid it for archives a Windows user will receive via email.

Telling them apart from the bytes

If you’re handed a file with no extension, the first few bytes give it away:

  • ZIP: 50 4B 03 04 (PK\x03\x04). Same magic for DOCX, XLSX, JAR, EPUB, ODT, etc. - those are all ZIP archives at the file level. To tell them apart you have to peek inside the archive.
  • 7Z: 37 7A BC AF 27 1C (7z\xbc\xaf\x27\x1c). Distinctive, no overlap with anything else.
  • RAR (legacy): 52 61 72 21 1A 07 00 (Rar!\x1a\x07\x00).
  • RAR 5+: 52 61 72 21 1A 07 01 00 (one extra version byte).
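  • TAR: harder. There is no magic at the start; the first 257 bytes are header fields (a 100-byte filename, then mode, owner, size, timestamp, checksum, type flag, and link name). At offset 257 there should be the literal string ustar, followed either by a NUL and the version digits 00 (POSIX: 75 73 74 61 72 00 30 30) or by two spaces and a NUL (GNU: 75 73 74 61 72 20 20 00). If the file is gzipped (tar.gz), the outer signature is gzip’s 1F 8B; you only see the TAR signature after decompressing.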
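The signatures above translate into a few lines of code. This is a minimal sketch, not a replacement for a real detector: the function name sniff and its return labels are my own, and it only looks at the first 512 bytes.

```python
import gzip
import io
import tarfile

SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07\x00", "rar"),
    (b"Rar!\x1a\x07\x01\x00", "rar5"),
    (b"\x1f\x8b", "gzip"),
]

def sniff(head: bytes) -> str:
    """Guess a container format from the first 512 bytes of a file."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    # TAR has no leading magic; "ustar" sits at offset 257 in both the
    # POSIX and GNU header variants.
    if head[257:262] == b"ustar":
        return "tar"
    return "unknown"

def make_tar(fmt: int) -> bytes:
    """Build a one-file tar in memory in the requested header format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo("hello.txt")
        info.size = 5
        tar.addfile(info, io.BytesIO(b"hello"))
    return buf.getvalue()

posix_tar = make_tar(tarfile.USTAR_FORMAT)
gnu_tar = make_tar(tarfile.GNU_FORMAT)
print(sniff(posix_tar[:512]), sniff(gnu_tar[:512]))
print(sniff(gzip.compress(posix_tar)[:512]))  # the gzip wrapper hides the tar
```

A file that reports gzip may or may not be a tarball; the only way to know is to decompress the first block and run the check again.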

Magika handles all of these reliably; it can also distinguish DOCX/XLSX/JAR/EPUB/ODT from a generic ZIP because it samples bytes from inside the archive, not just the wrapper.

A decision tree

                    Recipient is a stranger?
                  /                          \
               yes                            no
                /                              \
            ZIP                              size matters?
                                          /              \
                                       yes               no
                                       /                  \
                                Unix-only?              ZIP or 7Z
                                /         \
                              yes          no
                              /             \
                           tar.xz          7Z
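
If you would rather keep the tree in a script than in your head, it collapses to three questions. The function name pick_format and its parameter names are mine; the branches mirror the diagram one for one.

```python
def pick_format(stranger: bool, size_matters: bool = False, unix_only: bool = False) -> str:
    """Walk the decision tree: recipient first, then size, then platform."""
    if stranger:
        return "ZIP"
    if not size_matters:
        return "ZIP or 7Z"
    return "tar.xz" if unix_only else "7Z"

print(pick_format(stranger=True))
print(pick_format(stranger=False, size_matters=True, unix_only=True))
```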

The decision is rarely about the format itself - it’s about how much ceremony you’re willing to inflict on the recipient.