Reading binary files: ELF, Mach-O, and PE explained

Nearly every native executable on a modern desktop OS uses one of three formats: ELF (Linux, BSD), Mach-O (macOS, iOS), or PE (Windows). They solve the same problem - “how do I describe a chunk of compiled code to the operating system?” - but they do it differently enough that you can usually tell them apart from the first few bytes.

This post is a guided tour for people who want to understand what’s actually inside an executable beyond “the OS can run it.”

What an executable file has to describe

Every executable format has to answer the same handful of questions:

Where’s the code? Bytes the CPU should execute.
Where’s the data? Constants, globals, string tables.
What dependencies does it need? Shared libraries, symbol versions.
Where does execution start, and where does the image load? The entry point and base address.
What architecture and OS does this expect? x86_64 vs ARM64, Linux vs macOS, etc.

The differences between ELF, Mach-O, and PE are largely about how they answer these questions, not what the questions are.

ELF: the open standard

Used by: Linux, the BSDs, Solaris, and most other Unix-like systems; macOS is the big exception.

Magic bytes: 7F 45 4C 46 (\x7fELF). The next byte tells you 32-bit (01) or 64-bit (02).

ELF stands for Executable and Linkable Format. It was designed in the late 1980s for AT&T’s System V and has been adopted by most Unix-like systems since. The design philosophy is simple: a small header at the start, a program header table for the loader, and a section header table for the linker. Everything else is data those tables point at.
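
To make the “small header” concrete, here is a minimal Python sketch that decodes the fixed ELF header with the standard struct module. The names parse_elf_header and ElfHeader are illustrative and do not belong to any library API. The sketch reads EI_CLASS (byte 4) to pick the 32- or 64-bit layout and EI_DATA (byte 5) to pick the byte order. It skips most of the validation a real parser would need.

```python
import struct
from dataclasses import dataclass

@dataclass
class ElfHeader:  # illustrative container, not a library type
    bits: int            # 32 or 64, from EI_CLASS
    little_endian: bool  # from EI_DATA (1 = little, 2 = big)
    e_type: int          # 2 = ET_EXEC, 3 = ET_DYN (shared objects and PIE executables)
    e_machine: int       # e.g. 62 = EM_X86_64, 183 = EM_AARCH64
    e_entry: int         # virtual address where execution starts
    e_phoff: int         # file offset of the program header table (loader's view)
    e_shoff: int         # file offset of the section header table (linker's view)
    e_phnum: int
    e_shnum: int

def parse_elf_header(data: bytes) -> ElfHeader:
    """Decode the fixed-size ELF header (52 bytes for 32-bit, 64 for 64-bit)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported EI_CLASS or EI_DATA")
    end = "<" if ei_data == 1 else ">"
    # After the 16-byte e_ident: e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
    # e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx.
    # Only the three address/offset fields change width between the two classes.
    layout = "HHIQQQIHHHHHH" if ei_class == 2 else "HHIIIIIHHHHHH"
    (e_type, e_machine, _version, e_entry, e_phoff, e_shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum, _shstrndx) = struct.unpack_from(end + layout, data, 16)
    return ElfHeader(64 if ei_class == 2 else 32, ei_data == 1, e_type, e_machine,
                     e_entry, e_phoff, e_shoff, e_phnum, e_shnum)
```

For example, feeding it the first 64 bytes of a binary such as /bin/ls on an x86_64 Linux machine should report bits=64 and e_machine=62. The path is just an example.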

What makes ELF particularly clean is the separation of segments (what the loader cares about - what to map into memory and where) from sections (what the linker cares about - how to combine multiple object files into one executable). The same physical bytes are typically described twice: once as a segment for mmap, once as a section for ld.
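
The loader’s view is the program header table. The sketch below assumes a 64-bit ELF of either byte order and lists the PT_LOAD entries, which are the regions a loader would mmap, along with their permissions. The names load_segments and LoadSegment are hypothetical, chosen for illustration.

```python
import struct
from typing import NamedTuple

PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4

class LoadSegment(NamedTuple):  # hypothetical helper type
    offset: int   # where the bytes live in the file
    vaddr: int    # where the loader maps them
    filesz: int
    memsz: int    # memsz > filesz means a zero-filled tail (e.g. .bss)
    perms: str    # "r-x", "rw-", ...

def load_segments(data: bytes) -> list[LoadSegment]:
    """List the PT_LOAD program headers of a 64-bit ELF (the loader's view)."""
    if data[:4] != b"\x7fELF" or data[4] != 2:
        raise ValueError("expected a 64-bit ELF")
    end = "<" if data[5] == 1 else ">"
    e_phoff, = struct.unpack_from(end + "Q", data, 32)               # 64-bit header offset 32
    e_phentsize, e_phnum = struct.unpack_from(end + "HH", data, 54)  # 64-bit header offsets 54, 56
    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
        p_type, p_flags, p_offset, p_vaddr, _paddr, p_filesz, p_memsz, _align = \
            struct.unpack_from(end + "IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue  # skip PT_DYNAMIC, PT_INTERP, PT_NOTE, ...
        perms = ("r" if p_flags & PF_R else "-") + ("w" if p_flags & PF_W else "-") + ("x" if p_flags & PF_X else "-")
        segments.append(LoadSegment(p_offset, p_vaddr, p_filesz, p_memsz, perms))
    return segments
```

Comparing its output against `readelf -l` on the same file is a quick sanity check.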

ELF is also extensible. Custom sections are first-class: you can add .note.go.buildid, .debug_* sections for DWARF debug info, or .gnu.version_r for symbol versioning, all without breaking the format. That extensibility is a big part of why ELF has lasted 35+ years without a successor.
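
The section headers themselves don’t store names. Each sh_name is an offset into a string-table section, and e_shstrndx gives that section’s index. This is why new sections cost nothing to add. Below is a sketch for 64-bit ELF. The helper section_names is made up, it assumes a section-name string table exists, and it ignores the extended numbering ELF uses for very large section counts.

```python
import struct

def section_names(data: bytes) -> list[str]:
    """Return a 64-bit ELF's section names, resolved through the e_shstrndx string table."""
    if data[:4] != b"\x7fELF" or data[4] != 2:
        raise ValueError("expected a 64-bit ELF")
    end = "<" if data[5] == 1 else ">"
    e_shoff, = struct.unpack_from(end + "Q", data, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from(end + "HHH", data, 58)

    def shdr(i: int) -> tuple:
        # Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
        # sh_link, sh_info, sh_addralign, sh_entsize
        return struct.unpack_from(end + "IIQQQQIIQQ", data, e_shoff + i * e_shentsize)

    strtab_off, strtab_size = shdr(e_shstrndx)[4:6]
    strtab = data[strtab_off:strtab_off + strtab_size]
    names = []
    for i in range(e_shnum):
        name_off = shdr(i)[0]
        # Names are NUL-terminated strings inside the string table
        names.append(strtab[name_off:strtab.index(b"\0", name_off)].decode())
    return names
```

On a Go binary you would expect `.note.go.buildid` to appear in the list alongside the usual `.text` and `.data`.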

To inspect an ELF: readelf -a binary shows you the full structure. objdump -d binary disassembles the executable sections.

Mach-O: Apple’s lineage

Used by: macOS, iOS, iPadOS, tvOS, watchOS.

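Magic bytes: the header’s magic number is 0xFEEDFACE for 32-bit and 0xFEEDFACF for 64-bit, stored in the target’s byte order. Every current Apple platform (x86_64 and ARM64) is little-endian, so on disk a modern binary starts with CF FA ED FE (64-bit) or CE FA ED FE (32-bit). The FE ED FA CE / FE ED FA CF byte sequence only shows up in big-endian binaries, such as those from the PowerPC era. A “fat” binary containing multiple architectures (x86_64 + ARM64 in the same file) starts with CA FE BA BE. The fat header is always big-endian, so those really are the first bytes on disk.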
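
The magic is a 32-bit integer stored in the target’s byte order, so the robust check reads the first four bytes both ways. Here is a minimal sketch. The name classify_macho is illustrative, and the constant values are the ones Apple’s <mach-o/loader.h> and <mach-o/fat.h> define.

```python
import struct
from typing import Optional

MH_MAGIC, MH_MAGIC_64 = 0xFEEDFACE, 0xFEEDFACF  # thin Mach-O, 32- and 64-bit
FAT_MAGIC = 0xCAFEBABE                          # fat (universal) header

def classify_macho(first4: bytes) -> Optional[str]:
    """Describe a Mach-O magic (illustrative helper), or return None if it isn't one."""
    big, = struct.unpack(">I", first4)
    little, = struct.unpack("<I", first4)
    if big == FAT_MAGIC:
        # Fat headers are always big-endian. Java class files share this magic.
        return "fat (universal) header"
    # A thin header's magic reads correctly in exactly one byte order.
    for value, order in ((big, "big-endian"), (little, "little-endian")):
        if value == MH_MAGIC_64:
            return f"64-bit Mach-O, {order}"
        if value == MH_MAGIC:
            return f"32-bit Mach-O, {order}"
    return None
```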

Mach-O comes from NeXTSTEP, which Apple inherited when they acquired NeXT in 1996. The format predates ELF in some senses but took longer to mature. ELF describes a file with two separate tables: program headers for the loader and section headers for the linker. Mach-O instead has load commands, a list of typed instructions that tell the loader how to assemble the process, with sections nested inside the segment commands. There’s LC_SEGMENT_64 (map this region into memory), LC_LOAD_DYLIB (link against this library), LC_CODE_SIGNATURE (verify this signature), and dozens of others.
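
Walking the load commands is a simple loop. Each command starts with a cmd type and a cmdsize, so you can skip the ones you don’t understand. The sketch below assumes a thin, little-endian 64-bit file, which is what current Macs produce. It names only the three commands mentioned above plus LC_MAIN, and it pulls out segment names and dylib paths. The function load_commands is a hypothetical helper.

```python
import struct

LC_NAMES = {
    0x19: "LC_SEGMENT_64",      # map a region (with its nested sections) into memory
    0x0C: "LC_LOAD_DYLIB",      # link against a dynamic library
    0x1D: "LC_CODE_SIGNATURE",  # where the code signature blob lives
    0x80000028: "LC_MAIN",      # entry point
}

def load_commands(data: bytes) -> list[tuple[str, str]]:
    """Walk the load commands of a thin, little-endian 64-bit Mach-O (hypothetical helper)."""
    # mach_header_64: magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved
    magic, _cpu, _sub, _ftype, ncmds, _sizeofcmds, _flags, _res = struct.unpack_from("<IiiIIIII", data, 0)
    if magic != 0xFEEDFACF:
        raise ValueError("expected a little-endian 64-bit Mach-O (CF FA ED FE)")
    out = []
    off = 32  # load commands start right after the 32-byte header
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, off)
        detail = ""
        if cmd == 0x19:
            detail = data[off + 8:off + 24].rstrip(b"\0").decode()  # segname[16]
        elif cmd == 0x0C:
            # dylib name offset is relative to the start of this load command
            name_off, = struct.unpack_from("<I", data, off + 8)
            detail = data[off + name_off:off + cmdsize].split(b"\0", 1)[0].decode()
        out.append((LC_NAMES.get(cmd, hex(cmd)), detail))
        off += cmdsize  # cmdsize covers the whole command, so unknown ones are skipped
    return out
```

`otool -l` prints the same list with every field decoded, which makes it a good cross-check.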

The “fat” binary feature is genuinely useful - the same .app bundle can ship binaries for both Intel Macs and Apple Silicon, the OS picks the right slice at launch, and developers don’t have to maintain separate distributions. Linux solves the same problem with multiple packages; Apple solves it inside the file. Both work.
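
Inside a fat file, the header is a big-endian slice count followed by one fixed-size fat_arch record per slice. Each record gives a CPU type plus the offset and size of an ordinary thin Mach-O. The sketch below lists the slices and makes a simplified version of the launch-time choice; the real loader also weighs CPU subtypes. It handles only the classic 32-bit fat header, not the FAT_MAGIC_64 (0xCAFEBABF) variant. The names fat_slices and pick_slice are illustrative.

```python
import struct
from typing import NamedTuple

CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64", 7: "i386", 12: "arm"}

class Slice(NamedTuple):
    arch: str
    offset: int  # file offset of the embedded thin Mach-O
    size: int
    align: int   # alignment, as a power of two

def fat_slices(data: bytes) -> list[Slice]:
    """List the architecture slices of a fat Mach-O; every field is big-endian."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a 32-bit fat header")
    slices = []
    for i in range(nfat_arch):
        # fat_arch: cputype, cpusubtype, offset, size, align (20 bytes each, after the 8-byte header)
        cputype, _subtype, offset, size, align = struct.unpack_from(">iiIII", data, 8 + 20 * i)
        slices.append(Slice(CPU_NAMES.get(cputype, hex(cputype)), offset, size, align))
    return slices

def pick_slice(data: bytes, arch: str) -> bytes:
    """Simplified launch-time choice: return the thin Mach-O for one architecture."""
    for s in fat_slices(data):
        if s.arch == arch:
            return data[s.offset:s.offset + s.size]
    raise LookupError(f"no {arch} slice")
```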

The downside: Mach-O is more tightly coupled to Apple’s tooling than ELF is to GCC/Clang on Linux. Reading Mach-O binaries with non-Apple tools is doable but tends to lag the format’s evolution by years.

To inspect a Mach-O: otool -l binary lists the load commands, otool -L binary shows linked libraries, otool -d binary dumps the (__DATA,__data) section, and nm binary lists symbols. On modern macOS the dyld_info tool is more thorough.

PE: Windows, with three decades of compatibility

Used by: Windows, Wine. Also UEFI boot loaders.

Magic bytes: 4D 5A (MZ), at the very start. But wait: MZ is also the magic for the original DOS executable format from 1981. Microsoft kept that header for backwards compatibility, so every Windows EXE technically starts with a DOS header and a tiny DOS stub program that prints “This program cannot be run in DOS mode.” The DOS header’s 4-byte field at offset 0x3C (e_lfanew) holds the offset of the real PE header. Go there and you should find 50 45 00 00 (PE\0\0), usually within the first few hundred bytes.
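
Finding the real header takes two reads. In the DOS header, the 32-bit little-endian field at offset 0x3C (named e_lfanew in Microsoft’s headers) holds the offset of the PE signature, so you read that field and then check for PE\0\0 at the offset it gives. Here is a minimal sketch, with pe_header_offset as a hypothetical helper name.

```python
import struct
from typing import Optional

def pe_header_offset(data: bytes) -> Optional[int]:
    """Return the file offset of the PE signature, or None for plain DOS / non-MZ data."""
    if data[:2] != b"MZ" or len(data) < 0x40:
        return None
    # Last field of the 64-byte DOS header: where the PE header lives
    e_lfanew, = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] == b"PE\0\0":
        return e_lfanew
    return None  # a genuine DOS program, or a truncated/corrupt file
```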

PE - Portable Executable - is COFF underneath: the Common Object File Format that AT&T introduced with Unix System V, which Microsoft adapted when they built Windows NT. The structure is conceptually similar to ELF: a header, a section table, and sections containing code and data. But it carries a lot of historical accretion: data directory slots that are reserved or only meaningful on long-gone architectures, fields that exist for OS/2 compatibility, and alignment defaults that date from 1990s disk and page-size assumptions.
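
Right after the PE\0\0 signature comes the 20-byte COFF file header. Next is the optional header, whose first two bytes say PE32 or PE32+, and then the section table, with one 40-byte entry per section. The sketch below decodes those pieces, assuming pe_off is the file offset of the PE signature. The names parse_coff and Section are illustrative, and the long “/nnn” section names found in object files are not handled.

```python
import struct
from typing import NamedTuple

MACHINES = {0x14C: "i386", 0x8664: "x86_64", 0xAA64: "arm64"}

class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA once loaded
    virtual_size: int
    raw_offset: int       # PointerToRawData
    raw_size: int         # SizeOfRawData
    characteristics: int  # IMAGE_SCN_* flags

def parse_coff(data: bytes, pe_off: int) -> tuple[str, str, list[Section]]:
    """Decode the COFF header, optional-header magic, and section table (illustrative)."""
    # COFF file header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    machine, nsections, _ts, _symptr, _nsyms, opt_size, _chars = struct.unpack_from("<HHIIIHH", data, pe_off + 4)
    opt_off = pe_off + 24
    opt_magic, = struct.unpack_from("<H", data, opt_off)
    kind = {0x10B: "PE32", 0x20B: "PE32+"}.get(opt_magic, hex(opt_magic))
    sections = []
    sec_off = opt_off + opt_size  # the section table follows the optional header
    for i in range(nsections):
        (name, vsize, vaddr, rawsize, rawptr, _relptr, _lnptr, _nrel, _nln,
         chars) = struct.unpack_from("<8sIIIIIIHHI", data, sec_off + 40 * i)
        sections.append(Section(name.rstrip(b"\0").decode(), vaddr, vsize, rawptr, rawsize, chars))
    return MACHINES.get(machine, hex(machine)), kind, sections
```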

What PE does well: rich metadata. The PE format embeds version info, icon resources, manifest XML, and digital signatures directly in the file. Windows uses this for everything from the right-click “Properties” dialog to UAC elevation prompts to driver signature enforcement. ELF and Mach-O handle most of this externally; PE bundles it in.
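
Most of that embedded metadata is reached through the data directory array at the end of the optional header. Entry 2 points at the resource tree (icons, version info, manifests), and entry 4 points at the certificate table that holds the Authenticode signature. The sketch below returns the non-empty entries. The certificate entry’s address is a file offset, while the other entries hold RVAs. The function data_directories and the short names in DIRECTORY_NAMES are our own labels, not official identifiers.

```python
import struct

# Index order is fixed by the PE format; these short labels are our own.
DIRECTORY_NAMES = [
    "export", "import", "resource", "exception", "certificate", "base_reloc",
    "debug", "architecture", "global_ptr", "tls", "load_config", "bound_import",
    "iat", "delay_import", "clr_runtime", "reserved",
]

def data_directories(data: bytes, pe_off: int) -> dict[str, tuple[int, int]]:
    """Map directory label -> (address, size) for non-empty data directory entries."""
    opt_off = pe_off + 24  # after the 4-byte signature and 20-byte COFF header
    magic, = struct.unpack_from("<H", data, opt_off)
    if magic == 0x10B:
        count_off = 92    # NumberOfRvaAndSizes in PE32
    elif magic == 0x20B:
        count_off = 108   # NumberOfRvaAndSizes in PE32+ (wider stack/heap fields)
    else:
        raise ValueError("unknown optional header magic")
    count, = struct.unpack_from("<I", data, opt_off + count_off)
    dirs = {}
    for i in range(min(count, len(DIRECTORY_NAMES))):
        addr, size = struct.unpack_from("<II", data, opt_off + count_off + 4 + 8 * i)
        if size:
            dirs[DIRECTORY_NAMES[i]] = (addr, size)
    return dirs
```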

What PE does badly: complexity. Reading a PE correctly means handling the DOS stub, the COFF header, the optional header (despite the name, every executable image must have one; only object files can omit it), data directories, section flags, relocation tables, import tables, export tables, and the resource directory tree. The specification is long and dense. There’s a reason every malware reverse engineer eventually develops opinions about PE parsers.
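
One reason PE parsing is fiddly is that import, export, and resource tables store relative virtual addresses (RVAs), not file offsets. A parser working on the file on disk must translate each one through the section table. Here is a minimal sketch, assuming the section fields have already been read. SectionInfo and rva_to_offset are hypothetical names, and the sketch ignores RVAs that fall inside the headers before the first section.

```python
from typing import NamedTuple, Optional

class SectionInfo(NamedTuple):  # hypothetical: fields copied from a section table entry
    virtual_address: int  # section's RVA
    virtual_size: int
    raw_offset: int       # PointerToRawData
    raw_size: int         # SizeOfRawData

def rva_to_offset(sections: list[SectionInfo], rva: int) -> Optional[int]:
    """Translate an RVA to a file offset, or None if no bytes on disk back it."""
    for s in sections:
        # Some linkers leave VirtualSize at 0, so take the larger of the two sizes
        span = max(s.virtual_size, s.raw_size)
        if s.virtual_address <= rva < s.virtual_address + span:
            delta = rva - s.virtual_address
            if delta >= s.raw_size:
                return None  # zero-filled tail (e.g. uninitialized data) with no file bytes
            return s.raw_offset + delta
    return None
```

For example, with a section at RVA 0x1000 whose raw data starts at file offset 0x400, RVA 0x1010 maps to offset 0x410.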

To inspect a PE: dumpbin /headers binary.exe (Visual Studio toolchain), or objdump -p binary.exe from a MinGW or LLVM cross-toolchain. Free GUI tools like CFF Explorer are popular among security researchers.

Telling them apart from the first 16 bytes

Practical signatures, in order of how often you’ll see them:

First bytes	Format	Notes
`7F 45 4C 46 02`	64-bit ELF	Linux, BSD, Solaris
`7F 45 4C 46 01`	32-bit ELF	Older Linux/embedded
`4D 5A`	PE (or DOS)	Real PE has `PE\0\0` at the offset stored at 0x3C
`CF FA ED FE`	64-bit Mach-O	macOS, iOS, etc (little-endian, as on x86_64 and ARM64)
`CE FA ED FE`	32-bit Mach-O	Older Apple binaries
`FE ED FA CF` / `FE ED FA CE`	64/32-bit Mach-O	Big-endian, e.g. PowerPC-era binaries
`CA FE BA BE`	Fat Mach-O	Multi-arch Apple binary
`CA FE BA BE`	Java class file	Same magic, different format - disambiguate by the bytes that follow

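The CA FE BA BE clash is real. Java class files and fat Mach-O binaries both start with the same 4 bytes, so you have to look at the bytes that follow to know which you’ve got. In a fat binary, the next 4 bytes are a big-endian count of architectures, a small number and typically 2. In a class file, they hold the minor and major version numbers, and the major version is 45 or higher. Magika handles this correctly because it’s looking at structure, not just signatures.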
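
The table plus the CA FE BA BE rule fit in a short sniffer. This is a sketch, not a substitute for Magika or file(1): it only checks magic values and the pointer at 0x3C. The threshold of 45 for telling a fat header from a class file is our assumption. We chose it because no Java major version is lower and real fat files carry only a few slices. The name sniff is illustrative.

```python
import struct
from typing import Optional

def sniff(head: bytes) -> Optional[str]:
    """Heuristic guess from the first 64+ bytes; not a validator (illustrative helper)."""
    if head[:4] == b"\x7fELF" and len(head) > 4:
        return {1: "32-bit ELF", 2: "64-bit ELF"}.get(head[4], "ELF (unknown class)")
    if head[:2] == b"MZ":
        if len(head) >= 0x40:
            e_lfanew, = struct.unpack_from("<I", head, 0x3C)
            if head[e_lfanew:e_lfanew + 4] == b"PE\0\0":
                return "PE"
        return "MZ (DOS, or a PE header past the bytes read)"
    if len(head) >= 8 and head[:4] == b"\xca\xfe\xba\xbe":
        # Fat Mach-O: next word is nfat_arch (a handful of slices).
        # Java class: next words are minor_version, major_version (major >= 45).
        follow, = struct.unpack_from(">I", head, 4)
        return "fat Mach-O" if follow < 45 else "Java class file"  # assumed threshold
    if len(head) >= 4:
        # Thin Mach-O magic may be stored in either byte order
        word_be, = struct.unpack_from(">I", head, 0)
        word_le, = struct.unpack_from("<I", head, 0)
        if 0xFEEDFACF in (word_be, word_le):
            return "64-bit Mach-O"
        if 0xFEEDFACE in (word_be, word_le):
            return "32-bit Mach-O"
    return None
```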

Why this matters

If you’re a developer who only ever builds for one platform, you can probably go years without thinking about which format your toolchain emits. The compiler picks the right one, the OS loads it, you ship.

The moment you do anything cross-platform, the format becomes load-bearing. Cross-compiling to Windows from Linux: your toolchain has to emit PE. Building a multi-arch macOS binary: you’re producing a fat Mach-O. Reverse-engineering an unknown binary you found on a customer’s system: the first thing you ask is “what format is this?” because the rest of your investigation depends on the answer.

Detection is the foundation. If you’ve got a file and you don’t know whether it’s ELF, Mach-O, or PE, drop it on the home page - Magika will tell you with high confidence. From there, you can pick the right tools and start asking the more interesting questions about what the binary actually does.