Why MIME types alone aren't enough

The browser tells you what it thinks a download is. The server told the browser what to think. Neither of them looked at the bytes. Here's why that's a problem.

Open the browser DevTools, go to the Network tab, look at any download. There’s a Content-Type header. The browser believes it. The server provided it. The user sees a download dialog labeled with whatever the server said the file was.

Nobody in that chain looked at the actual bytes. The whole system runs on trust.

That’s mostly fine. But it’s not always fine - and the cases where it isn’t are some of the most reliable malware delivery vectors of the last decade.

The MIME-type contract, briefly

A MIME type is a label. The original spec (RFC 2045) defined it for email attachments; the web inherited the same type/subtype notation for HTTP. When a browser requests a URL, the server replies with a Content-Type: image/png (or whatever) and the browser uses that to decide:

  • Which renderer to use (PNG goes to the image pipeline, PDF goes to the PDF viewer).
  • Whether to display inline or trigger a download (Content-Disposition: attachment overrides this).
  • Whether a cross-origin response is handed to the page at all (cross-origin read blocking keys on the declared type).
  • Whether the response can be safely interpreted as a script or stylesheet.

Browsers do some checking of their own. When a Content-Type is missing or too generic to act on, they inspect the leading bytes and guess. This is called MIME sniffing, and the WHATWG MIME Sniffing standard spells out the algorithm. Sniffing has been an attack surface in its own right (an “image” upload that sniffs as HTML runs script on your origin), which is why servers can send X-Content-Type-Options: nosniff to make the browser refuse scripts and stylesheets whose declared type doesn’t fit.

But both mechanisms are narrow. They exist to keep rendering working and to block specific attacks (HTML injection in image responses, mostly). They’re not a substitute for knowing what a file actually is.

Where the contract breaks

Three concrete failure modes:

The server is wrong by accident

A surprising fraction of HTTP servers misconfigure MIME types and never notice. CSS files served as text/plain still apply on quirks-mode pages because of forgiving fallback behavior, so the mistake stays hidden until a standards-mode page drops the stylesheet. Custom file extensions on a hand-rolled web server default to application/octet-stream. WebAssembly binaries served before the WASM standard was finalized often went out as application/x-wasm or just application/octet-stream, and users had to learn that “if you see a weird MIME type, check for the \0asm magic bytes at the start.”

This isn’t malicious; it’s just the result of every HTTP server having a different built-in MIME database, and no two of them agreeing.

The server is wrong on purpose

Phishing pages routinely serve executables with PDF MIME types because the browser’s download dialog will say “Open with: Adobe Acrobat” - and a non-technical user clicks Open. The bytes are a PE binary; the browser believed the server.

Modern browsers mitigate this by checking executable signatures separately and warning on the discrepancy. They don’t always block, though, and the warning text is often dismissible. Users habituated to clicking through warnings often do.

The user uploads something disguised

The mirror image: a user uploads a file to your application. Your app trusts the browser-supplied MIME type from the multipart upload. The browser, in turn, trusted what the OS told it - which on Windows means it trusted the file’s extension. So a file named vacation.jpg arrives at your server with Content-Type: image/jpeg and you happily store it in the bucket marked “user images.”

When that file is actually a PE binary, you’ve now hosted malware on your CDN, indexed by your image pipeline, and served from a domain users trust. The fact that it never matched its declared MIME at any point doesn’t matter, because nothing in the chain checked.
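To make the gap concrete, here is a minimal sketch using only the Python standard library. The file name and payload are invented for illustration, and mimetypes.guess_type stands in for the extension-based lookup that the OS or browser performs.

```python
import mimetypes

# Hypothetical upload: a Windows executable renamed to look like a photo.
filename = "vacation.jpg"
payload = b"MZ\x90\x00" + b"\x00" * 60  # DOS/PE files begin with the bytes "MZ"

# Extension-based guess: looks only at the name, never at the bytes.
declared, _ = mimetypes.guess_type(filename)

# Byte-based check: JPEG streams start with FF D8 FF, DOS/PE files with "MZ".
looks_like_jpeg = payload.startswith(b"\xff\xd8\xff")
looks_like_pe = payload.startswith(b"MZ")

print(declared, looks_like_jpeg, looks_like_pe)
```

The name-based answer and the byte-based answer disagree, and only one of them was derived from the file.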

What “actually checking” looks like

Three layers, each of which any reasonable web application should implement:

Layer 1: magic-byte verification

Take the first ~20 bytes of the upload, compare against a database of known signatures. Reject if the declared MIME doesn’t match the bytes.

This is cheap (sub-millisecond), well-understood, and catches the lazy version of the attack. Most languages have a library for it: file-type on npm, python-magic on PyPI, infer on crates.io. Add one to your upload handler.
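The core of the check fits in a few lines. The sketch below is a hand-rolled illustration, not a replacement for the libraries above: the signature table is deliberately tiny, and detect_mime and verify_upload are hypothetical helper names.

```python
from typing import Optional

# Illustrative signature table: (offset, magic bytes, MIME type).
# Real libraries ship hundreds of entries; these are a few well-known ones.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"\xff\xd8\xff", "image/jpeg"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"%PDF-", "application/pdf"),
    (0, b"PK\x03\x04", "application/zip"),
    (0, b"\x00asm", "application/wasm"),
    (0, b"MZ", "application/x-dosexec"),  # DOS/PE executables
]


def detect_mime(head: bytes) -> Optional[str]:
    """Return the MIME type whose signature matches the leading bytes, if any."""
    for offset, magic, mime in SIGNATURES:
        if head[offset:offset + len(magic)] == magic:
            return mime
    return None


def verify_upload(declared_mime: str, head: bytes) -> bool:
    """Accept only when the bytes identify as exactly what the client declared."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    declared = declared_mime.split(";")[0].strip().lower()
    detected = detect_mime(head)
    return detected is not None and detected == declared


print(verify_upload("image/jpeg", b"MZ\x90\x00"))
```

Note the design choice: an unrecognized signature is a rejection, not a pass. Failing closed means a format missing from the table costs you a support ticket rather than an incident.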

Layer 2: structural validation

For complex formats, magic bytes alone aren’t enough. A file that starts with the PNG signature but isn’t a valid PNG can still cause problems - image libraries have a long history of CVEs around malformed PNG/JPEG/GIF inputs. The fix is to actually parse the file with a real parser and reject what fails to round-trip.

This is more expensive (milliseconds to seconds depending on size). It catches an entire class of “the file passes magic-byte checks but exploits a parser bug downstream” attack.
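As a rough illustration of what “structural” means, the sketch below walks the chunk list of a PNG using only the standard library. It checks container structure (lengths, CRCs, chunk order) and does not decode pixels; in production you would more likely decode and re-encode with a maintained image library. The function name is hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_well_formed_png(data: bytes) -> bool:
    """Walk the PNG chunk list and check lengths, CRCs, and chunk order.

    Validates container structure only; pixel data is not decoded.
    """
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    chunk_types = []
    while pos < len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        if pos + 8 > len(data):
            return False
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4
        if end > len(data):
            return False  # declared length runs past the end of the file
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        # The CRC covers the type field and the payload, not the length.
        if zlib.crc32(ctype + payload) != crc:
            return False
        chunk_types.append(ctype)
        pos = end
        if ctype == b"IEND":
            break
    # IHDR first, at least one IDAT, IEND last, and nothing after IEND.
    return (
        bool(chunk_types)
        and chunk_types[0] == b"IHDR"
        and b"IDAT" in chunk_types
        and chunk_types[-1] == b"IEND"
        and pos == len(data)
    )
```

A file that carries only the eight signature bytes followed by arbitrary data passes Layer 1 and fails here, usually on the very first chunk length.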

Layer 3: content classification

For files where the format is genuinely ambiguous (text-like content, generic ZIP containers), use ML-based classification. Magika is the easy answer: ~1 MB model, runs in the browser via WebAssembly, ~10ms per file, returns a probability distribution.

The probability distribution itself is useful. A file that the model is 99% confident is a PE binary, but is being uploaded as image/jpeg, is almost certainly malicious. A file the model is 60% confident is YAML and 35% confident is text/plain, declared as text/yaml - probably fine, just ambiguous content.
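One way to turn that reasoning into a policy is sketched below. It makes no calls into Magika itself: scores is assumed to be a plain mapping from MIME type to probability, however your classifier exposes it, and the function name and both thresholds are illustrative assumptions rather than tuned values.

```python
from typing import Dict


def triage(declared_mime: str, scores: Dict[str, float],
           reject_at: float = 0.9, accept_at: float = 0.5) -> str:
    """Map a classifier's score distribution to accept / reject / quarantine.

    `scores` maps MIME type -> probability. Thresholds are illustrative.
    """
    if not scores:
        return "quarantine"   # no signal at all: let a human look
    top_mime = max(scores, key=scores.get)
    top_score = scores[top_mime]
    if top_mime != declared_mime and top_score >= reject_at:
        return "reject"       # confident contradiction of the declared type
    if top_mime == declared_mime and top_score >= accept_at:
        return "accept"       # best guess agrees with the declaration
    return "quarantine"       # ambiguous either way: manual review


# The two cases from the paragraph above.
print(triage("image/jpeg", {"application/x-dosexec": 0.99, "image/jpeg": 0.01}))
print(triage("text/yaml", {"text/yaml": 0.60, "text/plain": 0.35}))
```

The middle band matters: a confident mismatch is rejected outright, while a weak or split result goes to review instead of being forced into a yes or no.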

Defense-in-depth, briefly

The actual production architecture I’d recommend for any application that handles user uploads:

upload                                 (browser)
  |
  v
extension whitelist                    (cheap, blocks 80% of garbage)
  |
  v
content-length check                   (reject obvious DoS attempts)
  |
  v
magic-byte verification                (libmagic or file-type)
  |
  v
declared-MIME-vs-detected-MIME check   (reject discrepancies)
  |
  v
structural parse for known formats     (PIL for images, etc)
  |
  v
ML classification with Magika          (catches the rest)
  |
  v
quarantine on low confidence           (manual review queue)
  |
  v
storage with no-execute headers        (nosniff, Content-Disposition: attachment)

Each layer is cheap enough to run on every upload. Each layer rejects work the next layer would have to do. For typical uploads the total cost is on the order of 50-100ms, dominated by the structural parse step, which can take longer on large files.
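The first few stages of the diagram can be wired together as a list of checks that short-circuits on the first failure. This is a sketch under stated assumptions: the Upload type, the check names, the allowlist, and the size cap are all hypothetical, and check_magic is a stand-in that knows only two formats.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Upload:
    filename: str
    declared_mime: str
    data: bytes


# A check returns None to pass, or a short reason string to reject.
Check = Callable[[Upload], Optional[str]]

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # illustrative allowlist
MAX_BYTES = 10 * 1024 * 1024                    # illustrative size cap


def check_extension(u: Upload) -> Optional[str]:
    dot = u.filename.rfind(".")
    ext = u.filename[dot:].lower() if dot != -1 else ""
    return None if ext in ALLOWED_EXTENSIONS else f"extension {ext!r} not allowed"


def check_size(u: Upload) -> Optional[str]:
    return None if len(u.data) <= MAX_BYTES else "upload too large"


def check_magic(u: Upload) -> Optional[str]:
    # Stand-in for a real signature library; only knows PNG and JPEG.
    if u.data.startswith(b"\x89PNG\r\n\x1a\n"):
        detected = "image/png"
    elif u.data.startswith(b"\xff\xd8\xff"):
        detected = "image/jpeg"
    else:
        return "unrecognized signature"
    if detected != u.declared_mime:
        return f"declared {u.declared_mime}, detected {detected}"
    return None


def run_pipeline(u: Upload, checks: List[Check]) -> Optional[str]:
    """Run checks cheapest-first; stop at the first rejection."""
    for check in checks:
        reason = check(u)
        if reason is not None:
            return f"{check.__name__}: {reason}"
    return None


PIPELINE: List[Check] = [check_extension, check_size, check_magic]

print(run_pipeline(Upload("vacation.jpg", "image/jpeg", b"MZ\x90\x00"), PIPELINE))
```

Ordering the list cheapest-first is what keeps the expensive stages (structural parse, classification) off the hot path for uploads that were never going to pass.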

This is also roughly what high-trust platforms do internally - GitHub’s attachment pipeline, Slack’s file uploads, anything that processes documents at scale. The defense-in-depth pattern isn’t novel; it’s just rarely fully implemented in smaller applications because each individual layer can be skipped without anything obviously breaking.

The takeaway

A MIME type is a claim about what a file is. The cost of verifying that claim is low. The cost of trusting it without verification - in some specific failure modes - is severe. If you handle files from untrusted sources, treating MIME types as advisory rather than authoritative will prevent more security incidents than almost any other single change.

You can experiment with the difference yourself: rename any binary on your machine to .jpg, drop it on the home page, and watch the detected type. The model doesn’t care what you called it. Neither should your upload pipeline.