PDF vs DOCX vs ODT: when each format wins
The three dominant document formats look interchangeable until you actually try to do something with them. Here's a practical comparison.
If you only ever read documents, the difference between PDF, DOCX, and ODT is invisible. The moment you try to do something - edit one, search inside it, archive it for ten years, render it on a phone - the gaps become obvious.
This is a tour of where each format earns its keep and where it actively hurts you.
PDF: the format you ship to others
PDF was built for one job: make a document look identical on every device, forever. It nails that job. The trade-off is that PDFs are essentially read-only by design - they describe a finished page, not a structured document.
Where PDF wins
- Distribution. A PDF is awkward to change casually, and a digitally signed one can’t be altered without invalidating the signature, which is why contracts, invoices, regulatory filings, and academic papers ship as PDF. The recipient sees the same page you saw.
- Print fidelity. Type, kerning, color profiles, and image bleeds are preserved exactly. If you’ve ever sent a Word doc to a designer and watched their kerning shift, you understand why.
- Long-term archiving. PDF/A is an ISO-standardized subset specifically for long-term preservation. Libraries and government archives use it because the format embeds everything it needs to render itself - fonts, color, metadata - in a single file.
Where PDF hurts
- Editing. Editing a PDF is a series of compromises. Acrobat lets you patch text, but the underlying structure resists meaningful changes; you’re working with positioned glyphs, not paragraphs.
- Reflow. PDFs assume a fixed page size. On a phone, the user pinch-zooms; the format itself doesn’t reflow. Tagged PDFs help screen readers, but the experience is still worse than a real responsive document.
- Extraction. Programmatic text extraction from PDF is famously unreliable. Two columns become interleaved gibberish; tables come out as space-padded strings; figure captions float free of their figures. There’s a reason “extract text from PDF” is a recurring research problem.
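To see why two columns turn into interleaved text, here is a toy model in Python. It is not a PDF parser: it assumes a page is nothing more than a list of positioned text runs (the Run type, the coordinates, and the gutter_x parameter are all made up for illustration) and compares a naive reading order with one that knows where the column gutter is.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """A hypothetical positioned text run: where it sits and what it says."""
    x: float  # distance from the left edge of the page
    y: float  # distance from the top edge of the page
    text: str


def naive_extract(runs: list[Run]) -> list[str]:
    """Read runs top-to-bottom, then left-to-right, ignoring columns."""
    return [r.text for r in sorted(runs, key=lambda r: (r.y, r.x))]


def column_aware_extract(runs: list[Run], gutter_x: float) -> list[str]:
    """Read everything left of gutter_x first, then everything right of it."""
    left = sorted((r for r in runs if r.x < gutter_x), key=lambda r: r.y)
    right = sorted((r for r in runs if r.x >= gutter_x), key=lambda r: r.y)
    return [r.text for r in left + right]


# A two-column page with two lines per column.
page = [
    Run(50, 100, "Left line 1"), Run(320, 100, "Right line 1"),
    Run(50, 120, "Left line 2"), Run(320, 120, "Right line 2"),
]
print(naive_extract(page))
print(column_aware_extract(page, gutter_x=300))
```

The naive order alternates between the columns, while the column-aware order reads the left column to the end before starting the right one. The catch is that a PDF does not record where the gutter is, so an extractor has to guess it from the geometry.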
Use PDF when the document is final, you care about exact appearance, and the recipient will read it - not edit it.
DOCX: the format you collaborate in
DOCX is what Microsoft Word produces when you save - a ZIP archive containing XML files, images, and metadata. It’s also the de facto exchange format whenever non-technical people share editable documents.
Where DOCX wins
- Editing. Track changes, comments, suggestions, redlines. The document model preserves paragraphs as paragraphs and styles as named styles, which means a careful author can produce something that survives a ten-person revision cycle.
- Tooling. Every office suite reads it. Pandoc converts to and from it. Cloud services (Google Docs, OnlyOffice) round-trip it. If you need a document to be read and edited by someone you don’t control, DOCX is the default answer.
- Tables and forms. Real tables (not images of tables), real form fields, real numbered lists with proper outline levels.
Where DOCX hurts
- Fidelity drift. Open a DOCX in Word, save it in Pages, open it back in Word - something will be subtly different. Custom fonts get substituted. Margins shift by a millimeter. For anything where exact appearance matters, DOCX is the wrong tool.
- Diff-ability. Even though DOCX is XML-under-a-ZIP, the XML is verbose enough that a meaningful git diff requires extracting and pretty-printing first. Round-tripping through Markdown with a tool like pandoc works, but it’s a workflow.
- Macros and embedded objects. Old DOCs (and the macro-enabled .docm variant) are a popular malware delivery channel. Even modern DOCXs can carry remote template injections.
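The extract-and-pretty-print step from the diff-ability point can be sketched with the Python standard library alone. This is a minimal sketch, assuming the main body lives at word/document.xml inside the archive, which is where Word normally puts it; the function name is made up for illustration.

```python
import zipfile
from xml.dom import minidom


def docx_body_as_pretty_xml(path: str) -> str:
    """Return word/document.xml from a DOCX, indented one element per line."""
    with zipfile.ZipFile(path) as archive:
        raw = archive.read("word/document.xml")
    # toprettyxml adds indentation and line breaks; the result is for reading
    # and diffing only and should not be written back into the archive.
    return minidom.parseString(raw).toprettyxml(indent="  ")
```

Writing this output to a text file for two versions of a document gives git something line-oriented to compare. One possible way to automate it is git's textconv diff setting, which runs a command on each file before diffing.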
Use DOCX when the document needs to be edited by someone, especially someone who isn’t going to learn a new tool to do it.
ODT: the open-standard cousin
ODT is the OpenDocument Text format - same conceptual structure as DOCX (zipped XML), different lineage. LibreOffice’s native format; readable by Word and Google Docs as well.
Where ODT wins
- Genuinely open. The spec is maintained by OASIS and published as an ISO standard (ISO/IEC 26300), not controlled by a single vendor. Public-sector procurement in the EU and several other jurisdictions favors it for that reason.
- Cleaner XML. If you’re going to programmatically generate or transform documents, ODT’s XML is meaningfully more readable than DOCX’s. Less ceremony, fewer namespace prefixes, more predictable structure.
- Smaller install base of macros. Less of a target for the document-as-malware delivery vector that DOC/DOCM has cultivated.
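As a rough illustration of the cleaner-XML point, the sketch below pulls paragraph text out of an ODT using only the Python standard library. It assumes the body is stored in content.xml and that paragraphs are text:p elements in the OpenDocument text namespace; the function name is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI used for text elements in OpenDocument content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_paragraphs(path: str) -> list[str]:
    """Return the plain text of every text:p element in an ODT file."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # itertext() also collects text nested in child elements such as text:span.
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
```

This is deliberately minimal: headings (text:h) are skipped, and special elements for tabs and repeated spaces are ignored, so treat it as a starting point, not a complete text extractor.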
Where ODT hurts
- Adoption. Outside Europe and parts of academia, “send me an ODT” gets you a confused look. Word can open them, but most users will save back to DOCX without noticing, losing some fidelity in the round-trip.
- Visual fidelity in mixed environments. Rendering ODT in Word, or vice versa, is roughly as fragile as DOCX-in-Pages. Diagrams and complex layouts shift.
Use ODT when you control both ends of the pipeline (your own toolchain, your own organization) and you care about format independence.
Detection cheat sheet
If you’re handed a file with no extension and need to know which format it is by content:
- PDF starts with %PDF- followed by the version number (for example, %PDF-1.7) and ends with %%EOF, often followed by a newline.
- DOCX starts with the ZIP magic PK\x03\x04. Inside the archive, the file [Content_Types].xml references application/vnd.openxmlformats-officedocument.wordprocessingml.document.
- ODT also starts with PK\x03\x04. Distinguishing it from a DOCX requires reading the archive’s mimetype entry, which contains application/vnd.oasis.opendocument.text.
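The checks above can be sketched in a few lines of standard-library Python. This is a minimal sketch, not a hardened detector: the function name and the returned labels are made up for illustration, and it only looks at the start of the file plus the two archive entries named in the list.

```python
import zipfile

ODT_MIME = b"application/vnd.oasis.opendocument.text"
DOCX_MIME = (
    b"application/vnd.openxmlformats-officedocument"
    b".wordprocessingml.document"
)


def sniff_format(path: str) -> str:
    """Return 'pdf', 'docx', 'odt', or 'unknown' based on file content."""
    with open(path, "rb") as f:
        head = f.read(5)
    if head == b"%PDF-":
        return "pdf"
    if not head.startswith(b"PK\x03\x04"):
        return "unknown"
    try:
        with zipfile.ZipFile(path) as archive:
            names = set(archive.namelist())
            # ODT: the mimetype entry holds exactly the ODT media type.
            if "mimetype" in names:
                if archive.read("mimetype").strip() == ODT_MIME:
                    return "odt"
            # DOCX: the content-types manifest mentions the DOCX media type.
            if "[Content_Types].xml" in names:
                if DOCX_MIME in archive.read("[Content_Types].xml"):
                    return "docx"
    except zipfile.BadZipFile:
        return "unknown"
    return "unknown"
```

Because the function never looks at the file name, a DOCX renamed to report.pdf still comes back as docx. Other ZIP-based formats, such as spreadsheets or plain archives, fall through to unknown.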
Magika handles all three reliably from a few hundred bytes - you can verify by dropping a file on the home page. The interesting tests are the ones with deliberately wrong extensions; the model doesn’t care what you named the file.
The one-line summary
- Final, formatting-critical, distributed widely → PDF.
- Editable, collaborative, generic office work → DOCX.
- Vendor-neutral, archival, open-standard requirement → ODT.
Pick the one that matches the document’s future rather than the one your editor opens by default.