PDF Metadata Stripper
Inspect and remove PDF Info dictionary + XMP metadata in your browser. Field-level toggle, before/after view.
- [FREE]
- [NO_SIGNUP]
- [NO_UPLOAD]
A PDF metadata remover clears the hidden author, producer, and XMP fields embedded in PDF files. PDFs from Word, Pages, Acrobat, or LaTeX almost always leak the original author’s username, the operating system, and the writing software’s version. This tool removes that information in your browser — the PDF never touches a server.
What gets stripped
| Field | Source | Typical leak |
|---|---|---|
| Title | Info dictionary | The document’s own filename or the first heading |
| Author | Info dictionary | OS username of whoever created the file |
| Subject | Info dictionary | Free-text description |
| Keywords | Info dictionary | Search-engine hints |
| Producer | Info dictionary | The PDF library version (e.g., “Microsoft Print To PDF”) |
| Creator | Info dictionary | The originating app (Word 2019, Pages 13.2, LaTeX, Chrome) |
| CreationDate | Info dictionary | When the PDF was first saved |
| ModificationDate (key /ModDate) | Info dictionary | When it was last saved |
| XMP packet | Document catalog | A second copy of the above + tool-specific history, hierarchical metadata, Adobe pipeline IDs |
By default the tool strips all nine. Toggle the checkboxes to keep specific fields.
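The default-on, opt-out behaviour can be sketched as a small selection function. This is a minimal illustration, not the tool's actual source: the names FIELDS, Field, and fieldsToStrip are hypothetical, and the field labels are taken from the table above.

```typescript
// The nine strippable items from the table above (labels are illustrative).
const FIELDS = [
  "Title",
  "Author",
  "Subject",
  "Keywords",
  "Producer",
  "Creator",
  "CreationDate",
  "ModificationDate",
  "XMP",
] as const;

type Field = (typeof FIELDS)[number];

// Returns the fields to strip: everything except what the user chose to keep.
// With no argument, nothing is kept, so all nine are stripped.
function fieldsToStrip(keep: ReadonlySet<Field> = new Set<Field>()): Field[] {
  return FIELDS.filter((field) => !keep.has(field));
}

// Example: keep the Title, strip the other eight.
const toStrip = fieldsToStrip(new Set<Field>(["Title"]));
console.log(toStrip.length); // 8
```

Modelling the checkboxes as a "keep" set rather than a "strip" set means a missing or empty selection still strips everything, which matches the default described above.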
Why bother
PDFs carry more identity than people realize:
- Whistleblowing & leaks. Several high-profile leaks have been traced back to a PDF’s Author field naming the leaker’s corporate account.
- Legal discovery. Producer / Creator history can reveal what software touched the document and when, which lawyers will use in eDiscovery.
- GDPR / privacy. “Personal data” under GDPR includes usernames and timestamps embedded in artifacts you publish.
- Brand / OPSEC. A PDF on your marketing site that says “Producer: Microsoft Print To PDF” gives away parts of your internal toolchain.
The fix is mechanical: rewrite the Info dictionary to empty strings, delete the XMP stream from the document catalog, re-emit the file. That is exactly what this tool does.
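Those three steps can be sketched with pdf-lib roughly as follows. This is an illustration under stated assumptions, not the tool's actual source: the function name stripAll is hypothetical, and two details are guesses about how one would do it. The date fields are removed by deleting their keys (a date cannot be set to an empty string through the setters), and the XMP stream object is deleted along with the catalog's /Metadata entry so the unreferenced stream is not written back out.

```typescript
import { PDFDocument, PDFDict, PDFName, PDFRef } from "pdf-lib";

// Hypothetical helper: clears the Info dictionary, removes XMP, re-emits the file.
export async function stripAll(input: Uint8Array): Promise<Uint8Array> {
  // updateMetadata: false stops pdf-lib from writing its own Producer and
  // modification date into the document when it loads.
  const doc = await PDFDocument.load(input, { updateMetadata: false });

  // Step 1: rewrite the text fields of the Info dictionary to empty strings.
  doc.setTitle("");
  doc.setAuthor("");
  doc.setSubject("");
  doc.setKeywords([]);
  doc.setProducer("");
  doc.setCreator("");

  // Assumption: the two date entries are dropped by deleting their keys.
  const info = doc.context.lookupMaybe(doc.context.trailerInfo.Info, PDFDict);
  info?.delete(PDFName.of("CreationDate"));
  info?.delete(PDFName.of("ModDate"));

  // Step 2: remove the XMP stream. Deleting only the catalog entry would
  // leave the stream object in the file, so the object is deleted as well.
  const metadataKey = PDFName.of("Metadata");
  const metadataRef = doc.catalog.get(metadataKey);
  if (metadataRef instanceof PDFRef) doc.context.delete(metadataRef);
  doc.catalog.delete(metadataKey);

  // Step 3: re-emit the file as a fresh byte array.
  return doc.save();
}
```

Because pdf-lib writes a complete new file rather than appending an incremental update, the old field values are not left behind in an earlier revision of the document.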
Limitations — what it does NOT remove
- Annotations. Sticky notes, highlights, and comments often store the commenter’s name. They live in page-level annotation arrays this tool does not edit.
- Form fields. Filled values stay filled. Add a flatten step in Acrobat or similar to bake them out.
- Page content streams. If your software stamps “Generated by X” as visible text, that text stays — it’s drawing, not metadata.
- Embedded thumbnails. Some PDFs include a small JPEG preview of the first page. Not removed here.
- Attachments. PDFs can carry attached files (a CSV, another PDF). Not removed; not even surfaced. Use Acrobat for this.
- Encrypted PDFs. Decrypt first.
- Digital signatures. Any rewrite invalidates them.
For a sanitize-everything pass, combine this tool with Acrobat’s Tools → Redact → Sanitize Document, or with qpdf’s --linearize --object-streams=disable --remove-unreferenced-resources=yes + a flatten.
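Before publishing, it can help to check whether a file contains any of the structures listed above. The sketch below is a rough heuristic rather than a feature of this tool: it searches the raw file text for the standard PDF dictionary keys tied to each limitation. The names RESIDUE_MARKERS and findResidue are hypothetical. Keys stored inside compressed object streams will not be seen, and visible text in page content cannot be detected this way at all.

```typescript
// Standard PDF keys paired with the limitation they hint at.
const RESIDUE_MARKERS: ReadonlyArray<readonly [string, string]> = [
  ["/Annots", "annotations"],
  ["/AcroForm", "form fields"],
  ["/Thumb", "embedded thumbnails"],
  ["/EmbeddedFiles", "attachments"],
  ["/Encrypt", "encryption"],
  ["/ByteRange", "digital signature"],
];

// Returns the labels of every marker found in the decoded PDF text.
// A hit means "inspect this file further", not proof of a leak.
function findResidue(pdfText: string): string[] {
  return RESIDUE_MARKERS
    .filter(([key]) => pdfText.includes(key))
    .map(([, label]) => label);
}

// Example: a page object that carries an annotation array.
console.log(findResidue("1 0 obj << /Type /Page /Annots [5 0 R] >> endobj"));
```

An empty result only means none of the keys appear in plain text, so treat it as a quick triage step and not as a clean bill of health.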
How “Producer” and “Creator” differ
- Creator: the original application that authored the document (Word, Pages, InDesign, LaTeX, Chrome’s Print → PDF).
- Producer: the library or driver that turned the authored content into PDF bytes (Microsoft Print To PDF, Adobe PDF Library, Skia/PDF, pdfTeX-1.40.25).
A PDF can have one of each. Both leak the toolchain. The tool clears both by default.
XMP — the deeper metadata layer
XMP (Extensible Metadata Platform) is Adobe’s RDF/XML metadata format. In a PDF it is stored as a stream that the document catalog references through its /Metadata entry. It typically duplicates the Info dictionary (Title, Author, etc.) and adds tool-specific data:
- Adobe document ID + instance ID (let Adobe track edit history)
- xmpMM:History — a literal edit log
- pdfx:* fields — PDF/X custom data
- Custom application namespaces (Pages writes dc: and xmp: entries, Word writes its own)
XMP is opt-in for readers but most modern PDF viewers display its values. Acrobat shows XMP fields in Document Properties → Additional Metadata. This tool clears the whole XMP packet by clearing the /Metadata reference in the document catalog.
Privacy
Static HTML page → small JavaScript bundle → pdf-lib runs in your browser tab. Open DevTools → Network: dropping, parsing, stripping, and downloading produce zero requests with your PDF data. The cleaned PDF is built in memory and offered via URL.createObjectURL.
How it compares
| | bytefork.tools | smallpdf.com | ilovepdf.com |
|---|---|---|---|
| Runs in browser | ✓ | ✗ (uploads to server) | ✗ (uploads to server) |
| Strips Info dict | ✓ | ✓ | ✓ |
| Strips XMP packet | ✓ | partial | partial |
| Field-level toggle (keep some, strip others) | ✓ | ✗ | ✗ |
| Side-by-side before / after | ✓ | ✗ | ✗ |
| Sign-in required | ✗ | for batch | for batch |
| Ad-free | ✓ | ✗ | ✗ |
| Free tier limit | none | 2 files/day | 2 files/day |
Related tools
- EXIF Viewer & Stripper — do the same for image metadata.
- SVG Optimizer — strip editor cruft from SVG files.
- Hash Generator — confirm the cleaned file’s hash for record-keeping.
Frequently asked questions
What metadata does this tool remove?
Two layers: the PDF Info dictionary (Title, Author, Subject, Keywords, Producer, Creator, CreationDate, ModificationDate) and the XMP packet (the longer XML metadata block embedded in most modern PDFs, which can include Adobe-specific tags, source filename, last-saved user, original software version, and more). You pick which fields to clear via the checkbox grid; everything is on by default.
Can metadata that lives elsewhere in the PDF still leak?
Maybe. PDFs can carry sensitive information in places this tool does not touch: annotations (sticky notes, highlights with author names), filled form-field data, attached files, embedded thumbnails, page-content streams (some software writes invisible "Created by X" overlays), and digital signatures. For a true sanitize, also run the cleaned PDF through a "flatten" or "sanitize" pass — Acrobat Pro has one, and the macOS Preview Quartz filter "Remove all metadata" does a similar job. This tool clears the Info dictionary + XMP, which covers the most common leak vector.
Are my PDFs uploaded?
No. pdf-lib parses, rewrites, and re-serializes the file in your browser tab. Open DevTools → Network when you drop a PDF: no requests fire. The cleaned file is built in memory and downloaded via a blob: URL.
Why does the cleaned PDF have a different byte size?
Cleared strings take less space; deleted XMP streams remove a whole object; pdf-lib rebuilds the xref table when it saves. Net change can be a few hundred bytes smaller (or, rarely, a few bytes larger if the original was already aggressively packed and pdf-lib re-emits in a more verbose form). Page content is untouched, so visual rendering is identical.
Will it work on encrypted / password-protected PDFs?
No. pdf-lib does not decrypt PDFs: by default it refuses to load an encrypted file, and its ignoreEncryption option only lets it read the structure, not produce a usable rewrite. Decrypt the file first, then run it through this tool. macOS Preview, Adobe Acrobat, or qpdf can decrypt.
Does it preserve digital signatures?
No. Any rewrite of a PDF invalidates its digital signatures, and the rewrite this tool performs is no exception. Do not strip metadata from a signed PDF that must remain verifiable. If you control the workflow, strip the metadata first, then sign.
What does the XMP packet preview show?
The raw XMP XML, truncated at 4000 characters for display. XMP usually contains a richer mirror of the Info dictionary (Title, Creator, etc.) plus tool-specific tags (Adobe pipeline IDs, original filename, history of edits). Stripping clears the whole XMP stream from the PDF catalog.
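A preview like this can be approximated by looking for the XMP packet wrapper, which begins with an xpacket begin processing instruction and ends with an xpacket end one. The sketch below is an assumption about one way to do it, not the tool's actual code: xmpPreview and PREVIEW_LIMIT are hypothetical names, and scanning decoded file text only works when the metadata stream is stored uncompressed.

```typescript
// Display cap taken from the answer above.
const PREVIEW_LIMIT = 4000;

// Finds the first XMP packet in the decoded PDF text and returns at most
// `limit` characters of it, or null when no complete packet is present.
function xmpPreview(pdfText: string, limit: number = PREVIEW_LIMIT): string | null {
  const start = pdfText.indexOf("<?xpacket begin=");
  if (start === -1) return null;

  // The trailer instruction closes the packet; search for it after the header.
  const endMarker = pdfText.indexOf("<?xpacket end=", start);
  if (endMarker === -1) return null;

  const close = pdfText.indexOf("?>", endMarker);
  if (close === -1) return null;

  // Include the closing "?>" and then apply the display cap.
  return pdfText.slice(start, close + 2).slice(0, limit);
}
```

Returning null rather than an empty string lets the caller distinguish "no XMP packet found" from "packet found but empty".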
Can I strip metadata from a batch of PDFs?
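Not in this version: drop one PDF at a time. Batch is planned. For now, you can script the same steps locally in Node with pdf-lib. The library has no single `stripMetadata` export; load each file with `PDFDocument.load`, clear the fields with the document setters (`setTitle`, `setAuthor`, and so on), remove the catalog's /Metadata entry, and call `save`.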
Is the tool really free?
Yes. No signup, no watermark on the output, no usage limit, no ads. Open source.