PDF to Markdown & JSON — Extract Text in Your Browser

Turn a PDF into clean Markdown or structured JSON — headings, paragraphs, and per-page text — without uploading anything. The file is parsed locally in your browser.

How it works (and why nothing is uploaded)

This tool runs Mozilla’s pdf.js entirely in your browser. It reads the PDF’s embedded text layer, groups the positioned glyphs back into lines and paragraphs, infers headings from font size, and emits either Markdown or a structured JSON object. The PDF bytes never leave your device — no upload, no server, no account, no storage.

Markdown — readable text with ## headings and paragraph breaks reconstructed from layout. Ideal for pasting into docs, notes, or an AI prompt.
JSON — one object per page ({ page, text, lines: [{ text, heading }] }), ready to process in code.
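
The line-grouping step described above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the tool's actual code: the TextRun shape, the groupIntoLines name, and the 2-unit baseline tolerance are hypothetical choices made for the example. In practice each run would be derived from the positioned text items that pdf.js reports for a page.

```typescript
// Hypothetical simplified shape for one positioned text run on a page.
interface TextRun {
  text: string;
  x: number;    // left edge, in PDF units
  y: number;    // baseline, in PDF units (larger y is higher on the page)
  size: number; // font size
}

interface Line {
  text: string;
  y: number;
  size: number;
}

// Group runs whose baselines lie within `tolerance` of the first run in the
// current line, ordering lines top to bottom and runs left to right.
function groupIntoLines(runs: TextRun[], tolerance = 2): Line[] {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = runs.slice().sort((a, b) => b.y - a.y || a.x - b.x);
  const buckets: TextRun[][] = [];

  for (const run of sorted) {
    const last = buckets[buckets.length - 1];
    if (last && Math.abs(last[0].y - run.y) <= tolerance) {
      last.push(run);
    } else {
      buckets.push([run]);
    }
  }

  return buckets.map((bucket) => {
    const ordered = bucket.sort((a, b) => a.x - b.x);
    return {
      // Join runs with a space and collapse any doubled whitespace.
      text: ordered.map((r) => r.text).join(" ").replace(/\s+/g, " ").trim(),
      y: ordered[0].y,
      // The largest run decides the line's size, which matters for headings.
      size: Math.max(...ordered.map((r) => r.size)),
    };
  });
}
```

A fixed tolerance is the simplest possible choice. Superscripts, drop caps, and multi-column pages would need a smarter rule, which is one reason complex layouts convert less cleanly than single-column text.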

What it’s good for

Getting a PDF’s text into an editor, notes app, or an LLM without copy-paste mangling.
Pulling structured page text into a script.
Quick, private extraction where you don’t want to upload a sensitive document to a web service.

It extracts the text layer, so it works on real (digital) PDFs, not scanned images.

Doing this at scale — the developer API

Need to convert hundreds of files, extract strict JSON against your own schema (invoices, receipts, emails → typed fields), or OCR scanned pages? That is server work a browser can't do well, so it is offered as a no-account, pay-per-call API. It is live now: you pay with a credit key that works like an API key, with no signup and no subscription. See the developer API.

Related tools

Markdown ↔ HTML Converter — convert between Markdown and HTML, both ways.
Image to Text (OCR) — extract text from images/screenshots in your browser.
PDF Editor — reorder, split, merge, annotate, and sign PDFs locally.
PDF Metadata Stripper — inspect and remove PDF metadata.

Frequently asked questions

Is my PDF uploaded anywhere?

No. The PDF is parsed locally in your browser with Mozilla's pdf.js — the bytes never leave your device, nothing is stored, and there is no account. You can confirm in DevTools → Network: after the page loads, choosing a file produces no upload.

What is the difference between the Markdown and JSON output?

Markdown gives you clean, readable text with headings and paragraphs reconstructed from the PDF layout — good for pasting into docs, notes, or an LLM prompt. JSON gives you a structured object: one entry per page with its lines and a heading flag, so you can process it programmatically.
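
As an illustration of processing the JSON programmatically, here is a small TypeScript sketch. It assumes the export is an array of page objects shaped like { page, text, lines: [{ text, heading }] } and that heading is a boolean flag. The type names and the listHeadings and toMarkdown helpers are hypothetical, written for this example rather than taken from the tool.

```typescript
// Assumed shape of the JSON export: one entry per page.
interface PageLine {
  text: string;
  heading: boolean; // assumption: the heading flag is a boolean
}

interface PageEntry {
  page: number;
  text: string;
  lines: PageLine[];
}

// Collect every heading together with the page it appears on.
function listHeadings(pages: PageEntry[]): { page: number; text: string }[] {
  const found: { page: number; text: string }[] = [];
  for (const p of pages) {
    for (const line of p.lines) {
      if (line.heading) {
        found.push({ page: p.page, text: line.text });
      }
    }
  }
  return found;
}

// Rebuild simple Markdown: headings become "## ..." and runs of
// consecutive non-heading lines are joined into one paragraph.
function toMarkdown(pages: PageEntry[]): string {
  const blocks: string[] = [];
  let paragraph: string[] = [];

  const flush = () => {
    if (paragraph.length > 0) {
      blocks.push(paragraph.join(" "));
      paragraph = [];
    }
  };

  for (const p of pages) {
    for (const line of p.lines) {
      if (line.heading) {
        flush();
        blocks.push(`## ${line.text}`);
      } else {
        paragraph.push(line.text);
      }
    }
    flush(); // assumption: a page break also ends the current paragraph
  }
  return blocks.join("\n\n");
}
```

With the export saved to a file, the parsed result of JSON.parse could be passed to either helper, for example to build a table of contents from listHeadings.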

Does it work on scanned PDFs (images)?

No. This tool extracts the PDF's embedded text layer. A scanned document is just images with no text layer, so it returns nothing. AI OCR for scanned pages is on the roadmap for the developer API (see "Doing this at scale" above); for one-off images, try the in-browser Image to Text (OCR) tool.

Why is the formatting sometimes imperfect?

PDFs store positioned glyphs, not a document structure, so headings, columns, and tables are reconstructed heuristically from text position and size. Single-column documents convert cleanly; complex multi-column layouts and tables may need a tidy-up. The text itself is read directly from the PDF's text layer rather than guessed; only the structure is inferred.
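
To make "inferred from size" concrete, here is one way a font-size heading heuristic could look in TypeScript. It is a sketch under assumptions, not the tool's actual rule: the SizedLine shape, the function names, and the 1.2 ratio are hypothetical. The idea is to treat the most common font size (weighted by character count) as body text and flag noticeably larger lines as headings.

```typescript
// Hypothetical input: one reconstructed line with its dominant font size.
interface SizedLine {
  text: string;
  size: number;
}

// Estimate the body font size as the size that covers the most characters.
function bodyFontSize(lines: SizedLine[]): number {
  const weights: Record<string, number> = {};
  for (const line of lines) {
    // Round to one decimal so 11.0 and 11.02 count as the same size.
    const key = line.size.toFixed(1);
    weights[key] = (weights[key] || 0) + line.text.length;
  }

  let best = 0;
  let bestWeight = -1;
  for (const key of Object.keys(weights)) {
    if (weights[key] > bestWeight) {
      bestWeight = weights[key];
      best = parseFloat(key);
    }
  }
  return best; // 0 when there are no lines
}

// Flag a line as a heading when it is at least `ratio` times the body size.
function markHeadings(
  lines: SizedLine[],
  ratio = 1.2, // assumption: an arbitrary threshold chosen for illustration
): { text: string; heading: boolean }[] {
  const body = bodyFontSize(lines);
  return lines.map((line) => ({
    text: line.text,
    heading: body > 0 && line.size >= body * ratio,
  }));
}
```

A rule like this shows why results can be imperfect: a heading set in bold at the body size would be missed, and a large pull quote would be flagged as a heading even though it is not one.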

Is there a limit or a cost?

It is free with no signup and no page limit — it runs on your own machine, so the only limit is your browser's memory on very large PDFs. A paid, no-account API for doing this at scale (and by JSON schema) is in development — see the developer plans.
