Image to Text (OCR) — Extract Text from Images in Your Br...

An image-to-text (OCR) tool extracts the written words from screenshots, scans, and photos of printed text. By default it runs tesseract.js in your browser — no upload, no signup, ten languages — with an optional AI mode for the hard cases.

On-device vs AI OCR

Pick the engine with the toggle above the controls:

On-device (default): tesseract.js runs in your browser. Nothing is uploaded; your image never leaves the device. Best for clean printed text and screenshots. This is the private default and what the rest of this page describes.
AI OCR (opt-in): your image is uploaded to a Cloudflare Workers AI vision model (llama-3.2-11b-vision-instruct), which is dramatically better on handwriting, photos taken at an angle, and busy layouts. Free, no signup, capped at 4 MB per image. Because it does send your image to a server, the mode is clearly labeled with a notice, and this tool stores nothing. Use On-device for anything confidential.

After extraction, an optional Translate control sends just the extracted text to a translation model (m2m100) and returns it in the language you pick. Like AI OCR, this is an opt-in exception to the site’s private-by-default rule; see the privacy page.

When OCR works well

Clean screenshots of code, terminals, error dialogs — almost always >95% accurate.
Scanned documents at 300 DPI or better — book pages, contracts, receipts.
Phone photos of printed text, shot straight-on with good lighting and tight crop.
PDFs converted to PNG — extract one page at a time.

When OCR struggles

Handwriting — tesseract is trained on printed fonts; handwriting needs a different model.
Low-resolution photos — anything under ~150 DPI degrades fast.
Text on busy backgrounds — magazine ads, product packaging, decorated invitations.
Skewed or rotated text — straighten the image first (any photo app).
Stylized fonts — script, blackletter, distressed/grunge fonts confuse the model.

When in doubt, check the confidence percentage in the result. Above 90%, the output usually matches the source character-for-character; below 70%, proofread it.
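The rule of thumb above is easy to encode if you post-process results yourself. A minimal sketch, assuming the score is a 0–100 number like the percentage this tool displays; the `adviseOnConfidence` helper and its middle "skim" band for 70–90% are hypothetical, not part of the tool:

```typescript
// Hypothetical helper: map an OCR confidence score (0-100) to a proofreading hint.
// The 90 and 70 cut-offs are this page's rules of thumb, not guarantees;
// the "skim" band in between is an illustrative assumption.
type ConfidenceAdvice = "likely exact" | "skim" | "proofread";

function adviseOnConfidence(confidence: number): ConfidenceAdvice {
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 100) {
    throw new RangeError(`confidence must be in [0, 100], got ${confidence}`);
  }
  if (confidence > 90) return "likely exact"; // usually character-for-character
  if (confidence >= 70) return "skim"; // spot-check names and numbers
  return "proofread"; // expect errors; check by hand
}

console.log(adviseOnConfidence(96)); // likely exact
console.log(adviseOnConfidence(55)); // proofread
```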

Languages

Code	Language	Model size
eng	English	~11 MB
spa	Spanish	~7 MB
fra	French	~6 MB
deu	German	~8 MB
por	Portuguese	~5 MB
ita	Italian	~6 MB
nld	Dutch	~5 MB
jpn	Japanese	~13 MB
chi_sim	Chinese (Simplified)	~14 MB
chi_tra	Chinese (Traditional)	~14 MB

Each model is the LSTM-trained variant from the official Tesseract tessdata repository; it is downloaded once per language and cached by your browser.

How the recognition works

The browser decodes your image into RGBA pixels via createImageBitmap.
tesseract.js spawns a Web Worker for off-main-thread processing.
The worker loads the WebAssembly Tesseract engine (~4 MB, cached after first load).
The worker loads the trained model for your chosen language (5–14 MB, cached after first load per language).
Tesseract runs page-segmentation → line detection → word recognition → LSTM character classification.
The worker returns the full plain-text result + a confidence score.

Total time: 1–3 seconds for a typical screenshot after the first download. The first call takes 10–30 seconds because of the model download (network-bound, not CPU-bound).
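The arithmetic behind the 10–30 second figure can be sketched as follows, assuming the download dominates, megabytes convert to megabits by multiplying by 8, and the approximate sizes quoted on this page hold. `firstLoadSeconds` and its size table are illustrative, not code from this tool:

```typescript
// Rough first-run download estimate for On-device OCR, using the approximate
// sizes quoted on this page. Ignores latency, HTTP compression, and the
// 1-3 s of actual recognition.
const WORKER_MB = 0.2; // tesseract.js worker script, ~200 KB
const WASM_CORE_MB = 4; // WebAssembly Tesseract engine

const MODEL_MB: Record<string, number> = {
  eng: 11, spa: 7, fra: 6, deu: 8, por: 5,
  ita: 6, nld: 5, jpn: 13, chi_sim: 14, chi_tra: 14,
};

function firstLoadSeconds(lang: string, bandwidthMbps: number): number {
  const modelMb: number | undefined = MODEL_MB[lang];
  if (modelMb === undefined) throw new Error(`unknown language code: ${lang}`);
  if (bandwidthMbps <= 0) throw new RangeError("bandwidth must be positive");
  const totalMegabytes = WORKER_MB + WASM_CORE_MB + modelMb;
  // megabytes -> megabits (x8), then divide by megabits per second
  return (totalMegabytes * 8) / bandwidthMbps;
}

console.log(firstLoadSeconds("eng", 10).toFixed(1)); // 12.2
```

At 10 Mbit/s, English works out to roughly 12 seconds; the Chinese models on a 5 Mbit/s link land near 29 seconds, consistent with the range above.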

Tips for better results

Crop tight to the text region. Background noise hurts.
Increase contrast before OCR if the image is washed out. Most photo apps have a one-click “auto enhance”.
Convert to grayscale for clean printed text. Color rarely helps; busy color backgrounds hurt.
De-skew the image if it was photographed at an angle. Even 5° of rotation reduces accuracy.
Use the right language. English-trained tesseract can read Spanish text, but less accurately than the spa model, and it cannot output accented characters that are missing from the eng character set.
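The grayscale and contrast tips can also be applied in code before OCR. This sketch is a generic min-max contrast stretch, not preprocessing this tool is known to perform; it assumes an RGBA byte array in the layout of a canvas's `getImageData(...).data`, and `grayscaleAndStretch` is a hypothetical name:

```typescript
// Generic preprocessing sketch: RGBA pixels (canvas ImageData.data layout)
// -> grayscale, then stretch contrast so the darkest pixel maps to 0 and
// the brightest to 255. Alpha is copied through unchanged.
function grayscaleAndStretch(rgba: Uint8ClampedArray): Uint8ClampedArray {
  if (rgba.length % 4 !== 0) throw new Error("expected 4 bytes per pixel");
  const pixelCount = rgba.length / 4;
  const gray = new Float64Array(pixelCount);
  let min = 255;
  let max = 0;
  for (let i = 0; i < pixelCount; i++) {
    const r = rgba[4 * i], g = rgba[4 * i + 1], b = rgba[4 * i + 2];
    const y = 0.299 * r + 0.587 * g + 0.114 * b; // ITU-R BT.601 luma weights
    gray[i] = y;
    if (y < min) min = y;
    if (y > max) max = y;
  }
  const range = max - min;
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < pixelCount; i++) {
    // A flat image (range 0) keeps its original gray level.
    const v = range > 0 ? ((gray[i] - min) / range) * 255 : gray[i];
    out[4 * i] = out[4 * i + 1] = out[4 * i + 2] = v; // typed array rounds and clamps
    out[4 * i + 3] = rgba[4 * i + 3];
  }
  return out;
}
```

A single pure-black or pure-white pixel (a dust speck, a glare spot) sets the range and blunts the stretch; stretching between, say, the 1st and 99th percentile gray levels is a common, more robust variant.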

Privacy

In the default On-device mode, the chain is: static HTML page → small JavaScript bundle → tesseract.js worker downloaded from jsdelivr → trained model downloaded from jsdelivr → all OCR runs inside your browser tab. The Network tab in DevTools shows what gets fetched: the tesseract worker.min.js, tesseract-core-simd.wasm, and the .traineddata file for your chosen language. None of these requests carries your image; your image bytes never leave your device. AI OCR and Translate, described above, are the opt-in exceptions.

How it compares

	bytefork.tools	onlineocr.net	i2ocr.com
Runs in browser	✓	✗ (uploads)	✗ (uploads)
Multiple languages	✓ 10	✓ 46	✓ 100+
Sign-in required	✗	for >15 docs	✗
Free tier limit	unlimited	15/hr	4 MB file size
Ad-free	✓	✗	✗
Output as .txt	✓	✓	✓

Image Compressor — shrink scans before processing.
Image Format Converter — convert HEIC / WebP to OCR-friendly PNG.
PDF Metadata Stripper — clean PDFs after OCR.

Frequently asked questions

How accurate is browser OCR?

For clean printed text (screenshots, scanned books, receipts shot straight-on), tesseract.js reaches 85–98% character accuracy at ~300 DPI. For low-resolution photos, skewed angles, handwriting, or text on patterned backgrounds, accuracy drops fast, sometimes below 50%. The confidence percentage in the result is tesseract's own estimate; below 70%, expect to proofread the output by hand.

Which languages are supported?

The ten preset languages cover seven widely used Western European languages plus Japanese and both Chinese variants: English, Spanish, French, German, Portuguese, Italian, Dutch, Japanese, Chinese Simplified, and Chinese Traditional. The Tesseract project ships about 120 trained language models in total; this tool surfaces the ten with the broadest demand. Need a different language? Open an issue.

Does my image get uploaded?

Not in the default On-device mode. There, Tesseract runs as a Web Worker inside your browser tab; the image bytes are passed to the worker in memory via structured cloning, and the Network tab shows zero requests carrying your image data. The trained-model download (5–14 MB) is fetched once from the jsdelivr CDN; that is the OCR weights, not your image. The optional AI OCR mode does upload the image (see below); it is clearly labeled and off by default.

What is AI OCR mode, and when should I use it?

On-device tesseract is great for clean printed text but weak on handwriting, photos, and messy layouts. AI OCR is an opt-in mode that uploads your image to a Cloudflare Workers AI vision model (llama-3.2-11b-vision-instruct), which reads handwriting and real-world photos far better. It is free, needs no signup, and is capped at 4 MB per image. The image is processed to produce the text and is not stored by this tool (Cloudflare, like any host, may log standard request metadata). For confidential documents, stay on On-device.
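If you script around the AI mode, checking the size before uploading avoids a wasted request. A hypothetical guard; the page says only "4 MB", so whether that means 4 × 1024 × 1024 or 4,000,000 bytes is an assumption, and the limit is a parameter:

```typescript
// Hypothetical pre-upload guard for AI OCR mode. The byte value of the
// "4 MB" cap is an assumption; pass a different limit if needed.
const ASSUMED_AI_OCR_LIMIT_BYTES = 4 * 1024 * 1024;

type UploadCheck = { ok: true } | { ok: false; reason: string };

function checkAiOcrUpload(
  sizeBytes: number,
  limitBytes: number = ASSUMED_AI_OCR_LIMIT_BYTES,
): UploadCheck {
  if (!Number.isInteger(sizeBytes) || sizeBytes < 0) {
    return { ok: false, reason: `invalid size: ${sizeBytes}` };
  }
  if (sizeBytes > limitBytes) {
    return {
      ok: false,
      reason:
        `image is ${sizeBytes} bytes, over the ${limitBytes}-byte limit; ` +
        "compress it first or use On-device mode",
    };
  }
  return { ok: true };
}

console.log(checkAiOcrUpload(3_500_000)); // { ok: true }
```

Suggesting On-device as the fallback keeps the safer, private path one click away for images that are too big to upload.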

Can I translate the extracted text?

Yes — after extracting text (either engine), a Translate control appears: pick a target language and the text is sent to a Workers AI translation model (m2m100) and returned translated. Only the extracted text is sent, nothing else, and it is not stored. It is capped at 2000 characters per request; translate longer text in pieces.
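Splitting long text for the 2000-character cap can be sketched like this, assuming "characters" is approximated by JavaScript string length (UTF-16 code units, which never undercounts code points) and that breaking at newlines or spaces is acceptable. `chunkForTranslation` is a hypothetical helper; you would still submit each piece to the Translate control yourself:

```typescript
// Split text into pieces of at most maxChars UTF-16 code units. Prefers the
// last newline, then the last space, within the limit; hard-splits only a run
// with no break point (which could cut a surrogate pair, e.g. an emoji).
function chunkForTranslation(text: string, maxChars = 2000): string[] {
  if (maxChars < 1) throw new RangeError("maxChars must be >= 1");
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars + 1); // a break at index maxChars is still fine
    let cut = head.lastIndexOf("\n");
    if (cut <= 0) cut = head.lastIndexOf(" ");
    if (cut <= 0) cut = maxChars; // no usable break point: hard split
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^[\n ]+/, ""); // drop the separator(s)
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Breaking at line boundaries first keeps sentences that OCR already placed on one line together, which generally gives the translation model more context than a hard cut.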

Why is the first recognition slow?

The very first recognition triggers three downloads: the tesseract.js worker JavaScript (~200 KB), the WebAssembly core (~4 MB), and the language model for your chosen language (5–14 MB). These are cached by your browser, so subsequent recognitions in the same language take 1–3 seconds depending on image size and complexity. Switching to a new language re-triggers only the language-model download.

Why is my recognized text wrong / garbled?

Common causes: 1) Image resolution too low (~150 DPI or worse): re-scan at 300 DPI. 2) Wrong language selected (English-trained tesseract reads Spanish badly). 3) Photo taken at an angle: straighten it. 4) Text on a busy background: crop tighter to the text. 5) Handwriting: tesseract is built for printed text, and On-device mode bundles no handwriting model; try AI OCR instead. 6) Stylized fonts (script, blackletter) confuse the model.
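DPI is simply pixels divided by physical inches, so you can check a scan before running OCR. A small sketch of that arithmetic with hypothetical helper names, using this page's thresholds (~150 DPI or worse as too low, 300 DPI as the target); the in-between "usable" band is an assumption:

```typescript
// Effective resolution: pixels spanning the text region divided by that
// region's physical width in inches.
function effectiveDpi(pixelWidth: number, physicalWidthInches: number): number {
  if (physicalWidthInches <= 0) throw new RangeError("physical width must be positive");
  return pixelWidth / physicalWidthInches;
}

// Pixel width needed to reach a target DPI across a given physical width.
function pixelsNeeded(physicalWidthInches: number, targetDpi = 300): number {
  return Math.ceil(physicalWidthInches * targetDpi);
}

// Rough verdict using this page's rules of thumb; "usable" is an assumed middle band.
function resolutionVerdict(dpi: number): "re-scan" | "usable" | "good" {
  if (dpi <= 150) return "re-scan";
  if (dpi < 300) return "usable";
  return "good";
}

// A US Letter page is 8.5 inches wide.
const letterScanDpi = effectiveDpi(1275, 8.5);
console.log(letterScanDpi, resolutionVerdict(letterScanDpi)); // 150 re-scan
console.log(pixelsNeeded(8.5)); // 2550
```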

Can I get the bounding boxes of recognized words?

This UI shows the plain text and a single overall confidence score. tesseract.js also exposes word-level bounding boxes in its raw result object; if you need them, fork the source. For now, this tool focuses on plain-text extraction.
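If you do fork the source, a common next step is flagging the individual words worth proofreading. A sketch under assumptions: the `OcrWord` shape is modeled loosely on the per-word entries in the tesseract.js raw result (text, confidence, and a `bbox` with `x0`/`y0`/`x1`/`y1` corners), but depending on your tesseract.js version you may need to request those entries or collect them from nested blocks, so verify the field names against its documentation. Both helpers are hypothetical:

```typescript
// Assumed word shape; verify against the tesseract.js version you use.
interface OcrWord {
  text: string;
  confidence: number; // 0-100
  bbox: { x0: number; y0: number; x1: number; y1: number }; // pixel corners
}

type Box = OcrWord["bbox"];

// Words below the proofreading threshold (70% on this page), in input order.
function wordsToProofread(words: OcrWord[], threshold = 70): OcrWord[] {
  return words.filter((w) => w.confidence < threshold);
}

// Smallest box covering all given words, e.g. to crop that region and re-run
// OCR on it; null for an empty list.
function unionBox(words: OcrWord[]): Box | null {
  const [first, ...rest] = words;
  if (first === undefined) return null;
  return rest.reduce<Box>(
    (acc, w) => ({
      x0: Math.min(acc.x0, w.bbox.x0),
      y0: Math.min(acc.y0, w.bbox.y0),
      x1: Math.max(acc.x1, w.bbox.x1),
      y1: Math.max(acc.y1, w.bbox.y1),
    }),
    { ...first.bbox },
  );
}
```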

Will it preserve layout (paragraphs, columns)?

Newlines are preserved at the line level. Multi-column layouts are read left-to-right, which usually produces useful text but does not reconstruct the column boundaries. For magazine-style layouts, screenshot one column at a time for cleaner output.

What image formats are accepted?

PNG, JPG, WebP, GIF, BMP. Decoded by the browser via createImageBitmap before tesseract sees them.
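This tool leaves format detection to the browser's decoder, but a pre-check can give a clearer error than a failed decode, for example for HEIC files, which this page suggests converting first. A sketch using magic-byte sniffing, a generic technique swapped in here for illustration; `sniffImageKind` is a hypothetical name:

```typescript
type ImageKind = "png" | "jpeg" | "gif" | "bmp" | "webp";

// Identify the accepted formats by their leading "magic" bytes.
// Returns null for anything else (HEIC, PDF, text, truncated files).
function sniffImageKind(bytes: Uint8Array): ImageKind | null {
  const startsWith = (sig: number[], offset = 0): boolean =>
    bytes.length >= offset + sig.length &&
    sig.every((b, i) => bytes[offset + i] === b);
  const ascii = (s: string): number[] => Array.from(s, (c) => c.charCodeAt(0));

  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (startsWith([0xff, 0xd8, 0xff])) return "jpeg";
  if (startsWith(ascii("GIF87a")) || startsWith(ascii("GIF89a"))) return "gif";
  // "WEBP" sits at offset 8, after the 4-byte RIFF chunk size.
  if (startsWith(ascii("RIFF")) && startsWith(ascii("WEBP"), 8)) return "webp";
  // "BM" is a weak 2-byte signature; treat a match as a hint, not proof.
  if (startsWith(ascii("BM"))) return "bmp";
  return null;
}
```

Sniffing only rules formats out early; a file with a valid signature can still be corrupt, so the browser's decode step remains the real check.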

Is this tool really free?

Yes. No signup, no usage limit, no ads. tesseract.js and the Tesseract trained models are both licensed under Apache 2.0.