OCR a Page or Region
OCR (Optical Character Recognition) turns image-based text in a scanned PDF into selectable, searchable text. Spark can run OCR on a whole page or on a user-drawn region.
When to Use OCR
- PDF pages that are scans — you can't select text on them.
- Text markup tools (Highlight / Underline) give "No text found" messages.
- Search (Ctrl+F) returns no results even though the text is visible.
OCR the Current Page
- Right-click the page → OCR → OCR This Page.
- A progress indicator appears.
- When complete, the page now has a transparent text layer that matches the visual characters.
After OCR:
- Text selection works (drag across text).
- Ctrl+F can find words.
- Highlight / Underline / Strikethrough tools work on the OCR'd text.
OCR a Region
For large pages, OCR'ing the whole thing can be slow. To focus on one area:
- Right-click → OCR → OCR Selected Region.
- Drag a rectangle over the region.
- Release.
- Spark OCRs only that region, adding its text to the page's text layer.
Copy OCR Text
After OCR, right-click → OCR → Copy OCR Text copies the recognised text to the clipboard. This is useful when you only need the text itself and don't want to add a text markup.
Language and Accuracy
The default OCR language is set in Settings → OCR → Language. Supported languages include English, French, German, Spanish, Italian, Portuguese, Dutch, and CJK (Chinese, Japanese, Korean), depending on which language packs are installed.
Accuracy tips:
- Higher-resolution scans produce better results.
- Pages with a straight orientation OCR better than skewed ones.
- Mixed-language pages require the correct language pack; auto-detect works for most Western scripts.
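To judge whether a scan is high-resolution enough before running OCR, you can work out its effective DPI from the image's pixel dimensions and the physical page size. The sketch below is only an illustration: the function name `effective_dpi` is hypothetical, and the 300 DPI figure used for comparison is a common rule of thumb for printed text, not a documented Spark requirement.

```python
def effective_dpi(pixel_width: int, pixel_height: int,
                  page_width_in: float, page_height_in: float) -> float:
    """Return the lower of the horizontal and vertical scan resolutions.

    Hypothetical helper for illustration; not part of Spark.
    """
    if page_width_in <= 0 or page_height_in <= 0:
        raise ValueError("page dimensions must be positive")
    return min(pixel_width / page_width_in, pixel_height / page_height_in)


# US Letter is 8.5 x 11 inches.
LETTER_IN = (8.5, 11.0)

# Assumed rule-of-thumb threshold for printed text (not a Spark setting).
RECOMMENDED_DPI = 300

dpi = effective_dpi(2550, 3300, *LETTER_IN)
if dpi >= RECOMMENDED_DPI:
    print(f"{dpi:.0f} DPI: resolution should be adequate for printed text")
else:
    print(f"{dpi:.0f} DPI: consider rescanning at a higher resolution")
```

A 2550 × 3300 pixel scan of a Letter page works out to 300 DPI, while a 1275 × 1650 pixel scan of the same page is only 150 DPI and is more likely to produce recognition errors.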
OCR Engine
Spark uses a local OCR engine: all processing happens on your computer, and no data leaves your machine.
What OCR Doesn't Do
- It doesn't redraw the page — the visual content is unchanged.
- It doesn't modify annotations already on the page.
- It doesn't auto-correct typos.
- It doesn't extract tables to a structured format — only running text.
After OCR
The recognised text is saved in the PDF's text layer. On next open:
- The page is still image-based visually.
- Text selection and search work because the invisible text layer is present.
This is the standard "searchable PDF" format.
Performance
| Page size | Typical OCR time (local engine) |
|---|---|
| A4 / Letter, 300 DPI | 2–5 seconds |
| A3 or larger, 600 DPI | 5–15 seconds |
| Whole 100-page document (batch) | ~1–5 minutes |
Batch OCR (all pages) is available via Tools → OCR → OCR All Pages where enabled.
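For rough planning, the per-page figures in the table can be multiplied out to estimate a whole document. The sketch below assumes pages are processed one after another at the A4 / Letter, 300 DPI rate; the function name `sequential_ocr_estimate` is hypothetical. The table's batch figure is somewhat lower than this naive sum, so treat the result as a conservative estimate rather than a prediction.

```python
def sequential_ocr_estimate(pages: int,
                            per_page_seconds: tuple[float, float] = (2.0, 5.0)
                            ) -> tuple[float, float]:
    """Return (low, high) total seconds, assuming one page at a time.

    The default range is the A4 / Letter, 300 DPI row of the table above.
    Hypothetical helper for illustration; not part of Spark.
    """
    low, high = per_page_seconds
    if pages < 0 or low < 0 or high < low:
        raise ValueError("invalid page count or time range")
    return pages * low, pages * high


low, high = sequential_ocr_estimate(100)
print(f"100 pages: about {low / 60:.1f} to {high / 60:.1f} minutes")
```

For larger or higher-resolution pages, pass the matching row instead, for example `sequential_ocr_estimate(20, (5.0, 15.0))` for twenty A3 pages at 600 DPI.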
Common Pitfalls
Warning — OCR quality depends on scan quality. A blurry or low-contrast scan may produce incorrect text. Always spot-check the recognised text for critical documents.
Tip — For forms with hand-written entries, OCR won't reliably read handwriting. Use OCR for printed text only; add Sticky Notes or Text Boxes over hand-written sections instead.