06.1 · OCR and Text

OCR a Page or Region

OCR (Optical Character Recognition) turns image-based text on a scanned PDF into selectable, searchable text. Spark can run OCR on a whole page or on a user-drawn region.

When to Use OCR

The page is a scan, so you can't select text on it.
Text markup tools (Highlight / Underline) report "No text found".
Search (Ctrl+F) returns no results even though text is visible on the page.

OCR the Current Page

Right-click the page → OCR → OCR This Page.
A progress indicator appears.
When OCR finishes, the page has an invisible text layer aligned with the visible characters.

After OCR:

Text selection works (drag across text).
Ctrl+F can find words.
Highlight / Underline / Strikethrough tools work on the OCR'd text.

OCR a Region

On large pages, running OCR on the whole page can be slow. To recognise just one area:

Right-click → OCR → OCR Selected Region.
Drag a rectangle over the region.
Release.
Spark OCRs only that region, adding its text to the page's text layer.
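
The region steps above amount to turning a drag gesture into a rectangle on the page. The Python sketch below is illustrative only and is not Spark's actual code: `drag_to_page_rect`, its parameters, and the assumption that screen pixels map to PDF points through a single zoom factor are all hypothetical. It normalises the two drag corners so that dragging in any direction works, converts them to page units, and clamps the result to the page bounds.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def drag_to_page_rect(start, end, zoom, page_width, page_height):
    """Convert a drag gesture (screen pixels) into a page rectangle (points).

    Hypothetical helper: assumes the page's top-left corner sits at screen
    pixel (0, 0) and that `zoom` is screen pixels per PDF point.
    """
    if zoom <= 0:
        raise ValueError("zoom must be positive")

    # Normalise so the rectangle is valid whichever way the user dragged.
    x0, x1 = sorted((start[0] / zoom, end[0] / zoom))
    y0, y1 = sorted((start[1] / zoom, end[1] / zoom))

    # Clamp to the page so a drag that overshoots the edge stays in bounds.
    def clamp(value, upper):
        return max(0.0, min(value, upper))

    return Rect(clamp(x0, page_width), clamp(y0, page_height),
                clamp(x1, page_width), clamp(y1, page_height))


# A4 page (595 x 842 points) at 2x zoom, dragged from bottom-right to top-left.
region = drag_to_page_rect((400, 300), (100, 50), zoom=2.0,
                           page_width=595, page_height=842)
print(region)
```

Only the area inside the resulting rectangle would then be passed to the OCR engine, which is why region OCR is faster than processing the whole page.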

Copy OCR Text

After OCR, right-click → OCR → Copy OCR Text copies the recognised text to the clipboard. This is useful when you only need the text itself and don't want to add a text markup.

Language and Accuracy

The default OCR language is set in Settings → OCR → Language. Supported languages include English, French, German, Spanish, Italian, Portuguese, Dutch, and CJK (Chinese, Japanese, Korean), depending on which language packs are installed.

Accuracy tips:

Higher-resolution scans produce better results.
Pages with a straight orientation OCR better than skewed ones.
Mixed-language pages need the language pack for each language on the page; auto-detect works for most Western scripts.
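
To make the resolution tip concrete: a scan's effective resolution is its pixel width divided by the page width in inches, and one PDF point is 1/72 inch. The sketch below is a hedged illustration of that arithmetic; `effective_dpi` is a hypothetical helper, not a Spark function.

```python
POINTS_PER_INCH = 72  # PDF default user space: 1 point = 1/72 inch


def effective_dpi(pixel_width: int, page_width_points: float) -> float:
    """Return the horizontal resolution of a scanned page image in DPI."""
    if page_width_points <= 0:
        raise ValueError("page width must be positive")
    page_width_inches = page_width_points / POINTS_PER_INCH
    return pixel_width / page_width_inches


# US Letter is 612 points (8.5 inches) wide.
print(effective_dpi(2550, 612))  # 300.0
print(effective_dpi(1275, 612))  # 150.0
```

A scan that comes out well below the resolution you expected is a candidate for rescanning before you run OCR.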

OCR Engine

Spark uses a local OCR engine: all processing happens on your computer, and no data leaves your machine.

What OCR Doesn't Do

It doesn't redraw the page — the visual content is unchanged.
It doesn't modify annotations already on the page.
It doesn't auto-correct typos.
It doesn't extract tables to a structured format — only running text.

After OCR

The recognised text is saved in the PDF's text layer. When you next open the file:

The page is still image-based visually.
Text selection and search work because the invisible text layer is present.

This is the standard "searchable PDF" format.
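
In PDF terms, an invisible text layer is typically text drawn with text rendering mode 3 (neither filled nor stroked), positioned over the scanned image. The sketch below shows what the content-stream operators for one recognised word could look like. It is an assumption that Spark writes exactly these operators; the helper names and the `/F1` font resource are placeholders for illustration.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def invisible_text_op(word: str, x: float, y: float, font_size: float) -> str:
    """Build content-stream operators that place `word` invisibly at (x, y).

    `3 Tr` selects text rendering mode 3, so the glyphs are never painted
    but remain available to text selection and search.
    `/F1` is a placeholder font resource name.
    """
    return (f"BT /F1 {font_size:g} Tf 3 Tr "
            f"1 0 0 1 {x:g} {y:g} Tm "
            f"({escape_pdf_string(word)}) Tj ET")


print(invisible_text_op("Invoice (copy)", 72, 700.5, 11))
# BT /F1 11 Tf 3 Tr 1 0 0 1 72 700.5 Tm (Invoice \(copy\)) Tj ET
```

Because the text is never painted, the page looks identical before and after OCR, while viewers can still select and search the words.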

Performance

Page size	Typical OCR time (local engine)
A4 / Letter, 300 DPI	2–5 seconds
A3 or larger, 600 DPI	5–15 seconds
Whole 100-page document (batch)	~1–5 minutes

Batch OCR (all pages) is available via Tools → OCR → OCR All Pages where enabled.

Common Pitfalls

Warning: OCR quality depends on scan quality. A blurry or low-contrast scan may produce incorrect text, so always spot-check the recognised text in critical documents.
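
One way to spot-check a critical passage is to paste the text from Copy OCR Text next to a hand-typed copy of the same passage and measure how closely they agree. The sketch below uses Python's standard `difflib` module; the `similarity` helper and the sample strings are made up for illustration and are not part of Spark.

```python
from difflib import SequenceMatcher


def similarity(recognised: str, reference: str) -> float:
    """Return a 0.0-1.0 similarity ratio after normalising whitespace."""
    a = " ".join(recognised.split())
    b = " ".join(reference.split())
    return SequenceMatcher(None, a, b).ratio()


# Typical OCR confusions: "m" read as "rn", digit "0" read as letter "O".
reference = "Payment is due within 30 days of the invoice date."
recognised = "Payrnent is due within 3O days of the invoice date."

score = similarity(recognised, reference)
print(f"{score:.2%} similar")
if score < 1.0:
    print("Recognised text differs from the reference; review it by hand.")
```

A ratio below 1.0 only signals that something differs; it does not say which version is right, so the final check still has to be done by eye.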

Tip: OCR won't reliably read handwritten entries on forms. Use OCR for printed text only, and add Sticky Notes or Text Boxes over handwritten sections instead.