User Guide
06.1 · OCR and Text

OCR a Page or Region

OCR (Optical Character Recognition) turns image-based text on a scanned PDF into selectable, searchable text. Spark can run OCR on a whole page or on a user-drawn region.

When to Use OCR

  • The page is a scan, so its text can't be selected.
  • Text markup tools (Highlight / Underline) report "No text found".
  • Search (Ctrl+F) returns no results even though the text is visible.

OCR the Current Page

  1. Right-click the page → OCR → OCR This Page.
  2. A progress indicator appears.
  3. When OCR finishes, the page has a transparent text layer that matches the visible characters.

After OCR:

  • Text selection works (drag across text).
  • Ctrl+F can find words.
  • Highlight / Underline / Strikethrough tools work on the OCR'd text.

OCR a Region

On large pages, running OCR on the whole page can be slow. To limit OCR to one area:

  1. Right-click the page → OCR → OCR Selected Region.
  2. Drag a rectangle over the region.
  3. Release the mouse button.
  4. Spark runs OCR on that region only and adds its text to the page's text layer.

Copy OCR Text

After OCR, right-click → OCR → Copy OCR Text copies the recognised text to the clipboard. Use this when you only need the text itself and don't want to add a text markup.

Language and Accuracy

The default OCR language is set in Settings → OCR → Language. Supported languages include English, French, German, Spanish, Italian, Portuguese, Dutch, and CJK (Chinese, Japanese, Korean), depending on which language packs are installed.

Accuracy tips:

  • Higher-resolution scans produce better results.
  • Pages with a straight orientation OCR better than skewed ones.
  • Mixed-language pages need the language pack for each language on the page; auto-detect works for most Western scripts.

OCR Engine

Spark uses a local OCR engine: all processing happens on your computer, and no data leaves your machine.

What OCR Doesn't Do

  • It doesn't redraw the page — the visual content is unchanged.
  • It doesn't modify annotations already on the page.
  • It doesn't auto-correct typos.
  • It doesn't extract tables to a structured format — only running text.

After OCR

The recognised text is saved in the PDF's text layer. The next time you open the file:

  • The page is still image-based visually.
  • Text selection and search work because the invisible text layer is present.

This is the standard "searchable PDF" format.
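
Because the text layer is stored in the PDF itself, other tools can read it too. The sketch below is one way to check from outside Spark which pages of a file still lack a text layer. It is not part of Spark: it assumes the third-party pypdf package is installed, and the 20-character threshold and both function names are illustrative choices.

```python
def needs_ocr(page_text: str, min_chars: int = 20) -> bool:
    """Return True when a page's extracted text is too short to be a real text layer."""
    # Count only non-whitespace characters, so blank lines do not count as text.
    visible = "".join(page_text.split())
    return len(visible) < min_chars


def pages_needing_ocr(pdf_path: str, min_chars: int = 20) -> list[int]:
    """List the 1-based page numbers whose text layer looks empty."""
    # pypdf is a third-party package (pip install pypdf). It is imported here
    # so that needs_ocr() stays usable without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    flagged = []
    for number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        if needs_ocr(text, min_chars):
            flagged.append(number)
    return flagged
```

Calling pages_needing_ocr("scan.pdf") on a hypothetical file returns the pages that are likely still image-only, which are the candidates for OCR This Page.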

Performance

Page size and resolution         Typical OCR time (local engine)
A4 / Letter, 300 DPI             2–5 seconds
A3 or larger, 600 DPI            5–15 seconds
Whole 100-page document (batch)  ~1–5 minutes

Batch OCR (all pages) is available via Tools → OCR → OCR All Pages where enabled.
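
One likely reason larger, higher-resolution pages take longer is the number of pixels the engine has to read. The following arithmetic sketch compares the two page rows from the Performance figures. It uses ISO 216 page dimensions; the helper name is illustrative, and OCR time is not assumed to scale exactly with pixel count.

```python
MM_PER_INCH = 25.4

# ISO 216 page sizes in millimetres (width, height).
PAGE_SIZES_MM = {"A4": (210, 297), "A3": (297, 420)}


def scan_pixels(page: str, dpi: int) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page scanned at the given DPI."""
    width_mm, height_mm = PAGE_SIZES_MM[page]
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


a4 = scan_pixels("A4", 300)
a3 = scan_pixels("A3", 600)
# Ratio of total pixels: twice the page area times four times the pixel density.
ratio = (a3[0] * a3[1]) / (a4[0] * a4[1])
print(a4, a3, round(ratio, 1))  # (2480, 3508) (7016, 9921) 8.0
```

An A3 page at 600 DPI holds about eight times the pixels of an A4 page at 300 DPI, which is why OCR Selected Region can help on large scans.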

Common Pitfalls

Warning: OCR quality depends on scan quality. A blurry or low-contrast scan may produce incorrect text. Always spot-check the recognised text for critical documents.
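
A spot check can be as simple as comparing a passage copied with Copy OCR Text against the same passage typed by hand. The sketch below uses Python's standard difflib module for that comparison. It is an illustration only, not a Spark feature, and the sample strings are made up.

```python
from difflib import SequenceMatcher


def ocr_similarity(recognised: str, reference: str) -> float:
    """Return a 0.0-1.0 similarity score between OCR text and a trusted reference."""
    # Collapse runs of whitespace so line-wrapping differences are not counted as errors.
    a = " ".join(recognised.split())
    b = " ".join(reference.split())
    return SequenceMatcher(None, a, b).ratio()


reference = "Invoice total: 1,250.00 EUR"
# Hypothetical OCR output with the digit 1 and the letter l confused.
recognised = "Invoice tota1: l,250.00 EUR"
score = ocr_similarity(recognised, reference)
print(f"similarity: {score:.2f}")
```

A score of 1.0 means the two passages match exactly after whitespace is normalised. Anything lower is a cue to read the OCR text more closely, especially around digits and look-alike characters.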

Tip: OCR won't reliably read handwriting, so on forms with handwritten entries use OCR for the printed text only. Add Sticky Notes or Text Boxes over the handwritten sections instead.