Languages and Preprocessing
Language Setting
Setting — Ocr.Language (default eng). Change it in Settings → OCR → Language.
Tex ships with 29 Tesseract .traineddata files: 28 language packs plus the auxiliary osd model. They live in the tessdata\ folder next to Tex.Wpf.exe. Only one language is active per OCR call, so change Ocr.Language in Settings when you need another language or script.
| Script | Codes |
|---|---|
| Latin | eng English, fra French, deu German, spa Spanish, ita Italian, por Portuguese, nld Dutch, dan Danish, fin Finnish, hun Hungarian, nor Norwegian, swe Swedish, ces Czech, pol Polish, tur Turkish, vie Vietnamese |
| Cyrillic | rus Russian, ukr Ukrainian |
| CJK | jpn Japanese, chi_sim Chinese (Simplified), chi_tra Chinese (Traditional), kor Korean |
| Arabic / RTL | ara Arabic, heb Hebrew |
| Indic / SE Asia | hin Hindi, tha Thai |
| Classical | grc Ancient Greek, lat Latin |
| Auxiliary | osd (orientation & script detection, not a language itself) |
Tip — Using the right language makes a bigger difference to accuracy than any preprocessing toggle. OCR run with eng on a Japanese screenshot will produce garbage regardless of image quality.
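If you script around Tex, it can help to confirm that a language code has a matching pack before you set Ocr.Language. The following is a minimal sketch that only inspects the tessdata folder layout described above. The function names `installed_languages` and `validate_language` are hypothetical helpers, not part of Tex, and the folder path is whatever your installation uses.

```python
from pathlib import Path

# osd is orientation & script detection, not a language, so it is excluded.
AUXILIARY = {"osd"}


def installed_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file."""
    codes = {p.stem for p in Path(tessdata_dir).glob("*.traineddata")}
    return sorted(codes - AUXILIARY)


def validate_language(code, tessdata_dir):
    """Return `code` unchanged, or raise ValueError if no pack exists for it."""
    available = installed_languages(tessdata_dir)
    if code not in available:
        raise ValueError(f"no language pack for {code!r}; available: {available}")
    return code
```

Calling `validate_language("jpn", tessdata_dir)` before an OCR run on Japanese text fails early with a readable message instead of producing unusable output.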
Preprocessing Toggle
Setting — Ocr.PreprocessImage (default true). Settings → OCR → Preprocess Image.
When enabled, Tex runs the captured image through a contrast / threshold / denoise pipeline before passing it to Tesseract. This is tuned for screenshots (crisp anti-aliased text on flat backgrounds) and usually improves confidence scores by 5–20 points.
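To make the three stages concrete, here is a small illustrative sketch of a contrast / threshold / denoise pass on a grayscale image stored as a list of rows of 0-255 integers. It is not Tex's implementation: the specific techniques (linear contrast stretch, fixed cutoff, 3x3 median filter), the stage order, and every function name are assumptions chosen for illustration.

```python
from statistics import median


def stretch_contrast(img):
    """Linearly rescale pixel values so they span the full 0-255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def threshold(img, cutoff=128):
    """Binarize: pixels at or above the cutoff become 255, the rest 0."""
    return [[255 if p >= cutoff else 0 for p in row] for row in img]


def median_denoise(img):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Coordinates outside the image are clamped to the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def preprocess(img, cutoff=128):
    """Assumed stage order: contrast, then threshold, then denoise."""
    return median_denoise(threshold(stretch_contrast(img), cutoff))
```

In this sketch an isolated bright pixel is removed by the median step, while solid regions such as character strokes wider than one pixel survive. That is the kind of cleanup that tends to help on screenshots.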
| Situation | Recommendation |
|---|---|
| Normal app / web screenshots | Leave on. |
| High-resolution documents at 100% zoom | Leave on. |
| Photographs of paper / whiteboards | Leave on — still helps. |
| Already-cleaned binary images | Try off if confidence is oddly low. |
Warning — Preprocessing adds ~50–200 ms per call. If you are scripting bulk OCR and every millisecond counts, benchmark both modes on your actual input.
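One way to run that benchmark is sketched below. It times each call with preprocessing on and off and reports the median per mode. `run_ocr(image, preprocess)` is a hypothetical callable standing in for however your script invokes OCR; this page does not describe a scripting interface for Tex, so adapt it to your setup.

```python
import time
from statistics import median


def benchmark_modes(run_ocr, images, repeats=3):
    """Return median seconds per call, keyed by the preprocess flag.

    run_ocr(image, preprocess) is a hypothetical stand-in for your OCR call.
    """
    results = {}
    for preprocess in (True, False):
        timings = []
        for _ in range(repeats):
            for image in images:
                start = time.perf_counter()
                run_ocr(image, preprocess)
                timings.append(time.perf_counter() - start)
        results[preprocess] = median(timings)
    return results
```

Use a sample of your real input rather than synthetic images, because preprocessing cost depends on image size and content. The median is used here so that a single slow call does not skew the comparison.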