Getting the best OCR accuracy
Follow these best practices when performing OCR with the Document Engine OCR API:
- Image preprocessing: Avoid manual preprocessing before OCR. This is one of the most important recommendations. Our OCR engine automatically preprocesses documents with better results than manual preprocessing.
- Language selection
  - Choose the appropriate OCR language(s) for your document. The Document Engine OCR API supports more than 30 built-in languages, with extended support for more than 100 languages from the underlying OCR engine.
  - For documents containing more than one language, specify multiple language codes joined with the “+” symbol (for example, “deu+fra+spa”).
- Image quality considerations
  - Resolution: Aim for a resolution of at least 200 DPI (dots per inch), and avoid going above 300 DPI. While 300 DPI is commonly recommended, we’ve observed that 200 DPI may sometimes yield better accuracy, especially for certain font sizes. Resolutions higher than 300 DPI may cause title fonts to exceed the optimal size range and degrade OCR accuracy.
  - Font size: Recommended font sizes are between 10 pt and 30 px. Fonts larger than 30 px may be skipped or misinterpreted by the OCR engine because of its internal character sizing algorithms.
  - Clarity: Use clean, high-contrast scans. Avoid excessive noise, skew, and compression artifacts to maximize recognition accuracy.
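The multi-language syntax from the language selection guidance above can be assembled programmatically. The following is a minimal sketch: `build_language_param` is a hypothetical helper, not part of the Document Engine API, and it only produces the “+”-joined string. How that string is passed to the API depends on the request format you use.

```python
def build_language_param(codes):
    """Join OCR language codes with "+", e.g. ["deu", "fra"] -> "deu+fra".

    Hypothetical helper for illustration; not part of the Document Engine API.
    """
    cleaned = []
    for code in codes:
        code = code.strip()
        if not code:
            raise ValueError("language codes must not be empty")
        if "+" in code:
            # "+" is the separator, so it cannot appear inside a single code.
            raise ValueError(f"invalid language code: {code!r}")
        if code not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(code)
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


if __name__ == "__main__":
    print(build_language_param(["deu", "fra", "spa"]))
```

The helper does not check whether a code is actually supported; it only guards against empty entries, duplicates, and stray separators.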
Following these best practices helps you maximize OCR accuracy.
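To see why scan resolution and font size interact, it can help to convert a nominal font size in points into pixels at a given DPI. The sketch below relies only on the standard definition of a typographic point (1 pt = 1/72 inch). The function name is illustrative, and the result is the nominal font size in pixels, which is an assumption about how to read the sizes above: actual glyph heights are somewhat smaller than the nominal size.

```python
POINTS_PER_INCH = 72  # typographic point: 1 pt = 1/72 inch


def points_to_pixels(font_size_pt: float, dpi: float) -> float:
    """Return the nominal size in pixels of a font scanned at the given DPI."""
    if font_size_pt <= 0 or dpi <= 0:
        raise ValueError("font size and DPI must be positive")
    return font_size_pt * dpi / POINTS_PER_INCH


if __name__ == "__main__":
    # The same 24 pt title grows linearly in pixel terms as the DPI increases.
    for dpi in (200, 300, 600):
        print(f"24 pt at {dpi} DPI is about {points_to_pixels(24, dpi):.1f} px")
```

Because the pixel size scales linearly with DPI, doubling the resolution doubles the pixel size of every font on the page, which is consistent with the observation that large title fonts are the first to leave the optimal range at high resolutions.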