Extracting data from images using OCR
Use OCR to extract text from images for high-throughput workflows.
Common use cases include:
- Invoice and receipt processing
- Search indexing pipelines
- Real-time text capture
- Large-scale document digitization
OCR focuses on text extraction and word-level coordinates. Unlike ICR, it doesn’t perform full semantic layout analysis.
How Nutrient helps
Nutrient Python SDK handles OCR configuration, extraction, and JSON output.
The SDK handles:
- OCR engine and model configuration details
- Word-level bounding box calculations
- Text line detection and reading order handling
- Multi-language recognition internals
Prerequisites
Before following this guide, ensure you have:
- Python 3.8 or higher installed
- Nutrient Python SDK installed (pip install nutrient-sdk)
- An image file to process (PNG, JPEG, or other supported formats)
- Basic familiarity with Python context managers and the with statement
For initial SDK setup and configuration, refer to the getting started guide.
Complete implementation
This example extracts OCR text and writes the output as JSON:
    from nutrient_sdk import Document, Vision, VisionEngine
Configuring OCR mode
Open the image and set the vision engine to OCR.
In this sample:
- The document opens in a context manager.
- document.settings.vision_settings.engine = VisionEngine.OCR enables OCR mode.
- OCR mode prioritizes text extraction speed.
    with Document.open("input_ocr_multiple_languages.png") as document:
        # Configure OCR engine for fast text extraction
        document.settings.vision_settings.engine = VisionEngine.OCR
Creating a vision instance and extracting content
Create a vision instance and call extract_content().
In this sample:
- Vision.set(document) binds OCR extraction to the opened document.
- extract_content() returns OCR results as a JSON string.
- The output includes extracted text and coordinates.
        vision = Vision.set(document)
        content_json = vision.extract_content()
Write the JSON string to a file for downstream processing.
Use this output for indexing, analytics, or storage:
        with open("output.json", "w") as f:
            f.write(content_json)
Understanding the output
extract_content() in OCR mode returns JSON optimized for text and word-level positions.
OCR output includes:
- Text content — Extracted text with line structure
- Bounding boxes — Pixel coordinates for text regions
- Word-level data — Per-word positions for highlighting or targeting
- Language detection — Detected language metadata
Unlike ICR output, OCR output focuses on text and positions instead of semantic document structure.
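Because this guide does not list the exact JSON schema, a schema-agnostic walk is a safe way to start exploring the output. The sketch below recursively collects every value stored under a given key. The key name "text" and the sample string are assumptions for illustration only; inspect your own output.json to find the real field names.

```python
import json

def collect_values(node, key):
    """Recursively collect every value stored under `key` in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            else:
                found.extend(collect_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found

# Hypothetical sample, loosely shaped like line/word output; the real schema may differ.
sample_json = '{"lines": [{"text": "Invoice 42", "words": [{"text": "Invoice"}, {"text": "42"}]}]}'
parsed = json.loads(sample_json)
texts = collect_values(parsed, "text")
```

In a real pipeline, pass the string returned by extract_content() to json.loads() in place of sample_json.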
Error handling
The Vision API raises VisionException when OCR extraction fails.
Common failure scenarios include:
- The image file can’t be read because of path or permission issues.
- Image data is corrupted or uses unsupported encoding.
- OCR models are missing or inaccessible.
- The available memory is insufficient for large images.
- The image format or resolution is unsupported.
In production code:
- Catch VisionException.
- Return a clear error message.
- Log failure details for debugging.
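The three practices above can be sketched as a small wrapper. This is a minimal illustration, not the SDK's own pattern: extract_or_report and the logger name are hypothetical, and the exception type is passed in as a parameter because this guide does not show where VisionException is imported from.

```python
import logging

logger = logging.getLogger("ocr_pipeline")

def extract_or_report(extract, error_types=(Exception,)):
    """Call `extract()`; return (content, None) or (None, message) on failure."""
    try:
        return extract(), None
    except error_types as exc:
        # Log the full traceback for debugging; hand a short message to the caller.
        logger.exception("OCR extraction failed")
        return None, f"OCR extraction failed: {exc}"

# Stand-in failure for demonstration only; it does not come from the SDK.
def failing_extract():
    raise RuntimeError("image could not be decoded")

content, error = extract_or_report(failing_extract, (RuntimeError,))
```

In real code, you would likely pass vision.extract_content as extract and a tuple containing VisionException as error_types, once you have confirmed its import path in the SDK reference.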
Conclusion
Use this workflow for OCR-based text extraction:
- Open the image document using a context manager for automatic resource cleanup.
- Configure the vision settings with the engine property assigned to VisionEngine.OCR for fast text extraction.
- OCR mode focuses on character recognition and word extraction without semantic analysis or layout detection.
- Create a vision instance with Vision.set() to bind text extraction operations to the document.
- Call extract_content() to invoke the OCR engine for character recognition.
- The OCR engine performs word detection, calculates bounding boxes, and generates JSON output with text and coordinates.
- The method returns a JSON-formatted string containing extracted text with word-level bounding boxes in pixel coordinates.
- OCR processing is optimized for speed, minimizing computational overhead for high-throughput scenarios.
- Write the JSON content to a file using Python’s built-in file handling with context manager syntax.
- Handle VisionException errors for robust error recovery in production environments.
- The JSON output enables integration with search indexing (Elasticsearch, Solr), text analysis, and database storage.
- OCR mode is ideal for invoice processing, receipt scanning, search indexing, and document digitization where speed is critical.
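The file-writing step in this workflow can be made slightly more defensive for multilingual text. The sketch below checks that the string parses as JSON before touching the file and writes it as UTF-8 explicitly, since open() otherwise falls back to a platform-dependent default encoding on many Python versions. The helper name save_json_text and the sample string are hypothetical and are not part of the SDK.

```python
import json
import os
import tempfile

def save_json_text(content_json, path):
    """Validate a JSON string, then write it to `path` as UTF-8."""
    json.loads(content_json)  # raises json.JSONDecodeError on malformed input
    with open(path, "w", encoding="utf-8") as f:
        f.write(content_json)
    return path

# Stand-in for the string returned by extract_content(); not real SDK output.
sample = '{"text": "Grüße, 世界"}'
out_path = save_json_text(sample, os.path.join(tempfile.mkdtemp(), "output.json"))
```

Validating first means a malformed string never overwrites an existing output file.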
For related image extraction workflows, refer to the Python SDK guides.
Download this ready-to-use sample package to explore the Vision API capabilities with preconfigured OCR settings.