Extracting data from images using OCR

Use Adaptive OCR to extract text from images for high-throughput workflows.

Common use cases include:

Invoice and receipt processing
Search indexing pipelines
Real-time text capture
Large-scale document digitization

OCR focuses on text extraction and word-level coordinates. Unlike ICR, it doesn’t perform full semantic layout analysis.

How Nutrient helps

The Nutrient Python SDK handles Adaptive OCR configuration, extraction, and JSON output.

Internally, the SDK takes care of:

OCR engine and model configuration details
Word-level bounding box calculations
Text line detection and reading order handling
Multi-language recognition internals

Prerequisites

Before following this guide, ensure you have:

Python 3.8 or higher installed
Nutrient Python SDK installed (pip install nutrient-sdk)
An image file to process (PNG, JPEG, or other supported formats)
Basic familiarity with Python context managers and the with statement

For initial SDK setup and configuration, refer to the getting started guide.

Complete implementation

This example extracts OCR text and writes the output as JSON. It starts by importing the required classes:

from nutrient_sdk import Document, Vision, VisionEngine

Configuring Adaptive OCR mode

Open the image and set the vision engine to Adaptive OCR.

In this sample:

The document opens in a context manager.
document.settings.vision_settings.engine = VisionEngine.ADAPTIVE_OCR enables Adaptive OCR mode.
For image inputs like this sample, Adaptive OCR behaves like a fast OCR extraction pipeline.

with Document.open("input_ocr_multiple_languages.png") as document:
    # Configure OCR engine for fast text extraction
    document.settings.vision_settings.engine = VisionEngine.ADAPTIVE_OCR

Creating a vision instance and extracting content

Create a vision instance and call extract_content().

In this sample:

Vision.set(document) binds OCR extraction to the opened document.
extract_content() returns OCR results as a JSON string.
The output includes extracted text and coordinates.

    vision = Vision.set(document)
    content_json = vision.extract_content()

Write the JSON string to a file so that downstream indexing, analytics, or storage jobs can consume it:

    # Write as UTF-8 so recognized non-ASCII text is stored consistently across platforms
    with open("output.json", "w", encoding="utf-8") as f:
        f.write(content_json)

Understanding the output

extract_content() in Adaptive OCR mode returns JSON optimized for text and word-level positions.

OCR output includes:

Text content — Extracted text with line structure
Bounding boxes — Pixel coordinates for text regions
Word-level data — Per-word positions for highlighting or targeting
Language detection — May be available in OCR output depending on the content and extraction result
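
As a rough illustration of how per-word positions could drive highlighting, the sketch below looks up a search term among the extracted words and merges several word boxes into one rectangle. It borrows the text, words, and bounds field names described under Key output fields, but the top-level "elements" key, the x/y/width/height layout of bounds, and the sample values are assumptions made for this example, not the SDK's documented schema.

```python
import json

# Hypothetical output fragment. The "elements" key and the bounds layout
# (x, y, width, height in pixels) are assumptions for illustration only.
content_json = """
{
  "elements": [
    {"text": "Total due",
     "words": [
       {"text": "Total", "bounds": {"x": 40, "y": 120, "width": 52, "height": 18}},
       {"text": "due", "bounds": {"x": 98, "y": 121, "width": 30, "height": 17}}
     ]}
  ]
}
"""


def find_word_boxes(content_json, term):
    """Return the bounds of every word that matches term, ignoring case."""
    data = json.loads(content_json)
    wanted = term.casefold()
    boxes = []
    for element in data.get("elements", []):
        for word in element.get("words", []):
            if word.get("text", "").casefold() == wanted:
                boxes.append(word["bounds"])
    return boxes


def union_box(boxes):
    """Return the smallest rectangle that covers all of the given boxes."""
    left = min(b["x"] for b in boxes)
    top = min(b["y"] for b in boxes)
    right = max(b["x"] + b["width"] for b in boxes)
    bottom = max(b["y"] + b["height"] for b in boxes)
    return {"x": left, "y": top, "width": right - left, "height": bottom - top}


matches = find_word_boxes(content_json, "total")
all_words = [w["bounds"] for e in json.loads(content_json)["elements"] for w in e["words"]]
phrase_box = union_box(all_words)
```

If the real bounds use a different layout (for example, corner points), only union_box would need to change.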

Key output fields

These are the most commonly used fields in OCR JSON output:

text — Extracted text for the element.
words — Per-word OCR results.
bounds — Bounding box coordinates for the element or word.
confidence — Confidence score for the element or word.
readingOrder — Sequence in which elements should be read.
id — Unique identifier for the extracted element.
pageNumber — Source page number.
type / role — Semantic type of the extracted block when available.

When an element contains only one word, element-level and word-level bounds/confidence can appear identical.

Unlike ICR output, OCR output focuses on text and positions instead of semantic document structure.
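
To show how these fields might be consumed, here is a minimal sketch that sorts elements by readingOrder and flags words with a low confidence score for manual review. The top-level "elements" key, the 0 to 1 confidence scale, the 0.8 threshold, and the sample values are assumptions for illustration; check them against the JSON your SDK version actually produces.

```python
import json

# Hypothetical sample shaped after the fields listed above. The "elements"
# key and the 0-1 confidence scale are assumptions, not a documented schema.
SAMPLE_JSON = """
{
  "elements": [
    {"id": "e2", "pageNumber": 1, "readingOrder": 2, "text": "Total: 42.00",
     "confidence": 0.91,
     "words": [
       {"text": "Total:", "confidence": 0.97},
       {"text": "42.00", "confidence": 0.55}
     ]},
    {"id": "e1", "pageNumber": 1, "readingOrder": 1, "text": "Invoice",
     "confidence": 0.99,
     "words": [
       {"text": "Invoice", "confidence": 0.99}
     ]}
  ]
}
"""


def elements_in_reading_order(content_json):
    """Parse the JSON string and return its elements sorted by readingOrder."""
    data = json.loads(content_json)
    elements = data.get("elements", [])  # assumed top-level key
    return sorted(elements, key=lambda e: e.get("readingOrder", 0))


def low_confidence_words(elements, threshold=0.8):
    """Return (element id, word text, confidence) for words below threshold."""
    flagged = []
    for element in elements:
        for word in element.get("words", []):
            score = word.get("confidence")
            if score is not None and score < threshold:
                flagged.append((element.get("id"), word.get("text"), score))
    return flagged


ordered = elements_in_reading_order(SAMPLE_JSON)
full_text = "\n".join(e["text"] for e in ordered)
flagged = low_confidence_words(ordered)
```

With this sample, full_text places "Invoice" before "Total: 42.00", and flagged holds the single word "42.00".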

Error handling

The Vision API raises VisionException when OCR extraction fails.

Common failure scenarios include:

The image file can’t be read because of path or permission issues.
The image data is corrupted or uses an unsupported encoding.
OCR models are missing or inaccessible.
The available memory is insufficient for large images.
The image format or resolution is unsupported.

In production code:

Catch VisionException.
Return a clear error message.
Log failure details for debugging.
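
The three steps above can be sketched as a small wrapper. Because this guide doesn't show where VisionException is imported from, the sketch takes the exception class as a parameter and uses a stand-in class so it runs without the SDK. The names extract_safely, FakeVisionError, and failing_extract are hypothetical helpers, not part of the Nutrient API.

```python
import logging

logger = logging.getLogger("ocr_pipeline")


def extract_safely(extract, image_path, error_types=(Exception,)):
    """Run an extraction callable and return (content_json, error_message).

    extract:     callable that takes a path and returns a JSON string.
    error_types: exception classes treated as extraction failures. Pass the
                 SDK's VisionException here; its import path is not shown
                 in this guide, so it is left as a parameter.
    """
    try:
        return extract(image_path), None
    except error_types as exc:
        # Log the full traceback for debugging; give callers a short message.
        logger.exception("OCR extraction failed for %s", image_path)
        return None, f"Could not extract text from {image_path}: {exc}"


# Stand-ins so the sketch runs without the SDK installed.
class FakeVisionError(Exception):
    """Placeholder for the SDK's VisionException."""


def failing_extract(path):
    raise FakeVisionError("OCR models are missing")


content, error = extract_safely(failing_extract, "scan.png", (FakeVisionError,))
```

In real code, the extract callable would wrap the Document.open(), Vision.set(), and extract_content() calls shown earlier, and error_types would be (VisionException,).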

Conclusion

Use this workflow for Adaptive OCR-based text extraction:

Open the image document using a context manager for automatic resource cleanup.
Configure the vision settings with the engine property assigned to VisionEngine.ADAPTIVE_OCR for fast text extraction.
For image inputs, Adaptive OCR focuses on character recognition and word extraction without semantic analysis or layout detection.
Create a vision instance with Vision.set() to bind text extraction operations to the document.
Call extract_content() to invoke the OCR engine for character recognition.
The OCR engine performs word detection, calculates bounding boxes, and generates JSON output with text and coordinates.
The method returns a JSON-formatted string containing extracted text with word-level bounding boxes in pixel coordinates.
OCR processing is optimized for speed, minimizing computational overhead for high-throughput scenarios.
Write the JSON content to a file using Python’s built-in file handling and the with statement.
Handle VisionException errors for robust error recovery in production environments.
The JSON output enables integration with search indexing (Elasticsearch, Solr), text analysis, and database storage.
Adaptive OCR mode is ideal for invoice processing, receipt scanning, search indexing, and document digitization where speed is critical.
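
As one possible shape for the search indexing step, the sketch below flattens extracted elements into plain records and serializes them one JSON object per line. It doesn't call any search client. The "elements" top-level key, the record layout (doc_id, source, page), and the sample values are assumptions chosen for illustration.

```python
import json

# Hypothetical extraction output; the "elements" key is an assumption.
content_json = json.dumps({
    "elements": [
        {"id": "e1", "pageNumber": 1, "readingOrder": 1,
         "text": "Invoice", "confidence": 0.99},
        {"id": "e2", "pageNumber": 1, "readingOrder": 2,
         "text": "Total: 42.00", "confidence": 0.91},
    ]
})


def to_index_documents(content_json, source_name):
    """Turn each extracted element into a flat record for a search index."""
    elements = json.loads(content_json).get("elements", [])
    return [
        {
            "doc_id": f"{source_name}#{element['id']}",
            "source": source_name,
            "page": element.get("pageNumber"),
            "text": element.get("text", ""),
            "confidence": element.get("confidence"),
        }
        for element in elements
    ]


records = to_index_documents(content_json, "invoice_0001.png")

# One JSON object per line is easy to stream into most indexing tools.
ndjson = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Keeping the source file name and element id in doc_id makes it possible to trace a search hit back to the region of the image it came from.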

For related image extraction workflows, refer to the Python SDK guides.

Download this ready-to-use sample package to explore the Vision API capabilities with preconfigured OCR settings.