Extracting structured data from images using local AI models enables teams to build offline document processing systems, implement privacy-compliant data extraction workflows, and create cost-predictable content analysis pipelines. Typical scenarios include air-gapped government environments requiring complete network isolation, HIPAA-compliant healthcare systems where sensitive data cannot leave infrastructure boundaries, financial document processing under SOC 2 compliance without third-party API dependencies, and high-volume document analysis systems where per-request API costs become prohibitive. In all of these cases, intelligent content recognition (ICR) provides local AI-powered document understanding without external API dependencies. The ICR engine analyzes document layout and detects semantic elements, including tables with cell structures, mathematical equations with LaTeX representations, key-value regions for form data extraction, and hierarchical document structures — all processed entirely offline using locally deployed models.

How Nutrient helps you achieve this

Nutrient Python SDK handles local AI model loading, document layout analysis, and structured JSON generation. With the SDK, you don’t need to worry about:

  • Deploying and managing local AI models for document layout detection
  • Implementing table detection algorithms and cell boundary extraction
  • Handling semantic element classification and hierarchical structure parsing
  • Complex bounding box calculation and reading order determination

Instead, Nutrient provides an API that handles all the complexity behind the scenes, letting you focus on your business logic.

Prerequisites

Before following this guide, ensure you have:

  • Python 3.8 or higher installed
  • Nutrient Python SDK installed (pip install nutrient-sdk)
  • An image file to process (PNG, JPEG, or other supported formats)
  • Basic familiarity with Python context managers and the with statement

For initial SDK setup and configuration, refer to the getting started guide.

Complete implementation

The following sections walk through a complete working example that demonstrates extracting structured data from images using local ICR models. The vision API processes images and returns JSON-formatted structural data that can be used for further processing, storage, or analysis. The import statement below brings in the necessary classes from the Nutrient SDK:

from nutrient_sdk import Document, Vision, VisionEngine

Configuring ICR mode

Open the image file and configure the vision API to use the ICR engine for local-only processing. The following code uses a context manager to open the document with automatic resource cleanup. The vision_settings.engine property is assigned the VisionEngine.ICR enumeration value to explicitly configure local AI model processing. ICR is the default engine, so this property assignment is optional but shown here for clarity. Unlike vision language model (VLM) engines that send data to external APIs, the ICR engine loads local AI models into memory and performs all document analysis on the local machine without network requests. This configuration pattern is commonly used in air-gapped environments where external API access is prohibited, or when processing sensitive documents requiring complete data sovereignty:

with Document.open("input_ocr_multiple_languages.png") as document:
    # Configure ICR engine for local processing (this is the default)
    document.settings.vision_settings.engine = VisionEngine.ICR

Creating a vision instance and extracting content

Create a vision instance and extract the structured content to a JSON string. The following code uses the Vision.set() method to create a vision instance bound to the opened document, enabling content extraction operations. The extract_content() method invokes the local ICR engine, which loads AI models, analyzes the document layout to detect text blocks and semantic elements, determines reading order for proper content flow, classifies elements by type (paragraph, table, heading, equation), and generates a JSON-formatted string containing the complete document structure with bounding boxes in pixel coordinates. The extraction process runs entirely on the local machine without external API calls, making it suitable for offline environments and sensitive data processing:

vision = Vision.set(document)
content_json = vision.extract_content()

Write the extracted content to a JSON file for downstream processing, storage, or analysis. The following code uses Python’s built-in file handling with a context manager to automatically close the file after writing. The content_json string contains the complete document structure in JSON format, enabling integration with data processing pipelines, database storage systems, or custom analysis tools:

with open("output.json", "w") as f:
    f.write(content_json)

Understanding the output

The extract_content() method returns a JSON structure representing the complete document layout and semantic understanding. With ICR mode, the local AI models generate structured output, including:

  • Document elements — Text blocks, paragraphs, headings, tables with cell structures, figures with captions, and mathematical equations with LaTeX representations
  • Bounding boxes — Position coordinates and dimensions of each element in pixel units, enabling precise element location tracking
  • Reading order — Elements sorted by their natural reading sequence (left-to-right, top-to-bottom in Western documents) for proper content flow reconstruction
  • Element classification — Type identification for each detected region using semantic labels (paragraph, table, heading, list, equation, figure)
  • Hierarchical structure — Nested elements reflecting document organization with parent-child relationships between sections, subsections, and content blocks

The JSON format enables integration with downstream processing pipelines, including data extraction workflows, database storage systems with structured schemas, search indexing systems for content retrieval, and custom analysis tools for document understanding and classification.
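As a minimal sketch of such downstream processing, the extracted JSON can be parsed with Python’s standard json module. The field names used below (elements, type, bbox) are illustrative assumptions rather than the SDK’s documented schema — inspect your own output.json to confirm the actual structure:

```python
import json

# Illustrative sample mimicking the kind of structure described above;
# real field names may differ - inspect your own output.json.
content_json = """
{
  "elements": [
    {"type": "heading", "text": "Quarterly Report", "bbox": [40, 32, 520, 64]},
    {"type": "paragraph", "text": "Revenue grew 12% year over year.", "bbox": [40, 80, 560, 120]},
    {"type": "table", "rows": 3, "columns": 4, "bbox": [40, 140, 560, 300]}
  ]
}
"""

data = json.loads(content_json)

# Group detected elements by their semantic type for routing into
# separate pipelines (tables to a database, text to a search index, etc.).
by_type = {}
for element in data["elements"]:
    by_type.setdefault(element["type"], []).append(element)

print(sorted(by_type))        # ['heading', 'paragraph', 'table']
print(len(by_type["table"]))  # 1
```

Grouping by element type is a common first step before routing tables, headings, and paragraphs into separate storage or indexing pipelines.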

Error handling

The vision API raises VisionException if content extraction fails due to image processing errors or model loading failures. Exception handling ensures robust error recovery in production environments.

Common failure scenarios include:

  • The image file can’t be read due to file system permissions or path errors
  • Image data is corrupted or truncated, preventing decoding
  • Required ICR models aren’t installed or accessible in the expected directory
  • Insufficient system memory for loading large AI models (ICR models typically require several GB of RAM)
  • Unsupported image format or encoding scheme

In production code, wrap the extraction operations in a try-except block to catch VisionException instances, providing appropriate error messages to users and logging failure details for debugging. This error handling pattern enables graceful degradation when content extraction fails, preventing application crashes and enabling retry logic or fallback processing strategies.
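A minimal sketch of this pattern is shown below. The import fallback lets the snippet run without the SDK installed; in real code, import VisionException directly from nutrient_sdk (the exact import path is an assumption):

```python
import logging

try:
    from nutrient_sdk import VisionException  # real SDK import (path assumed)
except ImportError:
    # Stand-in so this sketch runs without the SDK installed.
    class VisionException(Exception):
        pass


def extract_content_safely(vision, retries=2):
    """Attempt extraction up to `retries` times; return None if all attempts fail."""
    for attempt in range(1, retries + 1):
        try:
            return vision.extract_content()
        except VisionException as exc:
            logging.warning("Extraction attempt %d failed: %s", attempt, exc)
    return None
```

Returning None on failure lets callers fall back to alternative processing (for example, plain OCR) instead of crashing the application.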

Conclusion

The ICR-based content extraction workflow consists of several key operations:

  1. Open the image document using a context manager for automatic resource cleanup.
  2. Configure the vision settings with the engine property assigned to VisionEngine.ICR. ICR is the default engine, so this step is optional but useful for explicit control.
  3. Create a vision instance with Vision.set() to bind content extraction operations to the document.
  4. Call extract_content() to invoke the local AI models, which detect semantic elements (tables, equations, headings), determine reading order, and return a JSON-formatted string containing the complete document structure with bounding boxes in pixel coordinates. All processing occurs locally without external API calls, ensuring data privacy and offline capability.
  5. Write the JSON content to a file using Python’s built-in file handling with context manager syntax.
  6. Handle VisionException errors for robust error recovery in production environments.

The JSON output enables integration with downstream pipelines, including data extraction, database storage, and search indexing. ICR mode is ideal for air-gapped environments, sensitive document processing, and cost-predictable high-volume workflows.

Nutrient handles local AI model deployment, document layout detection algorithms, table cell boundary extraction, semantic element classification, hierarchical structure parsing, bounding box calculation, reading order determination, and JSON schema generation so you don’t need to implement computer vision algorithms or manage AI model loading manually. The ICR system provides offline document understanding for air-gapped government environments, privacy-compliant healthcare systems, financial document processing with data sovereignty requirements, and high-volume content analysis without per-request API costs.

Download this ready-to-use sample package to explore the vision API capabilities with preconfigured ICR settings.