Extracting data from images using ICR
Extracting structured data from images using local AI models enables teams to build offline document processing systems, implement privacy-compliant data extraction workflows, and create cost-predictable content analysis pipelines. Whether you’re processing documents in air-gapped government environments requiring complete network isolation, building HIPAA-compliant healthcare systems where sensitive data cannot leave infrastructure boundaries, implementing financial document processing for SOC 2 compliance without third-party API dependencies, or creating high-volume document analysis systems where per-request API costs become prohibitive, intelligent content recognition (ICR) provides local AI-powered document understanding without external API dependencies. The ICR engine analyzes document layout and detects semantic elements, including tables with cell structures, mathematical equations with LaTeX representations, key-value regions for form data extraction, and hierarchical document structures — all processed entirely offline using locally deployed models.
How Nutrient helps you achieve this
Nutrient Java SDK handles local AI model loading, document layout analysis, and structured JSON generation. With the SDK, you don’t need to worry about:
- Deploying and managing local AI models for document layout detection
- Implementing table detection algorithms and cell boundary extraction
- Handling semantic element classification and hierarchical structure parsing
- Complex bounding box calculation and reading order determination
Instead, Nutrient provides an API that handles all the complexity behind the scenes, letting you focus on your business logic.
Prerequisites
Before following this guide, ensure you have:
- Java 8 or higher installed
- Nutrient Java SDK added to your project (Maven, Gradle, or manual JAR)
- An image file to process (PNG, JPEG, or other supported formats)
- Basic familiarity with Java try-with-resources statements
For initial SDK setup and dependency configuration, refer to the getting started guide.
Complete implementation
Below is a complete working example that demonstrates extracting structured data from images using local ICR models. The vision API processes images and returns JSON-formatted structural data that can be used for further processing, storage, or analysis. The following lines set up the Java application. Start by specifying a package name and creating a new class:
```java
package io.nutrient.Sample;
```

Import the required classes from the SDK:
```java
import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.enums.VisionEngine;
import io.nutrient.sdk.exceptions.NutrientException;

import java.io.FileWriter;
import java.io.IOException;

public class ExtractDataFromImageIcr {
```

Create the main function that can throw exceptions:
```java
    public static void main(String[] args) throws NutrientException, IOException {
```

Configuring ICR mode
Open the image file and configure the vision API to use the ICR engine for local-only processing. The following code uses a try-with-resources statement to open the document with automatic resource cleanup. The getSettings().getVisionSettings().setEngine() method call assigns the VisionEngine.Icr enumeration value to explicitly configure local AI model processing. ICR is the default engine, so this method call is optional but shown here for clarity. Unlike vision language model (VLM) engines that send data to external APIs, the ICR engine loads local AI models into memory and performs all document analysis on the local machine without network requests. This configuration pattern is commonly used in air-gapped environments where external API access is prohibited, or when processing sensitive documents requiring complete data sovereignty:
```java
        try (Document document = Document.open("input_ocr_multiple_languages.png")) {
            // Configure ICR engine for local processing (this is the default).
            document.getSettings().getVisionSettings().setEngine(VisionEngine.Icr);
```

Creating a vision instance and extracting content
Create a vision instance and extract the structured content to a JSON string. The following code uses the Vision.set() method to create a vision instance bound to the opened document, enabling content extraction operations. The extractContent() method invokes the local ICR engine, which loads AI models, analyzes the document layout to detect text blocks and semantic elements, determines reading order for proper content flow, classifies elements by type (paragraph, table, heading, equation), and generates a JSON-formatted string containing the complete document structure with bounding boxes in pixel coordinates. The extraction process runs entirely on the local machine without external API calls, making it suitable for offline environments and sensitive data processing:
```java
            Vision vision = Vision.set(document);
            String contentJson = vision.extractContent();
```

Write the extracted content to a JSON file for downstream processing, storage, or analysis. The following code uses a try-with-resources statement with FileWriter to automatically close the file after writing. The contentJson string contains the complete document structure in JSON format, enabling integration with data processing pipelines, database storage systems, or custom analysis tools:
```java
            try (FileWriter writer = new FileWriter("output.json")) {
                writer.write(contentJson);
            }
        }
    }
}
```

Understanding the output
The extractContent() method returns a JSON structure representing the complete document layout and semantic understanding. With ICR mode, the local AI models generate structured output, including:
- Document elements — Text blocks, paragraphs, headings, tables with cell structures, figures with captions, and mathematical equations with LaTeX representations
- Bounding boxes — Position coordinates and dimensions of each element in pixel units, enabling precise element location tracking
- Reading order — Elements sorted by their natural reading sequence (left-to-right, top-to-bottom in Western documents) for proper content flow reconstruction
- Element classification — Type identification for each detected region using semantic labels (paragraph, table, heading, list, equation, figure)
- Hierarchical structure — Nested elements reflecting document organization with parent-child relationships between sections, subsections, and content blocks
The JSON format enables integration with downstream processing pipelines, including data extraction workflows, database storage systems with structured schemas, search indexing systems for content retrieval, and custom analysis tools for document understanding and classification.
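To explore the output without hard-coding a schema, you can walk the JSON tree generically. The following sketch uses the Jackson library (com.fasterxml.jackson.databind), which is not part of the Nutrient SDK and must be added as a separate dependency; the class name and traversal logic are illustrative only. It reads the output.json file written by the example above and prints every field path and leaf value, so you can see which element types, bounding boxes, and nesting levels the ICR engine produced for your document:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;

public class InspectExtractedContent {
    public static void main(String[] args) throws Exception {
        // Parse the JSON produced by vision.extractContent() and written to output.json.
        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(new File("output.json"));

        // Walk the tree generically, without assuming specific field names.
        printNode(root, "");
    }

    static void printNode(JsonNode node, String path) {
        if (node.isObject()) {
            // Recurse into each field of an object node.
            node.fields().forEachRemaining(entry ->
                printNode(entry.getValue(), path + "/" + entry.getKey()));
        } else if (node.isArray()) {
            // Recurse into each array element, keeping its index in the path.
            for (int i = 0; i < node.size(); i++) {
                printNode(node.get(i), path + "[" + i + "]");
            }
        } else {
            // Leaf value: print its full path and text representation.
            System.out.println(path + " = " + node.asText());
        }
    }
}
```

Running this against the output of the example gives you a quick map of the structure before you commit to a parsing model, database schema, or indexing strategy.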
Error handling
The vision API throws VisionException if content extraction fails due to image processing errors or model loading failures. Exception handling ensures robust error recovery in production environments.
Common failure scenarios include:
- The image file can’t be read due to file system permissions or path errors
- Image data is corrupted or truncated, preventing decoding
- Required ICR models aren’t installed or accessible in the expected directory
- Insufficient system memory for loading large AI models (ICR models typically require several GB of RAM)
- Unsupported image format or encoding scheme
In production code, wrap the extraction operations in a try-catch block to catch VisionException instances, providing appropriate error messages to users and logging failure details for debugging. This error handling pattern enables graceful degradation when content extraction fails, preventing application crashes and enabling retry logic or fallback processing strategies.
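As a rough sketch of that pattern, the extraction steps from the complete example can be wrapped as shown below. The import path and exception hierarchy for VisionException aren't documented in this guide, so the io.nutrient.sdk.exceptions.VisionException import and the catch ordering here are assumptions — adjust them to match your SDK version:

```java
import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.enums.VisionEngine;
import io.nutrient.sdk.exceptions.NutrientException;
import io.nutrient.sdk.exceptions.VisionException; // assumed location; check your SDK version

import java.io.FileWriter;
import java.io.IOException;

public class ExtractWithErrorHandling {
    public static void main(String[] args) {
        try (Document document = Document.open("input_ocr_multiple_languages.png")) {
            // ICR is the default engine; set it explicitly for clarity.
            document.getSettings().getVisionSettings().setEngine(VisionEngine.Icr);

            Vision vision = Vision.set(document);
            String contentJson = vision.extractContent();

            try (FileWriter writer = new FileWriter("output.json")) {
                writer.write(contentJson);
            }
        } catch (VisionException e) {
            // Model loading, image decoding, or layout analysis failed.
            // Log the details and trigger retry or fallback processing here.
            System.err.println("Content extraction failed: " + e.getMessage());
        } catch (NutrientException e) {
            // Other SDK errors, such as failing to open the document.
            System.err.println("Document processing failed: " + e.getMessage());
        } catch (IOException e) {
            System.err.println("Could not write output.json: " + e.getMessage());
        }
    }
}
```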
Conclusion
The ICR-based content extraction workflow consists of several key operations:
- Open the image document using try-with-resources for automatic resource cleanup.
- Configure the vision settings with setEngine() to assign VisionEngine.Icr for local AI processing.
  - ICR is the default engine, making this configuration optional but useful for explicit control.
- Create a vision instance with Vision.set() to bind content extraction operations to the document.
- Call extractContent() to invoke local AI models for document layout analysis.
  - The ICR engine loads AI models, detects semantic elements (tables, equations, headings), and determines reading order.
  - The method returns a JSON-formatted string containing the complete document structure with bounding boxes in pixel coordinates.
  - All processing occurs locally without external API calls, ensuring data privacy and offline capability.
- Write the JSON content to a file using try-with-resources with FileWriter for automatic resource cleanup.
- Handle VisionException errors for robust error recovery in production environments.
- The JSON output enables integration with downstream pipelines, including data extraction, database storage, and search indexing.
- ICR mode is ideal for air-gapped environments, sensitive document processing, and cost-predictable high-volume workflows.
Nutrient handles local AI model deployment, document layout detection algorithms, table cell boundary extraction, semantic element classification, hierarchical structure parsing, bounding box calculation, reading order determination, and JSON schema generation so you don’t need to implement computer vision algorithms or manage AI model loading manually. The ICR system provides offline document understanding for air-gapped government environments, privacy-compliant healthcare systems, financial document processing with data sovereignty requirements, and high-volume content analysis without per-request API costs.
Download this ready-to-use sample package to explore the vision API capabilities with preconfigured ICR settings.