Nutrient Java SDK
ICR EXTRACTION
```java
import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.enums.VisionEngine;

try (Document document = Document.open("scan.png")) {
    // Local AI — no network calls
    document.getSettings().getVisionSettings()
            .setEngine(VisionEngine.Icr);

    Vision vision = Vision.set(document);
    String json = vision.extractContent();
    // Structured JSON: text, tables, key-value
    // pairs, equations, bounding boxes
}
```

Detect tables automatically and extract individual cell contents with row and column structure. Handles complex layouts with merged cells, irregular borders, and nested tables.
Identify form-like key-value regions in invoices, receipts, and structured documents. No predefined templates required — the AI adapts to any document layout.
ICR runs entirely on your infrastructure with zero external API calls. Sensitive documents never leave your servers, supporting HIPAA and GDPR compliance as well as air-gapped deployments.
Every extraction returns classified elements with bounding boxes, confidence scores, and hierarchical reading order. Parse results programmatically without post-processing.
High-speed text extraction with word-level bounding boxes for search indexing and digitization.
Extract tables, equations, and key-value regions with offline AI that understands document layout.
Maximize accuracy on complex documents by combining local AI with vision language models.
Generate natural language descriptions and WCAG-compliant alt text from document images.
Cloud-scalable image understanding with enterprise SLA guarantees.
Run vision language models on your own infrastructure for complete data privacy.
Vision API detects and classifies document elements automatically. Each element comes with bounding boxes, confidence scores, and its position in the document reading order.
INTELLIGENT DOCUMENT PROCESSING
Traditional extraction tools parse text line by line. Nutrient Vision API sees the full document layout — tables, columns, headers, reading order — and returns classified, structured data ready for your Java application to consume.
Detect tables with or without visible borders. Extract cell contents with row and column positions, even from scanned documents with irregular layouts.
Identify form-like key-value pairs without predefined templates. The AI learns document structure on the fly, adapting to invoices, receipts, and forms.
Detect and extract equations with LaTeX representations from scientific papers, textbooks, and technical documentation.
Every extracted element includes a confidence score. Set quality thresholds to filter results and flag documents that need manual review.
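As one way to apply such quality thresholds after parsing the extraction output, here is a minimal sketch; the `ExtractedElement` record and its fields are illustrative stand-ins for the parsed JSON, not SDK types:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ConfidenceFilter {
    // Illustrative element shape; the real field names come from the SDK's JSON output.
    record ExtractedElement(String type, String text, double confidence) {}

    // Keep elements at or above the threshold; the rest can be flagged for manual review.
    static List<ExtractedElement> aboveThreshold(List<ExtractedElement> elements, double threshold) {
        return elements.stream()
                .filter(e -> e.confidence() >= threshold)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ExtractedElement> elements = List.of(
                new ExtractedElement("table", "Q3 totals", 0.97),
                new ExtractedElement("key-value", "Invoice #1042", 0.61));
        System.out.println(aboveThreshold(elements, 0.8).size()); // prints 1
    }
}
```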
The ICR and VLM-enhanced ICR engines analyze document layout using AI to detect table regions, identify row and column boundaries, and extract individual cell contents. This works on both native PDFs and scanned documents, including tables without visible borders. The output is structured JSON with each cell mapped to its row and column position, making it straightforward to convert to CSV, database records, or any structured format your application needs.
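The cell-to-CSV conversion described above can be sketched as follows; the `Cell` record is a stand-in for the row/column cell entries in the extraction JSON, not an SDK type:

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TableToCsv {
    // Stand-in for one table cell from the extraction JSON.
    record Cell(int row, int col, String text) {}

    // Group cells by row index, order each row's cells by column, and join into CSV lines.
    static String toCsv(List<Cell> cells) {
        return cells.stream()
                .collect(Collectors.groupingBy(Cell::row, TreeMap::new, Collectors.toList()))
                .values().stream()
                .map(row -> row.stream()
                        .sorted(Comparator.comparingInt(Cell::col))
                        .map(Cell::text)
                        .collect(Collectors.joining(",")))
                .collect(Collectors.joining("\n"));
    }
}
```

The same grouping step also maps cleanly to database inserts, with one record per row.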
Do I need predefined templates for each document type? No. Vision API analyzes document structure on the fly using AI. It detects tables, key-value pairs, headings, and other elements without predefined templates or training data. This means it works on invoices from any vendor, receipts in any format, and forms with any layout. The AI adapts to each document individually, eliminating the setup and maintenance cost of template-based extraction systems.
Can it process scanned documents? Yes. All three engines process both native PDFs and scanned documents. The OCR engine extracts text with bounding boxes from scanned images. The ICR engine goes further, applying AI to understand document structure and extract tables, key-value pairs, and semantic elements from scans. The VLM-enhanced engine adds vision language model processing for maximum accuracy on challenging scanned documents with poor quality or complex layouts.
Vision API returns structured JSON containing classified document elements. Each element includes its type (table, paragraph, heading, equation, key-value region, figure), text content, bounding box coordinates, and confidence score. Tables include cell-level data with row and column positions. The JSON also includes the document reading order, so you can reconstruct the logical flow of content. You can save output directly to a JSON file or process it in memory.
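Reconstructing the logical flow from the reading order can be sketched like this; the `Element` record and its `readingOrder` field are illustrative assumptions about the parsed JSON, not SDK types:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ReadingOrder {
    // Illustrative shape; assumes each parsed element carries a reading-order index.
    record Element(int readingOrder, String text) {}

    // Sort elements by their reading-order index to recover the document's logical flow.
    static String toText(List<Element> elements) {
        return elements.stream()
                .sorted(Comparator.comparingInt(Element::readingOrder))
                .map(Element::text)
                .collect(Collectors.joining("\n"));
    }
}
```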
OCR provides fast, reliable text extraction and is best for simple documents where you need raw text and positions. ICR adds AI-powered structural analysis and significantly improves results on documents with tables, mixed layouts, and form-like content — all while running entirely offline. VLM-enhanced ICR delivers the highest accuracy on complex financial documents, legal contracts, and irregular table layouts by combining local AI with a vision language model.
Can extraction run entirely on-premises? Yes. The OCR and ICR engines run 100 percent on your infrastructure with no external API calls, and documents never leave your servers. This meets requirements for HIPAA, GDPR, SOC 2, and air-gapped environments. For VLM-enhanced extraction, you can also stay fully on-premises by connecting to a local model server like Ollama, LM Studio, or vLLM instead of cloud providers.
Tabula is an open source library that extracts tables from native PDFs by parsing the underlying text positions. It works well on digitally created PDFs with clear table structures, but it cannot process scanned documents and struggles with tables that lack visible borders. Nutrient Vision API uses AI to detect tables regardless of borders, works on both scanned and native documents, and also extracts key-value pairs, equations, and document structure — returning everything as classified JSON with confidence scores.
Vision API processes PDFs (both native and scanned) and common image formats, including PNG, JPEG, GIF, BMP, and TIFF. PDF pages are automatically rendered for processing, so you don’t need to handle conversion. All three engines accept the same input formats, making it easy to switch between them based on your accuracy and performance needs.
Can it process invoices and financial documents? Yes. The ICR and VLM-enhanced engines are well suited for invoice and financial document processing. They detect tables with line items and totals, identify key-value pairs like vendor names, dates, and amounts, and understand the hierarchical structure of a document. Because the extraction is template-free, it works across different invoice formats and vendors without per-vendor configuration. Use VLM-enhanced mode for the highest accuracy on complex financial layouts.
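Downstream of extraction, detected key-value pairs can be indexed for field lookup; this is a minimal sketch in which the `KeyValue` record stands in for the key-value entries in the extraction JSON:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyValueIndex {
    // Stand-in for one detected key-value region; not the SDK's type.
    record KeyValue(String key, String value) {}

    // Index pairs by key, keeping the first occurrence, for lookups like "Vendor" or "Total".
    static Map<String, String> index(List<KeyValue> pairs) {
        Map<String, String> map = new LinkedHashMap<>();
        for (KeyValue kv : pairs) {
            map.putIfAbsent(kv.key(), kv.value());
        }
        return map;
    }
}
```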
Add the Nutrient Java SDK dependency to your project and follow the getting started guide for installation. For data extraction, open a document, create a vision instance, select your engine (OCR, ICR, or VLM-enhanced), and call the extraction method. The SDK returns structured JSON that you can parse and process immediately. The extraction guides include step-by-step examples for each engine and use case.