Nutrient Python SDK
OCR EXTRACTION
from nutrient_sdk import Document, Vision, VisionEngine

with Document.open("scan.png") as document:
    document.settings.vision_settings.engine = VisionEngine.OCR
    vision = Vision.set(document)
    content_json = vision.extract_content()  # Text + word-level bounding boxes in JSON

OCR for speed. ICR for offline AI-powered document understanding. VLM-enhanced ICR for maximum accuracy with Claude, OpenAI, or local models.
ICR runs entirely on your infrastructure with no external API calls. Process sensitive documents without data leaving your servers. HIPAA and GDPR ready.
Get precise pixel coordinates for every extracted word. Enables document reconstruction, search indexing, and overlay positioning.
Go beyond character recognition. Detect tables with cell boundaries, mathematical equations, key-value regions, and hierarchical reading order.
Fast text extraction optimized for high-throughput document processing.
AI-powered document understanding that runs 100 percent offline on your infrastructure.
Combine local AI with vision language models for maximum accuracy on complex documents.
Generate natural language descriptions of images and documents using Anthropic Claude.
Cloud-scalable image descriptions with enterprise SLA guarantees.
Run vision language models locally with Ollama, LM Studio, or vLLM for complete data privacy.
Choose the right processing engine based on your accuracy, privacy, and performance requirements. Switch between engines with a single configuration change.
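The decision between the three engines can be sketched as a small helper. This is illustrative only: the function, its parameters, and the provider labels are not part of the SDK (the real SDK selects an engine via `document.settings.vision_settings.engine`, as shown above), but the logic mirrors the trade-offs described in this document.

```python
def choose_engine(needs_structure: bool, max_accuracy: bool, offline_only: bool):
    """Illustrative engine chooser based on the trade-offs described above.

    Returns an (engine, provider) pair of descriptive labels.
    """
    if max_accuracy:
        # VLM-enhanced ICR; pair it with a local model server (e.g. Ollama)
        # when documents must stay on your infrastructure.
        provider = "local VLM" if offline_only else "Claude or OpenAI"
        return ("VLM-enhanced ICR", provider)
    if needs_structure:
        # Layout-aware extraction, fully on-premises.
        return ("ICR", "on-premises")
    # Fastest option: plain text plus word-level bounding boxes.
    return ("OCR", "on-premises")
```

Because all three engines share the same input formats and output schema, swapping the engine in configuration is the only change your pipeline needs.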
Output format: JSON with structured elements, bounding boxes, and confidence scores.

BEYOND TRADITIONAL OCR
Traditional OCR extracts characters. Nutrient Vision API understands document layout, detects tables with cell boundaries, recognizes mathematical equations, and classifies semantic elements — all from a single API call inside your Python application.
Automatically detect tables and extract individual cell contents with row and column structure, even in documents with complex or irregular layouts.
Detect and extract mathematical equations with LaTeX representations. Process scientific papers, textbooks, and technical documentation.
Identify and extract form-like key-value pairs from invoices, receipts, and structured documents without predefined templates.
Analyze multicolumn layouts and determine the correct reading sequence. Produce structured output that preserves the logical flow of a document.
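A sketch of what consuming reading-order output might look like. The element field names used here (`type`, `text`, `reading_order`) are assumptions for illustration, not the documented schema; the point is that structured elements can be sorted back into the document's logical flow with ordinary Python.

```python
# Assumed structured-element output for a two-column page (illustrative schema).
elements = [
    {"type": "paragraph", "text": "Column 2 starts here.", "reading_order": 2},
    {"type": "heading",   "text": "Quarterly Report",      "reading_order": 0},
    {"type": "paragraph", "text": "Column 1 body text.",   "reading_order": 1},
]

# Restore the logical flow of the page, regardless of physical column layout.
ordered = sorted(elements, key=lambda el: el["reading_order"])
document_text = "\n".join(el["text"] for el in ordered)
```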
OCR is the fastest engine, optimized for high-throughput text extraction with word-level bounding boxes. It focuses on character recognition without analyzing document structure. ICR (intelligent content recognition) is an AI-powered engine that runs entirely on your infrastructure. It understands document layout, detects tables with cell structures, recognizes equations, identifies key-value regions, and determines reading order — all without external API calls. VLM-enhanced ICR combines the local ICR engine with a vision language model (Claude, OpenAI, or a local model) for the highest accuracy on complex documents, with improved table boundaries and confidence scores.
Yes. Both the OCR and ICR engines run 100 percent on your infrastructure with no external API calls. Your documents never leave your servers, which makes our engines suitable for air-gapped environments, HIPAA-compliant medical record processing, GDPR workflows, and any scenario where data sovereignty is required. The VLM-enhanced engine can also run fully on-premises when paired with a local model server like Ollama, LM Studio, or vLLM.
Accuracy depends on document quality and complexity. OCR delivers fast, reliable character recognition on clean scans and is ideal for simple text extraction and search indexing. ICR adds structural understanding and achieves significantly better results on documents with tables, equations, and mixed layouts. VLM-enhanced ICR provides the highest accuracy, particularly on complex multicolumn layouts, financial documents, and pages with overlapping visual elements. Each engine returns confidence scores so you can assess extraction quality programmatically.
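Confidence scores make quality gating straightforward. A minimal sketch, assuming word-level results carry `text` and `confidence` fields (an illustrative schema, not the documented one): accept high-confidence words automatically and route the rest to manual review.

```python
# Assumed word-level extraction results (illustrative schema).
words = [
    {"text": "Invoice",  "confidence": 0.99},
    {"text": "T0tal",    "confidence": 0.41},  # likely misread; low confidence
    {"text": "1,250.00", "confidence": 0.97},
]

THRESHOLD = 0.80  # tune per document class and risk tolerance
accepted = [w["text"] for w in words if w["confidence"] >= THRESHOLD]
needs_review = [w["text"] for w in words if w["confidence"] < THRESHOLD]
```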
pytesseract and EasyOCR are popular open source Python OCR libraries focused on character recognition. Nutrient Vision API goes significantly further. Beyond text extraction, it offers AI-powered document understanding with table detection, equation recognition, key-value extraction, and reading order analysis. The ICR engine provides these capabilities entirely offline, while VLM-enhanced ICR adds vision language models for complex documents. Unlike pytesseract, Vision API returns structured JSON output with element classification and confidence scores, reducing the post-processing code you need to write.
Vision API processes PDFs and common image formats, including PNG, JPEG, GIF, BMP, and TIFF. For PDFs, it handles both native (digitally created) and scanned documents. The API automatically renders PDF pages to images for processing, so you don’t need to handle conversion separately. All three engines work with the same input formats, making it easy to switch engines without changing your document pipeline.
Yes. The ICR and VLM-enhanced ICR engines detect tables automatically, extracting cell contents with row and column structure. They also identify key-value regions (like form fields on invoices), mathematical equations, headings, paragraphs, and figures. The output is structured JSON with element classification, bounding boxes, and reading order. This means you can extract a table from a scanned invoice and get structured data with cell-level precision without writing custom parsing logic.
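Cell-level output with row and column indices can be reassembled into a 2-D table with a few lines of Python. The cell schema below (`row`, `col`, `text`) is an assumption for this sketch, not the documented format.

```python
# Assumed table-detection output: one entry per cell (illustrative schema).
cells = [
    {"row": 0, "col": 0, "text": "Item"},
    {"row": 0, "col": 1, "text": "Price"},
    {"row": 1, "col": 0, "text": "Widget"},
    {"row": 1, "col": 1, "text": "9.99"},
]

# Rebuild the grid; missing cells stay empty strings.
n_rows = max(c["row"] for c in cells) + 1
n_cols = max(c["col"] for c in cells) + 1
table = [["" for _ in range(n_cols)] for _ in range(n_rows)]
for cell in cells:
    table[cell["row"]][cell["col"]] = cell["text"]
```

From here the grid drops directly into a CSV writer or a pandas DataFrame without custom parsing logic.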
This choice applies to VLM-enhanced ICR and image description features. Claude offers strong reasoning and nuanced contextual understanding. OpenAI provides cloud scalability and enterprise SLA guarantees. Local VLMs (via Ollama, LM Studio, or vLLM) give you zero per-image API costs and complete data privacy. Choose based on your priorities: maximum accuracy (Claude or OpenAI), cost efficiency at scale (local VLMs), or data sovereignty (local VLMs). You can switch providers with a single configuration change.
Yes. The ICR and VLM-enhanced ICR engines include handwriting detection as a dedicated vision feature. The engines can identify regions containing handwritten content and extract text from them. Recognition accuracy depends on writing clarity and quality. For best results on handwriting-heavy documents, use the VLM-enhanced engine, which leverages vision language models to better interpret handwritten content in context.
Word-level bounding boxes provide the exact pixel coordinates (position and dimensions) of every extracted word in a document. This enables precise text positioning for document reconstruction, search highlighting, text overlay on scanned images, and coordinate-based data extraction. All three engines return bounding box data, making it straightforward to map extracted text back to its physical location on the page.
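As a concrete use of bounding boxes, search highlighting reduces to matching words and collecting their coordinates. The box fields used here (`x`, `y`, `width`, `height`) are assumptions for illustration; the returned tuples are what you would hand to a rendering layer to draw highlight rectangles.

```python
# Assumed word-level extraction with pixel bounding boxes (illustrative schema).
words = [
    {"text": "Total", "x": 40, "y": 700, "width": 52, "height": 14},
    {"text": "due:",  "x": 98, "y": 700, "width": 40, "height": 14},
    {"text": "Total", "x": 40, "y": 120, "width": 52, "height": 14},
]

def find_term(words, term):
    """Return the (x, y, width, height) box of every word matching `term`."""
    return [
        (w["x"], w["y"], w["width"], w["height"])
        for w in words
        if w["text"].lower() == term.lower()
    ]

hits = find_term(words, "total")  # boxes to highlight on the page image
```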
Install Nutrient Python SDK with pip. The getting started guide walks you through installation and basic setup. For OCR, open a document, create a vision instance, and call the extraction method with your chosen engine. The SDK handles all preprocessing, model loading, and output formatting. Refer to the extraction guides for step-by-step examples covering OCR, ICR, VLM-enhanced processing, and image description with each supported provider.