AI-powered OCR for .NET

Transform scanned documents into searchable, accessible PDFs

Enterprise OCR engine that converts images and scanned PDFs to searchable text in seconds. Support for 100+ languages, automatic document preprocessing, and intelligent text extraction. Built for high-volume processing on Windows and Linux servers.

Enterprise OCR built for .NET developers

AI-powered accuracy

Machine learning OCR engine trained on millions of documents delivers production-grade text recognition across 100+ languages.

Automatic preprocessing

Built-in deskew, denoising, line removal, and inversion correction. Process imperfect scans without manual cleanup.

Zonal text extraction

Extract text from specific document regions. Perfect for structured data capture from invoices, forms, and identity documents.

High-volume processing

Multithreaded OCR engine scales to thousands of pages per hour. Optimized for batch processing and server-side automation.

Complete OCR toolkit

PDF to searchable PDF

Convert scanned PDFs to searchable documents with embedded text layers.

Page-by-page OCR with progress tracking
Multi-language support in single documents
Preserve original page layout and formatting

Image to searchable PDF

Transform scanned images into searchable PDFs with automatic text recognition.

Support for PNG, JPEG, TIFF, and multipage TIFF
Configurable accuracy vs. speed tradeoffs
Character filtering and validation

Zonal text extraction

Extract text from specific regions for structured data capture.

Define regions by pixel coordinates
Extract multiple fields from a single document
Perfect for forms, invoices, and ID cards

Document classification

Automatically sort and route documents based on barcode recognition.

1D and 2D barcode recognition
Classify by barcode type and value
Automate document routing workflows

PDF/A archiving

Create searchable PDF/A documents for long-term digital preservation.

PDF/A-4f conformance for archival standards
Searchable text with compliance guarantees
Self-contained files with embedded fonts

Document preprocessing

Automatic image enhancement for optimal OCR accuracy.

Deskew, denoise, and contrast enhancement
Line and punch hole removal
Character edge refinement for clarity

Global language support

Support for 11 languages out of the box. Expand to 100+ languages with Tesseract language packs for worldwide document processing.

Built-in languages

English, German, French, Spanish, Italian, Portuguese

Also included

Arabic, Hebrew, Dutch, Vietnamese, Flemish

Expandable support

100+ languages, Tesseract packs, Asian scripts, RTL languages

INTELLIGENT DOCUMENT PROCESSING

AI-powered extraction beyond OCR

Combine LLMs with machine learning for intelligent document classification and structured data extraction. Process invoices, resumes, contracts, and forms with natural language instructions.

Automatic classification

Classify invoices, contracts, receipts, and resumes without manual labeling or training data.

Structured data extraction

Extract fields using natural language instructions. Built-in templates for common document types.

Smart validation

11 built-in validators for IBAN, credit cards, emails, phone numbers, VAT IDs, and addresses.

100+ file formats

Process PDFs, Office documents, images, emails, and CAD files with a unified API.

Frequently asked questions

What OCR accuracy can I expect from the SDK?

OCR accuracy depends on source document quality, but the SDK typically achieves 98–99 percent character accuracy on clean scans at 300 DPI. The AI-powered engine includes automatic preprocessing that corrects common scanning issues like skew, noise, and poor contrast. For optimal results, scan documents at 300 DPI in black and white or grayscale. The SDK provides accuracy vs. speed tradeoff modes, and you can improve recognition by specifying expected languages or character sets.

How many languages can the OCR engine recognize?

The SDK includes 11 languages out of the box: English, German, French, Spanish, Italian, Portuguese, Arabic, Hebrew, Dutch, Vietnamese, and Flemish. You can expand to 100+ languages by downloading free Tesseract language packs from GitHub. The engine supports multiple languages in a single document, making it well suited to international documents. Left-to-right and right-to-left scripts are both supported, as are Asian character sets.

Can I extract text from specific regions of a document?

Yes. Zonal OCR allows you to define specific regions by pixel coordinates and extract text only from those areas. This is perfect for structured documents where you know field locations. For example, you can extract invoice numbers from the top-right corner, amounts from specific table cells, or dates from header regions. You can define multiple zones per document, specify allowed character sets for each zone, and optimize recognition context for single-line fields vs. multiline blocks.
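To make the idea of zones concrete, here is a minimal, language-neutral sketch in Python. It is not the SDK's .NET API: the `Zone` class, its field names, and the invoice coordinates are all hypothetical. It models each zone as a pixel rectangle with an optional allowed character set and checks that the rectangle fits on the page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Zone:
    """A rectangular OCR region in pixel coordinates (hypothetical model)."""
    name: str
    left: int
    top: int
    width: int
    height: int
    allowed_chars: Optional[str] = None  # None means no character restriction

    def fits(self, page_width: int, page_height: int) -> bool:
        """Return True if the rectangle lies entirely within the page."""
        return (
            self.left >= 0
            and self.top >= 0
            and self.width > 0
            and self.height > 0
            and self.left + self.width <= page_width
            and self.top + self.height <= page_height
        )


def inches_to_pixels(inches: float, dpi: int = 300) -> int:
    """Convert a physical measurement to pixels at the given scan resolution."""
    return round(inches * dpi)


# A US Letter page scanned at 300 DPI is 2550 x 3300 pixels.
PAGE_W = inches_to_pixels(8.5)
PAGE_H = inches_to_pixels(11)

# Example zones for an invoice; positions are made up for illustration.
invoice_zones = [
    Zone("invoice_number", left=1800, top=150, width=600, height=90,
         allowed_chars="0123456789-"),
    Zone("invoice_date", left=1800, top=260, width=600, height=90,
         allowed_chars="0123456789/"),
    Zone("vendor_name", left=150, top=150, width=1200, height=120),
]

# Keep only zones that fall inside the page before sending them to OCR.
valid_zones = [z for z in invoice_zones if z.fits(PAGE_W, PAGE_H)]
```

Because zones are expressed in pixels, the same template only lines up across scans that share a resolution. Storing positions in inches and converting with the scan's DPI is one way to keep a template reusable.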

How does automatic preprocessing work?

The SDK includes intelligent preprocessing that automatically detects and corrects common scanning problems. It deskews documents rotated by up to 15 degrees, detects and corrects inverted or negative images, removes salt-and-pepper noise and speckles, eliminates horizontal and vertical lines, removes punch holes from binder edges, and enhances character edges for faint or over-inked text. You can apply these corrections automatically or selectively based on document type and quality.
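As an illustration of one of these corrections, the Python sketch below shows a simple way inverted images could be detected and fixed. The majority-of-dark-pixels heuristic is an assumption chosen for clarity, not a description of how the SDK does it, and the function names are hypothetical.

```python
from typing import List

GrayImage = List[List[int]]  # rows of 0-255 grayscale values


def is_inverted(image: GrayImage, threshold: int = 128) -> bool:
    """Guess whether a page is light text on a dark background.

    Heuristic (an assumption, not the SDK's method): on a normal scan,
    background pixels outnumber ink pixels, so most pixels are light.
    """
    total = sum(len(row) for row in image)
    if total == 0:
        return False
    dark = sum(1 for row in image for px in row if px < threshold)
    return dark > total / 2


def correct_inversion(image: GrayImage) -> GrayImage:
    """Return a negated copy if the image looks inverted, else an unchanged copy."""
    if is_inverted(image):
        return [[255 - px for px in row] for row in image]
    return [row[:] for row in image]


# Tiny 3x4 "scan": mostly dark pixels with a few light ones (inverted text).
negative_scan = [
    [0, 0, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
fixed = correct_inversion(negative_scan)
```

A heuristic this simple would misfire on pages dominated by dark photographs, which is one reason applying corrections selectively by document type can matter.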

What’s the difference between regular OCR and zonal OCR?

Regular OCR processes an entire document and extracts all visible text, creating a searchable PDF or text output. Zonal OCR focuses on specific rectangular regions you define, extracting only the text within those boundaries. Zonal OCR is faster when you need specific fields, provides better accuracy for structured data because you can specify expected character types, and enables data extraction without processing the entire page. Use zonal OCR for forms, invoices, and documents with known field positions.

Can the OCR engine process handwritten text?

The SDK is optimized for printed text and achieves its best results with typed documents, including text in varied fonts and sizes. Handwriting recognition has limited support and depends heavily on writing clarity. For handwritten documents, consider specialized handwriting recognition solutions or custom-trained models. The SDK excels at printed invoices, forms, contracts, identity documents, and other machine-printed content.

How do I handle poor-quality scans with low DPI?

The SDK includes preprocessing tools specifically for low-quality scans. First, use deskew to correct the rotation issues common in poor scans. Next, apply noise removal to eliminate speckles and artifacts, and use character enhancement to refine faint or over-inked text. For very low DPI scans, consider upscaling the image before OCR processing. Even so, OCR accuracy drops significantly below 200 DPI, so rescan documents at 300 DPI when possible. The SDK can still extract usable text from challenging scans, but expect lower accuracy.
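The upscaling step comes down to simple arithmetic, sketched here in Python. The `upscale_plan` helper is hypothetical and only computes the target dimensions. The resampling itself would be done by an imaging library.

```python
def upscale_plan(width_px: int, height_px: int, source_dpi: int,
                 target_dpi: int = 300) -> tuple:
    """Return (scale, new_width, new_height) needed to reach target_dpi.

    Scans already at or above the target are left at scale 1.0.
    """
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    scale = max(1.0, target_dpi / source_dpi)
    return scale, round(width_px * scale), round(height_px * scale)


# A US Letter page scanned at 150 DPI is 1275 x 1650 pixels.
scale, new_w, new_h = upscale_plan(1275, 1650, source_dpi=150)
```

Upscaling interpolates between existing pixels and cannot restore detail the scanner never captured, which is why rescanning at a higher resolution remains the better option.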

Can I create searchable PDFs that comply with PDF/A archival standards?

Yes. The SDK supports creating searchable PDFs with PDF/A-4f conformance for long-term digital archiving. These PDFs meet regulatory requirements for document preservation, embed all fonts to ensure consistent rendering, remain searchable while maintaining visual fidelity, and comply with archival standards for legal and financial documents. This is essential for industries with retention requirements, like healthcare, finance, and legal services. The OCR output becomes part of the archival package.

What file formats can I process with OCR?

The SDK processes scanned PDFs, single-page images (JPEG, PNG, TIFF, BMP), and multipage TIFF files commonly used for document scanning. You can convert any of these to searchable PDFs with embedded text layers. The OCR engine works on black and white, grayscale, and color images, though black and white or grayscale at 300 DPI provide the best accuracy. For batch processing, the SDK efficiently handles thousands of pages with multithreaded operation.

How fast is the OCR processing for large document batches?

Processing speed depends on document complexity, image resolution, selected accuracy mode, and number of languages. On modern server hardware, expect 1–3 seconds per page at 300 DPI with default accuracy settings. The SDK includes built-in multithreading that processes multiple pages in parallel, significantly improving batch performance. For high-volume scenarios, you can optimize by choosing speed-optimized mode, processing in parallel across multiple cores, skipping preprocessing for clean scans, and distributing work across multiple servers.
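For capacity planning, the per-page figure can be turned into a rough hourly estimate. The Python sketch below is a back-of-the-envelope calculation under stated assumptions: the worker count and the `efficiency` discount are hypothetical inputs, and ideal linear scaling is rarely reached in practice.

```python
def pages_per_hour(seconds_per_page: float, workers: int,
                   efficiency: float = 1.0) -> float:
    """Estimate hourly throughput for `workers` parallel OCR threads.

    `efficiency` (0-1) discounts for imperfect scaling; 1.0 assumes
    ideal linear scaling, which real workloads rarely reach.
    """
    if seconds_per_page <= 0 or workers < 1 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    return 3600 / seconds_per_page * workers * efficiency


# Range quoted above: 1 to 3 seconds per page. Eight workers is an assumed figure.
best_case = pages_per_hour(1.0, workers=8)    # 28800.0
worst_case = pages_per_hour(3.0, workers=8)   # 9600.0
single_thread = pages_per_hour(3.0, workers=1)  # 1200.0
```

Measuring a sample batch of your own documents on the target hardware will give a more reliable figure than any estimate of this kind.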

Can I filter which characters the OCR engine recognizes?

Yes. The SDK provides character allowlists and denylists to constrain recognition. For example, when extracting phone numbers, specify an allowlist of digits, parentheses, and hyphens to improve accuracy. When processing invoices, exclude special characters that shouldn’t appear in vendor names. This character filtering reduces false positives and increases accuracy for structured data extraction, which is especially useful with zonal OCR.
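The effect of an allowlist can be illustrated with a small Python sketch. This is a post-processing approximation for illustration only: an OCR engine applies the constraint during recognition, and the `apply_allowlist` function and `LOOKALIKES` table are hypothetical, not part of the SDK.

```python
PHONE_ALLOWLIST = set("0123456789()- ")

# Common look-alike confusions, applied only when the target is in the allowlist.
LOOKALIKES = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}


def apply_allowlist(raw: str, allowlist: set) -> str:
    """Map look-alike characters into the allowlist, then drop anything outside it."""
    out = []
    for ch in raw:
        if ch in allowlist:
            out.append(ch)
        elif LOOKALIKES.get(ch) in allowlist:
            out.append(LOOKALIKES[ch])
        # Characters with no allowed equivalent are discarded.
    return "".join(out)


# Simulated unconstrained OCR output for a phone number field.
raw_ocr = "(555) 0l2-34S6*"
cleaned = apply_allowlist(raw_ocr, PHONE_ALLOWLIST)  # "(555) 012-3456"
```

Restricting the candidate set means a digit-only field can never come back with a letter such as "l" or "S", which is where much of the accuracy gain for structured fields comes from.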

What support is available for OCR implementation?

Our documentation includes getting started guides for OCR implementation, preprocessing techniques for different document types, zonal OCR examples with coordinate definitions, and guidance on language configuration and character filtering. Support options range from standard email and forum support to premium packages with dedicated engineers and implementation consultation. The SDK includes code examples for common scenarios like invoice processing and identity document verification. During your trial, you have full access to documentation and support.