PDF OCR server

Document Engine includes custom-built optical character recognition (OCR) technology that accurately recognizes text and patterns and generates searchable PDF/A files.

Looking for more advanced OCR capabilities? Nutrient .NET SDK OCR offers additional features, such as zonal OCR, key-value extraction, image preprocessing, searchable PDF/A generation with layout retention, orientation detection, and confidence scoring. It’s available as a separate SDK and can be used in conjunction with Document Engine.

Comparing OCR SDKs — Nutrient vs. Apryse

Feature | Document Engine (Server) OCR | Nutrient .NET SDK OCR | Apryse OCR
Multi-language support | 30+ built-in languages | 30+ built-in languages | Six built-in languages with OCR module binary and 10 with IRIS OCR module
Searchable PDF creation
OCR with exact bounding box coordinates
Zone-based OCR/custom OCR regions
Key-value/table extraction ✅ (available through the Data Extraction API)
Orientation detection
Image preprocessing (deskew, etc.) ✅ (manual)
Performance and speed | ✅ Fast | ✅ Fast | Depends on SDK setup (OCR module/IRIS module)
API access | Three-step API call once initial setup is done | Requires SDK setup | Requires SDK setup

Key capabilities

Highly accurate

Completely custom-built AI- and ML-powered OCR engine

Language support

Includes English, French, German, and Spanish

Searchable PDF

Turn scans, images, and documents into searchable PDF or PDF/A documents

Extract data

Extract key-value pairs from unstructured documents

Post-processing

Add signatures and annotations, assemble documents, and more

Display PDFs

Open PDFs in integrated web or mobile PDF viewers

Extendable

Add forms, signing, annotations, and more

Real-world use cases

  • Invoice OCR — Convert scanned invoices into searchable PDFs, or extract totals and vendor info using OCR.
  • Contract digitization — Turn scanned contracts into searchable, selectable PDFs for legal archiving.
  • Form processing — Use OCR to extract fields like names, dates, and signatures from scanned forms.
  • Multi-language document digitization — OCR documents in multiple languages with full Unicode support.
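
As a rough illustration of the first use case, the sketch below prepares an OCR request for a Document Engine server. It is a minimal sketch under stated assumptions, not the definitive integration: the `/api/build` path, the `Token token=...` authorization header, the `ocr` action type, and the `english` language value are assumed from memory of the Document Engine HTTP API and should be checked against the API reference for your version. The helper names `build_ocr_instructions` and `build_request` are hypothetical.

```python
import json


def build_ocr_instructions(part_name: str = "scan", language: str = "english") -> str:
    """Return JSON instructions that run OCR on one uploaded file.

    Assumption: the server accepts a list of "parts" (inputs) and a list of
    "actions", and an action of type "ocr" takes a "language" value.
    """
    instructions = {
        # part_name must match the multipart field name of the uploaded file
        "parts": [{"file": part_name}],
        "actions": [{"type": "ocr", "language": language}],
    }
    return json.dumps(instructions)


def build_request(base_url: str, api_token: str,
                  part_name: str = "scan", language: str = "english") -> dict:
    """Assemble the URL, headers, and form data for the OCR request.

    Nothing is sent here; the returned dict can be handed to any HTTP client.
    """
    return {
        # Assumed endpoint path; verify against your server's documentation
        "url": base_url.rstrip("/") + "/api/build",
        # Assumed authorization scheme
        "headers": {"Authorization": f"Token token={api_token}"},
        "data": {"instructions": build_ocr_instructions(part_name, language)},
    }
```

With the `requests` library, the result could be sent as `requests.post(req["url"], headers=req["headers"], data=req["data"], files={"scan": open("invoice.pdf", "rb")})`, where `invoice.pdf` is a placeholder file name. If the assumptions hold, the response body would be the searchable PDF.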

Which OCR SDK should I use?

Need | SDK to use
Basic OCR from PDFs/images | Document Engine OCR
Production-ready OCR solution without SDK setup | Document Engine OCR
OCR with form data, zones, orientation detection | Nutrient .NET SDK OCR
Batch processing of scanned documents | Either, depending on volume
Need to preserve layout, tables | Prefer Nutrient .NET SDK OCR

Free trial

Start your free trial for unlimited access and expert support.