PDF OCR server

Document Engine includes custom-built optical character recognition (OCR) technology that accurately recognizes text and patterns and generates searchable PDF/A files.

Looking for more advanced OCR capabilities? Nutrient .NET SDK OCR offers additional powerful features, such as zonal OCR, key-value extraction, image preprocessing, searchable PDF/A generation with layout retention, orientation detection, confidence scoring, and more. It’s available as a separate SDK and can be used in conjunction with Document Engine.


Comparing OCR SDKs — Nutrient vs. Apryse

Feature | Document Engine (Server) OCR | Nutrient .NET SDK OCR | Apryse OCR
Multi-language support | 30+ built-in languages | 30+ built-in languages | Six built-in languages with OCR module binary and 10 with IRIS OCR module
Searchable PDF creation
OCR with exact bounding box coordinates
Zone-based OCR/custom OCR regions
Key-value/table extraction ✅ (available through the Data Extraction API)
Orientation detection
Image preprocessing (deskew, etc.) ✅ (manual)
Performance and speed | ✅ Fast | ✅ Fast | Depends on SDK setup (OCR module/IRIS module)
API access | Three-step API call once initial setup is done | Requires SDK setup | Requires SDK setup

Key capabilities

Highly accurate

Completely custom-built AI- and ML-powered OCR engine

Language support

Includes English, French, German, and Spanish

Searchable PDF

Turn scans, images, and documents into searchable PDF or PDF/A documents

Extract data

Extract key-value pairs from unstructured documents

Post-processing

Add signatures and annotations, assemble documents, and more

Display PDFs

Open PDFs in integrated web or mobile PDF viewers

Extendable

Add forms, signing, annotations, and more


Real-world use cases

  • Invoice OCR — Convert scanned invoices into searchable PDFs, or extract totals and vendor info using OCR.
  • Contract digitization — Turn scanned contracts into searchable, selectable PDFs for legal archiving.
  • Form processing — Use OCR to extract fields like names, dates, and signatures from scanned forms.
  • Multi-language document digitization — OCR documents in multiple languages with full Unicode support.
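As a rough sketch of the invoice and contract use cases above, the Python snippet below assembles a JSON instruction set that asks for OCR on one uploaded file. The helper name `build_ocr_instructions` and the field names (`parts`, `actions`, `type`, `language`) are assumptions made for illustration only; take the exact request shape from the Document Engine API reference.

```python
import json


def build_ocr_instructions(file_part: str, language: str = "english") -> str:
    """Return a JSON string describing one input file and one OCR action.

    Hypothetical helper: the keys used here ("parts", "actions", "type",
    "language") are illustrative assumptions, not a confirmed schema.
    """
    if not file_part:
        raise ValueError("file_part must name the uploaded file part")

    instructions = {
        # The scanned document to process, referenced by its upload name.
        "parts": [{"file": file_part}],
        # A single OCR step in the requested recognition language.
        "actions": [{"type": "ocr", "language": language}],
    }
    return json.dumps(instructions, indent=2)


if __name__ == "__main__":
    # Example: request German OCR for a file uploaded as "scanned-invoice".
    print(build_ocr_instructions("scanned-invoice", language="german"))
```

The resulting JSON would accompany the scanned file in a request to the server. The endpoint and authentication details are left out here because this page does not specify them.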

Which OCR SDK should I use?

Need | SDK to use
Basic OCR from PDFs/images | Document Engine OCR
Production-ready OCR solution without SDK setup | Document Engine OCR
OCR with form data, zones, orientation detection | Nutrient .NET SDK OCR
Batch processing of scanned documents | Either, depending on volume
Need to preserve layout, tables | Prefer Nutrient .NET SDK OCR

