PDF OCR server

Document Engine includes optical character recognition (OCR) capabilities to accurately recognize text and patterns, as well as generate searchable PDF/A files. As of Document Engine 1.16.0, GdPicture OCR is the recommended engine and is used by default when available, while Core OCR is deprecated and will be removed in a future release. The x-pspdfkit-ocr-engine request header and the OCR_ENGINE=core configuration value are also deprecated, and no replacement OCR engine selector will be added.
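
Because both engine selectors are deprecated, it can be useful to audit a deployment for them before upgrading. The following is a minimal sketch of such a check, not part of Document Engine itself: the helper name `find_deprecated_ocr_settings` is hypothetical, and it only looks for the two settings named above (the `OCR_ENGINE=core` configuration value and the `x-pspdfkit-ocr-engine` request header).

```python
# Hypothetical audit helper; not a Document Engine API.
DEPRECATED_HEADER = "x-pspdfkit-ocr-engine"
DEPRECATED_ENV_VAR = "OCR_ENGINE"


def find_deprecated_ocr_settings(env, headers):
    """Return warnings for deprecated OCR engine selectors.

    env     -- mapping of configuration variables (for example os.environ)
    headers -- mapping of HTTP request header names to values
    """
    warnings = []

    # OCR_ENGINE=core explicitly selects the deprecated Core OCR engine.
    value = env.get(DEPRECATED_ENV_VAR)
    if value is not None and value.strip().lower() == "core":
        warnings.append(
            f"{DEPRECATED_ENV_VAR}={value} selects the deprecated Core OCR engine"
        )

    # HTTP header names are case-insensitive, so compare in lowercase.
    for name in headers:
        if name.lower() == DEPRECATED_HEADER:
            warnings.append(f"request header {name} is deprecated")

    return warnings
```

A call such as `find_deprecated_ocr_settings(os.environ, request_headers)` returns an empty list when neither deprecated setting is present.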

Looking for more advanced OCR capabilities? Nutrient .NET SDK OCR offers additional powerful features, such as zonal OCR, key-value extraction, image preprocessing, searchable PDF/A generation with layout retention, orientation detection, confidence scoring, and more. It’s available as a separate SDK and can be used in conjunction with Document Engine.

Comparing OCR SDKs: Nutrient vs. Apryse

Feature	Document Engine OCR	Nutrient .NET SDK OCR	Apryse OCR
Multi-language support	120+ built-in languages	30+ built-in languages	Six built-in languages with OCR module binary and 10 with IRIS OCR module
Searchable PDF creation	✅	✅	✅
OCR with exact bounding box coordinates	❌	✅	✅
Zone-based OCR/custom OCR regions	❌	✅	✅
Key-value/table extraction	✅ (available through the Data Extraction API)	✅	❌
Orientation detection	❌	✅	✅
Image preprocessing (deskew, etc.)	❌	✅	✅ (manual)
Performance and speed	✅ Fast	✅ Fast	Depends on SDK setup (OCR module/IRIS module)
API access	Simple HTTP API	Requires SDK setup	Requires SDK setup
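
To give a sense of what the "Simple HTTP API" row can look like in practice, here is a minimal sketch that sends a PDF to a Document Engine instance and asks for OCR. Several details are assumptions that should be checked against the Document Engine API reference: the `/api/build` endpoint path, the `Token token=...` authorization scheme, the `{"type": "ocr", "language": ...}` action shape, the `"english"` language value, and the `http://localhost:5000` base URL. The helper names are illustrative only.

```python
import json
import uuid
import urllib.request


def build_ocr_instructions(language="english", part_name="document"):
    """Build the instructions payload (assumed shape) for an OCR request."""
    return {
        "parts": [{"file": part_name}],
        "actions": [{"type": "ocr", "language": language}],
    }


def encode_multipart(pdf_bytes, instructions, part_name="document"):
    """Encode the PDF and the JSON instructions as multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines = [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{part_name}"; '
        'filename="input.pdf"'.encode(),
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="instructions"',
        b"Content-Type: application/json",
        b"",
        json.dumps(instructions).encode("utf-8"),
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def run_ocr(pdf_bytes, base_url="http://localhost:5000", token="secret",
            language="english"):
    """POST the document and return the response body (the processed PDF).

    The endpoint path and the authorization scheme are assumptions.
    """
    body, content_type = encode_multipart(
        pdf_bytes, build_ocr_instructions(language)
    )
    request = urllib.request.Request(
        f"{base_url}/api/build",
        data=body,
        headers={
            "Authorization": f"Token token={token}",
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With a running instance, usage would be along the lines of `searchable = run_ocr(open("scan.pdf", "rb").read(), token="<your token>")`, after which the returned bytes can be written to a new file. No engine selector is sent, in line with the deprecation of `x-pspdfkit-ocr-engine`.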

Real-world use cases

Invoice OCR — Convert scanned invoices into searchable PDFs, or extract totals and vendor info using OCR.
Contract digitization — Turn scanned contracts into searchable, selectable PDFs for legal archiving.
Form processing — Use OCR to extract fields like names, dates, and signatures from scanned forms.
Multi-language document digitization — OCR documents in multiple languages with full Unicode support.

Which OCR SDK should I use?

Need	SDK to use
Basic OCR from PDFs/images	Document Engine OCR
Production-ready OCR solution without SDK setup	Document Engine OCR
OCR with form data, zones, or orientation detection	Nutrient .NET SDK OCR
Batch processing of scanned documents	Either, depending on volume
Preserving layout and tables	Nutrient .NET SDK OCR (preferred)

Guides for OCR

Usage

Language support

Best practices