Applying OCR to a PDF document

Optical character recognition (OCR) converts image-based PDFs into searchable, selectable documents. This workflow processes scanned files while preserving the original page appearance.

Use this sample to:

Extract text from scanned PDF pages
Add an invisible text layer for search and copy
Keep original layout and visual content intact

Project setup

Install:

The core Nutrient .NET SDK package
GdPicture.Resources for OCR language and recognition resources

Prepare the project

Register the SDK license before running OCR operations. For setup details, refer to the getting started with .NET SDK guide.

using GdPicture14;

LicenseManager licence = new LicenseManager();
licence.RegisterKEY(""); // Set your license key
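
If you prefer not to hard-code the key, one option is to read it from the environment at startup. The following is a minimal sketch; the variable name NUTRIENT_LICENSE_KEY is a hypothetical choice made for this example, not something the SDK defines.

```csharp
using System;
using GdPicture14;

// NUTRIENT_LICENSE_KEY is a hypothetical variable name used only in this sketch.
// Fall back to an empty string, which is what the sample above passes.
string key = Environment.GetEnvironmentVariable("NUTRIENT_LICENSE_KEY") ?? "";

LicenseManager licenseManager = new LicenseManager();
licenseManager.RegisterKEY(key);
```

This keeps the key out of source control; how the variable is set depends on your deployment environment.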

Load the PDF document

Create a GdPicturePDF instance and load the source PDF:

using GdPicturePDF pdf = new GdPicturePDF();
pdf.LoadFromFile(@"input_image_based.pdf");

Apply OCR processing

Run OCR across all pages:

pdf.OcrPages("*", 0, "eng", "", "", 200);

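Parameter summary, in argument order:

"*" - Process all pages
0 - Thread count; 0 lets the SDK choose the number of threads
"eng" - Use English OCR language data
"" (fourth argument) - Path to the OCR resources folder; left empty here so the resources from the GdPicture.Resources package are used
"" (fifth argument) - Character allowlist; an empty string means all characters are recognized
200 - Process at 200 DPI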
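
The snippets in this guide ignore return values. GdPicturePDF methods such as LoadFromFile and OcrPages report their result as a GdPictureStatus value, so a slightly more defensive version might look like the sketch below. The error messages and the early-exit structure are illustrative choices, not part of the original sample.

```csharp
using System;
using GdPicture14;

LicenseManager licenseManager = new LicenseManager();
licenseManager.RegisterKEY(""); // Set your license key

using GdPicturePDF pdf = new GdPicturePDF();

// Stop early if the source file cannot be opened.
GdPictureStatus status = pdf.LoadFromFile(@"input_image_based.pdf");
if (status != GdPictureStatus.OK)
{
    Console.Error.WriteLine($"Load failed: {status}");
    return;
}

// Same OcrPages call as in the sample, with the result captured.
status = pdf.OcrPages("*", 0, "eng", "", "", 200);
if (status != GdPictureStatus.OK)
{
    Console.Error.WriteLine($"OCR failed: {status}");
    return;
}

Console.WriteLine("OCR finished for all pages.");
```

Checking the status after each call makes it easier to tell a missing input file apart from a problem with the OCR resources.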

Save the OCRed document

Write the processed PDF with the added text layer:

pdf.SaveToFile(@"output.pdf");

The output PDF keeps its original visual content and adds searchable text for indexing, text selection, and accessibility tooling.
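
To confirm that the text layer was written, you can reopen the output file and read the text back page by page. This is a rough sketch that assumes the GetPageCount, SelectPage, and GetPageText methods of GdPicturePDF and 1-based page numbering; the per-page character count is only an illustrative sanity check, not a measure of OCR quality.

```csharp
using System;
using GdPicture14;

LicenseManager licenseManager = new LicenseManager();
licenseManager.RegisterKEY(""); // Set your license key

using GdPicturePDF result = new GdPicturePDF();
if (result.LoadFromFile(@"output.pdf") != GdPictureStatus.OK)
{
    Console.Error.WriteLine("Could not open output.pdf");
    return;
}

// Walk the pages and report how much text each one now exposes.
int pageCount = result.GetPageCount();
for (int page = 1; page <= pageCount; page++)
{
    result.SelectPage(page);
    string text = result.GetPageText();
    int length = text?.Length ?? 0; // Treat a missing result as no text
    Console.WriteLine($"Page {page}: {length} characters of text");
}
```

A page that reports zero characters may be blank, or the OCR step may not have recognized anything on it.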
