Applying OCR to a PDF document
OCR (optical character recognition) converts image-based PDFs into documents with searchable, selectable text. This workflow processes scanned files while preserving the original page appearance.
Use this sample to:
- Extract text from scanned PDF pages
- Add an invisible text layer for search and copy
- Keep original layout and visual content intact
Project setup
Install:
- The core Nutrient .NET SDK package
- GdPicture.Resources for OCR language and recognition resources
Prepare the project
Register the SDK license before running OCR operations. For setup details, refer to the getting started with .NET SDK guide.
using GdPicture14;
LicenseManager license = new LicenseManager();
license.RegisterKEY(""); // Set your license key
Load the PDF document
Create a GdPicturePDF instance and load the source PDF:
using GdPicturePDF pdf = new GdPicturePDF();
pdf.LoadFromFile(@"input_image_based.pdf"); // Scanned, image-based source PDF
Apply OCR processing
Run OCR across all pages:
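pdf.OcrPages("*", 0, "eng", "", "", 200);
Parameter summary:
- "*": page range; "*" processes all pages
- 0: thread count; 0 lets the engine choose the number of threads automatically
- "eng": OCR language, here English
- "": path to the OCR resource folder; left empty here, relying on the resources installed with GdPicture.Resources
- "": character allowlist; empty means no characters are excluded
- 200: resolution, in DPI, at which pages are rendered for recognition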
Save the OCRed document
Write the processed PDF with the added text layer:
pdf.SaveToFile(@"output.pdf");
The output PDF keeps its original visual content and gains a searchable text layer that supports indexing, text selection, and accessibility tools.
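Putting the steps together, a minimal console program might look like the following. It is a sketch rather than a reference implementation: it reuses the file names and OCR arguments shown above, and it adds two things the steps do not show. First, it checks the GdPictureStatus value returned by LoadFromFile, OcrPages, and SaveToFile and stops at the first failure. Second, it optionally reads back the text of page 1 with SelectPage and GetPageText to confirm that text is now extractable. Confirm these extra calls against the GdPicture14 API reference for your SDK version.

```csharp
using System;
using GdPicture14;

// Sketch: register the license, load a scanned PDF, OCR every page,
// check the status of each step, and save the result.
// Assumes the Nutrient .NET SDK (GdPicture14) and GdPicture.Resources are installed.
class OcrPdfSample
{
    static int Main()
    {
        // Replace the empty string with your license key.
        LicenseManager license = new LicenseManager();
        license.RegisterKEY("");

        using GdPicturePDF pdf = new GdPicturePDF();

        // Each call returns a GdPictureStatus; anything other than OK is treated as a failure.
        GdPictureStatus status = pdf.LoadFromFile(@"input_image_based.pdf");
        if (status != GdPictureStatus.OK)
        {
            Console.Error.WriteLine($"Loading failed: {status}");
            return 1;
        }

        // Same arguments as in the OCR step: all pages, English, 200 DPI.
        status = pdf.OcrPages("*", 0, "eng", "", "", 200);
        if (status != GdPictureStatus.OK)
        {
            Console.Error.WriteLine($"OCR failed: {status}");
            return 1;
        }

        // Optional check: select page 1 and read back its text.
        pdf.SelectPage(1);
        string firstPageText = pdf.GetPageText();
        Console.WriteLine($"Page 1 text length after OCR: {firstPageText?.Length ?? 0}");

        status = pdf.SaveToFile(@"output.pdf");
        if (status != GdPictureStatus.OK)
        {
            Console.Error.WriteLine($"Saving failed: {status}");
            return 1;
        }

        return 0;
    }
}
```

Stopping at the first failed step makes it clear which stage went wrong, and the nonzero exit code lets scripts or build pipelines detect the failure.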