Zonal OCR in C# .NET
This guide explains how to use Nutrient .NET SDK’s (formerly GdPicture.NET) optical character recognition (OCR) engine to recognize text in a specific area within your document. This is helpful if you know the exact position of the text in your PDF document or you only want to recognize text in a specific part of the page.
To recognize text in a specific area within a document and then save the result in another document, follow the steps outlined below:
- Create a GdPicturePDF object and a GdPictureOCR object.
- Load the source document by passing its path to the LoadFromFile method of the GdPicturePDF object.
- Select the page where you want to recognize text with the SelectPage method of the GdPicturePDF object.
- Render the selected page to a 200 dots-per-inch (DPI) image with the RenderPageToGdPictureImageEx method of the GdPicturePDF object.
- Pass the image to the GdPictureOCR object with the SetImage method.
- Configure the OCR process with the GdPictureOCR object in the following way:
  - Set the path to the OCR resource folder with the ResourceFolder property. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.
  - With the AddLanguage method, add the language resources that Nutrient .NET SDK uses to recognize text in the image. This method takes an element of the OCRLanguage enum.
  - Set whether OCR prioritizes recognition accuracy or speed with the OCRMode property.
  - Set the character allowlist with the CharacterSet property. When scanning the image, the OCR engine only recognizes the characters included in the allowlist.
  - Set the character denylist with the CharacterBlackList property. When scanning the image, the OCR engine doesn’t recognize the characters included in the denylist.
- Set the rectangular area on the page where you want to recognize text. Use the SetROI method of the GdPictureOCR object with the following parameters:
  - The distance in pixels between the left edge of the page and the left side of the rectangular area.
  - The distance in pixels between the top of the page and the top of the rectangular area.
  - The width of the rectangular area in pixels.
  - The height of the rectangular area in pixels.
- Run the OCR process with the RunOCR method of the GdPictureOCR object.
- Save the output in a document.
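Because SetROI takes pixel values in the rendered image, a zone measured on the PDF page itself has to be converted first. The sketch below assumes the zone was measured in PDF points (1/72 inch) from the top-left corner of the page, and that the page is rendered at 200 DPI as in the steps above. PointsToPixels and ZoneToRoi are hypothetical helpers written for illustration; they are not part of Nutrient .NET SDK.

```csharp
using System;

// Hypothetical helpers, not part of Nutrient .NET SDK.
// Assumption: zone measurements are in PDF points (1/72 inch),
// taken from the top-left corner of the page.

// Convert a length in points to pixels at the given render resolution.
static int PointsToPixels(double points, int dpi) =>
    (int)Math.Round(points * dpi / 72.0, MidpointRounding.AwayFromZero);

// Convert a zone in points to pixel values in the order
// left, top, width, height.
static (int Left, int Top, int Width, int Height) ZoneToRoi(
    double leftPt, double topPt, double widthPt, double heightPt, int dpi) =>
    (PointsToPixels(leftPt, dpi), PointsToPixels(topPt, dpi),
     PointsToPixels(widthPt, dpi), PointsToPixels(heightPt, dpi));

// A zone 0.5 in from the left and top, 1 in wide and 0.25 in tall.
var roi = ZoneToRoi(36, 36, 72, 18, dpi: 200);
Console.WriteLine($"{roi.Left}, {roi.Top}, {roi.Width}, {roi.Height}");
// prints "100, 100, 200, 50"
```

The four values can then be passed to SetROI in the same order. If the page is rendered at a different resolution, the same zone needs to be converted again with the new DPI value.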
The example below recognizes text in a specific area on the first page of a document and then saves the result in a TXT file:
C#:

    using GdPicturePDF gdpicturePDF = new GdPicturePDF();
    using GdPictureOCR gdpictureOCR = new GdPictureOCR();
    // Load the source document.
    gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
    // Select the first page.
    gdpicturePDF.SelectPage(1);
    // Render the first page to a 200 DPI image.
    int rasterPageID = gdpicturePDF.RenderPageToGdPictureImageEx(200, true);
    // Pass the image to the `GdPictureOCR` object.
    gdpictureOCR.SetImage(rasterPageID);
    // Configure the OCR process.
    gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
    gdpictureOCR.AddLanguage(OCRLanguage.English);
    gdpictureOCR.SetROI(100, 100, 200, 50);
    // Run the OCR process.
    string resID = gdpictureOCR.RunOCR();
    // Save the output in a TXT document.
    gdpictureOCR.SaveAsText(resID, @"C:\temp\output.txt", OCROutputTextFormat.Utf16, true);
    // Release unnecessary resources.
    GdPictureDocumentUtilities.DisposeImage(rasterPageID);
    gdpicturePDF.CloseDocument();

VB.NET:

    Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
        ' Load the source document.
        gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
        ' Select the first page.
        gdpicturePDF.SelectPage(1)
        ' Render the first page to a 200 DPI image.
        Dim rasterPageID As Integer = gdpicturePDF.RenderPageToGdPictureImageEx(200, True)
        ' Pass the image to the `GdPictureOCR` object.
        gdpictureOCR.SetImage(rasterPageID)
        ' Configure the OCR process.
        gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
        gdpictureOCR.AddLanguage(OCRLanguage.English)
        gdpictureOCR.SetROI(100, 100, 200, 50)
        ' Run the OCR process.
        Dim resID As String = gdpictureOCR.RunOCR()
        ' Save the output in a TXT document.
        gdpictureOCR.SaveAsText(resID, "C:\temp\output.txt", OCROutputTextFormat.Utf16, True)
        ' Release unnecessary resources.
        GdPictureDocumentUtilities.DisposeImage(rasterPageID)
        gdpicturePDF.CloseDocument()
    End Using
    End Using

The example below recognizes phone numbers and addresses in specific areas in a document. It first recognizes numbers in a specific area on the first page of a document. Then, it recognizes any text in another area of the page. Finally, it saves the result in a TXT file:
C#:

    using GdPicturePDF gdpicturePDF = new GdPicturePDF();
    using GdPictureOCR gdpictureOCR = new GdPictureOCR();
    // Load the source document.
    gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
    // Select the first page.
    gdpicturePDF.SelectPage(1);
    // Render the first page to a 200 DPI image.
    int rasterPageID = gdpicturePDF.RenderPageToGdPictureImageEx(200, true);
    // Pass the image to the `GdPictureOCR` object.
    gdpictureOCR.SetImage(rasterPageID);
    // Create a list where OCR results are saved.
    List<string> results = new List<string>();
    // Configure the general settings of the OCR process.
    gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
    gdpictureOCR.AddLanguage(OCRLanguage.English);
    // Configure the OCR process for recognizing the phone number.
    gdpictureOCR.Context = OCRContext.OCRContextSingleLine;
    gdpictureOCR.CharacterSet = "0123456789";
    gdpictureOCR.SetROI(100, 100, 200, 50);
    // Run the OCR process to recognize the phone number.
    gdpictureOCR.RunOCR("PhoneNumber");
    results.Add("PhoneNumber");
    // Configure the OCR process for recognizing the address.
    gdpictureOCR.Context = OCRContext.OCRContextSingleBlock;
    gdpictureOCR.CharacterSet = "";
    gdpictureOCR.SetROI(300, 100, 200, 200);
    // Run the OCR process to recognize the address.
    gdpictureOCR.RunOCR("Address");
    results.Add("Address");
    // Save the output in a TXT document.
    gdpictureOCR.SaveAsText(results, @"C:\temp\output.txt", OCROutputTextFormat.Utf16, true);
    // Release unnecessary resources.
    GdPictureDocumentUtilities.DisposeImage(rasterPageID);
    gdpicturePDF.CloseDocument();

VB.NET:

    Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
        ' Load the source document.
        gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
        ' Select the first page.
        gdpicturePDF.SelectPage(1)
        ' Render the first page to a 200 DPI image.
        Dim rasterPageID As Integer = gdpicturePDF.RenderPageToGdPictureImageEx(200, True)
        ' Pass the image to the `GdPictureOCR` object.
        gdpictureOCR.SetImage(rasterPageID)
        ' Create a list where OCR results are saved.
        Dim results As List(Of String) = New List(Of String)()
        ' Configure the general settings of the OCR process.
        gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
        gdpictureOCR.AddLanguage(OCRLanguage.English)
        ' Configure the OCR process for recognizing the phone number.
        gdpictureOCR.Context = OCRContext.OCRContextSingleLine
        gdpictureOCR.CharacterSet = "0123456789"
        gdpictureOCR.SetROI(100, 100, 200, 50)
        ' Run the OCR process to recognize the phone number.
        gdpictureOCR.RunOCR("PhoneNumber")
        results.Add("PhoneNumber")
        ' Configure the OCR process for recognizing the address.
        gdpictureOCR.Context = OCRContext.OCRContextSingleBlock
        gdpictureOCR.CharacterSet = ""
        gdpictureOCR.SetROI(300, 100, 200, 200)
        ' Run the OCR process to recognize the address.
        gdpictureOCR.RunOCR("Address")
        results.Add("Address")
        ' Save the output in a TXT document.
        gdpictureOCR.SaveAsText(results, "C:\temp\output.txt", OCROutputTextFormat.Utf16, True)
        ' Release unnecessary resources.
        GdPictureDocumentUtilities.DisposeImage(rasterPageID)
        gdpicturePDF.CloseDocument()
    End Using
    End Using
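When a document has more than two zones, it may be easier to describe the zones as data and to check each rectangle against the rendered image before using it. The sketch below is one way to do that and runs without the SDK. ClampRoi is a hypothetical helper, the "Footer" zone is made up for illustration, and the image size assumes a US Letter page (8.5 x 11 in) rendered at 200 DPI. This guide doesn't state how the OCR engine treats a region that extends past the image, so trimming the rectangle first is a precaution rather than a documented requirement.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Nutrient .NET SDK.
// Trims a region of interest so that it lies inside an image of the given size.
static (int Left, int Top, int Width, int Height) ClampRoi(
    int left, int top, int width, int height, int imageWidth, int imageHeight)
{
    int clampedLeft = Math.Max(0, Math.Min(left, imageWidth));
    int clampedTop = Math.Max(0, Math.Min(top, imageHeight));
    int clampedRight = Math.Max(clampedLeft, Math.Min(left + width, imageWidth));
    int clampedBottom = Math.Max(clampedTop, Math.Min(top + height, imageHeight));
    return (clampedLeft, clampedTop, clampedRight - clampedLeft, clampedBottom - clampedTop);
}

// Assumed image size: a US Letter page (8.5 x 11 in) rendered at 200 DPI.
const int pageWidthPx = 1700;
const int pageHeightPx = 2200;

// Each zone: a result name, a character allowlist ("" means no restriction),
// and the rectangle in pixels.
var zones = new List<(string Name, string Allowlist, int Left, int Top, int Width, int Height)>
{
    ("PhoneNumber", "0123456789", 100, 100, 200, 50),
    ("Address", "", 300, 100, 200, 200),
    ("Footer", "", 1600, 2150, 200, 100), // made-up zone that runs off the page
};

foreach (var zone in zones)
{
    var roi = ClampRoi(zone.Left, zone.Top, zone.Width, zone.Height, pageWidthPx, pageHeightPx);
    Console.WriteLine($"{zone.Name}: {roi.Left}, {roi.Top}, {roi.Width}, {roi.Height}");
}
// prints:
// PhoneNumber: 100, 100, 200, 50
// Address: 300, 100, 200, 200
// Footer: 1600, 2150, 100, 50
```

Inside the loop, each zone would then be processed as in the example above: assign the allowlist to CharacterSet, pass the trimmed values to SetROI, call RunOCR with the zone name, and add that name to the results list. A per-zone Context value could be stored in the same table.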
  
  
  
 