Scan and OCR PDFs in C#
This guide explains how to scan a physical document with a scanner and then save the scanned image in a searchable PDF. Nutrient .NET SDK’s (formerly GdPicture.NET) optical character recognition (OCR) engine enables you to recognize text in an image and then save the text in a PDF. This guide uses the TWAIN protocol.
Printing and scanning aren’t supported in the cross-platform .NET 6.0 assembly. For more information, see the system compatibility guide.
To get an image from a scanner and then save it in a searchable PDF, follow the steps below:
- Create a GdPictureImaging object and a GdPicturePDF object.
- Store the handle of the active window in a variable, using the IntPtr.Zero value.
- Select the scanner by passing the handle to the TwainSelectSource and TwainOpenDefaultSource methods of the GdPictureImaging object.
- Optional: Hide the scanning user interface with the TwainSetHideUI method of the GdPictureImaging object. Use this setting when your application cannot communicate with the scanner.
- Create a new PDF document with the NewPDF method of the GdPicturePDF object. The parameter of this method sets the conformance level of the PDF document. This parameter is a member of the PdfConformance enumeration. For example, use PDF to create a common PDF document.
- Get the image from the scanner by passing the handle to the TwainAcquireToGdPictureImage method of the GdPictureImaging object.
- Add the scanned image to a new page in the destination document with the AddImageFromGdPictureImage method of the GdPicturePDF object.
- Run the OCR process with the OcrPage method of the GdPicturePDF object, passing the following parameters:
  - Set the code of the language that Nutrient .NET SDK uses to recognize text in the source document. To specify several languages, separate the language codes with the + character. For example, eng+fra.
  - Set the path to the OCR resource folder. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.
  - Set the character allowlist. When scanning the document, the OCR engine only recognizes the characters included in the allowlist. When you set "", all characters are recognized.
  - Set the dots-per-inch (DPI) resolution the OCR engine uses. It’s recommended to use 300 for the best combination of speed and accuracy.
- Save the result in a PDF document.
- Close the TWAIN source handle.
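The language parameter described in the OCR step is a plain string, so combining several language codes is ordinary string handling. The snippet below is a minimal sketch of that idea. The `OcrLanguages` class and its `Combine` method are hypothetical helpers written for illustration and aren't part of Nutrient .NET SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of the SDK.
public static class OcrLanguages
{
    // Joins language codes such as "eng" and "fra" into "eng+fra".
    // Blank entries are skipped and duplicates are removed (case-insensitive).
    public static string Combine(IEnumerable<string> codes)
    {
        if (codes == null) throw new ArgumentNullException(nameof(codes));

        string[] cleaned = codes
            .Where(code => !string.IsNullOrWhiteSpace(code))
            .Select(code => code.Trim())
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();

        if (cleaned.Length == 0)
            throw new ArgumentException("At least one language code is required.", nameof(codes));

        return string.Join("+", cleaned);
    }
}
```

For example, `OcrLanguages.Combine(new[] { "eng", "fra" })` produces the string `eng+fra`, which can then be passed as the language argument of the OCR call.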
The example below gets an image from a scanner and then saves it in a searchable PDF:
C#:

    using GdPictureImaging gdpictureImaging = new GdPictureImaging();
    using GdPicturePDF gdpicturePDF = new GdPicturePDF();
    // Store the handle of the active window in a variable.
    IntPtr WINDOW_HANDLE = IntPtr.Zero;
    // Select the scanner.
    gdpictureImaging.TwainSelectSource(WINDOW_HANDLE);
    gdpictureImaging.TwainOpenDefaultSource(WINDOW_HANDLE);
    // (Optional) Hide the scanning user interface.
    gdpictureImaging.TwainSetHideUI(true);
    // Create the destination PDF document.
    gdpicturePDF.NewPDF(PdfConformance.PDF);
    // Get the image from the scanner.
    int imageID = gdpictureImaging.TwainAcquireToGdPictureImage(WINDOW_HANDLE);
    // Add the scanned image to a new page in the destination document.
    gdpicturePDF.AddImageFromGdPictureImage(imageID, false, true);
    // Run the OCR process.
    gdpicturePDF.OcrPage("eng", @"C:\GdPicture.NET 14\Redist\OCR", "", 300);
    // Save the result in a PDF document.
    gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
    // Release unnecessary resources.
    gdpictureImaging.ReleaseGdPictureImage(imageID);
    gdpictureImaging.TwainCloseSource();

VB.NET:

    Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
    Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
        ' Store the handle of the active window in a variable.
        Dim WINDOW_HANDLE = IntPtr.Zero
        ' Select the scanner.
        gdpictureImaging.TwainSelectSource(WINDOW_HANDLE)
        gdpictureImaging.TwainOpenDefaultSource(WINDOW_HANDLE)
        ' (Optional) Hide the scanning user interface.
        gdpictureImaging.TwainSetHideUI(True)
        ' Create the destination PDF document.
        gdpicturePDF.NewPDF(PdfConformance.PDF)
        ' Get the image from the scanner.
        Dim imageID As Integer = gdpictureImaging.TwainAcquireToGdPictureImage(WINDOW_HANDLE)
        ' Add the scanned image to a new page in the destination document.
        gdpicturePDF.AddImageFromGdPictureImage(imageID, False, True)
        ' Run the OCR process.
        gdpicturePDF.OcrPage("eng", "C:\GdPicture.NET 14\Redist\OCR", "", 300)
        ' Save the result in a PDF document.
        gdpicturePDF.SaveToFile("C:\temp\output.pdf")
        ' Release unnecessary resources.
        gdpictureImaging.ReleaseGdPictureImage(imageID)
        gdpictureImaging.TwainCloseSource()
    End Using
    End Using
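The example above ignores return values, so a missing scanner or a failed OCR pass would go unnoticed. The variant below is a rough sketch of how status checks might be added. It rests on several assumptions that the guide doesn't confirm: that the types live in a `GdPicture14` namespace, that `TwainOpenDefaultSource` returns a Boolean, that an image ID of 0 signals a failed acquisition, that `GetStat` reports the last status of the `GdPictureImaging` object, and that `OcrPage` and `SaveToFile` return a `GdPictureStatus` value. Check each of these against the API reference for your SDK version before relying on it.

```csharp
using System;
using GdPicture14; // Assumption: namespace of the version 14 assemblies.

using GdPictureImaging gdpictureImaging = new GdPictureImaging();
using GdPicturePDF gdpicturePDF = new GdPicturePDF();
IntPtr windowHandle = IntPtr.Zero;

gdpictureImaging.TwainSelectSource(windowHandle);
// Assumption: TwainOpenDefaultSource returns false when no source opens.
if (!gdpictureImaging.TwainOpenDefaultSource(windowHandle))
{
    Console.WriteLine($"Could not open the scanner: {gdpictureImaging.GetStat()}");
    return;
}

try
{
    int imageID = gdpictureImaging.TwainAcquireToGdPictureImage(windowHandle);
    // Assumption: an image ID of 0 means that no image was acquired.
    if (imageID == 0)
    {
        Console.WriteLine($"Scan failed: {gdpictureImaging.GetStat()}");
        return;
    }

    try
    {
        gdpicturePDF.NewPDF(PdfConformance.PDF);
        gdpicturePDF.AddImageFromGdPictureImage(imageID, false, true);

        // Assumption: OcrPage and SaveToFile return a GdPictureStatus value.
        GdPictureStatus status = gdpicturePDF.OcrPage(
            "eng", @"C:\GdPicture.NET 14\Redist\OCR", "", 300);
        if (status == GdPictureStatus.OK)
        {
            status = gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
        }
        Console.WriteLine($"Result: {status}");
    }
    finally
    {
        // Release the scanned image even if OCR or saving fails.
        gdpictureImaging.ReleaseGdPictureImage(imageID);
    }
}
finally
{
    // Close the TWAIN source on every exit path.
    gdpictureImaging.TwainCloseSource();
}
```

The nested `try`/`finally` blocks are a design choice rather than an SDK requirement: they ensure the image and the TWAIN source are released in the same order as in the original example, even when an early `return` is taken.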