C# OCR Forms to Text
This guide explains how to convert scanned forms to searchable PDFs. GdPicture.NET’s optical character recognition (OCR) engine allows you to recognize text in a form and then save the text in a PDF.
Converting Forms to Searchable PDFs
To convert a form to a searchable PDF, follow these steps:
- Create a GdPicturePDF object, a GdPictureImaging object, and a GdPictureOCR object.
- Select the scanned image of a form by passing its path to the CreateGdPictureImageFromFile method of the GdPictureImaging object.
- Configure the OCR process with the GdPictureOCR object in the following way:
  - Set the image with the SetImage method.
  - Set the path to the OCR resource folder with the ResourceFolder property. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.
  - With the AddLanguage method, add the language resources that GdPicture.NET uses to recognize text in the image. This method takes a member of the OCRLanguage enumeration.
- Run the OCR process with the RunOCR method of the GdPictureOCR object.
- Get the result of the OCR process as text with the GetOCRResultText method of the GdPictureOCR object.
- Create the output with the CreateFromText method of the GdPicturePDF object. The first parameter sets the conformance level of the PDF document. This parameter is a member of the PdfConformance enumeration. For example, use PDF to create a common PDF document.
- Save the output in a PDF document.
The example below converts a form to a searchable PDF:
// Requires the GdPicture14 namespace (using GdPicture14;).
using GdPicturePDF gdpicturePDF = new GdPicturePDF();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();
using GdPictureOCR gdpictureOCR = new GdPictureOCR();
// Select the image to process.
int imageID = gdpictureImaging.CreateGdPictureImageFromFile(@"C:\temp\source.png");
// Set the OCR parameters.
gdpictureOCR.SetImage(imageID);
gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
gdpictureOCR.AddLanguage(OCRLanguage.English);
// Run the OCR process.
string resID = gdpictureOCR.RunOCR();
// Get the result of the OCR process as text.
string content = gdpictureOCR.GetOCRResultText(resID);
// Save the result in a PDF document.
gdpicturePDF.CreateFromText(PdfConformance.PDF, 595, 842, 10, 10, 10, 10,
    TextAlignment.TextAlignmentNear, content, 12, "Arial", false, false, true, false);
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
// Release the image resource once it is no longer needed.
gdpictureImaging.ReleaseGdPictureImage(imageID);

' Requires the GdPicture14 namespace (Imports GdPicture14).
Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
    ' Select the image to process.
    Dim imageID As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:\temp\source.png")
    ' Set the OCR parameters.
    gdpictureOCR.SetImage(imageID)
    gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
    gdpictureOCR.AddLanguage(OCRLanguage.English)
    ' Run the OCR process.
    Dim resID As String = gdpictureOCR.RunOCR()
    ' Get the result of the OCR process as text.
    Dim content As String = gdpictureOCR.GetOCRResultText(resID)
    ' Save the result in a PDF document.
    gdpicturePDF.CreateFromText(PdfConformance.PDF, 595, 842, 10, 10, 10, 10,
        TextAlignment.TextAlignmentNear, content, 12, "Arial", False, False, True, False)
    gdpicturePDF.SaveToFile("C:\temp\output.pdf")
    ' Release the image resource once it is no longer needed.
    gdpictureImaging.ReleaseGdPictureImage(imageID)
End Using
End Using
End Using
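The example above runs every step unconditionally, so a missing image or a wrong resource folder only shows up as an empty or absent PDF. The following is a minimal sketch of the same workflow with status checks added. It assumes that the classes live in the GdPicture14 namespace, that each object's GetStat method reports the status of its most recent call, and that GdPictureStatus.OK indicates success. The file paths are the same placeholders used above. The constants PageWidth and PageHeight are names introduced here for illustration. They hold the A4 page size in PDF points, on the assumption that CreateFromText interprets 595 and 842 in points.

```csharp
using System;
using GdPicture14; // Assumed namespace for GdPicture.NET 14.

// A4 is 210 mm x 297 mm, which is about 595 x 842 points (1 point = 1/72 inch).
// Assumption: CreateFromText reads the page size in points.
const float PageWidth = 595;
const float PageHeight = 842;

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();
using GdPictureOCR gdpictureOCR = new GdPictureOCR();

// Load the scanned form and stop early if the file cannot be read.
int imageID = gdpictureImaging.CreateGdPictureImageFromFile(@"C:\temp\source.png");
if (gdpictureImaging.GetStat() != GdPictureStatus.OK)
{
    Console.WriteLine($"Could not load the image: {gdpictureImaging.GetStat()}");
    return;
}

try
{
    // Configure and run the OCR process.
    gdpictureOCR.SetImage(imageID);
    gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
    gdpictureOCR.AddLanguage(OCRLanguage.English);
    string resID = gdpictureOCR.RunOCR();
    if (gdpictureOCR.GetStat() != GdPictureStatus.OK)
    {
        Console.WriteLine($"The OCR process failed: {gdpictureOCR.GetStat()}");
        return;
    }

    // Build the PDF from the recognized text.
    string content = gdpictureOCR.GetOCRResultText(resID);
    gdpicturePDF.CreateFromText(PdfConformance.PDF, PageWidth, PageHeight, 10, 10, 10, 10,
        TextAlignment.TextAlignmentNear, content, 12, "Arial", false, false, true, false);
    if (gdpicturePDF.GetStat() != GdPictureStatus.OK)
    {
        Console.WriteLine($"Could not create the PDF: {gdpicturePDF.GetStat()}");
        return;
    }

    // Write the PDF to disk.
    gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
    if (gdpicturePDF.GetStat() != GdPictureStatus.OK)
    {
        Console.WriteLine($"Could not save the PDF: {gdpicturePDF.GetStat()}");
    }
}
finally
{
    // Release the image even when one of the steps above fails.
    gdpictureImaging.ReleaseGdPictureImage(imageID);
}
```

Placing ReleaseGdPictureImage in a finally block is a design choice of this sketch, not part of the original example. It keeps the image from staying in memory when the code returns early.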