C# OCR Invoices to Text
This guide explains how to convert scanned invoices to searchable PDFs. GdPicture.NET’s optical character recognition (OCR) engine allows you to recognize text in an invoice and then save the text in a PDF.
Converting Invoices to Searchable PDFs
To convert an invoice to a searchable PDF, follow these steps:
1. Create a GdPicturePDF object, a GdPictureImaging object, and a GdPictureOCR object.
2. Load the scanned image of the invoice by passing its path to the CreateGdPictureImageFromFile method of the GdPictureImaging object. The method returns an image ID that identifies the image in later calls.
3. Configure the OCR process with the GdPictureOCR object:
   - Pass the image ID to the SetImage method.
   - Set the path to the OCR resource folder with the ResourceFolder property. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.
   - Add the language resources that GdPicture.NET uses to recognize text in the image with the AddLanguage method. This method takes a member of the OCRLanguage enumeration, for example OCRLanguage.English.
4. Run the OCR process with the RunOCR method of the GdPictureOCR object. The method returns the ID of the OCR result.
5. Get the recognized text by passing the result ID to the GetOCRResultText method of the GdPictureOCR object.
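6. Create the output document with the CreateFromText method of the GdPicturePDF object. The first parameter sets the conformance level of the PDF document and is a member of the PdfConformance enumeration. For example, use PdfConformance.PDF to create a standard PDF document.
7. Save the output to a file with the SaveToFile method of the GdPicturePDF object.
8. Release the image with the ReleaseGdPictureImage method of the GdPictureImaging object.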
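After the conformance level, CreateFromText appears to take the page width and height. The sketch below shows how A4 dimensions can be derived for those arguments. It assumes the width and height are expressed in PostScript points (1/72 inch); confirm the unit in the CreateFromText reference before relying on it. The helper name MillimetersToPoints is hypothetical and not part of GdPicture.NET.

```csharp
using System;

// Hypothetical helper: convert a length in millimetres to PostScript points.
// 1 inch = 25.4 mm and 1 inch = 72 pt.
static double MillimetersToPoints(double mm) => mm / 25.4 * 72.0;

// ISO 216 A4 paper is 210 mm x 297 mm.
int pageWidth = (int)Math.Round(MillimetersToPoints(210));   // 595
int pageHeight = (int)Math.Round(MillimetersToPoints(297));  // 842

Console.WriteLine($"A4 page size: {pageWidth} x {pageHeight} pt");
```

Rounded to whole points, A4 comes out as 595 x 842. For US Letter (8.5 x 11 inches), the same arithmetic gives 612 x 792.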
The example below converts an invoice to a searchable PDF:
```csharp
using GdPicture14;

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();
using GdPictureOCR gdpictureOCR = new GdPictureOCR();

// Select the image to process.
int imageID = gdpictureImaging.CreateGdPictureImageFromFile(@"C:\temp\source.png");

// Set the OCR parameters.
gdpictureOCR.SetImage(imageID);
gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
gdpictureOCR.AddLanguage(OCRLanguage.English);

// Run the OCR process.
string resID = gdpictureOCR.RunOCR();

// Get the result of the OCR process as text.
string content = gdpictureOCR.GetOCRResultText(resID);

// Save the result in a PDF document.
gdpicturePDF.CreateFromText(PdfConformance.PDF, 595, 842, 10, 10, 10, 10, TextAlignment.TextAlignmentNear, content, 12, "Arial", false, false, true, false);
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");

// Release the image.
gdpictureImaging.ReleaseGdPictureImage(imageID);
```
The same example in Visual Basic .NET:

```vb
Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
        Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
            ' Select the image to process.
            Dim imageID As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:\temp\source.png")

            ' Set the OCR parameters.
            gdpictureOCR.SetImage(imageID)
            gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
            gdpictureOCR.AddLanguage(OCRLanguage.English)

            ' Run the OCR process.
            Dim resID As String = gdpictureOCR.RunOCR()

            ' Get the result of the OCR process as text.
            Dim content As String = gdpictureOCR.GetOCRResultText(resID)

            ' Save the result in a PDF document.
            gdpicturePDF.CreateFromText(PdfConformance.PDF, 595, 842, 10, 10, 10, 10, TextAlignment.TextAlignmentNear, content, 12, "Arial", False, False, True, False)
            gdpicturePDF.SaveToFile("C:\temp\output.pdf")

            ' Release the image.
            gdpictureImaging.ReleaseGdPictureImage(imageID)
        End Using
    End Using
End Using
```
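The example above processes one invoice and does not check whether each call succeeded. The following is a minimal sketch that converts every PNG invoice in a folder and stops a conversion at the first failed step. It rests on several assumptions that should be verified against the GdPicture.NET API reference: the GdPicture14 namespace of GdPicture.NET 14; that SetImage, AddLanguage, CreateFromText, and SaveToFile return a GdPictureStatus; and that GetStat on GdPictureImaging and GdPictureOCR reports the status of the previous call. The folder paths and the ConvertInvoice and Check helpers are hypothetical, and any license registration your setup requires is not shown.

```csharp
using System;
using System.IO;
using GdPicture14;

// Hypothetical folders: adjust to your environment.
const string inputFolder = @"C:\temp\invoices";
const string outputFolder = @"C:\temp\searchable";
const string ocrResources = @"C:\GdPicture.NET 14\Redist\OCR";

Directory.CreateDirectory(outputFolder);

foreach (string imagePath in Directory.GetFiles(inputFolder, "*.png"))
{
    string outputPath = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(imagePath) + ".pdf");
    try
    {
        ConvertInvoice(imagePath, outputPath, ocrResources);
        Console.WriteLine($"Converted: {outputPath}");
    }
    catch (InvalidOperationException ex)
    {
        // Report the failure and continue with the next invoice.
        Console.WriteLine($"Failed: {imagePath}: {ex.Message}");
    }
}

// Hypothetical helper: OCR one invoice image and save the text as a PDF.
static void ConvertInvoice(string imagePath, string outputPath, string resources)
{
    // Fresh objects per invoice, so a failed conversion leaves no state behind.
    using GdPicturePDF gdpicturePDF = new GdPicturePDF();
    using GdPictureImaging gdpictureImaging = new GdPictureImaging();
    using GdPictureOCR gdpictureOCR = new GdPictureOCR();

    int imageID = gdpictureImaging.CreateGdPictureImageFromFile(imagePath);
    Check(gdpictureImaging.GetStat(), "load image");
    try
    {
        Check(gdpictureOCR.SetImage(imageID), "set image");
        gdpictureOCR.ResourceFolder = resources;
        Check(gdpictureOCR.AddLanguage(OCRLanguage.English), "add language");

        string resID = gdpictureOCR.RunOCR();
        Check(gdpictureOCR.GetStat(), "run OCR");

        string content = gdpictureOCR.GetOCRResultText(resID);
        Check(gdpictureOCR.GetStat(), "read OCR text");

        Check(gdpicturePDF.CreateFromText(PdfConformance.PDF, 595, 842, 10, 10, 10, 10,
            TextAlignment.TextAlignmentNear, content, 12, "Arial", false, false, true, false), "create PDF");
        Check(gdpicturePDF.SaveToFile(outputPath), "save PDF");
    }
    finally
    {
        // Release the image even if a later step fails.
        gdpictureImaging.ReleaseGdPictureImage(imageID);
    }
}

// Hypothetical helper: turn a non-OK status into an exception naming the step.
static void Check(GdPictureStatus status, string step)
{
    if (status != GdPictureStatus.OK)
        throw new InvalidOperationException($"{step} failed with status {status}");
}
```

Creating new GdPicture objects for each invoice keeps conversions independent. Reusing a single GdPictureOCR instance across invoices may be faster, but that variant is not shown here.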