C# OCR Invoices to Text

This guide explains how to convert scanned invoices to searchable PDFs. GdPicture.NET’s optical character recognition (OCR) engine allows you to recognize text in an invoice and then save the text in a PDF.

Converting Invoices to Searchable PDFs

To convert an invoice to a searchable PDF, follow these steps:

  1. Create a GdPicturePDF object, a GdPictureImaging object, and a GdPictureOCR object.

  2. Select the scanned image of an invoice by passing its path to the CreateGdPictureImageFromFile method of the GdPictureImaging object.

  3. Configure the OCR process with the GdPictureOCR object in the following way:

    • Set the image with the SetImage method.

    • Set the path to the OCR resource folder with the ResourceFolder property. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.

    • With the AddLanguage method, add the language resources that GdPicture.NET uses to recognize text in the image. This method takes a member of the OCRLanguage enumeration.

  4. Run the OCR process with the RunOCR method of the GdPictureOCR object.

  5. Get the result of the OCR process as text with the GetOCRResultText method of the GdPictureOCR object.

  6. Create the output with the CreateFromText method of the GdPicturePDF object. The first parameter sets the conformance level of the PDF document and is a member of the PdfConformance enumeration. For example, use PdfConformance.PDF to create a common PDF document.

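  7. Save the output to a PDF file with the SaveToFile method of the GdPicturePDF object, and then release the image with the ReleaseGdPictureImage method of the GdPictureImaging object.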

The example below converts an invoice to a searchable PDF:

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();
using GdPictureOCR gdpictureOCR = new GdPictureOCR();
// Select the image to process.
int imageID = gdpictureImaging.CreateGdPictureImageFromFile(@"C:\temp\source.png");
// Set the OCR parameters.
gdpictureOCR.SetImage(imageID);
gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
gdpictureOCR.AddLanguage(OCRLanguage.English);
// Run the OCR process.
string resID = gdpictureOCR.RunOCR();
// Get the result of the OCR process as text.
string content = gdpictureOCR.GetOCRResultText(resID);
// Save the result in a PDF document.
gdpicturePDF.CreateFromText(PdfConformance.PDF, 595, 842, 10, 10, 10, 10,
    TextAlignment.TextAlignmentNear, content, 12, "Arial", false, false, true, false);
gdpicturePDF.SaveToFile(@"C:\temp\output.pdf");
gdpictureImaging.ReleaseGdPictureImage(imageID);

The same example in VB.NET:

Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
    ' Select the image to process.
    Dim imageID As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:\temp\source.png")
    ' Set the OCR parameters.
    gdpictureOCR.SetImage(imageID)
    gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
    gdpictureOCR.AddLanguage(OCRLanguage.English)
    ' Run the OCR process.
    Dim resID As String = gdpictureOCR.RunOCR()
    ' Get the result of the OCR process as text.
    Dim content As String = gdpictureOCR.GetOCRResultText(resID)
    ' Save the result in a PDF document.
    gdpicturePDF.CreateFromText(PdfConformance.PDF, 595, 842, 10, 10, 10, 10,
        TextAlignment.TextAlignmentNear, content, 12, "Arial", False, False, True, False)
    gdpicturePDF.SaveToFile("C:\temp\output.pdf")
    gdpictureImaging.ReleaseGdPictureImage(imageID)
End Using
End Using
End Using
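
The values 595 and 842 passed to CreateFromText in the examples above match an A4 page expressed in PDF points (1/72 inch). The C# sketch below shows that arithmetic. It assumes those two arguments are the page width and height in points; the PageSize class and the MillimetersToPoints helper are hypothetical names used only for illustration and are not part of GdPicture.NET.

```csharp
using System;

public static class PageSize
{
    // Hypothetical helper: converts millimeters to PDF points.
    // One inch is 25.4 mm and one inch is 72 points.
    public static float MillimetersToPoints(double millimeters)
    {
        return (float)Math.Round(millimeters / 25.4 * 72.0);
    }

    public static void Main()
    {
        float width = MillimetersToPoints(210);   // A4 width in mm
        float height = MillimetersToPoints(297);  // A4 height in mm
        Console.WriteLine($"{width} x {height}"); // prints "595 x 842"
    }
}
```

For a different paper size, the same conversion applies. For example, US Letter (215.9 mm by 279.4 mm) works out to 612 by 792 points.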