Extract invoice data with C# and OCR

The key-value pair (KVP) extraction engine in Nutrient .NET SDK (formerly GdPicture.NET) recognizes related data items in a document, such as a field label and its value, so that you can export them to an external destination like a spreadsheet.

To extract data items from an invoice, follow the steps below:

  1. Create a GdPictureOCR object and a GdPictureImaging object.
  2. Select the invoice by passing its path to the CreateGdPictureImageFromFile method of the GdPictureImaging object.
  3. Configure the OCR process with the GdPictureOCR object in the following way:
    • Set the invoice with the SetImage method.
    • Set the path to the OCR resource folder with the ResourceFolder property. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.
    • With the AddLanguage method, add the language resources that Nutrient .NET SDK uses to recognize text in the image. This method takes a member of the OCRLanguage enumeration.
  4. Run the OCR process with the RunOCR method of the GdPictureOCR object.
  5. Get the number of key-value pairs detected during the OCR process with the GetKeyValuePairCount method of the GdPictureOCR object, and loop through them.
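  6. For each pair, get the key, the value, the data type, and the confidence score with the following methods of the GdPictureOCR object:
    • GetKeyValuePairKeyString
    • GetKeyValuePairValueString
    • GetKeyValuePairDataType
    • GetKeyValuePairConfidence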
  7. Write the output to the console.
  8. Release unnecessary resources.

The example below retrieves key-value pairs from the following invoice.

Sample invoice

Download the sample invoice and run the code below, or check out our demo.

using System;
using GdPicture14;

using GdPictureOCR gdpictureOCR = new GdPictureOCR();
using GdPictureImaging gdpictureImaging = new GdPictureImaging();
// Load the source document.
int imageId = gdpictureImaging.CreateGdPictureImageFromFile(@"C:\temp\source.png");
// Configure the OCR process.
gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
gdpictureOCR.AddLanguage(OCRLanguage.English);
gdpictureOCR.SetImage(imageId);
// Run the OCR process.
string ocrResultId = gdpictureOCR.RunOCR();
// Collect each detected pair with its data type and confidence score.
int pairCount = gdpictureOCR.GetKeyValuePairCount(ocrResultId);
string keyValuePairsData = "";
for (int pairIndex = 0; pairIndex < pairCount; pairIndex++)
{
    keyValuePairsData += $"| Key: {gdpictureOCR.GetKeyValuePairKeyString(ocrResultId, pairIndex)} | " +
        $"Value: {gdpictureOCR.GetKeyValuePairValueString(ocrResultId, pairIndex)} | " +
        $"Document Type: {gdpictureOCR.GetKeyValuePairDataType(ocrResultId, pairIndex)} | " +
        $"Confidence Level: {Math.Round(gdpictureOCR.GetKeyValuePairConfidence(ocrResultId, pairIndex), 1)}% |\n";
}
// Write the output to the console.
Console.WriteLine(keyValuePairsData);
// Release unnecessary resources.
gdpictureImaging.ReleaseGdPictureImage(imageId);
gdpictureOCR.ReleaseOCRResults();

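The example above writes the pairs to the console. As a rough sketch of the spreadsheet export mentioned in the introduction, the same values could be written as CSV text. The snippet does not call the SDK: the `pairs` list holds a few sample rows copied by hand, and the `ToCsv` and `Escape` helpers and the column names are hypothetical, not part of Nutrient .NET SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Sample rows standing in for values read from the GdPictureOCR object (assumption).
var pairs = new List<(string Key, string Value, string DataType, double Confidence)>
{
    ("Billing date", "20/09/2022", "DateTime", 100.0),
    ("Subtotal", "1,220.00€", "Currency", 100.0),
    ("Ref. number", "34751", "Number", 92.9),
};

string csv = ToCsv(pairs);
Console.WriteLine(csv);

// Quote a field when it contains a comma, a quote, or a line break,
// doubling any embedded quotes.
string Escape(string field) =>
    field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0
        ? "\"" + field.Replace("\"", "\"\"") + "\""
        : field;

// Build one header line plus one CSV line per pair.
string ToCsv(IEnumerable<(string Key, string Value, string DataType, double Confidence)> rows)
{
    var lines = new List<string> { "Key,Value,DataType,Confidence" };
    foreach (var r in rows)
    {
        lines.Add(string.Join(",",
            Escape(r.Key),
            Escape(r.Value),
            Escape(r.DataType),
            r.Confidence.ToString("0.#", CultureInfo.InvariantCulture)));
    }
    return string.Join("\n", lines);
}
```

Quoting matters here because invoice amounts such as 1,220.00€ contain a comma that would otherwise split the value across two columns. Passing the resulting string to System.IO.File.WriteAllText produces a file that most spreadsheet applications can open.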

Format the output to obtain the following table:

Key | Value | Document type | Confidence level
Billing date | 20/09/2022 | DateTime | 100%
Order date | 20/09/2022 | DateTime | 100%
Republic of PDF | +100 847 738 227 | PhoneNumber | 77.2%
IBAN | AT13 2060 4236 6111 5994 | IBAN | 100%
Customer | Vandelay Industries Around the Corner 13 NBC City | String | 69.8%
Delivery address | Vandelay Industries Around the Corner 13 NBC City | String | 69.9%
Invoice number | No 00162 | String | 70.9%
Ref. number | 34751 | Number | 92.9%
No | 00162 | Number | 100%
Reference | P00201 | UID | 100%
Quantity total (excl. VAT) | 320.00€ | Currency | 59%
Subtotal | 1,220.00€ | Currency | 100%
Discount (10%) | -122.00€ | Currency | 90.6%
VAT (5.5%) | +6710€ | Currency | 66.9%
Shipping cost | 0.00€ | Currency | 75%
TOTAL | 1,165.10€ | Currency | 100%
Description | Lake Mirror | String | 99.6%
VAT | 5.5% | Percentage | 66.6%
Price per unit (excl. VAT) | 320.00€ | Currency | 80%
Tax No. | AT98765321 | UID | 73.8%
# | [email protected] | EmailAddress | 65.6%
# | www.bruuuk.com | URL | 65.6%
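
One way to produce a table like this is to pad each column to its widest cell. The sketch below is a minimal illustration, not the SDK's own formatting: it runs on a few hand-copied sample rows instead of calling GdPictureOCR, and the `FormatTable` helper is a hypothetical name introduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Sample rows standing in for values read from the GdPictureOCR object (assumption).
var pairs = new List<(string Key, string Value, string DataType, double Confidence)>
{
    ("Billing date", "20/09/2022", "DateTime", 100.0),
    ("Ref. number", "34751", "Number", 92.9),
    ("TOTAL", "1,165.10€", "Currency", 100.0),
};

string table = FormatTable(pairs);
Console.WriteLine(table);

// Build a pipe-delimited text table: one header row, then one row per pair.
string FormatTable(IReadOnlyList<(string Key, string Value, string DataType, double Confidence)> rows)
{
    var grid = new List<string[]>
    {
        new[] { "Key", "Value", "Document type", "Confidence level" },
    };
    foreach (var r in rows)
    {
        grid.Add(new[]
        {
            r.Key,
            r.Value,
            r.DataType,
            // Invariant culture keeps the decimal point regardless of OS locale.
            r.Confidence.ToString("0.#", CultureInfo.InvariantCulture) + "%",
        });
    }

    // Each column is as wide as its longest cell, header included.
    int[] widths = Enumerable.Range(0, 4)
        .Select(c => grid.Max(row => row[c].Length))
        .ToArray();

    var lines = grid.Select(row =>
        string.Join(" | ", row.Select((cell, c) => cell.PadRight(widths[c]))).TrimEnd());
    return string.Join("\n", lines);
}
```

In a real run, the sample list would be filled inside the loop over GetKeyValuePairCount, using the four GetKeyValuePair methods shown in the example earlier in this guide.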