Understanding key-value pair extraction confidence score
PSPDFKit Processor has been deprecated and replaced by Document Engine. To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).
PSPDFKit’s key-value pair (KVP) extraction engine calculates a confidence score that expresses how confident the engine is in the accuracy of the extracted data.
The confidence score is calculated by considering the following factors, among others:
- The confidence in the optical character recognition (OCR) result at the character level. Some characters are more difficult to recognize than others.
- The confidence in the OCR result at the word level. Some words are more difficult to recognize than others.
- The data type of the key. Some data types are more difficult to recognize than others. For example, dates and IBANs are relatively easy to recognize, while phone numbers and addresses are generally more difficult.
The confidence score enables you to filter results based on their assumed accuracy. For example, you can discard extraction results with a low confidence score or flag them for manual review.
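The filtering described above can be sketched in Python as a simple three-way triage: accept high-confidence pairs, send mid-range pairs to manual review, and discard the rest. This is a minimal sketch assuming each extracted pair exposes its confidence as a number between 0 and 1. The `KeyValuePair` class, the `triage` function, the sample values, and both threshold defaults are hypothetical illustrations, not part of the PSPDFKit API or its response format.

```python
from dataclasses import dataclass


# Hypothetical shape of one extracted pair; the real response format may differ.
@dataclass
class KeyValuePair:
    key: str
    value: str
    confidence: float  # assumed to be normalized to the range 0.0 to 1.0


def triage(pairs, accept_threshold=0.9, review_threshold=0.5):
    """Split pairs into accepted, needs-review, and discarded lists.

    The thresholds are illustrative defaults, not recommended values.
    """
    if review_threshold > accept_threshold:
        raise ValueError("review_threshold must not exceed accept_threshold")
    accepted, needs_review, discarded = [], [], []
    for pair in pairs:
        if pair.confidence >= accept_threshold:
            accepted.append(pair)  # trusted as-is
        elif pair.confidence >= review_threshold:
            needs_review.append(pair)  # flagged for a manual check
        else:
            discarded.append(pair)  # too uncertain to use
    return accepted, needs_review, discarded


# Illustrative sample data (hypothetical confidence values).
sample = [
    KeyValuePair("IBAN", "DE89 3704 0044 0532 0130 00", 0.97),
    KeyValuePair("Phone", "+49 30 1234567", 0.62),
    KeyValuePair("Address", "1 Example Street", 0.31),
]
accepted, needs_review, discarded = triage(sample)
```

Using two thresholds rather than one keeps borderline results, such as the phone number here, from being silently dropped; the right cut-off points depend on your documents and on how costly a wrong value is compared to a manual check.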