Compare PDF text using JavaScript
Text comparison visually compares the text of pages in different documents. It’s helpful for comparing versions of the same document, or different documents with similar content. It’s particularly useful for documents that have undergone edits, because it lets users spot changes quickly. The comparison is done on a per-page basis, and the differences are highlighted in the user interface (UI).
Comparing documents and text is available when using the Web SDK in standalone operational mode.
Text comparison is possible in Nutrient Web SDK with the corresponding license component. Contact Sales if you’re interested in this functionality.
To process two documents for comparison, pass them to the loadTextComparison method. The method takes a configuration object:
PSPDFKit.loadTextComparison({
  ...defaultConfiguration,
  documentA: "text-comparison/static/documentA.pdf",
  documentB: "text-comparison/static/documentB.pdf"
});
In the configuration object above, set the following properties for the comparison:
- documentA: The path to the first document to compare.
- documentB: The path to the second document to compare.
Default UI
The default UI consists of the following components:
- Primary toolbar, which contains the main actions, such as showing the comparison sidebar, text comparison navigation, scroll-lock, and any other primary toolbar item from the allowed list.
- Secondary toolbars, which contain page navigation, pan mode, zooming, and any other secondary toolbar item from the allowed list.
- Sidebar, which contains the text comparison navigation and the list of pages with differences.
Customizing the UI
Every part of the UI is customizable and can be hidden or shown based on a user’s requirements using the options from PSPDFKit.textComparisonDefaultToolbarItems and PSPDFKit.textComparisonInnerToolbarItems.
To customize the primary toolbar, use the toolbarItems configuration option and pass the items you want to show. The available primary toolbar items are defined in PSPDFKit.textComparisonDefaultToolbarItems. The secondary (inner) toolbar items are defined in PSPDFKit.textComparisonInnerToolbarItems and can be customized using the innerToolbarItems configuration option:
PSPDFKit.loadTextComparison({
  ...restOfConfigurations,
  toolbarItems: [
    { type: "prev-change" },
    { type: "next-change" },
    { type: "comparison-changes" },
    { type: "scroll-lock" }
  ]
});
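When only a few buttons should be hidden, filtering the default list can be easier than writing out every item. The sketch below assumes that PSPDFKit.textComparisonDefaultToolbarItems is an array of objects with a type property, as the example above suggests. withoutToolbarItems is a hypothetical helper, not part of the SDK, and defaultItems is a stand-in so the snippet runs outside the browser.

```javascript
// Hypothetical helper: returns a copy of `items` without the given types.
function withoutToolbarItems(items, hiddenTypes) {
  const hidden = new Set(hiddenTypes);
  return items.filter((item) => !hidden.has(item.type));
}

// Stand-in for PSPDFKit.textComparisonDefaultToolbarItems (assumed shape).
const defaultItems = [
  { type: "prev-change" },
  { type: "next-change" },
  { type: "comparison-changes" },
  { type: "scroll-lock" }
];

// Hide the scroll-lock button and keep everything else in its original order.
const toolbarItems = withoutToolbarItems(defaultItems, ["scroll-lock"]);
console.log(toolbarItems.map((item) => item.type));
```

The resulting array can then be passed as the toolbarItems option to PSPDFKit.loadTextComparison.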
Customizing comparison highlights
The default colors chosen by Nutrient Web SDK provide good contrast when overlaid. However, it’s possible to choose which colors are used to highlight the differences between the two documents. To do so, set the comparisonSidebarConfig.diffColors configuration option in the loadTextComparison method. The diffColors option accepts a DiffColors object with the deletionColor and insertionColor properties:
PSPDFKit.loadTextComparison({
  ...restOfConfigurations,
  comparisonSidebarConfig: {
    diffColors: {
      deletionColor: new PSPDFKit.Color({ r: 255, g: 218, b: 185 }),
      insertionColor: new PSPDFKit.Color({ r: 200, g: 255, b: 200 })
    }
  }
});
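If your design system stores colors as hex strings, a small conversion step can produce the r, g, and b values that PSPDFKit.Color takes in the example above. hexToRgb is a hypothetical helper written for this illustration; it only handles six-digit hex values.

```javascript
// Hypothetical helper: converts "#RRGGBB" into { r, g, b } channel values.
function hexToRgb(hex) {
  const match = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) {
    throw new Error(`Expected a six-digit hex color, got: ${hex}`);
  }
  // match[1..3] hold the red, green, and blue channels as hex pairs.
  const [r, g, b] = match.slice(1).map((channel) => parseInt(channel, 16));
  return { r, g, b };
}

console.log(hexToRgb("#FFDAB9")); // { r: 255, g: 218, b: 185 }
console.log(hexToRgb("#C8FFC8")); // { r: 200, g: 255, b: 200 }
```

The returned object could then be passed to the constructor, for example new PSPDFKit.Color(hexToRgb("#FFDAB9")).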
Programmatic text comparison
Text Comparison can also be used programmatically to compare the text of pages of different documents without loading the UI. To perform a text comparison operation, provide two documents and a set of options. The options are used to configure the comparison operation.
Describing your documents
The PSPDFKit.DocumentDescriptor class provides all the necessary details about your documents for comparison:
- filePath: Path to the document or an ArrayBuffer.
- password: Optional password if the document is encrypted.
- pageIndexes: An array of page indexes, or an array of ranges where each range is an array of the form [min, max]. If omitted, all pages will be staged for comparison.
const originalDocument = new PSPDFKit.DocumentDescriptor({
  filePath: "document-comparison/static/documentA.pdf",
  pageIndexes: [0]
});

const changedDocument = new PSPDFKit.DocumentDescriptor({
  filePath: "document-comparison/static/documentB.pdf",
  pageIndexes: [0]
});
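To reason about which pages a pageIndexes value covers, it can help to expand it into a flat list. The following is a sketch only: expandPageIndexes is a hypothetical helper, and it assumes that [min, max] ranges include both endpoints, which the description above doesn't state explicitly.

```javascript
// Hypothetical helper: expands page indexes and [min, max] ranges into a
// sorted, de-duplicated list. Assumes ranges include both endpoints.
function expandPageIndexes(pageIndexes) {
  const pages = new Set();
  for (const entry of pageIndexes) {
    // A plain number is treated as a one-page range.
    const [min, max] = Array.isArray(entry) ? entry : [entry, entry];
    if (!Number.isInteger(min) || !Number.isInteger(max) || min < 0 || max < min) {
      throw new RangeError(`Invalid page index entry: ${JSON.stringify(entry)}`);
    }
    for (let page = min; page <= max; page++) {
      pages.add(page);
    }
  }
  return [...pages].sort((a, b) => a - b);
}

console.log(expandPageIndexes([3, 0, 3])); // [ 0, 3 ]
console.log(expandPageIndexes([[1, 3]])); // [ 1, 2, 3 ]
```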
Defining the comparison operation
The PSPDFKit.ComparisonOperation class outlines the comparison type and optional settings:
- type: Type of comparison. The default is ComparisonOperationType.TEXT. Use PSPDFKit.ComparisonOperationType to check for available comparison types. As of now, only ComparisonOperationType.TEXT is supported.
- options: The settings for the operation. Currently, only numberOfContextWords, which specifies the number of context words for the comparison, is supported.
const textComparisonOperation = new PSPDFKit.ComparisonOperation(
  PSPDFKit.ComparisonOperationType.TEXT,
  { numberOfContextWords: 2 }
);
Text comparison
The final step is to call the instance#compareDocuments method:
const comparisonResult = await instance.compareDocuments(
  { originalDocument, changedDocument },
  textComparisonOperation
);

console.log(comparisonResult);
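The three steps above can be bundled into one function that prepares the arguments for instance.compareDocuments. This is a minimal sketch: buildTextComparisonArgs is a hypothetical helper, and stubSdk is a stand-in for the PSPDFKit namespace so the snippet runs without the SDK. Only the constructors and options shown earlier in this guide are used.

```javascript
// Hypothetical helper: builds the two arguments for instance.compareDocuments.
// `sdk` is the PSPDFKit namespace, passed in so a stub can replace it.
function buildTextComparisonArgs(sdk, { originalPath, changedPath, pageIndexes, contextWords = 2 }) {
  // Omit pageIndexes entirely when it isn't given, so all pages are staged.
  const describe = (filePath) =>
    new sdk.DocumentDescriptor(pageIndexes ? { filePath, pageIndexes } : { filePath });

  const operation = new sdk.ComparisonOperation(
    sdk.ComparisonOperationType.TEXT,
    { numberOfContextWords: contextWords }
  );

  return [
    { originalDocument: describe(originalPath), changedDocument: describe(changedPath) },
    operation
  ];
}

// Minimal stub standing in for the real SDK classes (illustration only).
const stubSdk = {
  ComparisonOperationType: { TEXT: "text" },
  DocumentDescriptor: class {
    constructor(props) { Object.assign(this, props); }
  },
  ComparisonOperation: class {
    constructor(type, options) { this.type = type; this.options = options; }
  }
};

const [documents, operation] = buildTextComparisonArgs(stubSdk, {
  originalPath: "document-comparison/static/documentA.pdf",
  changedPath: "document-comparison/static/documentB.pdf",
  pageIndexes: [0]
});
console.log(documents.originalDocument.filePath, operation.options);

// With the real SDK, the call would look like:
// const result = await instance.compareDocuments(
//   ...buildTextComparisonArgs(PSPDFKit, { originalPath, changedPath })
// );
```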
Understanding the comparison result
The comparison provides a PSPDFKit.DocumentComparisonResult, which outlines:
- type: The type of comparison (currently, only ComparisonOperationType.TEXT is supported).
- hunks: Hunks of detected text changes.
A hunk groups operations that describe how to transform the original text to the changed text. For instance, if a word is replaced, the hunk will include operations to delete the original word and insert the changed word. The structure of a hunk is:
- originalRange: The range the hunk represents on the original page.
- changedRange: The range the hunk represents on the changed page.
- operations: The operations the hunk contains.
An operation represents a single insertion, a single deletion, or no change between the original and changed text. It’s composed of:
- type: The operation type (“insert”, “delete”, or “equal”).
- text: The text the operation is based upon.
- originalTextBlocks: The rectangles the operation relates to in the original document.
- changedTextBlocks: The rectangles the operation relates to in the changed document.
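Because a hunk describes how to transform the original text into the changed text, both sides can be rebuilt from its operations. The sketch below illustrates that idea with hand-written sample operations. reconstructHunkText is a hypothetical helper, and it assumes that operation texts can be concatenated directly, which may not hold if the SDK reports words without their surrounding whitespace.

```javascript
// Rebuilds both sides of a hunk from its operations.
// "equal" text belongs to both sides, "delete" only to the original,
// and "insert" only to the changed text.
function reconstructHunkText(operations) {
  let original = "";
  let changed = "";
  for (const { type, text } of operations) {
    switch (type) {
      case "equal":
        original += text;
        changed += text;
        break;
      case "delete":
        original += text;
        break;
      case "insert":
        changed += text;
        break;
      default:
        throw new Error(`Unknown operation type: ${type}`);
    }
  }
  return { original, changed };
}

// Hand-written sample: the word "10" is replaced by "12".
const operations = [
  { type: "equal", text: "Total: " },
  { type: "delete", text: "10" },
  { type: "insert", text: "12" },
  { type: "equal", text: " items" }
];

console.log(reconstructHunkText(operations));
// { original: 'Total: 10 items', changed: 'Total: 12 items' }
```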
A text block relates text to a specific region in a document:
- range: The range in the document page the text block relates to.
- rects: The rectangles on the document page the text block refers to.
Example result
The result will be structured similarly to the following:
[
  {
    "documentComparisonResults": [
      {
        "changedPageIndex": 1,
        "comparisonResults": [
          {
            "hunks": [
              {
                "changedRange": { "length": 1, "position": 1 },
                "operations": [
                  {
                    "changedTextBlocks": {
                      "range": { "length": 1, "position": 0 },
                      "rects": [[341.1, 265.2, 0, 0]]
                    },
                    "originalTextBlocks": {
                      "range": { "length": 1, "position": 1 },
                      "rects": [[341.1, 265.2, 74.4, 288.0]]
                    },
                    "text": "1",
                    "type": "delete"
                  }
                ],
                "originalRange": { "length": 1, "position": 1 }
              }
            ],
            "type": "text"
          }
        ],
        "originalPageIndex": 0
      }
    ]
  }
]
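When building a custom UI, a common first step is to flatten this nested structure into one row per change. The following sketch walks a result shaped like the example above and skips “equal” operations. listChanges is a hypothetical helper, the property names are taken from that example, and the sample input is trimmed to the fields the function reads.

```javascript
// Flattens a comparison result into one row per insertion or deletion.
function listChanges(result) {
  const changes = [];
  for (const { documentComparisonResults = [] } of result) {
    for (const page of documentComparisonResults) {
      for (const { hunks = [] } of page.comparisonResults || []) {
        for (const { operations = [] } of hunks) {
          for (const { type, text } of operations) {
            if (type === "equal") continue; // unchanged text isn't a change
            changes.push({
              originalPageIndex: page.originalPageIndex,
              changedPageIndex: page.changedPageIndex,
              type,
              text
            });
          }
        }
      }
    }
  }
  return changes;
}

// Trimmed sample input that follows the shape of the example result.
const sampleResult = [
  {
    documentComparisonResults: [
      {
        originalPageIndex: 0,
        changedPageIndex: 1,
        comparisonResults: [
          { type: "text", hunks: [{ operations: [{ type: "delete", text: "1" }] }] }
        ]
      }
    ]
  }
];

console.log(listChanges(sampleResult));
// [ { originalPageIndex: 0, changedPageIndex: 1, type: 'delete', text: '1' } ]
```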
These steps allow you to pinpoint changes between documents with ease and to build your own custom user interface to display the results, as demonstrated in this sample project. Refer to our public API documentation to read more technical details about the Text Comparison API and learn how to use it in your implementation.