Extract metadata from PDFs on Android

Nutrient provides DocumentPdfMetadata and DocumentXmpMetadata, which let you retrieve or modify a document’s metadata. This guide covers extracting metadata; to modify it, see the separate guide on editing metadata.

Dictionary-based metadata

Use DocumentPdfMetadata to work with the dictionary-based metadata in a PDF.

Every metadata value is wrapped in a PdfValue, which holds one of the following types:

  • Boolean
  • long
  • double
  • String
  • List<PdfValue>
  • Map<String, PdfValue>
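Because List<PdfValue> and Map<String, PdfValue> can nest further PdfValues, a metadata value is a small tree. The sketch below models that shape with a hypothetical MetaValue type (it is not the SDK’s PdfValue class) and shows a recursive unwrap into plain Kotlin values.

```kotlin
// Hypothetical model mirroring the six PdfValue types listed above.
// This is NOT the SDK's PdfValue; it only illustrates the shape of the data.
sealed interface MetaValue {
    data class BooleanValue(val value: Boolean) : MetaValue
    data class LongValue(val value: Long) : MetaValue
    data class DoubleValue(val value: Double) : MetaValue
    data class StringValue(val value: String) : MetaValue
    data class ListValue(val items: List<MetaValue>) : MetaValue
    data class MapValue(val entries: Map<String, MetaValue>) : MetaValue
}

// Recursively unwraps a value into Boolean, Long, Double, String,
// List<Any>, or Map<String, Any>, descending into lists and maps.
fun MetaValue.unwrap(): Any = when (this) {
    is MetaValue.BooleanValue -> value
    is MetaValue.LongValue -> value
    is MetaValue.DoubleValue -> value
    is MetaValue.StringValue -> value
    is MetaValue.ListValue -> items.map { it.unwrap() }
    is MetaValue.MapValue -> entries.mapValues { it.value.unwrap() }
}
```

The exhaustive when over the sealed interface means the compiler flags any type you forget to handle.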

The metadata dictionary can contain the following predefined keys:

  • Author
  • CreationDate
  • Creator
  • Keywords
  • ModDate
  • Producer
  • Title

You can also add your own key-value pairs to the metadata, as long as the values use the supported types. For the predefined keys, we recommend the DocumentPdfMetadata getters and setters, because they convert values to and from types such as Date for you.
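For comparison, CreationDate and ModDate are stored as PDF date strings of the form D:YYYYMMDDHHmmSSOHH'mm' (ISO 32000-1, section 7.9.4). The following is a minimal sketch of parsing that format by hand with java.time, which is the work the getters spare you. parsePdfDate is a hypothetical helper, not part of the Nutrient API, and treating a missing offset as UTC is an assumption (the spec leaves it unknown). On Android, java.time needs API level 26 or core library desugaring.

```kotlin
import java.time.OffsetDateTime
import java.time.ZoneOffset

// D:YYYYMMDDHHmmSS followed by an optional Z or +HH'mm' / -HH'mm' offset.
// Every field after the year is optional.
private val PDF_DATE = Regex(
    """D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:(Z)|([+-])(\d{2})'?(?:(\d{2})'?)?)?"""
)

// Hypothetical helper: parses a raw PDF date string, or returns null if it is malformed.
// Missing fields default to their earliest value (month and day 1, time 00:00:00).
fun parsePdfDate(raw: String): OffsetDateTime? {
    val g = PDF_DATE.matchEntire(raw)?.groupValues ?: return null
    // Unmatched optional groups are empty strings, so toIntOrNull() falls back to the default.
    fun field(index: Int, default: Int) = g[index].toIntOrNull() ?: default
    return runCatching {
        val offset = if (g[8].isEmpty()) {
            ZoneOffset.UTC // "Z", or no offset at all (assumed UTC)
        } else {
            val sign = if (g[8] == "-") -1 else 1
            ZoneOffset.ofHoursMinutes(sign * g[9].toInt(), sign * field(10, 0))
        }
        OffsetDateTime.of(
            g[1].toInt(), field(2, 1), field(3, 1),
            field(4, 0), field(5, 0), field(6, 0), 0, offset
        )
    }.getOrNull() // out-of-range fields (e.g. month 13) make java.time throw
}
```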

To read a predefined entry of the metadata dictionary, such as the Author, use its getter:

// `document` is the PDF document you've already loaded.
val document = ...
val pdfMetadata = document.pdfMetadata
val author = pdfMetadata.author

To read a custom entry, pass its key to get():

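// `document` is the PDF document you've already loaded.
val document = ...
val pdfMetadata = document.pdfMetadata
// Returns the PdfValue stored under the custom key.
val value = pdfMetadata.get("Custom key")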
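Custom keys live in the same dictionary as the predefined ones. If you already have the entries as a map (how you obtain such a map is left open here and is an assumption), the following sketch separates your custom entries from the predefined keys this guide lists. customEntries and PREDEFINED_KEYS are hypothetical names, not SDK API.

```kotlin
// The predefined information keys listed in this guide (hypothetical constant).
val PREDEFINED_KEYS = setOf(
    "Author", "CreationDate", "Creator", "Keywords", "ModDate", "Producer", "Title"
)

// Hypothetical helper: keeps only the entries whose keys aren't predefined.
// Generic over the value type, so it works for raw strings or wrapped values alike.
fun <V> customEntries(all: Map<String, V>): Map<String, V> =
    all.filterKeys { it !in PREDEFINED_KEYS }
```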

XMP metadata

Use DocumentXmpMetadata to work with the metadata stream containing XMP data.

Each key in the XMP metadata stream must have a namespace. You can define your own namespace or use an existing one; PSPDFKit exposes constants for two common namespaces.
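In XMP, a property is identified by its namespace URI together with its name, so the same name in two namespaces refers to two different properties. The sketch below models that with a hypothetical XmpKey/XmpStore pair (not SDK types), storing values as plain strings for simplicity. The Dublin Core URI is the standard one; the MY_APP namespace is a made-up example.

```kotlin
// Hypothetical model: an XMP property is keyed by (namespace URI, name), not by name alone.
data class XmpKey(val namespace: String, val name: String)

class XmpStore {
    private val values = mutableMapOf<XmpKey, String>()
    operator fun set(key: XmpKey, value: String) { values[key] = value }
    operator fun get(key: XmpKey): String? = values[key]
}

// Standard Dublin Core namespace URI.
val DUBLIN_CORE = "http://purl.org/dc/elements/1.1/"
// Hypothetical custom namespace for your own keys.
val MY_APP = "http://example.com/ns/myapp/1.0/"
```

Usage: storing a value under XmpKey(DUBLIN_CORE, "format") and another under XmpKey(MY_APP, "format") keeps both, because the keys differ in namespace.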

When setting a value, you also have to pass a suggested namespace prefix, because the prefix can’t be generated automatically.
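The prefix matters because XMP is serialized as XML: the namespace URI is bound to a short prefix through an xmlns attribute, and the property element is written as prefix:name. The following is a minimal sketch of that serialization for one simple text property, using a hypothetical xmpProperty helper. Real XMP packets also wrap properties in rdf:RDF and rdf:Description elements, which are omitted here.

```kotlin
// Escapes characters that are significant in XML text and attribute values.
// "&" is replaced first so the other entities aren't double-escaped.
fun escapeXml(text: String): String = text
    .replace("&", "&amp;")
    .replace("<", "&lt;")
    .replace(">", "&gt;")
    .replace("\"", "&quot;")

// Hypothetical helper: writes one property element, binding `prefix` to `namespace`
// on the element itself so the snippet is well-formed on its own.
fun xmpProperty(namespace: String, prefix: String, name: String, value: String): String =
    """<$prefix:$name xmlns:$prefix="${escapeXml(namespace)}">${escapeXml(value)}</$prefix:$name>"""
```

The prefix is only a local alias; a reader resolves it back to the namespace URI, which is why the URI (not the prefix) is what you pass when reading a value.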

Use the following code snippet to get an object from the XMP metadata:

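// `document` is the PDF document you've already loaded.
val xmpMetadata = document.xmpMetadata
// NAMESPACE is the namespace URI the key belongs to: one of the predefined constants or your own.
val pdfValue = xmpMetadata.get("Key", NAMESPACE)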