Extract metadata from PDFs on Android

Nutrient comes with DocumentPdfMetadata and DocumentXmpMetadata, which allow you to retrieve or modify a document’s metadata. This guide covers extracting metadata (to modify metadata, see our separate guide for editing metadata).

Dictionary-based metadata

Use DocumentPdfMetadata to work with the dictionary-based metadata in a PDF.

Every value in the dictionary is a PdfValue, which holds one of the following types:

Boolean
long
double
String
List<PdfValue>
Map<String, PdfValue>
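
Because lists and maps can themselves contain further values, a metadata value may be arbitrarily nested. The following is a minimal sketch of walking such a structure recursively. It uses plain Java objects (Boolean, Long, Double, String, List, Map) as stand-ins for the six shapes listed above; MetadataValueSketch and describe are hypothetical names for illustration and are not part of the SDK or of the PdfValue API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the SDK: plain Java objects stand in for PdfValue.
final class MetadataValueSketch {

    /** Renders a value built from the six supported shapes as a compact string. */
    static String describe(Object value) {
        // Scalars: boolean, integer, and floating-point values print as-is.
        if (value instanceof Boolean || value instanceof Long || value instanceof Double) {
            return String.valueOf(value);
        }
        // Strings are quoted so they can be told apart from numbers.
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        // Lists recurse into each element.
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            for (Object item : (List<?>) value) {
                if (sb.length() > 1) sb.append(", ");
                sb.append(describe(item));
            }
            return sb.append("]").toString();
        }
        // Maps recurse into each entry's value.
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (sb.length() > 1) sb.append(", ");
                sb.append(entry.getKey()).append(": ").append(describe(entry.getValue()));
            }
            return sb.append("}").toString();
        }
        throw new IllegalArgumentException("Unsupported value type: " + value);
    }
}
```

With the real API, the same idea would apply to whatever accessors PdfValue provides for checking and unwrapping its type.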

By default, the dictionary metadata may contain the following information keys:

Author
CreationDate
Creator
Keywords
ModDate
Producer
Title

You can also add any supported key-value pair to the metadata. For the predefined keys, it’s recommended to use the DocumentPdfMetadata getters and setters, which convert to and from objects such as Date out of the box.
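
To give a sense of what such a conversion involves: in the PDF file format, CreationDate and ModDate are stored as text strings of the form D:YYYYMMDDHHmmSSOHH'mm', where everything after the year is optional. The sketch below parses that format using only the Java standard library. It isn't the SDK's implementation; PdfDateSketch is a hypothetical name, and treating a missing time zone as UTC is an assumption made here for simplicity.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the SDK.
final class PdfDateSketch {

    // D:YYYY[MM[DD[HH[mm[SS]]]]] followed by an optional Z, +HH'mm', or -HH'mm'.
    private static final Pattern PDF_DATE = Pattern.compile(
        "^D:(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
            + "(?:([Z+-])(?:(\\d{2})'?(?:(\\d{2})'?)?)?)?$");

    static OffsetDateTime parse(String raw) {
        Matcher m = PDF_DATE.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date string: " + raw);
        }
        // Omitted fields fall back to the first month/day and to midnight.
        int year = Integer.parseInt(m.group(1));
        int month = intOr(m.group(2), 1);
        int day = intOr(m.group(3), 1);
        int hour = intOr(m.group(4), 0);
        int minute = intOr(m.group(5), 0);
        int second = intOr(m.group(6), 0);

        // Assumption: a missing offset is treated as UTC.
        ZoneOffset offset = ZoneOffset.UTC;
        String sign = m.group(7);
        if (sign != null && !sign.equals("Z")) {
            int totalMinutes = intOr(m.group(8), 0) * 60 + intOr(m.group(9), 0);
            int signedMinutes = sign.equals("-") ? -totalMinutes : totalMinutes;
            offset = ZoneOffset.ofTotalSeconds(signedMinutes * 60);
        }
        return OffsetDateTime.of(year, month, day, hour, minute, second, 0, offset);
    }

    private static int intOr(String digits, int fallback) {
        return digits == null ? fallback : Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        System.out.println(parse("D:20240131120000+02'00'"));
    }
}
```

Using the typed getters for the predefined keys avoids having to handle this string format yourself.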

To get an entry of the metadata dictionary (e.g. the Author), you can use the following code snippet:

KOTLIN

val document = ...
val pdfMetadata = document.pdfMetadata
val author = pdfMetadata.author

JAVA

PdfDocument document = ...
DocumentPdfMetadata pdfMetadata = document.getPdfMetadata();
String author = pdfMetadata.getAuthor();

For any custom values, use this:

KOTLIN

val document = ...
val pdfMetadata = document.pdfMetadata
val value = pdfMetadata.get("Custom key")

JAVA

PdfDocument document = ...
DocumentPdfMetadata pdfMetadata = document.getPdfMetadata();
PdfValue value = pdfMetadata.get("Custom key");

XMP metadata

Use DocumentXmpMetadata to work with the metadata stream containing XMP data.

Each key in the XMP metadata stream must have a namespace set. You can define your own namespace or use an existing one. Nutrient exposes constants for two common namespaces:

DocumentXmpMetadata#XMP_PDF_NAMESPACE/DocumentXmpMetadata#XMP_PDF_NAMESPACE_PREFIX: the XMP PDF namespace created by Adobe, §3.1
DocumentXmpMetadata#XMP_DC_NAMESPACE/DocumentXmpMetadata#XMP_DC_NAMESPACE_PREFIX: the Dublin Core namespace

When setting a value, you also have to pass along a suggested namespace prefix, as this can’t be generated automatically.
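
The following sketch illustrates how a namespace and its prefix relate in XMP in general: a property is identified by its namespace URI plus key, while the prefix is only shorthand used when the data is serialized. XmpNamespaceSketch and its methods are hypothetical and don't mirror the SDK's API. The two URIs are the published Dublin Core and Adobe PDF namespace URIs; that the SDK constants hold these exact values is an assumption. Letting an existing registration take precedence over a newly suggested prefix is one plausible reading of why the prefix is only "suggested".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model for illustration, not part of the SDK.
final class XmpNamespaceSketch {

    // Published namespace URIs; their conventional prefixes are "dc" and "pdf".
    static final String DC_NAMESPACE = "http://purl.org/dc/elements/1.1/";
    static final String PDF_NAMESPACE = "http://ns.adobe.com/pdf/1.3/";

    // Maps each namespace URI to the prefix used when serializing.
    private final Map<String, String> prefixes = new LinkedHashMap<>();

    /** Registers a suggested prefix and returns the one actually in use for the URI. */
    String register(String namespaceUri, String suggestedPrefix) {
        // An earlier registration for the same URI wins over the new suggestion.
        String existing = prefixes.putIfAbsent(namespaceUri, suggestedPrefix);
        return existing != null ? existing : suggestedPrefix;
    }

    /** Builds the serialized name, such as "dc:title", for a key in a namespace. */
    String qualifiedName(String namespaceUri, String key) {
        String prefix = prefixes.get(namespaceUri);
        if (prefix == null) {
            throw new IllegalStateException("No prefix registered for " + namespaceUri);
        }
        return prefix + ":" + key;
    }
}
```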

Use the following code snippet to get an object from the XMP metadata:

KOTLIN

val xmpMetadata = document.xmpMetadata
val pdfValue = xmpMetadata.get("Key", NAMESPACE)

JAVA

DocumentXmpMetadata xmpMetadata = document.getXmpMetadata();
PdfValue pdfValue = xmpMetadata.get("Key", NAMESPACE);
