Extract metadata from PDFs on Android
Nutrient comes with DocumentPdfMetadata and DocumentXmpMetadata, which allow you to retrieve or modify a document’s metadata. This guide covers extracting metadata (to modify metadata, see our separate guide for editing metadata).
Dictionary-based metadata
Use DocumentPdfMetadata to work with the dictionary-based metadata in a PDF.
Metadata values are represented as PdfValue objects, each holding one of the following types:
Boolean
long
double
String
List<PdfValue>
Map<String, PdfValue>
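Because lists and maps can themselves contain further values, a metadata value is effectively a small tree. As a rough illustration of walking such a tree, here is a minimal Java sketch. It does not use the Nutrient API: the accessor methods of PdfValue aren't shown in this guide, so the sketch models the six kinds of values with plain Java objects, and the class MetadataValueSketch and its describe method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: plain Java objects instead of the SDK's PdfValue.
final class MetadataValueSketch {

    // Renders a value of one of the six supported kinds as text,
    // recursing into lists and maps.
    static String describe(Object value) {
        if (value instanceof Boolean || value instanceof Long || value instanceof Double) {
            return String.valueOf(value);
        }
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        if (value instanceof List) {
            List<String> parts = new ArrayList<>();
            for (Object item : (List<?>) value) {
                parts.add(describe(item));
            }
            return "[" + String.join(", ", parts) + "]";
        }
        if (value instanceof Map) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                parts.add(entry.getKey() + ": " + describe(entry.getValue()));
            }
            return "{" + String.join(", ", parts) + "}";
        }
        // Anything else (including null) is outside the six supported kinds.
        throw new IllegalArgumentException("Unsupported value: " + value);
    }

    public static void main(String[] args) {
        Map<String, Object> custom = new LinkedHashMap<>();
        custom.put("Reviewed", true);
        custom.put("Revision", 3L);
        custom.put("Tags", Arrays.asList("draft", "internal"));
        System.out.println(describe(custom));
    }
}
```

With the real SDK, the same recursive shape would apply, but each branch would check the type reported by PdfValue rather than using instanceof.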
By default, the dictionary metadata may contain the following information keys:
Author
CreationDate
Creator
Keywords
ModDate
Producer
Title
You can, of course, add any supported key-value dictionary to the metadata. When dealing with these predefined keys, it’s recommended to use the DocumentPdfMetadata getters and setters so that you get out-of-the-box conversions from objects such as Date.
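To show what such a conversion involves: the PDF format stores CreationDate and ModDate as strings of the form D:YYYYMMDDHHmmSSOHH'mm', where everything after the year is optional. The following Java sketch parses such a string by hand. It does not use the Nutrient API; the class PdfDateSketch and its parse method are hypothetical names, and treating a missing time zone as UTC is an assumption of this sketch rather than something the format guarantees.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: hand-rolled parsing of a raw PDF date string.
final class PdfDateSketch {

    // Groups: 1 year, 2 month, 3 day, 4 hour, 5 minute, 6 second,
    // 7 offset sign (or Z), 8 offset hours, 9 offset minutes.
    private static final Pattern PDF_DATE = Pattern.compile(
            "(?:D:)?(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
            + "(?:([Zz+-])(?:(\\d{2})'?(?:(\\d{2})'?)?)?)?");

    static OffsetDateTime parse(String raw) {
        Matcher m = PDF_DATE.matcher(raw.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date string: " + raw);
        }
        // Missing month/day default to 1; missing time fields default to 0.
        int year = Integer.parseInt(m.group(1));
        int month = intOr(m.group(2), 1);
        int day = intOr(m.group(3), 1);
        int hour = intOr(m.group(4), 0);
        int minute = intOr(m.group(5), 0);
        int second = intOr(m.group(6), 0);

        // Assumption of this sketch: no offset means UTC.
        ZoneOffset offset = ZoneOffset.UTC;
        String sign = m.group(7);
        if (sign != null && !sign.equalsIgnoreCase("Z")) {
            int totalMinutes = intOr(m.group(8), 0) * 60 + intOr(m.group(9), 0);
            int signedMinutes = "-".equals(sign) ? -totalMinutes : totalMinutes;
            offset = ZoneOffset.ofTotalSeconds(signedMinutes * 60);
        }
        return OffsetDateTime.of(year, month, day, hour, minute, second, 0, offset);
    }

    private static int intOr(String digits, int fallback) {
        return digits == null ? fallback : Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        System.out.println(parse("D:20240115103000+01'00'"));
    }
}
```

Using the typed getters for the predefined keys avoids maintaining this kind of parsing yourself.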
To get an entry of the metadata dictionary (e.g. the Author), you can use the following code snippet:
// Kotlin
val document = ...
val pdfMetadata = document.getPdfMetadata()
val author = pdfMetadata.getAuthor()

// Java
PdfDocument document = ...
DocumentPdfMetadata pdfMetadata = document.getPdfMetadata();
String author = pdfMetadata.getAuthor();
For any custom values, use this:
// Kotlin
val document = ...
val pdfMetadata = document.pdfMetadata
val value = pdfMetadata.get("Custom key")

// Java
PdfDocument document = ...
DocumentPdfMetadata pdfMetadata = document.getPdfMetadata();
PdfValue value = pdfMetadata.get("Custom key");
XMP metadata
Use DocumentXmpMetadata to work with the metadata stream containing XMP data.
Each key in the XMP metadata stream must have a namespace. You can define your own namespace or use an existing one. Nutrient exposes constants for two common namespaces:
DocumentPdfMetadata#XMP_PDF_NAMESPACE / DocumentPdfMetadata#XMP_PDF_NAMESPACE_PREFIX: the XMP PDF namespace created by Adobe, §3.1
DocumentXmpMetadata#XMP_DC_NAMESPACE / DocumentXmpMetadata#XMP_DC_NAMESPACE_PREFIX: the Dublin Core namespace
When setting a value, you also have to pass along a suggested namespace prefix, as this can’t be generated automatically.
Use the following code snippet to get an object from the XMP metadata:
// Kotlin
val xmpMetadata = document.xmpMetadata
val pdfValue = xmpMetadata.get("Key", NAMESPACE)

// Java
DocumentXmpMetadata xmpMetadata = document.getXmpMetadata();
PdfValue pdfValue = xmpMetadata.get("Key", NAMESPACE);
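To illustrate why a key is always paired with a namespace, and why the prefix is only a suggestion, the following Java sketch reads a property from a simplified XMP fragment using the JDK's namespace-aware XML parser. It does not use the Nutrient API. The class XmpNamespaceSketch and its methods are hypothetical names, the fragment omits the packet wrapper a real XMP stream has, and the two URIs are the standard Adobe PDF and Dublin Core namespace URIs; that the SDK constants hold the same values is an assumption, since this guide doesn't list them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: looks up XMP properties by namespace URI and local name.
final class XmpNamespaceSketch {
    static final String PDF_NS = "http://ns.adobe.com/pdf/1.3/";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Builds a simplified XMP fragment whose Producer property uses the given prefix.
    static String samplePacket(String prefix) {
        return "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
                + "<rdf:Description rdf:about='' xmlns:" + prefix + "='" + PDF_NS + "'>"
                + "<" + prefix + ":Producer>Example Producer</" + prefix + ":Producer>"
                + "</rdf:Description></rdf:RDF>";
    }

    // Returns the text of the first element matching (namespaceUri, localName), or null.
    static String firstValue(String xml, String namespaceUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for URI-based lookups
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(namespaceUri, localName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMP fragment", e);
        }
    }

    public static void main(String[] args) {
        // Different prefixes, same namespace URI: both lookups find the value.
        System.out.println(firstValue(samplePacket("pdf"), PDF_NS, "Producer"));
        System.out.println(firstValue(samplePacket("p"), PDF_NS, "Producer"));
        // Same local name under a different namespace is a different key.
        System.out.println(firstValue(samplePacket("pdf"), DC_NS, "Producer"));
    }
}
```

The lookup succeeds regardless of the prefix because the namespace URI, not the prefix, identifies the key. The prefix only affects how the property is written out, which is why it has to be supplied when setting a value.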