Extract the text position from PDFs on iOS

Nutrient’s TextParser API exposes helpers and data structures for working with text. These include the location of a given text element at three levels of granularity: glyph, word, and text block.

To get a general overview of the available text APIs, check out the parsing guide.

All main Nutrient text classes expose a frame property that can be used to query the location of a given text element on a PDF page.

Property          Description
Glyph.frame       Location of a single character (glyph, quad) on the PDF page.
Word.frame        Location of a single word (multiple glyphs) on the PDF page.
TextBlock.frame   Location of a text block (e.g. a column of text) on the PDF page.

These properties return frames in the normalized PDF coordinate space. To learn more about coordinate spaces and how to convert between them, see the Coordinate Space Conversions guide.
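To make the idea of a coordinate conversion concrete, here is a minimal, self-contained sketch of mapping a frame from PDF space to a top-left-origin view space. It assumes the normalized PDF coordinate space has its origin at the bottom-left of the page with y growing upward, and that the page is drawn unrotated at a uniform scale. The function `viewRect(forPDFRect:pageSize:scale:)` and the sample values are hypothetical and are not part of the Nutrient SDK. In an app, the conversion utilities described in the Coordinate Space Conversions guide are likely the better choice, since they can account for page rotation and view layout.

```swift
import Foundation

// Hypothetical helper, not part of the Nutrient SDK.
// Assumes `pdfRect` uses a bottom-left origin with y growing upward, and that
// the page is drawn unrotated into a view whose origin is the top-left corner.
func viewRect(forPDFRect pdfRect: CGRect, pageSize: CGSize, scale: CGFloat) -> CGRect {
    // Flip the y-axis: the top edge of the rect in PDF space becomes
    // its origin in view space.
    let flippedY = pageSize.height - pdfRect.maxY
    return CGRect(
        x: pdfRect.minX * scale,
        y: flippedY * scale,
        width: pdfRect.width * scale,
        height: pdfRect.height * scale
    )
}

// Made-up example values: a word frame on a 612 x 792 pt page shown at 2x zoom.
let wordFrame = CGRect(x: 72, y: 700, width: 100, height: 20)
let letterPage = CGSize(width: 612, height: 792)
let onScreen = viewRect(forPDFRect: wordFrame, pageSize: letterPage, scale: 2)
print(onScreen)
```

The only non-obvious step is the vertical flip: a frame near the top of the page has a large y value in PDF space but a small one in view space.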

The following example prints the position of every word on the first page of a document:

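import PSPDFKit

let document = ...

// Page indices are zero-based, so 0 is the first page.
// The parser is optional, so exit early if it isn't available.
guard let parser = document.textParserForPage(at: 0) else {
    print("Parsing failed.")
    return
}

// Each word exposes its text and its frame in normalized PDF coordinates.
for word in parser.words {
    print("The location of \(word.stringValue) is \(word.frame)")
}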
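Once word frames are available, a common follow-up is to select the words inside a region of the page and compute one rectangle that encloses them. The sketch below shows one way to do that with plain `CGRect` operations. So that it runs on its own, it uses `WordBox`, a hypothetical stand-in that models only the `stringValue` and `frame` properties used in the example above. The helper functions and sample data are also assumptions made for illustration.

```swift
import Foundation

// Hypothetical stand-in for the SDK's word type, so this sketch is self-contained.
// It models only the two properties used in the example above.
struct WordBox {
    let stringValue: String
    let frame: CGRect
}

// Returns the words whose frames intersect `region`.
func wordsIntersecting(_ region: CGRect, in candidates: [WordBox]) -> [WordBox] {
    candidates.filter { $0.frame.intersects(region) }
}

// Returns the smallest rect enclosing all given words, or nil for an empty list.
func boundingBox(of words: [WordBox]) -> CGRect? {
    guard let first = words.first else { return nil }
    return words.dropFirst().reduce(first.frame) { $0.union($1.frame) }
}

// Made-up sample data in PDF points (bottom-left origin assumed).
let sample = [
    WordBox(stringValue: "Invoice", frame: CGRect(x: 72, y: 700, width: 60, height: 14)),
    WordBox(stringValue: "Total", frame: CGRect(x: 72, y: 100, width: 40, height: 14)),
    WordBox(stringValue: "42.00", frame: CGRect(x: 120, y: 100, width: 45, height: 14)),
]

// Select everything in the bottom 200 points of the page.
let footer = CGRect(x: 0, y: 0, width: 612, height: 200)
let footerWords = wordsIntersecting(footer, in: sample)
let footerBox = boundingBox(of: footerWords)
print(footerWords.map { $0.stringValue }, footerBox as Any)
```

With real parser output, the same two helpers could take values mapped from `parser.words`, and the resulting rectangle stays in the same coordinate space as the input frames.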