Extract the text position from PDFs on iOS
Nutrient’s `TextParser` API exposes various helpers and data structures for working with text. These include information about the location of a given text element at different granularities: at the glyph, word, or text block level.
To get a general overview of the available text APIs, check out the parsing guide.
All main Nutrient text classes expose a `frame` property that can be used to query the location of a given text element on a PDF page.
| Property | Description |
|---|---|
| `Glyph.frame` | Location of a single character (glyph, quad) on the PDF page. |
| `Word.frame` | Location of a single word (multiple glyphs) on the PDF page. |
| `TextBlock.frame` | Location of a text block (e.g. a column of text) on the PDF page. |
These properties return frames expressed in normalized PDF coordinates. To learn more about coordinate spaces and how to convert between them, see the Coordinate Space Conversions guide.
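To give a rough idea of what such a conversion involves, the sketch below maps a rectangle from a bottom-left-origin page space into a top-left-origin view space. The helper `viewRect(fromPDFRect:pageSize:viewSize:)` is hypothetical and not part of the Nutrient API. It assumes an unrotated page that exactly fills the view and a frame measured from the bottom-left corner of the page. In a real app, the conversion helpers described in the Coordinate Space Conversions guide are the better choice, since they can account for rotation, cropping, and zooming.

```swift
import Foundation

/// Hypothetical helper for illustration only (not a Nutrient API).
/// Converts a rect from page space (origin bottom-left, y pointing up)
/// to view space (origin top-left, y pointing down).
/// Assumes an unrotated page that exactly fills the view.
func viewRect(fromPDFRect pdfRect: CGRect, pageSize: CGSize, viewSize: CGSize) -> CGRect {
    let scaleX = viewSize.width / pageSize.width
    let scaleY = viewSize.height / pageSize.height
    // Flip vertically: the rect's top edge in page space (maxY)
    // becomes its distance from the top of the page.
    let flippedY = pageSize.height - pdfRect.maxY
    return CGRect(x: pdfRect.minX * scaleX,
                  y: flippedY * scaleY,
                  width: pdfRect.width * scaleX,
                  height: pdfRect.height * scaleY)
}

// Made-up sample values: a 600×800 pt page shown in a 300×400 pt view.
let pageSize = CGSize(width: 600, height: 800)
let viewSize = CGSize(width: 300, height: 400)
let wordFrame = CGRect(x: 60, y: 700, width: 120, height: 20)

let converted = viewRect(fromPDFRect: wordFrame, pageSize: pageSize, viewSize: viewSize)
print(converted)
```

With these sample values, a word near the top of the page ends up near the top of the view, at half its original size, because the view is half as large as the page in both dimensions.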
Here’s an example that will output the individual positions for all words on the first page of a document:
    let document = ...
    guard let parser = document.textParserForPage(at: 0) else {
        print("Parsing failed.")
        return controller
    }
    parser.words.forEach { word in
        print("The location of \(word.stringValue) is \(word.frame)")
    }
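Once word frames are available, a common next step is to pick out the words that fall inside a region of the page. The sketch below shows one way this might look. To stay self-contained, it uses a stand-in `WordBox` struct that only mirrors the `stringValue` and `frame` properties used above. Both `WordBox` and the `wordBoxes(intersecting:in:)` helper are hypothetical names, and the sample frames are made up.

```swift
import Foundation

// Stand-in for a parsed word (hypothetical type, not a Nutrient class).
// It mirrors only the two properties used in this guide.
struct WordBox {
    let stringValue: String
    let frame: CGRect
}

/// Returns the boxes whose frames overlap `region`, preserving input order.
func wordBoxes(intersecting region: CGRect, in boxes: [WordBox]) -> [WordBox] {
    boxes.filter { $0.frame.intersects(region) }
}

// Made-up sample data in a bottom-left-origin page space.
let boxes = [
    WordBox(stringValue: "Invoice", frame: CGRect(x: 50, y: 740, width: 80, height: 16)),
    WordBox(stringValue: "Total", frame: CGRect(x: 50, y: 100, width: 40, height: 14)),
    WordBox(stringValue: "42.00", frame: CGRect(x: 400, y: 100, width: 50, height: 14)),
]

// A horizontal strip near the bottom of the page.
let footerStrip = CGRect(x: 0, y: 90, width: 600, height: 40)
let hits = wordBoxes(intersecting: footerStrip, in: boxes)
print(hits.map { $0.stringValue })
```

With a real parser, the same filter could be applied to `parser.words` directly, as long as the region is expressed in the same coordinate space as the word frames.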