---
title: "iOS PDF parsing library — Reliable PDF parser for iOS | Nutrient"
canonical_url: "https://www.nutrient.io/guides/ios/extraction/parse-content/"
md_url: "https://www.nutrient.io/guides/ios/extraction/parse-content.md"
last_updated: "2026-05-30T02:20:01.313Z"
description: "Parsing text and other content from a PDF can be a complex task, so we offer several abstractions to make this simpler."
---

# Parse PDF content on iOS

Parsing text and other content from a PDF can be a complex task, so we offer several abstractions to make this simpler. In a PDF, text usually consists of glyphs positioned at absolute coordinates, with no relationship to neighboring glyphs. Nutrient heuristically groups these glyphs into words and text blocks. Our user interface uses this information to let users select and annotate text. You can read more about this in our [text selection](https://www.nutrient.io/guides/ios/features/text-selection/) guide.

## Text parser

[`TextParser`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser) offers APIs to get the [`text`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser/text), [`glyphs`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser/glyphs) ([`Glyph`](https://www.nutrient.io/api/ios/documentation/pspdfkit/glyph)), [`words`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser/words) ([`Word`](https://www.nutrient.io/api/ios/documentation/pspdfkit/word)), [`textBlocks`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser/textblocks) ([`TextBlock`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textblock)), and even [`images`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser/images) ([`ImageInfo`](https://www.nutrient.io/api/ios/documentation/pspdfkit/imageinfo)) from a given PDF page. Every page of a PDF has a text parser that returns information about the text on a page:

### SWIFT

```swift
let document: Document = ...
// Force-unwrapped for brevity; `textParserForPage(at:)` returns an optional.
let textParser = document.textParserForPage(at: 0)!
let glyphs = textParser.glyphs
```

### OBJECTIVE-C

```objc
PSPDFDocument *document = ...;
PSPDFTextParser *textParser = [document textParserForPageAtIndex:0];
NSArray<PSPDFGlyph *> *glyphs = textParser.glyphs;
```
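Because `textParserForPage(at:)` returns an optional, production code may prefer to unwrap it rather than force-unwrap. The following is a minimal sketch that uses only the properties named above; the function name `printFirstPageSummary` is hypothetical, and the sketch assumes the PSPDFKit module is linked and that you already have a loaded `Document`.

```swift
import PSPDFKit

/// Hypothetical helper: prints a short summary of what the text parser
/// found on the first page of `document`.
func printFirstPageSummary(of document: Document) {
    // Unwrap the optional parser instead of force-unwrapping it.
    guard let textParser = document.textParserForPage(at: 0) else {
        print("No text parser available for page 0.")
        return
    }

    // Each property corresponds to one of the abstractions listed above.
    print("Glyphs: \(textParser.glyphs.count)")
    print("Words: \(textParser.words.count)")
    print("Text blocks: \(textParser.textBlocks.count)")
    print("Images: \(textParser.images.count)")
    print("Text: \(textParser.text)")
}
```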

[`TextParser`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser) also ensures that the text it extracts follows any logical structure defined in the PDF (see section 14.7 of the [PDF specification](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf)), which lets it handle different text-writing directions correctly.

## Glyphs, text blocks, words, and images

### Glyphs

The [`Glyph`](https://www.nutrient.io/api/ios/documentation/pspdfkit/glyph) object is the building block for all text extraction in Nutrient. It represents a single glyph on a PDF page. Its [`frame`](https://www.nutrient.io/api/ios/documentation/pspdfkit/glyph/frame) property specifies, in PDF coordinates, where it’s located on the page, and its [`content`](https://www.nutrient.io/api/ios/documentation/pspdfkit/glyph/content) property returns the text it contains. The [`indexOnPage`](https://www.nutrient.io/api/ios/documentation/pspdfkit/glyph/indexonpage) property specifies the index of the glyph on the page, in reading order. Consider a page with the following text:

```

The quick brown fox jumps over the lazy dog.
--------------------------^

```

The [`Glyph`](https://www.nutrient.io/api/ios/documentation/pspdfkit/glyph) that represents the `o` in `over` will have an `indexOnPage` of 26 (indices start at 0, and spaces count as glyphs). This index is unique to this glyph, and it can be used to directly access it from the [`glyphs`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser/glyphs) array of a [`TextParser`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser):

### SWIFT

```swift
let document: Document = ...
// Force-unwrapped for brevity; `textParserForPage(at:)` returns an optional.
let textParser = document.textParserForPage(at: 0)!
let glyphs = textParser.glyphs
let glyph = glyphs[26]

// Guaranteed to be `true`.
let indexEqualTo26 = (glyph.indexOnPage == 26)
```

### OBJECTIVE-C

```objc
PSPDFDocument *document = ...;
PSPDFTextParser *textParser = [document textParserForPageAtIndex:0];
NSArray<PSPDFGlyph *> *glyphs = textParser.glyphs;
PSPDFGlyph *glyph = glyphs[26];

// Guaranteed to be `YES`.
BOOL indexEqualTo26 = (glyph.indexOnPage == 26);
```

Because the index is known up front, fetching a particular glyph (and the glyphs near it) is a direct array access rather than a search. Ordering glyphs correctly is important if, for example, you wish to combine multiple glyphs and display the result to the user.
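To make the indexing concrete, here is a small sketch that runs without the SDK. It uses a plain array of strings as a stand-in for `TextParser.glyphs`, under the simplifying assumption that every character of the example sentence maps to exactly one glyph (real PDFs may not be this tidy). The helper `neighborIndices(around:radius:count:)` is hypothetical and not part of the Nutrient API.

```swift
import Foundation

// Stand-in for `TextParser.glyphs`: one entry per character, spaces included.
let pageText = "The quick brown fox jumps over the lazy dog."
let glyphContents = pageText.map { String($0) }

// Locate the first character of "over" and convert it to an integer offset.
guard let overRange = pageText.range(of: "over") else {
    fatalError("The example sentence should contain the word \"over\".")
}
let index = pageText.distance(from: pageText.startIndex, to: overRange.lowerBound)
print(index) // 26

/// Hypothetical helper: returns the indices within `radius` of `index`,
/// clamped to the bounds of an array with `count` elements.
func neighborIndices(around index: Int, radius: Int, count: Int) -> Range<Int> {
    let upper = min(count, index + radius + 1)
    let lower = min(max(0, index - radius), upper)
    return lower..<upper
}

// Combine the glyph at index 26 with two neighbors on each side, in order.
let nearby = glyphContents[neighborIndices(around: index, radius: 2, count: glyphContents.count)].joined()
print(nearby) // "s ove"
```

With the real SDK, the same clamped range could be used to subscript the `glyphs` array directly.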

### Text blocks

A [`TextBlock`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textblock) returned from the [`TextParser`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser) represents a contiguous group of glyphs, usually in a single line. For PDFs with multiple columns of text, a text block is a single line in a given column. [`TextBlock`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textblock) is backed by an `NSRange` ([`TextBlock.range`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textblock/range)) that describes the range of glyphs in [`TextParser.glyphs`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser/glyphs) that the block represents. The same information is available for a [`Word`](https://www.nutrient.io/api/ios/documentation/pspdfkit/word) via [`Word.range`](https://www.nutrient.io/api/ios/documentation/pspdfkit/word/range).

To fetch the glyphs associated with a given text block, retrieve them from [`TextParser.glyphs`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser/glyphs):

### SWIFT

```swift
// `block` must be non-optional (or unwrapped) before accessing `range`.
let block: TextBlock = ...
let parser: TextParser = ...
let glyphs: [Glyph] = parser.glyphs(in: block.range)
```

### OBJECTIVE-C

```objc
PSPDFTextBlock *block = ...;
PSPDFTextParser *parser = ...;
NSArray<PSPDFGlyph *> *glyphsInBlock = [parser glyphsInRange:block.range];
```
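The relationship between a range and the glyph array can be illustrated without the SDK. The sketch below is only an illustration: it uses an array of strings as a stand-in for `TextParser.glyphs` and a hand-written `NSRange` in place of a real `TextBlock.range` or `Word.range`. The `slice(_:in:)` helper is hypothetical and returns an empty array for out-of-bounds ranges; it makes no claim about how the library itself handles them.

```swift
import Foundation

// Stand-in for `TextParser.glyphs`: one entry per character, spaces included.
let glyphContents = "The quick brown fox jumps over the lazy dog.".map { String($0) }

// Hand-written range covering the four glyphs of "over" (location 26, length 4).
let wordRange = NSRange(location: 26, length: 4)

/// Hypothetical helper: returns the elements of `items` covered by `range`,
/// or an empty array if the range is invalid or out of bounds.
func slice<T>(_ items: [T], in range: NSRange) -> [T] {
    guard let swiftRange = Range(range),
          swiftRange.lowerBound >= 0,
          swiftRange.upperBound <= items.count else {
        return []
    }
    return Array(items[swiftRange])
}

print(slice(glyphContents, in: wordRange).joined()) // "over"
```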

### Words

A [`Word`](https://www.nutrient.io/api/ios/documentation/pspdfkit/word), as the name suggests, represents a single word in a PDF. [`TextParser`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser) automatically generates these words when parsing the text blocks, and they can be retrieved via the [`words`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser/words) property. You can also access the words in a particular text block via the [`TextBlock.words`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textblock/words) property.
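As a rough sketch of how these pieces fit together, the function below walks every text block on a page, then every word in that block, and rebuilds each word's string from its glyphs. It relies only on the members mentioned in this guide (`textBlocks`, `TextBlock.words`, `Word.range`, `glyphs(in:)`, and `Glyph.content`); the function name `wordsByBlock` is hypothetical, and the sketch assumes the PSPDFKit module is linked.

```swift
import PSPDFKit

/// Hypothetical helper: returns one array of word strings per text block,
/// in the order the parser reports the blocks and words.
func wordsByBlock(using parser: TextParser) -> [[String]] {
    parser.textBlocks.map { block in
        block.words.map { word in
            // Look up the glyphs covered by the word's range and join their contents.
            // `compactMap` skips any glyph that has no content.
            parser.glyphs(in: word.range)
                .compactMap { $0.content }
                .joined()
        }
    }
}
```

The nested result keeps words grouped by block, which is convenient when line or column boundaries matter.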

### Images

The [`TextParser`](https://www.nutrient.io/api/ios/documentation/pspdfkit/textparser) API also supports extracting images from PDF pages. To learn more about how to do that, refer to the [image extraction](https://www.nutrient.io/guides/ios/extraction/image-extraction.md) guide.

---

## Related pages

- [PDF extraction library for iOS](/guides/ios/extraction.md)
- [Extract selected text from PDFs on iOS](/guides/ios/extraction/selected-text.md)
- [Extract metadata from PDFs on iOS](/guides/ios/extraction/metadata.md)
- [Extract pages from PDFs on iOS](/guides/ios/extraction/page-extraction.md)
- [Extract text from PDFs on iOS](/guides/ios/features/text-extraction.md)
- [Extract the text position from PDFs on iOS](/guides/ios/extraction/text-position.md)
- [Extract images from PDFs on iOS](/guides/ios/extraction/image-extraction.md)

