---
title: "Extract document elements"
canonical_url: "https://www.nutrient.io/guides/dws-data-extraction/parsing/extract-document-elements/"
md_url: "https://www.nutrient.io/guides/dws-data-extraction/parsing/extract-document-elements.md"
last_updated: "2026-05-26T22:37:31.557Z"
description: "Extract typed document elements with bounding boxes, confidence scores, and reading order from PDFs, images, and Office files."
---

# Extract document elements

When `output.format` is set to `spatial`, the Data Extraction API returns a flat list of typed document elements. Each element includes its type, text content, spatial coordinates, detection confidence, and page reference.

## Basic element extraction

Send a document and receive structured spatial elements.

### curl

```shell

curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"spatial"}}'
# The response body is JSON; the extracted elements are listed under output.elements.

```

### Python

```python

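import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        data={
            "instructions": '{"mode":"understand","output":{"format":"spatial"}}'
        },
    )

# Fail early on HTTP errors instead of parsing an error body as a result.
response.raise_for_status()

result = response.json()
# Not every element type carries a `text` field, so fall back to an empty string.
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')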

```

### JavaScript

```javascript

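import { readFile } from "node:fs/promises";

// The built-in FormData accepts a Blob, not a read stream, so load the file into one.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");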
form.append(
  "instructions",
  JSON.stringify({ mode: "understand", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});

```

## Element types

The API returns six element types. All types are available in `structure`, `understand`, and `agentic` modes (`text` mode does not support spatial output).

| Type             | Description                     | Key fields                         |
| ---------------- | ------------------------------- | ---------------------------------- |
| `paragraph`      | Text content with semantic role | `text`, `role`, `words`            |
| `table`          | Structured table with cell data | `rowCount`, `columnCount`, `cells` |
| `formula`        | Mathematical expression         | `latex`                            |
| `picture`        | Image, chart, or diagram        | `classification`, `altDescription` |
| `keyValueRegion` | Form fields and key-value pairs | `pairs`                            |
| `handwriting`    | Handwritten text content        | `text`, `words`                    |

### Common fields

Every element includes these fields:

| Field          | Type    | Description                                                                                   |
| -------------- | ------- | --------------------------------------------------------------------------------------------- |
| `id`           | string  | Unique identifier (UUID)                                                                      |
| `type`         | string  | Element type (`paragraph`, `table`, `formula`, `picture`, `keyValueRegion`, `handwriting`)    |
| `bounds`       | object  | Bounding box with `x`, `y`, `width`, `height`. Origin at top-left. See [coordinate spaces](https://www.nutrient.io/guides/dws-data-extraction/parsing/coordinate-spaces.md). |
| `confidence`   | number  | Detection confidence between 0 and 1                                                          |
| `readingOrder` | integer | Position in the page-reading sequence                                                         |
| `page`         | object  | Source page with `pageIndex` (0-based), `pageNumber` (1-based integer), `width`, and `height` |
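
The common fields are enough to put elements back into reading order and drop weak detections. The sketch below assumes `readingOrder` restarts on each page (the table describes it as a position in the page-reading sequence), so it sorts by `page.pageIndex` first. The sample elements and the `0.5` threshold are illustrative, not real API output or a recommended value.

```python
# Hypothetical sample shaped like the common fields above; not real API output.
elements = [
    {"id": "b", "type": "table", "confidence": 0.92, "readingOrder": 2,
     "page": {"pageIndex": 0, "pageNumber": 1}},
    {"id": "c", "type": "paragraph", "confidence": 0.41, "readingOrder": 0,
     "page": {"pageIndex": 1, "pageNumber": 2}},
    {"id": "a", "type": "paragraph", "confidence": 0.95, "readingOrder": 0,
     "page": {"pageIndex": 0, "pageNumber": 1}},
]


def in_reading_order(elements, min_confidence=0.0):
    """Keep elements at or above min_confidence, sorted by page and then readingOrder."""
    kept = [e for e in elements if e["confidence"] >= min_confidence]
    return sorted(kept, key=lambda e: (e["page"]["pageIndex"], e["readingOrder"]))


ordered_ids = [e["id"] for e in in_reading_order(elements, min_confidence=0.5)]
print(ordered_ids)
```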

### Paragraph elements

Paragraphs cover all text content. The `role` field identifies the semantic function:

```json

{
  "type": "paragraph",
  "role": "SectionHeader",
  "text": "Revenue Summary",
  "confidence": 0.95,
  "readingOrder": 0,
  "bounds": { "x": 100, "y": 50, "width": 400, "height": 35 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 }
}

```

Available roles:

| Role                 | Description                       |
| -------------------- | --------------------------------- |
| `Text`               | Body paragraphs                   |
| `Title`              | Document title                    |
| `SectionHeader`      | Section headings                  |
| `Header`             | Running page headers              |
| `Footer`             | Running page footers              |
| `Caption`            | Figure or table captions          |
| `Footnote`           | Footnotes                         |
| `ListItem`           | List items (ordered or unordered) |
| `PageNumber`         | Page number labels                |
| `Code`               | Code blocks                       |
| `CheckboxSelected`   | Selected checkbox                 |
| `CheckboxUnselected` | Unselected checkbox               |

The role is `null` when the API cannot determine the semantic function.
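
One way to use roles is to drop running headers, footers, and page numbers, then group the remaining text under its nearest heading. This is a minimal sketch: the `outline` helper, the choice to treat `Title` and `SectionHeader` as section starts, and the sample paragraphs are all assumptions made for illustration. Paragraphs with a `null` role are kept as body text.

```python
# Roles treated as page furniture in this sketch (an illustrative choice).
FURNITURE_ROLES = {"Header", "Footer", "PageNumber"}

# Hypothetical paragraphs, already in reading order.
paragraphs = [
    {"type": "paragraph", "role": "Header", "text": "ACME Corp"},
    {"type": "paragraph", "role": "SectionHeader", "text": "Revenue Summary"},
    {"type": "paragraph", "role": "Text", "text": "Revenue grew in Q3."},
    {"type": "paragraph", "role": None, "text": "Unclassified note"},
    {"type": "paragraph", "role": "PageNumber", "text": "1"},
]


def outline(elements):
    """Group paragraph text under the most recent Title or SectionHeader."""
    sections = []
    current = {"heading": None, "body": []}
    for el in elements:
        if el.get("type") != "paragraph":
            continue
        role = el.get("role")
        if role in FURNITURE_ROLES:
            continue
        if role in ("Title", "SectionHeader"):
            # Close the previous section only if it holds anything.
            if current["heading"] is not None or current["body"]:
                sections.append(current)
            current = {"heading": el["text"], "body": []}
        else:
            current["body"].append(el["text"])
    if current["heading"] is not None or current["body"]:
        sections.append(current)
    return sections


sections = outline(paragraphs)
print(sections)
```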

### Table elements

Tables include row and column counts, plus cell-level data with text, bounds, and span information:

```json

{
  "type": "table",
  "confidence": 0.92,
  "readingOrder": 2,
  "bounds": { "x": 100, "y": 150, "width": 600, "height": 120 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 },
  "rowCount": 3,
  "columnCount": 3,
  "cells": [
    {
      "id": "c-001",
      "bounds": { "x": 100, "y": 150, "width": 200, "height": 30 },
      "confidence": 0.94,
      "row": 0,
      "column": 0,
      "rowSpan": 1,
      "colSpan": 1,
      "text": "Region"
    }
  ],
  "captionIds": null,
  "footnoteIds": null
}

```

Each cell includes `row`, `column`, `rowSpan`, and `colSpan` for reconstructing the table layout. `captionIds` and `footnoteIds` reference associated paragraph elements by their `id`.
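
The cell fields can be used to rebuild a two-dimensional grid. The sketch below assumes `row` and `column` are 0-based, as the example suggests, and copies a spanning cell's text into every position it covers. The `build_grid` helper and the sample table are hypothetical.

```python
# Hypothetical table shaped like the example above, reduced to the fields used here.
table = {
    "rowCount": 2,
    "columnCount": 3,
    "cells": [
        {"row": 0, "column": 0, "rowSpan": 1, "colSpan": 1, "text": "Region"},
        {"row": 0, "column": 1, "rowSpan": 1, "colSpan": 2, "text": "Revenue"},
        {"row": 1, "column": 0, "rowSpan": 1, "colSpan": 1, "text": "EMEA"},
        {"row": 1, "column": 1, "rowSpan": 1, "colSpan": 1, "text": "Q1: 10"},
        {"row": 1, "column": 2, "rowSpan": 1, "colSpan": 1, "text": "Q2: 12"},
    ],
}


def build_grid(table):
    """Return a rowCount x columnCount list of lists filled with cell text."""
    rows, cols = table["rowCount"], table["columnCount"]
    grid = [[None] * cols for _ in range(rows)]
    for cell in table["cells"]:
        # Clamp spans to the declared table size so a malformed span cannot overflow.
        row_end = min(cell["row"] + cell["rowSpan"], rows)
        col_end = min(cell["column"] + cell["colSpan"], cols)
        for r in range(cell["row"], row_end):
            for c in range(cell["column"], col_end):
                grid[r][c] = cell["text"]
    return grid


grid = build_grid(table)
for row in grid:
    print(" | ".join(text or "" for text in row))
```

Positions that no cell covers stay `None`, which makes gaps in the detected layout easy to spot.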

### Formula elements

Formulas contain a LaTeX representation of the detected mathematical expression:

```json

{
  "type": "formula",
  "confidence": 0.88,
  "readingOrder": 3,
  "bounds": { "x": 100, "y": 300, "width": 250, "height": 40 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 },
  "latex": "r = r_0 e^{kt}"
}

```

### Picture elements

Pictures include a classification with its own confidence score (`classificationConfidence`) and an AI-generated alt text description (`altDescription`):

```json

{
  "type": "picture",
  "confidence": 0.89,
  "readingOrder": 2,
  "bounds": { "x": 100, "y": 300, "width": 400, "height": 300 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 },
  "classification": "chart",
  "classificationConfidence": 0.91,
  "altDescription": "Bar chart showing quarterly revenue growth across regions",
  "captionIds": ["d1e2f3a4-4444-4000-8000-000000000004"],
  "footnoteIds": null
}

```
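
To show a picture together with its caption, the IDs in `captionIds` can be looked up in the element list. This sketch assumes picture `captionIds` and `footnoteIds` point at paragraph elements by `id`, as described for tables, and that either field may be `null`. The `linked_text` helper, the picture's `id`, and the caption text are hypothetical.

```python
# Hypothetical elements: a caption paragraph and the picture that references it.
elements = [
    {
        "id": "d1e2f3a4-4444-4000-8000-000000000004",
        "type": "paragraph",
        "role": "Caption",
        "text": "Figure 1: Quarterly revenue by region",
    },
    {
        "id": "hypothetical-picture-id",
        "type": "picture",
        "altDescription": "Bar chart showing quarterly revenue growth across regions",
        "captionIds": ["d1e2f3a4-4444-4000-8000-000000000004"],
        "footnoteIds": None,
    },
]


def linked_text(element, field, elements):
    """Resolve an ID list field (captionIds or footnoteIds) to the referenced text."""
    by_id = {e["id"]: e for e in elements}
    ids = element.get(field) or []  # the field may be null
    return [by_id[i].get("text", "") for i in ids if i in by_id]


picture = elements[1]
captions = linked_text(picture, "captionIds", elements)
footnotes = linked_text(picture, "footnoteIds", elements)
print(captions, footnotes)
```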

### Key-value region elements

Key-value regions detect form fields and structured label-value pairs:

```json

{
  "type": "keyValueRegion",
  "confidence": 0.87,
  "readingOrder": 4,
  "bounds": { "x": 100, "y": 700, "width": 500, "height": 100 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 },
  "pairs": [
    {
      "id": "kvp-001",
      "key": {
        "id": "kve-001",
        "bounds": { "x": 100, "y": 700, "width": 150, "height": 25 },
        "confidence": 0.92,
        "entityType": "QUESTION",
        "value": "Invoice Number"
      },
      "value": {
        "id": "kve-002",
        "bounds": { "x": 260, "y": 700, "width": 200, "height": 25 },
        "confidence": 0.95,
        "entityType": "ANSWER",
        "value": "INV-2024-0042"
      },
      "relationshipConfidence": 0.93
    }
  ]
}

```
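
For downstream use, the pairs are often easier to handle as a plain dictionary. The sketch below flattens a region into `{key text: value text}` and skips pairs below a `relationshipConfidence` threshold. The `pairs_to_dict` helper and the threshold are hypothetical, and the guard for a pair without a `value` object is a defensive assumption rather than documented behavior.

```python
# Region reduced to the fields used here, taken from the example above.
region = {
    "type": "keyValueRegion",
    "pairs": [
        {
            "id": "kvp-001",
            "key": {"entityType": "QUESTION", "value": "Invoice Number"},
            "value": {"entityType": "ANSWER", "value": "INV-2024-0042"},
            "relationshipConfidence": 0.93,
        }
    ],
}


def pairs_to_dict(region, min_confidence=0.0):
    """Map each key's text to its value's text, skipping low-confidence links."""
    result = {}
    for pair in region.get("pairs") or []:
        if pair["relationshipConfidence"] < min_confidence:
            continue
        key, value = pair.get("key"), pair.get("value")
        if not key:
            continue
        # Assumption: a key may be detected without a matching value.
        result[key["value"]] = value["value"] if value else None
    return result


fields = pairs_to_dict(region, min_confidence=0.9)
print(fields)
```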

### Handwriting elements

Handwriting elements contain extracted handwritten text. Like paragraphs, they support optional word-level OCR data via `includeWords`:

```json

{
  "type": "handwriting",
  "confidence": 0.78,
  "readingOrder": 5,
  "bounds": { "x": 30, "y": 320, "width": 200, "height": 30 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 },
  "text": "John Doe",
  "words": null
}

```

When `includeWords` is `true`, the `words` array contains per-word text, bounds, and confidence in the same format as paragraph word-level data.

## Word-level data

Set `output.includeWords` to `true` to get word-level OCR data nested inside paragraph, handwriting, and table cell elements.

### curl

```shell

curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"spatial","includeWords":true}}'

```

### Python

```python

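import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        data={
            "instructions": '{"mode":"understand","output":{"format":"spatial","includeWords":true}}'
        },
    )

# Fail early on HTTP errors instead of parsing an error body as a result.
response.raise_for_status()

result = response.json()
for element in result["output"]["elements"]:
    # `words` can be missing or null, so check before iterating.
    if element.get("words"):
        for word in element["words"]:
            print(f'{word["text"]} (confidence: {word["confidence"]})')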

```

### JavaScript

```javascript

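import { readFile } from "node:fs/promises";

// The built-in FormData accepts a Blob, not a read stream, so load the file into one.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");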
form.append(
  "instructions",
  JSON.stringify({
    mode: "understand",
    output: { format: "spatial", includeWords: true },
  }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
result.output.elements.forEach((el) => {
  if (el.words) {
    el.words.forEach((w) =>
      console.log(`${w.text} (confidence: ${w.confidence})`),
    );
  }
});

```

Each word object includes:

| Field        | Type   | Description                               |
| ------------ | ------ | ----------------------------------------- |
| `text`       | string | The word text                             |
| `bounds`     | object | Bounding box in document coordinate space |
| `confidence` | number | OCR confidence between 0 and 1            |
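
Word-level confidence is useful for flagging text that may need manual review. The sketch below collects words under a threshold from element-level `words` arrays and treats a `null` array as empty. It does not descend into table cells. The `low_confidence_words` helper, the sample data, and the `0.8` threshold are illustrative assumptions.

```python
# Hypothetical elements; `words` is null when no word-level data is present.
elements = [
    {
        "type": "paragraph",
        "text": "Revenue Summary",
        "page": {"pageIndex": 0, "pageNumber": 1},
        "words": [
            {"text": "Revenue", "confidence": 0.97,
             "bounds": {"x": 100, "y": 50, "width": 180, "height": 35}},
            {"text": "Summary", "confidence": 0.62,
             "bounds": {"x": 290, "y": 50, "width": 190, "height": 35}},
        ],
    },
    {
        "type": "handwriting",
        "text": "John Doe",
        "page": {"pageIndex": 0, "pageNumber": 1},
        "words": None,
    },
]


def low_confidence_words(elements, threshold=0.8):
    """Return (pageNumber, text, confidence) for each word below the threshold."""
    flagged = []
    for el in elements:
        for word in el.get("words") or []:
            if word["confidence"] < threshold:
                flagged.append(
                    (el["page"]["pageNumber"], word["text"], word["confidence"])
                )
    return flagged


flagged = low_confidence_words(elements)
print(flagged)
```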

## Comparing spatial modes

The `structure`, `understand`, and `agentic` modes all return the same element types and output structure. The difference is in extraction depth and cost.

| Aspect   | `structure`                                | `understand`                   | `agentic`                              |
| -------- | ------------------------------------------ | ------------------------------ | -------------------------------------- |
| Speed    | Fast                                       | Slower                         | Slowest                                |
| Cost     | 1.5 credits per page                       | 9 credits per page             | 18 credits per page                    |
| Pipeline | OCR-based segmentation                     | AI-augmented layout analysis   | Hybrid (AI + VLM) layout analysis      |
| Best for | Scanned documents, straightforward layouts | Complex layouts, tables, forms | The most complex documents needing VLM |
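
The per-page costs in the table above give a rough budget for a batch. This sketch assumes credits scale linearly with page count and ignores any rounding, minimums, or plan-specific rules, none of which are covered here. The `estimate_credits` helper is hypothetical.

```python
# Credits per page, copied from the comparison table.
CREDITS_PER_PAGE = {"structure": 1.5, "understand": 9.0, "agentic": 18.0}


def estimate_credits(page_count, mode):
    """Estimate credits for page_count pages, assuming strictly linear billing."""
    if mode not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown spatial mode: {mode!r}")
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * CREDITS_PER_PAGE[mode]


for mode in CREDITS_PER_PAGE:
    print(f"{mode}: {estimate_credits(40, mode)} credits for 40 pages")
```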

Use `structure` mode when you need spatial elements and the documents have straightforward layouts. Use `understand` mode for complex documents with tables, multicolumn layouts, or mixed content types. Use `agentic` mode for the most complex documents that benefit from VLM-augmented extraction. See [processing modes](https://www.nutrient.io/guides/dws-data-extraction/parsing/processing-modes.md) for a full comparison, including `text` mode.

---

## Related pages

- [Coordinate spaces](/guides/dws-data-extraction/parsing/coordinate-spaces.md)
- [Multilingual extraction](/guides/dws-data-extraction/parsing/multilingual-extraction.md)
- [Extract Markdown](/guides/dws-data-extraction/parsing/extract-markdown.md)
- [Parse endpoint](/guides/dws-data-extraction/parsing.md)
- [Processing modes](/guides/dws-data-extraction/parsing/processing-modes.md)

