---
title: "Processing modes"
canonical_url: "https://www.nutrient.io/guides/dws-data-extraction/parsing/processing-modes/"
md_url: "https://www.nutrient.io/guides/dws-data-extraction/parsing/processing-modes.md"
last_updated: "2026-05-26T22:37:31.557Z"
description: "Compare text, structure, understand, and agentic processing modes for the Data Extraction API. Choose the right mode for cost, speed, and extraction depth."
---

# Processing modes

The Data Extraction API offers four processing modes that trade off cost, speed, and extraction depth. Every request uses exactly one mode, set with the `mode` field of the `instructions` JSON.
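
The `instructions` form field is a JSON string, so building it with `json.dumps` avoids hand-escaping quotes. The following is a minimal client-side sketch, not part of the API: `build_instructions` is a hypothetical helper that only accepts the four modes and the two output formats (`markdown`, `spatial`) used in this guide, and it rejects spatial output for text mode, as the comparison table indicates.

```python
import json

# Modes and output formats as named in this guide.
MODES = {"text", "structure", "understand", "agentic"}
FORMATS = {"markdown", "spatial"}


def build_instructions(mode: str, output_format: str) -> str:
    """Serialize the `instructions` form field for one parse request."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if output_format not in FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    # Text mode only supports Markdown output.
    if mode == "text" and output_format != "markdown":
        raise ValueError("text mode only supports markdown output")
    return json.dumps({"mode": mode, "output": {"format": output_format}})


print(build_instructions("structure", "spatial"))
```

Validating locally catches an unsupported mode and format combination before a request (and its credits) is spent.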

## Mode comparison

|                     | `text`        | `structure`        | `understand`       | `agentic`          |
| ------------------- | ------------- | ------------------ | ------------------ | ------------------ |
| **Cost per page**   | 1 credit      | 1.5 credits        | 9 credits          | 18 credits         |
| **Speed**           | Fastest       | Fast               | Slower             | Slowest            |
| **Output formats**  | Markdown only | Spatial, Markdown  | Spatial, Markdown  | Spatial, Markdown  |
| **OCR**             | No            | Yes                | Yes                | Yes                |
| **AI augmentation** | No            | No                 | Yes                | Hybrid (AI + VLM)  |
| **Layout analysis** | No            | Basic segmentation | Full AI-augmented  | Hybrid (AI + VLM)  |
| **Word-level data** | —             | Yes (spatial only) | Yes (spatial only) | Yes (spatial only) |
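
If you want the cheapest mode that covers a fixed set of requirements, you can encode the table above as data. This sketch transcribes it into a hypothetical `MODES` mapping and picks the lowest-cost mode whose capabilities include everything requested. It minimizes cost only; the guide still recommends starting from understand mode for most documents.

```python
from typing import NamedTuple


class ModeInfo(NamedTuple):
    credits_per_page: float
    ocr: bool
    spatial: bool  # spatial output format available
    ai: bool  # AI augmentation (agentic combines it with a VLM)
    vlm: bool


# Transcribed from the mode comparison table.
MODES = {
    "text": ModeInfo(1.0, ocr=False, spatial=False, ai=False, vlm=False),
    "structure": ModeInfo(1.5, ocr=True, spatial=True, ai=False, vlm=False),
    "understand": ModeInfo(9.0, ocr=True, spatial=True, ai=True, vlm=False),
    "agentic": ModeInfo(18.0, ocr=True, spatial=True, ai=True, vlm=True),
}


def cheapest_mode(*, ocr=False, spatial=False, ai=False, vlm=False) -> str:
    """Return the lowest-cost mode that provides every requested capability."""
    wanted = {"ocr": ocr, "spatial": spatial, "ai": ai, "vlm": vlm}
    candidates = [
        name
        for name, info in MODES.items()
        if all(getattr(info, cap) for cap, needed in wanted.items() if needed)
    ]
    # Agentic offers every capability, so candidates is never empty.
    return min(candidates, key=lambda name: MODES[name].credits_per_page)


print(cheapest_mode(ocr=True))  # structure
```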

## Text mode

Text mode extracts Markdown from born-digital documents. It doesn’t run OCR or AI augmentation, making it the fastest and cheapest option.

### curl

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"text","output":{"format":"markdown"}}'
```

### Python

```python
import requests

# Open the file in a context manager so the handle is closed after upload.
with open("document.pdf", "rb") as pdf:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": pdf},
        data={
            "instructions": '{"mode":"text","output":{"format":"markdown"}}'
        },
    )
response.raise_for_status()  # Fail fast on HTTP errors

print(response.json()["output"]["markdown"])
```

### JavaScript

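```javascript
import { readFile } from "node:fs/promises";

// Node's built-in FormData accepts strings and Blobs, not streams,
// so read the file into a Blob before appending it.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "text", output: { format: "markdown" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const result = await response.json();
console.log(result.output.markdown);
```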

**When to use text mode:**

- RAG ingestion and search indexing where you need clean Markdown from born-digital documents

- High-throughput pipelines where cost and speed matter more than spatial data

**Limitations:**

- Only supports `markdown` output format

- No OCR — text in scanned documents or images won’t be returned
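
Because text mode skips OCR, a scanned file can come back with little or no Markdown. One way to handle mixed inputs is to try text mode first and retry in structure mode when the result is blank. This is a sketch under assumptions: `parse` is a hypothetical callable wrapping a request like the one shown above, the "blank Markdown means scanned" check is a heuristic rather than documented API behavior, and structure-mode Markdown responses are assumed to use the same `output.markdown` field as text mode.

```python
from typing import Callable

# Hypothetical: parse(path, instructions) sends one request (for example,
# the requests.post call above) and returns the parsed JSON response.
ParseFn = Callable[[str, dict], dict]


def markdown_with_ocr_fallback(parse: ParseFn, path: str) -> tuple[str, str]:
    """Return (mode_used, markdown), trying text mode before structure mode."""
    result = parse(path, {"mode": "text", "output": {"format": "markdown"}})
    markdown = result["output"]["markdown"]
    # Heuristic, not API behavior: blank Markdown from text mode suggests a
    # scanned or image-only file that needs OCR.
    if markdown.strip():
        return "text", markdown
    # Assumes structure-mode Markdown uses the same output.markdown field.
    result = parse(path, {"mode": "structure", "output": {"format": "markdown"}})
    return "structure", result["output"]["markdown"]
```

Assuming each request is billed at its mode's rate, a file that takes the fallback path costs 1 + 1.5 = 2.5 credits per page. If most of your inputs are scans, sending them straight to structure mode is cheaper.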

## Structure mode

Structure mode runs OCR-based segmentation to extract typed document elements with bounding boxes and confidence scores. It handles scanned documents, images, and any file requiring optical character recognition.

### curl

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"structure","output":{"format":"spatial"}}'
```

### Python

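```python
import requests

# Open the file in a context manager so the handle is closed after upload.
with open("document.pdf", "rb") as pdf:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": pdf},
        data={
            "instructions": '{"mode":"structure","output":{"format":"spatial"}}'
        },
    )
response.raise_for_status()  # Fail fast on HTTP errors

result = response.json()
# Every element has a type; some elements may have no text.
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')
```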

### JavaScript

```javascript
import { readFile } from "node:fs/promises";

// Node's built-in FormData accepts strings and Blobs, not streams,
// so read the file into a Blob before appending it.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "structure", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});
```

**When to use structure mode:**

- Scanned documents and images that require OCR

- Workflows that need spatial data (bounding boxes, coordinates) at lower cost than understand mode

- Documents with straightforward layouts where AI augmentation isn’t necessary
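
Structure mode returns typed elements, so a common first step is to collect their text by type. This sketch uses only the `type` and `text` fields read in the examples above; `group_text_by_type` is a hypothetical helper, and the type names in the sample input are illustrative, not a list of the types the API returns.

```python
from collections import defaultdict


def group_text_by_type(elements: list[dict]) -> dict[str, list[str]]:
    """Collect element text under each element type, keeping document order."""
    grouped = defaultdict(list)
    for element in elements:
        # Same field access as the examples above; text may be missing.
        grouped[element["type"]].append(element.get("text", ""))
    return dict(grouped)


# Illustrative input shaped like result["output"]["elements"];
# the type names here are made up for the example.
sample = [
    {"type": "heading", "text": "Invoice 1042"},
    {"type": "paragraph", "text": "Payment due in 30 days."},
    {"type": "paragraph", "text": "Thank you for your business."},
]
print(group_text_by_type(sample))
```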

## Understand mode

Understand mode runs the full extraction pipeline with AI augmentation on top of OCR. It produces more accurate results than structure mode on complex documents with tables, multicolumn layouts, nested structures, formulas, and form fields.

### curl

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"spatial"}}'
```

### Python

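```python
import requests

# Open the file in a context manager so the handle is closed after upload.
with open("document.pdf", "rb") as pdf:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": pdf},
        data={
            "instructions": '{"mode":"understand","output":{"format":"spatial"}}'
        },
    )
response.raise_for_status()  # Fail fast on HTTP errors

result = response.json()
# Every element has a type; some elements may have no text.
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')
```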

### JavaScript

```javascript
import { readFile } from "node:fs/promises";

// Node's built-in FormData accepts strings and Blobs, not streams,
// so read the file into a Blob before appending it.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "understand", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});
```

**When to use understand mode:**

- Complex documents with tables, multicolumn layouts, or nested structures

- Invoice and form processing where accurate data extraction matters

- Documents with formulas, handwriting, or mixed content types

- Any workflow where extraction accuracy is more important than cost

## Agentic mode

Agentic mode builds on the understand pipeline and augments it with a vision language model (VLM). The VLM improves results in areas like image descriptions, complex layout analysis, and semantic understanding. It’s designed for the most complex documents that require the deepest visual understanding.

### curl

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"agentic","output":{"format":"spatial"}}'
```

### Python

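```python
import requests

# Open the file in a context manager so the handle is closed after upload.
with open("document.pdf", "rb") as pdf:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": pdf},
        data={
            "instructions": '{"mode":"agentic","output":{"format":"spatial"}}'
        },
    )
response.raise_for_status()  # Fail fast on HTTP errors

result = response.json()
# Every element has a type; some elements may have no text.
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')
```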

### JavaScript

```javascript
import { readFile } from "node:fs/promises";

// Node's built-in FormData accepts strings and Blobs, not streams,
// so read the file into a Blob before appending it.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "agentic", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});
```

**When to use agentic mode:**

- The most complex documents that need the deepest visual understanding

- Documents where understand mode results need improvement in areas like image descriptions, complex layouts, or semantic classification

- Workflows where VLM-augmented extraction provides better accuracy than standard AI augmentation alone

**Limitations:**

- Slowest processing mode

- Highest cost: 18 credits per page, twice the rate of understand mode
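
If only some of your documents need agentic mode, one option is to run understand mode first and escalate only when your own quality check fails. In this sketch, `parse` and `good_enough` are hypothetical callables you would supply; what counts as "good enough" depends on your documents and isn't defined by the API. The credit tally assumes each request is billed at its mode's per-page rate.

```python
from typing import Callable

# Per-page rates from the mode comparison table.
CREDITS_PER_PAGE = {"understand": 9, "agentic": 18}


def parse_with_escalation(
    parse: Callable[[str, dict], dict],  # hypothetical request wrapper
    path: str,
    pages: int,
    good_enough: Callable[[dict], bool],  # hypothetical check you define
) -> tuple[str, dict, int]:
    """Run understand mode; re-run in agentic mode only if the check fails.

    Returns (mode_used, result, credits_spent).
    """
    result = parse(path, {"mode": "understand", "output": {"format": "spatial"}})
    credits = CREDITS_PER_PAGE["understand"] * pages
    if good_enough(result):
        return "understand", result, credits
    # Escalate: this document pays for both runs.
    result = parse(path, {"mode": "agentic", "output": {"format": "spatial"}})
    credits += CREDITS_PER_PAGE["agentic"] * pages
    return "agentic", result, credits
```

An escalated document costs 9 + 18 = 27 credits per page, versus 18 for going straight to agentic. If a fraction p of documents escalate, the expected cost is 9 + 18p credits per page, which exceeds 18 once p is above 0.5. Past that point, sending those documents directly to agentic mode is cheaper.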

## Choosing the right mode

The default mode is understand, which handles most documents well. Move to a different mode when you have a specific reason:

1. **Do you only need Markdown from born-digital documents?**
   - Yes — Use **text** mode. It’s the fastest and cheapest option (1 credit per page), but has no OCR.
   - No — Continue to step 2.

2. **Are your documents straightforward (simple layouts, no tables or forms)?**
   - Yes — Use **structure** mode. OCR-based extraction at lower cost (1.5 credits per page).
   - No — Continue to step 3.

3. **Do understand mode results need improvement in image descriptions, complex layouts, or semantic classification?**
   - No — Stay with **understand** mode (default). AI augmentation handles most complex documents.
   - Yes — Use **agentic** mode. VLM augmentation on top of the understand pipeline provides the deepest visual understanding for the most complex documents.

For mixed-complexity pipelines, route documents by type: Use text mode for born-digital PDFs, structure mode for scanned documents with simple layouts, understand mode for most complex documents, and agentic mode when VLM-augmented extraction is needed.
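
The three questions above and the routing advice can be expressed as a small function applied to each document. This is a sketch: `choose_mode` is hypothetical, and its boolean inputs stand in for whatever metadata or heuristics your pipeline uses to classify documents.

```python
def choose_mode(
    *,
    markdown_only_born_digital: bool,
    simple_layout: bool,
    understand_falls_short: bool = False,
) -> str:
    """Walk the questions in "Choosing the right mode" in order."""
    # Step 1: Markdown from born-digital documents -> text mode (no OCR).
    if markdown_only_born_digital:
        return "text"
    # Step 2: simple layouts without tables or forms -> structure mode.
    if simple_layout:
        return "structure"
    # Step 3: understand is the default; use agentic only when it falls short.
    return "agentic" if understand_falls_short else "understand"


print(choose_mode(markdown_only_born_digital=False, simple_layout=True))
```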

## Credit costs

| Mode         | Cost per page | 10-page document |
| ------------ | ------------- | ---------------- |
| `text`       | 1 credit      | 10 credits       |
| `structure`  | 1.5 credits   | 15 credits       |
| `understand` | 9 credits     | 90 credits       |
| `agentic`    | 18 credits    | 180 credits      |
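
To budget a mixed batch, multiply page counts by the per-page rates in the table. This sketch uses `Decimal` so the 1.5-credit structure rate stays exact; `estimate_credits` is a hypothetical helper, and it models only these per-page rates. See the pricing page for billing details.

```python
from decimal import Decimal

# Per-page costs from the credit costs table.
CREDITS_PER_PAGE = {
    "text": Decimal("1"),
    "structure": Decimal("1.5"),
    "understand": Decimal("9"),
    "agentic": Decimal("18"),
}


def estimate_credits(pages_by_mode: dict[str, int]) -> Decimal:
    """Sum the credits for a batch, given page counts per mode."""
    return sum(
        (CREDITS_PER_PAGE[mode] * pages for mode, pages in pages_by_mode.items()),
        Decimal(0),
    )


# A mixed batch: 40 born-digital pages, 25 simple scans, 10 complex pages.
print(estimate_credits({"text": 40, "structure": 25, "understand": 10}))
```

For that batch, this gives 40 + 37.5 + 90 = 167.5 credits, compared with 675 credits if all 75 pages went through understand mode.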

See [pricing](https://www.nutrient.io/guides/dws-data-extraction/pricing.md) for more details and FAQs.

---

## Related pages

- [Coordinate spaces](/guides/dws-data-extraction/parsing/coordinate-spaces.md)
- [Multilingual extraction](/guides/dws-data-extraction/parsing/multilingual-extraction.md)
- [Extract document elements](/guides/dws-data-extraction/parsing/extract-document-elements.md)
- [Extract Markdown](/guides/dws-data-extraction/parsing/extract-markdown.md)
- [Parse endpoint](/guides/dws-data-extraction/parsing.md)

