Processing modes

The Data Extraction API offers four processing modes that trade off cost, speed, and extraction depth. Every request uses exactly one mode, set via the `mode` field of the `instructions` parameter.

Mode comparison

| | `text` | `structure` | `understand` | `agentic` |
| --- | --- | --- | --- | --- |
| Cost per page | 1 credit | 1.5 credits | 9 credits | 18 credits |
| Speed | Fastest | Fast | Slower | Slowest |
| Output formats | Markdown only | Spatial, Markdown | Spatial, Markdown | Spatial, Markdown |
| OCR | No | Yes | Yes | Yes |
| AI augmentation | No | No | Yes | Hybrid (AI + VLM) |
| Layout analysis | No | Basic segmentation | Full AI-augmented | Hybrid (AI + VLM) |
| Word-level data | N/A | Yes (spatial only) | Yes (spatial only) | Yes (spatial only) |
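
The output-format constraints in the comparison can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_instructions` and `SUPPORTED_FORMATS` are hypothetical names, and the JSON shape simply mirrors the `instructions` values used in the examples on this page.

```python
import json

# Output formats per mode, taken from the mode comparison.
# The dict name and helper below are illustrative, not part of the API.
SUPPORTED_FORMATS = {
    "text": {"markdown"},
    "structure": {"spatial", "markdown"},
    "understand": {"spatial", "markdown"},
    "agentic": {"spatial", "markdown"},
}


def build_instructions(mode: str, output_format: str = "markdown") -> str:
    """Return a JSON string for the `instructions` form field."""
    if mode not in SUPPORTED_FORMATS:
        raise ValueError(f"unknown mode: {mode!r}")
    if output_format not in SUPPORTED_FORMATS[mode]:
        raise ValueError(f"{mode} mode does not support {output_format!r} output")
    return json.dumps({"mode": mode, "output": {"format": output_format}})


print(build_instructions("structure", "spatial"))
```

Failing early on a combination such as text mode with spatial output avoids spending a round trip on a request the comparison already rules out.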

Text mode

Text mode extracts Markdown from born-digital documents. It doesn’t run optical character recognition (OCR) or AI augmentation, making it the fastest and cheapest option.

curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"text"}'

import requests

response = requests.post(
    "https://api.nutrient.io/extraction/parse",
    headers={"Authorization": "Bearer your_api_key_goes_here"},
    files={"file": open("document.pdf", "rb")},
    data={
        "instructions": '{"mode":"text","output":{"format":"markdown"}}'
    },
)

print(response.json()["output"]["markdown"])

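import { readFile } from "node:fs/promises";

const form = new FormData();
// The built-in FormData accepts Blobs rather than streams, so load the file first.
const file = new Blob([await readFile("document.pdf")], { type: "application/pdf" });
form.append("file", file, "document.pdf");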
form.append(
  "instructions",
  JSON.stringify({ mode: "text", output: { format: "markdown" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
console.log(result.output.markdown);

When to use text mode:

- Retrieval-augmented generation (RAG) ingestion and search indexing where you need clean Markdown from born-digital documents
- High-throughput pipelines where cost and speed matter more than spatial data

Limitations:

- Only supports the `markdown` output format
- No OCR: text in scanned documents or images won’t be returned
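
Because text mode returns nothing for scanned pages, a pipeline can try it first and fall back to an OCR-capable mode only when needed. The sketch below assumes an empty Markdown result signals a scan, which you should verify on your own documents. `parse` is a hypothetical callable that sends one file in the given mode with Markdown output and returns the decoded JSON response.

```python
from typing import Callable

# Hypothetical signature: parse(path, mode) -> decoded JSON response (dict).
ParseFn = Callable[[str, str], dict]


def markdown_with_ocr_fallback(parse: ParseFn, path: str) -> tuple[str, str]:
    """Try text mode first; retry in structure mode if no text comes back.

    Returns (mode_used, markdown).
    """
    result = parse(path, "text")
    markdown = result.get("output", {}).get("markdown", "")
    if markdown.strip():
        return "text", markdown
    # Assumption: an empty result means the file is a scan or image that needs OCR.
    result = parse(path, "structure")
    return "structure", result.get("output", {}).get("markdown", "")
```

With this approach, born-digital files cost 1 credit per page, and only the files that come back empty incur the additional structure-mode request.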

Structure mode

Structure mode runs OCR-based segmentation to extract typed document elements with bounding boxes and confidence scores. It handles scanned documents, images, and any other file that requires OCR.

curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"structure","output":{"format":"spatial"}}'

import requests

response = requests.post(
    "https://api.nutrient.io/extraction/parse",
    headers={"Authorization": "Bearer your_api_key_goes_here"},
    files={"file": open("document.pdf", "rb")},
    data={
        "instructions": '{"mode":"structure","output":{"format":"spatial"}}'
    },
)

result = response.json()
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')

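import { readFile } from "node:fs/promises";

const form = new FormData();
// The built-in FormData accepts Blobs rather than streams, so load the file first.
const file = new Blob([await readFile("document.pdf")], { type: "application/pdf" });
form.append("file", file, "document.pdf");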
form.append(
  "instructions",
  JSON.stringify({ mode: "structure", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});

When to use structure mode:

- Scanned documents and images that require OCR
- Workflows that need spatial data (bounding boxes, coordinates) at lower cost than understand mode
- Documents with straightforward layouts where AI augmentation isn’t necessary
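
Once a spatial response is in hand, the typed elements can be regrouped for downstream use. This is a small sketch that relies only on the `type` and `text` keys used in the examples above. The sample element types (`heading`, `paragraph`, `image`) are made up for illustration and may not match the API's actual type names.

```python
from collections import defaultdict


def group_text_by_type(elements: list[dict]) -> dict[str, list[str]]:
    """Collect the text of each element under its `type`, skipping empty text."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for element in elements:
        text = element.get("text", "")
        if text:
            grouped[element["type"]].append(text)
    return dict(grouped)


# Hypothetical sample; real responses carry more fields per element.
sample = [
    {"type": "heading", "text": "Invoice"},
    {"type": "paragraph", "text": "Due in 30 days."},
    {"type": "paragraph", "text": "Thank you."},
    {"type": "image"},
]
print(group_text_by_type(sample))
```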

Understand mode

Understand mode runs the full extraction pipeline with AI augmentation on top of OCR. It produces the most accurate results for complex documents with tables, multicolumn layouts, nested structures, formulas, and form fields.

curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"spatial"}}'

import requests

response = requests.post(
    "https://api.nutrient.io/extraction/parse",
    headers={"Authorization": "Bearer your_api_key_goes_here"},
    files={"file": open("document.pdf", "rb")},
    data={
        "instructions": '{"mode":"understand","output":{"format":"spatial"}}'
    },
)

result = response.json()
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')

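import { readFile } from "node:fs/promises";

const form = new FormData();
// The built-in FormData accepts Blobs rather than streams, so load the file first.
const file = new Blob([await readFile("document.pdf")], { type: "application/pdf" });
form.append("file", file, "document.pdf");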
form.append(
  "instructions",
  JSON.stringify({ mode: "understand", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});

When to use understand mode:

- Complex documents with tables, multicolumn layouts, or nested structures
- Invoice and form processing where accurate data extraction matters
- Documents with formulas, tables, printed-style handwriting, or mixed content types
- Any workflow where extraction accuracy is more important than cost

Agentic mode

Agentic mode builds on the understand pipeline and augments it with a vision language model (VLM). The VLM improves results in areas like image descriptions, complex layout analysis, and semantic understanding. It’s designed for the most complex documents that require the deepest visual understanding.

curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"agentic","output":{"format":"spatial"}}'

import requests

response = requests.post(
    "https://api.nutrient.io/extraction/parse",
    headers={"Authorization": "Bearer your_api_key_goes_here"},
    files={"file": open("document.pdf", "rb")},
    data={
        "instructions": '{"mode":"agentic","output":{"format":"spatial"}}'
    },
)

result = response.json()
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')

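import { readFile } from "node:fs/promises";

const form = new FormData();
// The built-in FormData accepts Blobs rather than streams, so load the file first.
const file = new Blob([await readFile("document.pdf")], { type: "application/pdf" });
form.append("file", file, "document.pdf");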
form.append(
  "instructions",
  JSON.stringify({ mode: "agentic", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});

When to use agentic mode:

- Documents with embedded images, charts, or diagrams where you need generated descriptions, not just classification
- Degraded scans, faxes, low-quality images, or documents where understand mode produces visible gaps in extracted text
- Cursive, connected, or freeform handwriting, along with dense or annotated handwriting (forms with handwritten markup, filled-in government forms), where understand mode’s character-level handwriting recognition isn’t sufficient
- Workflows where you’ve tested understand mode on a representative sample and the quality isn’t sufficient for your use case

Limitations:

- Slowest processing mode
- 18 credits per page

Choosing the right mode

The default mode is understand, which handles most documents well. Move to a different mode when you have a specific reason:

1. Do you only need Markdown from born-digital documents?
   - Yes: Use text mode. It’s the fastest and cheapest option (1 credit per page), but has no OCR.
   - No: Continue to step 2.
2. Are your documents straightforward (simple layouts, no tables or forms)?
   - Yes: Use structure mode. OCR-based extraction at lower cost (1.5 credits per page).
   - No: Continue to step 3.
3. Do your documents need image descriptions or contain cursive handwriting, or has understand mode produced insufficient quality on a representative sample?
   - No: Stay with understand mode (default). It already handles tables, forms, formulas, printed-style handwriting, and multicolumn layouts without VLM.
   - Yes: Use agentic mode. VLM augmentation on top of the understand pipeline adds descriptions for embedded images and a quality lift on degraded scans, cursive or freeform handwriting, and other hard-to-read content where understand mode falls short.

For mixed-complexity pipelines, route documents by type: use text mode for born-digital PDFs, structure mode for scanned documents with simple layouts, understand mode for most complex documents, and agentic mode when VLM-augmented extraction is needed.
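
The three questions above translate directly into a routing function. This is a minimal sketch: `choose_mode` and its boolean parameters are hypothetical names, and how you determine each answer for a given document (file metadata, document type, sample testing) is up to your pipeline.

```python
def choose_mode(
    born_digital_markdown_only: bool,
    simple_layout: bool,
    needs_vlm: bool,
) -> str:
    """Walk the decision steps in order and return the first mode that fits."""
    # Step 1: only Markdown from born-digital documents is needed.
    if born_digital_markdown_only:
        return "text"
    # Step 2: straightforward layouts with no tables or forms.
    if simple_layout:
        return "structure"
    # Step 3: image descriptions, cursive handwriting, or understand mode fell short.
    if needs_vlm:
        return "agentic"
    # Default mode.
    return "understand"


print(choose_mode(born_digital_markdown_only=False, simple_layout=False, needs_vlm=False))
```

The checks run in order from cheapest to most expensive mode, so a document that satisfies an earlier question never reaches a costlier one.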

Handwriting

Handwriting recognition depends on both the writing style and the quality of the input image.

Match the mode to the writing style:

- Printed-style handwriting: Clearly separated letters and short entries such as names, dates, and filled-in form fields. Understand mode handles this well with character-level OCR.
- Cursive, connected, or freeform handwriting: Use agentic mode. Understand mode reads handwriting one character at a time, so connected or stylized writing produces frequent errors. The VLM in agentic mode interprets whole words and lines and is substantially more reliable for these documents.

Even in agentic mode, recognition of an ambiguous word can be confident but wrong — the model may settle on a plausible word that differs from what was actually written. For high-stakes fields, check the per-element confidence scores and add a human review step where accuracy is critical.
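
A review step along these lines could be sketched as follows. Several details here are assumptions rather than documented behavior: the `confidence` key name, a 0 to 1 score scale, the 0.9 threshold, and the element type names. Because a wrong reading can still score high, the sketch always routes elements of designated high-stakes types to review, regardless of score.

```python
def needs_review(
    element: dict,
    min_confidence: float = 0.9,  # illustrative threshold, tune on your own data
    high_stakes_types: frozenset[str] = frozenset(),
) -> bool:
    """Return True if a person should check this element."""
    # High-stakes elements are always reviewed: confidence alone cannot
    # catch a reading that is confident but wrong.
    if element.get("type") in high_stakes_types:
        return True
    # Assumed key name and 0-1 scale; a missing score is treated as uncertain.
    confidence = element.get("confidence")
    return confidence is None or confidence < min_confidence


# Hypothetical elements for illustration.
elements = [
    {"type": "text", "text": "Jane Doe", "confidence": 0.97},
    {"type": "text", "text": "1O/03/2024", "confidence": 0.61},
    {"type": "signature", "text": "J. Doe", "confidence": 0.99},
]
flagged = [e for e in elements if needs_review(e, high_stakes_types=frozenset({"signature"}))]
print([e["text"] for e in flagged])
```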

Credit costs

| Mode | Cost per page | 10-page document |
| --- | --- | --- |
| `text` | 1 credit | 10 credits |
| `structure` | 1.5 credits | 15 credits |
| `understand` | 9 credits | 90 credits |
| `agentic` | 18 credits | 180 credits |
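
The same per-page arithmetic extends to a mixed workload. The sketch below multiplies page counts by the per-page rates listed above. `estimate_credits` is a hypothetical helper, and it does not model any rounding or billing rules, which this page does not describe.

```python
# Per-page rates from the credit cost table.
CREDITS_PER_PAGE = {
    "text": 1.0,
    "structure": 1.5,
    "understand": 9.0,
    "agentic": 18.0,
}


def estimate_credits(pages_by_mode: dict[str, int]) -> float:
    """Sum pages x per-page rate over every mode in the workload."""
    return sum(CREDITS_PER_PAGE[mode] * pages for mode, pages in pages_by_mode.items())


# 100 text pages + 20 understand pages + 5 agentic pages
# = 100 * 1 + 20 * 9 + 5 * 18 = 370 credits
print(estimate_credits({"text": 100, "understand": 20, "agentic": 5}))
```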

See pricing for more details and FAQs.
