Processing modes
The Data Extraction API offers four processing modes that trade off cost, speed, and extraction depth. Every request uses exactly one mode, set via the mode key in the JSON passed as the instructions form field.
Mode comparison
| | text | structure | understand | agentic |
|---|---|---|---|---|
| Cost per page | 1 credit | 1.5 credits | 9 credits | 18 credits |
| Speed | Fastest | Fast | Slower | Slowest |
| Output formats | Markdown only | Spatial, Markdown | Spatial, Markdown | Spatial, Markdown |
| OCR | No | Yes | Yes | Yes |
| AI augmentation | No | No | Yes | Hybrid (AI + VLM) |
| Layout analysis | No | Basic segmentation | Full AI-augmented | Hybrid (AI + VLM) |
| Word-level data | — | Yes (spatial only) | Yes (spatial only) | Yes (spatial only) |
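The mode and output-format combinations in the table can be checked before a request is sent. The following is a minimal sketch, not part of the API: `SUPPORTED_FORMATS` and `build_instructions` are hypothetical names, and the JSON shape is taken from the request examples on this page.

```python
import json

# Output formats per mode, copied from the comparison table.
SUPPORTED_FORMATS = {
    "text": {"markdown"},
    "structure": {"spatial", "markdown"},
    "understand": {"spatial", "markdown"},
    "agentic": {"spatial", "markdown"},
}


def build_instructions(mode: str, output_format: str) -> str:
    """Return the JSON string for the instructions form field.

    Hypothetical helper: raises ValueError for combinations the table rules out.
    """
    if mode not in SUPPORTED_FORMATS:
        raise ValueError(f"unknown mode: {mode!r}")
    if output_format not in SUPPORTED_FORMATS[mode]:
        raise ValueError(f"mode {mode!r} does not support {output_format!r} output")
    return json.dumps({"mode": mode, "output": {"format": output_format}})
```

Failing locally on an unsupported pair such as text with spatial output avoids spending a request on a combination the table already rules out.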
Text mode
Text mode extracts Markdown from born-digital documents. It doesn’t run optical character recognition (OCR) or AI augmentation, making it the fastest and cheapest option.
```sh
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"text"}'
```

```python
import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        data={
            "instructions": '{"mode":"text","output":{"format":"markdown"}}'
        },
    )

print(response.json()["output"]["markdown"])
```

```js
import { readFile } from "node:fs/promises";

// The built-in FormData accepts Blobs, not Node streams, so read the file into a Blob.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "text", output: { format: "markdown" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
console.log(result.output.markdown);
```

When to use text mode:
- Retrieval-augmented generation (RAG) ingestion and search indexing where you need clean Markdown from born-digital documents
- High-throughput pipelines where cost and speed matter more than spatial data
Limitations:
- Only supports the markdown output format
- No OCR: text in scanned documents or images won’t be returned
Structure mode
Structure mode runs OCR-based segmentation to extract typed document elements with bounding boxes and confidence scores. It handles scanned documents, images, and any file requiring optical character recognition.
```sh
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"structure","output":{"format":"spatial"}}'
```

```python
import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        data={
            "instructions": '{"mode":"structure","output":{"format":"spatial"}}'
        },
    )

# Print each extracted element's type and text (text may be absent).
result = response.json()
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')
```

```js
import { readFile } from "node:fs/promises";

// The built-in FormData accepts Blobs, not Node streams, so read the file into a Blob.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "structure", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

// Print each extracted element's type and text (text may be absent).
const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});
```

When to use structure mode:
- Scanned documents and images that require OCR
- Workflows that need spatial data (bounding boxes, coordinates) at lower cost than understand mode
- Documents with straightforward layouts where AI augmentation isn’t necessary
Understand mode
Understand mode runs the full extraction pipeline with AI augmentation on top of OCR. It produces the most accurate results for complex documents with tables, multicolumn layouts, nested structures, formulas, and form fields.
```sh
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"spatial"}}'
```

```python
import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        data={
            "instructions": '{"mode":"understand","output":{"format":"spatial"}}'
        },
    )

# Print each extracted element's type and text (text may be absent).
result = response.json()
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')
```

```js
import { readFile } from "node:fs/promises";

// The built-in FormData accepts Blobs, not Node streams, so read the file into a Blob.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "understand", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

// Print each extracted element's type and text (text may be absent).
const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});
```

When to use understand mode:
- Complex documents with tables, multicolumn layouts, or nested structures
- Invoice and form processing where accurate data extraction matters
- Documents with formulas, tables, printed-style handwriting, or mixed content types
- Any workflow where extraction accuracy is more important than cost
Agentic mode
Agentic mode builds on the understand pipeline and augments it with a vision language model (VLM). The VLM improves results in areas like image descriptions, complex layout analysis, and semantic understanding. It’s designed for the most complex documents that require the deepest visual understanding.
```sh
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"agentic","output":{"format":"spatial"}}'
```

```python
import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        data={
            "instructions": '{"mode":"agentic","output":{"format":"spatial"}}'
        },
    )

# Print each extracted element's type and text (text may be absent).
result = response.json()
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')
```

```js
import { readFile } from "node:fs/promises";

// The built-in FormData accepts Blobs, not Node streams, so read the file into a Blob.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "agentic", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

// Print each extracted element's type and text (text may be absent).
const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});
```

When to use agentic mode:
- Documents with embedded images, charts, or diagrams where you need generated descriptions, not just classification
- Degraded scans, faxes, low-quality images, or documents where understand mode produces visible gaps in extracted text
- Cursive, connected, or freeform handwriting, as well as dense or annotated handwriting (forms with handwritten markup, filled-in government forms), where understand mode’s character-level handwriting recognition isn’t sufficient
- Workflows where you’ve tested understand mode on a representative sample and the quality isn’t sufficient for your use case
Limitations:
- Slowest processing mode
- 18 credits per page
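Because agentic mode costs twice as much as understand mode, one way to apply the "test understand mode first" advice is to escalate only when a result fails a quality check. The sketch below assumes two caller-supplied callables, `parse` and `is_good_enough`; both are hypothetical and not part of the API.

```python
from typing import Callable


def parse_with_escalation(
    parse: Callable[[str], dict],
    is_good_enough: Callable[[dict], bool],
) -> tuple[str, dict]:
    """Try understand mode first and fall back to agentic mode.

    parse(mode) sends the document with the given mode and returns the
    parsed response; is_good_enough(result) is your own quality check.
    Both are hypothetical callables supplied by the caller.
    """
    result = parse("understand")
    if is_good_enough(result):
        return "understand", result
    # Understand mode fell short, so pay for the VLM-augmented pass.
    return "agentic", parse("agentic")
```

An escalated document is processed twice. Assuming both requests are billed at the listed rates, that is 9 + 18 = 27 credits per page, so this pattern only pays off when most documents pass on the first attempt.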
Choosing the right mode
The default mode is understand, which handles most documents well. Move to a different mode when you have a specific reason:
1. Do you only need Markdown from born-digital documents?
   - Yes: Use text mode. It’s the fastest and cheapest option (1 credit per page), but has no OCR.
   - No: Continue to step 2.
2. Are your documents straightforward (simple layouts, no tables or forms)?
   - Yes: Use structure mode. OCR-based extraction at lower cost (1.5 credits per page).
   - No: Continue to step 3.
3. Do your documents need image descriptions or contain cursive handwriting, or has understand mode produced insufficient quality on a representative sample?
   - No: Stay with understand mode (default). It already handles tables, forms, formulas, printed-style handwriting, and multicolumn layouts without VLM.
   - Yes: Use agentic mode. VLM augmentation on top of the understand pipeline adds descriptions for embedded images and a quality lift on degraded scans, cursive or freeform handwriting, and other hard-to-read content where understand mode falls short.
For mixed-complexity pipelines, route documents by type: Use text mode for born-digital PDFs, structure mode for scanned documents with simple layouts, understand mode for most complex documents, and agentic mode when VLM-augmented extraction is needed.
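The three questions above can be expressed as a small routing function for a mixed-complexity pipeline. This is an illustrative sketch: `choose_mode` and its parameter names are hypothetical, and how you determine each answer for a given document is left to your pipeline.

```python
def choose_mode(
    markdown_only_born_digital: bool,
    simple_layout: bool,
    needs_vlm: bool,
) -> str:
    """Walk the three selection questions in order and return a mode name."""
    # Step 1: born-digital documents that only need Markdown.
    if markdown_only_born_digital:
        return "text"
    # Step 2: straightforward layouts with no tables or forms.
    if simple_layout:
        return "structure"
    # Step 3: image descriptions, cursive handwriting, or understand fell short.
    if needs_vlm:
        return "agentic"
    return "understand"
```

For example, `choose_mode(False, False, False)` returns `"understand"`, matching the default mode.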
Handwriting
Handwriting recognition depends on both the writing style and the quality of the input image.
Match the mode to the writing style:
- Printed-style handwriting — Clearly separated letters and short entries such as names, dates, and filled-in form fields. Understand mode handles this well with character-level OCR.
- Cursive, connected, or freeform handwriting — Use agentic mode. Understand mode reads handwriting one character at a time, so connected or stylized writing produces frequent errors. The VLM in agentic mode interprets whole words and lines and is substantially more reliable for these documents.
Even in agentic mode, recognition of an ambiguous word can be confident but wrong — the model may settle on a plausible word that differs from what was actually written. For high-stakes fields, check the per-element confidence scores and add a human review step where accuracy is critical.
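A review step can start from a simple threshold filter over the returned elements. The sketch below assumes each element carries a numeric `confidence` key on a 0 to 1 scale; that field name, the scale, and the 0.9 threshold are all assumptions for illustration, so check them against an actual response before relying on them.

```python
def needs_review(elements: list[dict], threshold: float = 0.9) -> list[dict]:
    """Return elements whose confidence is missing or below the threshold.

    Assumes a numeric "confidence" key in the 0-1 range on each element;
    the real field name and scale may differ.
    """
    flagged = []
    for element in elements:
        confidence = element.get("confidence")
        # Treat a missing score as unverified and send it to review too.
        if confidence is None or confidence < threshold:
            flagged.append(element)
    return flagged
```

Since a wrong reading can still score high, a threshold like this narrows the review queue but does not replace human checks on critical fields.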
Credit costs
| Mode | Cost per page | 10-page document |
|---|---|---|
| text | 1 credit | 10 credits |
| structure | 1.5 credits | 15 credits |
| understand | 9 credits | 90 credits |
| agentic | 18 credits | 180 credits |
See pricing for more details and FAQs.
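For budgeting, the per-page rates in the table can be turned into a quick estimate. This sketch assumes cost scales linearly with page count and ignores any rounding or plan-specific rules, which are not described here; `CREDITS_PER_PAGE` and `estimate_credits` are hypothetical names.

```python
# Per-page rates copied from the credit costs table.
CREDITS_PER_PAGE = {
    "text": 1.0,
    "structure": 1.5,
    "understand": 9.0,
    "agentic": 18.0,
}


def estimate_credits(mode: str, pages: int) -> float:
    """Estimate credits for a document, assuming simple per-page billing."""
    if mode not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown mode: {mode!r}")
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return CREDITS_PER_PAGE[mode] * pages
```

With these rates, `estimate_credits("structure", 10)` gives 15.0 and `estimate_credits("agentic", 10)` gives 180.0, matching the 10-page column of the table.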