Processing modes
The Data Extraction API offers four processing modes that trade off cost, speed, and extraction depth. Every request uses exactly one mode, set via the mode parameter in the instructions.
Mode comparison
| | text | structure | understand | agentic |
|---|---|---|---|---|
| Cost per page | 1 credit | 1.5 credits | 9 credits | 18 credits |
| Speed | Fastest | Fast | Slower | Slowest |
| Output formats | Markdown only | Spatial, Markdown | Spatial, Markdown | Spatial, Markdown |
| OCR | No | Yes | Yes | Yes |
| AI augmentation | No | No | Yes | Hybrid (AI + VLM) |
| Layout analysis | No | Basic segmentation | Full AI-augmented | Hybrid (AI + VLM) |
| Word-level data | — | Yes (spatial only) | Yes (spatial only) | Yes (spatial only) |
Text mode
Text mode extracts Markdown from born-digital documents. It doesn’t run OCR or AI augmentation, making it the fastest and cheapest option.
```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"text"}'
```

```python
import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        data={"instructions": '{"mode":"text","output":{"format":"markdown"}}'},
    )

print(response.json()["output"]["markdown"])
```

```javascript
import { readFile } from "node:fs/promises";

// Node's built-in FormData expects a Blob or File, not a read stream.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "text", output: { format: "markdown" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
console.log(result.output.markdown);
```

When to use text mode:
- RAG ingestion and search indexing where you need clean Markdown from born-digital documents
- High-throughput pipelines where cost and speed matter more than spatial data
Limitations:
- Only supports the markdown output format
- No OCR: text in scanned documents or images won’t be returned
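The format restriction above can be caught on the client before a request is sent. The following is a minimal sketch, not part of the API: `SUPPORTED_FORMATS` and `build_instructions` are hypothetical names, and the format sets are copied from the mode comparison table on this page.

```python
import json

# Output formats per mode, copied from the mode comparison table.
SUPPORTED_FORMATS = {
    "text": {"markdown"},
    "structure": {"spatial", "markdown"},
    "understand": {"spatial", "markdown"},
    "agentic": {"spatial", "markdown"},
}


def build_instructions(mode: str, output_format: str) -> str:
    """Return the JSON string for the `instructions` form field (hypothetical helper)."""
    if mode not in SUPPORTED_FORMATS:
        raise ValueError(f"unknown mode: {mode!r}")
    if output_format not in SUPPORTED_FORMATS[mode]:
        raise ValueError(f"{mode} mode does not support {output_format!r} output")
    return json.dumps({"mode": mode, "output": {"format": output_format}})
```

Calling `build_instructions("text", "spatial")` raises a `ValueError` locally, so the mismatch is caught before any request is made.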
Structure mode
Structure mode runs OCR-based segmentation to extract typed document elements with bounding boxes and confidence scores. It handles scanned documents, images, and any file requiring optical character recognition.
```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"structure","output":{"format":"spatial"}}'
```

```python
import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        data={"instructions": '{"mode":"structure","output":{"format":"spatial"}}'},
    )

result = response.json()
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')
```

```javascript
import { readFile } from "node:fs/promises";

// Node's built-in FormData expects a Blob or File, not a read stream.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "structure", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});
```

When to use structure mode:
- Scanned documents and images that require OCR
- Workflows that need spatial data (bounding boxes, coordinates) at lower cost than understand mode
- Documents with straightforward layouts where AI augmentation isn’t necessary
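To judge whether structure mode segments a document well enough, it can help to tally the returned elements by type. This is a minimal sketch that relies only on the `output.elements`, `type`, and `text` fields used in the examples above. The `count_element_types` helper and the sample type values (`heading`, `paragraph`) are hypothetical, not taken from the API reference.

```python
from collections import Counter


def count_element_types(result: dict) -> Counter:
    """Count spatial-output elements by their `type` field (hypothetical helper)."""
    elements = result.get("output", {}).get("elements", [])
    return Counter(el.get("type", "unknown") for el in elements)


# Made-up response fragment for illustration; real type names may differ.
sample = {
    "output": {
        "elements": [
            {"type": "heading", "text": "Invoice"},
            {"type": "paragraph", "text": "Due in 30 days."},
            {"type": "paragraph", "text": "Thank you."},
        ]
    }
}

for element_type, count in count_element_types(sample).items():
    print(f"{element_type}: {count}")
```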
Understand mode
Understand mode runs the full extraction pipeline with AI augmentation on top of OCR. It produces the most accurate results for complex documents with tables, multicolumn layouts, nested structures, formulas, and form fields.
```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"spatial"}}'
```

```python
import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        data={"instructions": '{"mode":"understand","output":{"format":"spatial"}}'},
    )

result = response.json()
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')
```

```javascript
import { readFile } from "node:fs/promises";

// Node's built-in FormData expects a Blob or File, not a read stream.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "understand", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});
```

When to use understand mode:
- Complex documents with tables, multicolumn layouts, or nested structures
- Invoice and form processing where accurate data extraction matters
- Documents with formulas, tables, handwriting, or mixed content types
- Any workflow where extraction accuracy is more important than cost
Agentic mode
Agentic mode builds on the understand pipeline and augments it with a vision language model (VLM). The VLM improves results in areas like image descriptions, complex layout analysis, and semantic understanding. It’s designed for the most complex documents that require the deepest visual understanding.
```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"agentic","output":{"format":"spatial"}}'
```

```python
import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        data={"instructions": '{"mode":"agentic","output":{"format":"spatial"}}'},
    )

result = response.json()
for element in result["output"]["elements"]:
    print(f'{element["type"]}: {element.get("text", "")}')
```

```javascript
import { readFile } from "node:fs/promises";

// Node's built-in FormData expects a Blob or File, not a read stream.
const form = new FormData();
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "agentic", output: { format: "spatial" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
result.output.elements.forEach((el) => {
  console.log(`${el.type}: ${el.text || ""}`);
});
```

When to use agentic mode:
- The most complex documents that need the deepest visual understanding
- Documents where understand mode results need improvement in areas like image descriptions, complex layouts, or semantic classification
- Workflows where VLM-augmented extraction provides better accuracy than standard AI augmentation alone
Limitations:
- Slowest processing mode
- 18 credits per page
Choosing the right mode
The default mode is understand, which handles most documents well. Move to a different mode when you have a specific reason:
1. Do you only need Markdown from born-digital documents?
   - Yes: Use text mode. It’s the fastest and cheapest option (1 credit per page), but has no OCR.
   - No: Continue to step 2.
2. Are your documents straightforward (simple layouts, no tables or forms)?
   - Yes: Use structure mode. OCR-based extraction at lower cost (1.5 credits per page).
   - No: Continue to step 3.
3. Do your documents need VLM-augmented extraction?
   - No: Stay with understand mode (default). AI augmentation handles most complex documents.
   - Yes: Use agentic mode. VLM augmentation on top of the understand pipeline provides the deepest visual understanding for the most complex documents.
For mixed-complexity pipelines, route documents by type: Use text mode for born-digital PDFs, structure mode for scanned documents with simple layouts, understand mode for most complex documents, and agentic mode when VLM-augmented extraction is needed.
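The decision steps in this section can be sketched as a small routing function for a mixed pipeline. This is an illustration only: `choose_mode` and its three boolean parameters are hypothetical, and deciding how to classify a given document (for example, whether its layout counts as simple) is left to the caller.

```python
def choose_mode(
    markdown_only_born_digital: bool,
    simple_layout: bool,
    needs_vlm: bool,
) -> str:
    """Map the three questions from this section to a mode name (hypothetical helper)."""
    if markdown_only_born_digital:
        return "text"        # step 1: cheapest option, no OCR
    if simple_layout:
        return "structure"   # step 2: OCR-based extraction at lower cost
    if needs_vlm:
        return "agentic"     # step 3: VLM on top of the understand pipeline
    return "understand"      # the default mode


# A scanned form with tables that does not need VLM augmentation:
print(choose_mode(False, False, False))  # understand
```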
Credit costs
| Mode | Cost per page | 10-page document |
|---|---|---|
| text | 1 credit | 10 credits |
| structure | 1.5 credits | 15 credits |
| understand | 9 credits | 90 credits |
| agentic | 18 credits | 180 credits |
See pricing for more details and FAQs.
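For budgeting, the per-page rates in the credit costs table can be turned into a quick estimate. This is a rough sketch that assumes billing is strictly per page, with no rounding rules, minimums, or discounts, none of which are stated here. `CREDITS_PER_PAGE` and `estimate_credits` are hypothetical names.

```python
from decimal import Decimal

# Per-page rates copied from the credit costs table.
CREDITS_PER_PAGE = {
    "text": Decimal("1"),
    "structure": Decimal("1.5"),
    "understand": Decimal("9"),
    "agentic": Decimal("18"),
}


def estimate_credits(mode: str, pages: int) -> Decimal:
    """Estimate credits for one document (assumes simple per-page billing)."""
    if mode not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown mode: {mode!r}")
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return CREDITS_PER_PAGE[mode] * pages


# Compare all four modes for a 10-page document.
for mode in CREDITS_PER_PAGE:
    print(mode, estimate_credits(mode, 10))
```

`Decimal` keeps the 1.5-credit rate exact when estimates are summed across many documents.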