Processing modes

The Data Extraction API offers four processing modes that trade off cost, speed, and extraction depth. Every request uses exactly one mode, set via the mode parameter in the instructions.

Mode comparison

| | text | structure | understand | agentic |
| --- | --- | --- | --- | --- |
| Cost per page | 1 credit | 1.5 credits | 9 credits | 18 credits |
| Speed | Fastest | Fast | Slower | Slowest |
| Output formats | Markdown only | Spatial, Markdown | Spatial, Markdown | Spatial, Markdown |
| OCR | No | Yes | Yes | Yes |
| AI augmentation | No | No | Yes | Hybrid (AI + VLM) |
| Layout analysis | No | Basic segmentation | Full AI-augmented | Hybrid (AI + VLM) |
| Word-level data | No | Yes (spatial only) | Yes (spatial only) | Yes (spatial only) |

Text mode

Text mode extracts Markdown from born-digital documents. It doesn’t run optical character recognition (OCR) or AI augmentation, making it the fastest and cheapest option.

curl -X POST https://api.nutrient.io/extraction/parse \
-H "Authorization: Bearer your_api_key_goes_here" \
-F "file=@document.pdf" \
-F 'instructions={"mode":"text"}'

When to use text mode:

  • Retrieval-augmented generation (RAG) ingestion and search indexing where you need clean Markdown from born-digital documents
  • High-throughput pipelines where cost and speed matter more than spatial data

Limitations:

  • Only supports markdown output format
  • No OCR — text in scanned documents or images won’t be returned
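
The Markdown-only limitation can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_instructions` is a hypothetical helper, and it assumes that `"markdown"` is an accepted `format` value alongside the `"spatial"` value shown in this guide. Falling back to `spatial` when no format is given is this sketch's choice, not a documented API default.

```shell
# Hypothetical helper: print the `instructions` JSON for a mode and an
# optional output format, rejecting combinations this guide rules out.
build_instructions() {
  local mode="$1" format="${2:-}"
  case "$mode" in
    text)
      # Text mode supports Markdown output only.
      if [ -n "$format" ] && [ "$format" != "markdown" ]; then
        echo "text mode only supports markdown output" >&2
        return 1
      fi
      printf '{"mode":"text"}\n'
      ;;
    structure|understand|agentic)
      # Assumption: "markdown" is a valid format value next to "spatial".
      case "${format:-spatial}" in
        spatial|markdown)
          printf '{"mode":"%s","output":{"format":"%s"}}\n' \
            "$mode" "${format:-spatial}"
          ;;
        *)
          echo "unknown format: $format" >&2
          return 1
          ;;
      esac
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}
```

The output can be passed to the request as `-F "instructions=$(build_instructions structure spatial)"`.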

Structure mode

Structure mode runs OCR-based segmentation to extract typed document elements with bounding boxes and confidence scores. It handles scanned documents, images, and any other file that requires OCR.

curl -X POST https://api.nutrient.io/extraction/parse \
-H "Authorization: Bearer your_api_key_goes_here" \
-F "file=@document.pdf" \
-F 'instructions={"mode":"structure","output":{"format":"spatial"}}'

When to use structure mode:

  • Scanned documents and images that require OCR
  • Workflows that need spatial data (bounding boxes, coordinates) at lower cost than understand mode
  • Documents with straightforward layouts where AI augmentation isn’t necessary

Understand mode

Understand mode runs the full extraction pipeline with AI augmentation on top of OCR. It produces the most accurate results for complex documents with tables, multicolumn layouts, nested structures, formulas, and form fields.

curl -X POST https://api.nutrient.io/extraction/parse \
-H "Authorization: Bearer your_api_key_goes_here" \
-F "file=@document.pdf" \
-F 'instructions={"mode":"understand","output":{"format":"spatial"}}'

When to use understand mode:

  • Complex documents with tables, multicolumn layouts, or nested structures
  • Invoice and form processing where accurate data extraction matters
  • Documents with formulas, tables, printed-style handwriting, or mixed content types
  • Any workflow where extraction accuracy is more important than cost

Agentic mode

Agentic mode builds on the understand pipeline and augments it with a vision language model (VLM). The VLM improves results in areas like image descriptions, complex layout analysis, and semantic understanding. It’s designed for the most complex documents that require the deepest visual understanding.

curl -X POST https://api.nutrient.io/extraction/parse \
-H "Authorization: Bearer your_api_key_goes_here" \
-F "file=@document.pdf" \
-F 'instructions={"mode":"agentic","output":{"format":"spatial"}}'

When to use agentic mode:

  • Documents with embedded images, charts, or diagrams where you need generated descriptions, not just classification
  • Degraded scans, faxes, low-quality images, or documents where understand mode produces visible gaps in extracted text
  • Cursive, connected, or freeform handwriting, as well as dense or annotated handwriting (forms with handwritten markup, filled-in government forms), where understand mode’s character-level handwriting recognition isn’t sufficient
  • Workflows where you’ve tested understand mode on a representative sample and the quality isn’t sufficient for your use case

Limitations:

  • Slowest processing mode
  • 18 credits per page

Choosing the right mode

The default mode is understand, which handles most documents well. Move to a different mode when you have a specific reason:

  1. Do you only need Markdown from born-digital documents?
    • Yes — Use text mode. It’s the fastest and cheapest option (1 credit per page), but has no OCR.
    • No — Continue to step 2.
  2. Are your documents straightforward (simple layouts, no tables or forms)?
    • Yes — Use structure mode. OCR-based extraction at lower cost (1.5 credits per page).
    • No — Continue to step 3.
  3. Do your documents need image descriptions or contain cursive handwriting, or has understand mode produced insufficient quality on a representative sample?
    • No — Stay with understand mode (default). It already handles tables, forms, formulas, printed-style handwriting, and multicolumn layouts without VLM.
    • Yes — Use agentic mode. VLM augmentation on top of the understand pipeline adds descriptions for embedded images and a quality lift on degraded scans, cursive or freeform handwriting, and other hard-to-read content where understand mode falls short.

For mixed-complexity pipelines, route documents by type: Use text mode for born-digital PDFs, structure mode for scanned documents with simple layouts, understand mode for most complex documents, and agentic mode when VLM-augmented extraction is needed.
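
The three decision steps above can be sketched as a small shell function for a routing script. `choose_mode` and its yes/no arguments are hypothetical names used for illustration. How a pipeline answers each question (for example, detecting born-digital PDFs) is left open here.

```shell
# Hypothetical helper mirroring the three questions above.
# Each argument is "yes" or "no":
#   $1  only Markdown from born-digital documents is needed
#   $2  documents are straightforward (simple layouts, no tables or forms)
#   $3  image descriptions or cursive handwriting are involved, or
#       understand mode fell short on a representative sample
choose_mode() {
  local markdown_only="$1" simple_layout="$2" needs_vlm="$3"
  if [ "$markdown_only" = "yes" ]; then
    echo "text"
  elif [ "$simple_layout" = "yes" ]; then
    echo "structure"
  elif [ "$needs_vlm" = "yes" ]; then
    echo "agentic"
  else
    # Default mode: handles tables, forms, and formulas without the VLM.
    echo "understand"
  fi
}
```

For example, `choose_mode no no yes` prints `agentic`, and `choose_mode no no no` prints `understand`.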

Handwriting

Handwriting recognition depends on both the writing style and the quality of the input image.

Match the mode to the writing style:

  • Printed-style handwriting — Clearly separated letters and short entries such as names, dates, and filled-in form fields. Understand mode handles this well with character-level OCR.
  • Cursive, connected, or freeform handwriting — Use agentic mode. Understand mode reads handwriting one character at a time, so connected or stylized writing produces frequent errors. The VLM in agentic mode interprets whole words and lines and is substantially more reliable for these documents.

Even in agentic mode, recognition of an ambiguous word can be confident but wrong — the model may settle on a plausible word that differs from what was actually written. For high-stakes fields, check the per-element confidence scores and add a human review step where accuracy is critical.
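
A review gate for high-stakes fields might look like the sketch below. It rests on assumptions the text does not state: that confidence scores are decimals between 0 and 1, and that 0.9 is a reasonable default cutoff. `needs_review` is a hypothetical helper, and how the score is read from the response is not shown.

```shell
# Hypothetical helper: succeed (exit 0) when a field's confidence score is
# below the threshold and should go to human review.
# Assumption: scores are decimals between 0 and 1.
needs_review() {
  local confidence="$1" threshold="${2:-0.9}"
  # awk performs the floating-point comparison; exit status 0 means c < t.
  awk -v c="$confidence" -v t="$threshold" 'BEGIN { exit !(c < t) }'
}

if needs_review 0.72; then
  echo "send to human review"
fi
```

Because a misread word can still carry a high score, a threshold like this complements human review of critical fields and does not replace it.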

Credit costs

| Mode | Cost per page | 10-page document |
| --- | --- | --- |
| text | 1 credit | 10 credits |
| structure | 1.5 credits | 15 credits |
| understand | 9 credits | 90 credits |
| agentic | 18 credits | 180 credits |
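
The per-document figures are the page count multiplied by the per-page rate. A minimal sketch of that arithmetic follows. `estimate_credits` is a hypothetical helper, and how fractional totals are rounded for billing is not specified here, so the function reports the raw product.

```shell
# Hypothetical helper: estimate credits as pages x per-page rate,
# using the rates from the table above.
estimate_credits() {
  local mode="$1" pages="$2" rate
  case "$mode" in
    text)       rate=1 ;;
    structure)  rate=1.5 ;;
    understand) rate=9 ;;
    agentic)    rate=18 ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  # awk handles the fractional structure rate; billing rounding is unknown.
  awk -v r="$rate" -v p="$pages" 'BEGIN { print r * p }'
}
```

For a 10-page document, `estimate_credits structure 10` prints `15` and `estimate_credits agentic 10` prints `180`, matching the table.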

See pricing for more details and FAQs.