Processing modes

The Data Extraction API offers four processing modes that trade off cost, speed, and extraction depth. Every request runs in exactly one mode, set via the mode field of the instructions parameter.

Mode comparison

| Feature | text | structure | understand | agentic |
|---|---|---|---|---|
| Cost per page | 1 credit | 1.5 credits | 9 credits | 18 credits |
| Speed | Fastest | Fast | Slower | Slowest |
| Output formats | Markdown only | Spatial, Markdown | Spatial, Markdown | Spatial, Markdown |
| OCR | No | Yes | Yes | Yes |
| AI augmentation | No | No | Yes | Hybrid (AI + VLM) |
| Layout analysis | No | Basic segmentation | Full AI-augmented | Hybrid (AI + VLM) |
| Word-level data | No | Yes (spatial only) | Yes (spatial only) | Yes (spatial only) |
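The comparison above can be read as a lookup: given the capabilities a workflow needs, pick the cheapest mode that has them all. The following is a minimal sketch of that idea. The MODES dictionary, its key names, and the cheapest_mode function are hypothetical helpers, not part of the API; only the per-mode values come from the comparison.

```python
# Values transcribed from the mode comparison. The dictionary layout and the
# capability key names ("ocr", "ai", "vlm", "spatial") are illustrative only.
MODES = {
    "text":       {"credits": 1,   "ocr": False, "ai": False, "vlm": False, "spatial": False},
    "structure":  {"credits": 1.5, "ocr": True,  "ai": False, "vlm": False, "spatial": True},
    "understand": {"credits": 9,   "ocr": True,  "ai": True,  "vlm": False, "spatial": True},
    "agentic":    {"credits": 18,  "ocr": True,  "ai": True,  "vlm": True,  "spatial": True},
}


def cheapest_mode(**required):
    """Return the lowest-cost mode that has every capability passed as True."""
    candidates = [
        name
        for name, caps in MODES.items()
        # Capabilities passed as False are treated as "don't care".
        if all(caps[key] for key, wanted in required.items() if wanted)
    ]
    return min(candidates, key=lambda name: MODES[name]["credits"])
```

For example, cheapest_mode(ocr=True) selects structure, while cheapest_mode(ocr=True, ai=True) selects understand. A lookup like this only captures the capabilities listed in the comparison; it says nothing about accuracy on a particular document set.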

Text mode

Text mode extracts Markdown from born-digital documents. It doesn’t run OCR or AI augmentation, which makes it the fastest and cheapest option.

curl -X POST https://api.nutrient.io/extraction/parse \
-H "Authorization: Bearer your_api_key_goes_here" \
-F "file=@document.pdf" \
-F 'instructions={"mode":"text"}'

When to use text mode:

  • RAG ingestion and search indexing where you need clean Markdown from born-digital documents
  • High-throughput pipelines where cost and speed matter more than spatial data

Limitations:

  • Supports only the markdown output format, so no spatial data is returned
  • No OCR — text in scanned documents or images won’t be returned
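The Markdown-only limitation can be caught client-side before a request is sent. The sketch below builds the JSON string for the instructions form field and rejects a spatial request in text mode. The build_instructions function is a hypothetical helper, and the format value "markdown" is an assumption inferred from the limitation above; only "spatial" appears in the request examples on this page.

```python
import json

VALID_MODES = ("text", "structure", "understand", "agentic")


def build_instructions(mode, output_format=None):
    """Return the JSON string to send as the `instructions` form field."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Text mode has no spatial output, so fail early instead of at the API.
    if mode == "text" and output_format not in (None, "markdown"):
        raise ValueError("text mode supports only the markdown output format")
    instructions = {"mode": mode}
    if output_format is not None:
        instructions["output"] = {"format": output_format}
    # Compact separators reproduce the payload shape used in the curl examples.
    return json.dumps(instructions, separators=(",", ":"))
```

The returned string can be passed to curl as the value of -F 'instructions=...', or sent as the instructions field of a multipart form from any HTTP client.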

Structure mode

Structure mode runs OCR-based segmentation to extract typed document elements with bounding boxes and confidence scores. It handles scanned documents, images, and any file requiring optical character recognition.

curl -X POST https://api.nutrient.io/extraction/parse \
-H "Authorization: Bearer your_api_key_goes_here" \
-F "file=@document.pdf" \
-F 'instructions={"mode":"structure","output":{"format":"spatial"}}'

When to use structure mode:

  • Scanned documents and images that require OCR
  • Workflows that need spatial data (bounding boxes, coordinates) at lower cost than understand mode
  • Documents with straightforward layouts where AI augmentation isn’t necessary

Understand mode

Understand mode runs the full extraction pipeline with AI augmentation on top of OCR. It produces the most accurate results for complex documents with tables, multicolumn layouts, nested structures, formulas, and form fields.

curl -X POST https://api.nutrient.io/extraction/parse \
-H "Authorization: Bearer your_api_key_goes_here" \
-F "file=@document.pdf" \
-F 'instructions={"mode":"understand","output":{"format":"spatial"}}'

When to use understand mode:

  • Complex documents with tables, multicolumn layouts, or nested structures
  • Invoice and form processing where accurate data extraction matters
  • Documents with formulas, handwriting, or mixed content types
  • Any workflow where extraction accuracy is more important than cost

Agentic mode

Agentic mode builds on the understand pipeline and augments it with a vision language model (VLM). The VLM improves results in areas like image descriptions, complex layout analysis, and semantic understanding. It’s designed for the most complex documents that require the deepest visual understanding.

curl -X POST https://api.nutrient.io/extraction/parse \
-H "Authorization: Bearer your_api_key_goes_here" \
-F "file=@document.pdf" \
-F 'instructions={"mode":"agentic","output":{"format":"spatial"}}'

When to use agentic mode:

  • The most complex documents that need the deepest visual understanding
  • Documents where understand mode results need improvement in areas like image descriptions, complex layouts, or semantic classification
  • Workflows where VLM-augmented extraction provides better accuracy than standard AI augmentation alone

Limitations:

  • Slowest processing mode
  • Most expensive mode: 18 credits per page, twice the cost of understand mode

Choosing the right mode

The default mode is understand, which handles most documents well. Move to a different mode when you have a specific reason:

  1. Do you only need Markdown from born-digital documents?
    • Yes — Use text mode. It’s the fastest and cheapest option (1 credit per page), but has no OCR.
    • No — Continue to step 2.
  2. Are your documents straightforward (simple layouts, no tables or forms)?
    • Yes — Use structure mode. OCR-based extraction at lower cost (1.5 credits per page).
    • No — Continue to step 3.
  3. Do your documents need VLM-augmented extraction?
    • No — Stay with understand mode (default). AI augmentation handles most complex documents.
    • Yes — Use agentic mode. VLM augmentation on top of the understand pipeline provides the deepest visual understanding for the most complex documents.
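The three questions above can be sketched as a small function that asks them in the same order. The function name and its boolean parameters are hypothetical; how each answer is determined for a given document (for example, detecting whether a PDF is born-digital) is left to the caller.

```python
def choose_mode(markdown_only_born_digital, simple_layout, needs_vlm):
    """Walk the three questions in order and return the mode name."""
    # Step 1: only Markdown needed, from born-digital documents (no OCR).
    if markdown_only_born_digital:
        return "text"
    # Step 2: straightforward layouts with no tables or forms.
    if simple_layout:
        return "structure"
    # Step 3: VLM-augmented extraction required.
    if needs_vlm:
        return "agentic"
    # Otherwise stay with the default.
    return "understand"
```

Because the questions are checked in order, an earlier "yes" wins: a born-digital document that only needs Markdown is routed to text mode even if the other flags are set.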

For mixed-complexity pipelines, route documents by type: use text mode for born-digital PDFs, structure mode for scanned documents with simple layouts, understand mode for most documents with complex layouts, and agentic mode when VLM-augmented extraction is needed.

Credit costs

| Mode | Cost per page | 10-page document |
|---|---|---|
| text | 1 credit | 10 credits |
| structure | 1.5 credits | 15 credits |
| understand | 9 credits | 90 credits |
| agentic | 18 credits | 180 credits |
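For a pipeline that routes documents to different modes, the per-page rates above can be summed to budget a batch. This is a rough sketch that assumes cost scales linearly with page count, as the 10-page column suggests; any rounding or minimum charges applied by actual billing are not covered here. The estimate_credits function is a hypothetical helper.

```python
from decimal import Decimal

# Per-page rates from the credit costs above. Decimal keeps 1.5 exact and
# avoids float formatting surprises when totals are displayed.
CREDITS_PER_PAGE = {
    "text": Decimal("1"),
    "structure": Decimal("1.5"),
    "understand": Decimal("9"),
    "agentic": Decimal("18"),
}


def estimate_credits(batch):
    """Sum the credits for an iterable of (mode, page_count) pairs."""
    return sum(
        (CREDITS_PER_PAGE[mode] * pages for mode, pages in batch),
        Decimal("0"),
    )
```

Calling estimate_credits([("text", 10), ("understand", 10)]) adds the 10 credits for the text document to the 90 credits for the understand document.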

See pricing for more details and FAQs.