Extract document elements

When output.format is set to spatial, the Data Extraction API returns a flat list of typed document elements. Each element includes its type, text content, spatial coordinates, detection confidence, and page reference.

Basic element extraction

Send a document and receive structured spatial elements.

curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"spatial"}}'
# Extract data from the JSON response.
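
The instructions form field is a JSON string, and quoting it by hand is error-prone. The following is a minimal Python sketch that builds the same payload. The helper name build_instructions is hypothetical and not part of the API; the mode check reflects this page's statement that spatial output is available in structure, understand, and agentic modes but not in text mode.

```python
import json

# Modes that this page lists as supporting spatial output.
SPATIAL_MODES = {"structure", "understand", "agentic"}


def build_instructions(mode: str = "understand", include_words: bool = False) -> str:
    """Return the JSON string for the `instructions` form field (hypothetical helper)."""
    if mode not in SPATIAL_MODES:
        raise ValueError(f"mode {mode!r} does not support spatial output")
    output = {"format": "spatial"}
    if include_words:
        # Opt in to word-level OCR data (see "Word-level data").
        output["includeWords"] = True
    # Compact separators reproduce the form used in the curl example.
    return json.dumps({"mode": mode, "output": output}, separators=(",", ":"))


print(build_instructions())
```

The returned string can be passed as the instructions field of a multipart request by whichever HTTP client you use.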

Element types

The API returns six element types. All types are available in structure, understand, and agentic modes (text mode does not support spatial output).

| Type | Description | Key fields |
| --- | --- | --- |
| paragraph | Text content with semantic role | text, role, words |
| table | Structured table with cell data | rowCount, columnCount, cells |
| formula | Mathematical expression | latex |
| picture | Image, chart, or diagram | classification, altDescription |
| keyValueRegion | Form fields and key-value pairs | pairs |
| handwriting | Handwritten text content | text, words |
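
Because the list is flat, a common first step is to bucket elements by type. The sketch below assumes the response has already been parsed into a Python list of element dicts; this page does not show the response envelope, so obtaining that list is left out, and the sample data is handwritten rather than real API output.

```python
from collections import defaultdict


def group_by_type(elements):
    """Bucket a flat list of element dicts by their `type` field."""
    groups = defaultdict(list)
    for element in elements:
        groups[element["type"]].append(element)
    return dict(groups)


# Handwritten sample data, not real API output.
sample = [
    {"type": "paragraph", "text": "Revenue Summary"},
    {"type": "table", "rowCount": 3, "columnCount": 3},
    {"type": "paragraph", "text": "Figures are in USD."},
]
groups = group_by_type(sample)
print({kind: len(items) for kind, items in groups.items()})
```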

Common fields

Every element includes these fields:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier (UUID) |
| type | string | Element type (paragraph, table, formula, picture, keyValueRegion, handwriting) |
| bounds | object | Bounding box with x, y, width, height. Origin at top-left. See coordinate spaces. |
| confidence | number | Detection confidence between 0 and 1 |
| readingOrder | integer | Position in the page-reading sequence |
| page | object | Source page with pageIndex (0-based), pageNumber (1-based integer), width, and height |
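
The common fields are enough to filter out weak detections and restore reading order. The sketch below sorts by page.pageIndex first and readingOrder second, on the assumption that readingOrder is a per-page position, as the field description suggests. The function name and the 0.5 threshold are illustrative choices, not API defaults.

```python
def in_reading_order(elements, min_confidence=0.0):
    """Return elements at or above min_confidence, sorted by page, then reading order."""
    kept = [e for e in elements if e["confidence"] >= min_confidence]
    return sorted(kept, key=lambda e: (e["page"]["pageIndex"], e["readingOrder"]))


# Handwritten sample data with shortened ids, not real API output.
sample = [
    {"id": "b", "confidence": 0.92, "readingOrder": 1, "page": {"pageIndex": 0}},
    {"id": "c", "confidence": 0.40, "readingOrder": 0, "page": {"pageIndex": 1}},
    {"id": "a", "confidence": 0.95, "readingOrder": 0, "page": {"pageIndex": 0}},
]
ordered = in_reading_order(sample, min_confidence=0.5)
print([e["id"] for e in ordered])
```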

Paragraph elements

Paragraphs cover all text content. The role field identifies the semantic function:

{
  "type": "paragraph",
  "role": "SectionHeader",
  "text": "Revenue Summary",
  "confidence": 0.95,
  "readingOrder": 0,
  "bounds": { "x": 100, "y": 50, "width": 400, "height": 35 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 }
}

Available roles:

| Role | Description |
| --- | --- |
| Text | Body paragraphs |
| Title | Document title |
| SectionHeader | Section headings |
| Header | Running page headers |
| Footer | Running page footers |
| Caption | Figure or table captions |
| Footnote | Footnotes |
| ListItem | List items (ordered or unordered) |
| PageNumber | Page number labels |
| Code | Code blocks |
| CheckboxSelected | Selected checkbox |
| CheckboxUnselected | Unselected checkbox |

The role is null when the API cannot determine the semantic function.
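
Roles make it possible to build an outline or to drop repeated page elements before further processing. The following is a sketch under the assumption that elements is a list of element dicts already in reading order; the function names and the choice of which roles to skip are illustrative. Paragraphs with a null role are kept, since the API could not classify them.

```python
# Roles that repeat on every page; skipping them is an illustrative choice.
SKIPPED_ROLES = {"Header", "Footer", "PageNumber"}
OUTLINE_ROLES = {"Title", "SectionHeader"}


def outline(elements):
    """Return the text of title and section-heading paragraphs."""
    return [
        e["text"]
        for e in elements
        if e["type"] == "paragraph" and e.get("role") in OUTLINE_ROLES
    ]


def body_text(elements):
    """Return paragraph text, skipping running headers, footers, and page numbers."""
    return [
        e["text"]
        for e in elements
        if e["type"] == "paragraph" and e.get("role") not in SKIPPED_ROLES
    ]


# Handwritten sample data, not real API output.
sample = [
    {"type": "paragraph", "role": "Header", "text": "ACME Corp, confidential"},
    {"type": "paragraph", "role": "Title", "text": "Annual Report"},
    {"type": "paragraph", "role": "SectionHeader", "text": "Revenue Summary"},
    {"type": "paragraph", "role": "Text", "text": "Revenue grew in all regions."},
    {"type": "paragraph", "role": None, "text": "Unclassified line"},
    {"type": "paragraph", "role": "PageNumber", "text": "1"},
]
print(outline(sample))
print(body_text(sample))
```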

Table elements

Tables include row and column counts, plus cell-level data with text, bounds, and span information:

{
  "type": "table",
  "confidence": 0.92,
  "readingOrder": 2,
  "bounds": { "x": 100, "y": 150, "width": 600, "height": 120 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 },
  "rowCount": 3,
  "columnCount": 3,
  "cells": [
    {
      "id": "c-001",
      "bounds": { "x": 100, "y": 150, "width": 200, "height": 30 },
      "confidence": 0.94,
      "row": 0,
      "column": 0,
      "rowSpan": 1,
      "colSpan": 1,
      "text": "Region"
    }
  ],
  "captionIds": null,
  "footnoteIds": null
}

Each cell includes row, column, rowSpan, and colSpan for reconstructing the table layout. captionIds and footnoteIds reference associated paragraph elements by their id.
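
The layout reconstruction can be sketched as follows. The sketch assumes that row and column are 0-based (as in the example above) and that a span extends downward and to the right from the cell's own position; neither point is stated explicitly on this page. Repeating a spanned cell's text in every position it covers is a design choice that keeps the grid rectangular. Positions with no cell stay None.

```python
def table_to_grid(table):
    """Expand a table element's cells into a rowCount x columnCount grid of strings."""
    grid = [[None] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        # Copy the cell text into every grid position the cell spans.
        for r in range(cell["row"], cell["row"] + cell["rowSpan"]):
            for c in range(cell["column"], cell["column"] + cell["colSpan"]):
                grid[r][c] = cell["text"]
    return grid


# Handwritten sample: a header cell spanning two columns above one data row.
sample_table = {
    "type": "table",
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"row": 0, "column": 0, "rowSpan": 1, "colSpan": 2, "text": "Q1 revenue"},
        {"row": 1, "column": 0, "rowSpan": 1, "colSpan": 1, "text": "North"},
        {"row": 1, "column": 1, "rowSpan": 1, "colSpan": 1, "text": "120"},
    ],
}
grid = table_to_grid(sample_table)
for row in grid:
    print(row)
```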

Formula elements

Formulas contain a LaTeX representation of the detected mathematical expression:

{
  "type": "formula",
  "confidence": 0.88,
  "readingOrder": 3,
  "bounds": { "x": 100, "y": 300, "width": 250, "height": 40 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 },
  "latex": "r = r_0 e^{kt}"
}

Picture elements

Pictures include a classification, a classification confidence, and an AI-generated alt text description:

{
  "type": "picture",
  "confidence": 0.89,
  "readingOrder": 2,
  "bounds": { "x": 100, "y": 300, "width": 400, "height": 300 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 },
  "classification": "chart",
  "classificationConfidence": 0.91,
  "altDescription": "Bar chart showing quarterly revenue growth across regions",
  "captionIds": ["d1e2f3a4-4444-4000-8000-000000000004"],
  "footnoteIds": null
}
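
Since captionIds holds element ids rather than text, resolving a caption means looking up the referenced paragraph in the same element list. The following is a minimal sketch, assuming elements is the full list of element dicts from one response. The helper name is hypothetical, and the short ids in the sample are placeholders for the UUIDs the API returns.

```python
def resolve_captions(element, elements):
    """Return the text of each paragraph referenced by element's captionIds."""
    by_id = {e["id"]: e for e in elements if "id" in e}
    # captionIds may be null, so fall back to an empty list.
    caption_ids = element.get("captionIds") or []
    return [by_id[cid]["text"] for cid in caption_ids if cid in by_id]


# Handwritten sample with placeholder ids, not real API output.
picture = {"id": "pic-1", "type": "picture", "captionIds": ["cap-1"]}
table = {"id": "tbl-1", "type": "table", "captionIds": None}
caption = {
    "id": "cap-1",
    "type": "paragraph",
    "role": "Caption",
    "text": "Figure 1: Quarterly revenue by region",
}
elements = [picture, table, caption]

print(resolve_captions(picture, elements))
print(resolve_captions(table, elements))
```

The same lookup works for footnoteIds by reading that field instead.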

Key-value region elements

Key-value regions detect form fields and structured label-value pairs:

{
  "type": "keyValueRegion",
  "confidence": 0.87,
  "readingOrder": 4,
  "bounds": { "x": 100, "y": 700, "width": 500, "height": 100 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 },
  "pairs": [
    {
      "id": "kvp-001",
      "key": {
        "id": "kve-001",
        "bounds": { "x": 100, "y": 700, "width": 150, "height": 25 },
        "confidence": 0.92,
        "entityType": "QUESTION",
        "value": "Invoice Number"
      },
      "value": {
        "id": "kve-002",
        "bounds": { "x": 260, "y": 700, "width": 200, "height": 25 },
        "confidence": 0.95,
        "entityType": "ANSWER",
        "value": "INV-2024-0042"
      },
      "relationshipConfidence": 0.93
    }
  ]
}
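
For form processing, the pairs often need to be flattened into a plain mapping. The sketch below assumes every pair carries both a key and a value object, as in the example above; this page does not say whether either can be missing. The helper name and the confidence threshold are illustrative. If the same label appears twice, the later pair overwrites the earlier one.

```python
def pairs_to_dict(region, min_confidence=0.0):
    """Flatten a keyValueRegion element into {label: value}."""
    result = {}
    for pair in region["pairs"]:
        # Skip pairs whose key-to-value link is below the threshold.
        if pair["relationshipConfidence"] >= min_confidence:
            result[pair["key"]["value"]] = pair["value"]["value"]
    return result


# Trimmed copy of the example above; fields the helper does not read are omitted.
region = {
    "type": "keyValueRegion",
    "pairs": [
        {
            "key": {"entityType": "QUESTION", "value": "Invoice Number"},
            "value": {"entityType": "ANSWER", "value": "INV-2024-0042"},
            "relationshipConfidence": 0.93,
        }
    ],
}
print(pairs_to_dict(region, min_confidence=0.9))
```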

Handwriting elements

Handwriting elements contain extracted handwritten text. Like paragraphs, they support optional word-level OCR data via includeWords:

{
  "type": "handwriting",
  "confidence": 0.78,
  "readingOrder": 5,
  "bounds": { "x": 30, "y": 320, "width": 200, "height": 30 },
  "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 },
  "text": "John Doe",
  "words": null
}

When includeWords is true, the words array contains per-word text, bounds, and confidence, in the same format as paragraph word-level data.

Word-level data

Set output.includeWords to true to get word-level OCR data nested inside paragraph, handwriting, and table cell elements.

curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"spatial","includeWords":true}}'

Each word object includes:

| Field | Type | Description |
| --- | --- | --- |
| text | string | The word text |
| bounds | object | Bounding box in document coordinate space |
| confidence | number | OCR confidence between 0 and 1 |
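
Word-level bounds are useful for locating a term on the page, for example to highlight or redact it. The sketch below searches only the top-level words array of paragraph and handwriting elements. Words nested in table cells are not covered, because this page does not show how they are attached to a cell. The function name is hypothetical.

```python
def find_word(elements, needle):
    """Yield (page_number, bounds) for each word equal to needle, ignoring case."""
    target = needle.lower()
    for element in elements:
        # `words` is null unless includeWords was requested.
        for word in element.get("words") or []:
            if word["text"].lower() == target:
                yield element["page"]["pageNumber"], word["bounds"]


# Handwritten sample data, not real API output.
sample = [
    {
        "type": "paragraph",
        "page": {"pageNumber": 1},
        "text": "Total revenue",
        "words": [
            {"text": "Total", "confidence": 0.98,
             "bounds": {"x": 100, "y": 50, "width": 60, "height": 20}},
            {"text": "revenue", "confidence": 0.97,
             "bounds": {"x": 170, "y": 50, "width": 90, "height": 20}},
        ],
    },
    {"type": "handwriting", "page": {"pageNumber": 2}, "text": "John Doe", "words": None},
]
hits = list(find_word(sample, "REVENUE"))
print(hits)
```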

Comparing spatial modes

The structure, understand, and agentic modes all return the same element types and output structure. The difference is in extraction depth and cost.

| Aspect | structure | understand | agentic |
| --- | --- | --- | --- |
| Speed | Fast | Slower | Slowest |
| Cost | 1.5 credits per page | 9 credits per page | 18 credits per page |
| Pipeline | OCR-based segmentation | AI-augmented layout analysis | Hybrid (AI + VLM) layout analysis |
| Best for | Scanned documents, straightforward layouts | Complex layouts, tables, forms | The most complex documents needing VLM |

Use structure mode when you need spatial elements and the documents have straightforward layouts. Use understand mode for complex documents with tables, multicolumn layouts, or mixed content types. Use agentic mode for the most complex documents that benefit from VLM-augmented extraction. See processing modes for a full comparison, including text mode.
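
The per-page rates above make it easy to compare modes for a given job before running it. This is a rough sketch that assumes cost scales linearly with page count, with no minimums, rounding, or volume pricing, none of which this page addresses. The function name and the 40-page document are illustrative.

```python
# Credits per page, taken from the comparison table above.
CREDITS_PER_PAGE = {"structure": 1.5, "understand": 9, "agentic": 18}


def estimate_credits(mode, page_count):
    """Estimate credits for a document, assuming strictly linear per-page pricing."""
    return CREDITS_PER_PAGE[mode] * page_count


for mode in CREDITS_PER_PAGE:
    print(mode, estimate_credits(mode, 40))
```

Under that assumption, a 40-page document costs 60 credits in structure mode, 360 in understand mode, and 720 in agentic mode.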