Extract Markdown

When output.format is set to markdown, the Data Extraction API converts the document into a Markdown string. This is useful for RAG (retrieval-augmented generation) pipelines, search indexing, and content migration workflows where structured text is more practical than spatial element data.

Basic Markdown extraction

Send a document to the extraction endpoint and request markdown output to receive the converted text.

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"text","output":{"format":"markdown"}}'
```
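If you're calling the API from Python rather than the shell, the following is a minimal sketch of the same multipart request using only the standard library. It assumes the endpoint and the `instructions` form field shown in the curl command. The helper names `build_multipart` and `extract_markdown` are hypothetical, and the file part is sent as `application/octet-stream`, which may differ from the content type curl chooses.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.nutrient.io/extraction/parse"


def build_multipart(file_name: str, file_bytes: bytes, instructions: dict) -> tuple[bytes, str]:
    """Encode an `instructions` part and a `file` part as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    crlf = b"\r\n"
    body = b"".join([
        # `instructions` field: the same JSON string curl sends with -F
        f"--{boundary}".encode(), crlf,
        b'Content-Disposition: form-data; name="instructions"', crlf, crlf,
        json.dumps(instructions).encode(), crlf,
        # `file` field: the raw document bytes
        f"--{boundary}".encode(), crlf,
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"'.encode(), crlf,
        b"Content-Type: application/octet-stream", crlf, crlf,
        file_bytes, crlf,
        # Closing delimiter
        f"--{boundary}--".encode(), crlf,
    ])
    return body, f"multipart/form-data; boundary={boundary}"


def extract_markdown(path: str, api_key: str, mode: str = "text") -> dict:
    """Upload a local file and return the decoded JSON response (performs a network call)."""
    with open(path, "rb") as fh:
        data = fh.read()
    instructions = {"mode": mode, "output": {"format": "markdown"}}
    body, content_type = build_multipart(os.path.basename(path), data, instructions)
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

A call such as `extract_markdown("document.pdf", "your_api_key_goes_here")` mirrors the curl command above. Building the body separately keeps the encoding step checkable without a network connection.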

Response format

The Markdown output is returned in output.markdown as a single string:

```json
{
  "status": 200,
  "requestId": "req_a1b2c3d4",
  "output": {
    "markdown": "# Document Title\n\nFirst paragraph of text...\n\n## Section Two\n\nMore content here..."
  },
  "metrics": {
    "processingTimeMs": 312,
    "pagesProcessed": 1
  },
  "usage": {
    "data_extraction_credits": {
      "cost": 1,
      "remainingCredits": 850
    }
  },
  "configuration": {
    "mode": "text",
    "outputFormat": "markdown"
  }
}
```

The Markdown preserves document structure, including headings, paragraphs, lists, tables, and code blocks.
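Because headings survive the conversion, a common RAG preprocessing step is to split the Markdown at headings. The sketch below assumes the response shape shown above: it reads `output.markdown` and splits it into heading-delimited chunks. `markdown_from_response` and `chunk_by_heading` are hypothetical helpers, not part of any Nutrient SDK, and heading-based splitting is only one of many chunking strategies.

```python
import json
import re

# ATX headings such as "# Title" or "## Section Two" at the start of a line.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def markdown_from_response(raw: str) -> str:
    """Return output.markdown from a parse response body.

    Assumes the `status` field mirrors the HTTP status, as in the example response.
    """
    payload = json.loads(raw)
    if payload.get("status") != 200:
        raise RuntimeError(f"extraction failed: {payload}")
    return payload["output"]["markdown"]


def chunk_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into chunks that each start at an ATX heading.

    Text before the first heading becomes a chunk with an empty title.
    Lines inside ``` fences are never treated as headings.
    """
    chunks: list[dict] = []
    title, lines = "", []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a fenced code block
        match = None if in_fence else HEADING.match(line)
        if match:
            # Close the previous chunk if it has a title or any non-blank text.
            if title or any(ln.strip() for ln in lines):
                chunks.append({"title": title, "text": "\n".join(lines).strip()})
            title, lines = match.group(2).strip(), [line]
        else:
            lines.append(line)
    # Flush the final chunk.
    if title or any(ln.strip() for ln in lines):
        chunks.append({"title": title, "text": "\n".join(lines).strip()})
    return chunks
```

Applied to the example response, this yields two chunks titled "Document Title" and "Section Two". Keeping the heading line inside each chunk's text preserves context for embedding; very long sections may still need a size-based split afterward.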

From a URL

You can also extract Markdown from a document hosted at a public URL:

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://storage.example.com/report.pdf",
    "mode": "text",
    "output": { "format": "markdown" }
  }'
```
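The URL variant sends a JSON body instead of multipart form data. As a minimal sketch, assuming the body fields shown in the curl command, the request can be built with the standard library; `build_url_request` is a hypothetical helper, and the sketch only constructs the request so it can be inspected offline.

```python
import json
import urllib.request

API_URL = "https://api.nutrient.io/extraction/parse"


def build_url_request(document_url: str, api_key: str, mode: str = "text") -> urllib.request.Request:
    """Build (but do not send) a Markdown extraction request for a hosted document."""
    body = {
        "url": document_url,
        "mode": mode,
        "output": {"format": "markdown"},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the body with `json.loads`.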

When to use Markdown vs. spatial

| Use case | Recommended format |
| --- | --- |
| RAG/LLM ingestion | Markdown |
| Search indexing | Markdown |
| Content migration | Markdown |
| Document analysis with spatial data | Spatial |
| Table extraction with cell coordinates | Spatial |
| Form field extraction | Spatial |
| Building document viewers or overlays | Spatial |

Markdown and spatial element output are mutually exclusive in a single request. If you need both, send two separate requests.

Markdown with other modes

The examples above use text mode, which is the fastest and cheapest option for Markdown extraction (1 credit per page). All modes support Markdown output, so switch to a more capable mode when you need more accurate structure, at a higher cost per page:

| Mode | Cost per page | Best for |
| --- | --- | --- |
| text | 1 credit | Born-digital documents with simple layouts |
| structure | 1.5 credits | Scanned documents requiring OCR |
| understand | 9 credits | Complex tables, formulas, and multicolumn layouts |
| agentic | 18 credits | The most complex documents needing VLM augmentation |

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"markdown"}}'
```
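Since cost scales with page count, it can help to estimate credits before choosing a mode. The sketch below copies the per-page rates from the table above and assumes billing is simply rate × pages; how fractional totals (for example, an odd page count in structure mode) are rounded is not covered here. `estimate_credits` is a hypothetical helper.

```python
# Per-page credit costs from the modes table (assumed to apply linearly per page).
CREDITS_PER_PAGE = {
    "text": 1.0,
    "structure": 1.5,
    "understand": 9.0,
    "agentic": 18.0,
}


def estimate_credits(mode: str, pages: int) -> float:
    """Estimate credits for a Markdown extraction of `pages` pages in `mode`."""
    if mode not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown mode: {mode!r}")
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return CREDITS_PER_PAGE[mode] * pages


if __name__ == "__main__":
    # Compare every mode for a hypothetical 20-page document.
    for mode in CREDITS_PER_PAGE:
        print(f"{mode}: {estimate_credits(mode, 20):g} credits for 20 pages")
```

For a 20-page document this gives 20, 30, 180, and 360 credits for text, structure, understand, and agentic respectively. The single-page text-mode example response above reports a cost of 1, consistent with this estimate.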

Refer to the processing modes guide for a full comparison of all modes.