Extract Markdown
When output.format is set to markdown, the Data Extraction API converts the document into a Markdown string. This is useful for RAG (retrieval-augmented generation) pipelines, search indexing, and content migration workflows where structured text is more practical than spatial element data.
Basic Markdown extraction
Send a document to the extraction endpoint and request markdown output to receive the converted text.
cURL:

    curl -X POST https://api.nutrient.io/extraction/parse \
      -H "Authorization: Bearer your_api_key_goes_here" \
      -F "file=@document.pdf" \
      -F 'instructions={"mode":"text","output":{"format":"markdown"}}'

Python:

    import requests

    # Upload the file as multipart form data; `instructions` is a JSON string.
    with open("document.pdf", "rb") as f:
        response = requests.post(
            "https://api.nutrient.io/extraction/parse",
            headers={"Authorization": "Bearer your_api_key_goes_here"},
            files={"file": f},
            data={"instructions": '{"mode":"text","output":{"format":"markdown"}}'},
        )

    result = response.json()
    print(result["output"]["markdown"])

JavaScript (Node.js):

    import { readFile } from "node:fs/promises";

    // Node's built-in FormData takes a Blob for file parts;
    // a read stream would be sent as a string instead of the file contents.
    const form = new FormData();
    form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
    form.append(
      "instructions",
      JSON.stringify({ mode: "text", output: { format: "markdown" } }),
    );

    const response = await fetch("https://api.nutrient.io/extraction/parse", {
      method: "POST",
      headers: { Authorization: "Bearer your_api_key_goes_here" },
      body: form,
    });

    const result = await response.json();
    console.log(result.output.markdown);

Response format
The Markdown output is returned in output.markdown as a single string:
    {
      "status": 200,
      "requestId": "req_a1b2c3d4",
      "output": {
        "markdown": "# Document Title\n\nFirst paragraph of text...\n\n## Section Two\n\nMore content here..."
      },
      "metrics": {
        "processingTimeMs": 312,
        "pagesProcessed": 1
      },
      "usage": {
        "data_extraction_credits": {
          "cost": 1,
          "remainingCredits": 850
        }
      },
      "configuration": {
        "mode": "text",
        "outputFormat": "markdown"
      }
    }

The Markdown preserves document structure, including headings, paragraphs, lists, tables, and code blocks.
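Because headings survive the conversion, one common way to prepare the output for a RAG pipeline is to split it into one chunk per heading. The following is a minimal sketch of that idea, not part of the API: `split_by_heading` is a hypothetical helper, the sample payload is copied from the response above, and the splitter only recognizes `#`-style headings while skipping lines inside fenced code blocks.

```python
import re

# Sample payload copied from the response shown above (trimmed to the relevant keys).
sample_response = {
    "status": 200,
    "output": {
        "markdown": "# Document Title\n\nFirst paragraph of text...\n\n## Section Two\n\nMore content here..."
    },
}

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> list[dict]:
    """Hypothetical helper: split Markdown into one chunk per `#`-style heading."""
    chunks = []
    heading, lines = None, []
    in_fence = False

    def flush():
        # Keep a chunk only if it has a heading or some non-blank text.
        if heading is not None or any(line.strip() for line in lines):
            chunks.append({"heading": heading, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        # Track fenced code blocks so "# comment" lines in code are not headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


chunks = split_by_heading(sample_response["output"]["markdown"])
for chunk in chunks:
    print(f'{chunk["heading"]}: {chunk["text"]}')
```

Each chunk keeps its heading alongside the body text, so the heading can be stored as metadata when the chunk is embedded or indexed.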
From a URL
You can also extract Markdown from a document hosted at a public URL:
cURL:

    curl -X POST https://api.nutrient.io/extraction/parse \
      -H "Authorization: Bearer your_api_key_goes_here" \
      -H "Content-Type: application/json" \
      -d '{
        "url": "https://storage.example.com/report.pdf",
        "mode": "text",
        "output": { "format": "markdown" }
      }'

Python:

    import requests

    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={
            "Authorization": "Bearer your_api_key_goes_here",
            "Content-Type": "application/json",
        },
        json={
            "url": "https://storage.example.com/report.pdf",
            "mode": "text",
            "output": {"format": "markdown"},
        },
    )

    result = response.json()
    print(result["output"]["markdown"])

JavaScript (Node.js):

    const response = await fetch("https://api.nutrient.io/extraction/parse", {
      method: "POST",
      headers: {
        Authorization: "Bearer your_api_key_goes_here",
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        url: "https://storage.example.com/report.pdf",
        mode: "text",
        output: { format: "markdown" },
      }),
    });

    const result = await response.json();
    console.log(result.output.markdown);

When to use Markdown vs. spatial
| Use case | Recommended format |
|---|---|
| RAG/LLM ingestion | Markdown |
| Search indexing | Markdown |
| Content migration | Markdown |
| Document analysis with spatial data | Spatial |
| Table extraction with cell coordinates | Spatial |
| Form field extraction | Spatial |
| Building document viewers or overlays | Spatial |
Markdown and spatial element output are mutually exclusive in a single request. If you need both, send two separate requests.
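When both outputs are needed, the two requests differ only in the output format. The sketch below builds the `instructions` form field for each request, using the same shape as the multipart examples earlier in this guide. `build_instructions` is a hypothetical helper, and the `"spatial"` format value is an assumption based on the table above, so confirm the exact value in the API reference before relying on it.

```python
import json


def build_instructions(mode: str, output_format: str) -> str:
    """Hypothetical helper: serialize the `instructions` multipart form field."""
    return json.dumps({"mode": mode, "output": {"format": output_format}})


# "markdown" appears in this guide; "spatial" is an assumed format value.
FORMATS = ("markdown", "spatial")

# One payload per request, since a single request returns only one format.
payloads = {fmt: build_instructions("text", fmt) for fmt in FORMATS}

for fmt, instructions in payloads.items():
    print(fmt, instructions)
```

Each payload would then be sent in its own POST request as the `instructions` field, alongside the same `file` part.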
Markdown with other modes
The examples above use text mode, which is the fastest and cheapest option for Markdown extraction (1 credit per page). All modes support Markdown output. Use a more capable mode when you need more accurate structure:
| Mode | Cost per page | Best for |
|---|---|---|
| text | 1 credit | Born-digital documents with simple layouts |
| structure | 1.5 credits | Scanned documents requiring OCR |
| understand | 9 credits | Complex tables, formulas, and multicolumn layouts |
| agentic | 18 credits | The most complex documents needing VLM augmentation |
    curl -X POST https://api.nutrient.io/extraction/parse \
      -H "Authorization: Bearer your_api_key_goes_here" \
      -F "file=@document.pdf" \
      -F 'instructions={"mode":"understand","output":{"format":"markdown"}}'

Refer to the processing modes guide for a full comparison of all modes.
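The per-page costs in the mode table can be turned into a rough budget estimate before choosing a mode. The sketch below copies the rates from that table and assumes the cost scales linearly with page count, with no rounding, minimums, or volume discounts. Those billing details are assumptions, and `estimate_credits` is a hypothetical helper rather than part of the API.

```python
# Credits per page, copied from the mode table in this guide.
CREDITS_PER_PAGE = {
    "text": 1.0,
    "structure": 1.5,
    "understand": 9.0,
    "agentic": 18.0,
}


def estimate_credits(mode: str, pages: int) -> float:
    """Hypothetical helper: estimate credits, assuming linear per-page billing."""
    if mode not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown mode: {mode!r}")
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return CREDITS_PER_PAGE[mode] * pages


# Compare what a 40-page document would cost in each mode.
for mode in CREDITS_PER_PAGE:
    print(f"{mode}: {estimate_credits(mode, 40)} credits")
```

Under these assumptions, a 40-page document costs 40 credits in text mode and 360 credits in understand mode, so it can be worth trying a cheaper mode first and escalating only when the structure comes out wrong.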