Extract Markdown

When output.format is set to markdown, the Data Extraction API converts the document into a Markdown string. This is useful for RAG (retrieval-augmented generation) pipelines, search indexing, and content migration workflows where structured text is more practical than spatial element data.

Basic Markdown extraction

Send a document to the extraction endpoint and request markdown output to receive the converted text.

curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"text","output":{"format":"markdown"}}'

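import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as pdf:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": pdf},
        data={
            "instructions": '{"mode":"text","output":{"format":"markdown"}}'
        },
    )

response.raise_for_status()  # Stop on HTTP errors before parsing the body.
result = response.json()
print(result["output"]["markdown"])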

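import { readFile } from "node:fs/promises";

// The built-in FormData needs a Blob or File; a Node.js read stream would be
// serialized as a string instead of the file contents.
const form = new FormData();
const pdf = new Blob([await readFile("document.pdf")], { type: "application/pdf" });
form.append("file", pdf, "document.pdf");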
form.append(
  "instructions",
  JSON.stringify({ mode: "text", output: { format: "markdown" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
console.log(result.output.markdown);
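
The examples above pass `instructions` as a hand-written JSON string, which is easy to mistype once the mode varies. The following is a minimal sketch of a helper that builds that string instead. `build_instructions` and `KNOWN_MODES` are hypothetical names, and the mode list is taken from this guide only, so the API may accept values the helper rejects.

```python
import json

# Modes named in this guide; treat the set as an assumption, not an API contract.
KNOWN_MODES = {"text", "structure", "understand", "agentic"}


def build_instructions(mode: str = "text", output_format: str = "markdown") -> str:
    """Return the JSON string for the multipart `instructions` field."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Compact separators reproduce the string used in the examples above.
    return json.dumps(
        {"mode": mode, "output": {"format": output_format}},
        separators=(",", ":"),
    )


print(build_instructions())
# {"mode":"text","output":{"format":"markdown"}}
```

The returned string can be passed as the `instructions` value in the `data` argument of the `requests.post` call.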

Response format

The Markdown output is returned in output.markdown as a single string:

{
  "status": 200,
  "requestId": "req_a1b2c3d4",
  "output": {
    "markdown": "# Document Title\n\nFirst paragraph of text...\n\n## Section Two\n\nMore content here..."
  },
  "metrics": {
    "processingTimeMs": 312,
    "pagesProcessed": 1
  },
  "usage": {
    "data_extraction_credits": {
      "cost": 1,
      "remainingCredits": 850
    }
  },
  "configuration": {
    "mode": "text",
    "outputFormat": "markdown"
  }
}

The Markdown preserves document structure, including headings, paragraphs, lists, tables, and code blocks.
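
Indexing `result["output"]["markdown"]` directly raises a bare `KeyError` if the body has a different shape. This guide does not document the error response format, so the sketch below only checks for the success shape shown in the sample response and reports the `status` field otherwise. `markdown_from_payload` is a hypothetical helper name.

```python
def markdown_from_payload(payload: dict) -> str:
    """Return output.markdown from a parsed response body, or raise ValueError."""
    output = payload.get("output")
    if not isinstance(output, dict) or not isinstance(output.get("markdown"), str):
        # Assumption: a body without output.markdown means the request failed.
        raise ValueError(
            f"no output.markdown in response (status={payload.get('status')})"
        )
    return output["markdown"]


# Trimmed copy of the sample response from this section.
sample = {
    "status": 200,
    "output": {"markdown": "# Document Title\n\nFirst paragraph of text..."},
    "metrics": {"processingTimeMs": 312, "pagesProcessed": 1},
}

markdown = markdown_from_payload(sample)
print(markdown.splitlines()[0])  # prints "# Document Title"
```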

From a URL

You can also extract Markdown from a document hosted at a public URL:

curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://storage.example.com/report.pdf",
    "mode": "text",
    "output": { "format": "markdown" }
  }'

import requests

response = requests.post(
    "https://api.nutrient.io/extraction/parse",
    headers={
        "Authorization": "Bearer your_api_key_goes_here",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://storage.example.com/report.pdf",
        "mode": "text",
        "output": {"format": "markdown"},
    },
)

result = response.json()
print(result["output"]["markdown"])

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: {
    Authorization: "Bearer your_api_key_goes_here",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://storage.example.com/report.pdf",
    mode: "text",
    output: { format: "markdown" },
  }),
});

const result = await response.json();
console.log(result.output.markdown);

When to use Markdown vs. spatial

Use case	Recommended format
RAG/LLM ingestion	Markdown
Search indexing	Markdown
Content migration	Markdown
Document analysis with spatial data	Spatial
Table extraction with cell coordinates	Spatial
Form field extraction	Spatial
Building document viewers or overlays	Spatial

Markdown and spatial element output are mutually exclusive in a single request. If you need both, send two separate requests.
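
For the RAG and search indexing rows in the table above, a common next step is to split the returned Markdown into sections before embedding or indexing. The sketch below is one possible approach, not part of the API: it splits on ATX headings (`#` to `######`) and skips lines inside fenced code blocks. `split_by_headings` is a hypothetical helper, and it does not handle Setext (underlined) headings.

```python
import re

# ATX heading at the start of a line, for example "# Title" or "## Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> list[dict]:
    """Split a Markdown string into sections, one per heading."""
    sections = []
    current = {"level": 0, "title": "", "lines": []}
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # "#" lines inside code fences are not headings
        match = None if in_fence else HEADING.match(line)
        if match:
            # Close the previous section unless it is the empty initial one.
            if current["title"] or any(l.strip() for l in current["lines"]):
                sections.append(current)
            current = {
                "level": len(match.group(1)),
                "title": match.group(2).strip(),
                "lines": [],
            }
        else:
            current["lines"].append(line)
    if current["title"] or any(l.strip() for l in current["lines"]):
        sections.append(current)
    return [
        {"level": s["level"], "title": s["title"], "text": "\n".join(s["lines"]).strip()}
        for s in sections
    ]


sample = "# Document Title\n\nFirst paragraph of text...\n\n## Section Two\n\nMore content here..."
for section in split_by_headings(sample):
    print(section["level"], section["title"])
```

Text that appears before the first heading is kept as a section with level 0 and an empty title, so no content is dropped.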

Markdown with other modes

The examples above use text mode, which is the fastest and cheapest option for Markdown extraction (1 credit per page). All modes support Markdown output. Use a more capable mode when you need more accurate structure:

Mode	Cost per page	Best for
`text`	1 credit	Born-digital documents with simple layouts
`structure`	1.5 credits	Scanned documents requiring OCR
`understand`	9 credits	Complex tables, formulas, and multicolumn layouts
`agentic`	18 credits	The most complex documents needing VLM augmentation

curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"markdown"}}'
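
To compare modes before committing to one, the per-page costs in the table can be turned into a rough estimate. This sketch assumes billing is strictly linear in page count with no rounding or minimum charge, which this guide does not confirm. `estimate_credits` and `CREDITS_PER_PAGE` are hypothetical names.

```python
# Credits per page, copied from the mode table in this section.
CREDITS_PER_PAGE = {
    "text": 1.0,
    "structure": 1.5,
    "understand": 9.0,
    "agentic": 18.0,
}


def estimate_credits(pages: int, mode: str = "text") -> float:
    """Estimate credits for a document, assuming linear per-page billing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if mode not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown mode: {mode!r}")
    return pages * CREDITS_PER_PAGE[mode]


for mode in CREDITS_PER_PAGE:
    print(f"{mode}: {estimate_credits(40, mode)} credits")
```

Under that assumption, a 40-page document comes to 40 credits in `text` mode, 60 in `structure`, 360 in `understand`, and 720 in `agentic`.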

Refer to the processing modes guide for a full comparison of all modes.
