---
title: "Extract Markdown"
canonical_url: "https://www.nutrient.io/guides/dws-data-extraction/parsing/extract-markdown/"
md_url: "https://www.nutrient.io/guides/dws-data-extraction/parsing/extract-markdown.md"
last_updated: "2026-05-26T22:37:31.557Z"
description: "Convert documents to whole-document Markdown using the Nutrient Data Extraction API. Ideal for RAG pipelines, search indexing, and content migration."
---

# Extract Markdown

When `output.format` is set to `markdown`, the Data Extraction API converts the document into a Markdown string. This is useful for RAG (retrieval-augmented generation) pipelines, search indexing, and content migration workflows where structured text is more practical than spatial element data.

## Basic Markdown extraction

Send a document to the extraction endpoint and request `markdown` output to receive the converted text.

### curl

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"text","output":{"format":"markdown"}}'
```

### Python

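```python
import json

import requests

# Build the instructions as a dict and serialize it, rather than hand-writing JSON.
instructions = {"mode": "text", "output": {"format": "markdown"}}

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as pdf:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": pdf},
        data={"instructions": json.dumps(instructions)},
    )

# Fail early on HTTP errors instead of parsing an error body.
response.raise_for_status()
result = response.json()
print(result["output"]["markdown"])
```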

### JavaScript

```javascript
import { readFile } from "node:fs/promises";

// The built-in FormData accepts Blob values, not Node.js streams,
// so read the file into memory and wrap it in a Blob.
const file = new Blob([await readFile("document.pdf")], {
  type: "application/pdf",
});

const form = new FormData();
form.append("file", file, "document.pdf");
form.append(
  "instructions",
  JSON.stringify({ mode: "text", output: { format: "markdown" } }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
console.log(result.output.markdown);
```

## Response format

The Markdown output is returned in `output.markdown` as a single string:

```json
{
  "status": 200,
  "requestId": "req_a1b2c3d4",
  "output": {
    "markdown": "# Document Title\n\nFirst paragraph of text...\n\n## Section Two\n\nMore content here..."
  },
  "metrics": {
    "processingTimeMs": 312,
    "pagesProcessed": 1
  },
  "usage": {
    "data_extraction_credits": {
      "cost": 1,
      "remainingCredits": 850
    }
  },
  "configuration": {
    "mode": "text",
    "outputFormat": "markdown"
  }
}
```

The Markdown preserves document structure, including headings, paragraphs, lists, tables, and code blocks.
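
Because headings survive the conversion, one common way to prepare the output for a RAG pipeline is to split it into heading-delimited chunks. The following is a minimal sketch, not part of the API: `split_by_headings` is a hypothetical helper, and it assumes ATX-style (`#`) headings like those in the sample response above. It doesn't account for `#` lines that appear inside fenced code blocks.

```python
import re

# Hypothetical helper (not part of the Nutrient API): split Markdown into
# chunks that each start at an ATX heading ("#", "##", ...).
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def split_by_headings(markdown: str) -> list[str]:
    # Collect the offset of every heading line, plus 0 so that any text
    # before the first heading is kept as its own chunk.
    starts = sorted({0, *(m.start() for m in HEADING.finditer(markdown))})
    bounds = starts + [len(markdown)]
    chunks = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    # Drop empty chunks (for example, when the input is an empty string).
    return [c for c in chunks if c]


# Sample string taken from the response shown above.
sample = (
    "# Document Title\n\nFirst paragraph of text...\n\n"
    "## Section Two\n\nMore content here..."
)
chunks = split_by_headings(sample)
for chunk in chunks:
    print(repr(chunk))
```

In a real pipeline, `sample` would be replaced with `result["output"]["markdown"]` from the parsed response.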

## From a URL

To extract Markdown from a document hosted at a public URL, send a JSON body that includes a `url` field instead of uploading a file.

### curl

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://storage.example.com/report.pdf",
    "mode": "text",
    "output": { "format": "markdown" }
  }'
```

### Python

```python
import requests

# Passing `json=` serializes the body and sets the Content-Type header.
response = requests.post(
    "https://api.nutrient.io/extraction/parse",
    headers={"Authorization": "Bearer your_api_key_goes_here"},
    json={
        "url": "https://storage.example.com/report.pdf",
        "mode": "text",
        "output": {"format": "markdown"},
    },
)

# Fail early on HTTP errors instead of parsing an error body.
response.raise_for_status()
result = response.json()
print(result["output"]["markdown"])
```

### JavaScript

```javascript
const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: {
    Authorization: "Bearer your_api_key_goes_here",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://storage.example.com/report.pdf",
    mode: "text",
    output: { format: "markdown" },
  }),
});

const result = await response.json();
console.log(result.output.markdown);
```

## When to use Markdown vs. spatial

| Use case                               | Recommended format |
| -------------------------------------- | ------------------ |
| RAG/LLM ingestion                      | Markdown           |
| Search indexing                        | Markdown           |
| Content migration                      | Markdown           |
| Document analysis with spatial data    | Spatial            |
| Table extraction with cell coordinates | Spatial            |
| Form field extraction                  | Spatial            |
| Building document viewers or overlays  | Spatial            |

Markdown and spatial element output are mutually exclusive in a single request. If you need both, send two separate requests.

## Markdown with other modes

The examples above use `text` mode, which is the fastest and cheapest option for Markdown extraction (1 credit per page). All modes support Markdown output, so switch to a more capable mode when you need more accurate structure:

| Mode         | Cost per page | Best for                                            |
| ------------ | ------------- | --------------------------------------------------- |
| `text`       | 1 credit      | Born-digital documents with simple layouts          |
| `structure`  | 1.5 credits   | Scanned documents requiring OCR                     |
| `understand` | 9 credits     | Complex tables, formulas, and multicolumn layouts   |
| `agentic`    | 18 credits    | The most complex documents needing VLM augmentation |

For example, to request Markdown output in `understand` mode:

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"markdown"}}'
```
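
To compare modes before committing to one, the per-page costs in the table above can be turned into a quick estimate. This is a sketch under the assumption that cost scales linearly with page count and nothing else. `estimate_credits` and `CREDITS_PER_PAGE` are hypothetical names, and the authoritative figure is whatever the `usage` field of the response reports.

```python
# Per-page credit costs copied from the table above.
CREDITS_PER_PAGE = {
    "text": 1.0,
    "structure": 1.5,
    "understand": 9.0,
    "agentic": 18.0,
}


def estimate_credits(pages: int, mode: str = "text") -> float:
    # Assumption: cost is pages x per-page rate, with no rounding or minimums.
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if mode not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown mode: {mode!r}")
    return pages * CREDITS_PER_PAGE[mode]


# Compare the estimated cost of a 40-page document across all modes.
for mode in CREDITS_PER_PAGE:
    print(f"{mode}: {estimate_credits(40, mode)} credits for 40 pages")
```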

Refer to the [processing modes](https://www.nutrient.io/guides/dws-data-extraction/parsing/processing-modes.md) guide for a full comparison of all modes.

---

## Related pages

- [Coordinate spaces](/guides/dws-data-extraction/parsing/coordinate-spaces.md)
- [Multilingual extraction](/guides/dws-data-extraction/parsing/multilingual-extraction.md)
- [Extract document elements](/guides/dws-data-extraction/parsing/extract-document-elements.md)
- [Parse endpoint](/guides/dws-data-extraction/parsing.md)
- [Processing modes](/guides/dws-data-extraction/parsing/processing-modes.md)

