Get started with DWS Data Extraction API

If you already have an API key, run this command to extract content from a PDF:

curl -X POST https://api.nutrient.io/extraction/parse \
-H "Authorization: Bearer your_api_key_goes_here" \
-F "file=@document.pdf"

This sends document.pdf to the Data Extraction API with the default settings (understand mode with spatial element output). The API returns structured elements with bounding boxes and confidence scores.

Step-by-step setup

Follow these steps to create an account, get an API key, and run your first request.

1. Sign up

Go to the Nutrient dashboard and create an account. If you already have a Nutrient DWS account, skip to step 2.

2. Get your API key

Navigate to the Data Extraction API keys page in the dashboard. Copy your live API key, which starts with pdf_live_.
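To keep the key out of source files, one option is to store it in an environment variable. The Python sketch below assumes a variable named NUTRIENT_API_KEY (a hypothetical name chosen for this example, not something the API requires) and checks for the pdf_live_ prefix described above to catch copy-and-paste mistakes early.

```python
import os


def load_api_key(env=os.environ):
    """Read the Data Extraction API key from an environment variable.

    NUTRIENT_API_KEY is a hypothetical variable name used only in this sketch.
    """
    key = env.get("NUTRIENT_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set NUTRIENT_API_KEY to your Data Extraction API key")
    # Live keys copied from the dashboard start with "pdf_live_".
    if not key.startswith("pdf_live_"):
        raise ValueError("Expected a live key starting with 'pdf_live_'")
    return key
```

With the variable exported, the curl commands on this page can reference it instead of the literal key, for example -H "Authorization: Bearer $NUTRIENT_API_KEY".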

3. Send your first request

Use the API key to extract content from a document. The example below uploads a local PDF and sets the instructions explicitly: understand mode with spatial output, the same values the quick-start command uses by default.

curl -X POST https://api.nutrient.io/extraction/parse \
-H "Authorization: Bearer your_api_key_goes_here" \
-F "file=@document.pdf" \
-F 'instructions={"mode":"understand","output":{"format":"spatial"}}'
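If you prefer to call the API from code, the following is a minimal Python sketch that builds the same multipart/form-data request using only the standard library: a file part mirroring -F "file=@document.pdf" and an optional instructions part carrying the JSON shown above. The helper names build_parse_request and parse_document are hypothetical, and the application/pdf content type on the file part is an assumption of this sketch rather than a documented requirement.

```python
import json
import os
import urllib.request
import uuid

# Endpoint taken from the curl examples on this page.
PARSE_URL = "https://api.nutrient.io/extraction/parse"


def build_parse_request(api_key, pdf_bytes, filename="document.pdf", instructions=None):
    """Build a multipart/form-data POST with the same fields as the curl -F flags."""
    boundary = uuid.uuid4().hex
    # File part, like -F "file=@document.pdf" (the content type is an assumption).
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode() + pdf_bytes + b"\r\n"
    # Optional instructions part, like -F 'instructions={...}'.
    # Leaving it out means the API applies its default settings.
    if instructions is not None:
        payload = json.dumps(instructions, separators=(",", ":"))
        body += (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="instructions"\r\n\r\n'
            f"{payload}\r\n"
        ).encode()
    # Closing boundary marks the end of the form.
    body += f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        PARSE_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_document(api_key, path, instructions=None):
    """Upload a local file and return the decoded JSON response."""
    with open(path, "rb") as f:
        request = build_parse_request(api_key, f.read(), os.path.basename(path), instructions)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, parse_document("your_api_key_goes_here", "document.pdf", {"mode": "understand", "output": {"format": "spatial"}}) returns the decoded JSON described in the next step. Calling it without instructions omits that form field, like the quick-start command at the top of this page. Building the request separately from sending it keeps the form encoding easy to inspect without making a network call.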

4. Review the response

The API returns a JSON response with extracted document elements:

{
  "status": 200,
  "requestId": "req_e5f6g7h8",
  "output": {
    "elements": [
      {
        "id": "a1b2c3d4-1111-4000-8000-000000000001",
        "type": "paragraph",
        "role": "Title",
        "text": "Quarterly Report",
        "confidence": 0.95,
        "readingOrder": 0,
        "bounds": { "x": 100, "y": 50, "width": 400, "height": 35 },
        "page": { "pageIndex": 0, "pageNumber": 1, "width": 1818, "height": 2422 }
      }
    ]
  },
  "metrics": {
    "processingTimeMs": 4200,
    "pagesProcessed": 1
  },
  "configuration": {
    "mode": "understand",
    "outputFormat": "spatial"
  }
}

Each element includes its type, text content, spatial coordinates (bounds), detection confidence, and page reference. Refer to the Extract document elements guide for the full element schema.
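Once the response is decoded, a common first step is to walk the elements in reading order. This Python sketch checks the status field, sorts elements by page and then by readingOrder, optionally drops elements below a confidence threshold, and rescales bounds to fractions of the page size. The function names and the min_confidence parameter are illustrative, and the sketch assumes that bounds and the page width and height use the same units, which the sample response suggests but this guide does not state.

```python
def elements_in_reading_order(response, min_confidence=0.0):
    """Return extracted elements sorted by page, then by readingOrder.

    Elements whose confidence is below min_confidence are dropped.
    """
    if response.get("status") != 200:
        raise ValueError(f"Unexpected status: {response.get('status')}")
    elements = response["output"]["elements"]
    kept = [e for e in elements if e.get("confidence", 0.0) >= min_confidence]
    # Sort by page first, then by readingOrder within each page.
    return sorted(kept, key=lambda e: (e["page"]["pageIndex"], e["readingOrder"]))


def normalized_bounds(element):
    """Return bounds as fractions (0 to 1) of the element's page size.

    Assumes bounds and page width/height share the same units.
    """
    b = element["bounds"]
    page = element["page"]
    return {
        "x": b["x"] / page["width"],
        "y": b["y"] / page["height"],
        "width": b["width"] / page["width"],
        "height": b["height"] / page["height"],
    }
```

For the sample response in step 4, elements_in_reading_order(response) yields the "Quarterly Report" title first, and normalized_bounds on that element gives an x of about 0.055 (100 / 1818). Page-relative values like these are convenient when the rendered page size differs from the size the API reports.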

Next steps