API overview

Nutrient hosts the DWS Data Extraction API at https://api.nutrient.io. Use this HTTP API to extract structured content and domain-specific data from documents.

Base URL

Use this base URL for all Data Extraction API endpoints:

https://api.nutrient.io

All endpoints are relative to this base URL.

Authentication

Include your API key in the Authorization header with every request:

Authorization: Bearer pdf_live_...

Get API keys from the Data Extraction API dashboard. Use keys that start with pdf_live_ for production. Keys that start with pdf_test_ are for testing and come with limitations.
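As a minimal sketch of the header format above, the Python helpers below build the Authorization header and check the key prefix before any request is sent. The function names `auth_headers` and `key_kind` are hypothetical and not part of any Nutrient SDK, and the sample key values are placeholders.

```python
# Hypothetical helpers; the names are illustrative, not part of an official SDK.
KEY_PREFIXES = {
    "pdf_live_": "production",
    "pdf_test_": "testing",
}


def key_kind(api_key: str) -> str:
    """Return "production" or "testing" based on the documented key prefix."""
    for prefix, kind in KEY_PREFIXES.items():
        if api_key.startswith(prefix):
            return kind
    raise ValueError("API key must start with pdf_live_ or pdf_test_")


def auth_headers(api_key: str) -> dict:
    """Build the Authorization header described in this section."""
    key_kind(api_key)  # Raises ValueError for an unexpected prefix.
    return {"Authorization": f"Bearer {api_key}"}


# Placeholder key, not a real credential.
headers = auth_headers("pdf_test_example")
```

Checking the prefix locally is optional. It only catches an obviously wrong key early and does not replace the API's own validation.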

Available endpoints

The API provides the endpoints below for parsing documents and extracting schema-shaped data.

Endpoint | Description
POST /extraction/parse | Extracts structured elements or Markdown from documents. Supports four processing modes: text, structure, understand, and agentic. Supports spatial elements and Markdown output.
POST /extraction/extract | Extracts domain-specific JSON data from documents and maps it to your JSON Schema, with optional per-field citations.
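The following Python sketch shows one way to combine the base URL, an endpoint path, and the Authorization header into a POST request using only the standard library. This overview does not describe the request body, so `build_request` (a hypothetical helper) takes the body and content type as opaque values. The `b"{}"` payload and `application/json` content type in the usage lines are placeholders, not the documented request format. The request is only constructed here, not sent.

```python
import urllib.request

BASE_URL = "https://api.nutrient.io"

# Endpoint paths as listed in the table above.
ENDPOINTS = {
    "parse": "/extraction/parse",
    "extract": "/extraction/extract",
}


def build_request(endpoint: str, api_key: str, body: bytes,
                  content_type: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one of the two endpoints.

    The body format is an assumption left to the caller; consult the
    endpoint-specific guides for the actual request parameters.
    """
    path = ENDPOINTS[endpoint]  # Raises KeyError for an unknown endpoint name.
    return urllib.request.Request(
        url=BASE_URL + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )


# Placeholder key and payload, for illustration only.
request = build_request("extract", "pdf_test_example", b"{}", "application/json")
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` once the body matches what the endpoint expects.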

Further details

Use these guides to continue configuring the Data Extraction API:

  • Refer to the supported languages guide for the full list of 100+ optical character recognition (OCR) languages with ISO codes and aliases.
  • Refer to the supported file types guide for PDFs, images, and Office files accepted by the API.
  • Refer to the error handling guide for HTTP status codes, error response formats, and troubleshooting.