The Data Extraction API is an HTTP API hosted at https://api.nutrient.io. Each endpoint provides a specific capability for extracting structured content from documents.
Base URL
https://api.nutrient.io
All endpoints are relative to this base URL.
Authentication
Include your API key in the Authorization header with every request:
Authorization: Bearer pdf_live_...
API keys are available in the Data Extraction API dashboard. Keys starting with pdf_live_ are for production use. Keys starting with pdf_test_ are for testing with limitations.
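As a minimal sketch of the header described above, the following Python helpers build the Authorization header and classify a key by its prefix. The function names `auth_headers` and `key_kind` are hypothetical and not part of any official client. The "unknown" case is an assumption for keys with a prefix this page does not mention.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return the Authorization header to send with every request."""
    return {"Authorization": f"Bearer {api_key}"}


def key_kind(api_key: str) -> str:
    """Classify a key by the prefixes documented on this page."""
    if api_key.startswith("pdf_live_"):
        return "live"  # production use
    if api_key.startswith("pdf_test_"):
        return "test"  # testing, with limitations
    return "unknown"  # assumption: prefix not covered by this page
```

In practice the key would likely be read from an environment variable or secret store rather than hard-coded in source.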
Available endpoints
| Endpoint | Description |
|---|---|
| POST /extraction/parse | Extract structured elements or Markdown from documents. Supports four processing modes (text, structure, understand, agentic) and two output formats (spatial elements, Markdown). |
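The sketch below shows one way to assemble a request for the endpoint in the table using only the Python standard library. It combines the base URL, the endpoint path, and the Bearer header. This page does not describe the request body or its parameters, so the helper (a hypothetical name, `build_parse_request`) takes an already-encoded body and its content type from the caller instead of guessing field names.

```python
import urllib.request

BASE_URL = "https://api.nutrient.io"
PARSE_PATH = "/extraction/parse"


def build_parse_request(
    api_key: str, body: bytes, content_type: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the parse endpoint.

    The body encoding is an assumption left to the caller; consult the
    endpoint reference for the actual request format.
    """
    return urllib.request.Request(
        url=BASE_URL + PARSE_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Keeping construction separate from sending makes the URL and headers easy to inspect before any network call is made.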
Further details
- Supported languages — Full list of 100+ OCR languages with ISO codes and aliases.
- Supported file types — PDFs, images, and Office files accepted by the API.
- Error handling — HTTP status codes, error response format, and troubleshooting.