DWS Data Extraction API

Extract structured content from documents through a simple HTTP API. Upload a PDF, image, or Office file and receive typed document elements with spatial data, or get a whole-document Markdown representation.

What it does

DWS Data Extraction API helps you:

  • Extract paragraphs, tables, formulas, pictures, and key-value pairs from documents, with bounding box coordinates and confidence scores
  • Convert documents to structured Markdown for RAG pipelines, search indexing, and content migration
  • Choose between four processing modes: fast text extraction, OCR-based structure extraction, AI-augmented document understanding, and VLM-augmented agentic extraction
  • Process documents in more than 100 languages with multilingual OCR support
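
As an illustration of the RAG use case mentioned above, the following is a minimal sketch of how a whole-document Markdown result could be split into heading-scoped chunks before indexing. It does not call the API: the `chunk_markdown` function and `Chunk` type are hypothetical helpers, and the sketch only assumes that the Markdown output uses ATX-style headings (`#`, `##`, and so on).

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # heading text without the leading '#' markers ("" for a preamble)
    body: str     # text between this heading and the next one


# Matches ATX headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split Markdown into one chunk per heading (hypothetical helper).

    Simplification: lines that look like headings inside fenced code
    blocks are also treated as headings.
    """
    chunks: list[Chunk] = []
    heading = ""
    lines: list[str] = []

    def flush() -> None:
        # Emit the current chunk unless it is an empty preamble.
        if heading or any(line.strip() for line in lines):
            chunks.append(Chunk(heading, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading = match.group(2).strip()
            lines = []
        else:
            lines.append(line)
    flush()
    return chunks
```

Each chunk can then be embedded or indexed on its own, with the heading kept as metadata so that retrieved passages keep their context.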

DWS Data Extraction API is part of Nutrient Document Web Services (DWS). It focuses on content extraction workflows, while DWS Processor API covers document generation, conversion, and editing actions.


Processing modes

Choose the processing pipeline that fits your use case.

Output formats

The API returns one of two formats, depending on what your downstream system needs.
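
For the typed-elements format, downstream code will often filter elements by type and confidence. The sketch below shows one way to do that. The response shape is an assumption made for illustration only: the field names `type`, `confidence`, `bbox`, and `text`, and the helper `select_elements`, are hypothetical and are not taken from the API reference.

```python
from __future__ import annotations

from typing import Any


def select_elements(
    elements: list[dict[str, Any]],
    kind: str,
    min_confidence: float = 0.8,
) -> list[dict[str, Any]]:
    """Keep elements of one type whose confidence meets a threshold.

    Assumes (hypothetically) that each element is a dict with a "type"
    string and a numeric "confidence" between 0 and 1. Elements without
    a confidence value are treated as confidence 0.0 and dropped.
    """
    return [
        element
        for element in elements
        if element.get("type") == kind
        and element.get("confidence", 0.0) >= min_confidence
    ]


# Hypothetical sample data, not real API output.
sample_elements = [
    {"type": "paragraph", "confidence": 0.97, "bbox": [72, 90, 540, 130], "text": "Intro"},
    {"type": "table", "confidence": 0.91, "bbox": [72, 150, 540, 320], "text": ""},
    {"type": "paragraph", "confidence": 0.42, "bbox": [72, 340, 540, 360], "text": "Smudged"},
]

confident_paragraphs = select_elements(sample_elements, "paragraph")
```

The threshold of 0.8 is an arbitrary default for this sketch; a suitable value depends on document quality and on how costly a false positive is downstream.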


Essential guides

Start with these guides to set up your first request, explore the API, or review pricing.