Extract structured content from documents through a simple HTTP API. Upload a PDF, image, or Office file and receive typed document elements with spatial data, or get a whole-document Markdown representation.
What it does
DWS Data Extraction API helps you to:
- Extract paragraphs, tables, formulas, pictures, and key-value pairs from documents, with bounding box coordinates and confidence scores
- Convert documents to structured Markdown for RAG pipelines, search indexing, and content migration
- Choose between four processing modes: fast text extraction, OCR-based structure extraction, AI-augmented document understanding, and VLM-augmented agentic extraction
- Process documents in more than 100 languages with multilingual OCR support
DWS Data Extraction API is part of Nutrient Document Web Services (DWS). It focuses on content extraction workflows, while DWS Processor API covers document generation, conversion, and editing actions.
Processing modes
Choose the processing pipeline that fits your use case.
- Fast text extraction: Markdown extraction from born-digital documents. No OCR or AI. 1 credit per page.
- OCR-based structure extraction: typed spatial elements with bounding boxes. 1.5 credits per page.
- AI-augmented document understanding: the full pipeline with layout analysis, table detection, and semantic classification. 9 credits per page.
- VLM-augmented agentic extraction: builds on the understand mode (the AI-augmented pipeline) for the deepest visual understanding of document content. 18 credits per page.
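Because the modes differ by up to 18x in per-page cost, it can help to estimate credit usage before choosing one. The following is a minimal sketch that uses only the per-page rates listed above. The mode labels (`fast_text`, `ocr_structure`, and so on) are illustrative names, not confirmed API identifiers, and the calculation assumes billing is strictly linear in page count, with no rounding, minimums, or volume discounts.

```python
from decimal import Decimal

# Credits per page, copied from the mode descriptions above.
# The keys are illustrative labels, NOT confirmed API identifiers.
CREDITS_PER_PAGE = {
    "fast_text": Decimal("1"),
    "ocr_structure": Decimal("1.5"),
    "ai_understanding": Decimal("9"),
    "vlm_agentic": Decimal("18"),
}


def estimate_credits(mode: str, pages: int) -> Decimal:
    """Estimate credits for `pages` pages in `mode`.

    Assumes linear per-page billing (an assumption, not documented behavior).
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return CREDITS_PER_PAGE[mode] * pages


if __name__ == "__main__":
    # Compare the cost of a 40-page document across all four modes.
    for mode in CREDITS_PER_PAGE:
        print(f"{mode}: {estimate_credits(mode, 40)} credits")
```

Under these assumptions, a 40-page document costs 40, 60, 360, and 720 credits across the four modes. `Decimal` is used instead of `float` so the 1.5 rate stays exact when multiplied.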
Output formats
The API returns one of two formats, depending on what your downstream system needs.
- Typed document elements (paragraphs, tables, formulas, pictures, key-value pairs) with bounding boxes, confidence scores, and reading order.
- A whole-document Markdown representation, suited to RAG, search indexing, and content pipelines.
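To illustrate how the typed-element output might be consumed downstream, here is a minimal sketch that drops low-confidence elements and returns the remaining text in reading order. The field names (`type`, `text`, `bbox`, `confidence`, `reading_order`), the 0-to-1 confidence scale, and the sample data are all assumptions made for illustration. The actual response schema may differ, so check the API reference before relying on them.

```python
from typing import TypedDict


class Element(TypedDict):
    # Hypothetical field names; the real response schema may differ.
    type: str
    text: str
    bbox: tuple[float, float, float, float]  # assumed (x0, y0, x1, y1)
    confidence: float  # assumed to range from 0 to 1
    reading_order: int


def confident_text(elements: list[Element], min_confidence: float = 0.8) -> list[str]:
    """Return text of elements at or above `min_confidence`, in reading order."""
    kept = [e for e in elements if e["confidence"] >= min_confidence]
    kept.sort(key=lambda e: e["reading_order"])
    return [e["text"] for e in kept]


# Made-up sample data shaped like the assumed schema above.
sample: list[Element] = [
    {"type": "paragraph", "text": "Second paragraph",
     "bbox": (72.0, 200.0, 540.0, 240.0), "confidence": 0.97, "reading_order": 2},
    {"type": "paragraph", "text": "First paragraph",
     "bbox": (72.0, 100.0, 540.0, 140.0), "confidence": 0.99, "reading_order": 1},
    {"type": "picture", "text": "Smudged caption",
     "bbox": (72.0, 300.0, 300.0, 420.0), "confidence": 0.41, "reading_order": 3},
]

if __name__ == "__main__":
    print(confident_text(sample))
```

The filter and the sort are kept separate so the confidence threshold can be tuned per document type without affecting ordering.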
Essential guides
Start with these guides to set up your first request, explore the API, or review pricing.