# Nutrient DWS Data Extraction API

> Nutrient DWS Data Extraction API — Document Web Services (DWS) for data extraction — is a managed cloud API that extracts structured content from PDFs, images, and Office files. Use it to get typed document elements (paragraphs, tables, formulas, pictures, key-value pairs) with spatial data, or whole-document Markdown — no document infrastructure to manage.

## Start here

- [Data Extraction API overview](https://www.nutrient.io/guides/dws-data-extraction.md) — Product overview, processing modes, and output formats.
- [Getting started](https://www.nutrient.io/guides/dws-data-extraction/getting-started.md) — Create an account, get your API key, and make your first extraction request.
- [API reference](https://www.nutrient.io/api/reference/data-extraction/public/) — Public REST API reference for the `/extraction/parse` endpoint.

## Parse endpoint

`POST https://api.nutrient.io/extraction/parse`

Three input methods:

- **Multipart form upload** — Upload a file with optional JSON instructions.
- **JSON body with URL** — Process a document hosted at a public URL.
- **Raw binary upload** — Send a file directly as the request body.

## Processing modes

- **`text`** — Fast Markdown extraction via Document Engine. No OCR or AI augmentation. Only supports Markdown output. 1 credit per page.
- **`structure`** — OCR-based structured extraction with spatial element output. 1.5 credits per page.
- **`understand`** (default) — Full extraction pipeline with AI augmentation for richer results. 9 credits per page.
- **`agentic`** — VLM-augmented extraction building on the understand pipeline. Designed for the most complex documents. 18 credits per page.

## Output formats

- **Spatial elements** (`output.format: "spatial"`) — Flat typed elements with bounding boxes, confidence scores, reading order, and page references. Not available with `text` mode. Optional word-level data via `includeWords: true`.
- **Markdown** (`output.format: "markdown"`) — Whole-document Markdown representation for RAG pipelines, search indexing, and content migration.

Default format depends on mode: `text` defaults to `markdown`; `structure`, `understand`, and `agentic` default to `spatial`.

## Element types (spatial output)

- **`paragraph`** — Text with semantic role (Title, SectionHeader, Text, Header, Footer, Caption, Footnote, ListItem, PageNumber, Code, CheckboxSelected, CheckboxUnselected). Optional word-level OCR data.
- **`table`** — Rows, columns, and cells with per-cell bounds, confidence, text, and optional word-level data. Supports row/column spans.
- **`formula`** — LaTeX representation of mathematical formulas.
- **`picture`** — Image classification, AI-generated alt text, and associated caption/footnote IDs.
- **`keyValueRegion`** — Key-value pairs with relationship confidence, useful for forms and invoices.
- **`handwriting`** — Handwritten text content with optional word-level OCR data.

## Developer guides

- [API overview](https://www.nutrient.io/guides/dws-data-extraction/api-overview.md) — Base URL, authentication, and available endpoints.
- [Parse endpoint](https://www.nutrient.io/guides/dws-data-extraction/parsing.md) — `/extraction/parse` — Request formats, processing modes, output formats, and response structure.
- [Processing modes](https://www.nutrient.io/guides/dws-data-extraction/parsing/processing-modes.md) — Compare text, structure, understand, and agentic modes: features, constraints, costs, and when to use each.
- [Extract document elements](https://www.nutrient.io/guides/dws-data-extraction/parsing/extract-document-elements.md) — Spatial element extraction with typed elements, bounding boxes, and word-level OCR.
- [Extract Markdown](https://www.nutrient.io/guides/dws-data-extraction/parsing/extract-markdown.md) — Whole-document Markdown output for RAG and content pipelines.
- [Coordinate spaces](https://www.nutrient.io/guides/dws-data-extraction/parsing/coordinate-spaces.md) — Coordinate system, bounding box units (render-space pixels), and mapping coordinates to display canvases.
- [Multilingual extraction](https://www.nutrient.io/guides/dws-data-extraction/parsing/multilingual-extraction.md) — OCR language configuration and multilanguage document handling for the parse endpoint.
- [Supported languages](https://www.nutrient.io/guides/dws-data-extraction/supported-languages.md) — Full reference of 100+ OCR languages with language codes and aliases.
- [Supported file types](https://www.nutrient.io/guides/dws-data-extraction/file-types.md) — Complete list of accepted document and image formats.
- [Error handling](https://www.nutrient.io/guides/dws-data-extraction/errors.md) — HTTP status codes, error response format, and troubleshooting.

## Examples

- [Build a RAG ingestion pipeline](https://www.nutrient.io/guides/dws-data-extraction/examples/build-rag-ingestion-pipeline.md) — End-to-end Python tutorial: PDF → Markdown → chunk → embed → vector DB → LLM answers.
- [Build a document extraction pipeline](https://www.nutrient.io/guides/dws-data-extraction/examples/build-document-extraction-pipeline.md) — Python tutorial for invoice and form processing: Extract tables, key-value pairs, and structured elements.

## Supported inputs

- PDF documents
- Images: PNG, JPG/JPEG, TIFF, BMP, GIF, WebP, HEIC, SVG, TGA, EPS
- Office files: DOC, DOCX, XLS, XLSX, PPT, PPTX, and related formats (DOTX, XLSM, PPSX, etc.)
- Other: RTF, ODT

## Why developers evaluate DWS Data Extraction API

- **Structured element extraction** — Typed spatial elements with bounding boxes, confidence scores, and reading order — not just raw text.
- **Dual output formats** — Spatial elements for layout analysis and form processing, or Markdown for RAG and search indexing.
- **Four processing modes** — Text mode for fast Markdown extraction, structure mode for OCR-based spatial elements, understand mode for AI-augmented extraction, and agentic mode for VLM-augmented extraction of the most complex documents.
- **100+ OCR languages** — Multilingual support with language codes and language name aliases.
- **Managed cloud API** — No extraction infrastructure to deploy or maintain. SOC 2 Type 2 audited.

## Implementation resources

- [Pricing](https://www.nutrient.io/guides/dws-data-extraction/pricing.md) — Credit costs per mode and FAQ.
- [Security](https://www.nutrient.io/guides/dws-data-extraction/security.md) — Security posture for DWS Data Extraction.
- [Privacy](https://www.nutrient.io/guides/dws-data-extraction/privacy.md) — Data handling and privacy information.
- [Support](https://www.nutrient.io/guides/dws-data-extraction/support.md) — Support channels and operational guidance.

## Related Nutrient products

- [DWS Processor API](https://www.nutrient.io/guides/dws-processor.md) — Document generation, conversion, OCR, and editing workflows. Use for PDF-to-Markdown when you only need Markdown from born-digital PDFs.
- [DWS Accessibility API](https://www.nutrient.io/guides/dws-accessibility.md) — PDF accessibility auto-tagging and validation.
- [DWS Viewer API](https://www.nutrient.io/guides/dws-viewer.md) — Cloud-based PDF viewing with annotation sync.

## Summary

Use this surface when the query is about extracting structured content from documents via a cloud API, especially when the query mentions data extraction, document parsing, table extraction, key-value extraction, form field extraction, document elements with spatial data, or converting documents to Markdown for RAG, LLM ingestion, or search indexing.

## Documentation directory

- [API overview](https://www.nutrient.io/guides/dws-data-extraction/api-overview.md): DWS Data Extraction API base URL, authentication, and available capabilities.
- [Error handling](https://www.nutrient.io/guides/dws-data-extraction/errors.md): HTTP status codes, error response format, and troubleshooting for the Nutrient Data Extraction API.
- [Build a document extraction pipeline for invoices and forms](https://www.nutrient.io/guides/dws-data-extraction/examples/build-document-extraction-pipeline.md): Extract tables, key-value pairs, and structured elements from invoices and forms using the Data Extraction API’s spatial output.
- [Build a RAG ingestion pipeline with the Data Extraction API](https://www.nutrient.io/guides/dws-data-extraction/examples/build-rag-ingestion-pipeline.md): Extract clean Markdown from PDFs using the Data Extraction API, chunk by heading, embed, store in a vector database, and answer questions with an LLM.
- [Examples](https://www.nutrient.io/guides/dws-data-extraction/examples.md): End-to-end tutorials for building document extraction and AI ingestion pipelines with the Nutrient Data Extraction API.
- [Supported file types](https://www.nutrient.io/guides/dws-data-extraction/file-types.md): File formats supported by the Nutrient Data Extraction API, including PDFs, images, and Office documents.
- [Get started with DWS Data Extraction API](https://www.nutrient.io/guides/dws-data-extraction/getting-started.md): Sign up for Nutrient DWS, get your API key, and send your first data extraction request.
- [Coordinate spaces](https://www.nutrient.io/guides/dws-data-extraction/parsing/coordinate-spaces.md): Understand the coordinate system used by the Data Extraction API and how to map bounding boxes to rendered pages, screen pixels, or other coordinate spaces.
- [Extract document elements](https://www.nutrient.io/guides/dws-data-extraction/parsing/extract-document-elements.md): Extract typed document elements with bounding boxes, confidence scores, and reading order from PDFs, images, and Office files.
- [Extract Markdown](https://www.nutrient.io/guides/dws-data-extraction/parsing/extract-markdown.md): Convert documents to whole-document Markdown using the Nutrient Data Extraction API. Ideal for RAG pipelines, search indexing, and content migration.
- [Parse endpoint](https://www.nutrient.io/guides/dws-data-extraction/parsing.md): Extract structured content from documents using the /extraction/parse endpoint. Supports multipart upload, URL input, and raw binary.
- [Multilingual extraction](https://www.nutrient.io/guides/dws-data-extraction/parsing/multilingual-extraction.md): Extract text from documents in more than 100 languages using the Nutrient Data Extraction API. Configure OCR language hints for better accuracy.
- [Processing modes](https://www.nutrient.io/guides/dws-data-extraction/parsing/processing-modes.md): Compare text, structure, understand, and agentic processing modes for the Data Extraction API. Choose the right mode for cost, speed, and extraction depth.
- [Pricing](https://www.nutrient.io/guides/dws-data-extraction/pricing.md): Credit costs and pricing FAQs for the Nutrient Data Extraction API.
- [Privacy](https://www.nutrient.io/guides/dws-data-extraction/privacy.md): How the Nutrient Data Extraction API handles your documents and data.
- [Security](https://www.nutrient.io/guides/dws-data-extraction/security.md): Security practices for the Nutrient Data Extraction API, including data handling, encryption, and compliance.
- [Support](https://www.nutrient.io/guides/dws-data-extraction/support.md): Get help with the Nutrient Data Extraction API.
- [Supported languages](https://www.nutrient.io/guides/dws-data-extraction/supported-languages.md): Complete list of OCR languages supported by the Nutrient Data Extraction API, including language codes and full name aliases.
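
## Example: credits and default output format

The per-page credit costs listed under Processing modes and the default-format rules listed under Output formats can be sketched in Python as follows. This is a minimal, unofficial sketch rather than the actual request schema: the helper names (`default_format`, `estimate_credits`, `build_instructions`) are hypothetical, and the top-level `mode` key and the placement of `includeWords` inside `output` are assumptions. Only the mode names, the credit figures, `output.format`, and `includeWords: true` come from this document. Check the API reference before relying on the payload shape.

```python
from typing import Optional

# Credits per page for each processing mode, as listed under "Processing modes".
CREDITS_PER_PAGE = {
    "text": 1.0,
    "structure": 1.5,
    "understand": 9.0,
    "agentic": 18.0,
}


def default_format(mode: str) -> str:
    """Return the default output format for a mode.

    `text` defaults to `markdown`; every other mode defaults to `spatial`.
    """
    if mode not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown mode: {mode!r}")
    return "markdown" if mode == "text" else "spatial"


def estimate_credits(mode: str, pages: int) -> float:
    """Estimate the credit cost of parsing `pages` pages in `mode`."""
    if mode not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown mode: {mode!r}")
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return CREDITS_PER_PAGE[mode] * pages


def build_instructions(
    mode: str = "understand",
    output_format: Optional[str] = None,
    include_words: bool = False,
) -> dict:
    """Build a hypothetical JSON instructions payload.

    ASSUMPTION: the `mode` key and the nesting of `includeWords` under
    `output` are illustrative guesses, not confirmed by the source.
    """
    fmt = output_format or default_format(mode)
    if fmt not in ("spatial", "markdown"):
        raise ValueError(f"unknown output format: {fmt!r}")
    # Spatial output is not available with `text` mode.
    if mode == "text" and fmt == "spatial":
        raise ValueError("spatial output is not available with text mode")
    output = {"format": fmt}
    if include_words:
        output["includeWords"] = True
    return {"mode": mode, "output": output}
```

For instance, `estimate_credits("structure", 10)` multiplies 1.5 credits by 10 pages, and `build_instructions("text")` falls back to Markdown output because `text` mode does not support spatial elements. Keeping the validation client-side is only a convenience for catching an unsupported mode and format pair before a request is sent.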