---
title: "Extract endpoint"
canonical_url: "https://www.nutrient.io/guides/dws-data-extraction/extract/"
md_url: "https://www.nutrient.io/guides/dws-data-extraction/extract.md"
last_updated: "2026-06-11T00:00:00.000Z"
description: "Extract domain-specific JSON data from documents and map it to your own JSON Schema using the /extraction/extract endpoint."
---

# Extract endpoint

The extract endpoint of the Nutrient DWS Data Extraction API returns domain-specific data from a document as JSON shaped to the schema you provide:

```
POST https://api.nutrient.io/extraction/extract
```

Use the extract endpoint when you need specific values from a document. Provide a schema with fields such as `invoice_number` and `total_amount`, and the response returns those values from the document. You can also request per-field citations that point back to the source. To define the schema, refer to the [define a schema](https://www.nutrient.io/guides/dws-data-extraction/extract/define-a-schema.md) guide. To configure citations, refer to the [citations and confidence](https://www.nutrient.io/guides/dws-data-extraction/extract/citations-and-confidence.md) guide. If you need document structure instead, such as typed spatial elements or whole-document Markdown, refer to the [parse endpoint](https://www.nutrient.io/guides/dws-data-extraction/parsing.md) guide.

## When to use extract vs. parse

Choose the endpoint based on the output your application needs.

| Use case                                                                                      | Endpoint              |
| --------------------------------------------------------------------------------------------- | --------------------- |
| Pull known fields into a typed JSON object, such as invoices or forms                         | `/extraction/extract` |
| Get the full document as typed elements or Markdown                                           | `/extraction/parse`   |
| Map data to a downstream database or API contract                                             | `/extraction/extract` |
| Support retrieval-augmented generation (RAG) ingestion, search indexing, or content migration | `/extraction/parse`   |

If you need both raw structure and specific fields, use the extract endpoint. It runs a parse stage internally. To configure that stage, refer to the [parse configuration](https://www.nutrient.io/guides/dws-data-extraction/extract/parse-configuration.md) guide.

## Request formats

Every extract request must include a `schema`. You can send the document in two ways.

### Multipart form upload

Upload a file with the JSON-serialized extraction instructions:

#### curl

```shell

curl -X POST https://api.nutrient.io/extraction/extract \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@invoice.pdf" \
  -F 'instructions={"schema":{"type":"object","properties":{"invoice_number":{"type":"string","description":"Invoice identifier"},"total_amount":{"type":"number","description":"Total amount including tax"}},"required":["invoice_number","total_amount"]}}'

```

#### Python

```python

import json

import requests

schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total_amount": {"type": "number", "description": "Total amount including tax"},
    },
    "required": ["invoice_number", "total_amount"],
}

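# Open the file in a context manager so the handle is closed after the upload.
with open("invoice.pdf", "rb") as document:
    response = requests.post(
        "https://api.nutrient.io/extraction/extract",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": document},
        data={"instructions": json.dumps({"schema": schema})},
    )

# Fail early on an HTTP error instead of indexing into an error body.
response.raise_for_status()
print(response.json()["output"]["data"])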

```

#### JavaScript

```javascript

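import { readFile } from "node:fs/promises";

const schema = {
  type: "object",
  properties: {
    invoice_number: { type: "string", description: "Invoice identifier" },
    total_amount: { type: "number", description: "Total amount including tax" },
  },
  required: ["invoice_number", "total_amount"],
};

// The built-in FormData expects a Blob or File, not a read stream,
// so read the file into memory and wrap it in a Blob.
const file = new Blob([await readFile("invoice.pdf")], { type: "application/pdf" });

const form = new FormData();
form.append("file", file, "invoice.pdf");
form.append("instructions", JSON.stringify({ schema }));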

const response = await fetch("https://api.nutrient.io/extraction/extract", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

const result = await response.json();
console.log(result.output.data);

```

### JSON body with URL

Process a document hosted at a public URL by sending the schema and a `url` field as JSON:

#### curl

```shell

curl -X POST https://api.nutrient.io/extraction/extract \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://storage.example.com/invoice.pdf",
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": { "type": "string", "description": "Invoice identifier" },
        "total_amount": { "type": "number", "description": "Total amount including tax" }
      },
      "required": ["invoice_number", "total_amount"]
    },
    "parseConfig": { "mode": "understand" }
  }'

```

#### Python

```python

import requests

response = requests.post(
    "https://api.nutrient.io/extraction/extract",
    headers={
        "Authorization": "Bearer your_api_key_goes_here",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://storage.example.com/invoice.pdf",
        "schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string", "description": "Invoice identifier"},
                "total_amount": {"type": "number", "description": "Total amount including tax"},
            },
            "required": ["invoice_number", "total_amount"],
        },
        "parseConfig": {"mode": "understand"},
    },
)

print(response.json()["output"]["data"])

```
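
To support both this format and the multipart upload from one code path, the request arguments can be built by a small helper. The sketch below is illustrative only: `build_extract_kwargs` and `EXTRACT_URL` are hypothetical names, not part of any Nutrient SDK, and the helper only assembles arguments for `requests.post` without sending anything.

```python
import io
import json

EXTRACT_URL = "https://api.nutrient.io/extraction/extract"


def build_extract_kwargs(schema, api_key, *, file_obj=None, url=None):
    """Return keyword arguments for requests.post(EXTRACT_URL, **kwargs).

    Hypothetical helper: pass exactly one of file_obj (multipart upload)
    or url (JSON body with a hosted document).
    """
    if (file_obj is None) == (url is None):
        raise ValueError("pass exactly one of file_obj or url")

    headers = {"Authorization": f"Bearer {api_key}"}
    if file_obj is not None:
        # Multipart: the document goes in the "file" part and the schema is
        # JSON-serialized into the "instructions" part.
        return {
            "headers": headers,
            "files": {"file": file_obj},
            "data": {"instructions": json.dumps({"schema": schema})},
        }
    # JSON body: requests sets Content-Type: application/json when json= is used.
    return {"headers": headers, "json": {"url": url, "schema": schema}}


schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total_amount": {"type": "number", "description": "Total amount including tax"},
    },
    "required": ["invoice_number", "total_amount"],
}

# An in-memory stand-in for open("invoice.pdf", "rb").
upload = build_extract_kwargs(
    schema, "your_api_key_goes_here", file_obj=io.BytesIO(b"%PDF-")
)
hosted = build_extract_kwargs(
    schema, "your_api_key_goes_here", url="https://storage.example.com/invoice.pdf"
)

print(sorted(upload))  # ['data', 'files', 'headers']
print(sorted(hosted))  # ['headers', 'json']
```

Either result can then be passed on as `requests.post(EXTRACT_URL, **upload)` or `requests.post(EXTRACT_URL, **hosted)`.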

#### JavaScript

```javascript

const response = await fetch("https://api.nutrient.io/extraction/extract", {
  method: "POST",
  headers: {
    Authorization: "Bearer your_api_key_goes_here",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://storage.example.com/invoice.pdf",
    schema: {
      type: "object",
      properties: {
        invoice_number: { type: "string", description: "Invoice identifier" },
        total_amount: { type: "number", description: "Total amount including tax" },
      },
      required: ["invoice_number", "total_amount"],
    },
    parseConfig: { mode: "understand" },
  }),
});

const result = await response.json();
console.log(result.output.data);

```

## Instructions

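The request accepts these instruction fields. In a multipart upload, send them as a JSON object in the `instructions` form field. In a JSON body, send them at the top level next to `url`.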

| Field          | Type   | Description                                                                                                                            |
| -------------- | ------ | -------------------------------------------------------------------------------------------------------------------------------------- |
| `schema`       | object | **Required** — JSON Schema that describes the data to extract. The root type must be `object`. Refer to the [define a schema](https://www.nutrient.io/guides/dws-data-extraction/extract/define-a-schema.md) guide. |
| `instructions` | string | Optional free-text guidance for the extraction model, up to 10,000 characters.                                                         |
| `parseConfig`  | object | Optional configuration for the parse stage that runs before extraction. Refer to the [parse configuration](https://www.nutrient.io/guides/dws-data-extraction/extract/parse-configuration.md) guide.                    |
| `options`      | object | Extract-specific response options, such as `includeCitations`. Refer to the [citations and confidence](https://www.nutrient.io/guides/dws-data-extraction/extract/citations-and-confidence.md) guide.                        |
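
The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_instructions` is a hypothetical helper, the character limit is counted with Python's `len`, and `includeCitations` is assumed to take a boolean value.

```python
import json

MAX_INSTRUCTIONS_CHARS = 10_000  # limit stated in the table above


def build_instructions(schema, instructions=None, parse_config=None, options=None):
    """Assemble the instruction fields, rejecting input the table rules out."""
    if not isinstance(schema, dict) or schema.get("type") != "object":
        raise ValueError('schema root type must be "object"')
    # len() counts Unicode code points; how the API counts characters is
    # an assumption here.
    if instructions is not None and len(instructions) > MAX_INSTRUCTIONS_CHARS:
        raise ValueError(
            f"instructions must be at most {MAX_INSTRUCTIONS_CHARS} characters"
        )

    payload = {"schema": schema}
    # Only include optional fields that were actually provided.
    if instructions is not None:
        payload["instructions"] = instructions
    if parse_config is not None:
        payload["parseConfig"] = parse_config
    if options is not None:
        payload["options"] = options
    return payload


schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total_amount": {"type": "number", "description": "Total amount including tax"},
    },
    "required": ["invoice_number", "total_amount"],
}

payload = build_instructions(
    schema,
    instructions="Report the total including tax, not the subtotal.",
    parse_config={"mode": "understand"},
    options={"includeCitations": True},  # assumed to be a boolean flag
)
print(json.dumps(payload, indent=2))
```

Validating locally keeps malformed requests from reaching the API, but the server remains the authority on what it accepts.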

## Response structure

A successful response returns extracted `data`, optional per-field `metadata` citations, and `pages`:

```json

{
  "status": 200,
  "requestId": "req_x1y2z3w4",
  "output": {
    "data": {
      "invoice_number": "INV-2024-0042",
      "total_amount": 1547.5
    },
    "metadata": {},
    "pages": [{ "page": 1, "width": 1200, "height": 1697 }]
  },
  "metrics": {
    "processingTimeMs": 4800,
    "pagesProcessed": 1
  },
  "usage": {
    "data_extraction_credits": {
      "cost": 27,
      "remainingCredits": 832
    },
    "price_composition": {
      "parse": { "units": 1, "unit_cost": 9, "cost": 9, "currency": "data_extraction_credits" },
      "extract": { "units": 1, "unit_cost": 18, "cost": 18, "currency": "data_extraction_credits" }
    }
  }
}

```

The response includes these fields:

- `output.data` — Extracted values shaped to your schema. The API returns only declared properties.

- `output.metadata` — Per-field citation metadata that mirrors the structure of `output.data`. This object is empty when citations are disabled.

- `output.pages` — Page metadata, including the dimensions citation coordinates use.

- `metrics` — Processing time and pages processed.

- `usage` — Total credits consumed, broken down into parse and extract components. For credit details, refer to the [pricing](https://www.nutrient.io/guides/dws-data-extraction/pricing.md) guide.
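
The fields listed above can be read with a few dictionary lookups. The sketch below uses the example response from this section as sample data. `summarize_response` is a hypothetical helper, and the check that the parse and extract costs add up to the total is an assumption drawn from that one example, not a documented guarantee.

```python
def summarize_response(body, schema):
    """Collect the parts of an extract response a caller usually needs."""
    output = body["output"]
    data = output["data"]
    declared = schema.get("properties", {})

    usage = body["usage"]
    total = usage["data_extraction_credits"]["cost"]
    breakdown = {
        stage: item["cost"] for stage, item in usage["price_composition"].items()
    }

    return {
        "data": data,
        # Required schema fields that are absent from the extracted data.
        "missing": [name for name in schema.get("required", []) if name not in data],
        # Keys the schema does not declare; expected to be empty.
        "undeclared": [name for name in data if name not in declared],
        # output.metadata is empty when citations are disabled.
        "has_citations": bool(output.get("metadata")),
        "credits": total,
        "breakdown": breakdown,
        "breakdown_matches": sum(breakdown.values()) == total,
    }


schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}

# Sample body copied from the example response in this section.
sample = {
    "status": 200,
    "requestId": "req_x1y2z3w4",
    "output": {
        "data": {"invoice_number": "INV-2024-0042", "total_amount": 1547.5},
        "metadata": {},
        "pages": [{"page": 1, "width": 1200, "height": 1697}],
    },
    "metrics": {"processingTimeMs": 4800, "pagesProcessed": 1},
    "usage": {
        "data_extraction_credits": {"cost": 27, "remainingCredits": 832},
        "price_composition": {
            "parse": {"units": 1, "unit_cost": 9, "cost": 9, "currency": "data_extraction_credits"},
            "extract": {"units": 1, "unit_cost": 18, "cost": 18, "currency": "data_extraction_credits"},
        },
    },
}

summary = summarize_response(sample, schema)
print(summary["data"])  # {'invoice_number': 'INV-2024-0042', 'total_amount': 1547.5}
print(summary["credits"], summary["breakdown"])  # 27 {'parse': 9, 'extract': 18}
print(summary["has_citations"])  # False
```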

## Next steps

Use these guides to continue working with the extract endpoint.

- Refer to the [define a schema](https://www.nutrient.io/guides/dws-data-extraction/extract/define-a-schema.md) guide for supported JSON Schema keywords, constraints, and size limits.

- Refer to the [parse configuration](https://www.nutrient.io/guides/dws-data-extraction/extract/parse-configuration.md) guide to control the parse stage with `parseConfig.mode` and language hints.

- Refer to the [citations and confidence](https://www.nutrient.io/guides/dws-data-extraction/extract/citations-and-confidence.md) guide to ground extracted values back to the source document.

- Refer to the [error handling](https://www.nutrient.io/guides/dws-data-extraction/errors.md) guide for status codes and error response formats.

---

## Related pages

- [Define a schema](/guides/dws-data-extraction/extract/define-a-schema.md)
- [Citations and confidence](/guides/dws-data-extraction/extract/citations-and-confidence.md)
- [Parse configuration](/guides/dws-data-extraction/extract/parse-configuration.md)

