---
title: "Parse configuration"
canonical_url: "https://www.nutrient.io/guides/dws-data-extraction/extract/parse-configuration/"
md_url: "https://www.nutrient.io/guides/dws-data-extraction/extract/parse-configuration.md"
last_updated: "2026-06-11T00:00:00.000Z"
description: "Control the parse stage of the Data Extraction API extract endpoint with parseConfig modes, language hints, and free-text instructions."
---

# Parse configuration

The Nutrient DWS Data Extraction API extract endpoint runs in two stages. It first parses the document into structured context, and then extracts your schema’s fields from that context.

Use the `parseConfig` object and the top-level `instructions` string to configure the parse stage.

## Parse modes

`parseConfig.mode` selects the vision pipeline that runs before extraction. The extract endpoint doesn’t support `text` mode because schema extraction requires structured, spatial context. To compare this behavior with parsing, refer to the [parse endpoint](https://www.nutrient.io/guides/dws-data-extraction/parsing.md) guide.

| Mode         | Pipeline                          | When to use                                                                             |
| ------------ | --------------------------------- | --------------------------------------------------------------------------------------- |
| `structure`  | OCR-backed structured extraction  | Clean, simple layouts where processing time and cost matter most.                       |
| `understand` | ICR-backed document understanding | Default. Most documents, including tables, forms, and multicolumn layouts.              |
| `agentic`    | VLM-enhanced analysis             | The most complex documents: degraded scans, cursive handwriting, dense visual content.  |

The default mode is `understand`. The parse mode affects extraction quality and the parse component of the request cost. For credit details, refer to the [pricing](https://www.nutrient.io/guides/dws-data-extraction/pricing.md) guide.

```json
{
  "schema": { "type": "object", "properties": { "total": { "type": "number" } } },
  "parseConfig": { "mode": "structure" }
}
```
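If you assemble request bodies in code, a small helper can reject an unsupported mode before the request is sent. The following is a minimal Python sketch, not part of any Nutrient SDK: the helper name `build_extract_body` and the constant `EXTRACT_PARSE_MODES` are hypothetical, and the only facts it relies on are the three modes listed in the table and the `understand` default.

```python
import json

# Modes listed in the table above. `text` is left out because the extract
# endpoint doesn't support it.
EXTRACT_PARSE_MODES = ("structure", "understand", "agentic")


def build_extract_body(schema: dict, mode: str = "understand") -> dict:
    """Return a request body with `schema` and `parseConfig.mode` set.

    Hypothetical helper for illustration; it only builds the JSON payload
    and doesn't send anything.
    """
    if mode not in EXTRACT_PARSE_MODES:
        raise ValueError(
            f"Unsupported parse mode {mode!r}; expected one of {EXTRACT_PARSE_MODES}"
        )
    return {"schema": schema, "parseConfig": {"mode": mode}}


schema = {"type": "object", "properties": {"total": {"type": "number"}}}
body = build_extract_body(schema, mode="structure")
print(json.dumps(body, indent=2))
```

Calling `build_extract_body(schema, mode="text")` raises a `ValueError` locally, which surfaces the mistake earlier than a rejected API request would.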

## Language hints

Set `parseConfig.options.language` to guide OCR for non-English documents. It accepts these values:

- A lowercase language name, such as `"english"` or `"german"`.

- An ISO 639-2 code, such as `"eng"` or `"deu"`.

- For multilingual documents, an array such as `["eng", "spa"]` or a `+`-joined string such as `"eng+spa"`.

The following example configures OCR for English and German:

```json
{
  "schema": { "type": "object", "properties": { "total": { "type": "number" } } },
  "parseConfig": {
    "mode": "understand",
    "options": { "language": ["eng", "deu"] }
  }
}
```

For the full list of codes and aliases, refer to the [supported languages](https://www.nutrient.io/guides/dws-data-extraction/supported-languages.md) guide.
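The accepted forms described in this section can be produced from a single helper. This Python sketch is an illustration only: `language_hint` is a hypothetical name, lowercasing the input is an assumption based on the lowercase examples in this guide, and the function doesn’t check values against the supported languages list.

```python
def language_hint(*languages: str, joined: bool = False):
    """Build a value for `parseConfig.options.language`.

    One language returns a plain string. Several languages return a list,
    or a `+`-joined string when `joined` is true.
    """
    # Assumption: names and codes are sent in lowercase, as in the examples.
    cleaned = [lang.strip().lower() for lang in languages if lang.strip()]
    if not cleaned:
        raise ValueError("Provide at least one language name or code")
    if len(cleaned) == 1:
        return cleaned[0]
    return "+".join(cleaned) if joined else cleaned


parse_config = {
    "mode": "understand",
    "options": {"language": language_hint("eng", "deu")},
}
print(parse_config)
```

Passing `joined=True` returns the `+`-joined string form instead of the array form, for example `language_hint("eng", "spa", joined=True)`.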

## Free-text instructions

The top-level `instructions` string gives the extraction model document-wide guidance that doesn’t belong on a single schema field. It accepts up to 10,000 characters. The following example adds guidance that applies to every line item in an invoice:

```json
{
  "schema": {
    "type": "object",
    "properties": {
      "line_items": {
        "type": "array",
        "items": { "type": "object", "properties": { "description": { "type": "string" } } }
      }
    }
  },
  "instructions": "Extract all line items exactly as they appear in the invoice table. Treat shipping and handling as separate line items.",
  "parseConfig": { "mode": "understand" }
}
```

Use `instructions` for cross-field rules and disambiguation. Use a field’s `description` for per-field guidance. For field-level descriptions, refer to the [define a schema](https://www.nutrient.io/guides/dws-data-extraction/extract/define-a-schema.md) guide.
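When instructions are generated or concatenated from several sources, it can help to check the 10,000-character limit before sending the request. The sketch below is a hypothetical Python helper, `with_instructions`, and it assumes the limit counts characters the way Python’s `len()` counts them for a `str`; the API may count differently.

```python
MAX_INSTRUCTIONS_CHARS = 10_000  # limit stated in this guide


def with_instructions(body: dict, instructions: str) -> dict:
    """Return a copy of `body` with a top-level `instructions` string.

    Hypothetical helper. Assumes the API limit matches `len()` on a Python
    string, which counts Unicode code points.
    """
    if len(instructions) > MAX_INSTRUCTIONS_CHARS:
        raise ValueError(
            f"instructions has {len(instructions)} characters; "
            f"the limit is {MAX_INSTRUCTIONS_CHARS}"
        )
    return {**body, "instructions": instructions}


base_body = {
    "schema": {"type": "object", "properties": {"total": {"type": "number"}}},
    "parseConfig": {"mode": "understand"},
}
request_body = with_instructions(
    base_body, "Treat shipping and handling as separate line items."
)
print(sorted(request_body))
```

The helper returns a new dictionary, so the same `base_body` can be reused with different instructions.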

## Choose a parse mode

Start with the default `understand` mode. Move to a different mode only when your documents or cost requirements call for it.

- **Drop to `structure`** when documents have clean, predictable layouts and you need lower cost or shorter processing time.

- **Move to `agentic`** when `understand` leaves visible gaps in the results. Typical cases are degraded scans, cursive or freeform handwriting, and dense visual content that requires visual reasoning.

Test a representative sample of your documents in each candidate mode before you commit a pipeline to one. Compare the extracted `data` and the citation confidence across modes. To configure citations, refer to the [citations and confidence](https://www.nutrient.io/guides/dws-data-extraction/extract/citations-and-confidence.md) guide.
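One way to structure such a comparison is to run the same sample document in each mode and record which expected fields come back empty. The following Python sketch assumes a caller-supplied function, here called `run_extract`, that sends the request with the given mode and returns the extracted `data` object as a dictionary. That function, the helper names, and the sample values are all hypothetical; the stand-in below returns invented data so the sketch runs without network access.

```python
from typing import Callable

MODES = ("structure", "understand", "agentic")


def missing_fields(data: dict, expected_fields: list[str]) -> list[str]:
    """List expected top-level fields that are absent or null in `data`."""
    return [name for name in expected_fields if data.get(name) is None]


def compare_modes(
    run_extract: Callable[[str], dict], expected_fields: list[str]
) -> dict[str, list[str]]:
    """Run one sample in each mode and report the missing fields per mode."""
    return {
        mode: missing_fields(run_extract(mode), expected_fields) for mode in MODES
    }


def fake_run_extract(mode: str) -> dict:
    # Stand-in for a real API call. These values are invented for the demo
    # and say nothing about how the real modes perform.
    results = {
        "structure": {"invoice_number": "INV-1", "total": None},
        "understand": {"invoice_number": "INV-1", "total": 42.0},
        "agentic": {"invoice_number": "INV-1", "total": 42.0},
    }
    return results[mode]


report = compare_modes(fake_run_extract, ["invoice_number", "total"])
for mode, missing in report.items():
    print(mode, "missing:", missing)
```

Counting empty fields is only a coarse signal. A fuller comparison would also check extracted values against known-correct answers and look at the confidence reported for each field.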

## Next steps

Use these guides to continue configuring extraction:

- Refer to the [define a schema](https://www.nutrient.io/guides/dws-data-extraction/extract/define-a-schema.md) guide for field types, supported keywords, and limits.

- Refer to the [citations and confidence](https://www.nutrient.io/guides/dws-data-extraction/extract/citations-and-confidence.md) guide for per-field grounding and confidence signals.

- Refer to the [processing modes](https://www.nutrient.io/guides/dws-data-extraction/parsing/processing-modes.md) guide for a detailed comparison of parse-stage modes.

---

## Related pages

- [Extract endpoint](/guides/dws-data-extraction/extract.md)
- [Define a schema](/guides/dws-data-extraction/extract/define-a-schema.md)
- [Citations and confidence](/guides/dws-data-extraction/extract/citations-and-confidence.md)

