---
title: "Multilingual extraction"
canonical_url: "https://www.nutrient.io/guides/dws-data-extraction/parsing/multilingual-extraction/"
md_url: "https://www.nutrient.io/guides/dws-data-extraction/parsing/multilingual-extraction.md"
last_updated: "2026-05-26T22:37:31.557Z"
description: "Extract text from documents in more than 100 languages using the Nutrient Data Extraction API. Configure OCR language hints for better accuracy."
---

# Multilingual extraction

The Data Extraction API supports more than 100 languages for OCR. By default, the API uses English (`eng`). You can specify one or more languages using the `options.language` parameter to improve extraction accuracy for non-English documents.

The `language` option only applies to `structure`, `understand`, and `agentic` modes, which run OCR. It has no effect in `text` mode.
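Because `language` is ignored outside the OCR modes, a client can skip it when building the `instructions` payload. The following is a minimal sketch of that check. The helper name `with_language` is hypothetical and not part of any Nutrient SDK. The set of OCR modes is taken from the note above.

```python
import copy

# Modes that run OCR, per the note above. `text` mode ignores `language`.
OCR_MODES = {"structure", "understand", "agentic"}


def with_language(instructions, language):
    """Return a copy of `instructions` with `options.language` set for OCR modes.

    Hypothetical helper for illustration only. For any other mode the copy
    is returned unchanged, since the option would have no effect.
    """
    result = copy.deepcopy(instructions)
    if result.get("mode") in OCR_MODES:
        result.setdefault("options", {})["language"] = language
    return result
```

The input dictionary is copied rather than mutated, so the same base instructions can be reused across documents in different languages.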

## Specifying a language

Set `options.language` in the instructions to tell the OCR engine which language to expect.

### curl

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@document.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"spatial"},"options":{"language":"german"}}'
```

### Python

```python
import json

import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        # `instructions` is sent as a JSON-encoded form field.
        data={
            "instructions": json.dumps({
                "mode": "understand",
                "output": {"format": "spatial"},
                "options": {"language": "german"},
            })
        },
    )

print(response.json())
```

### JavaScript

```javascript
import { readFile } from "node:fs/promises";

const form = new FormData();
// The built-in FormData takes a Blob, not a stream, so read the file into memory first.
form.append("file", new Blob([await readFile("document.pdf")]), "document.pdf");
form.append(
  "instructions",
  JSON.stringify({
    mode: "understand",
    output: { format: "spatial" },
    options: { language: "german" },
  }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

console.log(await response.json());
```

## Language format

You can specify languages in three ways:

| Format                | Example                   | Description                   |
| --------------------- | ------------------------- | ----------------------------- |
| Full name (lowercase) | `"english"`, `"german"`   | Common languages only         |
| Language code         | `"eng"`, `"deu"`          | All languages                 |
| Code with variant     | `"chi_sim"`, `"deu_frak"` | Script or historical variants |

The API normalizes full language names to language codes internally.
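To illustrate what this normalization amounts to, here is a minimal client-side sketch. The alias table is an assumption for illustration: it holds only the two name and code pairs used as examples in this guide, and the API's actual table and matching rules may differ. The function name `normalize_language` is hypothetical.

```python
# Illustrative alias table only (assumption): the API's real table is larger.
# These two entries mirror the name/code pairs shown in the table above.
ALIASES = {"english": "eng", "german": "deu"}


def normalize_language(value):
    """Map a known full language name to its code; pass anything else through.

    Hypothetical helper. Names are matched case-insensitively as a client-side
    convenience; values not in the alias table are returned trimmed but
    otherwise unchanged, so codes such as `chi_sim` are left as they are.
    """
    trimmed = value.strip()
    return ALIASES.get(trimmed.lower(), trimmed)
```

Normalizing on the client is optional, since the API accepts both forms. It can be useful when language values come from user input and you want consistent values in logs or caches.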

## Multilanguage documents

For documents that contain text in multiple languages, specify all relevant languages as an array or a `+`-joined string.

### Array syntax

### curl

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@multilingual.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"spatial"},"options":{"language":["eng","spa","fra"]}}'
```

### Python

```python
import json

import requests

# Open the file in a context manager so the handle is closed after the upload.
with open("multilingual.pdf", "rb") as f:
    response = requests.post(
        "https://api.nutrient.io/extraction/parse",
        headers={"Authorization": "Bearer your_api_key_goes_here"},
        files={"file": f},
        # List every language that appears in the document.
        data={
            "instructions": json.dumps({
                "mode": "understand",
                "output": {"format": "spatial"},
                "options": {"language": ["eng", "spa", "fra"]},
            })
        },
    )

print(response.json())
```

### JavaScript

```javascript
import { readFile } from "node:fs/promises";

const form = new FormData();
// The built-in FormData takes a Blob, not a stream, so read the file into memory first.
form.append("file", new Blob([await readFile("multilingual.pdf")]), "multilingual.pdf");
form.append(
  "instructions",
  JSON.stringify({
    mode: "understand",
    output: { format: "spatial" },
    options: { language: ["eng", "spa", "fra"] },
  }),
);

const response = await fetch("https://api.nutrient.io/extraction/parse", {
  method: "POST",
  headers: { Authorization: "Bearer your_api_key_goes_here" },
  body: form,
});

console.log(await response.json());
```

### Plus-joined string syntax

You can also use a `+`-joined string instead of an array:

```shell
curl -X POST https://api.nutrient.io/extraction/parse \
  -H "Authorization: Bearer your_api_key_goes_here" \
  -F "file=@multilingual.pdf" \
  -F 'instructions={"mode":"understand","output":{"format":"spatial"},"options":{"language":"eng+spa+fra"}}'
```

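Both formats are equivalent: `["eng","spa","fra"]` and `"eng+spa+fra"` request the same set of languages, so use whichever is easier to produce in your code.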
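If your code receives languages in one form and needs the other, the conversion is a plain join or split on `+`. The sketch below shows both directions. The helper names `to_plus_joined` and `to_list` are hypothetical and exist only for illustration.

```python
def to_plus_joined(languages):
    """Return the `+`-joined form of a list of codes or an existing string."""
    if isinstance(languages, str):
        return languages
    return "+".join(languages)


def to_list(languages):
    """Return the array form of a `+`-joined string or an existing sequence."""
    if isinstance(languages, str):
        # Skip empty pieces so a stray leading or trailing `+` is harmless.
        return [code for code in languages.split("+") if code]
    return list(languages)
```

For example, `to_list("eng+spa+fra")` yields the same list used in the array examples above, and `to_plus_joined` turns that list back into the original string.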

## Tips for better accuracy

- Always specify the document language when it isn’t English. This helps the OCR engine load the correct character models and dictionaries.

- For multilanguage documents, list all languages present. The OCR engine handles language switching within the document.

- Use language codes when working with languages that don’t have a full-name alias.

- For Chinese, Japanese, and Korean, use the specific variants (`chi_sim`, `chi_tra`, `jpn`, `kor`) to select the correct character set.

## Supported languages

The Data Extraction API supports more than 100 OCR languages. See the [supported languages](https://www.nutrient.io/guides/dws-data-extraction/supported-languages.md) reference for the full list of language codes and aliases.

---

## Related pages

- [Coordinate spaces](/guides/dws-data-extraction/parsing/coordinate-spaces.md)
- [Extract document elements](/guides/dws-data-extraction/parsing/extract-document-elements.md)
- [Extract Markdown](/guides/dws-data-extraction/parsing/extract-markdown.md)
- [Parse endpoint](/guides/dws-data-extraction/parsing.md)
- [Processing modes](/guides/dws-data-extraction/parsing/processing-modes.md)