OCR server supported languages
The Nutrient OCR component supports a wide range of languages, enabling precise text recognition based on linguistic characteristics such as ligatures, punctuation rules, and symbol variations. To ensure accurate text extraction, specify the language of the document during OCR configuration.
Languages aren’t region-specific. For example, English applies to both American English and British English.
If your required language isn’t listed, contact Support for assistance.
Supported languages
| Language | Language code | Full language name (where available) |
|---|---|---|
| Afrikaans | afr | |
| Albanian | sqi | |
| Amharic | amh | |
| Arabic | ara | |
| Armenian | hye | |
| Assamese | asm | |
| Azerbaijani | aze | |
| Azerbaijani - Cyrillic | aze_cyrl | |
| Basque | eus | |
| Belarusian | bel | |
| Bengali | ben | |
| Bosnian | bos | |
| Breton | bre | |
| Bulgarian | bul | |
| Burmese | mya | |
| Catalan; Valencian | cat | |
| Cebuano | ceb | |
| Central Khmer | khm | |
| Cherokee | chr | |
| Chinese - Simplified | chi_sim | |
| Chinese - Simplified (Vertical) | chi_sim_vert | |
| Chinese - Traditional | chi_tra | |
| Chinese - Traditional (Vertical) | chi_tra_vert | |
| Corsican | cos | |
| Croatian | hrv | croatian |
| Czech | ces | czech |
| Danish | dan | danish |
| Danish - Fraktur | dan_frak | |
| Dhivehi; Maldivian | div | |
| Dutch; Flemish | nld | dutch |
| Dzongkha | dzo | |
| English | eng | english |
| English, Middle (1100–1500) | enm | |
| Esperanto | epo | |
| Estonian | est | |
| Faroese | fao | |
| Filipino | fil | |
| Finnish | fin | finnish |
| French | fra | french |
| French, Middle (ca. 1400–1600) | frm | |
| Galician | glg | |
| Georgian | kat | |
| Georgian - Old | kat_old | |
| German | deu | german |
| German - Fraktur | deu_frak | |
| German Fraktur | frk | |
| Greek, Ancient | grc | |
| Greek, Modern | ell | |
| Gujarati | guj | |
| Haitian; Haitian Creole | hat | |
| Hebrew | heb | |
| Hindi | hin | |
| Hungarian | hun | |
| Icelandic | isl | |
| Indonesian | ind | indonesian |
| Inuktitut | iku | |
| Irish | gle | |
| Italian | ita | italian |
| Italian - Old | ita_old | |
| Japanese | jpn | |
| Japanese (Vertical) | jpn_vert | |
| Javanese | jav | |
| Kannada | kan | |
| Kazakh | kaz | |
| Kirghiz; Kyrgyz | kir | |
| Korean | kor | |
| Korean (Vertical) | kor_vert | |
| Kurdish | kur | |
| Kurmanji (Kurdish) | kmr | |
| Lao | lao | |
| Latin | lat | |
| Latvian | lav | |
| Lithuanian | lit | |
| Luxembourgish | ltz | |
| Macedonian | mkd | |
| Malay | msa | malay |
| Malayalam | mal | |
| Maltese | mlt | |
| Maori | mri | |
| Marathi | mar | |
| Math/Equation detection | equ | |
| Mongolian | mon | |
| Nepali | nep | |
| Norwegian | nor | norwegian |
| Occitan | oci | |
| Oriya | ori | |
| Panjabi; Punjabi | pan | |
| Persian | fas | |
| Polish | pol | polish |
| Portuguese | por | portuguese |
| Pushto; Pashto | pus | |
| Quechua | que | |
| Romanian; Moldavian | ron | |
| Russian | rus | |
| Sanskrit | san | |
| Scottish Gaelic | gla | |
| Serbian | srp | serbian |
| Serbian - Latin | srp_latn | |
| Sindhi | snd | |
| Sinhala; Sinhalese | sin | |
| Slovak | slk | slovak |
| Slovak - Fraktur | slk_frak | |
| Slovenian | slv | slovenian |
| Spanish; Castilian | spa | spanish |
| Spanish - Old | spa_old | |
| Sundanese | sun | |
| Swahili | swa | |
| Swedish | swe | swedish |
| Syriac | syr | |
| Tagalog | tgl | |
| Tajik | tgk | |
| Tamil | tam | |
| Tatar | tat | |
| Telugu | tel | |
| Thai | tha | |
| Tibetan | bod | |
| Tigrinya | tir | |
| Tonga | ton | |
| Turkish | tur | turkish |
| Uighur; Uyghur | uig | |
| Ukrainian | ukr | |
| Urdu | urd | |
| Uzbek | uzb | |
| Uzbek - Cyrillic | uzb_cyrl | |
| Vietnamese | vie | |
| Welsh | cym | |
| Western Frisian | fry | |
| Yiddish | yid | |
| Yoruba | yor | |
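Only some rows in the table have a full language name. If a script needs one canonical form, a small lookup can normalize full names to their codes. The sketch below is illustrative only: `ocr_language_code` is a hypothetical helper name, not part of Document Engine, and its mapping is copied from the third column of the table.

```shell
# Hypothetical helper (not part of Document Engine): maps a full language name
# from the table to its language code. Any other identifier, such as a code
# that is already in ISO 639-2 form, is passed through unchanged.
ocr_language_code() {
  case "$1" in
    croatian)   echo hrv ;;
    czech)      echo ces ;;
    danish)     echo dan ;;
    dutch)      echo nld ;;
    english)    echo eng ;;
    finnish)    echo fin ;;
    french)     echo fra ;;
    german)     echo deu ;;
    indonesian) echo ind ;;
    italian)    echo ita ;;
    malay)      echo msa ;;
    norwegian)  echo nor ;;
    polish)     echo pol ;;
    portuguese) echo por ;;
    serbian)    echo srp ;;
    slovak)     echo slk ;;
    slovenian)  echo slv ;;
    spanish)    echo spa ;;
    swedish)    echo swe ;;
    turkish)    echo tur ;;
    *)          echo "$1" ;;
  esac
}
```

For example, `ocr_language_code german` prints `deu`, while `ocr_language_code chi_sim_vert` prints the code unchanged.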
Usage
OCR capabilities are exposed through the following API endpoints:
- OCR when uploading documents
- OCR to process and download a document
- OCR to edit and save a previously uploaded document
You can specify the language of your document using any of the following:
- Full language name, e.g. `english`, `german` (available for commonly used languages)
- ISO 639-2 language code, e.g. `eng`, `deu` (available for all languages)
- ISO 639-2 language code with a variant, e.g. `chi_sim_vert` or `deu_frak`
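Whichever form you choose, the identifier ends up in the `language` field of the OCR action, either as a single string or as a list of strings. The following is a minimal sketch of a shell function that assembles the `instructions` JSON for either case. The name `ocr_instructions` is hypothetical and not part of Document Engine. The sketch assumes the uploaded file part is named `scanned` and that identifiers contain only letters and underscores, so no JSON escaping is done.

```shell
# Hypothetical helper (not part of Document Engine): prints the `instructions`
# JSON for an OCR request. Pass one or more language names or codes.
# Assumes identifiers contain only letters and underscores (no JSON escaping).
ocr_instructions() {
  if [ "$#" -eq 0 ]; then
    echo "usage: ocr_instructions LANGUAGE [LANGUAGE...]" >&2
    return 1
  fi
  if [ "$#" -eq 1 ]; then
    # A single language is sent as a plain JSON string.
    language="\"$1\""
  else
    # Several languages are sent as a JSON array of strings.
    language="["
    sep=""
    for item in "$@"; do
      language="${language}${sep}\"${item}\""
      sep=", "
    done
    language="${language}]"
  fi
  printf '{ "parts": [ { "file": "scanned" } ], "actions": [ { "type": "ocr", "language": %s } ], "output": { "type": "pdf" } }\n' "$language"
}
```

The output can be passed to curl as the form field, for example `-F instructions="$(ocr_instructions french eng)"`.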
OCR a document with one language
Below is an example of a curl request to Document Engine to OCR a document in Japanese (ISO 639-2 code jpn):
# Assuming Document Engine is running on `localhost:5000`.
curl -X POST http://localhost:5000/api/build \
  -H 'Authorization: Token token=secret' \
  -o jpn-ocr-result.pdf \
  --fail \
  -H 'Content-Type: multipart/form-data' \
  -F 'scanned=@/path/to/japanese-document.png' \
  -F instructions='{
    "parts": [ { "file": "scanned" } ],
    "actions": [ { "type": "ocr", "language": "jpn" } ],
    "output": { "type": "pdf" }
  }'

OCR a document with multiple languages
To perform OCR on a document containing multiple languages, specify a list of desired languages (or their ISO 639-2 codes). Below is an example of a curl request to Document Engine to OCR a document with two languages, English and French:
# Assuming Document Engine is running on `localhost:5000`.
curl -X POST http://localhost:5000/api/build \
  -H 'Authorization: Token token=secret' \
  -o french-english-ocr-result.pdf \
  --fail \
  -H 'Content-Type: multipart/form-data' \
  -F 'scanned=@/path/to/english-french-document.png' \
  -F instructions='{
    "parts": [ { "file": "scanned" } ],
    "actions": [ { "type": "ocr", "language": ["french", "eng"] } ],
    "output": { "type": "pdf" }
  }'