Supported languages
The Data Extraction API supports more than 100 languages for OCR. Specify languages using the options.language parameter in your request instructions.
Languages with a full name alias accept either the code or the alias. All other languages require the language code. Codes are based on ISO 639-2, with script and variant suffixes for some languages (e.g. chi_sim, deu_frak).
| Language | Code | Full name alias |
|---|---|---|
| Afrikaans | afr | |
| Albanian | sqi | |
| Amharic | amh | |
| Arabic | ara | |
| Armenian | hye | |
| Assamese | asm | |
| Azerbaijani | aze | |
| Azerbaijani — Cyrillic | aze_cyrl | |
| Basque | eus | |
| Belarusian | bel | |
| Bengali | ben | |
| Bosnian | bos | |
| Breton | bre | |
| Bulgarian | bul | |
| Burmese | mya | |
| Catalan | cat | |
| Cebuano | ceb | |
| Central Khmer | khm | |
| Cherokee | chr | |
| Chinese — Simplified | chi_sim | |
| Chinese — Simplified (vertical) | chi_sim_vert | |
| Chinese — Traditional | chi_tra | |
| Chinese — Traditional (vertical) | chi_tra_vert | |
| Corsican | cos | |
| Croatian | hrv | croatian |
| Czech | ces | czech |
| Danish | dan | danish |
| Danish — Fraktur | dan_frak | |
| Dhivehi | div | |
| Dutch | nld | dutch |
| Dzongkha | dzo | |
| English | eng | english |
| English, Middle (1100–1500) | enm | |
| Esperanto | epo | |
| Estonian | est | |
| Faroese | fao | |
| Filipino | fil | |
| Finnish | fin | finnish |
| French | fra | french |
| French, Middle (ca. 1400–1600) | frm | |
| Galician | glg | |
| Georgian | kat | |
| Georgian — Old | kat_old | |
| German | deu | german |
| German — Fraktur | deu_frak | |
| German Fraktur | frk | |
| Greek, Ancient | grc | |
| Greek, Modern | ell | |
| Gujarati | guj | |
| Haitian Creole | hat | |
| Hebrew | heb | |
| Hindi | hin | |
| Hungarian | hun | |
| Icelandic | isl | |
| Indonesian | ind | indonesian |
| Inuktitut | iku | |
| Irish | gle | |
| Italian | ita | italian |
| Italian — Old | ita_old | |
| Japanese | jpn | |
| Japanese (vertical) | jpn_vert | |
| Javanese | jav | |
| Kannada | kan | |
| Kazakh | kaz | |
| Kirghiz | kir | |
| Korean | kor | |
| Korean (vertical) | kor_vert | |
| Kurdish | kur | |
| Kurmanji | kmr | |
| Lao | lao | |
| Latin | lat | |
| Latvian | lav | |
| Lithuanian | lit | |
| Luxembourgish | ltz | |
| Macedonian | mkd | |
| Malay | msa | malay |
| Malayalam | mal | |
| Maltese | mlt | |
| Maori | mri | |
| Marathi | mar | |
| Math / equation detection | equ | |
| Mongolian | mon | |
| Nepali | nep | |
| Norwegian | nor | norwegian |
| Occitan | oci | |
| Oriya | ori | |
| Panjabi | pan | |
| Pashto | pus | |
| Persian | fas | |
| Polish | pol | polish |
| Portuguese | por | portuguese |
| Quechua | que | |
| Romanian | ron | |
| Russian | rus | |
| Sanskrit | san | |
| Scottish Gaelic | gla | |
| Serbian | srp | serbian |
| Serbian — Latin | srp_latn | |
| Sindhi | snd | |
| Sinhala | sin | |
| Slovak | slk | slovak |
| Slovak — Fraktur | slk_frak | |
| Slovenian | slv | slovenian |
| Spanish | spa | spanish |
| Spanish — Old | spa_old | |
| Sundanese | sun | |
| Swahili | swa | |
| Swedish | swe | swedish |
| Syriac | syr | |
| Tagalog | tgl | |
| Tajik | tgk | |
| Tamil | tam | |
| Tatar | tat | |
| Telugu | tel | |
| Thai | tha | |
| Tibetan | bod | |
| Tigrinya | tir | |
| Tonga | ton | |
| Turkish | tur | turkish |
| Uighur | uig | |
| Ukrainian | ukr | |
| Urdu | urd | |
| Uzbek | uzb | |
| Uzbek — Cyrillic | uzb_cyrl | |
| Vietnamese | vie | |
| Welsh | cym | |
| Western Frisian | fry | |
| Yiddish | yid | |
| Yoruba | yor | |
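
As a minimal sketch of setting the OCR language, the snippet below builds an instructions object with the options.language parameter. Only the options.language path comes from this page. The helper name `build_instructions` is hypothetical, and the nesting shown (an `options` object holding a single `language` string) is an assumption, so check it against the request schema you are using.

```python
import json


def build_instructions(language: str) -> dict:
    """Return an instructions dict that sets the OCR language.

    Assumption: `options` is an object inside the request instructions
    and `language` takes a single string. Other fields a real request
    may need are not shown here.
    """
    return {"options": {"language": language}}


# A language code works for every language in the table.
print(json.dumps(build_instructions("chi_sim")))

# A full name alias works only where the table lists one.
print(json.dumps(build_instructions("german")))
```

Using the code rather than the alias keeps requests uniform, since every language in the table has a code but only some have an alias.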
Language format
You can specify languages in three ways:
| Format | Example | Description |
|---|---|---|
| Full name (lowercase) | "english", "german" | Only languages with a full name alias in the table above |
| Language code | "eng", "deu" | All languages |
| Code with variant | "chi_sim", "deu_frak" | Script or historical variants |
The API normalizes full language names to language codes internally.
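
The normalization can be pictured as a simple lookup. The sketch below is a client-side illustration only, not the API's actual implementation: the alias map is copied from the table above, and the names `ALIASES` and `normalize_language` are hypothetical.

```python
# Full name aliases and their codes, as listed in the table above.
ALIASES = {
    "croatian": "hrv", "czech": "ces", "danish": "dan", "dutch": "nld",
    "english": "eng", "finnish": "fin", "french": "fra", "german": "deu",
    "indonesian": "ind", "italian": "ita", "malay": "msa",
    "norwegian": "nor", "polish": "pol", "portuguese": "por",
    "serbian": "srp", "slovak": "slk", "slovenian": "slv",
    "spanish": "spa", "swedish": "swe", "turkish": "tur",
}


def normalize_language(value: str) -> str:
    """Map a full name alias to its code; return any other value unchanged.

    Illustrative only: it does not validate codes or change letter case.
    """
    return ALIASES.get(value, value)


print(normalize_language("german"))
print(normalize_language("chi_sim"))
```

Normalizing on the client is optional, since the API accepts both forms, but it can help if you log or compare language values in one canonical format.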