---
title: "PDF OCR API"
canonical_url: "https://www.nutrient.io/guides/dws-processor/tools-and-api/pdf-ocr-api/"
md_url: "https://www.nutrient.io/guides/dws-processor/tools-and-api/pdf-ocr-api.md"
last_updated: "2026-05-27T15:15:53.462Z"
description: "Extract text from images and scanned PDFs using the OCR API. The API supports more than 80 languages with reliable accuracy."
---

# PDF OCR API

This guide explains how to use all the functionality the OCR API provides, including OCR for scanned invoices and other document types. For an overview of the OCR API with signup, pricing, and code examples, see the [OCR API task page](https://www.nutrient.io/api/pdf-ocr-api/).

## Basic OCR usage

The OCR API recognizes text in files of any supported format and makes that text selectable and searchable. This is especially useful for images and scanned documents. To learn more about OCR itself, refer to the [Wikipedia article on optical character recognition](https://en.wikipedia.org/wiki/Optical_character_recognition).

To run OCR on a single image file, add a `page1.jpg` file to the same folder as your code. You can use any image containing text, or use the [sample page](https://www.nutrient.io/api/assets/downloads/samples/ocr/page1.jpg/).

Run the code to get a `result.pdf` with your page OCRed. The example sets the OCR language to English. If your content is in a different language, update the `language` property accordingly. Refer to the [supported languages](#supported-languages) section for a list of all available languages.

Below is the code to perform OCR:

### Python
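
The following is a minimal sketch of the single-image request, not a definitive implementation. It assumes a `/build` endpoint at `https://api.nutrient.io/build`, bearer-token authentication, a multipart `instructions` field, and an action type of `ocr`; none of these details are stated in this guide, so verify them against the API reference. The helper names `build_ocr_instructions` and `run_ocr` are hypothetical, and the upload uses the third-party `requests` package.

```python
import json
from pathlib import Path


def build_ocr_instructions(part_names, language="english"):
    """Build the instructions payload: one part per uploaded file, then one OCR action."""
    return {
        "parts": [{"file": name} for name in part_names],
        "actions": [{"type": "ocr", "language": language}],  # assumed action shape
    }


def run_ocr(api_key, image_path="page1.jpg", output_path="result.pdf", language="english"):
    """Upload one image and save the returned PDF (endpoint and field names are assumptions)."""
    import requests  # third-party: pip install requests

    part_name = Path(image_path).name
    instructions = build_ocr_instructions([part_name], language)
    with open(image_path, "rb") as image:
        response = requests.post(
            "https://api.nutrient.io/build",  # assumed endpoint
            headers={"Authorization": f"Bearer {api_key}"},
            files={part_name: image},
            data={"instructions": json.dumps(instructions)},
            stream=True,
        )
    response.raise_for_status()
    with open(output_path, "wb") as output:
        for chunk in response.iter_content(chunk_size=8192):
            output.write(chunk)
```

Calling `run_ocr("your_api_key_here")` with `page1.jpg` in the working directory would write `result.pdf` if the assumptions above hold. To process content in another language, pass a different `language` value.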

## Running OCR on multiple pages

Running OCR on a single page is useful, but you’ll often have a folder of scanned pages that you want to OCR and merge into a single searchable PDF.

Pass in multiple images in your request, one for each page, and Nutrient DWS Processor API will merge them into a single PDF before running OCR on it.

Add more image files to the same folder as your code and run the updated code. You can duplicate and rename your existing file, or add other images containing text.

Below is the code to perform OCR on multiple pages:

### Python
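
The following sketch extends the idea to several page images. As before, the endpoint `https://api.nutrient.io/build`, the `instructions` field, and the `ocr` action type are assumptions to verify against the API reference, and the helper names are hypothetical. It also assumes that the order of `parts` determines the page order in the merged PDF, and that every image has a unique file name.

```python
import json
from contextlib import ExitStack
from pathlib import Path


def build_multipage_instructions(image_paths, language="english"):
    """One part per page image, in the order given, followed by a single OCR action."""
    return {
        "parts": [{"file": Path(path).name} for path in image_paths],
        "actions": [{"type": "ocr", "language": language}],  # assumed action shape
    }


def run_multipage_ocr(api_key, image_paths, output_path="result.pdf", language="english"):
    """Upload all page images in one request and save the merged, OCRed PDF."""
    import requests  # third-party: pip install requests

    instructions = build_multipage_instructions(image_paths, language)
    with ExitStack() as stack:
        # Each multipart field name matches the "file" value of its part.
        files = {
            Path(path).name: stack.enter_context(open(path, "rb"))
            for path in image_paths
        }
        response = requests.post(
            "https://api.nutrient.io/build",  # assumed endpoint
            headers={"Authorization": f"Bearer {api_key}"},
            files=files,
            data={"instructions": json.dumps(instructions)},
        )
    response.raise_for_status()
    Path(output_path).write_bytes(response.content)
```

For a folder of scans, `sorted(Path("scans").glob("*.jpg"))` is one way to produce `image_paths` in a predictable order.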

## Supported languages

The OCR action supports more than 80 languages for text extraction. You can specify languages using one of the following:

- Full language name (lowercase, e.g. `english`, `german`) — available for commonly used languages

- ISO 639-2 language code (e.g. `eng`, `deu`) — available for all languages

- ISO 639-2 language code with variant (e.g. `chi_sim_vert` or `deu_frak`)
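
The three forms can be illustrated with a small client-side helper. This is only a sketch: the `{"type": "ocr", "language": ...}` action shape is an assumption, the function names are hypothetical, and the mapping covers just four of the full names listed in the table in this section.

```python
# Subset of the language table: full language name -> language code.
FULL_NAME_TO_CODE = {
    "english": "eng",
    "german": "deu",
    "french": "fra",
    "spanish": "spa",
}


def to_language_code(language):
    """Return the code for a known full name; pass codes and variants through unchanged."""
    return FULL_NAME_TO_CODE.get(language.lower(), language)


def ocr_action(language):
    """Build an OCR action for the given language value (assumed action shape)."""
    return {"type": "ocr", "language": language}


# The three ways of specifying a language described above.
by_full_name = ocr_action("german")
by_code = ocr_action("deu")
by_variant = ocr_action("deu_frak")
```

Normalizing to language codes on the client is optional. It can be convenient because codes are available for every language, while full names exist only for the commonly used ones.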

| Description                           | Language code  | Full language name |
| ------------------------------------- | -------------- | ------------------ |
| 🇿🇦 Afrikaans                        | `afr`          |                    |
| 🇦🇱 Albanian                         | `sqi`          |                    |
| 🇪🇹 Amharic                          | `amh`          |                    |
| 🇸🇦 Arabic                           | `ara`          |                    |
| 🇦🇲 Armenian                         | `hye`          |                    |
| 🇮🇳 Assamese                         | `asm`          |                    |
| 🇦🇿 Azerbaijani                      | `aze`          |                    |
| 🇦🇿 Azerbaijani — Cyrillic           | `aze_cyrl`     |                    |
| 🇪🇸 Basque                           | `eus`          |                    |
| 🇧🇾 Belarusian                       | `bel`          |                    |
| 🇧🇩 Bengali                          | `ben`          |                    |
| 🇧🇦 Bosnian                          | `bos`          |                    |
| 🇫🇷 Breton                           | `bre`          |                    |
| 🇧🇬 Bulgarian                        | `bul`          |                    |
| 🇲🇲 Burmese                          | `mya`          |                    |
| 🇪🇸 Catalan; Valencian               | `cat`          |                    |
| 🇵🇭 Cebuano                          | `ceb`          |                    |
| 🇰🇭 Central Khmer                    | `khm`          |                    |
| 🇺🇸 Cherokee                         | `chr`          |                    |
| 🇨🇳 Chinese — Simplified             | `chi_sim`      |                    |
| 🇨🇳 Chinese — Simplified (Vertical)  | `chi_sim_vert` |                    |
| 🇹🇼 Chinese — Traditional            | `chi_tra`      |                    |
| 🇹🇼 Chinese — Traditional (Vertical) | `chi_tra_vert` |                    |
| 🇫🇷 Corsican                         | `cos`          |                    |
| 🇭🇷 Croatian                         | `hrv`          | croatian           |
| 🇨🇿 Czech                            | `ces`          | czech              |
| 🇩🇰 Danish                           | `dan`          | danish             |
| 🇩🇰 Danish — Fraktur                 | `dan_frak`     |                    |
| 🇲🇻 Dhivehi; Maldivian               | `div`          |                    |
| 🇳🇱 Dutch; Flemish                   | `nld`          | dutch              |
| 🇧🇹 Dzongkha                         | `dzo`          |                    |
| 🇬🇧 English                          | `eng`          | english            |
| 🇬🇧 English, Middle (1100–1500)      | `enm`          |                    |
| Esperanto                             | `epo`          |                    |
| 🇪🇪 Estonian                         | `est`          |                    |
| 🇫🇴 Faroese                          | `fao`          |                    |
| 🇵🇭 Filipino                         | `fil`          |                    |
| 🇫🇮 Finnish                          | `fin`          | finnish            |
| 🇫🇷 French                           | `fra`          | french             |
| 🇫🇷 French, Middle (ca. 1400–1600)   | `frm`          |                    |
| 🇪🇸 Galician                         | `glg`          |                    |
| 🇬🇪 Georgian                         | `kat`          |                    |
| 🇬🇪 Georgian — Old                   | `kat_old`      |                    |
| 🇩🇪 German                           | `deu`          | german             |
| 🇩🇪 German — Fraktur                 | `deu_frak`     |                    |
| 🇩🇪 German Fraktur                   | `frk`          |                    |
| 🇬🇷 Greek, Ancient                   | `grc`          |                    |
| 🇬🇷 Greek, Modern                    | `ell`          |                    |
| 🇮🇳 Gujarati                         | `guj`          |                    |
| 🇭🇹 Haitian; Haitian Creole          | `hat`          |                    |
| 🇮🇱 Hebrew                           | `heb`          |                    |
| 🇮🇳 Hindi                            | `hin`          |                    |
| 🇭🇺 Hungarian                        | `hun`          |                    |
| 🇮🇸 Icelandic                        | `isl`          |                    |
| 🇮🇩 Indonesian                       | `ind`          | indonesian         |
| 🇨🇦 Inuktitut                        | `iku`          |                    |
| 🇮🇪 Irish                            | `gle`          |                    |
| 🇮🇹 Italian                          | `ita`          | italian            |
| 🇮🇹 Italian — Old                    | `ita_old`      |                    |
| 🇯🇵 Japanese                         | `jpn`          |                    |
| 🇯🇵 Japanese (Vertical)              | `jpn_vert`     |                    |
| 🇮🇩 Javanese                         | `jav`          |                    |
| 🇮🇳 Kannada                          | `kan`          |                    |
| 🇰🇿 Kazakh                           | `kaz`          |                    |
| 🇰🇬 Kirghiz; Kyrgyz                  | `kir`          |                    |
| 🇰🇷 Korean                           | `kor`          |                    |
| 🇰🇷 Korean (Vertical)                | `kor_vert`     |                    |
| 🇮🇶 Kurdish                          | `kur`          |                    |
| 🇹🇷 Kurmanji (Kurdish)               | `kmr`          |                    |
| 🇱🇦 Lao                              | `lao`          |                    |
| 🇻🇦 Latin                            | `lat`          |                    |
| 🇱🇻 Latvian                          | `lav`          |                    |
| 🇱🇹 Lithuanian                       | `lit`          |                    |
| 🇱🇺 Luxembourgish                    | `ltz`          |                    |
| 🇲🇰 Macedonian                       | `mkd`          |                    |
| 🇲🇾 Malay                            | `msa`          | malay              |
| 🇮🇳 Malayalam                        | `mal`          |                    |
| 🇲🇹 Maltese                          | `mlt`          |                    |
| 🇳🇿 Maori                            | `mri`          |                    |
| 🇮🇳 Marathi                          | `mar`          |                    |
| Math/equation detection               | `equ`          |                    |
| 🇲🇳 Mongolian                        | `mon`          |                    |
| 🇳🇵 Nepali                           | `nep`          |                    |
| 🇳🇴 Norwegian                        | `nor`          | norwegian          |
| 🇫🇷 Occitan                          | `oci`          |                    |
| 🇮🇳 Oriya                            | `ori`          |                    |
| 🇮🇳 Panjabi; Punjabi                 | `pan`          |                    |
| 🇮🇷 Persian                          | `fas`          |                    |
| 🇵🇱 Polish                           | `pol`          | polish             |
| 🇵🇹 Portuguese                       | `por`          | portuguese         |
| 🇦🇫 Pushto; Pashto                   | `pus`          |                    |
| 🇵🇪 Quechua                          | `que`          |                    |
| 🇷🇴 Romanian; Moldavian              | `ron`          |                    |
| 🇷🇺 Russian                          | `rus`          |                    |
| 🇮🇳 Sanskrit                         | `san`          |                    |
| 🇬🇧 Scottish Gaelic                  | `gla`          |                    |
| 🇷🇸 Serbian                          | `srp`          | serbian            |
| 🇷🇸 Serbian — Latin                  | `srp_latn`     |                    |
| 🇵🇰 Sindhi                           | `snd`          |                    |
| 🇱🇰 Sinhala; Sinhalese               | `sin`          |                    |
| 🇸🇰 Slovak                           | `slk`          | slovak             |
| 🇸🇰 Slovak — Fraktur                 | `slk_frak`     |                    |
| 🇸🇮 Slovenian                        | `slv`          | slovenian          |
| 🇪🇸 Spanish; Castilian               | `spa`          | spanish            |
| 🇪🇸 Spanish — Old                    | `spa_old`      |                    |
| 🇮🇩 Sundanese                        | `sun`          |                    |
| 🇰🇪 Swahili                          | `swa`          |                    |
| 🇸🇪 Swedish                          | `swe`          | swedish            |
| 🇸🇾 Syriac                           | `syr`          |                    |
| 🇵🇭 Tagalog                          | `tgl`          |                    |
| 🇹🇯 Tajik                            | `tgk`          |                    |
| 🇮🇳 Tamil                            | `tam`          |                    |
| 🇷🇺 Tatar                            | `tat`          |                    |
| 🇮🇳 Telugu                           | `tel`          |                    |
| 🇹🇭 Thai                             | `tha`          |                    |
| 🇨🇳 Tibetan                          | `bod`          |                    |
| 🇪🇷 Tigrinya                         | `tir`          |                    |
| 🇹🇴 Tonga                            | `ton`          |                    |
| 🇹🇷 Turkish                          | `tur`          | turkish            |
| 🇨🇳 Uighur; Uyghur                   | `uig`          |                    |
| 🇺🇦 Ukrainian                        | `ukr`          |                    |
| 🇵🇰 Urdu                             | `urd`          |                    |
| 🇺🇿 Uzbek                            | `uzb`          |                    |
| 🇺🇿 Uzbek — Cyrillic                 | `uzb_cyrl`     |                    |
| 🇻🇳 Vietnamese                       | `vie`          |                    |
| 🇬🇧 Welsh                            | `cym`          |                    |
| 🇳🇱 Western Frisian                  | `fry`          |                    |
| 🇮🇱 Yiddish                          | `yid`          |                    |
| 🇳🇬 Yoruba                           | `yor`          |                    |

---

## Related pages

- [Tools and APIs](/guides/dws-processor/tools-and-api.md)
- [Document-to-image API](/guides/dws-processor/tools-and-api/document-to-image-api.md)
- [DOCX templating API](/guides/dws-processor/tools-and-api/docx-templating-api.md)
- [Image-to-PDF API](/guides/dws-processor/tools-and-api/image-to-pdf-api.md)
- [Markdown-to-PDF API](/guides/dws-processor/tools-and-api/markdown-to-pdf-api.md)
- [Office-to-PDF API](/guides/dws-processor/tools-and-api/office-to-pdf-api.md)
- [PDF security API](/guides/dws-processor/tools-and-api/pdf-security-api.md)
- [PDF-to-PDF/A API](/guides/dws-processor/tools-and-api/pdf-to-pdfa-api.md)
- [PDF generator API](/guides/dws-processor/tools-and-api/pdf-generator-api.md)
- [PDF watermark API](/guides/dws-processor/tools-and-api/pdf-watermark-api.md)
- [PDF-to-image API](/guides/dws-processor/tools-and-api/pdf-to-image-api.md)
- [PDF/UA auto-tagging API](/guides/dws-processor/tools-and-api/pdfua-api.md)
- [Redaction API](/guides/dws-processor/tools-and-api/redaction-api.md)

