---
title: "Perform optical character recognition (OCR) on PDF documents | Nutrient Web SDK"
canonical_url: "https://www.nutrient.io/guides/web/ocr/"
md_url: "https://www.nutrient.io/guides/web/ocr.md"
last_updated: "2026-05-15T19:10:05.088Z"
description: "Learn how to use Nutrient Web SDK to perform OCR on PDF documents in multiple languages. Unlock text for search and selection with simple JavaScript commands."
---

# Perform OCR on PDF documents

Nutrient Web SDK enables you to run [optical character recognition](https://en.wikipedia.org/wiki/Optical_character_recognition) (OCR) on PDF documents, accurately recognize text and patterns, and generate searchable PDF/A files.

**OCR** is available when using **Nutrient Web SDK with Document Engine**. To understand the difference between using only the Web SDK and combining it with Document Engine, refer to the [operational modes](https://www.nutrient.io/guides/web/about/operational-modes.md) guide. If you’re looking for more **advanced OCR** capabilities, [**Nutrient.NET SDK OCR**](https://www.nutrient.io/guides/dotnet/ocr/usage/image-to-searchable-pdf.md) offers additional features, such as zonal OCR, key-value extraction, image preprocessing, searchable PDF/A generation with layout retention, orientation detection, and confidence scoring. It’s available as a separate SDK and can be used in conjunction with Document Engine.

## Comparing OCR SDKs — Nutrient vs. Apryse

| Feature                                 | Web SDK + Document Engine OCR                                   | Nutrient.NET SDK OCR  | Apryse OCR                                                                |
| --------------------------------------- | --------------------------------------------------------------- | ---------------------- | ------------------------------------------------------------------------- |
| Multi-language support                  | 30+ built-in languages                                          | 30+ built-in languages | Six built-in languages with OCR module binary and 10 with IRIS OCR module |
| Searchable PDF creation                 | ✅                                                               | ✅                      | ✅                                                                         |
| OCR with exact bounding box coordinates | ❌                                                               | ✅                      | ✅                                                                         |
| Zone-based OCR/custom OCR regions       | ❌                                                               | ✅                      | ✅                                                                         |
| Key-value/table extraction              | ✅ (available through Document Engine’s [data extraction API](https://www.nutrient.io/api/data-extraction-api/)) | ✅                      | ❌                                                                         |
| Orientation detection                   | ❌                                                               | ✅                      | ✅                                                                         |
| Image preprocessing (deskew, etc.)      | ❌                                                               | ✅                      | ✅ (manual)                                                                |
| Performance and speed                   | ✅ Fast                                                          | ✅ Fast                 | Depends on SDK setup (OCR module/IRIS module)                             |
| API access                              | Three-step API call once initial setup is done                  | Requires SDK setup     | Requires SDK setup                                                        |

## Real-world use cases

- **Invoice OCR** — Convert scanned invoices into searchable PDFs, or extract totals and vendor information using OCR.

- **Contract digitization** — Turn scanned contracts into searchable, selectable PDFs for legal archiving.

- **Form processing** — Use OCR to extract fields such as names, dates, and signatures from scanned forms.

- **Multi-language document digitization** — OCR documents in multiple languages with full Unicode support.

## How to perform OCR

To perform OCR on a PDF, [open the document from Document Engine](https://www.nutrient.io/guides/web/open-a-document/from-document-engine.md) and apply the `performOcr` operation using the [`Instance.applyOperations`](https://www.nutrient.io/api/web/classes/NutrientViewer.Instance.html#applyoperations) method, as demonstrated in the code snippet below:

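```js
// `instance` is a NutrientViewer.Instance for a document opened from Document Engine.
// Run English OCR on every page of the document.
await instance.applyOperations([
  { type: "performOcr", language: "english", pageIndexes: "all" }
]);
```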

This operation detects English text in the document and makes it searchable and selectable.
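Because `applyOperations` returns a promise, you may want to wrap the call so your UI can react when OCR finishes or fails. The following is a minimal sketch, assuming `instance` is a `NutrientViewer.Instance` opened from Document Engine. `ocrWholeDocument` and its `{ ok, operation, error }` result shape are hypothetical and not part of the SDK.

```js
// Hypothetical helper (not part of Nutrient Web SDK). `instance` is assumed to be
// a NutrientViewer.Instance opened from Document Engine; only its
// applyOperations method is used.
async function ocrWholeDocument(instance, language = "english") {
  const operation = { type: "performOcr", language, pageIndexes: "all" };
  try {
    await instance.applyOperations([operation]);
    return { ok: true, operation };
  } catch (error) {
    // Return the failure instead of throwing so the caller can update the UI.
    return { ok: false, operation, error };
  }
}
```

Returning a result object instead of rethrowing suits UI code that shows a status message; rethrow instead if you prefer to handle errors further up the call stack.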

For information on how to use Document Engine’s OCR API, refer to the [how to use the OCR server](https://www.nutrient.io/guides/document-engine/ocr/usage.md) guide.

### Performing OCR in a language other than English

If your document is written in a language other than English, you can extract its text by modifying the `language` parameter. For example, to perform OCR in Spanish, run the following code snippet:

```js
// Run Spanish OCR on every page of the document.
await instance.applyOperations([
  { type: "performOcr", language: "spanish", pageIndexes: "all" }
]);
```

## OCR supported languages

Nutrient Web SDK’s OCR component supports a wide range of languages, enabling precise text recognition based on linguistic characteristics such as ligatures, punctuation rules, and symbol variations. To ensure accurate text extraction, set the `language` parameter to the language of the document.

Nutrient Web SDK can perform OCR in the following languages:

- Croatian
- Czech
- Danish
- Dutch
- English
- Finnish
- French
- German
- Indonesian
- Italian
- Malay
- Norwegian
- Polish
- Portuguese
- Serbian
- Slovak
- Slovenian
- Spanish
- Swedish
- Turkish
- Welsh

Languages aren’t region-specific. For example, English applies to both American English and British English.
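If you know the document’s language as a locale tag (for example, from user settings or the browser), you can reduce it to its base language before building the operation, since the OCR languages aren’t region-specific. This is a sketch under stated assumptions: it assumes each `language` value is the lowercase English name of a language listed above, as in the `"english"` and `"spanish"` examples, so confirm the exact strings in the API reference. The ISO 639-1 code table and `ocrLanguageForLocale` are illustrative and not part of the SDK; `Intl.Locale` is the standard JavaScript API.

```js
// Assumption: OCR language values are the lowercase English names from the list
// above (as in the "english" and "spanish" examples). Verify against the API reference.
// Keys are ISO 639-1 codes; this table is illustrative, not part of the SDK.
const OCR_LANGUAGES_BY_CODE = {
  hr: "croatian", cs: "czech", da: "danish", nl: "dutch", en: "english",
  fi: "finnish", fr: "french", de: "german", id: "indonesian", it: "italian",
  ms: "malay", no: "norwegian", nb: "norwegian", nn: "norwegian", pl: "polish",
  pt: "portuguese", sr: "serbian", sk: "slovak", sl: "slovenian", es: "spanish",
  sv: "swedish", tr: "turkish", cy: "welsh",
};

// Hypothetical helper: reduces a BCP 47 locale tag such as "en-GB" or "pt-BR"
// to its base language subtag, then looks up the OCR language name.
// Returns null for malformed tags and unsupported languages.
function ocrLanguageForLocale(localeTag) {
  let base;
  try {
    base = new Intl.Locale(localeTag).language;
  } catch {
    return null; // Intl.Locale throws for malformed or missing tags.
  }
  return OCR_LANGUAGES_BY_CODE[base] ?? null;
}
```

For example, `ocrLanguageForLocale("es-MX")` returns `"spanish"`, which you could pass as the `language` of a `performOcr` operation.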

If your required language isn’t listed above, [contact Support](https://support.nutrient.io/hc/en-us/requests/new) for assistance.

## Best practices when performing OCR

To learn about the best practices when performing OCR using Document Engine’s OCR API, refer to the [getting the best OCR accuracy](https://www.nutrient.io/guides/document-engine/ocr/best-practices.md) guide.

## Try OCR

Test how OCR works using our [free online demo](https://www.nutrient.io/demo/ocr). Upload your own PDF or image file, select your preferred OCR language, and see how text recognition makes content searchable and selectable.

## FAQs

#### Can I perform OCR on specific pages of a PDF using Nutrient Web SDK?

Yes, you can specify which pages to OCR by setting the `pageIndexes` parameter in the `performOcr` operation. This helps limit OCR to relevant pages and optimize performance.
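Users usually think in 1-based page numbers, so a page selection such as “1-3, 5” needs converting before it reaches `pageIndexes`. The sketch below assumes `pageIndexes` accepts an array of zero-based page indexes as an alternative to `"all"`; `parsePageRanges` is a hypothetical helper, not part of the SDK.

```js
// Hypothetical helper: converts 1-based page ranges typed by a user (e.g. "1-3, 5")
// into sorted, de-duplicated zero-based page indexes.
// Assumption: performOcr's pageIndexes accepts such an array instead of "all".
function parsePageRanges(input, pageCount) {
  const indexes = new Set();
  for (const part of input.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(trimmed);
    if (!match) throw new Error(`Invalid page range: "${trimmed}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${trimmed}"`);
    }
    // Convert each 1-based page number to a zero-based index.
    for (let page = start; page <= end; page++) indexes.add(page - 1);
  }
  return [...indexes].sort((a, b) => a - b);
}
```

In the viewer, you might pass `instance.totalPageCount` as `pageCount` and use the returned array as the `pageIndexes` value.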

#### Should I manually preprocess images before performing OCR?

Manual preprocessing isn’t recommended. Nutrient Web SDK’s OCR engine applies automatic preprocessing steps, often delivering better results than manual adjustments.

#### How do I configure OCR for documents with multiple languages?

Set the `language` parameter to multiple languages separated by `+` (for example, `deu+fra+spa`). This improves recognition accuracy for multilingual documents.
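A minimal sketch of assembling that value, assuming the `+`-separated, three-letter code format shown in the example above is accepted; `joinOcrLanguages` is a hypothetical helper. It trims, lowercases, and de-duplicates the codes before joining them.

```js
// Hypothetical helper: builds a multi-language value such as "deu+fra+spa"
// from a list of language codes. Assumption: "+"-separated codes are accepted,
// as described in this FAQ answer.
function joinOcrLanguages(codes) {
  const unique = [];
  for (const code of codes) {
    const normalized = String(code).trim().toLowerCase();
    // Skip blanks and codes that were already added.
    if (normalized === "" || unique.includes(normalized)) continue;
    unique.push(normalized);
  }
  if (unique.length === 0) throw new Error("At least one language code is required");
  return unique.join("+");
}
```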

#### What is the recommended image resolution and font size for accurate OCR?

For best results, use images with a resolution of 200–300 DPI. Font sizes between 10 pt and 30 px are optimal. Avoid large fonts or excessively high resolutions, as they may reduce accuracy.
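To check whether a scan falls within the recommended 200–300 DPI range, divide the image’s pixel width by the page’s physical width in inches. PDF page sizes are expressed in points (1/72 inch), so a US Letter page 612 points wide, scanned to an image 2,550 pixels wide, works out to 2550 / (612 / 72) = 2550 / 8.5 = 300 DPI. The helper names below are hypothetical.

```js
const POINTS_PER_INCH = 72; // PDF user-space units: 1 point = 1/72 inch.

// Hypothetical helper: effective resolution of a page image, given its pixel width
// and the PDF page width in points.
function effectiveDpi(pixelWidth, pageWidthPoints) {
  return pixelWidth / (pageWidthPoints / POINTS_PER_INCH);
}

// Hypothetical helper: true when the resolution is within the
// 200–300 DPI range recommended above.
function isRecommendedOcrResolution(pixelWidth, pageWidthPoints) {
  const dpi = effectiveDpi(pixelWidth, pageWidthPoints);
  return dpi >= 200 && dpi <= 300;
}
```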

#### Does Nutrient Web SDK with Document Engine support advanced OCR features such as layout retention or zonal OCR?

Basic OCR features are exposed through the Web SDK with Document Engine. For advanced capabilities such as layout retention, zonal OCR, and confidence scoring, refer to [Nutrient.NET SDK OCR](https://www.nutrient.io/guides/dotnet/ocr/usage/image-to-searchable-pdf.md).