PDF OCR API

Use this OCR API to turn scanned invoices, forms, PDFs, and document images into searchable PDFs with selectable text. Nutrient DWS is built for API-first OCR workflows across records, archives, intake pipelines, and document-processing systems, not for one-off use in consumer OCR apps.

Try for free

Built for PDF workflows

Unlike generic OCR tools, Nutrient is optimized for PDFs, preserving layout, handling embedded fonts, and supporting searchable PDF output for seamless integration into document pipelines.

Accurate OCR in 80+ languages

With support for more than 80 OCR languages, Nutrient helps teams extract text from multilingual scanned documents, forms, invoices, receipts, and archived records.

Built for API-driven OCR workflows

Use REST, SDKs, or Postman to add OCR to document-processing pipelines where searchable output, PDF fidelity, and predictable automation matter more than the convenience of one-off consumer OCR tools.

Try it out

This example runs English-language OCR on your uploaded document, making any text in it selectable and searchable.

Try it out in three steps

Add a scanned PDF named document.pdf to your project folder.
Run the code from the same folder, replacing your_api_key_here with your API key.
Open result.pdf to see the output.

OCR API (basic)

# macOS and Linux shells
curl -X POST https://api.nutrient.io/processor/ocr \
  -H "Authorization: Bearer your_api_key_here" \
  -o result.pdf \
  --fail \
  -F file=@document.pdf \
  -F data='{
      "language": "english"
    }'

REM Windows Command Prompt
curl -X POST https://api.nutrient.io/processor/ocr ^
  -H "Authorization: Bearer your_api_key_here" ^
  -o result.pdf ^
  --fail ^
  -F file=@document.pdf ^
  -F data="{\"language\": \"english\"}"

package com.example.pspdfkit;

import java.io.File;
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.json.JSONObject;

import okhttp3.MediaType;
import okhttp3.MultipartBody;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public final class PspdfkitApiExample {
  public static void main(final String[] args) throws IOException {
    final RequestBody body = new MultipartBody.Builder()
      .setType(MultipartBody.FORM)
      .addFormDataPart(
        "file",
        "document.pdf",
        RequestBody.create(
          MediaType.parse("application/pdf"),
          new File("document.pdf")
        )
      )
      .addFormDataPart(
        "data",
        new JSONObject()
          .put("language", "english").toString()
      )
      .build();

    final Request request = new Request.Builder()
      .url("https://api.nutrient.io/processor/ocr")
      .method("POST", body)
      .addHeader("Authorization", "Bearer your_api_key_here")
      .build();

    final OkHttpClient client = new OkHttpClient()
      .newBuilder()
      .build();

    // try-with-resources closes the response body once it has been handled.
    try (Response response = client.newCall(request).execute()) {
      if (response.isSuccessful()) {
        Files.copy(
          response.body().byteStream(),
          FileSystems.getDefault().getPath("result.pdf"),
          StandardCopyOption.REPLACE_EXISTING
        );
      } else {
        // Surface the error message returned by the API.
        throw new IOException(response.body().string());
      }
    }
  }
}

using System;
using System.IO;
using System.Net;
using RestSharp;

namespace PspdfkitApiDemo
{
  class Program
  {
    static void Main(string[] args)
    {
      var client = new RestClient("https://api.nutrient.io/processor/ocr");

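      // This sample targets the RestSharp 106 API; versions 107 and later use Method.Post instead of Method.POST.
      var request = new RestRequest(Method.POST)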
        .AddHeader("Authorization", "Bearer your_api_key_here")
        .AddFile("file", "document.pdf")
        .AddParameter("data", new JsonObject
        {
          ["language"] = "english"
        }.ToString());

      request.AdvancedResponseWriter = (responseStream, response) =>
      {
        if (response.StatusCode == HttpStatusCode.OK)
        {
          using (responseStream)
          {
            // File.Create truncates an existing result.pdf; File.OpenWrite would leave stale trailing bytes.
            using var outputFileWriter = File.Create("result.pdf");
            responseStream.CopyTo(outputFileWriter);
          }
        }
        else
        {
          var responseStreamReader = new StreamReader(responseStream);
          Console.Write(responseStreamReader.ReadToEnd());
        }
      };

      client.Execute(request);
    }
  }
}

// This code requires Node.js. Do not run this code directly in a web browser.

const axios = require('axios')
const FormData = require('form-data')
const fs = require('fs')

const formData = new FormData()
formData.append('data', JSON.stringify({
  language: "english"
}))
formData.append('file', fs.createReadStream('document.pdf'))

;(async () => {
  try {
    const response = await axios.post('https://api.nutrient.io/processor/ocr', formData, {
      headers: formData.getHeaders({
        'Authorization': 'Bearer your_api_key_here'
      }),
      responseType: "stream"
    })

    response.data.pipe(fs.createWriteStream("result.pdf"))
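  } catch (e) {
    // e.response is only set when the server replied; network errors have none.
    const errorString = e.response ? await streamToString(e.response.data) : e.message
    console.error(errorString)
  }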
})()

function streamToString(stream) {
  const chunks = []
  return new Promise((resolve, reject) => {
    stream.on("data", (chunk) => chunks.push(Buffer.from(chunk)))
    stream.on("error", (err) => reject(err))
    stream.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")))
  })
}

import requests
import json

response = requests.request(
  'POST',
  'https://api.nutrient.io/processor/ocr',
  headers = {
    'Authorization': 'Bearer your_api_key_here'
  },
  files = {
    'file': open('document.pdf', 'rb')
  },
  data = {
    'data': json.dumps({
      'language': 'english'
    })
  },
  stream = True
)

if response.ok:
  with open('result.pdf', 'wb') as fd:
    for chunk in response.iter_content(chunk_size=8096):
      fd.write(chunk)
else:
  print(response.text)
  raise SystemExit(1)  # non-zero exit status signals the failure

<?php

$FileHandle = fopen('result.pdf', 'w+');

$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://api.nutrient.io/processor/ocr',
  CURLOPT_CUSTOMREQUEST => 'POST',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_POSTFIELDS => array(
    'data' => '{
      "language": "english"
    }',
    'file' => new CURLFILE('document.pdf')
  ),
  CURLOPT_HTTPHEADER => array(
    'Authorization: Bearer your_api_key_here'
  ),
  CURLOPT_FILE => $FileHandle,
));

$response = curl_exec($curl);

curl_close($curl);

fclose($FileHandle);

POST https://api.nutrient.io/processor/ocr HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Bearer your_api_key_here

--customboundary
Content-Disposition: form-data; name="data"
Content-Type: application/json

{
  "language": "english"
}
--customboundary
Content-Disposition: form-data; name="file"; filename="document.pdf"
Content-Type: application/pdf

(file data)
--customboundary--
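
The raw request above can also be assembled by hand, which makes the framing rules visible: each part starts with two hyphens plus the boundary value, and the body ends with the same delimiter followed by two more hyphens. The following is a minimal Python sketch using only the standard library. `encode_multipart` is a hypothetical helper name, and the placeholder bytes stand in for a real PDF.

```python
import json
import uuid


def encode_multipart(data_json, file_name, file_bytes, boundary=None):
    """Return (content_type, body) for a two-part multipart/form-data request."""
    # Hypothetical helper for illustration; not part of any Nutrient SDK.
    boundary = boundary or uuid.uuid4().hex
    delimiter = ("--" + boundary).encode("ascii")
    file_header = 'Content-Disposition: form-data; name="file"; filename="%s"' % file_name
    lines = [
        # Part 1: the JSON options, sent under the form field name "data".
        delimiter,
        b'Content-Disposition: form-data; name="data"',
        b"Content-Type: application/json",
        b"",
        json.dumps(data_json).encode("utf-8"),
        # Part 2: the document itself, sent under the form field name "file".
        delimiter,
        file_header.encode("utf-8"),
        b"Content-Type: application/pdf",
        b"",
        file_bytes,
        # Closing delimiter: the boundary line followed by two more hyphens.
        delimiter + b"--",
        b"",
    ]
    content_type = "multipart/form-data; boundary=" + boundary
    return content_type, b"\r\n".join(lines)


content_type, body = encode_multipart(
    {"language": "english"},
    "document.pdf",
    b"%PDF-1.7 placeholder bytes",
    boundary="customboundary",
)
print(content_type)
print(len(body), "bytes")
```

In practice an HTTP library builds this body for you; the sketch is only meant to show what goes over the wire.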

Build API (advanced)

# macOS and Linux shells
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer your_api_key_here" \
  -o result.pdf \
  --fail \
  -F scanned=@document.pdf \
  -F instructions='{
      "parts": [
        {
          "file": "scanned"
        }
      ],
      "actions": [
        {
          "type": "ocr",
          "language": "english"
        }
      ]
    }'

REM Windows Command Prompt
curl -X POST https://api.nutrient.io/build ^
  -H "Authorization: Bearer your_api_key_here" ^
  -o result.pdf ^
  --fail ^
  -F scanned=@document.pdf ^
  -F instructions="{\"parts\": [{\"file\": \"scanned\"}], \"actions\": [{\"type\": \"ocr\", \"language\": \"english\"}]}"

package com.example.pspdfkit;

import java.io.File;
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.json.JSONArray;
import org.json.JSONObject;

import okhttp3.MediaType;
import okhttp3.MultipartBody;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public final class PspdfkitApiExample {
  public static void main(final String[] args) throws IOException {
    final RequestBody body = new MultipartBody.Builder()
      .setType(MultipartBody.FORM)
      .addFormDataPart(
        "scanned",
        "document.pdf",
        RequestBody.create(
          MediaType.parse("application/pdf"),
          new File("document.pdf")
        )
      )
      .addFormDataPart(
        "instructions",
        new JSONObject()
          .put("parts", new JSONArray()
            .put(new JSONObject()
              .put("file", "scanned")
            )
          )
          .put("actions", new JSONArray()
            .put(new JSONObject()
              .put("type", "ocr")
              .put("language", "english")
            )
          ).toString()
      )
      .build();

    final Request request = new Request.Builder()
      .url("https://api.nutrient.io/build")
      .method("POST", body)
      .addHeader("Authorization", "Bearer your_api_key_here")
      .build();

    final OkHttpClient client = new OkHttpClient()
      .newBuilder()
      .build();

    // try-with-resources closes the response body once it has been handled.
    try (Response response = client.newCall(request).execute()) {
      if (response.isSuccessful()) {
        Files.copy(
          response.body().byteStream(),
          FileSystems.getDefault().getPath("result.pdf"),
          StandardCopyOption.REPLACE_EXISTING
        );
      } else {
        // Surface the error message returned by the API.
        throw new IOException(response.body().string());
      }
    }
  }
}

using System;
using System.IO;
using System.Net;
using RestSharp;

namespace PspdfkitApiDemo
{
  class Program
  {
    static void Main(string[] args)
    {
      var client = new RestClient("https://api.nutrient.io/build");

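      // This sample targets the RestSharp 106 API; versions 107 and later use Method.Post instead of Method.POST.
      var request = new RestRequest(Method.POST)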
        .AddHeader("Authorization", "Bearer your_api_key_here")
        .AddFile("scanned", "document.pdf")
        .AddParameter("instructions", new JsonObject
        {
          ["parts"] = new JsonArray
          {
            new JsonObject
            {
              ["file"] = "scanned"
            }
          },
          ["actions"] = new JsonArray
          {
            new JsonObject
            {
              ["type"] = "ocr",
              ["language"] = "english"
            }
          }
        }.ToString());

      request.AdvancedResponseWriter = (responseStream, response) =>
      {
        if (response.StatusCode == HttpStatusCode.OK)
        {
          using (responseStream)
          {
            // File.Create truncates an existing result.pdf; File.OpenWrite would leave stale trailing bytes.
            using var outputFileWriter = File.Create("result.pdf");
            responseStream.CopyTo(outputFileWriter);
          }
        }
        else
        {
          var responseStreamReader = new StreamReader(responseStream);
          Console.Write(responseStreamReader.ReadToEnd());
        }
      };

      client.Execute(request);
    }
  }
}

// This code requires Node.js. Do not run this code directly in a web browser.

const axios = require('axios')
const FormData = require('form-data')
const fs = require('fs')

const formData = new FormData()
formData.append('instructions', JSON.stringify({
  parts: [
    {
      file: "scanned"
    }
  ],
  actions: [
    {
      type: "ocr",
      language: "english"
    }
  ]
}))
formData.append('scanned', fs.createReadStream('document.pdf'))

;(async () => {
  try {
    const response = await axios.post('https://api.nutrient.io/build', formData, {
      headers: formData.getHeaders({
        'Authorization': 'Bearer your_api_key_here'
      }),
      responseType: "stream"
    })

    response.data.pipe(fs.createWriteStream("result.pdf"))
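  } catch (e) {
    // e.response is only set when the server replied; network errors have none.
    const errorString = e.response ? await streamToString(e.response.data) : e.message
    console.error(errorString)
  }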
})()

function streamToString(stream) {
  const chunks = []
  return new Promise((resolve, reject) => {
    stream.on("data", (chunk) => chunks.push(Buffer.from(chunk)))
    stream.on("error", (err) => reject(err))
    stream.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")))
  })
}

import requests
import json

response = requests.request(
  'POST',
  'https://api.nutrient.io/build',
  headers = {
    'Authorization': 'Bearer your_api_key_here'
  },
  files = {
    'scanned': open('document.pdf', 'rb')
  },
  data = {
    'instructions': json.dumps({
      'parts': [
        {
          'file': 'scanned'
        }
      ],
      'actions': [
        {
          'type': 'ocr',
          'language': 'english'
        }
      ]
    })
  },
  stream = True
)

if response.ok:
  with open('result.pdf', 'wb') as fd:
    for chunk in response.iter_content(chunk_size=8096):
      fd.write(chunk)
else:
  print(response.text)
  raise SystemExit(1)  # non-zero exit status signals the failure

<?php

$FileHandle = fopen('result.pdf', 'w+');

$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL => 'https://api.nutrient.io/build',
  CURLOPT_CUSTOMREQUEST => 'POST',
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => '',
  CURLOPT_POSTFIELDS => array(
    'instructions' => '{
      "parts": [
        {
          "file": "scanned"
        }
      ],
      "actions": [
        {
          "type": "ocr",
          "language": "english"
        }
      ]
    }',
    'scanned' => new CURLFILE('document.pdf')
  ),
  CURLOPT_HTTPHEADER => array(
    'Authorization: Bearer your_api_key_here'
  ),
  CURLOPT_FILE => $FileHandle,
));

$response = curl_exec($curl);

curl_close($curl);

fclose($FileHandle);

POST https://api.nutrient.io/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Bearer your_api_key_here

--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": "scanned"
    }
  ],
  "actions": [
    {
      "type": "ocr",
      "language": "english"
    }
  ]
}
--customboundary
Content-Disposition: form-data; name="scanned"; filename="document.pdf"
Content-Type: application/pdf

(file data)
--customboundary--
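
Before opening result.pdf, it can help to confirm that the file is actually a PDF and not an error message saved under that name. A well-formed PDF normally begins with the bytes `%PDF-`, so a rough check, sketched here in Python with a hypothetical helper `looks_like_pdf`, is to read the first five bytes.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF header marker."""
    # Hypothetical helper for illustration; it checks the header only,
    # not whether the rest of the file is a valid PDF.
    with open(path, "rb") as handle:
        return handle.read(5) == b"%PDF-"


if __name__ == "__main__":
    result = Path("result.pdf")
    if not result.exists():
        print("result.pdf not found; run one of the samples first")
    elif looks_like_pdf(result):
        print("result.pdf starts with a PDF header")
    else:
        # The start of the file is likely an error message from the API.
        print(result.read_bytes()[:500].decode("utf-8", errors="replace"))
```

This is only a header check, so it catches saved error responses but does not validate the document.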

Start now

Create an account to access your API key and start with 50 free credits per month

Start building with the DWS Processor API in minutes. No payment information is required.

API comparison

BASIC

OCR API

Streamlined API for performing OCR on documents. Perfect for most use cases.

FEATURES

Simple request format
Minimal configuration required
Purpose-built for specific tasks

ADVANCED

Build API

Maximum flexibility and advanced features for complex workflows.

FEATURES

Multipart document support
Advanced actions and transformations
Workflow orchestration
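
The practical difference between the two request formats is the form field that carries the options. As a rough Python sketch based on the samples on this page (the helper names `ocr_api_field` and `build_api_field` are hypothetical), the OCR API takes a `data` field with just the language, while the Build API takes an `instructions` field listing parts and actions. Passing more than one part name is an assumption about how the Build API combines inputs; each name would need a matching file field in the request.

```python
import json


def ocr_api_field(language="english"):
    """Form field for the OCR API (/processor/ocr): name and JSON value."""
    return "data", json.dumps({"language": language})


def build_api_field(part_names, language="english"):
    """Form field for the Build API (/build): name and JSON value."""
    instructions = {
        # Each part refers to an uploaded file by its form field name.
        "parts": [{"file": name} for name in part_names],
        # A single OCR action, mirroring the Build API samples on this page.
        "actions": [{"type": "ocr", "language": language}],
    }
    return "instructions", json.dumps(instructions)


print(ocr_api_field())
print(build_api_field(["scanned"]))
```

Other language identifiers are not listed here; the supported languages guide is the place to look them up.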

Getting started

The following guides walk you through the functionality the OCR API provides.

GUIDES

1 — Basics

The basics of OCR

2 — Advanced

Advanced OCR

3 — Languages

Supported languages

Most common next steps

After evaluating OCR, move on to API key setup, pricing, and the broader Processor workflow

Use the following:

Extract text from PDF: if you need to pull the recognized text into downstream search, compliance, or AI workflows

Continue to:

Getting started: for API key setup

Postman collection: for the fastest first request

Processor API pricing: for credit-based cost review

Processor API overview: for broader document-processing automation

Security is our top priority

No document storage

No input or resulting documents are stored on our infrastructure. All files are deleted as soon as a request finishes. Alternatively, check out our self-hosted product.

HTTPS encryption

All communication between your application and Nutrient uses HTTPS, so your data is encrypted in transit.

Safe payment processing

All payments are handled by Paddle. Nutrient DWS Processor API never has direct access to any of your payment data.

Ready to try it?

Create an account to get your DWS Processor API key and start making API calls.

START FOR FREE