---
title: "Extract text, tables, and more from PDF server-side | Nutrient"
canonical_url: "https://www.nutrient.io/guides/document-engine/extraction/extract-data/"
md_url: "https://www.nutrient.io/guides/document-engine/extraction/extract-data.md"
last_updated: "2026-05-25T06:31:34.463Z"
description: "Discover how to extract data from PDFs using Document Engine."
---

# Extract text, tables, and more from PDFs

This guide explains how to extract data from PDFs using Document Engine.

You can extract the following pieces of information from a PDF document:

- Text

- Tables

- Key-value pairs. For more information, refer to the guide on [how key-value pair extraction works](https://www.nutrient.io/guides/document-engine/extraction/key-value-pairs/how-it-works.md).

## Sending the request to extract data

To extract data from all pages of a document, send a multipart `POST` request to the [`/api/build` endpoint](https://www.nutrient.io/api/reference/document-engine/upstream/#tag/Document-Editing/operation/build-document). In the `output` object of the instructions, specify the following parameters:

- `type` specifies the output type. Set this to `json-content`.

- `plainText` is a Boolean value that determines whether to extract data as plain text.

- `structuredText` is a Boolean value that determines whether to extract data as structured text. Enabling this option gives you information about characters, lines, paragraphs, and words.

- `keyValuePairs` is a Boolean value that determines whether to extract key-value pairs.

- `tables` is a Boolean value that determines whether to extract table data.

- `language` specifies the language used for recognizing text with optical character recognition (OCR), for example, `english`. OCR is useful when text is stored in a PDF or an image in a way that prevents you from searching or copying it. Nutrient's OCR engine recognizes such text and saves it in a separate file where you can search, copy, and paste it.

### SHELL

```shell

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/example-document.pdf \
  -F instructions='{
  "parts": [
    {
      "file": "document"
    }
  ],
  "output": {
    "type": "json-content",
    "plainText": true,
    "structuredText": true,
    "keyValuePairs": true,
    "tables": true,
    "language": "english"
  }
}' \
  -o result.json

```

### HTTP

```http

POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="document"; filename="example-document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    {
      "file": "document"
    }
  ],
  "output": {
    "type": "json-content",
    "plainText": true,
    "structuredText": true,
    "keyValuePairs": true,
    "tables": true,
    "language": "english"
  }
}
--customboundary--

```

For more information on the Build instructions, refer to the [API Reference](https://www.nutrient.io/api/reference/document-engine/upstream/#tag/Build-API).
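
If you assemble the instructions in code rather than by hand, the `output` parameters described above map directly onto a small JSON object. The following Python sketch builds that object using only the standard library. The helper name `build_instructions` and its arguments are hypothetical and not part of Document Engine; only the JSON keys come from this guide.

```python
import json

# Boolean output flags named in this guide.
EXTRACTION_FLAGS = ("plainText", "structuredText", "keyValuePairs", "tables")


def build_instructions(flags, language="english", part_name="document"):
    """Return Build instructions as a JSON string.

    `flags` lists the output flags to enable; the others are set to false.
    `part_name` must match the multipart field that carries the PDF
    (`document` in the examples above). This helper is a hypothetical
    convenience, not an official client.
    """
    enabled = set(flags)
    unknown = enabled - set(EXTRACTION_FLAGS)
    if unknown:
        raise ValueError(f"unknown output flags: {sorted(unknown)}")

    output = {"type": "json-content"}
    for name in EXTRACTION_FLAGS:
        output[name] = name in enabled
    output["language"] = language

    return json.dumps({"parts": [{"file": part_name}], "output": output})


# Request only plain text and tables.
instructions = build_instructions(["plainText", "tables"])
print(instructions)
```

The resulting string can be passed as the `instructions` form field of the multipart request shown above.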

## Interpreting the data extraction response

The API response is a JSON document with a `pages` array. Each entry in the array contains the data you requested for that page, such as:

- Plain text

- Structured text with information about characters, lines, paragraphs, and words

- Extracted key-value pairs

- Tables
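
As a rough illustration of reading the response, the Python sketch below collects the plain text and the key-value pairs for each page. It assumes the response shape shown in this guide's example response (a top-level `pages` array with `plainText` and `keyValuePairs` entries). The function name `summarize_pages` and the `min_confidence` filter are hypothetical, and the sample data is a trimmed copy of the example values rather than real output.

```python
def summarize_pages(response, min_confidence=0.0):
    """Collect plain text and (key, value) pairs for every page.

    Pages that lack a field (because it was not requested) yield an
    empty string or an empty list instead of raising an error.
    """
    summary = []
    for page in response.get("pages", []):
        pairs = [
            (pair["key"]["content"], pair["value"]["content"])
            for pair in page.get("keyValuePairs", [])
            # Assumption: `confidence` uses the same scale as the example (95.4).
            if pair.get("confidence", 0.0) >= min_confidence
        ]
        summary.append(
            {
                "pageIndex": page.get("pageIndex"),
                "text": page.get("plainText", ""),
                "pairs": pairs,
            }
        )
    return summary


# Trimmed sample in the shape of the example response.
sample = {
    "pages": [
        {
            "pageIndex": 0,
            "plainText": "Lorem ipsum\n",
            "keyValuePairs": [
                {
                    "confidence": 95.4,
                    "key": {"content": "#"},
                    "value": {"content": "€", "dataType": "Currency"},
                }
            ],
        }
    ]
}

for page in summarize_pages(sample, min_confidence=90):
    print(page["pageIndex"], repr(page["text"]), page["pairs"])
```

To run this against a real response, load the `result.json` file written by the curl example with `json.load` and pass the resulting dictionary to the function.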

## Example data extraction response

```json

{
  "pages": [
    {
      "pageIndex": 0,
      "plainText": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa.\n",
      "structuredText": {
        "characters": [
          {
            "bbox": {
              "left": 0,
              "top": 0,
              "width": 100,
              "height": 100
            },
            "value": "T"
          }
        ],
        "lines": [
          {
            "bbox": {
              "left": 0,
              "top": 0,
              "width": 100,
              "height": 100
            },
            "firstWordIndex": 0,
            "isRTL": false,
            "isVertical": false,
            "wordCount": 5
          }
        ],
        "paragraphs": [
          {
            "bbox": {
              "left": 0,
              "top": 0,
              "width": 100,
              "height": 100
            },
            "firstLineIndex": 0,
            "lineCount": 3
          }
        ],
        "words": [
          {
            "bbox": {
              "left": 0,
              "top": 0,
              "width": 100,
              "height": 100
            },
            "characterCount": 4,
            "firstCharacterIndex": 0,
            "isFromDictionary": true,
            "value": "word"
          }
        ]
      },
      "keyValuePairs": [
        {
          "confidence": 95.4,
          "key": {
            "bbox": {
              "left": 0,
              "top": 0,
              "width": 100,
              "height": 100
            },
            "content": "#"
          },
          "value": {
            "bbox": {
              "left": 0,
              "top": 0,
              "width": 100,
              "height": 100
            },
            "content": "€",
            "dataType": "Currency"
          }
        }
      ],
      "tables": [
        {
          "confidence": 95.4,
          "bbox": {
            "left": 0,
            "top": 0,
            "width": 100,
            "height": 100
          },
          "cells": [
            {
              "bbox": {
                "left": 0,
                "top": 0,
                "width": 100,
                "height": 100
              },
              "rowIndex": 0,
              "columnIndex": 0,
              "isHeader": true,
              "text": "Invoice number"
            }
          ],
          "columns": [
            {
              "bbox": {
                "left": 0,
                "top": 0,
                "width": 100,
                "height": 100
              }
            }
          ],
          "lines": [
            {
              "bbox": {
                "left": 0,
                "top": 0,
                "width": 100,
                "height": 100
              },
              "isVertical": false,
              "thickness": 0
            }
          ],
          "rows": [
            {
              "bbox": {
                "left": 0,
                "top": 0,
                "width": 100,
                "height": 100
              }
            }
          ]
        }
      ]
    }
  ]
}

```
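
The table and structured text entries in the response above are flat lists that refer to each other by index. The Python sketch below shows one way to turn them into more convenient structures: a row-major grid for a table, and one string per line of text. It rests on two assumptions that the example suggests but does not state: that `rowIndex` and `columnIndex` are zero-based, and that `firstWordIndex` and `wordCount` index into the page's `words` array. The function names and the sample values are hypothetical.

```python
def table_to_grid(table):
    """Arrange a table's cells into a list of rows (row-major order)."""
    cells = table.get("cells", [])
    if not cells:
        return []
    n_rows = max(cell["rowIndex"] for cell in cells) + 1
    n_cols = max(cell["columnIndex"] for cell in cells) + 1
    # Positions without a cell stay as empty strings.
    grid = [[""] * n_cols for _ in range(n_rows)]
    for cell in cells:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell.get("text", "")
    return grid


def line_texts(structured_text):
    """Join the words of each line into a single string.

    Assumption: each line covers `wordCount` entries of `words`,
    starting at `firstWordIndex`.
    """
    words = structured_text.get("words", [])
    texts = []
    for line in structured_text.get("lines", []):
        start = line["firstWordIndex"]
        chunk = words[start:start + line["wordCount"]]
        texts.append(" ".join(word["value"] for word in chunk))
    return texts


# Made-up sample data in the shape of the example response.
sample_table = {
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "isHeader": True, "text": "Invoice number"},
        {"rowIndex": 0, "columnIndex": 1, "isHeader": True, "text": "Total"},
        {"rowIndex": 1, "columnIndex": 0, "isHeader": False, "text": "42"},
        {"rowIndex": 1, "columnIndex": 1, "isHeader": False, "text": "€10"},
    ]
}
sample_text = {
    "words": [{"value": "Lorem"}, {"value": "ipsum"}, {"value": "dolor"}],
    "lines": [
        {"firstWordIndex": 0, "wordCount": 2},
        {"firstWordIndex": 2, "wordCount": 1},
    ],
}

for row in table_to_grid(sample_table):
    print(row)
print(line_texts(sample_text))
```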
---

## Related pages

- [PDF data extraction server](/guides/document-engine/extraction.md)
- [Invoices](/guides/document-engine/extraction/invoices.md)
- [Extract tables from PDFs and images](/guides/document-engine/extraction/tables.md)
- [Extract data from bank statements](/guides/document-engine/extraction/bank-statements.md)
- [Extract text from PDFs and images](/guides/document-engine/extraction/text.md)