---
title: "Extracting JSON data from a PDF document | Nutrient Java SDK"
canonical_url: "https://www.nutrient.io/guides/java/extraction/json-data-extraction/"
md_url: "https://www.nutrient.io/guides/java/extraction/json-data-extraction.md"
last_updated: "2026-05-26T21:54:59.603Z"
description: "How to extract structured data from a PDF as JSON using Nutrient Java SDK."
---

# Extracting JSON data from a PDF document

Extract structured data from PDF files as JSON for storage, API workflows, or analytics pipelines. This approach reduces manual data entry and gives your application direct access to document content.

[Download sample](https://www.nutrient.io/downloads/samples/java/json-data-extraction.zip)

## How Nutrient supports this workflow

Nutrient Java SDK handles structured extraction from PDF documents, including digital-native PDFs and PDFs that mix digital text with scanned content.

In this sample, `VisionEngine.AdaptiveOcr` uses an adaptive extraction pipeline that prefers native PDF text when available and falls back to OCR for image-based content when needed.
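
The "native text first, OCR only when needed" idea can be pictured with a small, SDK-independent sketch. This is not Nutrient's implementation, and it does not reflect how the engine decides internally; `AdaptiveTextSketch`, `chooseText`, and the `ocr` supplier are hypothetical names used only to illustrate the fallback decision.

```java
import java.util.function.Supplier;

// Illustrative only: hypothetical names, not part of the Nutrient SDK.
class AdaptiveTextSketch {

    // Returns the native text layer when it has content; otherwise runs the
    // (potentially expensive) OCR supplier lazily and returns its result.
    static String chooseText(String nativeText, Supplier<String> ocr) {
        if (nativeText != null && !nativeText.isBlank()) {
            return nativeText;
        }
        return ocr.get();
    }

    public static void main(String[] args) {
        // Digital-native page: the OCR supplier is never invoked.
        System.out.println(chooseText("Invoice 42", () -> "text from OCR"));
        // Scanned page with no text layer: the OCR supplier provides the text.
        System.out.println(chooseText("", () -> "text from OCR"));
    }
}
```

With `VisionEngine.AdaptiveOcr`, this kind of decision happens inside the SDK, so application code doesn't contain a branch like this.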

You don’t need to manage:

- Third-party OCR engine integration

- Switching between native-text extraction and OCR

- Document layout parsing

- Model download and initialization

- Conversion from extracted output to structured data

Instead, you select the engine in the document settings and call `extractContent()` on a `Vision` instance to get the JSON as a string.

## Complete implementation

This example shows a complete PDF-to-JSON extraction flow.

Specify a package name:

```java

package io.nutrient.Sample;

```

Import the required classes from the SDK and the standard library, and then declare the class:

```java

import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.enums.VisionEngine;
import io.nutrient.sdk.exceptions.NutrientException;

import java.io.FileWriter;
import java.io.IOException;

public class JsonDataExtraction {

```

Create the main method and declare thrown exceptions:

```java

    public static void main(String[] args) throws NutrientException, IOException {

```

Open the PDF with try-with-resources so the document closes automatically:

```java

        try (Document document = Document.open("input.pdf")) {

```

Configure the Adaptive OCR engine, extract JSON content, and write it to `output.json`:

```java

            document.getSettings().getVisionSettings().setEngine(VisionEngine.AdaptiveOcr);

            Vision vision = Vision.set(document);
            String contentJson = vision.extractContent();

            try (FileWriter writer = new FileWriter("output.json")) {
                writer.write(contentJson);
            }
        }
    }
}

```
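
`FileWriter` without a charset argument uses the platform default encoding on Java versions before 18. If the extracted JSON may contain non-ASCII text, one option is to write it with an explicit UTF-8 charset. The sketch below assumes Java 11 or later; `JsonFileWriter` and `writeJson` are hypothetical helper names, and the placeholder string stands in for the value returned by `vision.extractContent()`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the Nutrient SDK.
class JsonFileWriter {

    // Writes the JSON string to the target path as UTF-8, creating the file
    // or replacing its contents if it already exists.
    static void writeJson(Path target, String json) throws IOException {
        Files.writeString(target, json, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder JSON with non-ASCII characters (written as Unicode
        // escapes); in the sample this would be contentJson.
        String contentJson = "{\"text\":\"Gr\u00f6\u00dfe\"}";
        writeJson(Path.of("output.json"), contentJson);
    }
}
```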

## Summary

The extraction flow has four steps:

1. Open the PDF document.

2. Configure the Adaptive OCR engine.

3. Extract content as JSON with `Vision`.

4. Write the JSON output to a file.

Nutrient handles adaptive extraction and content structuring, so you don’t need to implement PDF parsing, native-text detection, or OCR fallback logic.
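
To apply the same four steps to many files, you mainly need a mapping from each input PDF to its JSON output path. The sketch below shows only that mapping, with no SDK calls; `BatchPaths`, `jsonPathFor`, and `listPdfs` are hypothetical helpers, and the extraction itself would run once per file in the same way as in the sample above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helpers, not part of the Nutrient SDK.
class BatchPaths {

    // True when the file name ends with ".pdf", ignoring case.
    static boolean isPdf(Path file) {
        return file.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    // Maps "report.pdf" to "report.json" in the same directory.
    // Names without a ".pdf" suffix simply get ".json" appended.
    static Path jsonPathFor(Path pdf) {
        String name = pdf.getFileName().toString();
        String base = isPdf(pdf) ? name.substring(0, name.length() - 4) : name;
        return pdf.resolveSibling(base + ".json");
    }

    // Lists the PDF files directly inside a directory, sorted by path.
    static List<Path> listPdfs(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(BatchPaths::isPdf)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

A loop over `listPdfs(dir)` could then open each file and write the extracted JSON to `jsonPathFor(pdf)`.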

You can download [this sample package](https://www.nutrient.io/downloads/samples/java/json-data-extraction.zip) to run the example locally.

---

## Related pages

- [Generating image descriptions using Claude](/guides/java/extraction/describe-image-with-claude.md)
- [Applying OCR to a PDF document](/guides/java/extraction/apply-ocr-to-pdf.md)
- [Applying OCR to a PDF page](/guides/java/extraction/apply-ocr-to-pdf-page.md)
- [Generating image descriptions using OpenAI](/guides/java/extraction/describe-image-with-openai.md)
- [Generating image descriptions using local AI](/guides/java/extraction/describe-image-with-local-ai.md)
- [Extracting data from images using vision language models](/guides/java/extraction/extract-data-from-image-vlm.md)
- [Extracting data from images using OCR](/guides/java/extraction/extract-data-from-image-ocr.md)
- [Extracting data from images using ICR](/guides/java/extraction/extract-data-from-image-icr.md)
- [Extracting text from images](/guides/java/extraction/read-text-from-image.md)
- [Nutrient Java SDK extraction guides](/guides/java/extraction.md)
- [Speeding up first ICR operation by predownloading models](/guides/java/extraction/speed-up-first-icr-by-downloading-requirements.md)
- [Extracting text from multilingual images](/guides/java/extraction/read-text-from-image-multi-language.md)

