Extracting form fields from images

Scanned forms such as tax filings, healthcare intake sheets, lease agreements, and expense reports contain text boxes, checkboxes, and signature lines, but a flat image has no machine-readable structure. To process these forms downstream, you need to locate each fillable region on the page.

This sample shows how to detect form fields on every page of a document and export the result as structured JSON with Nutrient Java SDK. Detection runs locally and offline, so no external model is contacted. To also have a vision model assign a human-readable semantic label to each field, such as “First name” or “Date of birth”, refer to the label form fields with a VLM guide.

If you want to turn an image-based form into a fillable PDF with AcroForm widgets, refer to the detect and add form fields guide. That workflow uses the same detection model, but it writes fields back into the PDF instead of exporting data.

How Nutrient helps

Nutrient Java SDK runs the full form-detection pipeline behind a single method call. It handles:

Rendering each page of the document to a bitmap at the resolution the detection model expects
Running the form-field detection model on every page and classifying each region as text, checkbox, or signature
Recording each field’s type, bounding box, and confidence
Serializing the result to JSON

The result is structured data you can index, validate, or feed into a downstream workflow.

Supported field types and limits

The model recognizes three field types: text, checkbox, and signature. Because detection runs on rendered page images, accuracy depends on image quality. Clean scans produce better results than heavily compressed or skewed pages. Form detection requires the vision form feature in your license.

Prepare the project

Set a package name:

package io.nutrient.Sample;

Import the required classes from the SDK and the standard library, then declare the main class:

import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.exceptions.NutrientException;

import java.io.FileWriter;
import java.io.IOException;

public class ExtractFormFieldsFromImage {

Load the document

Open the document with try-with-resources so that it's closed automatically after processing, even if an exception is thrown:

    public static void main(String[] args) throws NutrientException, IOException {
        try (Document document = Document.open("input_forms_detection.pdf")) {

Detect form fields

Create a vision instance from the document with Vision.set(document), then call detectForms(), which returns the detected fields as a JSON string:

            Vision vision = Vision.set(document);
            String formsJson = vision.detectForms();

Write the JSON result to a file for downstream processing:

            try (FileWriter writer = new FileWriter("output.json")) {
                writer.write(formsJson);
            }
        }
    }
}
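The write step above relies on FileWriter's default charset, which is platform-dependent on Java versions before 18. JSON is normally exchanged as UTF-8, so one option is to set the encoding explicitly. The following is a minimal sketch using only the standard library (Java 11 or later). JsonOutput and writeUtf8 are hypothetical names, and the string in main is a placeholder for the detectForms() result:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class JsonOutput {
    // Writes the JSON text as UTF-8, creating the file or replacing its contents.
    static void writeUtf8(Path target, String json) throws IOException {
        Files.writeString(target, json, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder value; in the sample this would be the detectForms() result.
        String formsJson = "{\"elements\":[]}";
        writeUtf8(Path.of("output.json"), formsJson);
    }
}
```

Naming the charset keeps the output identical across operating systems and Java versions.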

Understand the output

detectForms() returns structured JSON. The elements array holds one form element per page. Each form element includes its pageNumber and a fields list, so fields from a multi-page document stay grouped by the page they came from. Each field includes:

fieldType — The detected type: Text, Checkbox, or Signature.
bounds — The bounding box of the field on the page.
confidence — The detection confidence for the field.
id — A unique identifier for the field.
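
Once the JSON is parsed with the JSON library of your choice, you can filter the fields before they reach downstream systems. The following is a minimal sketch, not SDK code. The FormField and FormPage records are hypothetical types that mirror the keys listed above, bounds is omitted because its exact shape isn't described here, and the sample values, the 0.5 threshold, and the 0-to-1 confidence scale are assumptions:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConfidentFields {
    // Hypothetical types mirroring the JSON keys; they are not part of the SDK.
    record FormField(String id, String fieldType, double confidence) {}
    record FormPage(int pageNumber, List<FormField> fields) {}

    // Keeps fields whose confidence is at least minConfidence, grouped by page number.
    static Map<Integer, List<FormField>> byPage(List<FormPage> pages, double minConfidence) {
        Map<Integer, List<FormField>> result = new LinkedHashMap<>();
        for (FormPage page : pages) {
            List<FormField> kept = new ArrayList<>();
            for (FormField field : page.fields()) {
                if (field.confidence() >= minConfidence) {
                    kept.add(field);
                }
            }
            result.put(page.pageNumber(), kept);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up data standing in for a parsed detectForms() result.
        List<FormPage> pages = List.of(
            new FormPage(1, List.of(
                new FormField("f1", "Text", 0.93),
                new FormField("f2", "Checkbox", 0.41))),
            new FormPage(2, List.of(
                new FormField("f3", "Signature", 0.88))));

        byPage(pages, 0.5).forEach((page, fields) ->
            System.out.println("Page " + page + ": " + fields.size() + " field(s) kept"));
    }
}
```

Grouping by pageNumber preserves the per-page structure of the output, so low-confidence fields can be routed to manual review without losing track of where they came from.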

Tune detection

If the model misses fields of one type or reports too many of them, adjust the logit-bias settings in the form recognition settings: textLogitBias, checkboxLogitBias, and signatureLogitBias. A positive value makes the model more likely to report that field type, and a negative value suppresses it. The default of 0 applies no bias. These settings affect detection itself, so they apply to every detection run.
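
To build intuition for what a logit bias does, the sketch below adds a bias to a raw class score and converts the sum to a probability with a sigmoid. It illustrates the general technique only. How the SDK applies textLogitBias and the related settings internally isn't described here, so the class name, the raw score, and the use of a sigmoid are all assumptions:

```java
final class LogitBiasDemo {
    // Standard logistic function: maps a logit (log-odds) to a probability in (0, 1).
    static double sigmoid(double logit) {
        return 1.0 / (1.0 + Math.exp(-logit));
    }

    // Probability after shifting the raw logit by a bias; a bias of 0 leaves it unchanged.
    static double biasedProbability(double rawLogit, double bias) {
        return sigmoid(rawLogit + bias);
    }

    public static void main(String[] args) {
        double rawLogit = -0.4; // made-up score for a borderline detection
        for (double bias : new double[] {-1.0, 0.0, 1.0}) {
            System.out.printf("bias %+.1f -> probability %.3f%n",
                bias, biasedProbability(rawLogit, bias));
        }
    }
}
```

Because the shift happens in log-odds space, the same bias moves a borderline score much further than a score that is already close to 0 or 1. Small adjustments are therefore a reasonable starting point when tuning.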

Handle errors

The Vision API throws VisionException, which derives from NutrientException, when detection fails.

Common failure scenarios include:

The document can’t be read due to path or permission issues
The page produces no renderable image
The form detection model is missing or inaccessible, or the feature isn’t licensed

In production code:

Catch NutrientException.
Return a clear error message.
Log failure details for debugging.
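
The three steps above can be sketched as follows. To keep the example compilable without the SDK on the classpath, it declares local stand-ins for NutrientException and VisionException. In a real project, remove the stand-ins and import the SDK classes instead. DetectionStep, Outcome, and detect are hypothetical names introduced for illustration, and the sketch assumes the detection call declares NutrientException as a checked exception:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class DetectionErrorHandling {
    // Stand-ins so the sketch compiles without the SDK; not the real SDK classes.
    static class NutrientException extends Exception {
        NutrientException(String message) { super(message); }
    }
    static class VisionException extends NutrientException {
        VisionException(String message) { super(message); }
    }

    // Hypothetical wrapper for the detection step, for example
    // () -> Vision.set(document).detectForms() in a real project.
    @FunctionalInterface
    interface DetectionStep {
        String run() throws NutrientException;
    }

    // Hypothetical result type: errorMessage is null when detection succeeded.
    record Outcome(String json, String errorMessage) {
        boolean succeeded() { return errorMessage == null; }
    }

    private static final Logger LOGGER =
        Logger.getLogger(DetectionErrorHandling.class.getName());

    static Outcome detect(DetectionStep step) {
        try {
            return new Outcome(step.run(), null);
        } catch (NutrientException e) {
            // Catching the base type also covers VisionException.
            // Log the full exception for debugging, and return a short message for the caller.
            LOGGER.log(Level.SEVERE, "Form detection failed", e);
            return new Outcome(null, "Form detection failed: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Simulated failure; the message text is made up.
        Outcome outcome = detect(() -> { throw new VisionException("model not found"); });
        System.out.println(outcome.succeeded() ? outcome.json() : outcome.errorMessage());
    }
}
```

Keeping the stack trace in the log and returning only a short message separates what operators need for debugging from what callers see.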

Conclusion

The workflow for extracting form-field data from an image is:

Open the source document using try-with-resources for automatic resource cleanup.
Create a vision instance with Vision.set().
Call detectForms() to detect every field and export the result as JSON.
Write the JSON to a file for indexing, validation, or downstream processing.
Handle NutrientException for robust error recovery.

Detection runs locally and offline. To assign semantic labels to each field with a vision model, refer to the label form fields with a VLM guide. To produce a fillable PDF instead of data, refer to the detect and add form fields guide.

For related image extraction workflows, refer to the Java SDK guides.

Download the sample package to explore form-field extraction.