---
title: "Labeling form fields with a vision language model | Nutrient Java SDK"
canonical_url: "https://www.nutrient.io/guides/java/extraction/label-form-fields-with-vlm/"
md_url: "https://www.nutrient.io/guides/java/extraction/label-form-fields-with-vlm.md"
last_updated: "2026-06-09T21:11:56.021Z"
description: "Detect form fields and assign semantic labels using a vision language model with Nutrient Java SDK."
---

# Labeling form fields with a vision language model

Form-field detection locates the fillable regions on a page, but a bounding box alone doesn't tell you what each field means. AI labeling adds a human-readable semantic label to each detected field, such as "First name" or "Date of birth", by sending the page to a vision language model (VLM).

This sample builds on offline form-field detection. For the detection basics — and a fully offline workflow with no model contacted — refer to the [extract form fields from an image](https://www.nutrient.io/guides/java/extraction/extract-form-fields-from-image.md) guide. Here you connect a VLM provider and turn labeling on with Nutrient Java SDK.

[Download sample](https://www.nutrient.io/downloads/samples/java/label-form-fields-with-vlm.zip)

## How Nutrient helps

Nutrient Java SDK runs detection and labeling behind a single method call. With labeling enabled, it also:

- Draws numbered marks over each detected field on a rendered copy of the page

- Sends the annotated page to the VLM you select and reads each field's semantic label back

- Optionally drops detections the model judges to be false positives

- Records each field's type, bounding box, confidence, and assigned label in JSON

The result is structured data you can index, validate, or feed into a downstream workflow.

## Prerequisites

AI labeling requires a reachable VLM endpoint. The SDK does not provision or start a VLM service for you.

- Start or provision a VLM endpoint that the machine running the SDK can reach.

- Configure the API endpoint and model in [custom VLM API settings](https://www.nutrient.io/api/java-sdk/nutrient-java-sdk/io.nutrient.sdk.settings/custom-vlm-api-settings/index.html).

- By default, the SDK may assume:
  - API endpoint: `http://localhost:1234/v1`
  - Model: `qwen/qwen3-vl-8b`

- For clarity and reliability, set both the API endpoint and model explicitly.

- Example with [LM Studio](https://lmstudio.ai/):
  - Run LM Studio in server mode.
  - Load a compatible vision model such as Qwen3-VL (4B, 8B, or larger depending on your hardware).

- Make sure the endpoint is running before you call `detectForms()` with labeling enabled.

If no VLM endpoint is available, labeling fails at runtime. Leave `enableAiLabeling` at its default of `false` to run detection only and keep the workflow offline.
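Because labeling fails at runtime when the endpoint is down, it can help to probe the endpoint before enabling labeling. The following is a minimal sketch that uses only the JDK's `java.net.http` client and is independent of the Nutrient SDK. It assumes the server follows the OpenAI-compatible convention of answering `GET <base URL>/models`; the class name `VlmEndpointCheck` and the method `isReachable` are hypothetical helpers, not SDK API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class VlmEndpointCheck {

    // Returns true when GET <baseUrl>/models answers with a 2xx status.
    // The /models route is an assumption based on the OpenAI-compatible
    // convention; your server may expose a different health route.
    public static boolean isReachable(String baseUrl, Duration timeout) {
        try {
            String normalized = baseUrl.endsWith("/")
                    ? baseUrl.substring(0, baseUrl.length() - 1)
                    : baseUrl;
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(timeout)
                    .build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(normalized + "/models"))
                    .timeout(timeout)
                    .GET()
                    .build();
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() / 100 == 2;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the caller
            Thread.currentThread().interrupt();
            return false;
        } catch (Exception e) {
            // Connection refused, timeout, malformed URL, and similar failures
            return false;
        }
    }

    public static void main(String[] args) {
        String endpoint = "http://localhost:1234/v1";
        boolean reachable = isReachable(endpoint, Duration.ofSeconds(3));
        System.out.println(endpoint + " reachable: " + reachable);
    }
}
```

A `false` result is a reasonable signal to leave `enableAiLabeling` off and run detection only. A `true` result only shows that the server answered, not that the configured model is loaded.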

## Connect a vision model

Labeling uses the same provider configuration as the rest of the Vision API, so you don't configure a separate endpoint for form labeling. Set the provider in [vision settings](https://www.nutrient.io/api/java-sdk/nutrient-java-sdk/io.nutrient.sdk.settings/vision-settings/set-provider.html) and fill in the matching provider settings class:

- **Custom / local (default)** — An OpenAI-compatible server such as [LM Studio](https://lmstudio.ai/), Ollama, or vLLM. Configure [custom VLM API settings](https://www.nutrient.io/api/java-sdk/nutrient-java-sdk/io.nutrient.sdk.settings/custom-vlm-api-settings/index.html).

- **OpenAI** — Configure [OpenAI API endpoint settings](https://www.nutrient.io/api/java-sdk/nutrient-java-sdk/io.nutrient.sdk.settings/open-aiapi-endpoint-settings/index.html).

## Prepare the project

Set the package name:

```java

package io.nutrient.Sample;

```

Import the required classes from the SDK and declare the main class:

```java

import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.enums.VlmProvider;
import io.nutrient.sdk.exceptions.NutrientException;

import java.io.FileWriter;
import java.io.IOException;

public class LabelFormFieldsWithVlm {

```

## Load the document

Open the document with try-with-resources so the SDK closes resources after processing:

```java

    public static void main(String[] args) throws NutrientException, IOException {
        try (Document document = Document.open("input_forms_detection.pdf")) {

```

## Configure AI labeling

Set the provider to your vision model, then opt in with `setEnableAiLabeling`:

```java

            // Select the vision model provider (the same setting Vision.describe() uses)
            document.getSettings().getVisionSettings().setProvider(VlmProvider.Custom);

            // Configure the matching provider settings class
            var vlm = document.getSettings().getCustomVlmApiSettings();
            vlm.setApiEndpoint("http://localhost:1234/v1");
            vlm.setModel("qwen/qwen3-vl-8b");

            // Turn on labeling and drop detections the model judges to be false positives
            var formLabeling = document.getSettings().getFormLabelingSettings();
            formLabeling.setEnableAiLabeling(true);
            formLabeling.setEnableAiRemoveFalsePositives(true);

            // Optional: constrain labels to a known vocabulary
            formLabeling.setCandidateLabels("First name, Last name, Date of birth, Signature");

```
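The endpoint and model in the snippet above are hard-coded. If you prefer to keep them out of source, one option is to read them from environment variables and fall back to the documented defaults. This is a standalone sketch, separate from the sample's `main` method; the variable names `VLM_API_ENDPOINT` and `VLM_MODEL` and the `VlmConfig` class are hypothetical and not defined by the SDK.

```java
import java.util.Map;

public class VlmConfig {

    // Hypothetical variable names; pick whatever fits your deployment.
    static final String ENDPOINT_VAR = "VLM_API_ENDPOINT";
    static final String MODEL_VAR = "VLM_MODEL";

    // Returns the trimmed value for key, or fallback when the key is unset or blank.
    public static String valueOrDefault(Map<String, String> env, String key, String fallback) {
        String value = env.get(key);
        return (value == null || value.isBlank()) ? fallback : value.trim();
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        String endpoint = valueOrDefault(env, ENDPOINT_VAR, "http://localhost:1234/v1");
        String model = valueOrDefault(env, MODEL_VAR, "qwen/qwen3-vl-8b");

        // Pass these values to vlm.setApiEndpoint(...) and vlm.setModel(...)
        System.out.println("endpoint=" + endpoint + ", model=" + model);
    }
}
```

Taking the environment as a `Map` parameter keeps the lookup easy to test without changing the process environment.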

## Detect and label form fields

Create a vision instance from the document with `Vision.set(document)`, then call `detectForms()`. The same call serves both detection-only and labeling modes; here the result includes labels because `enableAiLabeling` is set:

```java

            Vision vision = Vision.set(document);
            String formsJson = vision.detectForms();

```

Write the JSON result to a file for downstream processing:

```java

            try (FileWriter writer = new FileWriter("output.json")) {
                writer.write(formsJson);
            }
        }
    }
}

```

## Match labels to a vocabulary

Free-form labels can vary between runs ("First name" vs. "Given name"), which makes them hard to map to a database or template. Supply a vocabulary of preferred labels with `setCandidateLabels`, as shown above, and the model maps each field to one when it fits. If no label fits, it invents a concise new label.

Pass the labels as newline- or comma-separated text. A matched label uses the casing you supplied, and each field's `labelSource` records whether the label was `matched` or `invented`. Leave candidate labels empty, which is the default, for free-form labeling.
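If your vocabulary lives in a list rather than a literal string, a small helper can build the text to pass to `setCandidateLabels`. The sketch below is an assumption-laden convenience, not part of the SDK: `CandidateLabels.join` is a hypothetical name, and trimming, deduplicating, and rejecting labels that contain a separator are choices made here rather than documented SDK behavior.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CandidateLabels {

    // Joins labels with newlines, trimming each one and dropping blanks and duplicates.
    // A label that itself contains a comma or line break is rejected, because both
    // characters act as separators in the candidate-label text.
    public static String join(List<String> labels) {
        Set<String> unique = new LinkedHashSet<>();
        for (String raw : labels) {
            String label = raw.trim();
            if (label.isEmpty()) {
                continue;
            }
            if (label.contains(",") || label.contains("\n") || label.contains("\r")) {
                throw new IllegalArgumentException("Label contains a separator: " + label);
            }
            unique.add(label);
        }
        return String.join("\n", unique);
    }

    public static void main(String[] args) {
        String text = join(List.of("First name", "Last name", "Date of birth", "Signature"));
        // Pass the result to formLabeling.setCandidateLabels(text)
        System.out.println(text);
    }
}
```

Newlines are used as the separator here so that a stray comma surfaces as an error instead of silently splitting one label into two.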

## Understand the output

`detectForms()` returns structured JSON. The `elements` array holds one form element per page. Each form element includes its `pageNumber` and a `fields` list, so fields from a multi-page document stay grouped by the page they came from. Each field includes:

- **`fieldType`** — The detected type: `Text`, `Checkbox`, or `Signature`.

- **`bounds`** — The bounding box of the field on the page.

- **`confidence`** — The detection confidence for the field.

- **`label`** — The AI-assigned semantic label (for example, "First name"). Present only when AI labeling is enabled.

- **`labelSource`** — `matched` or `invented`, present only when a candidate vocabulary was supplied.

- **`id`** — A unique identifier for the field.
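After you deserialize the JSON with a library of your choice, the shape described above could be modeled along these lines. This is a sketch under stated assumptions: the Java types (`String` for `id`, `double` for `confidence`) are guesses, `bounds` is left out because its structure isn't described here, and the `FormOutputModel` class with its `labelsByPage` helper is hypothetical. The values in `main` are hand-written for illustration and are not real SDK output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FormOutputModel {

    // Assumed Java types for the documented keys; label and labelSource may be null.
    public record Field(String id, String fieldType, double confidence,
                        String label, String labelSource) {}

    public record FormElement(int pageNumber, List<Field> fields) {}

    // Collects labels per page, skipping unlabeled fields and fields below minConfidence.
    public static Map<Integer, List<String>> labelsByPage(List<FormElement> elements,
                                                          double minConfidence) {
        Map<Integer, List<String>> byPage = new TreeMap<>();
        for (FormElement element : elements) {
            List<String> labels =
                    byPage.computeIfAbsent(element.pageNumber(), page -> new ArrayList<>());
            for (Field field : element.fields()) {
                if (field.label() != null && field.confidence() >= minConfidence) {
                    labels.add(field.label());
                }
            }
        }
        return byPage;
    }

    public static void main(String[] args) {
        // Hand-written illustrative values, not real SDK output
        List<FormElement> elements = List.of(
                new FormElement(1, List.of(
                        new Field("f1", "Text", 0.93, "First name", "matched"),
                        new Field("f2", "Checkbox", 0.41, "Subscribe", "invented"))));

        System.out.println(labelsByPage(elements, 0.5)); // prints {1=[First name]}
    }
}
```

The confidence threshold is a parameter because the source doesn't state a scale or a recommended cutoff; check the values your own documents produce before choosing one.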

## Handle errors

The Vision API throws `VisionException`, which derives from `NutrientException`, when detection or labeling fails.

Common failure scenarios include:

- The document can't be read due to path or permission issues

- The page produces no renderable image

- The form detection model is missing or inaccessible, or the feature isn't licensed

- AI labeling is enabled but the selected provider's endpoint is unreachable

In production code:

- Catch `NutrientException`.

- Return a clear error message.

- Log failure details for debugging.

- Consider running detection only, with labeling disabled, as a fallback when the vision endpoint is unavailable.
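The fallback idea in the last point can be sketched as a small generic helper. To keep the example runnable without the SDK, it takes two `Callable` values and uses stand-ins in `main`; the `LabelingFallback` class and `withFallback` method are hypothetical names. In real code, you would likely narrow the first `catch` to `NutrientException` so that unrelated bugs aren't masked.

```java
import java.util.concurrent.Callable;

public class LabelingFallback {

    // Runs primary; if it throws, logs the failure and runs fallback instead.
    // If both fail, throws an unchecked exception that carries both causes.
    public static <T> T withFallback(Callable<T> primary, Callable<T> fallback) {
        try {
            return primary.call();
        } catch (Exception primaryFailure) {
            System.err.println("Labeling failed, falling back to detection only: "
                    + primaryFailure.getMessage());
            try {
                return fallback.call();
            } catch (Exception fallbackFailure) {
                IllegalStateException wrapped = new IllegalStateException(
                        "Labeling and detection-only fallback both failed", fallbackFailure);
                wrapped.addSuppressed(primaryFailure);
                throw wrapped;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the two SDK calls: one with labeling enabled, one without
        Callable<String> withLabeling = () -> {
            throw new IllegalStateException("endpoint unreachable");
        };
        Callable<String> detectionOnly = () -> "detection-only result";

        System.out.println(withFallback(withLabeling, detectionOnly));
    }
}
```

With the SDK, the first callable would wrap `detectForms()` with `setEnableAiLabeling(true)`, and the second would wrap the same call after `setEnableAiLabeling(false)`.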

## Conclusion

The workflow for labeling form fields with a vision model is:

1. Open the source document using try-with-resources for automatic resource cleanup.

2. Select a provider with `getVisionSettings().setProvider(...)`, configure the matching provider settings class, then call `setEnableAiLabeling(true)` on `getFormLabelingSettings()`.

3. Create a vision instance with `Vision.set()`.

4. Call `detectForms()` to detect every field, assign a semantic label, and export the result as JSON.

5. Write the JSON to a file for indexing, validation, or downstream processing.

6. Handle `NutrientException` for robust error recovery.

Labeling adds semantic meaning when a vision model is available. For offline detection with no model contacted, refer to the [extract form fields from an image](https://www.nutrient.io/guides/java/extraction/extract-form-fields-from-image.md) guide. To produce a fillable PDF instead of data, refer to the [detect and add form fields](https://www.nutrient.io/guides/java/editor/detect-and-add-form-fields.md) guide.

For related image extraction workflows, refer to the [Java SDK](https://www.nutrient.io/guides/java.md) guides.

Download the [sample package](https://www.nutrient.io/downloads/samples/java/label-form-fields-with-vlm.zip) to explore form-field labeling.

---

## Related pages

- [Applying OCR to a PDF document](/guides/java/extraction/apply-ocr-to-pdf.md)
- [Applying OCR to a PDF page](/guides/java/extraction/apply-ocr-to-pdf-page.md)
- [Generating image descriptions using Claude](/guides/java/extraction/describe-image-with-claude.md)
- [Generating image descriptions using local AI](/guides/java/extraction/describe-image-with-local-ai.md)
- [Extracting data from images using OCR](/guides/java/extraction/extract-data-from-image-ocr.md)
- [Generating image descriptions using OpenAI](/guides/java/extraction/describe-image-with-openai.md)
- [Extracting data from images using ICR](/guides/java/extraction/extract-data-from-image-icr.md)
- [Extracting JSON data from a PDF document](/guides/java/extraction/json-data-extraction.md)
- [Extracting data from images using vision language models](/guides/java/extraction/extract-data-from-image-vlm.md)
- [Extracting form fields from images](/guides/java/extraction/extract-form-fields-from-image.md)
- [Extracting structured data from documents](/guides/java/extraction/extract-structured-data.md)
- [Extracting text from PDF documents](/guides/java/extraction/pdf-to-text.md)
- [Nutrient Java SDK extraction guides](/guides/java/extraction.md)
- [Extracting text from multilingual images](/guides/java/extraction/read-text-from-image-multi-language.md)
- [Extracting text from images](/guides/java/extraction/read-text-from-image.md)
- [Speeding up first ICR operation by predownloading models](/guides/java/extraction/speed-up-first-icr-by-downloading-requirements.md)

