Labeling form fields with a vision language model
Form-field detection locates the fillable regions on a page, but a bounding box alone doesn’t tell you what each field means. AI labeling adds a human-readable semantic label to each detected field, such as “First name” or “Date of birth”, by sending the page to a vision language model (VLM).
This sample builds on offline form-field detection. For the detection basics — and a fully offline workflow with no model contacted — refer to the extract form fields from an image guide. Here you connect a VLM provider and turn labeling on with Nutrient Java SDK.
How Nutrient helps
Nutrient Java SDK runs detection and labeling behind a single method call. With labeling enabled, it also:
- Draws numbered marks over each detected field on a rendered copy of the page
- Sends the annotated page to the VLM you select and reads each field’s semantic label back
- Optionally drops detections the model judges to be false positives
- Records each field’s type, bounding box, confidence, and assigned label in JSON
The result is structured data you can index, validate, or feed into a downstream workflow.
Prerequisites
AI labeling requires a reachable VLM endpoint. The SDK does not provision or start a VLM service for you.
- Configure a reachable VLM endpoint in your environment.
- Configure the API endpoint and model in custom VLM API settings.
- By default, the SDK may assume:
  - API endpoint: http://localhost:1234/v1
  - Model: qwen/qwen3-vl-8b
- For clarity and reliability, set both the API endpoint and model explicitly.
- Example with LM Studio:
  - Run LM Studio in server mode.
  - Load a compatible vision model such as Qwen3-VL (4B, 8B, or larger depending on your hardware).
  - Make sure the endpoint is running before you call detectForms() with labeling enabled.
If no VLM endpoint is available, labeling fails at runtime. Leave enableAiLabeling at its default of false to run detection only and keep the workflow offline.
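Because labeling fails at runtime when the endpoint is down, a preflight check can decide whether to enable it. The following is a minimal sketch, not part of the SDK. It assumes the server is OpenAI-compatible and answers GET requests on the models path under the configured endpoint, which LM Studio and similar servers typically expose. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of Nutrient Java SDK.
final class VlmEndpointCheck {

    // Returns true when GET <apiEndpoint>/models answers with a 2xx status.
    // Assumption: the server follows the OpenAI-compatible REST layout.
    static boolean isReachable(String apiEndpoint, Duration timeout) {
        try {
            String base = apiEndpoint.endsWith("/")
                    ? apiEndpoint.substring(0, apiEndpoint.length() - 1)
                    : apiEndpoint;
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(timeout)
                    .build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(base + "/models"))
                    .timeout(timeout)
                    .GET()
                    .build();
            int status = client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
            return status >= 200 && status < 300;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } catch (Exception e) {
            // Connection refused, timeout, or malformed URL: treat as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        boolean up = isReachable("http://localhost:1234/v1", Duration.ofSeconds(2));
        System.out.println(up ? "VLM endpoint reachable" : "VLM endpoint unreachable");
    }
}
```

If the check returns false, one option is to leave enableAiLabeling at false for that run so detection still completes offline.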
Connect a vision model
Labeling uses the same provider configuration as the rest of the Vision API, so you don’t configure a separate endpoint for form labeling. Set the provider in vision settings and fill in the matching provider settings class:
- Custom / local (default) — An OpenAI-compatible server such as LM Studio, Ollama, or vLLM. Configure custom VLM API settings.
- OpenAI — Configure OpenAI API endpoint settings.
Prepare the project
Set a package name and create the main class:

    package io.nutrient.Sample;

Import the required classes from the SDK:

    import io.nutrient.sdk.Document;
    import io.nutrient.sdk.Vision;
    import io.nutrient.sdk.enums.VlmProvider;
    import io.nutrient.sdk.exceptions.NutrientException;

    import java.io.FileWriter;
    import java.io.IOException;

    public class LabelFormFieldsWithVlm {

Load the document
Open the document with try-with-resources so the SDK closes resources after processing:

        public static void main(String[] args) throws NutrientException, IOException {
            try (Document document = Document.open("input_forms_detection.pdf")) {

Configure AI labeling
Set the provider to your vision model, then opt in with setEnableAiLabeling:

                // Select the vision model provider (the same setting Vision.describe() uses)
                document.getSettings().getVisionSettings().setProvider(VlmProvider.Custom);

                // Configure the matching provider settings class
                var vlm = document.getSettings().getCustomVlmApiSettings();
                vlm.setApiEndpoint("http://localhost:1234/v1");
                vlm.setModel("qwen/qwen3-vl-8b");

                // Turn on labeling and drop detections the model judges to be false positives
                var formLabeling = document.getSettings().getFormLabelingSettings();
                formLabeling.setEnableAiLabeling(true);
                formLabeling.setEnableAiRemoveFalsePositives(true);

                // Optional: constrain labels to a known vocabulary
                formLabeling.setCandidateLabels("First name, Last name, Date of birth, Signature");

Detect and label form fields
Create a vision instance from the document with Vision.set(document), then call detectForms(). The same call covers both modes; it includes labels because enableAiLabeling is set:

                Vision vision = Vision.set(document);
                String formsJson = vision.detectForms();

Write the JSON result to a file for downstream processing:

                try (FileWriter writer = new FileWriter("output.json")) {
                    writer.write(formsJson);
                }
            }
        }
    }

Match labels to a vocabulary
Free-form labels can vary between runs (“First name” vs. “Given name”), which makes them hard to map to a database or template. Supply a vocabulary of preferred labels with setCandidateLabels, as shown above, and the model maps each field to one when it fits. If no label fits, it invents a concise new label.
Pass the labels as newline- or comma-separated text. A matched label uses the casing you supplied, and each field’s labelSource records whether the label was matched or invented. Leave candidate labels empty, which is the default, for free-form labeling.
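When the vocabulary lives in code or a database, the text passed to setCandidateLabels can be assembled from a list. The sketch below uses a hypothetical helper that is not part of the SDK. Only the comma-separated format comes from the description above. The trimming, de-duplication, and comma rejection are illustrative choices.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutrient Java SDK.
final class CandidateLabels {

    // Trims entries, drops blanks and case-insensitive duplicates, and joins
    // the remaining labels with ", " in first-seen order.
    static String join(List<String> labels) {
        Set<String> seenLowercase = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String raw : labels) {
            String label = raw.trim();
            if (label.isEmpty()) {
                continue;
            }
            if (label.contains(",") || label.contains("\n")) {
                // A separator inside a label would split it into two candidates.
                throw new IllegalArgumentException("Label contains a separator: " + label);
            }
            if (seenLowercase.add(label.toLowerCase(Locale.ROOT))) {
                kept.add(label);
            }
        }
        return String.join(", ", kept);
    }

    public static void main(String[] args) {
        String text = join(List.of("First name", " Last name ", "first name", "", "Signature"));
        System.out.println(text); // prints "First name, Last name, Signature"
    }
}
```

The resulting string can be passed to setCandidateLabels in place of the literal used in the sample.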
Understand the output
detectForms() returns structured JSON. The elements array holds one form element per page. Each form element includes its pageNumber and a fields list, so fields from a multi-page document stay grouped by the page they came from. Each field includes:
- fieldType — The detected type: Text, Checkbox, or Signature.
- bounds — The bounding box of the field on the page.
- confidence — The detection confidence for the field.
- label — The AI-assigned semantic label (for example, “First name”). Present only when AI labeling is enabled.
- labelSource — matched or invented, present only when a candidate vocabulary was supplied.
- id — A unique identifier for the field.
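Downstream code often separates fields that can be trusted from fields that need a human look. The sketch below assumes the JSON has already been parsed with a JSON library of your choice into a simplified, hypothetical Field class. It mirrors the property names listed above, omits bounds because its JSON shape is not described here, and assumes id is a string and confidence is a number where higher means more certain. The sample values and the 0.5 threshold are illustrative, not real SDK output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing helper, not part of Nutrient Java SDK.
final class LabelReview {

    // Simplified view of one entry in a page's fields list.
    static final class Field {
        final String id;
        final String fieldType;
        final double confidence;
        final String label;
        final String labelSource; // "matched", "invented", or null without a vocabulary

        Field(String id, String fieldType, double confidence, String label, String labelSource) {
            this.id = id;
            this.fieldType = fieldType;
            this.confidence = confidence;
            this.label = label;
            this.labelSource = labelSource;
        }
    }

    // Flags fields whose label was invented by the model or whose detection
    // confidence is below the given threshold.
    static List<Field> needsReview(List<Field> fields, double minConfidence) {
        List<Field> flagged = new ArrayList<>();
        for (Field field : fields) {
            boolean invented = "invented".equals(field.labelSource);
            if (invented || field.confidence < minConfidence) {
                flagged.add(field);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<Field> fields = List.of(
                new Field("f1", "Text", 0.92, "First name", "matched"),
                new Field("f2", "Text", 0.88, "Employer ID", "invented"),
                new Field("f3", "Signature", 0.41, "Signature", "matched"));
        for (Field field : needsReview(fields, 0.5)) {
            System.out.println(field.id + " -> " + field.label);
        }
    }
}
```

With the illustrative data above, f2 is flagged because its label was invented and f3 because its confidence falls below the threshold.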
Handle errors
The Vision API throws VisionException, which derives from NutrientException, when detection or labeling fails.
Common failure scenarios include:
- The document can’t be read due to path or permission issues
- The page produces no renderable image
- The form detection model is missing or inaccessible, or the feature isn’t licensed
- AI labeling is enabled but the selected provider’s endpoint is unreachable
In production code:
- Catch NutrientException.
- Return a clear error message.
- Log failure details for debugging.
- Consider running detection only, with labeling disabled, as a fallback when the vision endpoint is unavailable.
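The fallback in the last point can be sketched independently of the SDK. The helper below is hypothetical: it runs a primary task, logs the failure, and then runs a fallback task. In this workflow the primary task would call detectForms() with labeling enabled and the fallback would call it with labeling disabled. Whether the labeling setting can be switched on an already open document is an assumption to verify against the SDK.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Nutrient Java SDK.
final class LabelingFallback {

    // Runs the primary task. If it throws, logs the failure and runs the fallback.
    // If the fallback also throws, the primary failure is attached as suppressed.
    static <T> T withFallback(Callable<T> primary, Callable<T> fallback) throws Exception {
        try {
            return primary.call();
        } catch (Exception primaryFailure) {
            System.err.println("Primary task failed, running fallback: " + primaryFailure.getMessage());
            try {
                return fallback.call();
            } catch (Exception fallbackFailure) {
                fallbackFailure.addSuppressed(primaryFailure);
                throw fallbackFailure;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for the two detectForms() calls; the strings are placeholders.
        String result = LabelingFallback.<String>withFallback(
                () -> { throw new IllegalStateException("vision endpoint unreachable"); },
                () -> "detection-only result");
        System.out.println(result); // prints "detection-only result"
    }
}
```

Catching the broad Exception type keeps the sketch free of SDK imports. In real code, catching NutrientException narrows the fallback to SDK failures.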
Conclusion
The workflow for labeling form fields with a vision model is:
- Open the source document using try-with-resources for automatic resource cleanup.
- Select a provider with getVisionSettings().setProvider(...), configure the matching provider settings class, then call setEnableAiLabeling(true) on getFormLabelingSettings().
- Create a vision instance with Vision.set().
- Call detectForms() to detect every field, assign a semantic label, and export the result as JSON.
- Write the JSON to a file for indexing, validation, or downstream processing.
- Handle NutrientException for robust error recovery.
Labeling adds semantic meaning when a vision model is available. For offline detection with no model contacted, refer to the extract form fields from an image guide. To produce a fillable PDF instead of data, refer to the detect and add form fields guide.
For related image extraction workflows, refer to the Java SDK guides.
Download the sample package to explore form-field labeling.