---
title: "Labeling form fields with a vision language model | Nutrient Python SDK"
canonical_url: "https://www.nutrient.io/guides/python/extraction/label-form-fields-with-vlm/"
md_url: "https://www.nutrient.io/guides/python/extraction/label-form-fields-with-vlm.md"
last_updated: "2026-06-09T19:34:32.777Z"
description: "Detect form fields and assign semantic labels using a vision language model with Nutrient Python SDK."
---

# Labeling form fields with a vision language model

Form-field detection locates the fillable regions on a page, but a bounding box alone doesn't tell you what each field means. AI labeling adds a human-readable semantic label to each detected field, such as "First name" or "Date of birth", by sending the page to a vision language model (VLM).

This sample builds on offline form-field detection. For the detection basics — and a fully offline workflow with no model contacted — refer to the [extract form fields from an image](https://www.nutrient.io/guides/python/extraction/extract-form-fields-from-image.md) guide. Here you connect a VLM provider and turn labeling on with Nutrient Python SDK.

[Download sample](https://www.nutrient.io/downloads/samples/python/label-form-fields-with-vlm.zip)

## How Nutrient helps

Nutrient Python SDK runs detection and labeling behind a single method call. With labeling enabled, it also:

- Draws numbered marks over each detected field on a rendered copy of the page

- Sends the annotated page to the VLM you select and reads each field's semantic label back

- Optionally drops detections the model judges to be false positives

- Records each field's type, bounding box, confidence, and assigned label in JSON

The result is structured data you can index, validate, or feed into a downstream workflow.

## Prerequisites

AI labeling requires a reachable VLM endpoint. The SDK does not provision or start a VLM service for you.

- Configure a reachable VLM endpoint in your environment.

- Configure `api_endpoint` and `model` in [custom VLM API settings](https://www.nutrient.io/api/python/settings/advanced/vision/custom-vlm-api-settings.md).

- By default, the SDK may assume:
  - `api_endpoint`: `http://localhost:1234/v1`
  - `model`: `qwen/qwen3-vl-8b`

- For clarity and reliability, set both `api_endpoint` and `model` explicitly.

- Example with [LM Studio](https://lmstudio.ai/):
  - Run LM Studio in server mode.
  - Load a compatible vision model such as Qwen3-VL (4B, 8B, or larger depending on your hardware).

- Make sure the endpoint is running before you call `detect_forms()` with labeling enabled.

If no VLM endpoint is available, labeling fails at runtime. Leave `enable_ai_labeling` at its default of `False` to run detection only and keep the workflow offline.
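
Before enabling labeling, it can help to confirm the endpoint actually answers. The following is a minimal sketch, not part of the SDK: it assumes your server is OpenAI-compatible and exposes a model list at `{api_endpoint}/models`, and the helper name `vlm_endpoint_reachable` is hypothetical. Servers that require an API key may reject this unauthenticated request, in which case the check reports `False` even though the server is running.

```python
import urllib.error
import urllib.request


def vlm_endpoint_reachable(api_endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if GET {api_endpoint}/models answers with HTTP 200.

    Assumption: the server follows the OpenAI-compatible convention of
    listing models under /models. This helper is illustrative only.
    """
    url = api_endpoint.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error status,
        # or a malformed URL all count as "not reachable".
        return False


if __name__ == "__main__":
    endpoint = "http://localhost:1234/v1"
    if not vlm_endpoint_reachable(endpoint):
        print(f"No VLM endpoint reachable at {endpoint}; run detection only.")
```

If the check fails, leave `enable_ai_labeling` at `False` rather than letting `detect_forms()` fail at runtime.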

## Connect a vision model

Labeling uses the same provider configuration as the rest of the Vision API, so you don't configure a separate endpoint for form labeling. Set the provider in [vision settings](https://www.nutrient.io/api/python/settings/vision/vision-settings.md#provider) and fill in the matching provider settings class:

- **Custom / local (default)** — An OpenAI-compatible server such as [LM Studio](https://lmstudio.ai/), Ollama, or vLLM. Configure [custom VLM API settings](https://www.nutrient.io/api/python/settings/advanced/vision/custom-vlm-api-settings.md).

- **OpenAI** — Configure [OpenAI API endpoint settings](https://www.nutrient.io/api/python/settings/vision/advanced/open-ai-api-endpoint-settings.md).

## Complete implementation

Start by importing the classes used in the sample:

```python

from nutrient_sdk import Document, Vision, VlmProvider, NutrientException

```

## Load the document

Open the document in a [context manager](https://docs.python.org/3/reference/datamodel.html#context-managers) so resources are cleaned up after processing:

```python

def main():
    try:
        with Document.open("input_forms_detection.pdf") as document:

```

## Configure AI labeling

Select the provider, point it at your vision model, then opt in with `enable_ai_labeling`:

```python

            # Select the vision model provider (the same setting Vision.describe() uses)
            document.settings.vision_settings.provider = VlmProvider.Custom

            # Configure the matching provider settings class
            vlm = document.settings.custom_vlm_api_settings
            vlm.api_endpoint = "http://localhost:1234/v1"
            vlm.model = "qwen/qwen3-vl-8b"

            # Turn on labeling and drop detections the model judges to be false positives
            form_labeling = document.settings.form_labeling_settings
            form_labeling.enable_ai_labeling = True
            form_labeling.enable_ai_remove_false_positives = True

            # Optional: constrain labels to a known vocabulary
            form_labeling.candidate_labels = "First name, Last name, Date of birth, Signature"

```

## Detect and label form fields

Create a vision instance from the document with `Vision.set(document)`, then call `detect_forms()`. The same call serves both detection-only and labeling runs; here the result includes labels because `enable_ai_labeling` is set:

```python

            vision = Vision.set(document)
            forms_json = vision.detect_forms()

```

Write the JSON result to a file for downstream processing:

```python

            with open("output.json", "w", encoding="utf-8") as f:
                f.write(forms_json)
    except NutrientException as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    main()

```

## Match labels to a vocabulary

Free-form labels can vary between runs ("First name" vs. "Given name"), which makes them hard to map to a database or template. Supply a vocabulary of preferred labels with `candidate_labels`, as shown above, and the model assigns each field the candidate that fits. If none fits, it invents a concise new label.

Pass the labels as newline- or comma-separated text. A matched label uses the casing you supplied, and each field's `labelSource` records whether the label was `matched` or `invented`. Leave `candidate_labels` empty, which is the default, for free-form labeling.
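
If your vocabulary lives in a list (for example, column names from a database), you can build the text programmatically. The sketch below is one possible approach under stated assumptions: `build_candidate_labels` is a hypothetical helper, not an SDK function, and joining with newlines and dropping case-insensitive duplicates are design choices made here, not SDK requirements.

```python
from typing import Iterable


def build_candidate_labels(labels: Iterable[str]) -> str:
    """Join labels into newline-separated text.

    Blank entries are dropped, and so are case-insensitive duplicates,
    so each label has exactly one supplied casing.
    """
    seen = set()
    cleaned = []
    for label in labels:
        text = label.strip()
        key = text.casefold()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return "\n".join(cleaned)


vocabulary = ["First name", "Last name", " Date of birth ", "first name", "", "Signature"]
candidate_labels = build_candidate_labels(vocabulary)
print(candidate_labels)
```

You would then assign the result to `form_labeling.candidate_labels`. Newlines are used as the separator here so that a label containing a comma stays intact.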

## Understand the output

`detect_forms()` returns structured JSON. The `elements` array holds one form element per page. Each form element includes its `pageNumber` and a `fields` list, so fields from a multi-page document stay grouped by the page they came from. Each field includes:

- **`fieldType`** — The detected type: `Text`, `Checkbox`, or `Signature`.

- **`bounds`** — The bounding box of the field on the page.

- **`confidence`** — The detection confidence for the field.

- **`label`** — The AI-assigned semantic label (for example, "First name"). Present only when AI labeling is enabled.

- **`labelSource`** — `matched` or `invented`, present only when a candidate vocabulary was supplied.

- **`id`** — A unique identifier for the field.
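
To work with the result, parse the JSON and walk `elements` and `fields`. The following sketch uses only the key names listed above; the sample document is hypothetical (its values are invented, and `bounds` is omitted because this guide doesn't specify its exact shape), and `iter_fields` is an illustrative helper rather than part of the SDK.

```python
import json
from typing import Any, Dict, Iterator

# Hypothetical sample: values are invented, and `bounds` is left out
# because its exact shape isn't specified in this guide.
SAMPLE_FORMS_JSON = """
{
  "elements": [
    {
      "pageNumber": 1,
      "fields": [
        {"id": "f1", "fieldType": "Text", "confidence": 0.97,
         "label": "First name", "labelSource": "matched"},
        {"id": "f2", "fieldType": "Checkbox", "confidence": 0.81,
         "label": "Newsletter opt-in", "labelSource": "invented"}
      ]
    }
  ]
}
"""


def iter_fields(forms_json: str) -> Iterator[Dict[str, Any]]:
    """Yield one flat dict per field, tagged with the page it came from."""
    result = json.loads(forms_json)
    for element in result.get("elements", []):
        page_number = element.get("pageNumber")
        for field in element.get("fields", []):
            yield {
                "page": page_number,
                "id": field.get("id"),
                "type": field.get("fieldType"),
                "confidence": field.get("confidence"),
                # `label` is absent when AI labeling is off.
                "label": field.get("label"),
                # `labelSource` is absent without a candidate vocabulary.
                "label_source": field.get("labelSource"),
            }


fields = list(iter_fields(SAMPLE_FORMS_JSON))
# Invented labels fall outside the vocabulary, so they may deserve a manual check.
needs_review = [f for f in fields if f["label_source"] == "invented"]
for f in fields:
    print(f["page"], f["type"], f["label"], f["label_source"])
```

Reading optional keys with `.get()` keeps the same loop usable for detection-only output, where `label` and `labelSource` are missing.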

## Handle errors

The Vision API raises `VisionException`, which derives from `NutrientException`, when detection or labeling fails.

Common failure scenarios include:

- The document can't be read due to path or permission issues

- The page produces no renderable image

- The form detection model is missing or inaccessible, or the feature isn't licensed

- AI labeling is enabled but the selected provider's endpoint is unreachable

In production code:

- Catch `NutrientException`.

- Return a clear error message.

- Log failure details for debugging.

- Consider running detection only, with labeling disabled, as a fallback when the vision endpoint is unavailable.
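
The fallback in the last point can be sketched as a small wrapper. This is an illustration under stated assumptions, not SDK code: `detect_with_fallback` and its `run_detection` callback are hypothetical names, and the sketch assumes that detection can be rerun with labeling disabled after a failed labeling attempt.

```python
import logging
from typing import Callable, Tuple, Type

logger = logging.getLogger(__name__)


def detect_with_fallback(
    run_detection: Callable[[bool], str],
    error_type: Type[Exception] = Exception,
) -> Tuple[str, bool]:
    """Try detection with labeling, then fall back to detection only.

    `run_detection(enable_labeling)` is a caller-supplied function that
    returns the forms JSON. Returns (json_text, labeling_was_used).
    """
    try:
        return run_detection(True), True
    except error_type as exc:
        # Log the failure details, then retry offline without the VLM.
        logger.warning("Labeling failed (%s); retrying with labeling disabled", exc)
        return run_detection(False), False


if __name__ == "__main__":
    # Stand-in for an SDK call: fails only when labeling is requested.
    def fake_detection(enable_labeling: bool) -> str:
        if enable_labeling:
            raise ConnectionError("vision endpoint unreachable")
        return '{"elements": []}'

    print(detect_with_fallback(fake_detection, ConnectionError))
```

With the SDK, `run_detection` would set `document.settings.form_labeling_settings.enable_ai_labeling` to the flag it receives and return `Vision.set(document).detect_forms()`, and `error_type` would be `NutrientException`. Errors that would fail either way, such as an unreadable document, surface again from the second attempt.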

## Conclusion

The workflow for labeling form fields with a vision model is:

1. Open the source document using a [context manager](https://docs.python.org/3/reference/datamodel.html#context-managers) for automatic resource cleanup.

2. Select a provider with `vision_settings.provider`, configure the matching provider settings class, then set `enable_ai_labeling` on `form_labeling_settings`.

3. Create a vision instance with `Vision.set()`.

4. Call `detect_forms()` to detect every field, assign a semantic label, and export the result as JSON.

5. Write the JSON to a file for indexing, validation, or downstream processing.

6. Handle `NutrientException` for robust error recovery.

Labeling adds semantic meaning when a vision model is available. For offline detection with no model contacted, refer to the [extract form fields from an image](https://www.nutrient.io/guides/python/extraction/extract-form-fields-from-image.md) guide. To produce a fillable PDF instead of data, refer to the [detect and add form fields](https://www.nutrient.io/guides/python/editor/detect-and-add-form-fields.md) guide.

For related image extraction workflows, refer to the [Python SDK](https://www.nutrient.io/guides/python.md) guides.

Download the [sample package](https://www.nutrient.io/downloads/samples/python/label-form-fields-with-vlm.zip) to explore form-field labeling.

---

## Related pages

- [Speeding up first ICR operation by predownloading models](/guides/python/extraction/speed-up-first-icr-by-downloading-requirements.md)
- [Extracting text from PDF documents](/guides/python/extraction/pdf-to-text.md)
- [Extracting text from multilingual images](/guides/python/extraction/read-text-from-image-multi-language.md)
- [Extracting structured data from documents](/guides/python/extraction/extract-structured-data.md)
- [Generating image descriptions using Claude](/guides/python/extraction/describe-image-with-claude.md)
- [Extracting data from images using vision language models](/guides/python/extraction/extract-data-from-image-vlm.md)
- [Generating image descriptions using OpenAI](/guides/python/extraction/describe-image-with-openai.md)
- [Extracting text from images](/guides/python/extraction/read-text-from-image.md)
- [Generating image descriptions using local AI](/guides/python/extraction/describe-image-with-local-ai.md)
- [Nutrient Python SDK extraction guides](/guides/python/extraction.md)
- [Applying OCR to a PDF document](/guides/python/extraction/apply-ocr-to-pdf.md)
- [Extracting form fields from images](/guides/python/extraction/extract-form-fields-from-image.md)
- [Extracting data from images using OCR](/guides/python/extraction/extract-data-from-image-ocr.md)
- [Applying OCR to a PDF page](/guides/python/extraction/apply-ocr-to-pdf-page.md)
- [Extracting structured JSON data from PDF documents](/guides/python/extraction/json-data-extraction.md)
- [Extracting data from images using ICR](/guides/python/extraction/extract-data-from-image-icr.md)