Labeling form fields with a vision language model
Form-field detection locates the fillable regions on a page, but a bounding box alone doesn’t tell you what each field means. AI labeling adds a human-readable semantic label to each detected field, such as “First name” or “Date of birth”, by sending the page to a vision language model (VLM).
This sample builds on offline form-field detection. For the detection basics — and a fully offline workflow with no model contacted — refer to the extract form fields from an image guide. Here you connect a VLM provider and turn labeling on with Nutrient Python SDK.
How Nutrient helps
Nutrient Python SDK runs detection and labeling behind a single method call. With labeling enabled, it also:
- Draws numbered marks over each detected field on a rendered copy of the page
- Sends the annotated page to the VLM you select and reads each field’s semantic label back
- Optionally drops detections the model judges to be false positives
- Records each field’s type, bounding box, confidence, and assigned label in JSON
The result is structured data you can index, validate, or feed into a downstream workflow.
Prerequisites
AI labeling requires a reachable VLM endpoint. The SDK does not provision or start a VLM service for you.
- Configure a reachable VLM endpoint in your environment.
- Configure api_endpoint and model in custom VLM API settings.
- By default, the SDK may assume:
  - api_endpoint: http://localhost:1234/v1
  - model: qwen/qwen3-vl-8b
- For clarity and reliability, set both api_endpoint and model explicitly.
- Example with LM Studio:
  - Run LM Studio in server mode.
  - Load a compatible vision model such as Qwen3-VL (4B, 8B, or larger depending on your hardware).
- Make sure the endpoint is running before you call detect_forms() with labeling enabled.
If no VLM endpoint is available, labeling fails at runtime. Leave enable_ai_labeling at its default of False to run detection only and keep the workflow offline.
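Because labeling fails at runtime when the endpoint is down, it can help to probe the endpoint before enabling it. The sketch below is one possible check, not part of the SDK: the helper name vlm_endpoint_reachable is hypothetical, and it assumes the server follows the OpenAI-compatible convention of listing models at GET <api_endpoint>/models without requiring authentication.

```python
import http.client
import urllib.error
import urllib.request


def vlm_endpoint_reachable(api_endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if GET <api_endpoint>/models answers with HTTP 200.

    Assumption: the server exposes the OpenAI-compatible model listing
    route. This helper is illustrative and not part of the SDK.
    """
    url = api_endpoint.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False
```

A caller could set enable_ai_labeling to the result of vlm_endpoint_reachable("http://localhost:1234/v1"), so the run stays detection-only when the server is not up. A server that requires an API key would answer with an error status and be reported as unreachable by this sketch.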
Connect a vision model
Labeling uses the same provider configuration as the rest of the Vision API, so you don’t configure a separate endpoint for form labeling. Set the provider in vision settings and fill in the matching provider settings class:
- Custom / local (default): An OpenAI-compatible server such as LM Studio, Ollama, or vLLM. Configure custom VLM API settings.
- OpenAI: Configure OpenAI API endpoint settings.
Complete implementation
Start by importing the classes used in the sample:

    from nutrient_sdk import Document, Vision, VlmProvider, NutrientException

Load the document
Open the document in a context manager so resources are cleaned up after processing:

    def main():
        try:
            with Document.open("input_forms_detection.pdf") as document:

Configure AI labeling
Select the provider, point it at your vision model, then opt in with enable_ai_labeling:

                # Select the vision model provider (the same setting Vision.describe() uses)
                document.settings.vision_settings.provider = VlmProvider.Custom

                # Configure the matching provider settings class
                vlm = document.settings.custom_vlm_api_settings
                vlm.api_endpoint = "http://localhost:1234/v1"
                vlm.model = "qwen/qwen3-vl-8b"

                # Turn on labeling and drop detections the model judges to be false positives
                form_labeling = document.settings.form_labeling_settings
                form_labeling.enable_ai_labeling = True
                form_labeling.enable_ai_remove_false_positives = True

                # Optional: constrain labels to a known vocabulary
                form_labeling.candidate_labels = "First name, Last name, Date of birth, Signature"

Detect and label form fields
Create a vision instance from the document with Vision.set(document), then call detect_forms(). The same call covers both modes; it includes labels because enable_ai_labeling is set:

                vision = Vision.set(document)
                forms_json = vision.detect_forms()

Write the JSON result to a file for downstream processing:

                with open("output.json", "w") as f:
                    f.write(forms_json)
        except NutrientException as e:
            print(f"Error: {e}")

    if __name__ == "__main__":
        main()

Match labels to a vocabulary
Free-form labels can vary between runs (“First name” vs. “Given name”), which makes them hard to map to a database or template. Supply a vocabulary of preferred labels with candidate_labels, as shown above, and the model maps each field to one when it fits. If no label fits, it invents a concise new label.
Pass the labels as newline- or comma-separated text. A matched label uses the casing you supplied, and each field’s labelSource records whether the label was matched or invented. Leave candidate_labels empty, which is the default, for free-form labeling.
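If the vocabulary lives in a Python list (for example, column names from a database), a small helper can turn it into the separated text that candidate_labels expects. This is a minimal sketch: build_candidate_labels is a hypothetical name, and trimming whitespace, dropping case-insensitive duplicates, and rejecting labels that contain a separator are choices made here, not documented SDK behavior.

```python
from typing import Iterable


def build_candidate_labels(labels: Iterable[str]) -> str:
    """Join labels into newline-separated text, one label per line."""
    seen = set()
    cleaned = []
    for label in labels:
        text = label.strip()
        if not text:
            continue  # skip blank entries
        if "," in text or "\n" in text:
            # Commas and newlines act as separators, so a label cannot contain them.
            raise ValueError(f"Label contains a separator character: {text!r}")
        key = text.casefold()
        if key not in seen:  # keep the first spelling of each label
            seen.add(key)
            cleaned.append(text)
    return "\n".join(cleaned)
```

The result can be assigned directly, as in form_labeling.candidate_labels = build_candidate_labels(["First name", "Last name"]). An empty list yields an empty string, which matches the free-form default.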
Understand the output
detect_forms() returns structured JSON. The elements array holds one form element per page. Each form element includes its pageNumber and a fields list, so fields from a multi-page document stay grouped by the page they came from. Each field includes:
- fieldType: The detected type (Text, Checkbox, or Signature).
- bounds: The bounding box of the field on the page.
- confidence: The detection confidence for the field.
- label: The AI-assigned semantic label (for example, “First name”). Present only when AI labeling is enabled.
- labelSource: matched or invented. Present only when a candidate vocabulary was supplied.
- id: A unique identifier for the field.
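To show how this structure might be consumed, the sketch below flattens the JSON into one row per field and picks out fields whose labels were invented, which are the ones most likely to need review before mapping to a database. It assumes the top-level value is an object with an elements key, as the description above suggests. The function names and the sample data are made up for illustration, and bounds is left out of the sample because its exact shape is not described here.

```python
import json
from typing import Any


def flatten_fields(forms_json: str) -> list[dict[str, Any]]:
    """Return one row per detected field, tagged with its page number."""
    rows = []
    for element in json.loads(forms_json).get("elements", []):
        for field in element.get("fields", []):
            rows.append({
                "page": element.get("pageNumber"),
                "id": field.get("id"),
                "type": field.get("fieldType"),
                "confidence": field.get("confidence"),
                # label is absent when AI labeling is disabled
                "label": field.get("label"),
                # labelSource is absent when no vocabulary was supplied
                "label_source": field.get("labelSource"),
            })
    return rows


def labels_needing_review(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return rows whose label was invented rather than matched."""
    return [
        row for row in rows
        if (row["label_source"] or "").casefold() == "invented"
    ]


# Made-up sample that only mirrors the documented key names.
SAMPLE_JSON = json.dumps({
    "elements": [
        {"pageNumber": 1, "fields": [
            {"id": "f1", "fieldType": "Text", "confidence": 0.97,
             "label": "First name", "labelSource": "matched"},
            {"id": "f2", "fieldType": "Text", "confidence": 0.81,
             "label": "Employer", "labelSource": "invented"},
        ]},
        {"pageNumber": 2, "fields": [
            {"id": "f3", "fieldType": "Signature", "confidence": 0.9},
        ]},
    ]
})
```

Calling labels_needing_review(flatten_fields(SAMPLE_JSON)) returns only the row for the invented label. Reading optional keys with get() keeps the same code usable for detection-only output.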
Handle errors
The Vision API raises VisionException, which derives from NutrientException, when detection or labeling fails.
Common failure scenarios include:
- The document can’t be read due to path or permission issues
- The page produces no renderable image
- The form detection model is missing or inaccessible, or the feature isn’t licensed
- AI labeling is enabled but the selected provider’s endpoint is unreachable
In production code:
- Catch NutrientException.
- Return a clear error message.
- Log failure details for debugging.
- Consider running detection only, with labeling disabled, as a fallback when the vision endpoint is unavailable.
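The fallback in the last point can be sketched as a small wrapper that tries a labeled run first and retries once without labeling. To keep the example independent of the SDK, the detection step is passed in as a callable. detect_with_fallback is a hypothetical helper, and the commented wiring assumes settings can be changed between two detect_forms() calls on the same document, which this guide does not confirm.

```python
import logging
from typing import Callable, Tuple, Type

logger = logging.getLogger(__name__)


def detect_with_fallback(
    detect: Callable[[bool], str],
    error_type: Type[BaseException] = Exception,
) -> Tuple[str, bool]:
    """Run detect(True); if it raises error_type, retry once as detect(False).

    Returns (forms_json, labeled), where labeled is True only when the
    labeled run succeeded.
    """
    try:
        return detect(True), True
    except error_type as exc:
        logger.warning("Labeled detection failed (%s); retrying without labeling", exc)
        return detect(False), False


# Hypothetical wiring with the names used in this guide (not executed here):
#
#   def detect(enable_labeling):
#       document.settings.form_labeling_settings.enable_ai_labeling = enable_labeling
#       return Vision.set(document).detect_forms()
#
#   forms_json, labeled = detect_with_fallback(detect, NutrientException)
```

If the second attempt also fails, for example because the document itself cannot be read, the exception propagates to the caller, so the outer NutrientException handler is still needed.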
Conclusion
The workflow for labeling form fields with a vision model is:
- Open the source document using a context manager for automatic resource cleanup.
- Select a provider with vision_settings.provider, configure the matching provider settings class, then set enable_ai_labeling on form_labeling_settings.
- Create a vision instance with Vision.set().
- Call detect_forms() to detect every field, assign a semantic label, and export the result as JSON.
- Write the JSON to a file for indexing, validation, or downstream processing.
- Handle NutrientException for robust error recovery.
Labeling adds semantic meaning when a vision model is available. For offline detection with no model contacted, refer to the extract form fields from an image guide. To produce a fillable PDF instead of data, refer to the detect and add form fields guide.
For related image extraction workflows, refer to the Python SDK guides.
Download the sample package to explore form-field labeling.