Detecting and adding form fields to a PDF document
Scanned forms — tax filings, healthcare intake sheets, lease agreements, expense reports — usually arrive as image-based PDFs. The page shows boxes, checkboxes, and signature lines, but the PDF has no AcroForm structure behind them, so end users can’t fill the form in a viewer and downstream tools can’t read the values. Adding those fields by hand means measuring rectangles for every blank on every page.
This sample shows how to detect form-field regions on each page of a document and add matching AcroForm fields automatically using Nutrient Python SDK. The input can be any document format the SDK supports. If the input isn’t already a PDF, the SDK converts it to PDF when you create the editor.
How Nutrient helps
Nutrient Python SDK handles the full detection-and-creation pipeline behind a single method call. The SDK takes care of:
- Implicitly converting non-PDF inputs (images, multi-page TIFFs, Office documents) to PDF when the editor is created
- Rendering each PDF page to a bitmap at the resolution the detection model expects
- Running the form-field detection model and classifying each region as text, checkbox, or signature
- Mapping image coordinates back to PDF page coordinates (including the Y-axis flip)
- Adding a matching AcroForm widget for each detected region with a unique field name
The result is a standard PDF with interactive form fields that any viewer can fill or any downstream tool can read.
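The coordinate-mapping step in the list above can be illustrated without the SDK. The following is a minimal sketch, not the SDK's internal code: it assumes the page is rendered at a uniform DPI, is not rotated, and has its origin at the bottom-left corner with no crop-box offset. `PixelBox` and `pixel_box_to_pdf_rect` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelBox:
    """Axis-aligned box in image space: origin top-left, y grows downward."""
    left: float
    top: float
    right: float
    bottom: float


def pixel_box_to_pdf_rect(box, page_height_pt, dpi):
    """Return (x0, y0, x1, y1) in PDF points, origin bottom-left.

    Assumes an unrotated page rendered at a uniform `dpi`.
    """
    scale = 72.0 / dpi  # one PDF point is 1/72 inch
    x0 = box.left * scale
    x1 = box.right * scale
    # Flip Y: the image's top edge corresponds to the page's highest y value.
    y1 = page_height_pt - box.top * scale
    y0 = page_height_pt - box.bottom * scale
    return (x0, y0, x1, y1)


# A US Letter page (612 x 792 pt) rendered at 144 DPI is 1224 x 1584 pixels.
rect = pixel_box_to_pdf_rect(PixelBox(200, 300, 600, 340), page_height_pt=792.0, dpi=144)
print(rect)  # (100.0, 622.0, 300.0, 642.0)
```

The box that sits 300 pixels below the top of the image ends up 642 points above the bottom of the page, which is the flip the SDK performs for you.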
Supported field types and limits
The current detection model classifies regions into three types: text, checkbox, and signature. The SDK maps them to PdfTextField, PdfCheckBoxField, and PdfSignatureField respectively. Radio buttons aren’t yet emitted by the model and are skipped. Detection runs on the rendered page image, so accuracy depends on the visual quality of the original document — clean scans produce better results than heavily compressed or skewed pages.
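The type mapping described above can be pictured as a simple lookup. This is a sketch for illustration only: the label strings (`"text"`, `"checkbox"`, `"signature"`, `"radio"`) are assumptions about how the detected kinds might be spelled, and the class names are kept as plain strings so the snippet runs without the SDK installed.

```python
# Class names are written as strings so the sketch runs without the SDK.
FIELD_CLASS_BY_KIND = {
    "text": "PdfTextField",
    "checkbox": "PdfCheckBoxField",
    "signature": "PdfSignatureField",
}


def field_class_for(kind):
    """Return the field class name for a detected kind, or None if it is skipped."""
    return FIELD_CLASS_BY_KIND.get(kind)


# Hypothetical detection labels for one page, including an unsupported one.
detected = ["text", "checkbox", "radio", "signature"]
planned = [(kind, field_class_for(kind)) for kind in detected if field_class_for(kind)]
print(planned)
```

The `"radio"` entry has no mapping, so it drops out of `planned`, mirroring how unsupported regions are skipped.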
Preparing the project
Import the classes used in the sample:
from nutrient_sdk import Document
from nutrient_sdk import PdfEditor
from nutrient_sdk import NutrientException
Detecting and adding form fields
The main() function opens the source document inside a context manager, creates a PDF editor, and calls detect_and_add_form_fields() to process every page in a single call. The context manager closes the document automatically when the block ends, even if an error is raised:
def main():
    try:
        with Document.open("input_forms_detection.pdf") as document:
            editor = PdfEditor.edit(document)
            editor.detect_and_add_form_fields()
PdfEditor.edit(document) attaches an editor to the open document. If the document isn’t already a PDF, the SDK converts it to PDF at this step so the rest of the pipeline works on a uniform page representation. Calling editor.detect_and_add_form_fields() walks every page, runs detection, and adds an AcroForm widget for each detected region. Existing form fields on the document aren’t removed: detection only adds new fields and synthesizes unique names so they don’t collide with anything already present.
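The source does not document how the SDK builds its unique field names, so the following is only a sketch of one way to avoid collisions with names already present. The naming pattern (`kind_pPAGE_COUNTER`) and the function `synthesize_field_name` are hypothetical and are not the SDK's actual scheme.

```python
def synthesize_field_name(kind, page_index, taken):
    """Return a name that is not in `taken`, and record it there.

    `taken` is a set holding every field name already used in the document.
    """
    counter = 1
    while True:
        candidate = f"{kind}_p{page_index + 1}_{counter}"
        if candidate not in taken:
            taken.add(candidate)
            return candidate
        counter += 1


# Names that already exist in the document before detection runs.
existing = {"text_p1_1", "applicant_name"}
names = [synthesize_field_name("text", 0, existing) for _ in range(2)]
print(names)  # ['text_p1_2', 'text_p1_3']
```

Because each new name is added to the same set, later fields cannot collide with earlier ones or with fields the document already had.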
Saving the result
Save the modified document to a new file and close the editor. The whole sequence sits inside a try block that catches NutrientException, so any licensing, model-loading, or I/O issue that the SDK reports is surfaced:
            editor.save_as("output.pdf")
            editor.close()
    except NutrientException as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    main()
Conclusion
The workflow for adding form fields to an image-based form is:
- Open the source document.
- Create a PdfEditor for the document.
- Call detect_and_add_form_fields() to detect and add fields across every page.
- Save the result and close the editor.
The output is a standard PDF with interactive form fields, so existing PDF viewers can fill the form and downstream tools can read the values without any extra configuration.
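The steps can also be gathered into one reusable function. This is a sketch under stated assumptions rather than part of the official sample. It uses only the SDK calls shown in this guide (`Document.open`, `PdfEditor.edit`, `detect_and_add_form_fields`, `save_as`, `close`). The `derive_output_path` helper and the `_with_fields` suffix are hypothetical additions, and moving `editor.close()` into a `finally` block is a design choice made here so the editor is released even if detection or saving raises.

```python
from pathlib import Path


def derive_output_path(input_path, suffix="_with_fields"):
    """Hypothetical helper: place the result next to the input, always as .pdf."""
    source = Path(input_path)
    return source.with_name(f"{source.stem}{suffix}.pdf")


def add_form_fields(input_path):
    """Run the guide's steps on one file and return the output path."""
    # Imported here so the helper above can be used without the SDK installed.
    from nutrient_sdk import Document, PdfEditor

    output_path = derive_output_path(input_path)
    with Document.open(str(input_path)) as document:
        editor = PdfEditor.edit(document)
        try:
            editor.detect_and_add_form_fields()
            editor.save_as(str(output_path))
        finally:
            # Close the editor even if detection or saving raises.
            editor.close()
    return output_path


# Only the path helper runs here; add_form_fields() needs the SDK and a license.
print(derive_output_path("scans/intake_form.tiff"))
```

Writing to a derived file name keeps the original scan untouched, which makes it easy to rerun detection after rescanning a poor-quality page.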