One Shot
Send a single full-page image to the VLM, with the bounding box (bbox) coordinates of every zone listed in the prompt. The VLM then uses spatial reasoning to locate each bbox on the page and classify the zone it encloses.
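As a rough illustration of the one-shot approach, the sketch below builds the text prompt that would accompany the page image and parses a JSON reply. Everything named here is an assumption made for the example, not part of the original description: the `Zone` type, the label set, the `(x0, y0, x1, y1)` pixel convention, and the JSON answer format. The call to the VLM itself is left out because the source names no specific model or API.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Zone:
    """Hypothetical zone record: an id plus a bbox in page pixels."""
    zone_id: int
    bbox: Tuple[int, int, int, int]  # assumed convention: (x0, y0, x1, y1)


# Hypothetical label set; the source does not specify one.
LABELS = ["title", "text", "table", "figure"]


def build_one_shot_prompt(zones: List[Zone], page_width: int, page_height: int) -> str:
    """Build one prompt that lists every zone's bbox for a single page image."""
    lines = [
        f"The attached image is a full page of {page_width}x{page_height} pixels.",
        "Classify each zone below by the content inside its bounding box.",
        f"Allowed labels: {', '.join(LABELS)}.",
        "Boxes are given as (x0, y0, x1, y1) with the origin at the top-left.",
        "",
    ]
    for z in zones:
        x0, y0, x1, y1 = z.bbox
        lines.append(f"zone {z.zone_id}: ({x0}, {y0}, {x1}, {y1})")
    lines.append("")
    lines.append('Answer with JSON only, e.g. {"0": "title", "1": "text"}.')
    return "\n".join(lines)


def parse_labels(response_text: str, zones: List[Zone]) -> Dict[int, Optional[str]]:
    """Map each zone id to its label; unknown or missing labels become None."""
    raw = json.loads(response_text)
    result: Dict[int, Optional[str]] = {}
    for z in zones:
        label = raw.get(str(z.zone_id))
        result[z.zone_id] = label if label in LABELS else None
    return result
```

In use, the string returned by `build_one_shot_prompt` would be sent together with the full-page image in a single request, and the model's text reply passed to `parse_labels`. Treating unrecognized labels as `None` is one possible design choice; it keeps a malformed answer for one zone from invalidating the rest.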