---
title: "Generating image descriptions using local AI | Nutrient Python SDK"
canonical_url: "https://www.nutrient.io/guides/python/extraction/describe-image-with-local-ai/"
md_url: "https://www.nutrient.io/guides/python/extraction/describe-image-with-local-ai.md"
last_updated: "2026-05-30T02:20:01.349Z"
description: "Generate accessible image descriptions using local AI models with Nutrient Python SDK."
---

# Generating image descriptions using local AI

Use local AI image description when you need privacy, cost control, or offline operation.

Common use cases include:

- On-premises accessibility workflows

- Medical or regulated environments with local-only data handling

- Secure networks where external APIs are restricted

- High-volume processing without per-image API fees

- Offline-capable applications

This guide uses a local OpenAI-compatible VLM endpoint with Nutrient Vision API.

[Download sample](https://www.nutrient.io/downloads/samples/python/describe-image-with-local-ai.zip)

## How Nutrient helps

Nutrient Python SDK handles local VLM integration, endpoint configuration, and request/response processing.

The SDK handles:

- OpenAI-compatible endpoint formatting and communication

- Image encoding and multimodal payload construction

- Model parameters such as temperature and max tokens

- Local server and model runtime failure handling

## Complete implementation

This example generates image descriptions using a local AI endpoint:

```python

from nutrient_sdk import Document, Vision

```

## Opening the image file and configuring the local server

Open the image in a [context manager](https://docs.python.org/3/reference/datamodel.html#context-managers) and configure local endpoint settings if needed.

In this sample:

- The SDK supports PNG, JPEG, GIF, BMP, and TIFF.

- The default endpoint is `http://localhost:1234/v1`.

- The default model is `qwen/qwen3-vl-4b`.

- `custom_vlm_api_settings` overrides defaults.

```python

with Document.open("input_photo.png") as document:
    # Optional: Configure the VLM API endpoint settings

    # These settings are customizable based on your VLM provider

    vlm_settings = document.settings.custom_vlm_api_settings
    vlm_settings.api_endpoint = "http://localhost:1234/v1"
    vlm_settings.model = "qwen/qwen3-vl-4b"

```

## Creating a vision instance

Create a vision instance with `Vision.set(document)`.

Before calling vision methods, ensure:

- The local server is running.

- A vision-capable model is loaded.

- The configured endpoint is reachable.

```python

    vision = Vision.set(document)

```

## Generating the description

Call `vision.describe()` to generate a natural language description.

The SDK handles image encoding, request construction, and response parsing:

```python

    description = vision.describe()

```

## Outputting the description

Print the description for review, or store it in your application.

Common destinations include:

- Database fields

- JSON output files

- HTML `alt` attributes

```python

    print("Image description:")
    print(description)

```

## Understanding the output

`describe()` returns natural language text generated by the local model.

Descriptions are typically:

- **Concise** — Focused on key subjects and details, often one to three sentences

- **Accessible** — Suitable for users who rely on screen readers

- **Accurate** — Based on visible content only

- **Model-dependent quality** — Quality varies by model architecture and size

Use this output for accessibility metadata, image search, and document workflows, all with local processing.

## Configuring the VLM API endpoint

Vision API uses OpenAI-compatible endpoints. Configure `custom_vlm_api_settings` if your setup differs from defaults:

- `api_endpoint` — Base URL for the OpenAI-compatible API (default: `http://localhost:1234/v1`)

- `api_key` — API key for secured deployments (optional on many local servers)

- `model` — Model identifier (default: `qwen/qwen3-vl-4b`)

- `temperature` — Creativity control (`0.0` deterministic, `1.0` varied phrasing)

- `max_tokens` — Response token limit (`-1` unlimited)

**Local server setup examples**:

- **LM Studio** — Start server with vision model loaded, default endpoint `http://localhost:1234/v1`

- **Ollama** — Run `ollama serve` with vision model, default endpoint `http://localhost:11434/v1`

- **vLLM** — Launch with `--api-key` flag for authentication, custom port configuration

The SDK handles request formatting and response parsing for local endpoints.

## Error handling

The SDK raises `NutrientException` when vision operations fail.

Common failure scenarios include:

- The input image can’t be read due to path, permission, or format issues

- The local VLM server isn’t running or reachable

- No compatible vision model is loaded

- Responses time out on large images or heavy models

- Available CPU/GPU memory is insufficient

- Endpoint or authentication settings are invalid

In production code:

- Catch `NutrientException`.

- Return a clear error message.

- Log failure details for debugging.

- Add retry logic for transient local server issues.

## Conclusion

Use this workflow to generate image descriptions with local AI:

1. Open the image file using a [context manager](https://docs.python.org/3/reference/datamodel.html#context-managers) for automatic resource cleanup.

2. The SDK supports multiple image formats, including PNG, JPEG, GIF, BMP, and TIFF.

3. Vision API uses OpenAI-compatible endpoints for local VLM servers by default.

4. Default configuration connects to `http://localhost:1234/v1` with model `qwen/qwen3-vl-4b`.

5. Supported local VLM servers include LM Studio, Ollama, vLLM, and custom inference servers.

6. Optionally configure custom server settings through `document.settings.custom_vlm_api_settings` property assignments.

7. Create a vision instance with `Vision.set()` bound to the document for local AI processing.

8. Generate the description with `vision.describe()`, which sends the image to the local server endpoint and returns natural language text.

9. The SDK encodes image data, constructs OpenAI-compatible multimodal requests, and parses responses automatically.

10. Generated descriptions are concise (1–3 sentences), accessible (WCAG-compliant alt text), accurate (observable details only), and model-dependent.

11. Description quality varies by model size — larger models (7B+) produce more nuanced descriptions than smaller variants (4B).

12. Print or save the description for use in accessibility systems, content management, or cataloging workflows.

13. Handle `NutrientException` for vision processing failures, including server unavailable, model not loaded, or timeout errors.

14. The context manager ensures proper resource cleanup when processing completes or exceptions occur.

For related image workflows, refer to the [Python SDK guides](https://www.nutrient.io/guides/python.md).

Download [this ready-to-use sample package](https://www.nutrient.io/downloads/samples/python/describe-image-with-local-ai.zip) to explore local AI image description.
---

## Related pages

- [Generating image descriptions using Claude](/guides/python/extraction/describe-image-with-claude.md)
- [Extracting data from images using ICR](/guides/python/extraction/extract-data-from-image-icr.md)
- [Applying OCR to a PDF page](/guides/python/extraction/apply-ocr-to-pdf-page.md)
- [Extracting text from multilingual images](/guides/python/extraction/read-text-from-image-multi-language.md)
- [Nutrient Python SDK extraction guides](/guides/python/extraction.md)
- [Extracting structured JSON data from PDF documents](/guides/python/extraction/json-data-extraction.md)
- [Extracting data from images using vision language models](/guides/python/extraction/extract-data-from-image-vlm.md)
- [Extracting text from images](/guides/python/extraction/read-text-from-image.md)
- [Extracting data from images using OCR](/guides/python/extraction/extract-data-from-image-ocr.md)
- [Speeding up first ICR operation by predownloading models](/guides/python/extraction/speed-up-first-icr-by-downloading-requirements.md)
- [Applying OCR to a PDF document](/guides/python/extraction/apply-ocr-to-pdf.md)
- [Generating image descriptions using OpenAI](/guides/python/extraction/describe-image-with-openai.md)

