---
title: "Generating image descriptions using local AI | Nutrient Java SDK"
canonical_url: "https://www.nutrient.io/guides/java/extraction/describe-image-with-local-ai/"
md_url: "https://www.nutrient.io/guides/java/extraction/describe-image-with-local-ai.md"
last_updated: "2026-06-10T14:59:56.779Z"
description: "Generate accessible image descriptions using local AI models with Nutrient Java SDK."
---

# Generating image descriptions using local AI

Use local AI image description when you need privacy, cost control, or offline operation.

Common use cases include:

- On-premises accessibility workflows

- Medical or regulated environments with local-only data handling

- Secure networks where external APIs are restricted

- High-volume processing without per-image API fees

- Offline-capable applications

This guide uses Nutrient Vision API with a vision language model (VLM) that runs locally behind an OpenAI-compatible endpoint.

[Download sample](https://www.nutrient.io/downloads/samples/java/describe-image-with-local-ai.zip)

## How Nutrient helps

Nutrient Java SDK takes care of the integration work with a local VLM, so you don't write HTTP or model-specific request code yourself.

The SDK handles:

- Local VLM server communication, endpoint configuration, and OpenAI-compatible API formatting

- Image encoding and multimodal request structures for local models

- Model parameters such as temperature, max tokens, and server-specific settings

- Error reporting for local server failures and model loading issues, which surface as `NutrientException`
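You don't need to build these requests yourself. For orientation only, the following sketch shows roughly what an OpenAI-compatible multimodal chat completions request looks like. The image bytes are Base64-encoded into a `data:` URI and sent alongside a text prompt. `VlmRequestSketch`, the prompt text, and the exact set of fields are assumptions made for illustration. This isn't the SDK's internal code, and this guide doesn't document the SDK's actual prompt or parameters.

```java
import java.util.Base64;

// Illustration only (hypothetical class): the shape of an OpenAI-compatible
// chat completions request that carries an image. The SDK builds its own request.
final class VlmRequestSketch {

    // Wraps a value in quotes, escaping quotes, backslashes, and control characters for JSON.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c)); // e.g. newlines and tabs
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Builds a request body with the image embedded as a Base64 data URI.
    static String buildBody(String model, String prompt, byte[] image, String mimeType, double temperature) {
        String dataUri = "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(image);
        return "{"
            + "\"model\":" + jsonString(model) + ","
            + "\"temperature\":" + temperature + ","
            + "\"messages\":[{\"role\":\"user\",\"content\":["
            + "{\"type\":\"text\",\"text\":" + jsonString(prompt) + "},"
            + "{\"type\":\"image_url\",\"image_url\":{\"url\":" + jsonString(dataUri) + "}}"
            + "]}]}";
    }

    public static void main(String[] args) {
        byte[] placeholderImage = {1, 2, 3}; // stand-in bytes, not a real PNG
        System.out.println(buildBody("qwen/qwen3-vl-4b", "Describe this image.", placeholderImage, "image/png", 0.0));
    }
}
```

The `image_url` content part with a `data:` URI is how OpenAI-compatible servers usually accept inline images. That is why a local server must be started with a vision-capable model: a text-only model can't interpret that part.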

## Complete implementation

This example generates an image description using a local AI endpoint. Start with the package declaration:

```java

package io.nutrient.Sample;

```

Import required classes and define the sample class:

```java

import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.exceptions.NutrientException;

import java.io.FileWriter;
import java.io.IOException;

public class DescribeImageWithLocalAi {

```

## Opening the image file

Create the main method and open the image in try-with-resources.

In this sample:

- Input can be PNG, JPEG, GIF, BMP, or TIFF.

- The default local endpoint is `http://localhost:1234/v1`.

- The default model is `qwen/qwen3-vl-4b`.

```java

    public static void main(String[] args) throws NutrientException, IOException {
        try (Document document = Document.open("input_photo.png")) {

```

## Creating a vision instance

Create a vision instance with `Vision.set(document)`.

Before calling vision methods, ensure:

- The local server is running.

- A vision-capable model is loaded.

- The configured endpoint is reachable.

```java

            Vision vision = Vision.set(document);

```

## Generating the description

Call `vision.describe()` to generate a natural language description.

The SDK handles image encoding, request construction, and response parsing:

```java

            String description = vision.describe();

```

## Saving the description

Write the description to a text file.

This sample uses try-with-resources for both document and file-writer cleanup:

```java

            try (FileWriter writer = new FileWriter("output.txt")) {
                writer.write(description);
            }
        }
    }
}

```
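`FileWriter(String)` encodes text with the platform default charset. On JDKs before Java 18, that charset isn't necessarily UTF-8. If descriptions may contain non-ASCII characters, one option is to write them with an explicit encoding through `Files.writeString`. This is a minimal sketch, and `DescriptionWriter` is a hypothetical name, not part of the sample.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical alternative to FileWriter that always writes UTF-8.
final class DescriptionWriter {

    // Creates or truncates the target file and writes the description as UTF-8.
    static Path writeUtf8(String description, Path target) throws IOException {
        return Files.writeString(target, description, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The escape below is an e with an acute accent, a two-byte character in UTF-8.
        Path out = writeUtf8("Caf\u00e9 terrace at night.", Path.of("output.txt"));
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```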

## Understanding the output

`describe()` returns natural language text generated by the local model.

Descriptions are typically:

- **Concise** — Focused on key subjects and details, often one to three sentences

- **Accessible** — Suitable for users who rely on screen readers

- **Grounded in the image** — Based on visible content, though models can still misidentify details, so review descriptions used as alt text

- **Model-dependent quality** — Quality varies by model architecture and size

Use this output for accessibility metadata, image search, and other document workflows while keeping image data on your own infrastructure.
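If you store descriptions as alt text, you may want to normalize them first. The following is a minimal sketch of a hypothetical post-processing helper. `AltText.fromDescription` isn't part of the SDK. It collapses whitespace and keeps at most a given number of sentences, and it uses the JDK's `java.text.BreakIterator` to find sentence boundaries. The sentence limit is an assumption you'd tune to your own accessibility guidelines.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical post-processing step, not part of the SDK: tidies a model's
// description before it is stored as alt text.
final class AltText {

    // Collapses whitespace and keeps at most maxSentences sentences.
    static String fromDescription(String description, int maxSentences) {
        if (maxSentences < 1) {
            throw new IllegalArgumentException("maxSentences must be at least 1");
        }
        String text = description.strip().replaceAll("\\s+", " ");
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);
        int end = text.length();
        int count = 0;
        // Walk sentence boundaries until the limit is reached.
        for (int boundary = sentences.next(); boundary != BreakIterator.DONE; boundary = sentences.next()) {
            count++;
            if (count == maxSentences) {
                end = boundary;
                break;
            }
        }
        return text.substring(0, end).strip();
    }

    public static void main(String[] args) {
        String raw = "  A brown dog runs on a beach.\n It carries a stick.  Waves break behind it. ";
        System.out.println(fromDescription(raw, 2));
    }
}
```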

## Configuring the VLM API endpoint

Vision API connects to local servers through OpenAI-compatible endpoints. If your setup differs from the defaults, configure these `CustomVlmApiSettings` properties:

- `ApiEndpoint` — Base URL for the OpenAI-compatible API (default: `http://localhost:1234/v1`)

- `ApiKey` — API key for secured deployments (optional on many local servers)

- `Model` — Model identifier, which must match the name your server uses for the loaded model (default: `qwen/qwen3-vl-4b`)

- `Temperature` — Sampling temperature (`0.0` for the most consistent output, higher values such as `1.0` for more varied phrasing)

- `MaxTokens` — Maximum number of tokens in the response (`-1` for no limit)

**Local server setup examples**:

- **LM Studio** — Load a vision model and start the local server; the default endpoint is `http://localhost:1234/v1`

- **Ollama** — Pull a vision model and run `ollama serve`; the default endpoint is `http://localhost:11434/v1`

- **vLLM** — Start the OpenAI-compatible server with a vision model; add `--api-key` to require authentication and `--port` to change the port (default endpoint: `http://localhost:8000/v1`)
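To confirm that the configured endpoint is reachable before processing images, you could query the server's model list. LM Studio, Ollama, and vLLM expose `GET /v1/models` on their OpenAI-compatible APIs, but treat that as an assumption for any other server. The sketch uses only `java.net.http.HttpClient`, and `LocalVlmCheck` is a hypothetical class. It doesn't send an API key, so a secured server would answer with an error status, and the check would report it as unreachable. A successful response shows that the server is up. It doesn't prove that a vision-capable model is loaded.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical pre-flight check, not part of the SDK: asks an OpenAI-compatible
// server for its model list before any image is sent.
final class LocalVlmCheck {

    // Joins the base endpoint and the /models path without doubling the slash.
    static URI modelsUri(String apiEndpoint) {
        String base = apiEndpoint.endsWith("/")
            ? apiEndpoint.substring(0, apiEndpoint.length() - 1)
            : apiEndpoint;
        return URI.create(base + "/models");
    }

    // Returns true if GET {endpoint}/models answers with HTTP 200 within the timeout.
    static boolean isReachable(String apiEndpoint, Duration timeout) {
        HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1) // local servers usually speak plain HTTP/1.1
            .connectTimeout(timeout)
            .build();
        HttpRequest request = HttpRequest.newBuilder(modelsUri(apiEndpoint))
            .timeout(timeout)
            .GET()
            .build();
        try {
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false; // connection refused, timed out, or similar
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        String endpoint = "http://localhost:1234/v1"; // LM Studio default used in this guide
        System.out.println(endpoint + " reachable: " + isReachable(endpoint, Duration.ofSeconds(2)));
    }
}
```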

The SDK handles request formatting and response parsing for local endpoints.

## Error handling

The sample can throw:

- `NutrientException` for vision and local server issues

- `IOException` for file I/O operations

Common failure scenarios include:

- The input image can’t be read due to path, permission, or format issues

- The local VLM server isn’t running or reachable

- No compatible vision model is loaded

- Requests time out on large images or large models

- Available CPU/GPU memory is insufficient

- Endpoint or authentication settings are invalid

- File writing fails due to path, disk, or permission issues

In production code:

- Catch `NutrientException` and `IOException`.

- Return clear error messages.

- Log failure details for debugging.

- Add retry logic for transient local server issues.
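The following is a minimal sketch of retry logic with exponential backoff for transient local server issues. `Retry.withBackoff` is a hypothetical helper, not an SDK API. The caller decides which exceptions are transient through a predicate, so errors such as an unreadable input file fail fast instead of being retried.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper, not part of the Nutrient SDK: retries an operation with
// exponential backoff when the caller classifies the failure as transient.
final class Retry {

    private Retry() {
    }

    static <T> T withBackoff(Callable<T> operation, Predicate<Exception> isTransient,
                             int maxAttempts, long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                // Give up on the last attempt or on failures that retrying won't fix.
                if (attempt >= maxAttempts || !isTransient.test(e)) {
                    throw e;
                }
                System.err.printf("Attempt %d failed (%s); retrying in %d ms%n", attempt, e.getMessage(), delay);
                Thread.sleep(delay);
                delay *= 2; // double the wait after each failed attempt
            }
        }
    }
}
```

In the sample, you could wrap the call as `Retry.withBackoff(vision::describe, e -> e instanceof NutrientException, 3, 500)`. The helper declares `throws Exception`, so the enclosing method must declare or handle `Exception`. Treating every `NutrientException` as transient is an assumption to check against the failures you actually see.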

## Conclusion

To recap how to generate image descriptions with local AI:

1. Open the image file using try-with-resources for automatic resource cleanup.

2. The SDK supports multiple image formats, including PNG, JPEG, GIF, BMP, and TIFF.

3. Vision API uses OpenAI-compatible endpoints for local VLM servers by default.

4. Default configuration connects to `http://localhost:1234/v1` with model `qwen/qwen3-vl-4b`.

5. Supported local VLM servers include LM Studio, Ollama, vLLM, and other OpenAI-compatible inference servers.

6. Create a vision instance with `Vision.set()` bound to the document for local AI processing.

7. Generate the description with `vision.describe()`, which sends the image to the local server endpoint and returns natural language text.

8. The SDK encodes image data, constructs OpenAI-compatible multimodal requests, and parses responses automatically.

9. Generated descriptions are typically concise (one to three sentences), suitable as a starting point for accessible alt text, grounded in visible details, and model-dependent.

10. Description quality varies by model size; larger models (7B parameters and up) generally produce more nuanced descriptions than smaller variants such as 4B models.

11. Write the description to a file using try-with-resources with `FileWriter` for automatic resource cleanup.

12. Handle `NutrientException` for vision processing failures, including server unavailable, model not loaded, or timeout errors.

13. Handle `IOException` for file operations, including read failures or write errors when saving output.

14. Configure custom endpoints through `CustomVlmApiSettings` for non-default server configurations.

For related image workflows, refer to the [Java SDK guides](https://www.nutrient.io/guides/java.md).

Download [this ready-to-use sample package](https://www.nutrient.io/downloads/samples/java/describe-image-with-local-ai.zip) to explore local AI image description.

---

## Related pages

- [Applying OCR to a PDF page](/guides/java/extraction/apply-ocr-to-pdf-page.md)
- [Applying OCR to a PDF document](/guides/java/extraction/apply-ocr-to-pdf.md)
- [Generating image descriptions using Claude](/guides/java/extraction/describe-image-with-claude.md)
- [Generating image descriptions using OpenAI](/guides/java/extraction/describe-image-with-openai.md)
- [Extracting data from images using ICR](/guides/java/extraction/extract-data-from-image-icr.md)
- [Extracting data from images using OCR](/guides/java/extraction/extract-data-from-image-ocr.md)
- [Extracting data from images using vision language models](/guides/java/extraction/extract-data-from-image-vlm.md)
- [Nutrient Java SDK extraction guides](/guides/java/extraction.md)
- [Extracting structured data from documents](/guides/java/extraction/extract-structured-data.md)
- [Extracting form fields from images](/guides/java/extraction/extract-form-fields-from-image.md)
- [Labeling form fields with a vision language model](/guides/java/extraction/label-form-fields-with-vlm.md)
- [Extracting JSON data from a PDF document](/guides/java/extraction/json-data-extraction.md)
- [Extracting text from multilingual images](/guides/java/extraction/read-text-from-image-multi-language.md)
- [Extracting text from images](/guides/java/extraction/read-text-from-image.md)
- [Extracting text from PDF documents](/guides/java/extraction/pdf-to-text.md)
- [Speeding up first ICR operation by predownloading models](/guides/java/extraction/speed-up-first-icr-by-downloading-requirements.md)

