Generating image descriptions using local AI
Use local AI image description when you need privacy, cost control, or offline operation.
Common use cases include:
- On-premises accessibility workflows
- Medical or regulated environments with local-only data handling
- Secure networks where external APIs are restricted
- High-volume processing without per-image API fees
- Offline-capable applications
This guide uses a local OpenAI-compatible VLM endpoint with Nutrient Vision API.
How Nutrient helps
Nutrient Java SDK manages local VLM integration, endpoint configuration, and request/response processing.
The SDK handles:
- Local VLM server communication, endpoint configuration, and OpenAI-compatible API formatting
- Image encoding and multimodal request structures for local models
- Model parameters such as temperature, max tokens, and server-specific settings
- Local server failures and model loading issues
Complete implementation
This example generates image descriptions using a local AI endpoint:
```java
package io.nutrient.Sample;
```

Import required classes and define the sample class:
```java
import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.exceptions.NutrientException;

import java.io.FileWriter;
import java.io.IOException;

public class DescribeImageWithLocalAi {
```

Opening the image file
Create the main method and open the image in try-with-resources.
In this sample:
- Input can be PNG, JPEG, GIF, BMP, or TIFF.
- The default local endpoint is http://localhost:1234/v1.
- The default model is qwen/qwen3-vl-4b.
```java
    public static void main(String[] args) throws NutrientException, IOException {
        try (Document document = Document.open("input_photo.png")) {
```

Creating a vision instance
Create a vision instance with Vision.set(document).
Before calling vision methods, ensure:
- The local server is running.
- A vision-capable model is loaded.
- The configured endpoint is reachable.
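To verify the last point programmatically, you can probe the endpoint with a plain HTTP request before opening the document. The sketch below is not part of the Nutrient SDK; it assumes the local server exposes the standard OpenAI-compatible GET /v1/models route, which LM Studio and Ollama both provide:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class VlmHealthCheck {
    // Returns true when the OpenAI-compatible server answers its /models route.
    // The /models route is an assumption about the local server, not a Nutrient API.
    static boolean isEndpointReachable(String baseUrl) {
        try {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(baseUrl + "/models"))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (Exception e) {
            return false; // Server down, port closed, or host unreachable.
        }
    }

    public static void main(String[] args) {
        System.out.println(isEndpointReachable("http://localhost:1234/v1"));
    }
}
```

With the server confirmed, bind the vision instance to the document: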
```java
            Vision vision = Vision.set(document);
```

Generating the description
Call vision.describe() to generate a natural language description.
The SDK handles image encoding, request construction, and response parsing:
```java
            String description = vision.describe();
```

Saving the description
Write the description to a text file.
This sample uses try-with-resources for both document and file-writer cleanup:
```java
            try (FileWriter writer = new FileWriter("output.txt")) {
                writer.write(description);
            }
        }
    }
}
```

Understanding the output
describe() returns natural language text generated by the local model.
Descriptions are typically:
- Concise — Focused on key subjects and details, often one to three sentences
- Accessible — Suitable for users who rely on screen readers
- Accurate — Based on visible content only
- Model-dependent — Quality varies by model architecture and size
Use this output for accessibility metadata, image search, and document workflows, all with local processing.
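For instance, a generated description can be dropped straight into alt text. A tiny sketch, using only standard Java string handling; the helper name is hypothetical:

```java
// Hypothetical helper: wraps a generated description as HTML alt text,
// escaping the characters that would break the attribute.
static String toImgTag(String src, String description) {
    String escaped = description
            .replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace("\"", "&quot;");
    return "<img src=\"" + src + "\" alt=\"" + escaped + "\">";
}
```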
Configuring the VLM API endpoint
Vision API uses OpenAI-compatible endpoints. Configure CustomVlmApiSettings if your setup differs from defaults:
- ApiEndpoint — Base URL for the OpenAI-compatible API (default: http://localhost:1234/v1)
- ApiKey — API key for secured deployments (optional on many local servers)
- Model — Model identifier (default: qwen/qwen3-vl-4b)
- Temperature — Creativity control (0.0 is deterministic; 1.0 gives varied phrasing)
- MaxTokens — Response token limit (-1 for unlimited)
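As a rough illustration, wiring these settings together might look like the following sketch. This guide only names the settings above; the constructor and setter signatures here are assumptions, so check the SDK's API reference for the exact surface:

```java
// Hypothetical wiring of the settings listed above; setter names are assumed.
CustomVlmApiSettings settings = new CustomVlmApiSettings();
settings.setApiEndpoint("http://localhost:11434/v1"); // e.g. a local Ollama server
settings.setApiKey("sk-local-example");               // optional on many local servers
settings.setModel("qwen/qwen3-vl-4b");
settings.setTemperature(0.0);                         // deterministic descriptions
settings.setMaxTokens(-1);                            // no response length cap
// How the settings attach to a Vision instance is SDK-specific;
// consult the Nutrient API reference for the binding call.
```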
Local server setup examples:
- LM Studio — Start the server with a vision model loaded; default endpoint http://localhost:1234/v1
- Ollama — Run ollama serve with a vision model; default endpoint http://localhost:11434/v1
- vLLM — Launch with the --api-key flag for authentication and custom port configuration
The SDK handles request formatting and response parsing for local endpoints.
Error handling
The sample can throw:
- NutrientException for vision and local server issues
- IOException for file I/O operations
Common failure scenarios include:
- The input image can’t be read due to path, permission, or format issues
- The local VLM server isn’t running or reachable
- No compatible vision model is loaded
- Responses time out on large images or heavy models
- Available CPU/GPU memory is insufficient
- Endpoint or authentication settings are invalid
- File writing fails due to path, disk, or permission issues
In production code:
- Catch NutrientException and IOException.
- Return clear error messages.
- Log failure details for debugging.
- Add retry logic for transient local server issues.
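A minimal sketch combining these points follows. It reuses only the calls from the sample above; the three-attempt budget and linear backoff are illustrative choices, not SDK requirements:

```java
import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.exceptions.NutrientException;

import java.io.FileWriter;
import java.io.IOException;

public class DescribeImageWithRetry {
    public static void main(String[] args)
            throws NutrientException, IOException, InterruptedException {
        int maxAttempts = 3; // Illustrative retry budget for transient server issues.
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (Document document = Document.open("input_photo.png")) {
                Vision vision = Vision.set(document);
                String description = vision.describe();
                try (FileWriter writer = new FileWriter("output.txt")) {
                    writer.write(description);
                }
                return; // Success; no retry needed.
            } catch (NutrientException e) {
                // Log the failure detail, then retry: the local server may have
                // been busy or still loading the model.
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                if (attempt == maxAttempts) {
                    throw e; // Out of attempts; surface a clear error to the caller.
                }
                Thread.sleep(2000L * attempt); // Simple linear backoff.
            }
            // IOException is deliberately not caught: file-path, disk, and
            // permission errors are rarely transient, so they fail fast.
        }
    }
}
```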
Conclusion
Use this workflow to generate image descriptions with local AI:
- Open the image file using try-with-resources for automatic resource cleanup.
- The SDK supports multiple image formats, including PNG, JPEG, GIF, BMP, and TIFF.
- Vision API uses OpenAI-compatible endpoints for local VLM servers by default.
- Default configuration connects to http://localhost:1234/v1 with model qwen/qwen3-vl-4b.
- Supported local VLM servers include LM Studio, Ollama, vLLM, and custom inference servers.
- Create a vision instance with Vision.set() bound to the document for local AI processing.
- Generate the description with vision.describe(), which sends the image to the local server endpoint and returns natural language text.
- The SDK encodes image data, constructs OpenAI-compatible multimodal requests, and parses responses automatically.
- Generated descriptions are concise (1–3 sentences), accessible (WCAG-compliant alt text), accurate (observable details only), and model-dependent.
- Description quality varies by model size — larger models (7B+) produce more nuanced descriptions than smaller variants (4B).
- Write the description to a file using try-with-resources with FileWriter for automatic resource cleanup.
- Handle NutrientException for vision processing failures, including server unavailable, model not loaded, or timeout errors.
- Handle IOException for file operations, including read failures or write errors when saving output.
- Configure custom endpoints through CustomVlmApiSettings for non-default server configurations.
For related image workflows, refer to the Java SDK guides.
Download this ready-to-use sample package to explore local AI image description.