Generating image descriptions using OpenAI
Use OpenAI-powered image description to generate alt text and visual summaries in cloud workflows.
Common use cases include:
- Accessibility pipelines for screen readers
- Content management and image cataloging
- Document workflows across regions
- Enterprise integrations with managed API infrastructure
- Fast prototyping without local model hosting
This guide uses OpenAI as the VLM provider through the Nutrient Vision API.
How Nutrient helps
Nutrient Java SDK handles provider configuration, request handling, and response parsing.
The SDK handles:
- OpenAI API authentication, endpoint configuration, and request formatting
- Image encoding and multimodal API request structures
- Model parameters such as temperature, max tokens, and provider-specific settings
- Vision service failures and API rate limits
Complete implementation
This example generates an image description using OpenAI:
```java
package io.nutrient.Sample;
```

Import the required classes from the SDK:

```java
import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.enums.VlmProvider;
import io.nutrient.sdk.exceptions.NutrientException;
import io.nutrient.sdk.settings.OpenAIApiEndpointSettings;
import io.nutrient.sdk.settings.VisionSettings;

import java.io.FileWriter;
import java.io.IOException;
```

```java
public class DescribeImageWithOpenai {
```

Create the main method and declare thrown exceptions:

```java
    public static void main(String[] args) throws NutrientException, IOException {
```

Configuring the OpenAI provider
Open the image in try-with-resources and configure OpenAI as the provider.
In this sample:
- `setProvider(VlmProvider.OpenAI)` selects OpenAI.
- `setApiKey("OPENAI_API_KEY")` sets your API key.
- Input can be PNG, JPEG, GIF, BMP, or TIFF.
```java
        try (Document document = Document.open("input_photo.png")) {
            // Configure OpenAI as the VLM provider
            VisionSettings visionSettings = document.getSettings().getVisionSettings();
            visionSettings.setProvider(VlmProvider.OpenAI);

            // Set the OpenAI API key
            OpenAIApiEndpointSettings openaiSettings = document.getSettings().getOpenAIApiEndpointSettings();
            openaiSettings.setApiKey("OPENAI_API_KEY");
```

Creating a vision instance and generating the description
Create a vision instance and call describe() to generate text.
In this sample:
- `Vision.set(document)` binds processing to the opened image.
- `vision.describe()` returns a description string.
- The SDK handles encoding, request construction, and response parsing.
```java
            Vision vision = Vision.set(document);
            String description = vision.describe();
```

Saving the description
Write the description to a text file.
This sample uses try-with-resources for both document and file-writer cleanup:
```java
            try (FileWriter writer = new FileWriter("output.txt")) {
                writer.write(description);
            }
        }
    }
}
```

Understanding the output
describe() returns natural language text for accessibility and content understanding.
Descriptions are typically:
- Concise — Focused on key subjects and details, often one to three sentences
- Accessible — Suitable for users who rely on screen readers
- Accurate — Based on visible content only
- Contextual — Including relevant relationships and scene context
Use this output for accessibility metadata, image search, and document workflows.
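For accessibility metadata, the returned description typically ends up as the `alt` attribute of an image element. A minimal sketch of that step (the `escapeHtml` and `toImgTag` helpers and the sample description below are illustrative, not part of the SDK):

```java
public class AltTextDemo {
    // Minimal HTML escaping for attribute values (illustrative helper).
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Wrap a generated description in an <img> tag as alt text.
    static String toImgTag(String src, String description) {
        return "<img src=\"" + escapeHtml(src) + "\" alt=\"" + escapeHtml(description) + "\">";
    }

    public static void main(String[] args) {
        // Stand-in for a string returned by vision.describe()
        String description = "A golden retriever playing fetch in a sunny park.";
        System.out.println(toImgTag("input_photo.png", description));
    }
}
```

Escaping matters here because a description may contain quotes or angle brackets that would otherwise break the surrounding markup.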
OpenAI API settings
The OpenAI provider uses these OpenAIApiEndpointSettings properties:
- `ApiEndpoint` — The OpenAI API endpoint (default: `https://api.openai.com/v1`).
- `ApiKey` — Your OpenAI API key for authentication.
- Model — The model identifier to use.
- Temperature — Controls response creativity (0.0 = deterministic, 1.0 = creative).
- MaxTokens — Maximum tokens in the response (default: 16384).
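The defaults can be overridden before generating a description. A hedged sketch, assuming the settings object exposes setters matching the property names above (`setModel`, `setTemperature`, `setMaxTokens`) and that the model identifier shown is available on your account — verify both against your SDK version and OpenAI plan:

```java
// Assumed setter names based on the property list above; check your SDK version.
OpenAIApiEndpointSettings openaiSettings = document.getSettings().getOpenAIApiEndpointSettings();
openaiSettings.setApiKey("OPENAI_API_KEY");
openaiSettings.setModel("gpt-4o");     // model identifier (assumption)
openaiSettings.setTemperature(0.2);    // lower values give more deterministic descriptions
openaiSettings.setMaxTokens(1024);     // cap response length below the 16384 default
```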
Error handling
The sample can throw:
- `NutrientException` for vision and API issues
- `IOException` for file I/O operations
Common failure scenarios include:
- The input image can’t be read due to path, permission, or format issues
- The OpenAI API key is missing or invalid
- The OpenAI API is unavailable
- Rate limits are exceeded
- Network requests fail before reaching the API
- Image data is too large or corrupted
- File writing fails due to path, disk, or permission issues
In production code:
- Catch `NutrientException` and `IOException`.
- Return clear error messages.
- Log failure details for debugging.
- Add retry logic for transient API failures.
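Retry logic for transient failures such as rate limits can be a bounded loop with exponential backoff. A self-contained sketch (the `Callable` task stands in for a call like `vision.describe()`; all names here are illustrative, not SDK API):

```java
import java.util.concurrent.Callable;

public class RetryDemo {
    // Retry a task up to maxAttempts times, doubling the delay after each failure.
    static <T> T withRetry(Callable<T> task, int maxAttempts, long initialDelayMs) throws Exception {
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;                 // remember the most recent failure
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);  // back off before retrying
                    delay *= 2;
                }
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated transient failure: fails twice, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("rate limited");
            return "described after " + calls[0] + " attempts";
        }, 5, 10);
        System.out.println(result); // prints "described after 3 attempts"
    }
}
```

In production you would typically retry only on errors you know are transient (rate limits, timeouts) and fail fast on authentication errors.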
Conclusion
Use this workflow to generate image descriptions with OpenAI:
- Open the image file using try-with-resources for automatic resource cleanup.
- The SDK supports multiple image formats, including PNG, JPEG, GIF, BMP, and TIFF.
- Retrieve the vision settings with `document.getSettings().getVisionSettings()` to configure the VLM provider.
- Set the provider to OpenAI with `setProvider(VlmProvider.OpenAI)` instead of alternatives like Claude or local models.
- Retrieve OpenAI-specific settings with `document.getSettings().getOpenAIApiEndpointSettings()` for API configuration.
- Set the OpenAI API key with `setApiKey()` using credentials obtained from the OpenAI platform.
- OpenAI API settings control endpoint URLs, model selection, temperature, and max tokens.
- Create a vision instance with `Vision.set()` bound to the document with configured provider settings.
- Generate the description with `vision.describe()`, which sends the image to OpenAI's vision endpoint and returns natural language text.
- The SDK encodes image data, constructs multimodal API requests, and parses responses automatically.
- Generated descriptions are concise (1–3 sentences), accessible (WCAG-compliant alt text), accurate (observable details only), and contextual.
- Write the description to a file using try-with-resources with `FileWriter` for automatic resource cleanup.
- Handle `NutrientException` for vision processing failures, including authentication errors, API failures, and rate limits.
- Handle `IOException` for file operations, including read failures or write errors when saving output.
For related image workflows, refer to the Java SDK guides.
Download this ready-to-use sample package to explore OpenAI-based image description.