Generating accessible image descriptions programmatically enables teams to build inclusive applications, automate content cataloging, and enhance document accessibility workflows. AI-powered image description provides natural language understanding of visual content for use cases such as:

  • Accessibility compliance systems that generate alt text for web images meeting WCAG standards
  • Digital asset management platforms that automatically catalog photo libraries with searchable descriptions
  • Document accessibility workflows that add descriptive text to scanned images in PDF reports
  • E-learning content management systems that generate descriptions for educational diagrams and charts
  • Archival systems that catalog historical photograph collections with detailed metadata

Image description operations include analyzing image content with vision language models (VLMs); generating concise descriptions focused on main subjects and key details; producing accessibility-compliant descriptions for screen readers; extracting semantic meaning from charts, diagrams, and photographs; and providing contextual understanding beyond simple object detection.

This guide demonstrates using Claude (Anthropic’s AI model) as the VLM provider for generating image descriptions through the Nutrient vision API. Claude provides state-of-the-art vision understanding with strong reasoning capabilities, nuanced contextual descriptions, and cloud-based processing, eliminating local GPU infrastructure requirements.

How Nutrient helps you achieve this

Nutrient Java SDK handles vision API integration, VLM provider configuration, and image processing pipelines. With the SDK, you don’t need to worry about:

  • Managing VLM API authentication, endpoint configuration, and request formatting
  • Encoding image data and handling multimodal API request structures
  • Configuring model parameters like temperature, max tokens, and provider-specific settings
  • Complex error handling for vision service failures and API rate limits

Instead, Nutrient provides an API that handles all the complexity behind the scenes, enabling you to focus on your business logic.

Complete implementation

Below is a complete working example that demonstrates generating accessible image descriptions using Claude as the VLM provider. The following lines set up the Java application. Start by specifying a package name and importing the required classes:

package io.nutrient.sample;

import io.nutrient.sdk.Document;
import io.nutrient.sdk.Vision;
import io.nutrient.sdk.enums.VlmProvider;
import io.nutrient.sdk.exceptions.NutrientException;
import io.nutrient.sdk.settings.ClaudeApiSettings;
import io.nutrient.sdk.settings.VisionSettings;
import java.io.FileWriter;
import java.io.IOException;

Create the main class and method. The following code creates the main entry point that will contain the image description logic. The throws NutrientException, IOException declaration propagates exceptions that may occur during document processing, vision operations, or file I/O:

public class DescribeImageWithClaude {

public static void main(String[] args) throws NutrientException, IOException {

Configuring the Claude provider

Open the image file using a try-with-resources statement and configure the vision API to use Claude as the VLM provider. The following code opens an image file (PNG format in this example) using Document.open() with a file path parameter; the SDK supports multiple image formats, including PNG, JPEG, GIF, BMP, and TIFF. The try-with-resources pattern ensures the document is properly closed after processing, releasing memory and image data regardless of whether description generation succeeds or fails. The document.getSettings().getVisionSettings() call retrieves the vision configuration object where you specify which VLM provider to use, and setProvider(VlmProvider.Claude) configures Claude (Anthropic's AI model) as the vision provider instead of alternatives like OpenAI or local AI models. The document.getSettings().getClaudeApiSettings() call retrieves Claude-specific configuration, including API endpoints, model selection, and authentication. The setApiKey() method sets your Anthropic API key for authentication; replace "CLAUDE_API_KEY" with your actual API key obtained from the Anthropic Console:

try (Document document = Document.open("input_photo.png")) {
// Configure Claude as the VLM provider
VisionSettings visionSettings = document.getSettings().getVisionSettings();
visionSettings.setProvider(VlmProvider.Claude);
// Set the Claude API key
ClaudeApiSettings claudeSettings = document.getSettings().getClaudeApiSettings();
claudeSettings.setApiKey("CLAUDE_API_KEY");

Creating a vision instance and generating the description

Create a vision API instance bound to the document and generate a natural language description of the image content. The following code uses the Vision.set() static factory method with the document parameter to create a vision processor configured with the Claude provider settings defined earlier. The vision.describe() method sends the image to the Claude API for analysis and returns a natural language description as a string. During processing, the SDK encodes the image data, constructs a multimodal API request with the image payload, sends the request to Claude's vision endpoint, and parses the response to extract the description text. The description focuses on the main subject and key visual details observable in the image, optimized for accessibility purposes and screen reader compatibility:

// Create a vision instance and generate the description
Vision vision = Vision.set(document);
String description = vision.describe();

Saving the description

Write the generated description to a text file for storage or further processing. The following code uses a try-with-resources statement with FileWriter to create and write the description string to "output.txt". This ensures the file writer is automatically closed after writing completes, properly flushing buffers and releasing file handles, while the outer try-with-resources block closes the document after file writing completes, ensuring proper cleanup of both the document and file resources:

// Save the description to a text file
try (FileWriter writer = new FileWriter("output.txt")) {
writer.write(description);
}
}
}
}
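The SDK performs the image encoding and request construction internally. As a rough illustration of what a multimodal request involves, the sketch below Base64-encodes image bytes and embeds them in a JSON payload. This is a hypothetical standalone example, not the SDK's actual wire format; the field names follow Anthropic's public Messages API, and the prompt text is purely illustrative:

```java
import java.util.Base64;

public class MultimodalRequestSketch {

    // Base64-encode raw image bytes, as required for inline image content.
    static String encodeImage(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    // Build a minimal Anthropic-style Messages API body with one image block
    // and one text instruction.
    static String buildRequestBody(String model, String mediaType, String base64Data) {
        return "{"
                + "\"model\":\"" + model + "\","
                + "\"max_tokens\":1024,"
                + "\"messages\":[{\"role\":\"user\",\"content\":["
                + "{\"type\":\"image\",\"source\":{\"type\":\"base64\","
                + "\"media_type\":\"" + mediaType + "\",\"data\":\"" + base64Data + "\"}},"
                + "{\"type\":\"text\",\"text\":\"Describe this image concisely for alt text.\"}"
                + "]}]}";
    }

    public static void main(String[] args) {
        byte[] fakeImage = {(byte) 0x89, 'P', 'N', 'G'}; // stand-in for real PNG bytes
        String body = buildRequestBody("claude-sonnet-4-5", "image/png", encodeImage(fakeImage));
        System.out.println(body);
    }
}
```

Because the SDK builds this request for you, none of the above is needed in application code; it is shown only to make the "encodes the image data, constructs a multimodal API request" step concrete.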

Understanding the output

The describe() method returns a natural language description of the image content optimized for accessibility and content understanding. Claude analyzes the visual content and generates descriptions with specific characteristics:

  • Concise — Focused on the main subject and key details without unnecessary verbosity, typically 1–3 sentences.
  • Accessible — Written for users who cannot see the image, following accessibility best practices for alt text.
  • Accurate — Describes only what is clearly visible in the image, avoiding speculation or interpretation beyond observable details.
  • Contextual — Understands relationships between objects, spatial arrangements, and relevant context within the scene.

The generated descriptions are suitable for accessibility compliance (WCAG alt text requirements), content management metadata, searchable image cataloging, and document accessibility workflows.
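When a generated description is inserted into a web page's alt attribute, it must first be HTML-escaped. The helper below is a minimal sketch of that step; the class and method names are illustrative and not part of the SDK:

```java
public class AltTextHelper {

    // Escape characters that are unsafe inside an HTML attribute value.
    static String escapeHtmlAttribute(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    // Wrap a generated description in an img tag suitable for WCAG alt text.
    static String toImgTag(String src, String description) {
        return "<img src=\"" + escapeHtmlAttribute(src)
                + "\" alt=\"" + escapeHtmlAttribute(description) + "\">";
    }

    public static void main(String[] args) {
        String description = "A red \"stop\" sign at a crossroads";
        System.out.println(toImgTag("input_photo.png", description));
    }
}
```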

Claude API settings

The Claude provider uses the following settings from ClaudeApiSettings:

  • ApiEndpoint — The Claude API endpoint (default: https://api.anthropic.com/v1/).
  • ApiKey — Your Anthropic API key for authentication.
  • Model — The model identifier to use (default: claude-sonnet-4-5).
  • Temperature — Controls response creativity (0.0 = deterministic, 1.0 = creative).
  • MaxTokens — Maximum tokens in the response (default: 16384).
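If your application exposes these settings to end users, it's worth validating values before applying them. The helper below is purely illustrative (it is not part of the SDK), clamping temperature into its valid range and falling back to the documented max-tokens default:

```java
public class ClaudeSettingsValidator {

    // Clamp temperature into the valid [0.0, 1.0] range.
    static double clampTemperature(double temperature) {
        return Math.max(0.0, Math.min(1.0, temperature));
    }

    // Fall back to the documented default (16384) when max tokens is non-positive.
    static int normalizeMaxTokens(int maxTokens) {
        return maxTokens > 0 ? maxTokens : 16384;
    }

    public static void main(String[] args) {
        System.out.println(clampTemperature(1.7));   // out-of-range input is clamped to 1.0
        System.out.println(normalizeMaxTokens(-5));  // non-positive input falls back to 16384
    }
}
```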

Error handling

The Java SDK throws NutrientException if vision operations fail due to processing errors, API failures, or configuration issues. The main method also declares IOException for file I/O operations. Exception handling ensures robust error recovery in production environments.

Common failure scenarios include:

  • The input image file can’t be read due to file system permissions, path errors, or unsupported image formats
  • Invalid or missing Claude API key causing authentication failures with the Anthropic API
  • Claude API service is unavailable or experiencing outages preventing description generation
  • API rate limits exceeded when processing high volumes of images in rapid succession
  • Network connectivity issues preventing API requests from reaching Claude’s endpoints
  • Image data too large or corrupted, preventing proper encoding and transmission
  • File writing failures due to disk space, permissions, or path errors when saving output descriptions

In production code, wrap the vision operations in a try-catch block that catches NutrientException and IOException, providing appropriate error messages to users and logging failure details, including exception messages and stack traces, for debugging. This pattern enables graceful degradation when vision processing fails: the application can avoid crashing, retry transient API failures with exponential backoff, or notify the user when manual intervention is required.
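The retry-with-exponential-backoff logic can be sketched with a generic helper. This is a minimal illustration, not an SDK API; in real code you would pass the vision call as the operation and retry only errors you know to be transient (rate limits, timeouts):

```java
import java.util.concurrent.Callable;

public class RetryWithBackoff {

    // Delay before the given retry attempt (0-based): base * 2^attempt.
    static long backoffDelayMs(int attempt, long baseDelayMs) {
        return baseDelayMs * (1L << attempt);
    }

    // Run the operation, retrying failures with exponential backoff.
    static <T> T callWithRetry(Callable<T> operation, int maxAttempts, long baseDelayMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e; // production code should retry only transient errors
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(backoffDelayMs(attempt, baseDelayMs));
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice, then succeeds: simulates transient API errors.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient failure");
            return "description text";
        }, 5, 1);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```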

Conclusion

The image description workflow with Claude consists of several key operations:

  1. Open the image file using try-with-resources for automatic resource cleanup; the SDK supports multiple image formats, including PNG, JPEG, GIF, BMP, and TIFF.
  2. Retrieve the vision settings with document.getSettings().getVisionSettings() to configure the VLM provider.
  3. Set the provider to Claude with setProvider(VlmProvider.Claude) instead of alternatives like OpenAI or local models.
  4. Retrieve Claude-specific settings with document.getSettings().getClaudeApiSettings(); these control the endpoint URL, model selection (default: claude-sonnet-4-5), temperature, and max tokens.
  5. Set the Anthropic API key with setApiKey() using credentials obtained from the Anthropic Console.
  6. Create a vision instance with Vision.set() bound to the document with the configured provider settings.
  7. Generate the description with vision.describe(), which sends the image to Claude's vision endpoint and returns natural language text; the SDK encodes the image data, constructs the multimodal API request, and parses the response automatically.
  8. Write the description to a file using try-with-resources with FileWriter for automatic resource cleanup.
  9. Handle NutrientException for vision processing failures (authentication errors, API failures, rate limits) and IOException for file operations (read failures or write errors when saving output).

Generated descriptions are concise (1–3 sentences), accessible (WCAG-compliant alt text), accurate (observable details only), and contextual.

Nutrient handles VLM API authentication, multimodal request formatting, image encoding, endpoint configuration, model parameter management, and response parsing so you don’t need to understand Claude API protocols or manage complex vision service integration manually. The image description system provides precise control for accessibility compliance systems generating WCAG alt text, digital asset management platforms cataloging photo libraries, document accessibility workflows adding descriptions to scanned images, e-learning content management systems describing educational diagrams, and archival systems cataloging historical photographs with metadata.

You can download this ready-to-use sample package, fully configured to help you explore the vision API description capabilities with Claude.