Generating image descriptions using Claude

Generating accessible image descriptions programmatically enables teams to build inclusive applications, automate content cataloging, and enhance document accessibility workflows. Whether you’re creating accessibility compliance systems that generate alt text for web images meeting WCAG standards, building digital asset management platforms that automatically catalog photo libraries with searchable descriptions, implementing document accessibility workflows that add descriptive text to scanned images in PDF reports, creating e-learning content management systems that generate descriptions for educational diagrams and charts, or building archival systems that catalog historical photograph collections with detailed metadata, AI-powered image description provides natural language understanding of visual content. Image description operations include analyzing image content with vision language models (VLM); generating concise descriptions focused on main subjects and key details; producing accessibility-compliant descriptions for screen readers; extracting semantic meaning from charts, diagrams, and photographs; and providing contextual understanding beyond simple object detection.

This guide demonstrates using Claude (Anthropic’s AI model) as the VLM provider for generating image descriptions through the Nutrient vision API. Claude provides state-of-the-art vision understanding with strong reasoning capabilities, nuanced contextual descriptions, and cloud-based processing, eliminating local GPU infrastructure requirements.

Download sample

How Nutrient helps you achieve this

Nutrient Python SDK handles vision API integration, VLM provider configuration, and image processing pipelines. With the SDK, you don’t need to worry about:

Managing VLM API authentication, endpoint configuration, and request formatting
Encoding image data and handling multimodal API request structures
Configuring model parameters like temperature, max tokens, and provider-specific settings
Complex error handling for vision service failures and API rate limits

Instead, Nutrient provides an API that handles all the complexity behind the scenes, enabling you to focus on your business logic.

Complete implementation

Below is a complete working example that demonstrates generating accessible image descriptions using Claude as the VLM provider. The following lines set up the Python application. Start by importing the required classes from the SDK:

from nutrient_sdk import Document, Vision
from nutrient_sdk.settings import VlmProvider

Configuring the Claude provider

Open the image file using a context manager(opens in a new tab) and configure the vision API to use Claude as the VLM provider. The following code opens an image file (PNG format in this example) using Document.open() with a file path parameter. The context manager pattern (using the with statement) ensures the document is properly closed after processing, releasing memory and image data regardless of whether description generation succeeds or fails. The document.settings.vision_settings.provider property assignment specifies which VLM provider to use — setting it to VlmProvider.CLAUDE configures Claude (Anthropic’s AI model) as the vision provider instead of alternatives like OpenAI or local AI models. The document.settings.claude_api_settings.api_key property assignment sets your Anthropic API key for authentication — replace "CLAUDE_API_KEY" with your actual API key obtained from the Anthropic Console. The SDK supports multiple image formats, including PNG, JPEG, GIF, BMP, and TIFF:

with Document.open("input_photo.png") as document:
    # Configure Claude as the VLM provider
    document.settings.vision_settings.provider = VlmProvider.CLAUDE

    # Set the Claude API key
    document.settings.claude_api_settings.api_key = "CLAUDE_API_KEY"

Creating a vision instance and generating the description

Create a vision API instance bound to the document and generate a natural language description of the image content. The following code uses the Vision.set() static method with the document parameter to create a vision processor configured with the Claude provider settings defined earlier. The vision.describe() method sends the image to the Claude API for analysis and returns a natural language description as a string. During processing, the SDK encodes the image data, constructs a multimodal API request with the image payload, sends the request to Claude’s vision endpoint, and parses the response to extract the description text. The description focuses on the main subject and key visual details observable in the image, optimized for accessibility purposes and screen reader compatibility:

    vision = Vision.set(document)
    description = vision.describe()

Outputting the description

Print the generated description to the console for review or logging. The following code prints a header line followed by the description text returned by Claude. In production applications, you might save the description to a database, write it to a file, or use it to populate alt text attributes in HTML documents. The context manager automatically closes the document after the print statements complete, ensuring proper resource cleanup:

    print("Image description:")
    print(description)

Understanding the output

The describe() method returns a natural language description of the image content optimized for accessibility and content understanding. Claude analyzes the visual content and generates descriptions with specific characteristics:

Concise — Focused on the main subject and key details without unnecessary verbosity, typically 1–3 sentences.
Accessible — Written for users who cannot see the image, following accessibility best practices for alt text.
Accurate — Describes only what is clearly visible in the image, avoiding speculation or interpretation beyond observable details.
Contextual — Understands relationships between objects, spatial arrangements, and relevant context within the scene.

The generated descriptions are suitable for accessibility compliance (WCAG alt text requirements), content management metadata, searchable image cataloging, and document accessibility workflows.

Claude API settings

The Claude provider uses the following settings from claude_api_settings:

api_endpoint — The Claude API endpoint (default: https://api.anthropic.com/v1/).
api_key — Your Anthropic API key for authentication.
model — The model identifier to use (default: claude-sonnet-4-5).
temperature — Controls response creativity (0.0 = deterministic, 1.0 = creative).
max_tokens — Maximum tokens in the response (default: 16384).

Error handling

The Python SDK raises NutrientException if vision operations fail due to processing errors, API failures, or configuration issues. Exception handling ensures robust error recovery in production environments.

Common failure scenarios include:

The input image file can’t be read due to file system permissions, path errors, or unsupported image formats
Invalid or missing Claude API key causing authentication failures with the Anthropic API
Claude API service is unavailable or experiencing outages preventing description generation
API rate limits exceeded when processing high volumes of images in rapid succession
Network connectivity issues preventing API requests from reaching Claude’s endpoints
Image data too large or corrupted, preventing proper encoding and transmission

In production code, wrap the vision operations in a try-except block to catch NutrientException instances, providing appropriate error messages to users and logging failure details for debugging. This error handling pattern enables graceful degradation when vision processing fails, preventing application crashes and enabling retry logic with exponential backoff for transient API failures or user notification for manual intervention.

Conclusion

The image description workflow with Claude consists of several key operations:

Open the image file using a context manager(opens in a new tab) for automatic resource cleanup.
The SDK supports multiple image formats, including PNG, JPEG, GIF, BMP, and TIFF.
Access the vision settings with document.settings.vision_settings.provider to configure the VLM provider.
Set the provider to Claude with the property assignment VlmProvider.CLAUDE instead of alternatives like OpenAI or local models.
Access Claude-specific settings with document.settings.claude_api_settings for API configuration.
Set the Anthropic API key with property assignment using credentials obtained from the Anthropic Console.
Claude API settings control endpoint URLs, model selection (default: claude-sonnet-4-5), temperature, and max tokens.
Create a vision instance with Vision.set() bound to the document with configured provider settings.
Generate the description with vision.describe(), which sends the image to Claude’s vision endpoint and returns natural language text.
The SDK encodes image data, constructs multimodal API requests, and parses responses automatically.
Generated descriptions are concise (1–3 sentences), accessible (WCAG-compliant alt text), accurate (observable details only), and contextual.
Print or save the description for use in accessibility systems, content management, or cataloging workflows.
Handle NutrientException for vision processing failures, including authentication errors, API failures, and rate limits.
The context manager ensures proper resource cleanup when processing completes or exceptions occur.

Nutrient handles VLM API authentication, multimodal request formatting, image encoding, endpoint configuration, model parameter management, and response parsing so you don’t need to understand Claude API protocols or manage complex vision service integration manually. The image description system provides precise control for accessibility compliance systems generating WCAG alt text, digital asset management platforms cataloging photo libraries, document accessibility workflows adding descriptions to scanned images, e-learning content management systems describing educational diagrams, and archival systems cataloging historical photographs with metadata.

You can download this ready-to-use sample package, fully configured to help you explore the vision API description capabilities with Claude.