Generating image descriptions using Claude
Use image description to generate alt text and visual summaries from images.
Common use cases include:
- Accessibility workflows for screen readers
- Digital asset cataloging
- Document enrichment for scanned reports
- E-learning content description
- Archive and metadata generation
This guide uses Claude as the VLM provider through Nutrient Vision API.
How Nutrient helps
Nutrient Python SDK handles provider configuration, request handling, and response parsing.
The SDK handles:
- API authentication and endpoint configuration
- Image encoding and multimodal request payloads
- Model settings such as temperature and token limits
- Vision API failures and rate-limit handling
Complete implementation
This example generates an image description using Claude:
from nutrient_sdk import Document, Vision
from nutrient_sdk import VlmProvider

Configuring the Claude provider
Open the image in a context manager and configure Claude as the provider.
In this sample:
- vision_settings.provider = VlmProvider.CLAUDE selects Claude.
- claude_api_settings.api_key sets the Anthropic API key.
- Input can be PNG, JPEG, GIF, BMP, or TIFF.

with Document.open("input_photo.png") as document:
    # Configure Claude as the VLM provider
    document.settings.vision_settings.provider = VlmProvider.CLAUDE

    # Set the Claude API key
    document.settings.claude_api_settings.api_key = "CLAUDE_API_KEY"

Creating a vision instance and generating the description
Create a vision instance and call describe() to generate text.
In this sample:
- Vision.set(document) binds processing to the opened image.
- vision.describe() returns a description string.
- The SDK handles encoding, request construction, and response parsing.

    vision = Vision.set(document)
    description = vision.describe()

Outputting the description
Print the description for review, or store it in your application.
Common destinations include:
- Database fields
- JSON output files
- HTML alt attributes
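The destinations listed above can be sketched with the standard library alone. This is a minimal illustration, not part of the Nutrient SDK: `to_json_record` and `to_img_tag` are hypothetical helper names, and the record fields (`file`, `description`) are an assumed layout. The sample string stands in for the value returned by `vision.describe()`.

```python
import html
import json


def to_json_record(image_path: str, description: str) -> str:
    """Serialize a description as a JSON record (field names are illustrative)."""
    record = {"file": image_path, "description": description.strip()}
    return json.dumps(record, ensure_ascii=False)


def to_img_tag(image_path: str, description: str) -> str:
    """Build an <img> tag, escaping quotes and angle brackets in both attributes."""
    src = html.escape(image_path, quote=True)
    alt = html.escape(description.strip(), quote=True)
    return f'<img src="{src}" alt="{alt}">'


# Stand-in for the string returned by vision.describe().
description = 'A sign that reads "Open" hangs on a glass door.'
print(to_json_record("input_photo.png", description))
print(to_img_tag("input_photo.png", description))
```

Escaping matters here because generated text can contain quotation marks or angle brackets, which would otherwise break the attribute.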
    print("Image description:")
    print(description)

Understanding the output
describe() returns natural language text for accessibility and content understanding.
Claude descriptions are typically:
- Concise — Focused on key subjects and details, often one to three sentences
- Accessible — Suitable for users who rely on screen readers
- Accurate — Based on visible content only
- Contextual — Include relevant relationships and scene context
Use this output for accessibility metadata, image search, and document workflows.
Claude API settings
The Claude provider uses these claude_api_settings properties:
- api_endpoint: The Claude API endpoint (default: https://api.anthropic.com/v1/).
- api_key: Your Anthropic API key for authentication.
- model: The model identifier to use (default: claude-sonnet-4-5).
- temperature: Controls response creativity (0.0 = deterministic, 1.0 = creative).
- max_tokens: Maximum tokens in the response (default: 16384).
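One way to apply these properties without hardcoding the key is sketched below. Several things are assumptions rather than SDK features: the helper name `configure_claude`, the `ANTHROPIC_API_KEY` environment variable name, and the value types passed for `temperature` and `max_tokens`. Only the property names come from the list above. The stand-in object lets the sketch run without the SDK installed.

```python
import os
from types import SimpleNamespace


def configure_claude(document, provider, *, model=None, temperature=None,
                     max_tokens=None, environ=os.environ,
                     env_var="ANTHROPIC_API_KEY"):
    """Apply Claude settings to an opened document.

    `provider` is expected to be VlmProvider.CLAUDE. Optional settings left
    as None are not touched, so the SDK defaults stay in effect.
    """
    api_key = environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")

    document.settings.vision_settings.provider = provider
    claude = document.settings.claude_api_settings
    claude.api_key = api_key
    overrides = (("model", model), ("temperature", temperature),
                 ("max_tokens", max_tokens))
    for name, value in overrides:
        if value is not None:
            setattr(claude, name, value)
    return document


# Stand-in object so the sketch runs without the SDK installed.
# With the SDK, pass the document from Document.open(...) and VlmProvider.CLAUDE.
fake_document = SimpleNamespace(
    settings=SimpleNamespace(
        vision_settings=SimpleNamespace(provider=None),
        claude_api_settings=SimpleNamespace(api_key=None),
    )
)
configure_claude(fake_document, "CLAUDE", temperature=0.0, max_tokens=512,
                 environ={"ANTHROPIC_API_KEY": "placeholder-key"})
print(fake_document.settings.claude_api_settings)
```

Reading the key from the environment keeps credentials out of source control. Leaving unset options untouched means the defaults listed above still apply.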
Error handling
The SDK raises NutrientException when vision operations fail.
Common failure scenarios include:
- The input image can’t be read due to path, permission, or format issues
- The Claude API key is missing or invalid
- The Claude API is unavailable
- Rate limits are exceeded
- Network requests fail before reaching the API
- Image data is too large or corrupted
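Some of these failures can be caught before any API call is made. The sketch below is a hypothetical pre-flight check that uses only the standard library. The helper name `preflight_problems` and the mapping from file extensions to the formats listed in this guide are assumptions, and the SDK may detect formats differently (for example, by file content).

```python
import os
from pathlib import Path

# Extensions assumed to correspond to the formats this guide lists
# (PNG, JPEG, GIF, BMP, TIFF). The SDK may detect formats differently.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff"}


def preflight_problems(image_path):
    """Return a list of human-readable problems; an empty list means none found."""
    path = Path(image_path)
    if not path.is_file():
        return [f"{path} does not exist or is not a file"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append(f"{path} is not readable")
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"{path.suffix or 'no extension'} is not a listed image format")
    if path.stat().st_size == 0:
        problems.append(f"{path} is empty")
    return problems


print(preflight_problems("input_photo.png"))
```

A check like this does not replace handling NutrientException. It only gives earlier and more specific messages for path, permission, and format problems.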
In production code:
- Catch NutrientException.
- Return a clear error message.
- Log failure details for debugging.
- Add retry logic for transient API failures.
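The retry and logging steps above can be sketched as a small wrapper with exponential backoff. This is an illustration under stated assumptions, not SDK functionality: `call_with_retry` is a hypothetical helper, and the exception type to retry on is passed in by the caller. With the SDK, that would be NutrientException, with `vision.describe` as the operation. Because the guide describes NutrientException as covering non-transient failures too (such as an invalid API key), this sketch retries those as well, so a low attempt count is advisable.

```python
import logging
import time

logger = logging.getLogger("image_description")


def call_with_retry(operation, retry_on, attempts=3, base_delay=1.0,
                    sleep=time.sleep):
    """Call operation(); on a retry_on exception, wait and try again.

    Delays double each time: base_delay, 2 * base_delay, and so on.
    The last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as error:
            if attempt == attempts:
                logger.error("Giving up after %d attempts: %s", attempts, error)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("Attempt %d failed (%s); retrying in %.2fs",
                           attempt, error, delay)
            sleep(delay)


# Demonstration with a stand-in operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky_describe():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "A red bicycle leaning against a brick wall."


print(call_with_retry(flaky_describe, ConnectionError, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays.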
Conclusion
Use this workflow to generate image descriptions with Claude:
- Open the image file using a context manager for automatic resource cleanup.
- The SDK supports multiple image formats, including PNG, JPEG, GIF, BMP, and TIFF.
- Access the vision settings with document.settings.vision_settings.provider to configure the VLM provider.
- Set the provider to Claude by assigning VlmProvider.CLAUDE, instead of alternatives like OpenAI or local models.
- Access Claude-specific settings with document.settings.claude_api_settings for API configuration.
- Set the Anthropic API key by assigning it to api_key, using credentials obtained from the Anthropic Console.
- Claude API settings control endpoint URLs, model selection (default: claude-sonnet-4-5), temperature, and max tokens.
- Create a vision instance with Vision.set() bound to the document with configured provider settings.
- Generate the description with vision.describe(), which sends the image to Claude’s vision endpoint and returns natural language text.
- The SDK encodes image data, constructs multimodal API requests, and parses responses automatically.
- Generated descriptions are typically concise (often one to three sentences), accessible (suitable for users who rely on screen readers), accurate (based on visible content only), and contextual.
- Print or save the description for use in accessibility systems, content management, or cataloging workflows.
- Handle NutrientException for vision processing failures, including authentication errors, API failures, and rate limits.
- The context manager ensures proper resource cleanup when processing completes or exceptions occur.
For related image workflows, refer to the Python SDK guides.
Download this ready-to-use sample package to explore Claude-based image description.