---
title: "Generating image descriptions using OpenAI | Nutrient Python SDK"
canonical_url: "https://www.nutrient.io/guides/python/extraction/describe-image-with-openai/"
md_url: "https://www.nutrient.io/guides/python/extraction/describe-image-with-openai.md"
last_updated: "2026-06-09T19:34:32.777Z"
description: "Generate accessible image descriptions using OpenAI with Nutrient Python SDK."
---

# Generating image descriptions using OpenAI

Use OpenAI-powered image description to generate alt text and visual summaries in cloud workflows.

Common use cases include:

- Accessibility pipelines for screen readers

- Content management and image cataloging

- Document workflows across regions

- Enterprise integrations with managed API infrastructure

- Fast prototyping without local model hosting

This guide uses OpenAI as the vision language model (VLM) provider through the Nutrient Vision API.

[Download sample](https://www.nutrient.io/downloads/samples/python/describe-image-with-openai.zip)

## How Nutrient helps

Nutrient Python SDK manages the OpenAI integration for you, from provider configuration to response parsing.

The SDK handles:

- OpenAI API authentication and endpoint setup

- Image encoding and multimodal payload formatting

- Model parameters such as temperature and token limits

- API failure and rate-limit handling

## Complete implementation

This example generates an image description using OpenAI. The sections below build it up step by step, starting with the imports:

```python
from nutrient_sdk import Document, Vision, VlmProvider
```

## Configuring the OpenAI provider

Open the image in a [context manager](https://docs.python.org/3/reference/datamodel.html#context-managers) and configure OpenAI as the provider.

In this sample:

- `vision_settings.provider = VlmProvider.OPEN_AI` selects OpenAI.

- `open_ai_api_endpoint_settings.api_key` sets your API key.

- Input can be PNG, JPEG, GIF, BMP, or TIFF.

```python
with Document.open("input_photo.png") as document:
    # Configure OpenAI as the VLM provider
    document.settings.vision_settings.provider = VlmProvider.OPEN_AI

    # Set the OpenAI API key (replace the placeholder with your own key)
    document.settings.open_ai_api_endpoint_settings.api_key = "OPENAI_API_KEY"
```
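
A key hardcoded in source is easy to leak through version control. As a minimal sketch, the key could be read from the environment instead. The helper name `load_openai_api_key` and the variable name `OPENAI_API_KEY` are assumptions for illustration and aren't part of the Nutrient SDK.

```python
import os


def load_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper for illustration; not part of the Nutrient SDK.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return api_key
```

Inside the `with` block, the key assignment would then read `document.settings.open_ai_api_endpoint_settings.api_key = load_openai_api_key()`.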

## Creating a vision instance and generating the description

Create a vision instance and call `describe()` to generate text.

In this sample:

- `Vision.set(document)` binds processing to the opened image.

- `vision.describe()` returns a description string.

- The SDK handles encoding, request construction, and response parsing.

```python
    vision = Vision.set(document)
    description = vision.describe()
```

## Outputting the description

Print the description for review, or store it in your application.

Common destinations include:

- Database fields

- JSON output files

- HTML `alt` attributes

```python
    print("Image description:")
    print(description)
```
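
For the JSON and HTML destinations listed above, a sketch using only the Python standard library might look like this. The function names `build_img_tag` and `save_description_json`, and the JSON field names, are hypothetical choices for this example rather than anything the SDK provides.

```python
import html
import json
from pathlib import Path


def build_img_tag(src: str, description: str) -> str:
    """Return an <img> tag with the description escaped for use as alt text."""
    safe_src = html.escape(src, quote=True)
    safe_alt = html.escape(description, quote=True)
    return f'<img src="{safe_src}" alt="{safe_alt}">'


def save_description_json(image_path: str, description: str, output_path: str) -> None:
    """Write the image path and its description to a JSON file."""
    record = {"image": image_path, "description": description}
    Path(output_path).write_text(
        json.dumps(record, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
```

Escaping matters here because a generated description can contain quotes or ampersands, which would otherwise break the `alt` attribute.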

## Understanding the output

`describe()` returns natural language text for accessibility and content understanding.

Descriptions are typically:

- **Concise** — Focused on key subjects and details, often one to three sentences

- **Accessible** — Suitable for users who rely on screen readers

- **Accurate** — Based on visible content only

- **Contextual** — Include relevant relationships and scene context

Use this output for accessibility metadata, image search, and document workflows.
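
Model output can contain line breaks or run longer than an `alt` attribute should. The following sketch normalizes a description before it's stored as alt text. The helper name `to_alt_text` and the 250-character default are assumptions for illustration, not SDK behavior or a WCAG requirement.

```python
def to_alt_text(description: str, max_chars: int = 250) -> str:
    """Collapse whitespace and shorten the text to at most max_chars characters.

    Hypothetical helper; the default limit is an arbitrary illustrative value.
    """
    if max_chars < 4:
        raise ValueError("max_chars must be at least 4")

    # Replace newlines, tabs, and repeated spaces with single spaces.
    text = " ".join(description.split())
    if len(text) <= max_chars:
        return text

    # Reserve three characters for the trailing "..." and cut at a word boundary.
    cut = text[: max_chars - 3].rsplit(" ", 1)[0]
    return cut.rstrip(" ,;:") + "..."
```

For example, `to_alt_text("A  dog\nruns.")` collapses the extra whitespace and returns `"A dog runs."`.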

## OpenAI API settings

The OpenAI provider uses these `open_ai_api_endpoint_settings` properties:

- **api\_endpoint** — The OpenAI API endpoint (default: `https://api.openai.com/v1`)

- **api\_key** — Your OpenAI API key for authentication

- **model** — The model identifier to use

- **temperature** — Controls response creativity (0.0 = deterministic, 1.0 = creative)

- **max\_tokens** — Maximum tokens in the response (default: 16384)
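
As a sketch of how these properties might be set together, the helper below assigns only the values you pass and leaves the other SDK defaults untouched. The function `apply_openai_settings` is hypothetical. It assumes the settings object accepts plain attribute assignment for the property names listed above, as the `api_key` example in this guide does.

```python
from typing import Optional


def apply_openai_settings(
    endpoint_settings,
    *,
    api_key: str,
    model: Optional[str] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    api_endpoint: Optional[str] = None,
) -> None:
    """Assign the provided values to an OpenAI endpoint settings object.

    Hypothetical helper; property names are taken from the list above.
    """
    endpoint_settings.api_key = api_key
    overrides = {
        "model": model,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "api_endpoint": api_endpoint,
    }
    for name, value in overrides.items():
        # Skip anything left as None so the SDK default stays in effect.
        if value is not None:
            setattr(endpoint_settings, name, value)
```

A call such as `apply_openai_settings(document.settings.open_ai_api_endpoint_settings, api_key=key, temperature=0.0, max_tokens=300)` would favor short, repeatable descriptions. The values shown are illustrative.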

## Error handling

The SDK raises `NutrientException` when vision operations fail.

Common failure scenarios include:

- The input image can’t be read due to path, permission, or format issues

- The OpenAI API key is missing or invalid

- The OpenAI API is unavailable

- Rate limits are exceeded

- Network requests fail before reaching the API

- Image data is too large or corrupted

In production code:

- Catch `NutrientException`.

- Return a clear error message.

- Log failure details for debugging.

- Add retry logic for transient API failures.
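
The retry and logging advice above can be sketched as a small wrapper with exponential backoff. The function `call_with_retry` and its parameters are hypothetical. Because this guide doesn't show where `NutrientException` is imported from, the sketch takes the exception types as an argument instead of importing them.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def call_with_retry(
    operation: Callable[[], T],
    retry_on: Tuple[Type[Exception], ...] = (Exception,),
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying with exponential backoff on the given exceptions."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on as error:
            if attempt == max_attempts:
                logger.error("Giving up after %d attempts: %s", attempt, error)
                raise
            # Wait base_delay, then 2x, then 4x, and so on between attempts.
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("Attempt %d failed (%s); retrying in %.1fs", attempt, error, delay)
            sleep(delay)

    raise AssertionError("unreachable")
```

With the SDK, this might be called as `call_with_retry(vision.describe, retry_on=(NutrientException,))`. A single exception type covers both transient and permanent failures, so an invalid API key would also be retried. Keep `max_attempts` small for that reason.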

## Conclusion

To generate image descriptions with OpenAI, keep these steps and SDK behaviors in mind:

1. Open the image file using a [context manager](https://docs.python.org/3/reference/datamodel.html#context-managers) for automatic resource cleanup.

2. The SDK supports multiple image formats, including PNG, JPEG, GIF, BMP, and TIFF.

3. Access the vision settings with `document.settings.vision_settings.provider` to configure the VLM provider.

4. Set the provider to OpenAI with `VlmProvider.OPEN_AI` instead of alternatives like Claude or local models.

5. Access OpenAI-specific settings with `document.settings.open_ai_api_endpoint_settings` for API configuration.

6. Set the OpenAI API key with property assignment using credentials obtained from the OpenAI platform.

7. OpenAI API settings control endpoint URLs, model selection, temperature, and max tokens.

8. Create a vision instance with `Vision.set()` bound to the document with configured provider settings.

9. Generate the description with `vision.describe()`, which sends the image to OpenAI’s vision endpoint and returns natural language text.

10. The SDK encodes image data, constructs multimodal API requests, and parses responses automatically.

11. Generated descriptions are typically concise (one to three sentences), suitable for screen readers, limited to visible content, and contextual.

12. Print or save the description for use in accessibility systems, content management, or cataloging workflows.

13. Handle `NutrientException` for vision processing failures, including authentication errors, API failures, and rate limits.

14. The context manager ensures proper resource cleanup when processing completes or exceptions occur.
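
The steps above can be assembled into one function. This sketch uses only the SDK calls shown in this guide. The wrapper name `describe_image` is an assumption, and the import sits inside the function so that the module still loads where the SDK isn't installed.

```python
def describe_image(image_path: str, api_key: str) -> str:
    """Return an OpenAI-generated description of the image at image_path.

    Hypothetical wrapper around the calls shown in this guide.
    """
    # Deferred import: the SDK is only needed when the function is called.
    from nutrient_sdk import Document, Vision, VlmProvider

    with Document.open(image_path) as document:
        # Select OpenAI as the VLM provider and supply the API key.
        document.settings.vision_settings.provider = VlmProvider.OPEN_AI
        document.settings.open_ai_api_endpoint_settings.api_key = api_key

        # Bind a vision instance to the document and request the description.
        vision = Vision.set(document)
        return vision.describe()
```

Returning from inside the `with` block is safe here because the context manager still releases the document when the function exits.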

For related image workflows, refer to the [Python SDK guides](https://www.nutrient.io/guides/python.md).

Download [this ready-to-use sample package](https://www.nutrient.io/downloads/samples/python/describe-image-with-openai.zip) to explore OpenAI-based image description.

---

## Related pages

- [Speeding up first ICR operation by predownloading models](/guides/python/extraction/speed-up-first-icr-by-downloading-requirements.md)
- [Extracting text from PDF documents](/guides/python/extraction/pdf-to-text.md)
- [Extracting text from multilingual images](/guides/python/extraction/read-text-from-image-multi-language.md)
- [Extracting structured data from documents](/guides/python/extraction/extract-structured-data.md)
- [Generating image descriptions using Claude](/guides/python/extraction/describe-image-with-claude.md)
- [Extracting data from images using vision language models](/guides/python/extraction/extract-data-from-image-vlm.md)
- [Extracting text from images](/guides/python/extraction/read-text-from-image.md)
- [Generating image descriptions using local AI](/guides/python/extraction/describe-image-with-local-ai.md)
- [Nutrient Python SDK extraction guides](/guides/python/extraction.md)
- [Applying OCR to a PDF document](/guides/python/extraction/apply-ocr-to-pdf.md)
- [Extracting form fields from images](/guides/python/extraction/extract-form-fields-from-image.md)
- [Extracting data from images using OCR](/guides/python/extraction/extract-data-from-image-ocr.md)
- [Applying OCR to a PDF page](/guides/python/extraction/apply-ocr-to-pdf-page.md)
- [Labeling form fields with a vision language model](/guides/python/extraction/label-form-fields-with-vlm.md)
- [Extracting structured JSON data from PDF documents](/guides/python/extraction/json-data-extraction.md)
- [Extracting data from images using ICR](/guides/python/extraction/extract-data-from-image-icr.md)

