Extracting text from images

In modern business operations, converting image-based text into searchable, editable digital content is a critical step in workflow automation. Organizations often face the challenge of manually transcribing information from scanned documents and photographs — a process that is time-consuming and prone to human error.

This sample shows how to automate text extraction from images with Nutrient Python SDK, turning static visual content into digital text that applications can search, store, and process. This capability supports document processing workflows such as:

Search functionality — Making scanned documents searchable.
Accessibility compliance — Supporting screen readers and text-to-speech.
Automated data entry — Eliminating manual transcription in form processing.

Whether you’re digitizing historical archives or processing high volumes of form submissions, the ability to reliably extract text from images enables automation strategies that maintain accuracy and efficiency across diverse operational contexts.

Streamlining document workflows with our Python SDK

Developers can add this feature to their applications with just a few lines of code. Nutrient Python SDK includes OCR-based text extraction, so no external tools or complex setup are required. Whether you’re building a document processing pipeline or adding extraction to a web application, the SDK provides this functionality out of the box.

Preparing the project

Import the required modules from Nutrient Python SDK:

from nutrient_sdk import Document, Vision, VisionEngine

Loading and processing the image

Open the image file and configure the vision API to use the OCR engine for text extraction:

with Document.open("input.png") as document:
    # Configure OCR engine for text extraction
    document.settings.vision_settings.engine = VisionEngine.ADAPTIVE_OCR

The Document.open method loads the image into memory and prepares it for OCR processing. Nutrient Python SDK automatically detects the image format and applies appropriate preprocessing steps — such as deskewing or noise reduction — to improve recognition accuracy.

Executing text recognition

Create a vision instance and extract the text content. The vision API analyzes the image structure, identifies text regions, and converts visual characters into structured digital text:

    vision = Vision.set(document)
    content_json = vision.extract_content()

The extract_content method performs the text recognition and returns a JSON string containing all recognized text along with its position information.

Saving extracted results

Write the extracted content to a JSON file for use in downstream applications:

    # Write UTF-8 explicitly so non-ASCII characters survive on every platform
    with open("output.json", "w", encoding="utf-8") as f:
        f.write(content_json)

This creates a JSON file containing all recognized text from the image. The structured output includes word-level bounding boxes and text content ready for:

Search indexing
Database storage
Content management workflows
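The sample writes the JSON string exactly as it is returned. If you prefer an indented, human-readable file, the following minimal sketch parses the string first and then re-serializes it. It assumes content_json holds valid JSON text (the f.write call in the sample implies it is a str). The helper name save_pretty_json is hypothetical and not part of the SDK.

```python
import json
from pathlib import Path


def save_pretty_json(content_json: str, path: str) -> None:
    """Parse the JSON string from extract_content and write it back indented.

    save_pretty_json is a hypothetical helper, not part of Nutrient Python SDK.
    """
    # json.loads raises json.JSONDecodeError if the string isn't valid JSON,
    # so a malformed result is caught before it reaches downstream systems.
    data = json.loads(content_json)
    # ensure_ascii=False keeps accented and non-Latin characters readable;
    # writing with UTF-8 makes that safe on every platform.
    Path(path).write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Inside the with block, you could call save_pretty_json(content_json, "output.json") in place of the open/write lines. Parsing first costs a little time but validates the output as a side effect.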

Understanding the output

The extract_content method returns a JSON structure that provides comprehensive metadata for the processed document. This structure includes:

Text content — The full string of extracted text from the document.
Bounding boxes — The precise (x, y) coordinates and dimensions (width/height) of text regions on the page.
Word-level data — Detailed information for individual words, including their specific coordinates and confidence scores.
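To work with the word-level data in Python, you can parse the JSON string and filter it. The sketch below splits words by confidence score, which is one way to route uncertain results to manual review in form-processing workflows. This guide doesn't document the exact JSON schema, so the key names (words, text, bbox, confidence), the bounding-box layout, and the 0 to 1 confidence scale used here are all assumptions. Inspect your own output.json and adjust them.

```python
import json
from typing import Any

# HYPOTHETICAL output shape for illustration only. The real keys produced by
# extract_content may differ; check your own output.json.
SAMPLE_JSON = """
{
  "text": "Invoice 1042",
  "words": [
    {"text": "Invoice", "confidence": 0.98,
     "bbox": {"x": 12, "y": 30, "width": 70, "height": 14}},
    {"text": "1042", "confidence": 0.61,
     "bbox": {"x": 90, "y": 30, "width": 36, "height": 14}}
  ]
}
"""


def split_by_confidence(
    content_json: str, min_confidence: float = 0.8
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split word entries into (accepted, needs_review) by confidence score."""
    data = json.loads(content_json)
    accepted: list[dict[str, Any]] = []
    needs_review: list[dict[str, Any]] = []
    for word in data.get("words", []):
        # A missing score is treated as 0.0, so the word lands in needs_review.
        if word.get("confidence", 0.0) >= min_confidence:
            accepted.append(word)
        else:
            needs_review.append(word)
    return accepted, needs_review


accepted, needs_review = split_by_confidence(SAMPLE_JSON)
for word in accepted:
    box = word["bbox"]
    print(f'{word["text"]}: x={box["x"]}, y={box["y"]}, {box["width"]}x{box["height"]}')
```

With real output, pass the content_json returned by vision.extract_content(). Treating a missing score as zero is a deliberately conservative choice: anything the sketch can't verify goes to review instead of straight into storage.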

In trial mode, extracted output can include hidden-content placeholders that mask parts of the text. For full validation of extraction quality, test with a valid production license.

Error handling

The vision API raises VisionException if content extraction fails, for example when the image can’t be processed or OCR resources aren’t available. In production code, catch this exception so that a single failed image is logged and handled rather than stopping your entire document processing pipeline.
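One way to keep a batch pipeline robust is to process each image independently and to log failures instead of stopping the run. The following is a minimal sketch under two assumptions. First, the helper names extract_text and extract_batch are hypothetical. Second, this guide doesn't state where VisionException is imported from, so extract_batch takes the exception class as a parameter; look up its module in the SDK's API reference. The extract_text wrapper uses only the calls shown earlier in this guide.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


def extract_text(image_path: Path) -> str:
    """Hypothetical wrapper around the steps shown in this guide."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from nutrient_sdk import Document, Vision, VisionEngine

    with Document.open(str(image_path)) as document:
        document.settings.vision_settings.engine = VisionEngine.ADAPTIVE_OCR
        return Vision.set(document).extract_content()


def extract_batch(
    paths: Iterable[Path],
    extract: Callable[[Path], str],
    error_type: type[Exception],
) -> tuple[dict[Path, str], list[Path]]:
    """Run extract on every path; collect results and the paths that failed."""
    results: dict[Path, str] = {}
    failures: list[Path] = []
    for path in paths:
        try:
            results[path] = extract(path)
        except error_type as exc:
            # Log and continue so one bad image doesn't abort the whole batch.
            logger.warning("Text extraction failed for %s: %s", path, exc)
            failures.append(path)
    return results, failures
```

A call might look like extract_batch(Path("scans").glob("*.png"), extract_text, VisionException), where the scans folder is a placeholder. Catching only the given exception type, rather than a bare Exception, lets genuine programming errors still surface.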

Conclusion

That’s all it takes to extract text from an image using OCR. The extracted content is ready for integration with search systems, accessibility tools, or automated data processing workflows. You can also download this ready-to-use sample package, fully configured to help you dive into Nutrient Python SDK and explore text extraction capabilities.
