Extracting text from multilingual images

Multi-language text extraction addresses a fundamental challenge in global operations where organizations must process documents containing content in multiple languages within the same image. This capability is essential for international companies handling multilingual contracts, government agencies processing diverse public documents, and educational institutions managing multicultural content.

Recognizing and extracting text in several languages at once eliminates the need for separate per-language workflows. This makes it efficient to handle documents that contain mixed-language text, such as:

International business correspondence
Multilingual product documentation
Travel documents and passports
Cross-border legal materials

From automated translation workflows to compliance systems processing multilingual regulatory documents, multi-language OCR enables businesses to handle diverse linguistic content with the same efficiency as single-language processing. This breaks down language barriers in document digitization and content management.

Streamlining document workflows with our Python SDK

Developers can implement this feature by adding just a few lines of code to their applications. The SDK integrates multi-language OCR-based text extraction directly, eliminating the requirement for external tools or complex setups. Whether you’re building a document processing pipeline or adding extraction functionality to a web application, our SDK provides a reliable and efficient solution right out of the box.

Preparing the project

Import the required modules from Nutrient Python SDK:

from nutrient_sdk import Document, Vision, VisionEngine

Loading and configuring multi-language OCR

Open the image file and configure the vision API with multi-language support. Setting the default languages tells the OCR engine which language models to load for optimal recognition accuracy:

with Document.open("input_ocr_multiple_languages.png") as document:
    # Configure OCR engine for text extraction
    document.settings.vision_settings.engine = VisionEngine.ADAPTIVE_OCR

    # Configure multiple languages for recognition
    document.settings.ocr_settings.default_languages = "eng+fra"

The default_languages property accepts a string of language codes separated by plus signs (for example, “eng+fra” for English and French). Each language you add loads a specialized recognition model that includes:

Character sets specific to the language.
Linguistic patterns and dictionaries.
Contextual analysis rules for improved accuracy.
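If the set of languages comes from configuration or user input, a small helper can build the plus-separated string. This is a minimal sketch: the helper name build_language_string is hypothetical and not part of Nutrient Python SDK, and it only assumes the format shown above (codes joined by plus signs). It doesn't check whether the SDK actually provides a model for each code.

```python
def build_language_string(codes):
    """Join OCR language codes into a plus-separated string such as "eng+fra".

    Hypothetical helper (not part of Nutrient Python SDK): it trims whitespace,
    drops duplicates while keeping the original order, and rejects empty codes
    or codes that already contain a plus sign.
    """
    result = []
    for raw in codes:
        code = raw.strip()
        if not code or "+" in code:
            raise ValueError(f"invalid language code: {raw!r}")
        if code not in result:
            result.append(code)
    if not result:
        raise ValueError("at least one language code is required")
    return "+".join(result)


# Duplicates and stray whitespace are normalized away
print(build_language_string(["eng", " fra ", "eng"]))  # eng+fra
```

You could then assign the result directly, for example document.settings.ocr_settings.default_languages = build_language_string(["eng", "fra"]).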

Executing multi-language text extraction

Create a Vision instance for the document and call extract_content(). The engine applies the configured language models during recognition and returns the extracted content as a JSON string:

    vision = Vision.set(document)
    content_json = vision.extract_content()

The OCR engine automatically handles language transitions within the document, maintaining accuracy when text switches between languages and preserving the natural flow and organization of multilingual content.
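To spot-check that mixed-script text survived extraction, you can tally the Unicode scripts of the letters in a piece of extracted text. The sketch below is a heuristic that uses only the Python standard library: it treats the first word of each character's Unicode name (for example LATIN, CYRILLIC, or GREEK) as its script. It can't distinguish languages that share a script, such as English and French, and the function name script_counts is hypothetical.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count alphabetic characters per approximate Unicode script.

    Heuristic: uses the first word of the character's Unicode name, e.g.
    "LATIN SMALL LETTER E WITH ACUTE" -> "LATIN". Non-letters are skipped.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            counts[name.split()[0]] += 1
    return counts


print(script_counts("Hello, caf\u00e9! Привет"))
```

A result with more than one script key suggests that text in each script was recognized; an unexpectedly missing script is a hint to check the configured languages.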

Saving extracted results

Write the extracted content to a JSON file for use in downstream applications:

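    # Write as UTF-8 so accented and non-Latin characters are stored correctly
    # regardless of the platform's default encoding
    with open("output.json", "w", encoding="utf-8") as f:
        f.write(content_json)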

This creates a JSON file containing all recognized text from the image in all configured languages. The structured output is ready for:

Translation workflows
Content management systems
Automated data processing
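Before handing the file to other systems, it can help to confirm that the string is valid JSON and to store it in a readable, UTF-8 encoded form. This sketch uses only the standard json module; the helper name save_extracted_json is hypothetical, and it assumes extract_content() returns a JSON string, as the write call above implies. Setting ensure_ascii=False keeps characters such as é or ß as-is instead of escaping them.

```python
import json
from pathlib import Path


def save_extracted_json(content_json, path, indent=2):
    """Validate a JSON string and write it to `path` as pretty-printed UTF-8.

    Raises json.JSONDecodeError if `content_json` is not valid JSON, so a
    malformed result never silently reaches downstream systems.
    Returns the parsed data for further use.
    """
    data = json.loads(content_json)  # fails fast on invalid JSON
    text = json.dumps(data, ensure_ascii=False, indent=indent)
    Path(path).write_text(text, encoding="utf-8")
    return data
```

Inside the with block, save_extracted_json(content_json, "output.json") could take the place of the plain write. Note that the data is re-serialized, so whitespace and number formatting may differ slightly from the original string.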

Understanding the output

The extract_content method returns a JSON structure that provides comprehensive metadata for the processed document. This structure includes:

Text content — The full string of extracted text, preserving multi-language characters and symbols.
Bounding boxes — The precise (x, y) coordinates and dimensions (width/height) of text regions on the page.
Word-level data — Detailed information for individual words, including their specific coordinates and confidence scores.
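The exact key names in the JSON aren't spelled out here, so the sketch below stays schema-agnostic: it walks the parsed structure and collects every value stored under a given key. The function name find_values is hypothetical, and the sample structure and its keys (pages, words, text, confidence) are placeholders for illustration only; check them against a real output.json before relying on them.

```python
import json


def find_values(node, key):
    """Recursively yield every value stored under `key` in parsed JSON data.

    Works on any nesting of dicts and lists, so it doesn't depend on the
    exact layout of the extraction output.
    """
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Hypothetical output shape, for illustration only
sample = json.loads(
    '{"pages": [{"words": [{"text": "Hello", "confidence": 0.98},'
    ' {"text": "Bonjour", "confidence": 0.61}]}]}'
)
print(list(find_values(sample, "text")))
```

The same call with a confidence key would return the scores, which you could compare against a threshold to flag words for manual review.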

In trial mode, extracted output can include hidden-content placeholders that mask parts of the text. For full validation of multilingual extraction quality, test with a valid production license.

Error handling

The vision API raises VisionException if content extraction fails, for example when the image can’t be processed or when OCR resources aren’t available. In production code, catch this exception so that a failed image is logged and handled instead of stopping the rest of your document processing pipeline.
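A common pattern is to catch the failure per image so one bad file doesn’t stop a whole batch. The sketch below is library-agnostic: extract_or_none is a hypothetical helper that takes any zero-argument extraction callable plus the exception types to treat as recoverable, logs the failure, and returns None. With Nutrient Python SDK, you would pass vision.extract_content and VisionException; the module that VisionException is imported from isn’t stated here, so confirm it in the SDK reference.

```python
import logging

logger = logging.getLogger(__name__)


def extract_or_none(extract, recoverable, source="<unknown>"):
    """Run `extract()` and return its result, or None on a recoverable error.

    `recoverable` is a tuple of exception classes (for example, the SDK's
    VisionException). Any other exception propagates unchanged, so real bugs
    are not silently swallowed.
    """
    try:
        return extract()
    except recoverable as exc:
        logger.warning("Extraction failed for %s: %s", source, exc)
        return None
```

Inside the with block, this might look like content_json = extract_or_none(vision.extract_content, (VisionException,), "input_ocr_multiple_languages.png"), followed by a check for None before writing the output file.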

Conclusion

That’s all it takes to extract text from a multi-language image! The extracted content preserves the linguistic diversity and content organization of the original image while enabling further processing, translation, or analysis workflows. You can also download this ready-to-use sample package, which is fully configured to help you explore Nutrient Python SDK and its seamless multi-language text extraction capabilities.
