---
title: "Extracting text from PDF documents | Nutrient Python SDK"
canonical_url: "https://www.nutrient.io/guides/python/extraction/pdf-to-text/"
md_url: "https://www.nutrient.io/guides/python/extraction/pdf-to-text.md"
last_updated: "2026-06-09T19:34:32.777Z"
description: "Extract layout-preserving text from PDF documents using Nutrient Python SDK."
---

# Extracting text from PDF documents

PDF-to-text extraction pulls readable content from a static document while preserving its spatial arrangement. Layout-aware extraction keeps columns, indentation, and table alignment intact, so the output matches what readers see on the page.

Use programmatic extraction to:

- Index large document libraries for search.

- Send structured text to data pipelines and language models.

- Reuse report and statement content without manual retyping.

## Extract PDF text with the Python SDK

You can add layout-preserving text extraction to a Python application with the Nutrient Python SDK. The SDK extracts text directly from PDFs, so you don't need external tools for this workflow.

## Prepare the project

Start by importing the Nutrient Python SDK classes:

```python
from nutrient_sdk import Document, NutrientException
```

## Load the PDF document

This guide uses the `Document` class. Open the document in a `with` statement, which relies on Python's [context manager](https://docs.python.org/3/reference/datamodel.html#context-managers) protocol to manage the lifecycle of the document instance.

The SDK can load a source file from a file path or a stream. This guide uses a file path:

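```python
def main():
    try:
        # Open the PDF; the context manager handles the document's lifecycle.
        with Document.open("input.pdf") as document:
            # The extraction call from the next section goes here.
```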

The path can be absolute or relative. A relative path such as `input.pdf` is resolved against the application's current working directory.
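A relative path depends on where the application is started, which can differ between a terminal, an IDE, and a scheduled job. The sketch below uses only the standard library to resolve a file name against a fixed base directory. The helper name `resolve_input` is hypothetical and not part of the SDK.

```python
from pathlib import Path


def resolve_input(name: str, base_dir: Path) -> Path:
    """Return an absolute path, resolving relative names against base_dir."""
    path = Path(name)
    if not path.is_absolute():
        path = base_dir / path
    return path.resolve()


# Resolve against the directory the process was started from.
pdf_path = resolve_input("input.pdf", Path.cwd())
print(pdf_path)
```

To anchor paths to the script rather than the working directory, pass `Path(__file__).resolve().parent` as `base_dir`. The result can then be handed to `Document.open` as a string with `str(pdf_path)`.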

## Extract layout-preserving text

Call `export_as_text` to extract the document text into a plain-text file. The method maps each word to a character grid that mirrors its position on the page:

```python
            # Write the layout-preserving text to a plain-text file.
            document.export_as_text("output.txt")
            print("Successfully extracted to output.txt")
    except NutrientException as e:
        # Report any failure raised by the SDK.
        print(f"Error: {e}")

if __name__ == "__main__":
    main()
```

The `export_as_text` method analyzes the PDF text content and the position of each word, then reconstructs the page in plain text. Words that sit close together join with single spaces, large horizontal gaps become proportional whitespace that preserves columns and tab stops, and vertical gaps between lines produce blank lines. The result reads like the original page while staying in a portable format.
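The character-grid idea described above can be illustrated in plain Python. This sketch is not the SDK's implementation: the `Word` type, the `char_width` and `line_height` parameters, and the rounding rules are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word, in points
    y: float  # distance of the line from the top of the page, in points


def render_page(words, char_width=6.0, line_height=12.0):
    """Place words on a character grid that mirrors their page positions."""
    rows = {}
    for word in words:
        # Words that share a vertical position end up in the same grid row.
        rows.setdefault(round(word.y / line_height), []).append(word)

    lines = []
    for row in range(max(rows) + 1 if rows else 0):
        line = ""
        for word in sorted(rows.get(row, []), key=lambda w: w.x):
            column = round(word.x / char_width)
            if line:
                # Keep at least one space between neighboring words.
                column = max(column, len(line) + 1)
            line = line.ljust(column) + word.text
        lines.append(line)  # rows without words become blank lines
    return "\n".join(lines)


words = [
    Word("Item", 0, 0), Word("Price", 120, 0),
    Word("Coffee", 0, 24), Word("3.50", 120, 24),
]
print(render_page(words))
```

In this example, the vertical gap between the header and the data row becomes a blank line, and both rows place their second word in the same column. The sketch only illustrates the idea; `export_as_text` performs the actual reconstruction.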

The method handles these PDF content types:

- Flowing text.

- Multi-column layouts.

- Tables and aligned data.

- Mixed content layouts.
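For tables and aligned data, column gaps survive as runs of spaces, so rows in the output can often be split back into fields. The sketch below assumes that columns are separated by at least two spaces and that cell values contain only single spaces. Both are assumptions about a particular document, not guarantees of the SDK. The `sample` string stands in for the contents of `output.txt`, and `split_columns` is a hypothetical helper.

```python
import re

# Stand-in for the text that would be read from output.txt.
sample = """\
Item            Qty    Price
Coffee beans      2     9.00

Paper filters     1     3.25
"""


def split_columns(text):
    """Split each non-blank line into cells on runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(re.split(r" {2,}", line.strip()))
    return rows


for row in split_columns(sample):
    print(row)
```

Each printed row is a list of cell strings, which can be passed on to a CSV writer or a data pipeline.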

## Handle errors

The Nutrient Python SDK reports errors through exceptions. The methods in this guide raise a `NutrientException` when an operation fails. Catch this exception, as the `except` block in the example above does, to log the cause and implement your own error handling logic.
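As one possible pattern, the sketch below wraps the calls from this guide in a helper that reports failure through its return value, so a batch job can continue with the next file. The helper name `extract_text` is hypothetical. It uses only the `Document.open` and `export_as_text` calls shown earlier, and it imports the SDK inside the function, a choice made here so that the module can be imported on machines without the SDK installed.

```python
import logging

logger = logging.getLogger(__name__)


def extract_text(input_path: str, output_path: str) -> bool:
    """Extract text from one PDF. Return True on success, False on failure."""
    # Imported here (an illustrative choice) so that importing this module
    # doesn't require the SDK.
    from nutrient_sdk import Document, NutrientException

    try:
        with Document.open(input_path) as document:
            document.export_as_text(output_path)
    except NutrientException as error:
        # Log the failure and let the caller decide what to do next.
        logger.error("Extraction failed for %s: %s", input_path, error)
        return False
    return True
```

A caller can loop over many files and collect the ones for which the helper returned `False`, for example to retry them or list them in a report.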

## Conclusion

You've extracted layout-preserving text from a PDF document. The extracted content is ready for search indexing, data pipelines, and downstream processing. You can also download the [sample package](https://www.nutrient.io/downloads/samples/python/pdf-to-text.zip) to explore text extraction with the Python SDK.

---

## Related pages

- [Speeding up first ICR operation by predownloading models](/guides/python/extraction/speed-up-first-icr-by-downloading-requirements.md)
- [Extracting text from multilingual images](/guides/python/extraction/read-text-from-image-multi-language.md)
- [Extracting structured data from documents](/guides/python/extraction/extract-structured-data.md)
- [Generating image descriptions using Claude](/guides/python/extraction/describe-image-with-claude.md)
- [Extracting data from images using vision language models](/guides/python/extraction/extract-data-from-image-vlm.md)
- [Generating image descriptions using OpenAI](/guides/python/extraction/describe-image-with-openai.md)
- [Extracting text from images](/guides/python/extraction/read-text-from-image.md)
- [Generating image descriptions using local AI](/guides/python/extraction/describe-image-with-local-ai.md)
- [Nutrient Python SDK extraction guides](/guides/python/extraction.md)
- [Applying OCR to a PDF document](/guides/python/extraction/apply-ocr-to-pdf.md)
- [Extracting form fields from images](/guides/python/extraction/extract-form-fields-from-image.md)
- [Extracting data from images using OCR](/guides/python/extraction/extract-data-from-image-ocr.md)
- [Applying OCR to a PDF page](/guides/python/extraction/apply-ocr-to-pdf-page.md)
- [Labeling form fields with a vision language model](/guides/python/extraction/label-form-fields-with-vlm.md)
- [Extracting structured JSON data from PDF documents](/guides/python/extraction/json-data-extraction.md)
- [Extracting data from images using ICR](/guides/python/extraction/extract-data-from-image-icr.md)