Process PDF, Office, and image documents at scale — entirely in Python

Work with PDFs and convert Word, Excel, PowerPoint, and image files. Annotate, fill forms, sign, extract text with OCR, extract structured data, and redact sensitive information. One SDK replaces a patchwork of libraries, with Pythonic APIs and efficient processing that keep your data pipelines running smoothly.



Automatic file type handling

Drop in any supported file — Word, Excel, PowerPoint, HTML, Markdown, images, and more — and convert bidirectionally between PDF and Office formats with high fidelity.

Proven core

Runs on the battle-tested engine behind the Nutrient Web and Document Engine SDKs, trusted by thousands of enterprises worldwide for reliable document processing.

Server-optimized performance

Process thousands of documents efficiently with multithreaded batch operations, low memory footprint, and no external dependencies — perfect for data pipelines, automation workflows, and containerized deployments.

Enterprise-ready security

Built-in digital signatures, AES-256 encryption, irreversible redaction for GDPR/HIPAA compliance, and PDF/A archiving — meet audit requirements without stitching together multiple libraries.


Complete document processing toolkit built for modern Python

From file conversions and OCR to forms, signatures, and redaction — everything lives in one SDK. No extra servers, CLI tools, or library juggling.

Python capabilities

Bidirectional conversion

Convert Word, Excel, PowerPoint, HTML, images, and CAD files to PDF — and convert PDFs back to Office formats. High-fidelity conversion preserves layouts, fonts, and formatting for seamless content repurposing.

Template-based automation

Generate thousands of personalized documents from Word templates and JSON data. Automate contracts, invoices, reports, and letters at scale with consistent formatting — no manual assembly required.

Annotations and collaboration

Add comments, highlights, stamps, shapes, and file attachments to PDFs. Enable document review workflows and markup blueprints programmatically within your Python application.

Forms and data collection

Create fillable forms, extract submitted data, and automate batch form filling from databases. Process applications, surveys, and registration documents programmatically with field-level control.

Digital signatures

Apply electronic signatures and certificate-based digital signatures to PDFs. Authenticate documents, ensure integrity, and meet legal compliance requirements for secure remote signing workflows.

OCR and text extraction

Convert scanned documents and images into searchable PDFs. Extract text from 100+ file types in 100+ languages with automated preprocessing for skew correction and noise removal.

Redaction and privacy

Permanently remove sensitive content with zone-based redaction. Specify areas to redact — content is removed from the file structure for GDPR and HIPAA compliance, not just covered with black boxes.

Data extraction

Extract structured data from invoices, receipts, bank statements, and forms. Key-value pair detection identifies dates, amounts, addresses, and more — export to JSON for seamless integration with your data pipelines.

Minimal code, powerful results

Most operations take 3–4 lines of code. Add Nutrient to your project in minutes, and then extend or customize every step with a clean, Pythonic API and context managers.

from nutrient_sdk import Document
from nutrient_sdk.exceptions import NutrientException

try:
    # Open the Word file and export it as a PDF.
    with Document.open("input.docx") as document:
        document.export_as_pdf("output.pdf")
except NutrientException as e:
    print(f"Error: {e}")
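The snippet above converts a single file. For a folder of files, one possible shape for a batch run is sketched below using only the Python standard library. The names `convert_batch` and `convert_one` are hypothetical and not part of the SDK, and whether the SDK can safely be called from several threads at once is an assumption to verify against its documentation.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def convert_batch(
    sources: Iterable[Path],
    out_dir: Path,
    convert_one: Callable[[Path, Path], None],  # hypothetical hook, supplied by the caller
    max_workers: int = 4,
) -> Dict[Path, Optional[Exception]]:
    """Call convert_one(source, target) for every source file.

    Returns a mapping of source path to None on success,
    or to the exception that conversion raised.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    sources = list(sources)

    def run(source: Path) -> Optional[Exception]:
        target = out_dir / (source.stem + ".pdf")
        try:
            convert_one(source, target)
            return None
        except Exception as exc:  # one bad file should not stop the batch
            return exc

    # pool.map keeps results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run, sources))
    return dict(zip(sources, results))
```

With the SDK, `convert_one` would wrap the `Document.open(...)` and `export_as_pdf(...)` calls from the snippet above. Collecting exceptions per file, rather than raising on the first failure, lets a long batch finish and report which inputs need attention.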

Reasons to build with Nutrient

Nutrient SDKs and Cloud APIs add full document lifecycle support to any platform, tech stack, or infrastructure in minutes. The same technology meets Fortune 500 requirements while helping startups ship fast.

Ready for context engineering

Clean documentation, drop-in code, and MCP hooks for both hands-on developers and AI agents.

Build for and deploy anywhere

Web, mobile, desktop, server, or Nutrient Cloud — with no lock-in.

Secure and accessible

SOC 2 Type 2-compliant workflows with enterprise-grade security and encryption.

AI-first document workflows

Built-in document AI with support for leading LLMs and their private implementations.



PROVEN AT SCALE

Trusted by the brands that move the world


The digital arm of Germany’s national railway digitizes millions of track maintenance blueprints with the Nutrient PDF SDK, keeping 40,000 trains rolling each day.


Governance portal trusted by 2,000+ boards in 30 countries embeds Nutrient Web SDK to enable in‑portal annotations and cross‑device continuity, achieving 80 percent user engagement.


Rolled out nationwide PAdES-compliant signatures with the Nutrient PDF SDK, letting every Austrian citizen sign official documents securely in seconds.


FREE TRIAL

Ready to get started?

Start building with our Python SDK in minutes — no payment information required.


Python PDF libraries

What are the advantages?

Integrating PDF functionality into Python applications can significantly enhance document management capabilities. This section will explore the essentials of Python PDF libraries to guide you through this integration.

What is a Python PDF library?

A Python PDF library is a set of APIs enabling developers to process, convert, and manage documents within Python applications. These libraries provide functionality such as converting between PDF and Office formats, automating document generation from templates, adding annotations and forms, applying digital signatures, extracting text with OCR, redacting sensitive information, and extracting structured data.

How to choose the right Python PDF library

Selecting the right Python PDF library depends on your specific document processing needs. Consider the following factors:

  • All-in-one vs. fragmented — Avoid combining multiple libraries (PyPDF2, OCRmyPDF, endesive) when one comprehensive solution can handle conversion, OCR, forms, signatures, and redaction.
  • Enterprise compliance — Ensure support for digital signatures, PII redaction (GDPR, HIPAA), and PDF/A archiving standards.
  • Advanced features — Look beyond basic manipulation to annotations, form filling, OCR with 100+ languages, and AI-powered data extraction.
  • Performance and scalability — Evaluate optimization for high-volume batch processing, memory efficiency, and enterprise workloads.
  • Developer experience — Pythonic APIs, minimal code requirements, documentation quality, and integration effort matter for velocity.

What are the benefits of using Nutrient’s Python PDF library?

Nutrient Python SDK offers unique advantages for comprehensive document processing:

  • All-in-one solution — Replaces fragmented open source libraries with one SDK handling conversion, OCR, forms, signatures, annotations, and redaction without dependency juggling.
  • AI-powered data extraction — Automatically identify and extract structured data from invoices, receipts, and forms. OCR recognizes text in 100+ languages with preprocessing for skew and noise removal.
  • Enterprise compliance built in — Digital signatures with certificates, GDPR/HIPAA-compliant PII redaction, and PDF/A archiving integrated at the core.
  • Minimal code required — Most operations take just 3–4 lines of Python code with Pythonic context managers and intuitive APIs that feel native to Python workflows.
  • Battle-tested at scale — Proven technology from 20+ years of development, trusted by Fortune 500 companies for high-volume batch processing and mission-critical document workflows.

How does Nutrient’s Python PDF library compare to other solutions?

Nutrient Python SDK differentiates itself through comprehensive capabilities in one package. While open source libraries require combining PyPDF2 (manipulation), OCRmyPDF (text recognition), endesive (signatures), and custom scripts (redaction), Nutrient delivers all features in a unified package — bidirectional Office conversion, template automation, annotations, forms, digital signatures, OCR, redaction, data extraction, and PDF/A archiving.

The SDK installs with pip or Poetry and works within Django, Flask, and FastAPI applications. Unlike fragmented open source stacks, Nutrient provides enterprise-grade accuracy, performance optimization for batch processing, and dedicated support. Developers implement sophisticated document workflows in hours instead of weeks, meeting requirements for security, privacy compliance, and scalability without library incompatibility headaches.


Frequently asked questions

What makes Nutrient Python SDK different from other Python PDF libraries?

Nutrient Python SDK stands out as an all-in-one solution replacing fragmented open source libraries. Instead of combining PyPDF2 (manipulation), OCRmyPDF (text recognition), endesive (signatures), and custom redaction scripts, Nutrient delivers comprehensive capabilities in one package — bidirectional Office conversion, annotations, form filling, digital signatures, OCR, redaction, data extraction, template automation, and PDF/A archiving.

Enterprise features set Nutrient apart: certificate-based digital signatures for legal compliance, zone-based redaction that permanently removes content from the file structure for GDPR/HIPAA compliance, OCR supporting 100+ languages with preprocessing, AI-powered data extraction from invoices and forms, and form creation/extraction for automated workflows. The clean Pythonic API requires just 3–4 lines of code for most operations — no dependency juggling or library incompatibility headaches.

How does Nutrient Python SDK ensure high-fidelity document conversion?

The SDK uses proven document processing technology built on more than 20 years of development, which also powers Nutrient’s .NET SDK (formerly GdPicture) and Document Engine products. This ensures accurate format conversion that preserves complex layouts, formatting, fonts, and document structure across all supported formats — Word, Excel, PowerPoint, PDF, HTML, and Markdown.

What document processing capabilities does Nutrient Python SDK provide?

Nutrient Python SDK offers comprehensive document processing in one package. Convert bidirectionally between PDF and Office formats (Word, Excel, PowerPoint, HTML), merge different file types into unified PDFs, and generate documents from Word templates with JSON data. Add annotations (comments, highlights, stamps, shapes) for document review workflows, create and fill forms programmatically, and extract form data for database integration.

Apply electronic and certificate-based digital signatures for legal compliance and document authentication. Convert scanned documents and images to searchable PDFs with OCR supporting 100+ languages and automatic preprocessing. Protect sensitive information with redaction that permanently removes PII like credit cards, SSNs, emails, and phone numbers from the file structure for GDPR/HIPAA compliance.

Extract structured data from invoices, receipts, bank statements, and forms using AI-powered key-value pair detection. All features support PDF/A archiving for long-term document preservation, making the SDK suitable for high-volume enterprise operations. Most operations require just 3–4 lines of Python code with intuitive APIs.

Is Nutrient Python SDK suitable for enterprise-level applications?

Absolutely. The SDK is designed to be safe, reliable, and scalable, making it well-suited for enterprise applications that require robust document processing. It’s trusted by industry leaders, and its digital signatures, AES-256 encryption, irreversible redaction, and PDF/A archiving help teams meet security and privacy requirements such as GDPR and HIPAA. The SDK is also optimized for memory usage and processing speed, which suits server applications that need to process thousands of files.

Does Nutrient Python SDK support OCR, redaction, and data extraction?

Yes. Nutrient Python SDK includes OCR that converts scanned documents and images into searchable PDFs with text recognition in 100+ languages. The OCR engine supports 100+ file types, automatic preprocessing (deskew, noise removal, line removal), and zonal recognition for specific document regions. Convert paper archives, photos, and scanned PDFs into fully searchable digital assets with automated batch processing.

For redaction, the SDK provides zone-based redaction where you specify the areas to permanently remove from documents. Redaction removes content from the file structure rather than just covering it with black boxes, which supports GDPR, HIPAA, and privacy compliance. You can search and redact with plain text or regex patterns, redact specific areas by coordinates, and batch process multiple documents.
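To make the idea of regex-based search concrete, here is a minimal sketch of the search step only, run over text that has already been extracted. It uses the standard `re` module and does not call the SDK. The `find_pii` function, the `Match` type, and the two patterns are illustrative assumptions. Mapping hits back to page areas and removing the content from the file is the SDK’s job and is not shown.

```python
import re
from typing import Dict, List, NamedTuple, Pattern


class Match(NamedTuple):
    kind: str   # which pattern matched, e.g. "email"
    start: int  # offset of the first matched character
    end: int    # offset just past the last matched character
    text: str   # the matched text itself


# Illustrative patterns only; real PII detection needs stricter, locale-aware rules.
PII_PATTERNS: Dict[str, Pattern[str]] = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> List[Match]:
    """Return every pattern hit in text, ordered by position."""
    hits = [
        Match(kind, m.start(), m.end(), m.group())
        for kind, pattern in PII_PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(hits, key=lambda hit: hit.start)
```

Returning character offsets alongside the matched text keeps the search result independent of any particular redaction API, so the same hits could be reviewed by a person before anything is removed.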

For data extraction, AI-powered key-value pair detection automatically identifies and extracts structured data from invoices, receipts, bank statements, and forms, covering 18+ data types such as dates, amounts, and addresses. Export extracted data to JSON, Excel, or Markdown.