Nutrient Python SDK

Convert PDF to Word, Excel, and PowerPoint in Python

  • Convert PDF to DOCX, XLSX, and PPTX — no Microsoft Office installation required
  • Extract tabular data from PDFs to Excel for analysis and reporting
  • Server-ready with no GUI dependencies — runs on Linux, Docker, and CI/CD pipelines
  • Extract text and images from PDFs using built-in OCR


PDF-TO-OFFICE CONVERSION

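from nutrient_sdk import Document
from nutrient_sdk import NutrientException


def main():
    try:
        # Open the PDF; the context manager releases resources on exit
        with Document.open("input.pdf") as document:
            # Export the parsed PDF content as an editable Word document
            document.export_as_word("output.docx")
            print("Successfully converted to output.docx")
    except NutrientException as e:
        # SDK-specific errors are raised as NutrientException
        print(f"Error: {e}")


if __name__ == "__main__":
    main()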

USE CASES

When developers reach for this SDK

Make PDFs editable again

Contracts, reports, and proposals locked in PDFs need to go back to Word or PowerPoint for revision. The SDK converts PDF to DOCX and PPTX so teams can edit content natively.

Extract PDF tables to Excel for analysis

Financial statements, inventory reports, and data tables trapped in PDFs need to reach spreadsheets. Convert PDF to Excel and preserve tabular structure for immediate analysis.

Repurpose presentations and slide decks

PDF slide decks need to go back to PowerPoint for editing and collaboration. The SDK converts PDF pages to PPTX slides so teams can update and present content directly.

Extract text and images from scanned PDFs

Scanned documents and image-based PDFs contain valuable data locked in pixels. Use the built-in OCR engine to extract text and images for search indexing, NLP, or downstream processing.

Every PDF export you need in Python

PDF to Word

Convert PDF to DOCX in Python. The SDK parses PDF content and exports editable Word documents.


  • Extracts text, layout, and formatting into DOCX
  • Handles fonts and document structure automatically
  • One method call: export_as_word()

PDF to Excel

Convert PDF to XLSX in Python. The SDK extracts tabular data and exports it as an Excel spreadsheet.


  • Parses PDF table structures into spreadsheet cells
  • Preserves cell alignment and formatting
  • One method call: export_as_spreadsheet()

PDF to PowerPoint

Convert PDF to PPTX in Python. The SDK extracts slide content and exports editable presentations.


  • Converts each PDF page to a PowerPoint slide
  • Handles text, images, and slide layouts
  • One method call: export_as_presentation()
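Because all three exports follow the same open-and-export pattern, a small wrapper can choose the method from the target file extension. This is a sketch that relies only on `Document.open` and the three method names listed on this page; the `export_pdf` helper and its `opener` parameter are hypothetical, with `opener` added so the dispatch logic can be exercised without the SDK installed.

```python
from pathlib import Path

# Map each Office extension to the SDK export method named on this page.
EXPORT_METHODS = {
    ".docx": "export_as_word",
    ".xlsx": "export_as_spreadsheet",
    ".pptx": "export_as_presentation",
}


def export_pdf(src, dst, opener=None):
    """Export the PDF at `src` to `dst`, picking the format from dst's suffix.

    `opener` is a hypothetical testing hook; by default it uses
    nutrient_sdk.Document.open, as in the conversion example on this page.
    """
    dst = Path(dst)
    method_name = EXPORT_METHODS.get(dst.suffix.lower())
    if method_name is None:
        raise ValueError(f"Unsupported output format: {dst.suffix!r}")
    if opener is None:
        # Imported lazily so the dispatch logic can be tested without the SDK.
        from nutrient_sdk import Document
        opener = Document.open
    with opener(str(src)) as document:
        # Call export_as_word / export_as_spreadsheet / export_as_presentation.
        getattr(document, method_name)(str(dst))
    return dst
```

Usage: `export_pdf("input_table.pdf", "output.xlsx")` calls `export_as_spreadsheet`, and an unsupported extension fails fast before any file is opened.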


ADVANCED CAPABILITIES

Beyond basic PDF export

The SDK handles more than one-off conversions. Build PDF export into automated workflows, extract structured data with OCR, and deploy anywhere Python runs.

Extract text from scanned PDFs

Use the built-in OCR engine to extract text from image-based PDFs and output JSON with word-level bounding boxes — ready for search indexing or NLP pipelines.
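Word-level bounding boxes let you rebuild reading order yourself, for example to produce plain text lines for a search index. The sketch below assumes each OCR word is a dict with a `text` key and a `bbox` of `[x, y, width, height]` with y increasing down the page; those field names and that layout are hypothetical, so map them to the actual JSON returned by `vision.extract_content()` before use.

```python
def words_to_lines(words, y_tolerance=5.0):
    """Group OCR words into text lines, top-to-bottom and left-to-right.

    Assumes each word looks like {"text": str, "bbox": [x, y, w, h]}
    (hypothetical schema) with y increasing down the page.
    """
    def center_y(word):
        _, y, _, h = word["bbox"]
        return y + h / 2

    # Sort by vertical center first, then by left edge.
    ordered = sorted(words, key=lambda w: (center_y(w), w["bbox"][0]))
    lines = []  # each entry: [reference_center_y, list_of_words]
    for word in ordered:
        cy = center_y(word)
        # Words whose centers fall within the tolerance share a line.
        if lines and abs(cy - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(word)
        else:
            lines.append([cy, [word]])
    # Within each line, order words by their left edge and join with spaces.
    return [
        " ".join(w["text"] for w in sorted(group, key=lambda w: w["bbox"][0]))
        for _, group in lines
    ]
```

The fixed `y_tolerance` is a simplification; skewed scans or mixed font sizes may need a tolerance scaled to word height.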


Extract images from PDFs

Process image-based PDFs and scanned documents with the SDK's Vision API, and extract embedded content for downstream processing and analysis.


Batch PDF export

Convert multiple PDFs in a loop or with Python’s concurrency tools. Each export follows the same pattern: open and export.
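A batch run can be sketched with `concurrent.futures` from the standard library. This assumes the SDK tolerates concurrent calls from multiple threads, which this page does not state; if it does not, pass `max_workers=1` or switch to `ProcessPoolExecutor`. The `batch_export` helper and its `convert` parameter are hypothetical; the default converter is the same open-and-export pattern shown earlier.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def _sdk_export_word(src, dst):
    # Same open-and-export pattern as the single-file example.
    from nutrient_sdk import Document
    with Document.open(str(src)) as document:
        document.export_as_word(str(dst))


def batch_export(pdf_paths, out_dir, convert=_sdk_export_word, max_workers=4):
    """Convert each PDF to DOCX in out_dir; return {pdf_path: error or None}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def run(src):
        src = Path(src)
        try:
            convert(src, out_dir / (src.stem + ".docx"))
            return src, None
        except Exception as exc:  # record per-file failures instead of aborting
            return src, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # dict() drains the iterator before the pool shuts down.
        return dict(pool.map(run, pdf_paths))
```

Collecting errors per file means one corrupt PDF does not stop the rest of the batch; inspect the returned dict to retry or report failures.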


Server-side processing

No GUI dependencies — run PDF-to-Office conversions in background jobs, cron tasks, or API handlers on any server.
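In an API handler or background job, the input often arrives as bytes rather than a file. A minimal sketch, assuming the SDK reads from and writes to file paths as in the examples on this page, stages each request in its own temporary directory so concurrent jobs never collide on file names. The `pdf_bytes_to_docx_bytes` name and its `convert` hook are hypothetical.

```python
import tempfile
from pathlib import Path


def pdf_bytes_to_docx_bytes(pdf_bytes, convert=None):
    """Convert uploaded PDF bytes to DOCX bytes inside a private temp dir."""
    if convert is None:
        # Default: the open-and-export pattern from this page.
        from nutrient_sdk import Document

        def convert(src, dst):
            with Document.open(str(src)) as document:
                document.export_as_word(str(dst))

    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "input.pdf"
        dst = Path(workdir) / "output.docx"
        src.write_bytes(pdf_bytes)
        convert(src, dst)
        # Read the result before the directory is deleted on exit.
        return dst.read_bytes()
```

The returned bytes can be sent as an HTTP response body or stored elsewhere; the temporary files are cleaned up even if the conversion raises.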


Frequently asked questions

How do I convert PDF to Word (DOCX) in Python?

Install the Nutrient Python SDK. Then open your PDF with Document.open('input.pdf') and call document.export_as_word('output.docx'). The SDK parses the PDF content and exports it as an editable Word document; no Microsoft Office installation is required. See the PDF-to-Word guide for a complete working example.

How do I convert PDF to Excel in Python?

Open your PDF with Document.open('input_table.pdf') and call document.export_as_spreadsheet('output.xlsx'). The SDK extracts tabular data from the PDF and writes it to an Excel spreadsheet, preserving cell alignment and formatting. See the PDF-to-Excel guide for a step-by-step example.

How do I convert PDF to PowerPoint using the Python SDK?

Use Document.open('input.pdf') and call document.export_as_presentation('output.pptx'). The SDK converts each PDF page into a PowerPoint slide, handling text, images, and layout extraction automatically. See the PDF-to-PowerPoint guide for full details.

How do I convert PDF to text in Python?

For PDFs with selectable text, export to DOCX or HTML and extract the text content. For scanned or image-based PDFs, use the built-in OCR engine: configure VisionEngine.OCR, create a vision instance, and call vision.extract_content() to get structured JSON output with word-level text. See the OCR extraction guide.
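For the selectable-text route, a DOCX file is a standard Office Open XML package, so its body text can be read with the standard library alone: the body lives in `word/document.xml`, paragraphs are `w:p` elements, and text runs are `w:t` elements. This simplified sketch, with a hypothetical `docx_text` helper, ignores headers, footers, and footnotes, which are stored in separate parts of the package.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by DOCX documents.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a DOCX file, one line per paragraph."""
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Concatenate every text run (w:t) inside the paragraph.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

Usage: after `document.export_as_word("output.docx")`, call `docx_text("output.docx")` to get plain text for indexing.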

How do I extract images from a PDF in Python?

The SDK’s Vision API processes image-based PDFs and scanned documents. Open the document, set the vision engine to VisionEngine.OCR, and call vision.extract_content() to extract embedded content as structured JSON with bounding box coordinates. See the image extraction guide.

How do I convert PDF to JSON in Python?

For tabular data extraction, use Nutrient Document Converter Services with the ExtractTables API to export PDF tables as structured JSON. For text extraction from scanned PDFs, use the Python SDK’s Vision API with VisionEngine.OCR — it outputs JSON with word-level bounding boxes and position coordinates.

Do I need Microsoft Office installed to convert PDFs to Office formats?

No. The Nutrient Python SDK handles PDF parsing, content extraction, and Office format generation internally. It converts PDF to DOCX, XLSX, and PPTX without requiring Microsoft Office on the system — ideal for Linux servers, Docker containers, and CI/CD pipelines where Office cannot be installed.

How do I handle errors during PDF export?

Wrap conversion calls in a try-except block and catch NutrientException for SDK-specific errors. Use the context manager syntax (with Document.open(...) as document:) to ensure automatic resource cleanup, even when errors occur. Each export method follows the same error handling pattern.
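The same pattern can be extended so a failed export never leaves a half-written file at the destination: write to a temporary name next to the target, then move it into place only on success. `os.replace` is atomic when source and target are on the same filesystem. The `safe_export` helper and its `convert` hook are hypothetical, and the sketch assumes the export methods accept any output path that keeps the right extension.

```python
import os
from pathlib import Path


def safe_export(src, dst, convert=None):
    """Export src to dst via a temporary file; leave dst untouched on failure."""
    dst = Path(dst)
    # Keep the real extension on the temp name in case the exporter inspects it.
    tmp = dst.with_name(dst.stem + ".partial" + dst.suffix)
    if convert is None:
        from nutrient_sdk import Document

        def convert(src_path, dst_path):
            with Document.open(str(src_path)) as document:
                document.export_as_word(str(dst_path))

    try:
        convert(src, tmp)
        os.replace(tmp, dst)  # atomic on the same filesystem
    except BaseException:
        # Whatever the error (NutrientException included): remove partial output, re-raise.
        tmp.unlink(missing_ok=True)
        raise
    return dst
```

Callers still catch NutrientException around `safe_export(...)` exactly as in the conversion example; the helper only guarantees that the destination holds either a complete file or nothing.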

FREE TRIAL

Ready to get started?

Start converting PDFs to Word, Excel, and PowerPoint in Python in minutes — no payment information required.