Nutrient Python SDK
PDF-TO-OFFICE CONVERSION
```python
from nutrient_sdk import Document
from nutrient_sdk import NutrientException


def main():
    try:
        # The context manager releases the document even if the export raises
        with Document.open("input.pdf") as document:
            document.export_as_word("output.docx")
            print("Successfully converted to output.docx")
    except NutrientException as e:
        # SDK-specific errors surface as NutrientException
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
```

```python
from nutrient_sdk import Document
from nutrient_sdk import NutrientException


def main():
    try:
        # Tables in the PDF are written to an Excel workbook
        with Document.open("input_table.pdf") as document:
            document.export_as_spreadsheet("output.xlsx")
            print("Successfully converted to output.xlsx")
    except NutrientException as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
```

```python
from nutrient_sdk import Document
from nutrient_sdk import NutrientException


def main():
    try:
        # Each PDF page becomes a PowerPoint slide
        with Document.open("input.pdf") as document:
            document.export_as_presentation("output.pptx")
            print("Successfully converted to output.pptx")
    except NutrientException as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
```

USE CASES
Contracts, reports, and proposals locked in PDFs need to go back to Word or PowerPoint for revision. The SDK converts PDF to DOCX and PPTX so teams can edit content natively.
Financial statements, inventory reports, and data tables trapped in PDFs need to reach spreadsheets. Convert PDF to Excel and preserve tabular structure for immediate analysis.
PDF slide decks need to go back to PowerPoint for editing and collaboration. The SDK converts PDF pages to PPTX slides so teams can update and present content directly.
Scanned documents and image-based PDFs contain valuable data locked in pixels. Use the built-in OCR engine to extract text and images for search indexing, NLP, or downstream processing.
Convert PDF to DOCX in Python. The SDK parses PDF content and exports editable Word documents. Method: export_as_word()
Convert PDF to XLSX in Python. The SDK extracts tabular data and exports it as an Excel spreadsheet. Method: export_as_spreadsheet()
Convert PDF to PPTX in Python. The SDK converts each PDF page into a slide and exports an editable presentation. Method: export_as_presentation()
ADVANCED CAPABILITIES
The SDK handles more than one-off conversions. Build PDF export into automated workflows, extract structured data with OCR, and deploy anywhere Python runs.
Use the built-in OCR engine to extract text from image-based PDFs and output JSON with word-level bounding boxes — ready for search indexing or NLP pipelines.
Process image-based PDFs and scanned documents with the Vision API. Extract embedded content for downstream processing and analysis.
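The OCR output is described here only as JSON with word-level bounding boxes. The sketch below therefore assumes a hypothetical schema: a pages list whose entries carry text and bbox fields. Rename the keys to match the JSON the SDK actually returns. The sketch shows how word positions can feed a simple inverted index for search.

```python
import json
from collections import defaultdict


def build_word_index(ocr_json: str) -> dict[str, list[tuple[int, list[float]]]]:
    """Map each lowercased word to the (page, bbox) positions where it occurs.

    HYPOTHETICAL schema (not confirmed by the SDK docs):
    {"pages": [{"page": int, "words": [{"text": str, "bbox": [x0, y0, x1, y1]}]}]}
    """
    data = json.loads(ocr_json)
    index: dict[str, list[tuple[int, list[float]]]] = defaultdict(list)
    for page in data.get("pages", []):
        for word in page.get("words", []):
            token = word["text"].strip().lower()
            if token:  # skip empty OCR tokens
                index[token].append((page["page"], word["bbox"]))
    return dict(index)


sample = json.dumps({
    "pages": [
        {"page": 1, "words": [
            {"text": "Invoice", "bbox": [72, 700, 130, 715]},
            {"text": "total", "bbox": [72, 400, 105, 412]},
        ]},
        {"page": 2, "words": [
            {"text": "Total", "bbox": [80, 300, 115, 312]},
        ]},
    ]
})

index = build_word_index(sample)
print(index["total"])  # [(1, [72, 400, 105, 412]), (2, [80, 300, 115, 312])]
```

Punctuation stripping and stemming are left out to keep the example short. A real indexer would normalize tokens further before storing them.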
Convert multiple PDFs in a loop or with Python’s concurrency tools. Each export follows the same pattern: open and export.
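Here is a minimal batch sketch built on the standard library's ThreadPoolExecutor:

- convert_to_word reuses the open-and-export pattern from the examples above.
- convert_batch and its parameters are illustrative names, not part of the SDK.
- This page does not say whether the SDK is thread-safe, so concurrent use is an assumption. Set max_workers=1, or switch to ProcessPoolExecutor, if you see problems.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable


def convert_to_word(src: Path, out_dir: Path) -> Path:
    """Convert one PDF to DOCX using the open-and-export pattern."""
    # Imported lazily so convert_batch can be reused (and tested) without the SDK.
    from nutrient_sdk import Document

    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / f"{src.stem}.docx"
    with Document.open(str(src)) as document:
        document.export_as_word(str(dst))
    return dst


def convert_batch(
    sources: list[Path],
    out_dir: Path,
    convert: Callable[[Path, Path], Path] = convert_to_word,
    max_workers: int = 4,
) -> tuple[list[Path], dict[Path, Exception]]:
    """Run `convert` on every source; return sorted outputs and per-file errors."""
    done: list[Path] = []
    failed: dict[Path, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(convert, src, out_dir): src for src in sources}
        for future in as_completed(futures):
            src = futures[future]
            try:
                done.append(future.result())
            except Exception as exc:  # one bad file should not stop the batch
                failed[src] = exc
    return sorted(done), failed
```

For example, done, failed = convert_batch(sorted(Path("pdfs").glob("*.pdf")), Path("docx")) converts every PDF in a folder. It returns the failures instead of stopping at the first one.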
No GUI dependencies — run PDF-to-Office conversions in background jobs, cron tasks, or API handlers on any server.
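Because conversion needs no GUI, a small command-line wrapper is enough for cron or a job queue. This sketch picks the export method from the output file's extension, using the three methods named on this page. The script layout, the EXPORTERS mapping, and the exit codes are illustrative assumptions.

```python
import argparse
import sys
from pathlib import Path

# Output extension -> SDK export method (method names as shown on this page).
EXPORTERS = {
    ".docx": "export_as_word",
    ".xlsx": "export_as_spreadsheet",
    ".pptx": "export_as_presentation",
}


def export_pdf(src: str, dst: str) -> None:
    from nutrient_sdk import Document  # lazy import keeps main() testable

    method = EXPORTERS[Path(dst).suffix.lower()]
    with Document.open(src) as document:
        getattr(document, method)(dst)


def main(argv=None, export=export_pdf) -> int:
    """Parse arguments, run one conversion, and return a process exit code."""
    parser = argparse.ArgumentParser(description="Convert a PDF to DOCX, XLSX, or PPTX.")
    parser.add_argument("input", help="path to the source PDF")
    parser.add_argument("output", help="output path ending in .docx, .xlsx, or .pptx")
    args = parser.parse_args(argv)

    if Path(args.output).suffix.lower() not in EXPORTERS:
        print(f"Unsupported output type: {args.output}", file=sys.stderr)
        return 2
    try:
        export(args.input, args.output)
    except Exception as exc:  # broad on purpose: includes NutrientException
        print(f"Error: {exc}", file=sys.stderr)
        return 1
    return 0
```

To run this as a script, end the file with the usual guard: if __name__ == "__main__": sys.exit(main()). If the script were saved as convert_pdf.py (a hypothetical name), a crontab entry could run python convert_pdf.py /data/in/report.pdf /data/out/report.docx. A nonzero exit status would then signal failure.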
Install the Nutrient Python SDK, then open your PDF with Document.open('input.pdf') and call document.export_as_word('output.docx'). The SDK parses the PDF content and exports it as an editable Word document; no Microsoft Office installation is required. See the PDF-to-Word guide for a complete working example.
Open your PDF with Document.open('input_table.pdf') and call document.export_as_spreadsheet('output.xlsx'). The SDK extracts tabular data from the PDF and writes it to an Excel spreadsheet, preserving cell alignment and formatting. See the PDF-to-Excel guide for a step-by-step example.
Use Document.open('input.pdf') and call document.export_as_presentation('output.pptx'). The SDK converts each PDF page into a PowerPoint slide, handling text, images, and layout extraction automatically. See the PDF-to-PowerPoint guide for full details.
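To spot-check the page-to-slide mapping in tests, you can count the slide parts in the PPTX package. A PPTX file is a ZIP archive, so the standard library is enough. This is a heuristic sketch with two limits:

- It counts ppt/slides/slideN.xml parts rather than parsing presentation.xml.
- It assumes you already know the source PDF's page count. No SDK page-count call is assumed here.

```python
import re
import zipfile

# Slide parts only; excludes relationship parts such as ppt/slides/_rels/slide1.xml.rels
_SLIDE_PART = re.compile(r"ppt/slides/slide\d+\.xml")


def count_slides(pptx_path) -> int:
    """Count slide parts in a PPTX package (path or binary file object)."""
    with zipfile.ZipFile(pptx_path) as package:
        return sum(1 for name in package.namelist() if _SLIDE_PART.fullmatch(name))


def check_page_to_slide(pptx_path, expected_pages: int) -> None:
    """Raise ValueError if the slide count differs from the expected page count."""
    slides = count_slides(pptx_path)
    if slides != expected_pages:
        raise ValueError(f"expected {expected_pages} slides, found {slides}")
```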
For PDFs with selectable text, export to DOCX or HTML and extract the text content. For scanned or image-based PDFs, use the built-in OCR engine: configure VisionEngine.OCR, create a vision instance, and call vision.extract_content() to get structured JSON output with word-level text. See the OCR extraction guide.
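For the DOCX route, the exported file is a ZIP package whose body text lives in word/document.xml, so the standard library is enough to pull it out. This minimal sketch joins the w:t text runs of each paragraph. It ignores tabs, line breaks, and other layout elements, and text inside text boxes may appear twice.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(docx_path) -> str:
    """Return the body text of a DOCX, one line per paragraph."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    lines = []
    for para in root.iter(f"{W}p"):
        # A paragraph's text is split across runs; each <w:t> holds one fragment.
        lines.append("".join(t.text or "" for t in para.iter(f"{W}t")))
    return "\n".join(lines)
```

Calling docx_text("output.docx") after export_as_word returns plain text that you can pass to a search index or NLP step.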
The SDK’s Vision API processes image-based PDFs and scanned documents. Open the document, set the vision engine to VisionEngine.OCR, and call vision.extract_content() to extract embedded content as structured JSON with bounding box coordinates. See the image extraction guide.
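Bounding boxes make it possible to read a specific area of a scanned page, such as the total field on an invoice. The JSON layout and coordinate system below are hypothetical assumptions: a pages/words/bbox structure, with the origin at the top-left and y growing downward. Check the real OCR output and adjust both before relying on the sketch.

```python
import json


def words_in_region(ocr_json: str, page: int, region: tuple[float, float, float, float]) -> str:
    """Join the words whose bounding-box center lies inside `region` on `page`.

    HYPOTHETICAL schema: {"pages": [{"page": n, "words": [{"text": ..., "bbox": [x0, y0, x1, y1]}]}]}
    ASSUMED coordinates: origin at top-left, y grows downward.
    """
    rx0, ry0, rx1, ry1 = region
    hits = []
    for p in json.loads(ocr_json).get("pages", []):
        if p.get("page") != page:
            continue
        for w in p.get("words", []):
            x0, y0, x1, y1 = w["bbox"]
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
            if rx0 <= cx <= rx1 and ry0 <= cy <= ry1:
                hits.append((y0, x0, w["text"]))
    # Rough reading order: top-to-bottom, then left-to-right.
    return " ".join(text for _, _, text in sorted(hits))
```

Using the word center rather than full containment tolerates boxes that slightly overlap the region edge. Sorting by the top edge is only a rough reading order and can misplace words on slightly skewed scans.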
For tabular data extraction, use Nutrient Document Converter Services with the ExtractTables API to export PDF tables as structured JSON. For text extraction from scanned PDFs, use the Python SDK’s Vision API with VisionEngine.OCR — it outputs JSON with word-level bounding boxes and position coordinates.
Microsoft Office is not required. The Nutrient Python SDK handles PDF parsing, content extraction, and Office format generation internally. It converts PDF to DOCX, XLSX, and PPTX without Office on the system, which suits Linux servers, Docker containers, and CI/CD pipelines where Office cannot be installed.
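On servers and in CI, a cheap smoke test can confirm that each converted file is at least a structurally plausible Office package. This sketch checks for three things:

- the ZIP container;
- the [Content_Types].xml part;
- the conventional main part for each format.

It does not validate the content. The MAIN_PART mapping reflects typical file layouts rather than a guarantee from the SDK.

```python
import zipfile
from pathlib import Path

# Conventional main-part locations. Strictly, the main part is whatever
# _rels/.rels points to, but Office-style writers normally use these paths.
MAIN_PART = {
    ".docx": "word/document.xml",
    ".xlsx": "xl/workbook.xml",
    ".pptx": "ppt/presentation.xml",
}


def looks_like_ooxml(path) -> bool:
    """Cheap structural check that a converted file is a plausible OOXML package."""
    path = Path(path)
    main_part = MAIN_PART.get(path.suffix.lower())
    if main_part is None or not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
    return "[Content_Types].xml" in names and main_part in names
```

In a pytest suite, you could call this right after the export step, for example assert looks_like_ooxml(tmp_path / "output.docx").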
Wrap conversion calls in a try-except block and catch NutrientException for SDK-specific errors. Use the context manager syntax (with Document.open(...) as document:) to ensure automatic resource cleanup, even when errors occur. Each export method follows the same error handling pattern.
FREE TRIAL
Start converting PDFs to Word, Excel, and PowerPoint in Python in minutes — no payment information required.