Automate document conversion at scale with Python and Nutrient DCS
Nutrient Document Converter Services (DCS) is a cloud-based and self-hosted document conversion platform that transforms more than 100 file formats into pixel-perfect PDFs. Available via REST API or as a self-hosted Windows service, DCS handles Microsoft Office, InfoPath, CAD, HTML, and many other file types while preserving fonts, graphics, links, and metadata.
Key capabilities
- Format conversion — Transform 100+ file types to PDF.
- Post-processing — Watermark, merge, split, and compress in single jobs.
- OCR — Make scanned documents searchable and editable.
- Data extraction — Extract key-value pairs from PDFs and images.
- Security — Add encryption and create PDF/A archival formats.
This tutorial shows you how to integrate DCS with Python using the Zeep library to consume the SOAP web service, enabling you to automate document workflows at enterprise scale. This approach is particularly useful for Jupyter Notebook and data science workflows.
What you’ll learn
- How to set up DCS and Python with Visual Studio
- How to use the Zeep library to consume WSDL web services
- How to connect to DCS and create type factories for document processing
Setting up Document Converter Services (DCS)
DCS runs as a Windows service and can be invoked from any platform and language that supports web service calls.
Deployment options
- Separate server — Deploy on a dedicated system or virtual machine.
- Same server — Install on the same server that hosts your application.
DCS must be installed on a Windows-based machine.
Scaling options
- Scale up — Increase the number of parallel conversions.
- Scale out — Use multiple DCS installations behind HTTP load balancers.
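Scaling happens on the DCS side, but the calling code also has to submit work concurrently to benefit from it. The following is a minimal client-side sketch. It assumes a hypothetical convert_one callable that wraps whatever DCS call your application makes; it is a placeholder, not part of DCS or Zeep, and max_workers is an assumed client-side limit rather than a DCS setting.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_all(paths, convert_one, max_workers=4):
    """Run convert_one over paths concurrently and return results in input order.

    convert_one is a hypothetical callable supplied by the caller; max_workers
    is an assumed client-side limit, not a DCS setting.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input paths.
        return list(pool.map(convert_one, paths))


if __name__ == "__main__":
    # Stand-in for a real conversion call: it only derives the output name.
    def fake_convert(path):
        return path.rsplit(".", 1)[0] + ".pdf"

    print(convert_all(["a.docx", "b.xlsx"], fake_convert))
```

Keeping the conversion call behind a plain callable means the same helper works whether requests go to a single DCS installation or to a load balancer in front of several.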
For detailed installation and configuration steps, see our getting started guides.
Setting up Python
This example uses Visual Studio (2022) to build the Python project. Other Python development platforms on other operating systems will work just as well.
Installation steps
If you haven’t already installed Python support in Visual Studio, use the Visual Studio installer to download and install the Python workload:
- Start the Visual Studio installer and select your version of Visual Studio.
- Select the Python development workload and click Install.
- Once the installation is complete, verify it’s working:
  - Open a Python Interactive window using Alt+I.
  - Enter 3+2 and press Enter.
  - If it returns 5, the installation is successful.
Understanding WSDL and Zeep
Web Services Description Language (WSDL) is used by web services like DCS to describe their available functions and data structures. Think of it as an API documentation file that’s machine-readable, telling your code exactly how to communicate with the service.
The challenge
While languages like C# have built-in tools to automatically convert WSDL into usable code, Python doesn’t have a straightforward native solution.
The solution
The Zeep library bridges this gap by providing:
- Client creation — Automatically generates a client from the WSDL
- Type factories — Creates Python objects matching the web service’s data structures
- Easy integration — Simplifies SOAP web service consumption in Python
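As a rough sketch of client creation, the helper below builds the WSDL address and hands it to zeep.Client. The host, port, and path defaults mirror the local URL used in this tutorial and may differ in your installation; the function names build_wsdl_url and create_client are illustrative, not part of Zeep or DCS.

```python
def build_wsdl_url(host="localhost", port=41734):
    """Return the WSDL address for a DCS installation (assumed default path)."""
    return f"http://{host}:{port}/Muhimbi.DocumentConverter.WebService/?WSDL"


def create_client(wsdl_url):
    """Create a Zeep client; requires the zeep package and a reachable DCS."""
    # Imported lazily so build_wsdl_url stays usable without zeep installed.
    from zeep import Client

    return Client(wsdl_url)


if __name__ == "__main__":
    print(build_wsdl_url())
```

Calling create_client(build_wsdl_url()) downloads and parses the WSDL, so it only succeeds when the service is running and reachable from your machine.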
Setting up Zeep
Zeep is available from PyPI and GitHub, but for Visual Studio development, it’s best to use the built-in installation procedure.
Installation steps
- In Visual Studio, go to View > Other Windows > Python Environments to display the Python environments.
- Select your default environment for new Python packages.
- Click the dropdown menu (it starts out showing Overview) and select Packages (PyPI).
- Enter zeep in the search box.
- Run pip install zeep.
You can install Zeep in any environment; Visual Studio is shown here only as an example.
This installs Zeep and any required dependencies. After installation completes, the Python Environments window will show the Zeep package in the available packages list.
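To confirm the package is importable from the environment you intend to use, a quick check along these lines can help. It relies only on the standard library; is_installed is a helper name chosen for this sketch.

```python
import importlib.util


def is_installed(package_name):
    """Return True if package_name can be imported in the current environment."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("zeep available:", is_installed("zeep"))
```

If this reports False inside Visual Studio, the package was probably installed into a different environment than the one the project is using.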
Accessing and using the WSDL
Now that Zeep is installed, you can inspect the DCS WSDL to understand the available services.
Inspecting the WSDL
Open a PowerShell terminal in Visual Studio: Tools > Command Line > PowerShell.
Run the Zeep inspection command. Change the URL if you’re using a different port or developing on another machine:
    py.exe -m zeep http://localhost:41734/Muhimbi.DocumentConverter.WebService/?WSDL
The URL above still displays the legacy Muhimbi Document Converter name. This is simply a holdover from before the rebrand to Nutrient DCS.
This outputs the available methods and properties. Here’s a sample:
    ...
    ns3:ImageQuality
    ns3:KVPOutputFormat
    ns3:MSGBestBodyMode
    ns3:MSGEmailAddressDisplayMode
    ...
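In a notebook, it can be handier to capture this listing as a string than to read it from the terminal. The sketch below assumes a Zeep client object whose wsdl.dump() method prints the listing to standard output, which is assumed to be the same call the command-line inspection makes; capture_wsdl_dump is a hypothetical helper name.

```python
import contextlib
import io


def capture_wsdl_dump(client):
    """Return the text that client.wsdl.dump() prints to standard output."""
    buffer = io.StringIO()
    # Temporarily redirect print output into the in-memory buffer.
    with contextlib.redirect_stdout(buffer):
        client.wsdl.dump()
    return buffer.getvalue()
```

With a real client, listing = capture_wsdl_dump(client) gives you text that can be searched or filtered like any other string.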
Understanding namespaces and factories
Notice the ns3: prefix in the output? This is a namespace identifier. Zeep uses these namespaces to organize different types of objects from the WSDL.
To create objects from a specific namespace, you need to create a type factory for that namespace:
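    # Connect to DCS; adjust the URL to match your installation.
    from zeep import Client
    client = Client("http://localhost:41734/Muhimbi.DocumentConverter.WebService/?WSDL")

    # Create a factory for the ns3 namespace.
    factory2 = client.type_factory("ns3")

    # Now you can create objects from that namespace.
    # For example, creating a KVP output format object:
    KVPOutputFormat = factory2.KVPOutputFormat("XML")
Why this matters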
Different settings objects may belong to different namespaces (ns1, ns2, ns3, etc.). Always check the WSDL output to identify which namespace prefix each object uses, and then create the appropriate factory.
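Looking up the prefix by eye gets tedious for long listings. The following sketch searches the inspection output for a type name and returns its prefix. It assumes each entry starts a line in the form prefix:TypeName, as in the sample output above; find_prefix is a hypothetical helper, not a Zeep function.

```python
import re

# Matches entries such as "ns3:ImageQuality" at the start of a listing line.
_TYPE_ENTRY = re.compile(r"^[ \t]*(\w+):(\w+)", re.MULTILINE)


def find_prefix(listing, type_name):
    """Return the namespace prefix under which type_name appears, or None."""
    for prefix, name in _TYPE_ENTRY.findall(listing):
        if name == type_name:
            return prefix
    return None


if __name__ == "__main__":
    sample = "ns3:ImageQuality\nns3:KVPOutputFormat\nns3:MSGBestBodyMode\n"
    print(find_prefix(sample, "KVPOutputFormat"))
```

The returned prefix is the value you would then pass to client.type_factory(...) to build objects of that type.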
Code examples
We have a number of code examples on our website:
- Using Document Converter Services with Python
- Extract PDF text with Python
- Extract PDF tables to JSON using Python
- Create Word documents using templates in Python
- Pattern redaction with Python
- Pattern highlighting with Python
Conclusion
Integrating Nutrient Document Converter Services with Python opens up powerful document automation capabilities for your applications. By using the Zeep library to consume the DCS SOAP web service, you can programmatically convert, process, and manipulate documents at scale without the complexity of building these features from scratch.
Some of the key benefits of this approach include:
- Enterprise-scale automation — Handle high-volume document processing.
- Simplified integration — Zeep abstracts away SOAP complexity.
- Python ecosystem — Leverage Python’s rich libraries and simplicity.
- Focus on business logic — Let DCS handle format conversion, OCR, and security.
This setup is particularly valuable for data science workflows, enterprise document processing pipelines, and automation scenarios where Python’s ecosystem makes it the ideal choice.
Next steps
- Explore our complete code examples for Document Converter Services with Python.
- Review the comprehensive DCS guides for advanced features.
- Learn about OCR, data extraction, and document security.