Import the zeep library to access WSDL
Document Converter Services (DCS) is a highly versatile system that can be integrated with various environments. Python, a widely adopted programming language, is often used both independently and within Jupyter Notebooks to interact with DCS.
Setting up
The Python examples in this guide use the Zeep library, which simplifies working with WSDL in Python. For more information, refer to the Zeep documentation.
Accessing the WSDL
There is no straightforward method for consuming the published Nutrient Web Services Description Language (WSDL) for DCS.
To begin, open a PowerShell terminal (Tools > Command Line > PowerShell).
Although instructions typically reference python, the command currently works when run with py.exe:
py.exe -mzeep http://localhost:<port_number>/Muhimbi.DocumentConverter.WebService/?WSDL
Note the namespace prefixes in the WSDL. These are necessary when working with factory objects, as demonstrated in the data structures section further down in this guide.
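The prefixes can also be collected programmatically once the command output has been saved as text. The following is a minimal sketch, assuming the output contains a "Prefixes:" section with one indented "prefix: URI" pair per line. The function name parse_prefixes and the namespace URIs in SAMPLE_OUTPUT are hypothetical placeholders, not values taken from DCS.

```python
def parse_prefixes(zeep_output):
    """Return {prefix: namespace URI} read from the 'Prefixes:' section."""
    prefixes = {}
    in_section = False
    for line in zeep_output.splitlines():
        stripped = line.strip()
        if stripped == "Prefixes:":
            in_section = True
            continue
        if in_section:
            prefix, separator, uri = stripped.partition(": ")
            if not separator:
                break  # a blank line or the next section header ends the list
            prefixes[prefix] = uri
    return prefixes


# Hypothetical excerpt; the real URIs come from your own DCS installation.
SAMPLE_OUTPUT = """\
Prefixes:
     xsd: http://www.w3.org/2001/XMLSchema
     ns0: http://example.com/hypothetical/service
     ns1: http://example.com/hypothetical/contracts

Global elements:
     ns0:Example(value: xsd:string)
"""

if __name__ == "__main__":
    for prefix, uri in parse_prefixes(SAMPLE_OUTPUT).items():
        print(prefix, "->", uri)
```

Keeping the mapping in a dictionary makes it easy to check which prefix a type belongs to before creating a type factory for it.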
Retrieving configuration details
The example below demonstrates how to call the web service to retrieve configuration details:
# Import the zeep library to access WSDL
import zeep

print("Test WSDL")

# Service URL
service_url = "http://localhost:41734/Muhimbi.DocumentConverter.WebService/"
# WSDL URL
wsdl_url = service_url + "?WSDL"

# Create the header structure
header = zeep.xsd.Element(
    "Header",
    zeep.xsd.ComplexType(
        [
            zeep.xsd.Element(
                "{http://www.w3.org/2005/08/addressing}Action", zeep.xsd.String()
            ),
            zeep.xsd.Element(
                "{http://www.w3.org/2005/08/addressing}To", zeep.xsd.String()
            ),
        ]
    ),
)

# Create a header object (it is not passed to the call in this example)
header_value = header(Action=service_url, To=service_url)

# Create the client object from the WSDL
client = zeep.Client(wsdl=wsdl_url)

# Get the configuration information from the server
result = client.service.GetConfiguration()

# Print the returned data
print(result)
print("Done")
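The example above hard-codes port 41734, while the command-line step uses a <port_number> placeholder. As a small sketch, the two URLs can be derived from the port in one place. The helper name build_service_urls and the default host are assumptions made for illustration; only the path and the ?WSDL suffix come from this guide.

```python
def build_service_urls(port, host="localhost"):
    """Return (service_url, wsdl_url) for a DCS instance on the given port."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"Port out of range: {port}")
    service_url = f"http://{host}:{port}/Muhimbi.DocumentConverter.WebService/"
    # The WSDL is published at the service URL with ?WSDL appended.
    wsdl_url = service_url + "?WSDL"
    return service_url, wsdl_url


if __name__ == "__main__":
    service_url, wsdl_url = build_service_urls(41734)
    print(service_url)
    print(wsdl_url)
```

The returned values can be used wherever the examples in this guide define service_url and wsdl_url.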
Data structures
DCS operations often require complex parameters. You can construct these parameters using the client.type_factory method, along with the appropriate WSDL prefix:
# Create client
client = zeep.Client(wsdl=wsdl_url)
# Create a type factory to construct objects with the prefix ns2 (see the WSDL)
factory = client.type_factory("ns2")
# Create a DiagnosticRequestItem with the property ConverterName set to a converter type
request_item = factory.DiagnosticRequestItem(ConverterName="WordProcessing")
Retrieving diagnostic status
The following example retrieves the diagnostic status for the WordProcessing converter:
import zeep

print("Get diagnostic for a converter")

# Service URL
service_url = "http://localhost:41734/Muhimbi.DocumentConverter.WebService/"
# WSDL URL
wsdl_url = service_url + "?WSDL"

# Construct the header
header = zeep.xsd.Element(
    "Header",
    zeep.xsd.ComplexType(
        [
            zeep.xsd.Element(
                "{http://www.w3.org/2005/08/addressing}Action", zeep.xsd.String()
            ),
            zeep.xsd.Element(
                "{http://www.w3.org/2005/08/addressing}To", zeep.xsd.String()
            ),
        ]
    ),
)

# Create a header object (it is not passed to the call in this example)
header_value = header(Action=service_url, To=service_url)

# Create client
client = zeep.Client(wsdl=wsdl_url)

# Create a type factory to construct objects with the prefix ns2 (see the WSDL)
factory = client.type_factory("ns2")

# Create a DiagnosticRequestItem with the property ConverterName set to a converter type
request_item = factory.DiagnosticRequestItem(ConverterName="WordProcessing")

# Create an array of DiagnosticRequestItem objects (in this case just one)
request_items = factory.ArrayOfDiagnosticRequestItem(request_item)

# Call the DCS GetDiagnostics operation with the selected converter type
result = client.service.GetDiagnostics(request_items)

print(result)
print("Done")
Extracting key-value pairs from a PDF
To extract key-value pairs from a PDF, start by identifying the ExtractKeyValuePairs web service method. You can locate its signature in the WSDL:
ns1:ExtractKeyValuePairs(sourceFile: xsd:base64Binary, openOptions: ns2:OpenOptions, extractKeyValuePairsSettings: ns2:KVPSettings)
This method uses the ns1 prefix and takes the following three parameters:
sourceFile: xsd:base64Binary
openOptions: ns2:OpenOptions
extractKeyValuePairsSettings: ns2:KVPSettings
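To see at a glance which prefixes need a type factory, the signature line can be split into parameter names and types. This is a rough sketch that assumes the signature follows the "operation(name: prefix:Type, ...)" layout shown above. The helper names parse_signature and factory_prefixes are hypothetical.

```python
import re


def parse_signature(signature):
    """Split 'op(name: type, ...)' into (operation, [(name, type), ...])."""
    match = re.fullmatch(r"\s*([\w:.]+)\(([^)]*)\)\s*(?:->.*)?", signature)
    if match is None:
        raise ValueError(f"Unrecognized signature: {signature!r}")
    operation, params_text = match.groups()
    params = []
    for part in params_text.split(","):
        part = part.strip()
        if part:
            name, _, type_name = part.partition(": ")
            params.append((name, type_name))
    return operation, params


def factory_prefixes(params):
    """Return the non-xsd prefixes used by the parameter types, in order."""
    prefixes = []
    for _, type_name in params:
        prefix = type_name.split(":")[0]
        if prefix != "xsd" and prefix not in prefixes:
            prefixes.append(prefix)
    return prefixes


SIGNATURE = (
    "ns1:ExtractKeyValuePairs(sourceFile: xsd:base64Binary, "
    "openOptions: ns2:OpenOptions, "
    "extractKeyValuePairsSettings: ns2:KVPSettings)"
)

if __name__ == "__main__":
    operation, params = parse_signature(SIGNATURE)
    print(operation, params, factory_prefixes(params))
```

Only the top-level parameter types appear in the signature. Types nested inside those parameters may use other prefixes, so they still need to be looked up in the WSDL.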
Important things to note
The sourceFile parameter expects a Base64-encoded binary representation of the document (the W3C XML Schema base64Binary type). The subsequent parameters, openOptions and extractKeyValuePairsSettings, are custom Nutrient (formerly Muhimbi) types. Instantiating these types requires a factory created with the ns2 prefix, as defined in the WSDL. Examining the WSDL reveals the properties of these Nutrient types.
The OpenOptions type has a simple structure and needs only its basic properties: file name and extension.
The KVPSettings type, by contrast, has several complex properties. However, the KVPFormat property is the only mandatory one for this operation.
Because the type of the KVPFormat property uses the ns3 prefix, a separate factory instance for that namespace is required to create it.
factory2 = client.type_factory("ns3")
KVPOutputFormat = factory2.KVPOutputFormat(1)
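The Base64 step described for sourceFile can be isolated into a small helper. The sketch below mirrors the encoding approach used in this guide's sample and uses only the Python standard library. The function names are hypothetical, and whether a given SOAP client expects raw bytes or Base64 text for an xsd:base64Binary parameter is not covered here.

```python
import base64
from pathlib import Path


def encode_bytes_base64(data):
    """Return the Base64 text representation of a bytes object."""
    return base64.b64encode(data).decode("ascii")


def encode_file_base64(path):
    """Read a file in binary mode and return its content as Base64 text."""
    return encode_bytes_base64(Path(path).read_bytes())


if __name__ == "__main__":
    # The first bytes of a PDF header, used here only as demonstration input.
    print(encode_bytes_base64(b"%PDF-1.7"))
```

Reading the file with read_bytes keeps the content unchanged. Opening it in text mode could alter or fail on binary data before it is encoded.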
Sample code:
import zeep
import base64
import lxml.etree as etree

print("Get Key-Value Pairs for a file")

# Service URL
service_url = "http://localhost:41734/Muhimbi.DocumentConverter.WebService/"
# WSDL URL
wsdl_url = service_url + "?WSDL"

# Construct the header
header = zeep.xsd.Element(
    "Header",
    zeep.xsd.ComplexType(
        [
            zeep.xsd.Element(
                "{http://www.w3.org/2005/08/addressing}Action", zeep.xsd.String()
            ),
            zeep.xsd.Element(
                "{http://www.w3.org/2005/08/addressing}To", zeep.xsd.String()
            ),
        ]
    ),
)

# Create a header object
header_value = header(Action=service_url, To=service_url)

# Create client
client = zeep.Client(wsdl=wsdl_url)

# Create a type factory to construct objects with the prefix ns2 (see the WSDL)
factory = client.type_factory("ns2")
# Create a type factory to construct objects with the prefix ns3 (see the WSDL)
factory2 = client.type_factory("ns3")

# Create the OpenOptions object with the minimum settings
open_options = factory.OpenOptions(OriginalFileName="Three-in-one invoice.pdf", FileExtension="pdf")

# Create the KVP output format object
KVPOutputFormat = factory2.KVPOutputFormat("XML")
# Create the expected keys string
expectedKeys = "[{\"expectedKey\":\"Name\",\"synonyms\":[\"name\"]},{\"expectedKey\":\"grand total\",\"synonyms\":[\"total\"]},{\"expectedKey\":\"invoice number\",\"synonyms\":[\"invoice no\"]}]"
# Create the KVP settings object
KVPSettings = factory.KVPSettings(OCRLanguage="eng", KVPFormat=KVPOutputFormat, DPI=300, ExpectedKeys=expectedKeys)

# Load the source file as a Base64 string
with open("Three-in-one invoice.pdf", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode('utf-8')

# Call the ExtractKeyValuePairs method of the service with the required parameters
result = client.service.ExtractKeyValuePairs(encoded_string, open_options, KVPSettings)

# Load the result XML into an lxml etree object
tree = etree.fromstring(result)

# Find all the KVPData elements
kvpData = tree.xpath('/ArrayOfKVPData/KVPData')
# For each KVPData element
for x in kvpData:
    # Get the key and value elements' text
    key = x.xpath('Key')[0].text
    value = x.xpath('Value')[0].text
    # Print the key and value
    print(key, ' - ', value)

print("Done")
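Two steps in the sample can be tried without a running service: building the ExpectedKeys string and reading the returned XML. The sketch below uses json.dumps instead of a hand-escaped string, and the standard library's xml.etree.ElementTree instead of lxml. The function names are hypothetical, and SAMPLE_RESULT is invented data that only follows the ArrayOfKVPData/KVPData/Key/Value layout implied by the XPath expressions above, with no XML namespaces assumed.

```python
import json
import xml.etree.ElementTree as ET


def build_expected_keys(keys_with_synonyms):
    """Serialize {expected key: [synonyms]} into the compact JSON string."""
    entries = [
        {"expectedKey": key, "synonyms": list(synonyms)}
        for key, synonyms in keys_with_synonyms.items()
    ]
    # Compact separators reproduce the spacing of the hand-written string.
    return json.dumps(entries, separators=(",", ":"))


def parse_kvp_xml(xml_text):
    """Return a list of (key, value) tuples from an ArrayOfKVPData document."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("Key"), item.findtext("Value"))
        for item in root.findall("KVPData")
    ]


# Invented sample data for illustration; not real service output.
SAMPLE_RESULT = (
    "<ArrayOfKVPData>"
    "<KVPData><Key>invoice number</Key><Value>12345</Value></KVPData>"
    "<KVPData><Key>grand total</Key><Value>99.00</Value></KVPData>"
    "</ArrayOfKVPData>"
)

if __name__ == "__main__":
    print(build_expected_keys({"Name": ["name"], "grand total": ["total"]}))
    for key, value in parse_kvp_xml(SAMPLE_RESULT):
        print(key, " - ", value)
```

Generating the JSON from a dictionary avoids escaping mistakes when keys or synonyms are added later.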
References
Create a Python project in Visual Studio
Install Python packages in Visual Studio