---
title: "Pattern redaction with Python | Nutrient DCS"
canonical_url: "https://www.nutrient.io/guides/document-converter/document-converter-services/document-security/pattern-redaction-using-python/"
md_url: "https://www.nutrient.io/guides/document-converter/document-converter-services/document-security/pattern-redaction-using-python.md"
last_updated: "2026-05-25T06:31:34.439Z"
description: "Redact sensitive data patterns in PDFs using Python and Nutrient DCS API. Protect Personally Identifiable Information (PII) with regex-based redaction. Complete code examples included."
---

This guide demonstrates how to automatically redact sensitive information from PDF documents using regular expression patterns with Python and Nutrient Document Converter Services (DCS). Pattern redaction permanently removes sensitive content by replacing it with colored blocks, making it ideal for protecting confidential information before sharing documents.

## Common use cases

Pattern redaction is essential for:

- **Protecting personally identifiable information (PII)** - Remove Social Security numbers, addresses, and personal details

- **Financial data security** - Redact credit card numbers, account numbers, and financial information

- **Legal document preparation** - Remove confidential details before disclosure or filing

- **Compliance requirements** - Meet GDPR, HIPAA, and other regulatory standards for data protection

- **Public document release** - Create sanitized versions safe for external distribution

The sample code in this guide can be run in any Python environment with access to the [Zeep library](https://docs.python-zeep.org/en/master/in_depth.html#).

The Zeep library reads the Web Services Description Language (WSDL) definition, which specifies how to call the web service and describes the data structures it returns. Nutrient Document Converter Services (DCS) provides these WSDL definitions for pattern redaction and other operations.

## Prerequisites

Before implementing pattern redaction, ensure you have:

- Python 3.x installed on your system

- The Zeep library installed (`pip install zeep`)

- Nutrient Document Converter Services running locally on port 41734

- Sample PDF documents for testing redaction operations

- Basic understanding of regular expressions for pattern matching

- Appropriate file system permissions for reading input files and writing output

For initial DCS setup with Python, refer to the [using Document Converter Services with Python](https://www.nutrient.io/guides/document-converter/document-converter-services/dcs-with-python.md) guide.

## Common regex patterns

Use these regular expression patterns for typical redaction scenarios:

- **Social Security Numbers**: `\b\d{3}-\d{2}-\d{4}\b` or `\b\d{3} \d{2} \d{4}\b`

- **Credit Card Numbers**: `\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b`

- **Email Addresses**: `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`

- **Phone Numbers**: `\b\d{3}[-.]?\d{3}[-.]?\d{4}\b` or `\(\d{3}\)\s?\d{3}-\d{4}`

- **IP Addresses**: `\b(?:\d{1,3}\.){3}\d{1,3}\b`

- **Account Numbers**: `\b\d{8,12}\b` (adjust length as needed)

> Always test your regular expression patterns thoroughly before processing production documents. Incorrect patterns may result in over-redaction or missed sensitive data.
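
One way to follow this advice is to try the patterns locally before sending anything to the service. The sketch below runs a few of the patterns listed above against made-up sample text with Python's built-in `re` module. The `PATTERNS` dictionary, `SAMPLE` string, and `find_matches` helper are illustrative names, and the regex engine used by DCS may not behave identically to Python's, so treat this as a first sanity check only.

```python
import re

# Patterns taken from the list in this guide; the dictionary keys are illustrative.
PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "credit_card": r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b",
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    "phone": r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
}

# Made-up sample text; none of these values are real.
SAMPLE = (
    "SSN 123-45-6789, card 4111 1111 1111 1111, "
    "mail jane.doe@example.com, phone 555-867-5309."
)


def find_matches(text):
    """Return a dict mapping each pattern name to the substrings it matched."""
    return {name: re.findall(rx, text) for name, rx in PATTERNS.items()}


matches = find_matches(SAMPLE)
for name, found in matches.items():
    print(f"{name}: {found}")
```

An empty list for a pattern that should match, or extra entries for one that should not, signals that the pattern needs adjusting before it is used for redaction.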

## WSDL

Zeep extracts the following WSDL definitions:

```python
PatternHighlight(sourceFile: xsd:base64Binary, openOptions: ns2:OpenOptions, patternHighlightSettings: ns3:PatternHighlightSettings) -> PatternHighlightResult: xsd:base64Binary
PatternRedaction(sourceFile: xsd:base64Binary, openOptions: ns2:OpenOptions, PatternRedactionSettings: ns3:PatternRedactionSettings) -> PatternRedactionResult: xsd:base64Binary...
ns2:OpenOptions(UserName: xsd:string, Password: xsd:string, FileExtension: xsd:string, OriginalFileName: xsd:string, RefreshContent: xsd:boolean, AllowExternalConnections: xsd:boolean, AllowMacros: ns3:MacroSecurityOption, SystemSettings: ns5:SystemSettings, SubscriptionSettings: ns9:SubscriptionSettings)...
ns3:PatternHighlightSettings(Alpha: xsd:unsignedByte, Red: xsd:unsignedByte, Green: xsd:unsignedByte, Blue: xsd:unsignedByte, CaseSensitive: ns3:BooleanEnum, Debug: ns3:BooleanEnum, PageRange: xsd:string, Pattern: xsd:string)
ns3:PatternRedactionSettings(Alpha: xsd:unsignedByte, Red: xsd:unsignedByte, Green: xsd:unsignedByte, Blue: xsd:unsignedByte, CaseSensitive: ns3:BooleanEnum, Debug: ns3:BooleanEnum, PageRange: xsd:string, Pattern: xsd:string)
```

The `PatternRedaction` method requires three parameters:

- `sourceFile: xsd:base64Binary`

- `openOptions: ns2:OpenOptions`

- `PatternRedactionSettings: ns3:PatternRedactionSettings`

Use a Base64-encoded binary string for the `sourceFile`, as specified by the W3C XML schema.
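
As a rough illustration of the encoding step, the sketch below converts raw bytes to Base64 text and back using only the standard library. The `pdf_bytes` value is a placeholder standing in for real file content; an actual script would read the bytes from a PDF opened in binary mode.

```python
import base64

# Placeholder bytes standing in for a PDF read from disk in "rb" mode.
pdf_bytes = b"%PDF-1.7\n% placeholder content\n"

# Encode the raw bytes as Base64 text.
encoded_string = base64.b64encode(pdf_bytes).decode("ascii")

# Decoding restores the original bytes exactly.
decoded_bytes = base64.b64decode(encoded_string)

print(encoded_string[:12])
print(decoded_bytes == pdf_bytes)
```

Base64 output contains only ASCII characters, which is why the encoded value can travel safely inside an XML message.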

Create the `openOptions` and `PatternRedactionSettings` objects using Zeep type factories (`ns2` and `ns3` namespaces).

Configure the `OpenOptions` type by specifying basic properties such as:

- `FileExtension`

- `OriginalFileName`

- Authentication details (if applicable)

The `PatternRedactionSettings` type enables you to define:

- Redaction color using RGBA byte values

- Case sensitivity and debug behavior with `BooleanEnum`

- Target `PageRange`

- Text `Pattern` to match and redact

The method returns a Base64-encoded binary representation of the redacted document.
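
Because the color fields are `xsd:unsignedByte` values, each channel must fall in the range 0 to 255. The sketch below shows one possible way to derive those values from a familiar `#RRGGBB` string. The `rgba_kwargs` helper and the `settings_kwargs` dictionary are hypothetical names, not part of DCS or Zeep; the field names `Red`, `Green`, `Blue`, `Alpha`, `PageRange`, and `Pattern` come from the WSDL shown earlier.

```python
def rgba_kwargs(hex_color, alpha=None):
    """Convert '#RRGGBB' into Red/Green/Blue (and optional Alpha) keyword arguments."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_color!r}")
    # int(..., 16) raises ValueError for non-hex digits; two hex digits always fit in 0-255.
    red, green, blue = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    kwargs = {"Red": red, "Green": green, "Blue": blue}
    if alpha is not None:
        if not 0 <= alpha <= 255:
            raise ValueError("alpha must fit in an unsigned byte (0-255)")
        kwargs["Alpha"] = alpha
    return kwargs


# Hypothetical settings dictionary: blue blocks over SSN-like text on all pages.
settings_kwargs = {
    "PageRange": "*",
    "Pattern": r"\b\d{3}-\d{2}-\d{4}\b",
    **rgba_kwargs("#0000FF"),
}
print(settings_kwargs)
```

A dictionary like this could be unpacked with `**settings_kwargs` when constructing the settings object through a Zeep type factory.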

## Sample code

The following Python code demonstrates how to redact a PDF file using a regular expression pattern:

```python

import zeep
import base64

print("Redact a PDF file using a regular expression pattern")
# Service URL.

service_url = "http://localhost:41734/Muhimbi.DocumentConverter.WebService/"

# WSDL URL.

wsdl_url = service_url+"?WSDL"

# Source file.

sourceFile = "Redaction-Test-2.pdf"

# Construct the header.

header = zeep.xsd.Element(
    "Header",
    zeep.xsd.ComplexType(
        [
            zeep.xsd.Element(
                "{http://www.w3.org/2005/08/addressing}Action", zeep.xsd.String()
            ),
            zeep.xsd.Element(
                "{http://www.w3.org/2005/08/addressing}To", zeep.xsd.String()
            ),
        ]
    ),
)

# Create the header value.

header_value = header(Action=service_url, To=service_url)

# Create client.

client = zeep.Client(wsdl=wsdl_url)

# Load the source PDF file and encode it as a Base64 string for web service transmission.

with open(sourceFile, "rb") as pdf_file:
    encoded_string = base64.b64encode(pdf_file.read()).decode('utf-8')

# Create a type factory for the ns2 namespace prefix (see the WSDL).

factory = client.type_factory("ns2")

# Create the OpenOptions object with minimum settings.

open_options = factory.OpenOptions(OriginalFileName=sourceFile, FileExtension="pdf")

# Create a type factory for the ns3 namespace prefix (see the WSDL).

factory2 = client.type_factory("ns3")

# Create the PatternRedactionSettings object with the redaction color and pattern.
# PageRange="*" processes all pages; the Red, Green, and Blue values set the block color.

redaction_settings = factory2.PatternRedactionSettings(
    PageRange="*", Red=0, Green=0, Blue=255, Pattern="374245455400126"
)

# Call the PatternRedaction method with the required parameters.

result = client.service.PatternRedaction(encoded_string, open_options, redaction_settings)

# Write the redacted file.

with open("Redaction-Test-redacted.pdf", "wb") as f:
    f.write(result)

print("Done")

```

> In production environments, replace `localhost:41734` with your actual Document Converter Services endpoint URL and consider implementing authentication if required.
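
The settings object accepts a single `Pattern` string, so redacting several kinds of data in one call means merging the expressions. The sketch below joins patterns with alternation and checks the result locally. The `combine_patterns` helper is a hypothetical name, and the approach assumes the DCS regex engine supports `|` and non-capturing groups the same way Python's `re` module does.

```python
import re


def combine_patterns(patterns):
    """Join several regexes into one alternation, wrapping each in a non-capturing group."""
    return "|".join(f"(?:{p})" for p in patterns)


ssn = r"\b\d{3}-\d{2}-\d{4}\b"
credit_card = r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"

combined = combine_patterns([ssn, credit_card])

# Made-up sample text used only to check the combined pattern locally.
sample = "SSN 123-45-6789 and card 4111-1111-1111-1111"
found = re.findall(combined, sample)
print(found)
```

Wrapping each expression in its own group keeps the alternation from changing how the individual patterns are parsed.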

## Troubleshooting

**Pattern matching error: No patterns found**

- Verify that the pattern exists in the document content

- Test your regex pattern with online regex validators

- Ensure the pattern format matches the document’s text structure

- Check if the pattern is case-sensitive and adjust accordingly
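
To tell a case mismatch apart from a pattern that is simply absent, it can help to test the pattern both ways against text taken from the document. The sketch below is one way to do that locally. The `diagnose_case` helper and the `extracted` sample string are illustrative, and Python's `re` module stands in for the service's own matching.

```python
import re


def diagnose_case(pattern, text):
    """Report whether a pattern matches as written, only when case is ignored, or not at all."""
    if re.search(pattern, text):
        return "matches"
    if re.search(pattern, text, flags=re.IGNORECASE):
        return "matches only when case is ignored"
    return "no match"


# Made-up text standing in for content extracted from the PDF.
extracted = "Account holder: JANE DOE, reference ab-1234"

print(diagnose_case(r"Jane Doe", extracted))
print(diagnose_case(r"\b[a-z]{2}-\d{4}\b", extracted))
print(diagnose_case(r"\b\d{3}-\d{2}-\d{4}\b", extracted))
```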

**Service connection error: Cannot connect to DCS**

- Ensure Nutrient Document Converter Services is running on `localhost:41734`

- Check that no firewall is blocking the connection

- Verify the service URL in your code matches your DCS installation
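
A quick TCP check can separate network problems from issues in the SOAP call itself. The sketch below only tests whether something accepts connections on the host and port; it does not confirm that the listener is DCS. The `can_connect` helper is a hypothetical name, and the host and port are the defaults used in this guide.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


print(can_connect("localhost", 41734))
```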

**File access error: Permission denied**

- Verify that Python has read access to the source PDF files

- Check that the output directory has write permissions

- Ensure files aren’t locked by other applications or PDF viewers
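
These checks can be run up front so that permission problems surface before the service is called. The sketch below is a minimal example using `os.access` and `pathlib`. The `check_paths` helper is a hypothetical name, and `os.access` only reports what the operating system advertises, so a file locked by another application may still fail when opened.

```python
import os
from pathlib import Path


def check_paths(source, output):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    source, output = Path(source), Path(output)
    if not source.is_file():
        problems.append(f"source not found: {source}")
    elif not os.access(source, os.R_OK):
        problems.append(f"source not readable: {source}")
    out_dir = output.parent
    if not out_dir.is_dir():
        problems.append(f"output directory not found: {out_dir}")
    elif not os.access(out_dir, os.W_OK):
        problems.append(f"output directory not writable: {out_dir}")
    return problems


# File names match the sample code in this guide.
print(check_paths("Redaction-Test-2.pdf", "Redaction-Test-redacted.pdf"))
```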

**Redaction not visible: Pattern processed but no redaction appears**

- Verify that the `Red`, `Green`, `Blue`, and `Alpha` values produce clearly visible blocks (avoid fully transparent colors)

- Check that the page range includes the pages containing your pattern

- Ensure the pattern exists in the specified page range

- Test with a simpler, more visible pattern first

**Invalid file format error**

- Ensure your source file is a valid PDF document

- Check that the file path is correct and the file exists

- Verify the file isn’t corrupted or password-protected
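
A cheap first test for the file format is to look at the leading bytes, since PDF files begin with the marker `%PDF-`. The sketch below performs only that check. The `looks_like_pdf` helper is a hypothetical name, and passing this test does not prove the file is undamaged or free of password protection.

```python
def looks_like_pdf(data):
    """Return True if the bytes start with the PDF header marker."""
    return data.startswith(b"%PDF-")


# Placeholder byte strings; a real script would read the first bytes of the source file.
print(looks_like_pdf(b"%PDF-1.7\n..."))
print(looks_like_pdf(b"PK\x03\x04 not a pdf"))
```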

## What’s next

Now that you can redact sensitive patterns in documents with Python, explore these related document security capabilities:

- **Pattern highlighting** - Learn how to visually mark sensitive content with [pattern highlighting using Python](https://www.nutrient.io/guides/document-converter/document-converter-services/document-security/pattern-highlighting-using-python.md)

- **C# implementation** - Compare approaches with [pattern redaction using C#](https://www.nutrient.io/guides/document-converter/document-converter-services/document-security/code-samples.md) code samples

- **Complete document security** - Explore the [document security with C#](https://www.nutrient.io/guides/document-converter/document-converter-services/document-security/csharp.md) guide for additional features

- **Text extraction** - Discover [text extraction using Python](https://www.nutrient.io/guides/document-converter/document-converter-services/extraction/extract-text-using-python.md) to analyze document content before redaction
---

## Related pages

- [Secure and protect PDFs and Office documents in C#](/guides/document-converter/document-converter-services/document-security/csharp.md)
- [Secure your PDFs and Office files easily with .NET](/guides/document-converter/document-converter-services/document-security/dotnet-core.md)
- [Sample code in C# for pattern redaction and highlighting](/guides/document-converter/document-converter-services/document-security/code-samples.md)
- [Secure PDF and Office documents with JavaScript](/guides/document-converter/document-converter-services/document-security/javascript.md)
- [Secure PDF and MS Office documents with Java](/guides/document-converter/document-converter-services/document-security/java.md)
- [Pattern redaction and highlighting with C#](/guides/document-converter/document-converter-services/document-security/pattern-redaction-and-highlighting.md)
- [Password protection for PDF documents in PHP](/guides/document-converter/document-converter-services/document-security/php.md)
- [Smart document redaction with C#](/guides/document-converter/document-converter-services/document-security/smart-redaction.md)
- [Pattern highlighting using Python](/guides/document-converter/document-converter-services/document-security/pattern-highlighting-using-python.md)
- [Secure PDF API](/guides/document-converter/document-converter-services/document-security.md)