Pattern redaction and highlighting with C#
This guide demonstrates how to implement document security using pattern redaction and highlighting with Nutrient Document Converter Services (DCS). These complementary techniques help protect sensitive information while maintaining document usability.
Understanding redaction vs highlighting
Pattern redaction permanently removes sensitive content by replacing it with colored blocks. This approach is ideal for:
- Protecting personally identifiable information (PII) in published documents
- Compliance with data protection regulations (GDPR, HIPAA)
- Creating sanitized versions for public distribution
- Preventing unauthorized access to confidential data
Pattern highlighting visually marks sensitive content without removing it. This approach is useful for:
- Document review workflows where content needs verification
- Training materials where sensitive data should be visible but marked
- Audit processes requiring visibility of flagged information
- Quality assurance checks before final redaction
Both techniques use regular expressions to identify patterns such as social security numbers, credit card numbers, or custom sensitive data formats.
Choosing the right approach
Use pattern redaction when:
- Documents will be shared externally or published
- Regulatory compliance requires permanent data removal (GDPR "right to be forgotten")
- Sensitive information must be completely inaccessible
- Creating sanitized versions for public distribution
Use pattern highlighting when:
- Internal review processes require content verification
- Audit trails need to show what was flagged before removal
- Training or educational materials need visible sensitive data markers
- Quality assurance workflows require human verification before final redaction
Common regex patterns
- Social Security Numbers:
\b\d{3}-\d{2}-\d{4}\b
- Credit Card Numbers:
\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b
- Email Addresses:
\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
- Phone Numbers:
\b\d{3}[-.]?\d{3}[-.]?\d{4}\b
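Before sending a pattern to the service, it can help to try it against sample text locally. The following is a minimal sketch using the .NET `System.Text.RegularExpressions` engine. The `PatternPreview` class and `FindMatches` method are hypothetical names introduced for illustration, and the regex engine used by DCS may behave differently from the .NET one, so treat the results as a first check rather than a guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for trying the common patterns locally (not part of DCS).
public static class PatternPreview
{
    public static readonly Dictionary<string, string> Patterns = new Dictionary<string, string>
    {
        ["SSN"] = @"\b\d{3}-\d{2}-\d{4}\b",
        ["CreditCard"] = @"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b",
        // The top-level domain uses [A-Za-z]; a '|' inside a character class
        // would be matched as a literal pipe character.
        ["Email"] = @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
        ["Phone"] = @"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
    };

    // Returns every substring of 'text' matched by the named pattern.
    public static List<string> FindMatches(string patternName, string text)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, Patterns[patternName]))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

For example, `PatternPreview.FindMatches("SSN", "ID 123-45-6789")` returns a list containing `123-45-6789`. Note that the credit card pattern only covers 16-digit numbers in groups of four, so 15-digit card numbers need their own pattern.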
Prerequisites
Before implementing pattern redaction and highlighting, ensure you have:
- Nutrient Document Converter Services installed and running
- .NET Framework or .NET Core development environment
- Sample PDF documents for testing security operations
- Implemented OpenService and CloseService methods from the DocumentConverterServiceClient sample code
- Basic understanding of regular expressions for pattern matching
- Appropriate file system permissions for reading input files and writing output
For initial DCS setup, refer to the Document Converter Services installation guide.
Pattern redaction and highlighting with the API
The following table outlines the properties used in pattern redaction and highlighting operations:
Property | Description |
---|---|
Debug | Enables debug mode, which produces additional logging information. |
Pattern | Regular expression pattern for the text to be redacted/highlighted. |
CaseSensitive | Whether the regular expression is case sensitive. |
Red | Red component of the highlight/redaction color. Range 0–255. |
Green | Green component of the highlight/redaction color. Range 0–255. |
Blue | Blue component of the highlight/redaction color. Range 0–255. |
Alpha | Alpha value, only used in the pattern highlighting operation. Range 0–255, fixed at 255 for redaction. |
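The range rules in the table can be enforced before any values are copied into a settings object. The sketch below is one possible way to do that. `MarkMode` and `MarkColor` are hypothetical types introduced for illustration and are not part of the DCS API. The sketch only encodes what the table states: each component lies in 0–255, and Alpha is fixed at 255 for redaction.

```csharp
using System;

// Hypothetical types for validating color values (not part of DCS).
public enum MarkMode { Redaction, Highlighting }

public sealed class MarkColor
{
    public int Red { get; }
    public int Green { get; }
    public int Blue { get; }
    public int Alpha { get; }

    private MarkColor(int red, int green, int blue, int alpha)
    {
        Red = red;
        Green = green;
        Blue = blue;
        Alpha = alpha;
    }

    // Validates every component and forces Alpha to 255 for redaction.
    public static MarkColor Create(MarkMode mode, int red, int green, int blue, int alpha = 255)
    {
        CheckRange(nameof(red), red);
        CheckRange(nameof(green), green);
        CheckRange(nameof(blue), blue);
        CheckRange(nameof(alpha), alpha);

        // Alpha only applies to highlighting; redaction is always fully opaque.
        int effectiveAlpha = mode == MarkMode.Redaction ? 255 : alpha;
        return new MarkColor(red, green, blue, effectiveAlpha);
    }

    private static void CheckRange(string name, int value)
    {
        if (value < 0 || value > 255)
        {
            throw new ArgumentOutOfRangeException(name, value, "Color components must be in the range 0-255.");
        }
    }
}
```

A validated color can then be assigned to the `Red`, `Green`, and `Blue` properties of the settings object, for example `MarkColor.Create(MarkMode.Highlighting, 255, 255, 0, 128)` for a semi-transparent yellow highlight.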
Example
The following example demonstrates how to perform pattern redaction using the API:
// Namespaces required by this method (place at the top of the file).
using System;
using System.Diagnostics;
using System.IO;
using System.ServiceModel;

/// <summary>
/// Perform pattern redaction on the supplied file, writing the result into the target folder.
/// </summary>
/// <param name="ServiceURL">URL endpoint for the PDF Converter service.</param>
/// <param name="sourceFileName">Source filename.</param>
/// <param name="targetFolder">Target folder to receive the output file.</param>
static void PatternRedaction(string ServiceURL, string sourceFileName, string targetFolder)
{
    DocumentConverterServiceClient client = null;
    try
    {
        // Create minimum `OpenOptions` object.
        OpenOptions openOptions = new OpenOptions();
        openOptions.OriginalFileName = Path.GetFileName(sourceFileName);

        // Create minimum `PatternRedactionSettings`.
        PatternRedactionSettings patternRedactionSettings = new PatternRedactionSettings();
        // Set the redaction color (blue blocks).
        patternRedactionSettings.Red = 0;
        patternRedactionSettings.Green = 0;
        patternRedactionSettings.Blue = 255;
        // Set what needs to be redacted. This pattern includes the literal double quotes.
        patternRedactionSettings.Pattern = "\"374245455400126\"";

        // Create target folder if required.
        if (!Directory.Exists(targetFolder))
        {
            Directory.CreateDirectory(targetFolder);
        }

        // Read the source file into a byte array.
        byte[] sourceFile = File.ReadAllBytes(sourceFileName);

        // Open the service and configure the bindings.
        client = OpenService(ServiceURL);

        // Carry out the redaction.
        byte[] result = client.PatternRedaction(sourceFile, openOptions, patternRedactionSettings);

        // Save the results.
        if (result != null)
        {
            string filename = Path.GetFileNameWithoutExtension(sourceFileName);
            string destinationFileName = Path.GetFullPath(Path.Combine(targetFolder, filename + "-redacted.pdf"));
            File.WriteAllBytes(destinationFileName, result);
            Console.WriteLine("File converted to " + destinationFileName);

            // Open the destination file.
            ProcessStartInfo psi = new ProcessStartInfo();
            psi.FileName = destinationFileName;
            psi.UseShellExecute = true;
            Process.Start(psi);
        }
        else
        {
            Console.WriteLine("Nothing returned");
        }
    }
    catch (FaultException<WebServiceFaultException> ex)
    {
        Console.WriteLine($"FaultException occurred: ExceptionType: {ex.Detail.ExceptionType}");
        Console.WriteLine();
        Console.WriteLine($"Error Detail: {string.Join(Environment.NewLine, ex.Detail.ExceptionDetails)}");
        Console.WriteLine($"Error message: {ex.Message}");
        Console.WriteLine();
        Console.WriteLine($"Error reason: {ex.Reason}");
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
        Console.WriteLine(ex.StackTrace);
        Console.WriteLine(ex.Data.ToString());
    }
    finally
    {
        // Always release the service connection.
        if (client != null)
        {
            CloseService(client);
        }
    }
}
For a practical demonstration, refer to the sample code for pattern redaction and highlighting.
Troubleshooting
Pattern matching error: Pattern not found
- Verify that the pattern exists in the document content
- Check pattern syntax and escape special characters
- Ensure the pattern format matches the document content structure
- Test your regex pattern with online regex validators
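The syntax and escaping checks in this list can also be done in code before calling the service. The following is a minimal sketch using the .NET regex engine as a stand-in validator. `PatternChecks`, `IsValidPattern`, and `LiteralPattern` are hypothetical names introduced for illustration, and a pattern that parses in .NET is not guaranteed to behave identically in DCS.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers for checking patterns locally (not part of DCS).
public static class PatternChecks
{
    // Returns true when the pattern parses as a .NET regular expression;
    // otherwise returns false and reports the parser's message.
    public static bool IsValidPattern(string pattern, out string error)
    {
        try
        {
            new Regex(pattern);
            error = string.Empty;
            return true;
        }
        catch (ArgumentException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    // Escapes regex metacharacters so a literal value (for example an
    // account number containing dots or brackets) is matched as-is.
    public static string LiteralPattern(string literalText)
    {
        return Regex.Escape(literalText);
    }
}
```

Using `LiteralPattern` is one way to build the `Pattern` value when the text to redact is a fixed string rather than a format, since characters such as `.`, `(`, and `$` would otherwise be interpreted as regex syntax.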
Security operation error: Permission denied
- Verify that the application has read access to source documents
- Check that the output directory has write permissions
- Ensure documents aren’t password-protected or locked by other applications
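The read and write checks in this list can be run as a preflight step before the document is sent to the service. The sketch below is one way to do that with standard `System.IO` calls. `Preflight`, `CanRead`, and `CanWriteTo` are hypothetical names introduced for illustration. The sketch does not detect password-protected PDFs; it only covers file system access and files locked by other processes.

```csharp
using System;
using System.IO;

// Hypothetical preflight checks for file access (not part of DCS).
public static class Preflight
{
    // Returns true if the file exists and can be opened for reading.
    // A file locked for writing by another process is reported as unreadable.
    public static bool CanRead(string path)
    {
        try
        {
            using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                return true;
            }
        }
        catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
        {
            return false;
        }
    }

    // Returns true if a file can be created in the folder.
    // The folder is created if missing, and the probe file is deleted afterwards.
    public static bool CanWriteTo(string folder)
    {
        try
        {
            Directory.CreateDirectory(folder);
            string probe = Path.Combine(folder, Path.GetRandomFileName());
            File.WriteAllBytes(probe, new byte[0]);
            File.Delete(probe);
            return true;
        }
        catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
        {
            return false;
        }
    }
}
```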
Service connection error: Cannot connect to DCS
- Ensure Nutrient Document Converter Services is running and accessible
- Verify the service URL in your code matches your DCS installation
- Check that no firewall is blocking the connection
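A quick way to separate network problems from service problems is to test whether the host and port in the service URL accept TCP connections at all. The following sketch assumes only standard .NET networking classes. `ServiceProbe` and `CanReach` are hypothetical names introduced for illustration, and a successful connection shows only that something is listening on that port, not that DCS itself is healthy.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical connectivity probe (not part of DCS).
public static class ServiceProbe
{
    // Returns true if a TCP connection to the URL's host and port
    // succeeds within the timeout; false on refusal, error, or timeout.
    public static bool CanReach(string serviceUrl, int timeoutMs = 2000)
    {
        var uri = new Uri(serviceUrl);
        using (var tcp = new TcpClient())
        {
            try
            {
                var connect = tcp.ConnectAsync(uri.Host, uri.Port);
                return connect.Wait(timeoutMs) && tcp.Connected;
            }
            catch (Exception)
            {
                // Connection refused, DNS failure, and similar errors end up here.
                return false;
            }
        }
    }
}
```

If the probe fails from the application machine but succeeds on the server itself, a firewall or binding restriction is a more likely cause than a stopped service.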
Color configuration issues: Redaction/highlighting not visible
- For redaction: verify that the RGB values produce visible blocks (Alpha is fixed at 255 for redaction, so only the color matters)
- For highlighting: ensure the Alpha value is appropriate (128 recommended for semi-transparency)
- Check that color values are distinct from document background
- Test with high-contrast colors first (for example, red for redaction, yellow for highlighting)
Performance issues: Slow processing
- Consider processing documents in batches for large volumes
- Optimize pattern complexity to reduce processing time
- Monitor memory usage for large documents
- Use page ranges to limit processing scope when appropriate
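For the batching suggestion, one simple approach is to split the list of input files into fixed-size groups and process one group at a time. The sketch below shows only the grouping step. `Batching` and `InBatches` are hypothetical names introduced for illustration, and a suitable batch size depends on document size and server capacity, which this sketch does not try to estimate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical batching helper (not part of DCS).
public static class Batching
{
    // Lazily splits 'items' into consecutive lists of at most 'batchSize' elements.
    // Because this is an iterator, the argument check runs when enumeration starts.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> items, int batchSize)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        var batch = new List<T>(batchSize);
        foreach (T item in items)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

Each file name in a batch can then be passed to a method such as `PatternRedaction`, with memory usage checked between batches.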
What’s next
Now that you understand pattern redaction and highlighting concepts with C#, explore these related document security capabilities:
- Complete code examples - Review comprehensive sample code in C# for pattern redaction and highlighting with detailed implementations
- Python implementation - Compare approaches with pattern highlighting using Python for cross-language insights
- Smart redaction - Discover automated sensitive data detection using smart redaction for advanced security workflows
- Document security overview - Explore the complete document security with C# guide for more features