Smart document redaction with C#
This guide explains how to use the smart redaction feature in the Document Converter API to identify and redact sensitive information such as credit card numbers, email addresses, phone numbers, and more. Protect your documents efficiently while maintaining compliance with security standards.
What is smart redaction?
Smart redaction enables you to automatically locate and redact predefined sensitive data types within documents. You can customize the redaction settings to meet your specific security and privacy needs.
Common use cases
Smart redaction is ideal for:
- Financial document processing - Redact credit card numbers, International Bank Account Numbers (IBANs), and account information
- Legal document preparation - Remove personal identifiers before discovery or public filing
- Healthcare record sanitization - Protect patient information in compliance with Health Insurance Portability and Accountability Act (HIPAA) regulations
- Corporate document sharing - Clean internal documents before external distribution
- Data loss prevention - Scan and protect documents containing sensitive information
- Compliance automation - Meet regulatory requirements for data protection
Smart redaction vs pattern redaction
Smart redaction automatically detects predefined patterns for common sensitive data types. Choose smart redaction when you need:
- Setup without custom regex patterns
- Standard compliance for common sensitive data types (credit cards, emails, phone numbers)
- Automated processing of large document volumes
- Built-in accuracy for well-established data formats
Pattern redaction uses custom regular expressions for specific patterns. Choose pattern redaction when you need:
- Custom sensitive data formats specific to your organization
- More control over redaction patterns and rules
- Industry-specific data types not covered by smart redaction
- Fine-tuned pattern matching for specialized use cases
Learn more about pattern redaction and highlighting for custom redaction scenarios.
Smart redaction automatically detects common sensitive data types without requiring custom regular expressions. For organization-specific data formats, consider using pattern redaction instead.
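To make the distinction concrete, the sketch below shows the kind of custom regular expression you would bring to pattern redaction rather than smart redaction. The employee ID format (EMP- followed by six digits) and the class and method names are hypothetical, and the code uses only .NET's System.Text.RegularExpressions; it does not call the Document Converter API.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CustomPatternExample
{
    // Hypothetical organization-specific format: "EMP-" followed by exactly six digits.
    private static readonly Regex EmployeeId =
        new Regex(@"\bEMP-\d{6}\b", RegexOptions.Compiled);

    // Returns every substring of `text` that matches the custom format.
    public static List<string> FindEmployeeIds(string text)
    {
        var matches = new List<string>();
        foreach (Match m in EmployeeId.Matches(text))
        {
            matches.Add(m.Value);
        }
        return matches;
    }
}
```

A format like this is not one of the predefined smart redaction types, which is why it would need a custom pattern.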
Prerequisites
Before implementing smart redaction, ensure you have:
- Nutrient Document Converter Services installed and running
- .NET Framework or .NET Core development environment
- Document files containing sensitive information for testing
- Implemented OpenService and CloseService methods from the DocumentConverterServiceClient sample code
- Appropriate file system permissions for reading input files and writing output
- Understanding of common sensitive data types (credit cards, emails, phone numbers)
For initial DCS setup, refer to the Document Converter Services installation guide.
Smart redaction properties
The Nutrient API provides the following properties for smart redaction. Use them to choose which types of sensitive information to redact and to adjust the output, for example by setting a custom mark color, restricting the page range, or selecting language dictionaries.
Property | Description |
---|---|
RedactCreditCardNumbers | Specifies whether credit card numbers will be redacted. |
RedactEmailAddresses | Specifies whether email addresses will be redacted. |
RedactIBANs | Specifies whether International Bank Account Numbers (IBANs) will be redacted. |
RedactPhoneNumbers | Specifies whether phone numbers will be redacted. |
RedactURIs | Specifies whether Uniform Resource Identifiers (URIs) will be redacted. |
RedactVATIDs | Specifies whether Value-Added Tax (VAT) IDs will be redacted. |
RedactVehicleIdentificationNumbers | Specifies whether vehicle identification numbers will be redacted. |
MarkColor | The color used to cover redacted information. |
Dictionaries | List of language codes, joined by “+”. For example: “ENG+FRA” |
DetectOrientation | Specifies whether orientation will be detected automatically. |
PageRange* | Range of pages to redact; use “*” for all pages. |
RedactSocialSecurityNumbers | Specifies whether social security numbers will be redacted. This property is currently available in preview. |
RedactPostalAddresses | Specifies whether postal addresses will be redacted. |
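The Dictionaries value described in the table is a list of language codes joined by a plus sign. A minimal sketch of building that string is shown below. The DictionaryCodes helper is hypothetical, and upper-casing the codes is an assumption based on the “ENG+FRA” example; this guide does not say whether the service is case-sensitive or which codes it accepts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryCodes
{
    // Joins language codes into the "ENG+FRA" form described in the table.
    // Codes are trimmed, upper-cased (an assumption), and de-duplicated;
    // blank entries are skipped.
    public static string Join(IEnumerable<string> codes)
    {
        if (codes == null) throw new ArgumentNullException(nameof(codes));

        var cleaned = codes
            .Where(c => !string.IsNullOrWhiteSpace(c))
            .Select(c => c.Trim().ToUpperInvariant())
            .Distinct();

        return string.Join("+", cleaned);
    }
}
```

For example, DictionaryCodes.Join(new[] { "eng", "fra" }) returns "ENG+FRA", which could then be assigned to the Dictionaries property, assuming that property accepts a string.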
Example: Implementing smart redaction with the API
This example demonstrates how to use the API to redact credit card numbers, email addresses, and phone numbers in a document:
    // Requires: using System; using System.IO; using System.ServiceModel;

    /// <summary>
    /// Perform smart redaction on the supplied file, writing the result into the target folder.
    /// </summary>
    /// <param name="ServiceURL">URL endpoint for the PDF Converter service.</param>
    /// <param name="sourceFileName">Source filename.</param>
    /// <param name="targetFolder">Target folder to receive the output file.</param>
    static void SmartRedaction(string ServiceURL, string sourceFileName, string targetFolder)
    {
        DocumentConverterServiceClient client = null;
        try
        {
            // Create minimum `OpenOptions` object.
            OpenOptions openOptions = new OpenOptions();
            openOptions.OriginalFileName = Path.GetFileName(sourceFileName);

            // Create minimum `SmartRedactionSettings`.
            SmartRedactionSettings smartRedactionSettings = new SmartRedactionSettings();

            // Set what needs to be redacted.
            smartRedactionSettings.RedactCreditCardNumbers = BooleanEnum.True;
            smartRedactionSettings.RedactEmailAddresses = BooleanEnum.True;
            smartRedactionSettings.RedactPhoneNumbers = BooleanEnum.True;

            // Create target folder if required.
            if (!Directory.Exists(targetFolder))
            {
                Directory.CreateDirectory(targetFolder);
            }

            // ** Read the source file into a byte array.
            byte[] sourceFile = File.ReadAllBytes(sourceFileName);

            // ** Open the service and configure the bindings.
            client = OpenService(ServiceURL);

            // ** Carry out the redaction.
            byte[] result = client.SmartRedaction(sourceFile, openOptions, smartRedactionSettings);

            // ** Save the results.
            if (result != null)
            {
                string filename = Path.GetFileNameWithoutExtension(sourceFileName);
                string destinationFileName = Path.GetFullPath(Path.Combine(targetFolder, filename + "-redacted.pdf"));

                // The using block flushes and closes the stream on exit.
                using (FileStream fs = File.Create(destinationFileName))
                {
                    fs.Write(result, 0, result.Length);
                }
                Console.WriteLine("File converted to " + destinationFileName);
            }
            else
            {
                Console.WriteLine("Nothing returned");
            }
        }
        catch (FaultException<WebServiceFaultException> ex)
        {
            // Fault reported by the service itself.
            Console.WriteLine($"FaultException occurred: ExceptionType: {ex.Detail.ExceptionType.ToString()}");
            Console.WriteLine();
            Console.WriteLine($"Error Detail: {string.Join(Environment.NewLine, ex.Detail.ExceptionDetails)}");
            Console.WriteLine($"Error message: {ex.Message}");
            Console.WriteLine();
            Console.WriteLine($"Error reason: {ex.Reason}");
        }
        catch (Exception ex)
        {
            // Any other failure, such as file I/O or connection errors.
            Console.WriteLine(ex.Message);
            Console.WriteLine(ex.StackTrace);
            Console.WriteLine(ex.Data.ToString());
        }
        finally
        {
            // Always release the service connection.
            if (client != null)
            {
                CloseService(client);
            }
        }
    }
For a practical demonstration, refer to the sample code for smart redaction.
Always test smart redaction on sample documents before processing production files. Verify that all sensitive information is detected and redacted according to your security requirements.
Troubleshooting
Service connection error: Cannot connect to DCS
- Ensure Nutrient Document Converter Services is running and accessible
- Verify the service URL in your code matches your DCS installation
- Check that no firewall is blocking the connection
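The connection checks above can be approximated in code before calling the service. The following is a minimal sketch, not part of the Document Converter API: the ServiceReachability helper is hypothetical, it only tests whether a TCP connection to the host and port in the service URL can be opened, and it says nothing about whether the service behind that port is healthy.

```csharp
using System;
using System.Net.Sockets;

public static class ServiceReachability
{
    // Returns true if a TCP connection to the URL's host and port succeeds
    // within the timeout. Returns false for malformed URLs, refused
    // connections, and timeouts.
    public static bool CanConnect(string serviceUrl, int timeoutMs = 3000)
    {
        Uri uri;
        if (!Uri.TryCreate(serviceUrl, UriKind.Absolute, out uri) || uri.Port <= 0)
        {
            return false;
        }

        try
        {
            using (var tcp = new TcpClient())
            {
                // Wait returns false if the attempt did not finish in time.
                return tcp.ConnectAsync(uri.Host, uri.Port).Wait(timeoutMs) && tcp.Connected;
            }
        }
        catch (AggregateException)
        {
            // A refused or failed connection surfaces through the task.
            return false;
        }
        catch (SocketException)
        {
            return false;
        }
    }
}
```

A false result points to a wrong URL, a stopped service, or a blocked port, which narrows down which of the three checks to look at first.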
No redaction results: Smart redaction processes but nothing is redacted
- Verify that the document contains the specified sensitive data types
- Check that the sensitive information format matches standard patterns (e.g., valid credit card number formats)
- Ensure the page range includes pages containing sensitive data
- Enable debug mode to see detailed processing information
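When test data fails to redact, it helps to confirm that the sample values are well formed. Payment card numbers conventionally end in a Luhn check digit, so made-up numbers often fail that checksum. The sketch below implements the Luhn check; whether smart redaction applies this check during detection is an assumption, as this guide does not say, and the LuhnCheck helper is hypothetical.

```csharp
using System;

public static class LuhnCheck
{
    // Returns true when the digits in `number` pass the Luhn checksum.
    // Spaces and hyphens are ignored; any other non-digit returns false.
    public static bool IsValid(string number)
    {
        if (string.IsNullOrWhiteSpace(number)) return false;

        int sum = 0;
        int digitCount = 0;
        bool doubleIt = false;

        // Walk from the rightmost digit, doubling every second digit.
        for (int i = number.Length - 1; i >= 0; i--)
        {
            char c = number[i];
            if (c == ' ' || c == '-') continue;
            if (c < '0' || c > '9') return false;

            int d = c - '0';
            if (doubleIt)
            {
                d *= 2;
                if (d > 9) d -= 9; // same as adding the two digits of d
            }

            sum += d;
            doubleIt = !doubleIt;
            digitCount++;
        }

        return digitCount > 1 && sum % 10 == 0;
    }
}
```

For example, the widely used test number 4111 1111 1111 1111 passes this check, while changing its last digit to 2 makes it fail.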
File access error: Permission denied
- Verify that the application has read access to source documents
- Check that the output directory has write permissions
- Ensure documents aren’t password-protected or locked by other applications
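The access checks above can be run up front so that permission problems surface before any document is sent to the service. This is a minimal sketch using only System.IO; the FileAccessCheck helper is hypothetical, and it does not detect password-protected documents, only files that cannot be opened or folders that cannot be written to.

```csharp
using System;
using System.IO;

public static class FileAccessCheck
{
    // Returns true if the file exists and can be opened for reading right now.
    // FileShare.Read makes the open fail if another process holds a write lock.
    public static bool CanRead(string path)
    {
        try
        {
            using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                return true;
            }
        }
        catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
        {
            return false;
        }
    }

    // Returns true if a temporary file can be created in the folder and deleted again.
    public static bool CanWrite(string folder)
    {
        try
        {
            string probe = Path.Combine(folder, Path.GetRandomFileName());
            using (File.Create(probe)) { }
            File.Delete(probe);
            return true;
        }
        catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
        {
            return false;
        }
    }
}
```

Calling CanRead on the source file and CanWrite on the target folder before redaction gives a clearer error than a failure halfway through processing.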
Performance issues: Slow processing
- For large documents, consider processing in batches or using specific page ranges
- Limit the number of redaction types to only what’s necessary
- Monitor memory usage for large document sets
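Batch processing, as suggested above, can be sketched with a small helper that splits a list of files into fixed-size groups. The Batching class is hypothetical and independent of the Document Converter API; a suitable batch size depends on your documents and server and is not specified in this guide.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits `items` into consecutive batches of at most `batchSize` elements.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> items, int batchSize)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Iterate(items, batchSize);
    }

    // Separate iterator so the argument checks above run immediately.
    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> items, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (T item in items)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }
}
```

One possible use is to pass the result of Directory.EnumerateFiles to InBatches and call the SmartRedaction method from the example for each file in a batch, checking memory usage between batches.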
Detection accuracy issues: False positives or missed items
- Smart redaction works best with standard formats; consider pattern redaction for custom formats
- Test with different language dictionaries if processing non-English documents
- Use the DetectOrientation property for scanned documents with rotation issues
What’s next
Now that you understand smart document redaction with C#, explore these related document security capabilities:
- Pattern redaction - Create custom redaction patterns with pattern redaction and highlighting for organization-specific sensitive data
- Complete code examples - Review comprehensive sample code in C# for pattern redaction and highlighting with detailed implementations
- Python implementation - Compare approaches with pattern redaction using Python for cross-language insights
- Document security overview - Explore the complete document security with C# guide for additional protection features