Smart document redaction with C#

This guide explains how to use the smart redaction feature in the Document Converter API to identify and redact sensitive information such as credit card numbers, email addresses, phone numbers, and more. Protect your documents efficiently while maintaining compliance with security standards.

What is smart redaction?

Smart redaction enables you to automatically locate and redact predefined sensitive data types within documents. You can customize the redaction settings to meet your specific security and privacy needs.

Common use cases

Smart redaction is ideal for:

  • Financial document processing - Redact credit card numbers, International Bank Account Numbers (IBANs), and account information
  • Legal document preparation - Remove personal identifiers before discovery or public filing
  • Healthcare record sanitization - Protect patient information in compliance with Health Insurance Portability and Accountability Act (HIPAA) regulations
  • Corporate document sharing - Clean internal documents before external distribution
  • Data loss prevention - Scan and protect documents containing sensitive information
  • Compliance automation - Meet regulatory requirements for data protection

Smart redaction vs pattern redaction

Smart redaction automatically detects predefined patterns for common sensitive data types. Choose smart redaction when you need:

  • Setup without custom regex patterns
  • Standard compliance for common sensitive data types (credit cards, emails, phone numbers)
  • Automated processing of large document volumes
  • Built-in accuracy for well-established data formats

Pattern redaction uses custom regular expressions for specific patterns. Choose pattern redaction when you need:

  • Custom sensitive data formats specific to your organization
  • More control over redaction patterns and rules
  • Industry-specific data types not covered by smart redaction
  • Fine-tuned pattern matching for specialized use cases

Learn more about pattern redaction and highlighting for custom redaction scenarios.

Smart redaction automatically detects common sensitive data types without requiring custom regular expressions. For organization-specific data formats, consider using pattern redaction instead.

Prerequisites

Before implementing smart redaction, ensure you have:

  • Nutrient Document Converter Services (DCS) installed and running
  • A .NET Framework or .NET Core development environment
  • Document files containing sensitive information for testing
  • The OpenService and CloseService methods from the DocumentConverterServiceClient sample code, implemented in your project
  • Appropriate file system permissions for reading input files and writing output
  • An understanding of common sensitive data types (credit cards, emails, phone numbers)

For initial DCS setup, refer to the Document Converter Services installation guide.

Smart redaction properties

The Nutrient API exposes the following properties for smart redaction. They control which types of sensitive information are redacted and how the output is produced, such as the color used to cover redacted content, the page range to process, and the language dictionaries to use.

Property                            Description
RedactCreditCardNumbers             Specifies whether credit card numbers will be redacted.
RedactEmailAddresses                Specifies whether email addresses will be redacted.
RedactIBANs                         Specifies whether International Bank Account Numbers (IBANs) will be redacted.
RedactPhoneNumbers                  Specifies whether phone numbers will be redacted.
RedactURIs                          Specifies whether Uniform Resource Identifiers (URIs) will be redacted.
RedactVATIDs                        Specifies whether Value-Added Tax (VAT) IDs will be redacted.
RedactVehicleIdentificationNumbers  Specifies whether vehicle identification numbers will be redacted.
MarkColor                           The color used to cover redacted information.
Dictionaries                        List of language codes, linked by "+". For example: "ENG+FRA".
DetectOrientation                   Specifies whether orientation will be detected automatically.
PageRange                           Range of pages to redact; use "*" for all pages.
RedactSocialSecurityNumbers         Specifies whether social security numbers will be redacted. This property is currently available in preview.
RedactPostalAddresses               Specifies whether postal addresses will be redacted.
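
The Dictionaries value is a set of language codes joined by "+". As a minimal sketch, the helper below builds that string from a list of codes. The class and method names are hypothetical and not part of the Nutrient API; uppercasing the codes is an assumption based on the "ENG+FRA" example, and assigning the result to a string-typed Dictionaries property is likewise an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the Nutrient API.
public static class RedactionDictionaries
{
    // Joins language codes into the "ENG+FRA" form described for the
    // Dictionaries property. Codes are trimmed, uppercased (an assumption
    // based on the documented example), and de-duplicated.
    public static string Join(IEnumerable<string> languageCodes)
    {
        if (languageCodes == null)
        {
            throw new ArgumentNullException(nameof(languageCodes));
        }

        List<string> cleaned = languageCodes
            .Where(code => !string.IsNullOrWhiteSpace(code))
            .Select(code => code.Trim().ToUpperInvariant())
            .Distinct()
            .ToList();

        if (cleaned.Count == 0)
        {
            throw new ArgumentException("At least one language code is required.", nameof(languageCodes));
        }

        return string.Join("+", cleaned);
    }
}

// Possible usage, assuming Dictionaries is a string property:
// smartRedactionSettings.Dictionaries = RedactionDictionaries.Join(new[] { "eng", "fra" });
```

Building the value in one place avoids stray spaces or duplicate codes when the language list comes from configuration.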

Example: Implementing smart redaction with the API

This example demonstrates how to use the API to redact credit card numbers, email addresses, and phone numbers in a document:

// Requires: using System; using System.IO; using System.ServiceModel;
// plus the namespace of the generated DocumentConverterServiceClient proxy.

/// <summary>
/// Perform smart redaction on the supplied file, writing the result into the target folder.
/// </summary>
/// <param name="serviceUrl">URL endpoint for the Document Converter service.</param>
/// <param name="sourceFileName">Source filename.</param>
/// <param name="targetFolder">Target folder to receive the output file.</param>
static void SmartRedaction(string serviceUrl, string sourceFileName, string targetFolder)
{
    DocumentConverterServiceClient client = null;
    try
    {
        // Create a minimal OpenOptions object.
        OpenOptions openOptions = new OpenOptions();
        openOptions.OriginalFileName = Path.GetFileName(sourceFileName);

        // Create SmartRedactionSettings and select what needs to be redacted.
        SmartRedactionSettings smartRedactionSettings = new SmartRedactionSettings();
        smartRedactionSettings.RedactCreditCardNumbers = BooleanEnum.True;
        smartRedactionSettings.RedactEmailAddresses = BooleanEnum.True;
        smartRedactionSettings.RedactPhoneNumbers = BooleanEnum.True;

        // Read the source file into a byte array.
        byte[] sourceFile = File.ReadAllBytes(sourceFileName);

        // Open the service and configure the bindings.
        client = OpenService(serviceUrl);

        // Carry out the redaction.
        byte[] result = client.SmartRedaction(sourceFile, openOptions, smartRedactionSettings);

        // Save the result.
        if (result != null)
        {
            // Create the target folder if required (no effect if it already exists).
            Directory.CreateDirectory(targetFolder);

            string filename = Path.GetFileNameWithoutExtension(sourceFileName);
            string destinationFileName = Path.GetFullPath(Path.Combine(targetFolder, filename + "-redacted.pdf"));
            File.WriteAllBytes(destinationFileName, result);
            Console.WriteLine("Redacted file written to " + destinationFileName);
        }
        else
        {
            Console.WriteLine("Nothing returned");
        }
    }
    catch (FaultException<WebServiceFaultException> ex)
    {
        // Errors reported by the service itself.
        Console.WriteLine($"FaultException occurred: ExceptionType: {ex.Detail.ExceptionType}");
        Console.WriteLine();
        Console.WriteLine($"Error Detail: {string.Join(Environment.NewLine, ex.Detail.ExceptionDetails)}");
        Console.WriteLine($"Error message: {ex.Message}");
        Console.WriteLine();
        Console.WriteLine($"Error reason: {ex.Reason}");
    }
    catch (Exception ex)
    {
        // Local errors, such as file access or connection failures.
        Console.WriteLine(ex.Message);
        Console.WriteLine(ex.StackTrace);
    }
    finally
    {
        // Always release the service connection.
        if (client != null)
        {
            CloseService(client);
        }
    }
}

For a practical demonstration, refer to the sample code for smart redaction.

Always test smart redaction on sample documents before processing production files. Verify that all sensitive information is detected and redacted according to your security requirements.

Troubleshooting

Service connection error: Cannot connect to DCS

  • Ensure Nutrient Document Converter Services is running and accessible
  • Verify the service URL in your code matches your DCS installation
  • Check that no firewall is blocking the connection

No redaction results: Smart redaction processes but nothing is redacted

  • Verify that the document contains the specified sensitive data types
  • Check that the sensitive information format matches standard patterns (e.g., valid credit card number formats)
  • Ensure the page range includes pages containing sensitive data
  • Enable debug mode to see detailed processing information
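
When preparing test documents, it can help to confirm that sample card numbers are well formed before concluding that detection failed. The sketch below applies the Luhn checksum, which payment card numbers conventionally satisfy. How the service itself validates candidates is not documented here, so treat this only as a sanity check on test data; the class name and the minimum length of 12 digits are assumptions.

```csharp
using System;

// Hypothetical helper for checking test data; not part of the Nutrient API.
public static class CardNumberCheck
{
    // Returns true when the digits in the input pass the Luhn checksum.
    // Spaces and hyphens are ignored; any other non-digit fails the check.
    public static bool PassesLuhn(string candidate)
    {
        if (string.IsNullOrWhiteSpace(candidate))
        {
            return false;
        }

        int sum = 0;
        int digitCount = 0;
        bool doubleIt = false; // The rightmost digit is not doubled.

        for (int i = candidate.Length - 1; i >= 0; i--)
        {
            char c = candidate[i];
            if (c == ' ' || c == '-')
            {
                continue;
            }
            if (c < '0' || c > '9')
            {
                return false;
            }

            int digit = c - '0';
            if (doubleIt)
            {
                digit *= 2;
                if (digit > 9)
                {
                    digit -= 9;
                }
            }

            sum += digit;
            doubleIt = !doubleIt;
            digitCount++;
        }

        // Assumed minimum length for a plausible card number.
        return digitCount >= 12 && sum % 10 == 0;
    }
}
```

For example, the widely used test number 4111 1111 1111 1111 passes this check, while changing its last digit to 2 makes it fail.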

File access error: Permission denied

  • Verify that the application has read access to source documents
  • Check that the output directory has write permissions
  • Ensure documents aren’t password-protected or locked by other applications
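
The access checks above can be run before calling the service, so that permission problems surface with a clear message. This is a minimal sketch using standard System.IO calls; the class and method names are hypothetical. It does not detect password-protected documents, only whether the file can be opened and the output folder written to.

```csharp
using System;
using System.IO;

// Hypothetical pre-flight checks; not part of the Nutrient API.
public static class FileAccessCheck
{
    // Returns true if the file exists and can be opened for reading.
    // A file locked exclusively by another application raises IOException.
    public static bool CanRead(string path)
    {
        try
        {
            using (File.OpenRead(path))
            {
                return true;
            }
        }
        catch (IOException)
        {
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            return false;
        }
    }

    // Returns true if a file can be created in the folder.
    // The folder is created if it does not exist, and the probe file is
    // removed automatically when its stream is closed.
    public static bool CanWriteTo(string folder)
    {
        try
        {
            Directory.CreateDirectory(folder);
            string probe = Path.Combine(folder, Path.GetRandomFileName());
            using (File.Create(probe, 1, FileOptions.DeleteOnClose))
            {
                return true;
            }
        }
        catch (IOException)
        {
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            return false;
        }
    }
}
```

Calling CanRead on the source file and CanWriteTo on the target folder before redaction separates local file problems from service-side errors.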

Performance issues: Slow processing

  • For large documents, consider processing in batches or using specific page ranges
  • Limit the number of redaction types to only what’s necessary
  • Monitor memory usage for large document sets
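
One way to process in batches is to split the list of input files into fixed-size groups and handle one group at a time. The helper below is a minimal sketch of that idea; its names are hypothetical, and a suitable batch size depends on your documents and server, so the value in the usage comment is only a placeholder.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the Nutrient API.
public static class BatchHelper
{
    // Splits a sequence into consecutive lists of at most batchSize items.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> items, int batchSize)
    {
        if (items == null)
        {
            throw new ArgumentNullException(nameof(items));
        }
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        return Iterate(items, batchSize);
    }

    // Kept separate so the argument checks above run immediately
    // rather than on first enumeration.
    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> items, int batchSize)
    {
        List<T> batch = new List<T>(batchSize);
        foreach (T item in items)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Return the final, possibly smaller, batch.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}

// Possible usage with the SmartRedaction method from the example in this guide:
// foreach (List<string> batch in BatchHelper.InBatches(Directory.GetFiles(inputFolder, "*.pdf"), 10))
// {
//     foreach (string file in batch)
//     {
//         SmartRedaction(serviceUrl, file, targetFolder);
//     }
// }
```

Processing one batch at a time keeps only a bounded number of documents in flight, which makes memory usage easier to monitor.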

Detection accuracy issues: False positives or missed items

  • Smart redaction works best with standard formats; consider pattern redaction for custom formats
  • Test with different language dictionaries if processing non-English documents
  • Use the DetectOrientation property for scanned documents with rotation issues

What’s next

Now that you understand smart document redaction with C#, explore these related document security capabilities: