Extract PDF tables with C#

This guide demonstrates how to extract tabular data from PDF documents using C# and Nutrient Document Converter Services (DCS). Table extraction converts structured data from PDFs into JSON format, making it accessible for data analysis, reporting, and integration workflows.

Common use cases

PDF table extraction is useful for:

  • Financial data processing - Extract tables from invoices, statements, and reports for automated accounting workflows
  • Research and analysis - Convert tabular data from research papers and reports into JSON for statistical analysis
  • Document digitization - Transform scanned documents with tables into a structured, searchable format
  • Compliance reporting - Extract regulatory data from PDF forms into a structured format for auditing
  • Data migration - Recover tabular information from legacy PDF documents for database import

Prerequisites

Before extracting tables from PDFs, ensure you have:

  • Nutrient Document Converter Services (DCS) installed, licensed, and running
  • .NET Framework 4.6.1+ or .NET Core 2.0+ development environment
  • Valid DCS license that includes table extraction functionality
  • OpenService() and CloseService() methods implemented in your project, as in the DocumentConverterServiceClient sample code
  • PDF files containing tabular data for testing
  • Write permissions for the target output folder

Input requirements

  • PDF files containing actual tabular data (not just visual table layouts)
  • Files that are not password-protected or corrupted
  • Tables with a clear row and column structure, for the best extraction results
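
The requirements above can be partly checked before a file is sent to the service. The following is a minimal sketch, not part of the DCS sample code: the PdfPreflight class and LooksLikePdf method are hypothetical names introduced here. It only confirms that the file exists and begins with the "%PDF-" marker. It does not detect password protection or deeper corruption, and it is deliberately strict, so it would reject a file with stray bytes before the header even if some readers tolerate that.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the DCS client API.
static class PdfPreflight
{
    // Returns true when the file exists and starts with the "%PDF-" marker.
    // On failure, 'reason' holds a short explanation.
    public static bool LooksLikePdf(string path, out string reason)
    {
        reason = "";
        if (!File.Exists(path))
        {
            reason = "File not found";
            return false;
        }

        byte[] header = new byte[5];
        using (FileStream stream = File.OpenRead(path))
        {
            int read = stream.Read(header, 0, header.Length);
            if (read < header.Length)
            {
                reason = "File is too short to be a PDF";
                return false;
            }
        }

        // A PDF file normally begins with the ASCII marker "%PDF-".
        if (Encoding.ASCII.GetString(header) != "%PDF-")
        {
            reason = "File does not start with the %PDF- header";
            return false;
        }
        return true;
    }
}
```

Calling PdfPreflight.LooksLikePdf(sourceFileName, out string reason) before the extraction call lets you skip obviously invalid input and log the reason instead of waiting for a service-side failure.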

Output format

  • JSON format: Structured data with table metadata and cell content
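
This guide does not describe the JSON schema, so the sketch below makes no assumption about property names. It only reports the shape of the root element, which can be a useful first look at an output file. The JsonOutputInspector class is a hypothetical name, and the sketch assumes System.Text.Json is available (it ships with .NET Core 3.0 and later; older targets would need the NuGet package).

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper for a first look at an extraction result.
static class JsonOutputInspector
{
    // Describes the JSON root without assuming any particular schema.
    public static string DescribeRoot(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;
            switch (root.ValueKind)
            {
                case JsonValueKind.Array:
                {
                    return $"array with {root.GetArrayLength()} item(s)";
                }
                case JsonValueKind.Object:
                {
                    var names = new List<string>();
                    foreach (JsonProperty property in root.EnumerateObject())
                    {
                        names.Add(property.Name);
                    }
                    return "object with properties: " + string.Join(", ", names);
                }
                default:
                {
                    return root.ValueKind.ToString();
                }
            }
        }
    }
}
```

For example, Console.WriteLine(JsonOutputInspector.DescribeRoot(File.ReadAllText(outputPath))) prints the top-level structure of a result file, where outputPath is the file written by the sample code. Once you know the property names your DCS version emits, you can replace this with typed deserialization.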

Sample code

// Requires: using System; using System.IO; plus the DCS client types from the sample code.

/// <summary>
/// Extract tabular data from a PDF.
/// </summary>
/// <param name="ServiceURL">URL endpoint for the Document Converter service.</param>
/// <param name="sourceFileName">Source filename.</param>
/// <param name="targetFolder">Target folder to receive the output file.</param>
/// <param name="outputFileType">Output file type. Currently only JSON is supported.</param>
/// <param name="languages">OCR language(s) to use. Defaults to "eng".</param>
static void TestTableExtract(string ServiceURL, string sourceFileName, string targetFolder, string outputFileType, string languages = "eng")
{
    Console.WriteLine($"Extracting tables from {sourceFileName}");
    DocumentConverterServiceClient client = null;

    // Create an `OpenOptions` instance with the minimum properties needed for file identification.
    OpenOptions openOptions = new OpenOptions();
    openOptions.FileExtension = Path.GetExtension(sourceFileName);
    openOptions.OriginalFileName = Path.GetFileName(sourceFileName);

    // Create a `TableExtractionSettings` object.
    TableExtractionSettings settings = new TableExtractionSettings();
    settings.DPI = "300";
    settings.SeparateTables = BooleanEnum.True;
    settings.EnableOrientationDetection = BooleanEnum.True;
    settings.EnableSkewDetection = BooleanEnum.True;
    settings.RenderFormFields = BooleanEnum.True;
    settings.OutputFileType = outputFileType;
    settings.OCRLanguage = languages;

    try
    {
        // Read the source file into a byte array.
        byte[] sourceFile = File.ReadAllBytes(sourceFileName);

        // Open the service and configure the bindings.
        client = OpenService(ServiceURL);

        // Carry out the table extraction.
        BatchResult result = client.ExtractTables(sourceFile, openOptions, settings);
        if (result != null)
        {
            // Create the target folder if it does not exist.
            if (!Directory.Exists(targetFolder))
            {
                Directory.CreateDirectory(targetFolder);
            }
            Console.WriteLine($"Output to: {targetFolder}");

            // Get the filename.
            string filename = result.FileName;
            Console.WriteLine(filename);

            // Write the result to a file.
            File.WriteAllBytes(Path.Combine(targetFolder, filename), result.File);
        }
        else
        {
            Console.WriteLine("No result returned");
        }
    }
    finally
    {
        // Always close the service connection, even if extraction fails.
        if (client != null)
        {
            CloseService(client);
        }
    }
}

Troubleshooting

Service connection error: Cannot connect to DCS

  • Ensure DCS is running and accessible
  • Verify the service URL in your code matches your DCS installation
  • Check that no firewall is blocking the connection
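
To separate network problems from service problems, it can help to test whether the host and port in the service URL accept a TCP connection at all. The sketch below is one way to do that, assuming the service is reached over a URL with an explicit or default port. ServiceProbe and CanConnect are hypothetical names and are not part of the DCS client. A successful connection only shows that something is listening; it does not prove that DCS itself is healthy.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical diagnostic helper, not part of the DCS client API.
static class ServiceProbe
{
    // Returns true if a TCP connection to the URL's host and port succeeds
    // within the timeout. Malformed URLs and connection failures return false.
    public static bool CanConnect(string serviceUrl, int timeoutMs = 3000)
    {
        if (!Uri.TryCreate(serviceUrl, UriKind.Absolute, out Uri uri))
        {
            return false;
        }

        try
        {
            using (var tcp = new TcpClient())
            {
                // Wait(timeoutMs) bounds the wait instead of relying on the OS default.
                bool completed = tcp.ConnectAsync(uri.Host, uri.Port).Wait(timeoutMs);
                return completed && tcp.Connected;
            }
        }
        catch (Exception)
        {
            // Refused connections, DNS failures, and invalid ports all end up here.
            return false;
        }
    }
}
```

If CanConnect returns false for the URL you pass to OpenService(), look at the service state, the URL, and firewall rules before investigating the extraction code.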

No tables extracted: Empty result or no output file

  • Verify that the PDF contains actual tabular data, not just visual table layouts
  • Check that the OCR language setting matches the document language
  • Ensure the DPI setting is appropriate for your document quality (try 300 or higher)
  • Enable orientation and skew detection for scanned documents

License error: Table extraction not available

  • Verify that your DCS license includes table extraction functionality
  • Check that the license hasn’t expired
  • Ensure the service is licensed and activated

File access error: Permission denied

  • Verify that the application has read access to the source PDF file
  • Check that the target folder has write permissions
  • Ensure the PDF file isn’t locked by other applications
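
The three checks above can be run up front so that permission problems surface with a clear message. This is a rough sketch using only standard System.IO calls. FileAccessChecks and its methods are hypothetical names. Note that CanWriteTo creates the folder if it is missing, which matches what the sample code does before writing its output.

```csharp
using System;
using System.IO;

// Hypothetical pre-flight checks for the source file and target folder.
static class FileAccessChecks
{
    // True if the file can be opened for reading with no sharing,
    // which fails when the file is missing, unreadable, or held open elsewhere.
    public static bool CanOpenExclusively(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
            {
                return true;
            }
        }
        catch (IOException) { return false; }
        catch (UnauthorizedAccessException) { return false; }
    }

    // True if a small probe file can be created and deleted in the folder.
    public static bool CanWriteTo(string folder)
    {
        try
        {
            Directory.CreateDirectory(folder);
            string probe = Path.Combine(folder, Path.GetRandomFileName());
            File.WriteAllBytes(probe, new byte[0]);
            File.Delete(probe);
            return true;
        }
        catch (IOException) { return false; }
        catch (UnauthorizedAccessException) { return false; }
    }
}
```

File locking semantics differ between operating systems, so treat a true result from CanOpenExclusively as a good sign, not a guarantee.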

Poor extraction quality: Incomplete or inaccurate table data

  • Increase the DPI setting for higher quality extraction (try 600 DPI for complex tables)
  • Enable orientation detection if tables are rotated
  • Enable skew detection for scanned documents
  • Set the appropriate OCR language for non-English documents
  • Consider using SeparateTables = BooleanEnum.False for complex multi-column layouts

Large file processing: Slow performance or timeouts

  • For large PDF files, consider processing individual pages
  • Increase timeout values for the service client if processing large files
  • Monitor memory usage when processing multiple large files
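
When several large files are involved, processing them one at a time and logging duration and memory makes slow or failing inputs easy to spot. The following sketch shows one way to structure that loop. BatchRunner is a hypothetical name, and the extraction step is passed in as a delegate so the loop does not depend on any DCS type. The memory figure covers only the managed heap of the calling process, not the DCS service itself.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sequential batch loop with basic timing and memory logging.
static class BatchRunner
{
    // Runs 'extract' once per file, in order, and returns the files that threw.
    public static List<string> Run(IEnumerable<string> files, Action<string> extract)
    {
        var failed = new List<string>();
        foreach (string file in files)
        {
            Stopwatch timer = Stopwatch.StartNew();
            try
            {
                extract(file);
            }
            catch (Exception ex)
            {
                // Record the failure and continue with the next file.
                failed.Add(file);
                Console.WriteLine($"{file}: failed ({ex.Message})");
            }

            long managedMb = GC.GetTotalMemory(false) / (1024 * 1024);
            Console.WriteLine($"{file}: {timer.ElapsedMilliseconds} ms, ~{managedMb} MB managed heap");
        }
        return failed;
    }
}
```

A call might look like BatchRunner.Run(Directory.GetFiles(sourceFolder, "*.pdf"), f => TestTableExtract(serviceUrl, f, targetFolder, outputFileType)), where sourceFolder, serviceUrl, targetFolder, and outputFileType are values you supply.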

What’s next

Now that you can extract tables from PDFs with C#, explore these related document processing capabilities: