Image and PDF compression in C#

Nutrient .NET SDK (formerly GdPicture.NET) enables you to dramatically reduce the file size of PDF documents, with a focus on font optimization, data compression, and image analysis.

PDF optimization chains several compression algorithms in sequence to get past the limits of any single compression scheme. It also removes unwanted or unused objects from the PDF.

To compress a PDF document, follow these steps:

  1. Create a GdPicturePDFReducer object.
  2. Configure the metadata of the resulting PDF document with the following properties of the PDFReducerConfiguration object:
Property name | Description
Author | Specifies the author of the resulting PDF document.
Producer | Specifies the producer of the resulting PDF document.
ProducerName | Specifies the name of the producer of the resulting PDF document.
Title | Specifies the title of the resulting PDF document.
  3. Configure the compression process with the following properties of the PDFReducerConfiguration object:
Property name | Description
DownscaleImages | Specifies whether to downscale images. The default value is true.
DownscaleResolution | Specifies the resolution to downscale images. The default value is 150.
DownscaleResolutionMRC | Specifies the resolution for downscaling the background layer by the mixed raster content (MRC) engine. The default value is 100.
EnableCharRepair | Specifies whether to perform character repair during bitonal conversion. The default value is false.
EnableColorDetection | Specifies whether to perform color detection on images. The default value is true.
EnableJBIG2 | Specifies whether to use the JBIG2 compression scheme to compress bitonal images. The default value is true.
EnableJPEG2000 | Specifies whether to use the JPEG2000 compression scheme to compress the images. The default value is true.
EnableMRC | Specifies whether to use MRC for compressing the content of the source PDF. The default value is false.
EnableParallelization | Specifies whether to use multiple cores to speed up the process. Threads are dynamically allocated based on the real-time available CPU resources. The default value is true.
FastWebView | Specifies whether to optimize the PDF for online distribution (linearized PDF). The default value is false.
ImageQuality | Specifies the quality of the compressed images. The default value is PDFReducerImageQuality.ImageQualityMedium.
JBIG2PMSThreshold | Specifies the threshold value for the JBIG2 encoder pattern matching and substitution between 0 and 1. Any number lower than 1 may lead to lossy compression. The default value is 0.75.
MaxBitmapPerPage | Specifies the maximum number of bitmap images per page.
OutputFormat | A member of the PDFReducerPDFVersion enumeration that specifies the version and the conformance level of the output PDF document. The default value is PDFReducerPDFVersion.PdfVersion15.
PackDocument | Specifies whether to pack the PDF to reduce its size. The default value is true.
PackFonts | Specifies whether to pack the PDF fonts to reduce their size. The default value is true.
PreserveSmoothing | Specifies whether the MRC engine preserves smoothing between different layers. The default value is true.
RecompressImages | Specifies whether to recompress the images. The default value is true.
RemoveAnnotations | Specifies whether to remove annotations. The default value is false.
RemoveBlankPages | Specifies whether to remove blank pages. The default value is false.
RemoveBookmarks | Specifies whether to remove bookmarks. The default value is false.
RemoveEmbeddedFiles | Specifies whether to remove embedded files. The default value is false.
RemoveFormFields | Specifies whether to remove form fields. The default value is false.
RemoveHyperlinks | Specifies whether to remove hyperlinks. The default value is false.
RemoveJavaScript | Specifies whether to remove JavaScript. The default value is false.
RemoveMetadata | Specifies whether to remove metadata. The default value is false.
RemovePagePieceInfo | Specifies whether to remove the page PieceInfo dictionary used to hold private application data. The default value is true.
RemovePageThumbnails | Specifies whether to remove page thumbnails. The default value is false.
UnembedFonts | Specifies whether to remove embedded font data. The default value is false.
  4. Run the compression process with the ProcessDocument method. This method takes the path to the source and the output PDF files as its parameters.
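
The steps above do not say how to tell whether compression succeeded. The sketch below is one way to check, under several assumptions that are not stated in this guide: that ProcessDocument returns a member of the GdPictureStatus enumeration, that GdPictureStatus.OK signals success, that the types live in the GdPicture14 namespace, and that a license key has already been registered. Verify each of these against the API reference for your SDK version before relying on it.

```csharp
using System;
using GdPicture14; // Assumption: namespace used by GdPicture.NET 14.

using GdPicturePDFReducer gdpicturePDFReducer = new GdPicturePDFReducer();

// Assumption: ProcessDocument returns a GdPictureStatus value
// and GdPictureStatus.OK means the output file was written.
GdPictureStatus status = gdpicturePDFReducer.ProcessDocument(
    @"C:\temp\source.pdf", @"C:\temp\output.pdf");

if (status != GdPictureStatus.OK)
{
    // Report the failure instead of silently continuing.
    Console.WriteLine($"Compression failed with status: {status}");
}
```

Checking the status keeps a failed run from being mistaken for a successful one when the output path is used later in a pipeline.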

General optimization of PDF documents

The example below focuses on general aspects of PDF optimization such as content removal and font optimization:

using GdPicturePDFReducer gdpicturePDFReducer = new GdPicturePDFReducer();
// Configure the metadata of the resulting PDF document.
gdpicturePDFReducer.PDFReducerConfiguration.Author = "Nutrient .NET PDF Reducer SDK";
gdpicturePDFReducer.PDFReducerConfiguration.Producer = "GdPicture.NET 14";
gdpicturePDFReducer.PDFReducerConfiguration.ProducerName = "Nutrient";
gdpicturePDFReducer.PDFReducerConfiguration.Title = "PDF Optimization";
// Specify the version and the conformance level of the output PDF document.
gdpicturePDFReducer.PDFReducerConfiguration.OutputFormat = PDFReducerPDFVersion.PdfVersionRetainExisting;
// Configure the compression process by removing document elements.
gdpicturePDFReducer.PDFReducerConfiguration.RemoveAnnotations = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveBlankPages = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveBookmarks = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveEmbeddedFiles = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveFormFields = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveHyperlinks = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveJavaScript = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveMetadata = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemovePageThumbnails = true;
// Optimize the output file size by packing fonts.
gdpicturePDFReducer.PDFReducerConfiguration.PackFonts = true;
// Optimize the output file size by packing the document.
gdpicturePDFReducer.PDFReducerConfiguration.PackDocument = true;
// Run the compression process.
gdpicturePDFReducer.ProcessDocument(@"C:\temp\source.pdf", @"C:\temp\output.pdf");
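
To see how much the settings above actually saved, you can compare the two files on disk after the run. The sketch below uses only System.IO and is independent of the SDK; CompressionReport and ReductionPercent are hypothetical names introduced for illustration, and the paths are the ones used in the example above.

```csharp
using System;
using System.IO;

static class CompressionReport
{
    // Returns the size reduction as a percentage of the original size.
    // A negative result means the output is larger than the source.
    public static double ReductionPercent(long originalBytes, long compressedBytes)
    {
        if (originalBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(originalBytes));
        return 100.0 * (originalBytes - compressedBytes) / originalBytes;
    }

    public static void Main()
    {
        // Paths match the example above; adjust them to your environment.
        string source = @"C:\temp\source.pdf";
        string output = @"C:\temp\output.pdf";

        if (!File.Exists(source) || !File.Exists(output))
        {
            Console.WriteLine("Run the compression example first.");
            return;
        }

        long before = new FileInfo(source).Length;
        long after = new FileInfo(output).Length;
        Console.WriteLine(
            $"{before} -> {after} bytes ({ReductionPercent(before, after):F1}% smaller)");
    }
}
```

Measuring on your own documents matters because the gain depends heavily on content: a file made mostly of text gains little from image settings, while a scanned file gains little from font packing.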

Recompressing images

You can compress a PDF document by recompressing the images it already contains. For example, lowering an unnecessarily high image resolution can greatly reduce the file size without noticeably affecting the viewing experience.

The example below shows how to recompress images:

using GdPicturePDFReducer gdpicturePDFReducer = new GdPicturePDFReducer();
// Configure the metadata of the resulting PDF document.
gdpicturePDFReducer.PDFReducerConfiguration.Author = "Nutrient .NET PDF Reducer SDK";
gdpicturePDFReducer.PDFReducerConfiguration.Producer = "GdPicture.NET 14";
gdpicturePDFReducer.PDFReducerConfiguration.ProducerName = "Nutrient";
gdpicturePDFReducer.PDFReducerConfiguration.Title = "Re-Compress Images";
// Specify the version and the conformance level of the output PDF document.
gdpicturePDFReducer.PDFReducerConfiguration.OutputFormat = PDFReducerPDFVersion.PdfVersionRetainExisting;
// Recompress images to obtain a better compression ratio.
gdpicturePDFReducer.PDFReducerConfiguration.RecompressImages = true;
gdpicturePDFReducer.PDFReducerConfiguration.ImageQuality = PDFReducerImageQuality.ImageQualityHigh;
// Reduce the image size by decreasing the image resolution.
gdpicturePDFReducer.PDFReducerConfiguration.DownscaleImages = true;
gdpicturePDFReducer.PDFReducerConfiguration.DownscaleResolution = 200;
// Run the compression process.
gdpicturePDFReducer.ProcessDocument(@"C:\temp\source.pdf", @"C:\temp\output.pdf");
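
A rough way to reason about the DownscaleResolution value is to count pixels. The sketch below assumes the resolution is expressed in dots per inch and that images already at or below the target are left alone; neither point is stated in this guide. DownscaleEstimate and PixelFraction are hypothetical names, not part of the SDK.

```csharp
using System;

static class DownscaleEstimate
{
    // Fraction of the original pixels that remain after downscaling
    // from sourceDpi to targetDpi. Width and height both scale by
    // targetDpi / sourceDpi, so the pixel count scales by its square.
    public static double PixelFraction(double sourceDpi, double targetDpi)
    {
        if (sourceDpi <= 0 || targetDpi <= 0)
            throw new ArgumentOutOfRangeException();
        if (targetDpi >= sourceDpi)
            return 1.0; // Assumption: images are never upscaled.
        double scale = targetDpi / sourceDpi;
        return scale * scale;
    }

    public static void Main()
    {
        // A 300 DPI scan downscaled to 200 DPI keeps (200/300)^2 of its pixels.
        Console.WriteLine($"300 -> 200: {PixelFraction(300, 200):F3}");
        // The same scan downscaled to the default of 150 keeps (150/300)^2.
        Console.WriteLine($"300 -> 150: {PixelFraction(300, 150):F3}");
    }
}
```

The pixel fraction is only an upper-level guide: the final file size also depends on the compression scheme and the image content, so it does not translate directly into a byte count.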

Controlling image compression

The PDF specification defines seven compression schemes, all of which can be used to compress images. Two popular ones are:

  • JBIG2 for bitonal images (usually black and white).
  • JPEG2000 for 24-bit color and 8-bit grayscale images.

The example below uses both of these schemes to compress images in a PDF document:

using GdPicturePDFReducer gdpicturePDFReducer = new GdPicturePDFReducer();
// Configure the metadata of the resulting PDF document.
gdpicturePDFReducer.PDFReducerConfiguration.Author = "Nutrient .NET PDF Reducer SDK";
gdpicturePDFReducer.PDFReducerConfiguration.Producer = "GdPicture.NET 14";
gdpicturePDFReducer.PDFReducerConfiguration.ProducerName = "Nutrient";
gdpicturePDFReducer.PDFReducerConfiguration.Title = "Image Compression";
// Specify the version and the conformance level of the output PDF document.
gdpicturePDFReducer.PDFReducerConfiguration.OutputFormat = PDFReducerPDFVersion.PdfVersionRetainExisting;
// Enable automatic color detection.
gdpicturePDFReducer.PDFReducerConfiguration.EnableColorDetection = true;
// Repair characters during bitonal conversion.
gdpicturePDFReducer.PDFReducerConfiguration.EnableCharRepair = true;
// Control image compression. A JBIG2PMSThreshold value below 1 may lead to lossy compression.
gdpicturePDFReducer.PDFReducerConfiguration.EnableJPEG2000 = true;
gdpicturePDFReducer.PDFReducerConfiguration.EnableJBIG2 = true;
gdpicturePDFReducer.PDFReducerConfiguration.JBIG2PMSThreshold = 0.65f;
// Run the compression process.
gdpicturePDFReducer.ProcessDocument(@"C:\temp\source.pdf", @"C:\temp\output.pdf");
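
For reference, the sketch below lists the seven compression stream filters named in the PDF specification, which is my reading of the "seven compression schemes" mentioned earlier in this section. PdfImageFilters and SuggestFilter are hypothetical names for illustration only; they are not part of the SDK, and the helper simply mirrors the two bullet points above, not the reducer's actual selection logic.

```csharp
using System;
using System.Collections.Generic;

static class PdfImageFilters
{
    // Compression filter names as they appear in a PDF stream dictionary.
    public static readonly IReadOnlyDictionary<string, string> Filters =
        new Dictionary<string, string>
        {
            ["LZWDecode"] = "LZW, lossless, general purpose",
            ["FlateDecode"] = "zlib/deflate, lossless, general purpose",
            ["RunLengthDecode"] = "run-length encoding, lossless",
            ["CCITTFaxDecode"] = "CCITT fax encoding, lossless, bitonal images",
            ["JBIG2Decode"] = "JBIG2, bitonal images",
            ["DCTDecode"] = "JPEG, lossy, grayscale and color images",
            ["JPXDecode"] = "JPEG2000, grayscale and color images",
        };

    // Picks between the two schemes discussed above based on bit depth:
    // JBIG2 for 1-bit images, JPEG2000 for grayscale and color images.
    public static string SuggestFilter(int bitsPerPixel) =>
        bitsPerPixel == 1 ? "JBIG2Decode" : "JPXDecode";

    public static void Main()
    {
        foreach (var entry in Filters)
            Console.WriteLine($"{entry.Key}: {entry.Value}");
    }
}
```

Inspecting the Filter entries of the image streams in the output file with a PDF inspection tool is one way to confirm which schemes were applied.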