Extract file attachments from PDFs in C#
This guide explains how to extract files from PDF documents.
PDF documents can contain files in the following ways:
- A file is embedded in the PDF document.
- A file is added to the PDF document as a file attachment annotation.
The method for extracting the file is different in each case.
Extracting files embedded in a PDF
To extract files embedded in a PDF, follow the steps below:
- Create a GdPicturePDF object.
- Select the source document by passing its path to the LoadFromFile method.
- Determine the number of embedded files with the GetEmbeddedFileCount method and loop through them.
- Determine the file name by passing the index of the file to the GetEmbeddedFileName method.
- Create an empty byte array where you’ll save the file data.
- Extract the file by passing the index of the file and the empty byte array to the ExtractEmbeddedFile method.
- Write the file using the standard System.IO.Stream class.
The example below extracts all embedded files from a PDF document:
using System.IO;
using GdPicture14; // Namespace of GdPicture.NET 14; adjust if your version differs.

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Select the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Determine the number of embedded files and loop through them.
int embeddedFileCount = gdpicturePDF.GetEmbeddedFileCount();
for (int fileIndex = 0; fileIndex < embeddedFileCount; fileIndex++)
{
    // Determine the file name.
    string fileName = gdpicturePDF.GetEmbeddedFileName(fileIndex);
    // Create an empty byte array.
    byte[] fileData = null;
    // Extract the file.
    gdpicturePDF.ExtractEmbeddedFile(fileIndex, ref fileData);
    // Write the file. The stream is disposed at the end of each iteration.
    using System.IO.Stream file = File.OpenWrite(@"C:\temp\" + fileName);
    file.Write(fileData, 0, fileData.Length);
}

The same example in VB.NET:

' Requires Imports System.IO and the GdPicture namespace (GdPicture14 for GdPicture.NET 14).
Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Select the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Determine the number of embedded files and loop through them.
    Dim embeddedFileCount As Integer = gdpicturePDF.GetEmbeddedFileCount()
    For fileIndex As Integer = 0 To embeddedFileCount - 1
        ' Determine the file name.
        Dim fileName As String = gdpicturePDF.GetEmbeddedFileName(fileIndex)
        ' Create an empty byte array.
        Dim fileData As Byte() = Nothing
        ' Extract the file.
        gdpicturePDF.ExtractEmbeddedFile(fileIndex, fileData)
        ' Write the file. The variable is not named "file" because VB.NET is
        ' case-insensitive and that name would clash with the File class.
        Using outputStream As Stream = File.OpenWrite("C:\temp\" & fileName)
            outputStream.Write(fileData, 0, fileData.Length)
        End Using
    Next
End Using
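
One caveat with the file-writing step: File.OpenWrite opens an existing file without truncating it, so if a longer file with the same name already exists at the target path, stale bytes remain after the new data. The sketch below uses only System.IO and stand-in byte arrays (not data extracted by GdPicture) to show the effect and one way to avoid it with File.WriteAllBytes. The file name and contents are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical target path in the temp directory; a stand-in for C:\temp\<fileName>.
string path = Path.Combine(Path.GetTempPath(), "attachment-demo.bin");

// Simulate a previous, longer extraction result already on disk.
File.WriteAllBytes(path, new byte[] { 1, 2, 3, 4, 5, 6 });

// Stand-in for the byte array filled by the extraction method.
byte[] fileData = { 9, 9 };

// File.OpenWrite uses FileMode.OpenOrCreate: it writes from offset 0
// but keeps the remaining bytes of the old file.
using (Stream stream = File.OpenWrite(path))
{
    stream.Write(fileData, 0, fileData.Length);
}
long lengthAfterOpenWrite = new FileInfo(path).Length;

// File.WriteAllBytes replaces the file, so only the new data remains.
File.WriteAllBytes(path, fileData);
long lengthAfterWriteAllBytes = new FileInfo(path).Length;

Console.WriteLine($"OpenWrite: {lengthAfterOpenWrite} bytes, WriteAllBytes: {lengthAfterWriteAllBytes} bytes");

// Clean up the demo file.
File.Delete(path);
```

If the output directory is always empty before extraction, File.OpenWrite behaves the same as the alternatives; the difference only matters when files are overwritten.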
Extracting files from file attachment annotations
To extract files from file attachment annotations, follow the steps below:
- Create a GdPicturePDF object.
- Select the source document by passing its path to the LoadFromFile method.
- Determine the number of pages with the GetPageCount method and loop through them.
- Determine the number of annotations on the page with the GetAnnotationCount method and loop through them.
- Determine the annotation subtype by passing the index of the annotation to the GetAnnotationSubType method.
- If the annotation is a file attachment annotation, determine the file name by passing the index of the annotation to the GetFileAttachmentAnnotFileName method.
- Create an empty byte array where you’ll save the file data.
- Extract the file by passing the index of the annotation and the empty byte array to the GetFileAttachmentAnnotEmbeddedFile method.
- Write the file using the standard System.IO.Stream class.
The example below extracts all files added to a PDF document as file attachment annotations:
using System.IO;
using GdPicture14; // Namespace of GdPicture.NET 14; adjust if your version differs.

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Select the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Determine the number of pages and loop through them.
int pageCount = gdpicturePDF.GetPageCount();
for (int page = 1; page <= pageCount; page++)
{
    gdpicturePDF.SelectPage(page);
    // Determine the number of annotations on the page and loop through them.
    int annotationCount = gdpicturePDF.GetAnnotationCount();
    for (int annotationIndex = 0; annotationIndex < annotationCount; annotationIndex++)
    {
        // Determine the annotation subtype.
        string annotationSubtype = gdpicturePDF.GetAnnotationSubType(annotationIndex);
        if (annotationSubtype.Equals("FileAttachment"))
        {
            // Determine the file name.
            string fileName = gdpicturePDF.GetFileAttachmentAnnotFileName(annotationIndex);
            // Create an empty byte array.
            byte[] fileData = null;
            // Extract the file.
            gdpicturePDF.GetFileAttachmentAnnotEmbeddedFile(annotationIndex, ref fileData);
            // Write the file. The stream is disposed when the if block ends.
            using System.IO.Stream file = File.OpenWrite(@"C:\temp\" + fileName);
            file.Write(fileData, 0, fileData.Length);
        }
    }
}

The same example in VB.NET:

' Requires Imports System.IO and the GdPicture namespace (GdPicture14 for GdPicture.NET 14).
Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Select the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Determine the number of pages and loop through them.
    Dim pageCount As Integer = gdpicturePDF.GetPageCount()
    For page As Integer = 1 To pageCount
        gdpicturePDF.SelectPage(page)
        ' Determine the number of annotations on the page and loop through them.
        Dim annotationCount As Integer = gdpicturePDF.GetAnnotationCount()
        For annotationIndex As Integer = 0 To annotationCount - 1
            ' Determine the annotation subtype.
            Dim annotationSubtype As String = gdpicturePDF.GetAnnotationSubType(annotationIndex)
            If annotationSubtype.Equals("FileAttachment") Then
                ' Determine the file name.
                Dim fileName As String = gdpicturePDF.GetFileAttachmentAnnotFileName(annotationIndex)
                ' Create an empty byte array.
                Dim fileData As Byte() = Nothing
                ' Extract the file.
                gdpicturePDF.GetFileAttachmentAnnotEmbeddedFile(annotationIndex, fileData)
                ' Write the file. The variable is not named "file" because VB.NET is
                ' case-insensitive and that name would clash with the File class.
                Using outputStream As Stream = File.OpenWrite("C:\temp\" & fileName)
                    outputStream.Write(fileData, 0, fileData.Length)
                End Using
            End If
        Next
    Next
End Using
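
File names returned by GetFileAttachmentAnnotFileName (and GetEmbeddedFileName) come from the PDF itself, so they may contain directory segments or repeat across annotations. The following is a minimal sketch, not part of the GdPicture API, of one way to derive a safe, unique output path before writing. BuildOutputPath, the fallback name attachment.bin, and the " (n)" suffix scheme are all hypothetical choices made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: builds a safe, unique output path for a file name read from a PDF.
// "usedPaths" tracks the paths already handed out during this run.
static string BuildOutputPath(string outputDirectory, string fileName, ISet<string> usedPaths)
{
    // Keep only the final path segment so a name like "../../evil.exe"
    // can't escape the output directory.
    string safeName = Path.GetFileName(fileName ?? string.Empty);
    if (string.IsNullOrWhiteSpace(safeName))
    {
        safeName = "attachment.bin"; // Hypothetical fallback name.
    }

    string baseName = Path.GetFileNameWithoutExtension(safeName);
    string extension = Path.GetExtension(safeName);
    string candidate = Path.Combine(outputDirectory, safeName);

    // Append a counter until the path is unused: "a.txt", "a (1).txt", ...
    for (int counter = 1; !usedPaths.Add(candidate); counter++)
    {
        candidate = Path.Combine(outputDirectory, $"{baseName} ({counter}){extension}");
    }
    return candidate;
}

// Usage with made-up file names; no PDF is involved here.
var usedPaths = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
string outputDirectory = Path.Combine(Path.GetTempPath(), "attachments");

string first = BuildOutputPath(outputDirectory, "report.txt", usedPaths);
string second = BuildOutputPath(outputDirectory, "report.txt", usedPaths);
string third = BuildOutputPath(outputDirectory, "../../evil.exe", usedPaths);

Console.WriteLine(first);
Console.WriteLine(second);
Console.WriteLine(third);
```

In the extraction loops above, the result of such a helper could replace the @"C:\temp\" + fileName expression. The helper only tracks names within one run; it does not check for files that already exist on disk.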