Convert HTML to Word in C#

GdPicture.NET SDK includes the ability to convert any supported file type into Word, Excel, or PowerPoint. This technology applies a unique hybrid adaptive approach that includes heuristics, mathematics, and machine learning.

Nutrient SDKs are deployed in some of the world’s most popular applications, such as those made by Autodesk, Disney, UBS, Dropbox, IBM, and Lufthansa.

Key Capabilities

  • 100+ input file types — PDF, HTML, images, more

  • Convert to MS Office — Word, Excel, or PowerPoint

  • Works offline — Without internet access

  • Add to any application — Web, desktop, and server

  • Merge to Office — Merge multiple files into an Office file

  • Comprehensive PDF-to-Office SDK — For seamless conversion of PDF files to Word, Excel, and PowerPoint

Guides for Conversion to Office

Convert from PDF to Word
How to convert to Word (DOCX) from PDF

Convert from PDF to Excel
How to convert to Excel (XLSX) from PDF

Convert from PDF to PowerPoint
How to convert to PowerPoint (PPTX) from PDF

Convert from HTML to Word
How to convert to Word (DOCX) from HTML

Convert from RTF to Word
How to convert to Word (DOCX) from RTF

Convert from Any File to MS Office
How to convert to Word, Excel, or PowerPoint from any supported file type

Our PDF-to-Office SDK ensures high-quality conversion from PDF to Word, Excel, and PowerPoint.

100+ Supported Input File Types

  • MS Office (Word, Excel, PowerPoint)

  • PDF, PDF/A

  • HTML, MHT, MHTML

  • Email (MSG, EML)

  • Images (raster and vector)

  • Text (TXT and RTF) and OpenDocument (ODT)

  • CAD (DXF)

  • RAW Camera Image Formats (3FR, ARW, BAY, etc.)

For more information, refer to the full list of supported file types.

GdPicture.NET SDK includes the ability to convert any supported file type into Word.

To save a PDF to a Word document (DOCX), use the SaveAsDOCX method of the GdPictureDocumentConverter class. It uses the following parameter:

  • Stream — A stream object where the current document is saved as a DOCX file. The stream must be initialized before it’s passed to this method, and it should stay open for subsequent use. If the output stream isn’t open for both reading and writing, the method fails and returns the GdPictureStatus.InvalidParameter status.

  • FilePath (overload) — The file path where the converted file will be saved. If the specified file already exists, it’ll be overwritten. You have to specify a full file path, including the file extension.

Warning

The output stream must be open for both reading and writing. You’re responsible for closing or disposing of it, but only after processing is complete and the document has been released with the CloseDocument method.
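
As a minimal sketch of the read/write requirement, the helper below opens an output file with both read and write access using only standard .NET calls. The class and method names (OutputStreams, OpenReadWrite) are hypothetical and aren't part of GdPicture.NET; the returned stream is the kind of object you'd pass to a stream overload such as SaveAsDOCX.

```csharp
using System;
using System.IO;

public static class OutputStreams
{
    // Hypothetical helper (not part of GdPicture.NET): creates or truncates the
    // file at `path` and returns a stream open for both reading and writing.
    public static FileStream OpenReadWrite(string path)
    {
        var stream = new FileStream(path, FileMode.Create, FileAccess.ReadWrite);
        if (!stream.CanRead || !stream.CanWrite)
        {
            // Defensive check: fail early instead of passing an unusable stream on.
            stream.Dispose();
            throw new IOException("The output stream must be readable and writable.");
        }
        return stream;
    }
}
```

By contrast, File.OpenWrite returns a write-only stream, so it's a less suitable choice when a method requires both kinds of access.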

How to Convert PDF to DOCX

  1. Create a GdPictureDocumentConverter object.

  2. Load the source document by passing its path to the LoadFromFile method. This method accepts all supported file formats, but only PDF input produces a high-quality DOCX. For any other format, SaveAsDOCX returns a DOCX in which each page contains a bitmap image of the input document. To avoid this, convert the file to PDF first with GdPictureDocumentConverter.SaveAsPDF, and then pass the resulting PDF to SaveAsDOCX. Recommended: Specify the source document format with a member of the DocumentFormat enumeration.

  3. Save the PDF file as a DOCX using SaveAsDOCX.

Warning

If you use SaveAsDOCX after loading a file that isn’t a PDF, the method will create a DOCX containing the original document as an image. Instead, for the best results, ensure the input document is a PDF.
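
One way to act on this warning is to decide up front whether a file needs the intermediate PDF step. The sketch below does that with a simple extension check using standard .NET calls only. The ConversionRouting class and NeedsPdfStep method are hypothetical names, and checking the extension is just a heuristic, since it doesn't inspect the file's contents.

```csharp
using System;
using System.IO;

public static class ConversionRouting
{
    // Hypothetical helper (not part of GdPicture.NET): returns true when the file
    // doesn't have a .pdf extension and should therefore be converted to PDF
    // before being saved as a DOCX.
    public static bool NeedsPdfStep(string path)
    {
        string extension = Path.GetExtension(path);
        return !string.Equals(extension, ".pdf", StringComparison.OrdinalIgnoreCase);
    }
}
```

With this helper, "input.html" is routed through the PDF step, while "report.PDF" is passed to SaveAsDOCX directly because the comparison ignores case.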

The following example converts and saves a PDF document to a DOCX file (it can also be saved as a stream):

using GdPictureDocumentConverter converter = new();
GdPictureStatus status = converter.LoadFromFile("input.pdf");
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}
             
status = converter.SaveAsDOCX("output.docx");
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}
            
Console.WriteLine("The input document has been converted to a docx file");

Using gdpictureDocumentConverter As New GdPictureDocumentConverter()
    Dim status As GdPictureStatus = gdpictureDocumentConverter.LoadFromFile("input.pdf", GdPicture14.DocumentFormat.DocumentFormatPDF)
    If status = GdPictureStatus.OK Then
        gdpictureDocumentConverter.DocxImageQuality = 80
        status = gdpictureDocumentConverter.SaveAsDOCX("output.docx")
        If status = GdPictureStatus.OK Then
            MessageBox.Show("The file has been saved successfully.", "GdPicture")
        Else
            MessageBox.Show("The file has failed to save. Status: " + status.ToString(), "GdPicture")
        End If
    Else
        MessageBox.Show("The file has failed to load. Status: " + status.ToString(), "GdPicture")
    End If
End Using

GdPicture.NET’s table extraction engine is a native SDK that enables you to recognize tables in an unstructured document or image, parse the information, and export the tables to an external destination like a spreadsheet. It can detect and extract bordered, semi-bordered, and borderless tables in images, scanned PDFs, and digitally born PDFs. As a native SDK, it can be deployed on-premises or embedded in your application, and it works offline, without internet access.

There are two possible approaches to converting PDFs to Excel with GdPicture.NET:

  1. Convert all contents in a PDF document to Excel.

  2. Recognize and extract only the tables present in a document to Excel.

Both of these options are explained below.

Converting the Entire PDF Document to Excel

To save all contents of a PDF document to an Excel spreadsheet (XLSX), use the SaveAsXLSX method of the GdPictureDocumentConverter class. It uses the following parameter:

  • Stream — A stream object where the current document is saved as an XLSX file. The stream must be initialized before it’s passed to this method, and it should stay open for subsequent use. If the output stream isn’t open for both reading and writing, the method fails and returns the GdPictureStatus.InvalidParameter status.

  • FilePath (overload) — The file path where the converted file will be saved. If the specified file already exists, it’ll be overwritten. You have to specify a full file path, including the file extension.

Warning

The output stream must be open for both reading and writing. You’re responsible for closing or disposing of it, but only after processing is complete and the document has been released with the CloseDocument method.

Here’s how to convert PDF to XLSX:

  1. Create a GdPictureDocumentConverter object.

  2. Load the source document by passing its path to the LoadFromFile method. This method accepts all supported file formats. However, only a PDF file can be converted into an XLSX (other input file formats will return GdPictureStatus.NotImplemented). If the source document isn’t a PDF, it can be converted to PDF first with GdPictureDocumentConverter.SaveAsPDF. Recommended: Specify the source document format with a member of the DocumentFormat enumeration.

  3. Save the PDF file as an XLSX using SaveAsXLSX.

The following example converts and saves all content in a PDF document to an XLSX file (it can also be saved as a stream):

using GdPictureDocumentConverter converter = new();
             
var status = converter.LoadFromFile("input.pdf");
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}
             
status = converter.SaveAsXLSX("output.xlsx");
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}
            
Console.WriteLine("The input document has been converted to a xlsx file");

Recognizing and Extracting Table Data from a PDF to an Excel Spreadsheet

GdPicture.NET can identify all bordered, semi-bordered, and borderless tables in a PDF and extract only those tables to an Excel spreadsheet.

Warning

The following approach uses the SaveAsXLSX method of the GdPictureOCR class, which extracts only the tables present in the document.

To read and extract table data from a PDF document to an Excel spreadsheet, follow these steps:

  1. Create a GdPictureOCR object and a GdPicturePDF object.

  2. Load the source document by passing its path to the LoadFromFile method of the GdPicturePDF object.

  3. Select the page from which to extract the table data with the SelectPage method of the GdPicturePDF object.

  4. Render the selected page to a 300 dots-per-inch (DPI) image with the RenderPageToGdPictureImageEx method of the GdPicturePDF object.

  5. Pass the image to the GdPictureOCR object with the SetImage method.

  6. Configure the table extraction process with the GdPictureOCR object in the following way:

    • Set the path to the OCR resource folder with the ResourceFolder property. The default language resources are located in GdPicture.NET 14\Redist\OCR. For more information on adding language resources, see the language support guide.

    • With the AddLanguage method, add the language resources that GdPicture.NET uses to recognize text in the image. This method takes a member of the OCRLanguage enumeration.

    For more optional configuration parameters, see the GdPictureOCR class.

  7. Run the table extraction process with the RunOCR method of the GdPictureOCR object, and save the result ID in a list.

  8. Create a GdPictureOCR.SpreadsheetOptions object and configure the output spreadsheet. By default, tables from the same OCR result are saved in the same sheet. To save each table in a different sheet, set the SeparateTables property of the GdPictureOCR.SpreadsheetOptions object to true. For more optional configuration parameters, see the GdPictureOCR.SpreadsheetOptions class.

  9. Save the output in an Excel spreadsheet with the SaveAsXLSX method of the GdPictureOCR object. This method takes the following parameters:

    • The list containing the OCR result ID.

    • The path to the output file.

    • The GdPictureOCR.SpreadsheetOptions object.

  10. Release unnecessary resources.

The example below extracts table data from the first page of a document and saves the output in an Excel spreadsheet:

using GdPictureOCR gdpictureOCR = new GdPictureOCR();
using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Load the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Select the first page.
gdpicturePDF.SelectPage(1);
// Render the first page to a 300 DPI image.
int imageId = gdpicturePDF.RenderPageToGdPictureImageEx(300, true);
// Pass the image to the `GdPictureOCR` object.
gdpictureOCR.SetImage(imageId);
// Configure the table extraction process.
gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
gdpictureOCR.AddLanguage(OCRLanguage.English);
// Run the table extraction process and save the result ID in a list.
string result = gdpictureOCR.RunOCR();
List<string> resultsList = new List<string>() { result };
// Configure the output spreadsheet.
GdPictureOCR.SpreadsheetOptions spreadsheetOptions = new GdPictureOCR.SpreadsheetOptions()
    {
        SeparateTables = true
    };
// Save the output in an Excel spreadsheet.
gdpictureOCR.SaveAsXLSX(resultsList, @"C:\temp\output.xlsx", spreadsheetOptions);
// Release unnecessary resources.
gdpictureOCR.ReleaseOCRResults();
GdPictureDocumentUtilities.DisposeImage(imageId);
gdpicturePDF.CloseDocument();

Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Load the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Select the first page.
    gdpicturePDF.SelectPage(1)
    ' Render the first page to a 300 DPI image.
    Dim imageId As Integer = gdpicturePDF.RenderPageToGdPictureImageEx(300, True)
    ' Pass the image to the `GdPictureOCR` object.
    gdpictureOCR.SetImage(imageId)
    ' Configure the table extraction process.
    gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
    gdpictureOCR.AddLanguage(OCRLanguage.English)
    ' Run the table extraction process and save the result ID in a list.
    Dim result As String = gdpictureOCR.RunOCR()
    Dim resultsList As List(Of String) = New List(Of String)()
    resultsList.Add(result)
    ' Configure the output spreadsheet.
    Dim spreadsheetOptions As GdPictureOCR.SpreadsheetOptions = New GdPictureOCR.SpreadsheetOptions() With {
        .SeparateTables = True
    }
    ' Save the output in an Excel spreadsheet.
    gdpictureOCR.SaveAsXLSX(resultsList, "C:\temp\output.xlsx", spreadsheetOptions)
    ' Release unnecessary resources.
    gdpictureOCR.ReleaseOCRResults()
    GdPictureDocumentUtilities.DisposeImage(imageId)
    gdpicturePDF.CloseDocument()
End Using
End Using

Related Topics

For more information on extracting table data from PDFs, refer to the table extraction guide.

GdPicture.NET SDK includes the ability to convert any supported file type into PowerPoint.

To save a PDF to a PowerPoint presentation (PPTX), use the SaveAsPPTX method of the GdPictureDocumentConverter class. It uses the following parameter:

  • Stream — A stream object where the current document is saved as a PPTX file. The stream must be initialized before it’s passed to this method, and it should stay open for subsequent use. If the output stream isn’t open for both reading and writing, the method fails and returns the GdPictureStatus.InvalidParameter status.

  • FilePath (overload) — The file path where the converted file will be saved. If the specified file already exists, it’ll be overwritten. You have to specify a full file path, including the file extension.

Warning

The output stream must be open for both reading and writing. You’re responsible for closing or disposing of it, but only after processing is complete and the document has been released with the CloseDocument method.

How to Convert PDF to PPTX

  1. Create a GdPictureDocumentConverter object.

  2. Load the source document by passing its path to the LoadFromFile method. This method accepts all supported file formats. However, only a PDF file can be converted into a PPTX (other input file formats will return GdPictureStatus.NotImplemented). If the source document isn’t a PDF, it can be converted to PDF first with GdPictureDocumentConverter.SaveAsPDF. Recommended: Specify the source document format with a member of the DocumentFormat enumeration.

  3. Save the PDF file as a PPTX using SaveAsPPTX.

The following example converts and saves a PDF document to a PPTX file (it can also be saved as a stream):

using GdPictureDocumentConverter converter = new();
GdPictureStatus status = converter.LoadFromFile("input.pdf");
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}
             
status = converter.SaveAsPPTX("output.pptx");
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}
            
Console.WriteLine("The input document has been converted to a pptx file");

GdPicture.NET SDK includes the ability to convert any supported file type into Word.

To save an HTML file to a Word document (DOCX), first use the SaveAsPDF method of the GdPictureDocumentConverter class to convert it to PDF. Then use the SaveAsDOCX method to convert it to a DOCX.

The SaveAsPDF method uses the following parameters:

  • Stream — A stream object where the current document is saved as a PDF file. The stream must be initialized before it’s passed to this method, and it should stay open for subsequent use. If the output stream isn’t open for both reading and writing, the method fails and returns the GdPictureStatus.InvalidParameter status.

  • FilePath (overload) — The file path where the converted file will be saved. If the specified file already exists, it’ll be overwritten. You have to specify a full file path, including the file extension.

  • Conformance — A member of the PdfConformance enumeration. This specifies the required conformance to the PDF or PDF/A standard of the saved PDF document. You can use the value of PdfConformance.PDF to save the file as a common PDF document.

The SaveAsDOCX method uses the following parameters:

  • Stream, or the overload FilePath

Warning

The output stream must be open for both reading and writing. You’re responsible for closing or disposing of it, but only after processing is complete and the document has been released with the CloseDocument method.

How to Convert HTML to DOCX

  1. Create a GdPictureDocumentConverter object.

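  2. Convert the source HTML file to PDF with GdPictureDocumentConverter.SaveAsPDF(Stream, PdfConformance) or, as in the example below, with a single call to the ConvertToPDF method, which takes the input stream, the source DocumentFormat, the output stream, and a PdfConformance value. Recommended: Specify the source document format with a member of the DocumentFormat enumeration.

  3. Load the newly generated PDF, either from a stream with the LoadFromStream method, as in the example below, or by passing its path to the LoadFromFile method.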

  4. Save the PDF file as a DOCX using SaveAsDOCX.

The following example converts and saves an HTML document to a DOCX file (it can also be saved as a stream):

using GdPictureDocumentConverter converter = new();

// Set the text and document properties to be used for the resulting file.
converter.HtmlPageHeight = 842; // A3 page size
converter.HtmlPageWidth = 1191;  // A3 page size
converter.HtmlPageMarginTop = 10;
converter.HtmlPageMarginBottom = 10;
converter.HtmlPageMarginLeft = 10;
converter.HtmlPageMarginRight = 10;

using Stream inputStream = File.Open(@"input.html", System.IO.FileMode.Open);
using Stream outputStream = new MemoryStream();

GdPictureStatus status = converter.ConvertToPDF(inputStream, GdPicture14.DocumentFormat.DocumentFormatHTML, outputStream, PdfConformance.PDF1_5);
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

status = converter.LoadFromStream(outputStream);
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

status = converter.SaveAsDOCX("output.docx");
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

Console.WriteLine("The input document has been converted to a docx file");
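
The page dimensions in the example above (842 and 1191) match an A3 sheet, 297 × 420 mm, expressed in points. Assuming the HtmlPage* properties are measured in points, which this guide doesn't state explicitly, a small helper like the following sketch can derive such values from millimeters. The PageSizes class and MillimetersToPoints method are hypothetical names and aren't part of GdPicture.NET.

```csharp
using System;

public static class PageSizes
{
    // Converts millimeters to points (1 inch = 25.4 mm = 72 points),
    // rounded to the nearest whole point.
    public static int MillimetersToPoints(double millimeters)
    {
        return (int)Math.Round(millimeters / 25.4 * 72.0);
    }
}
```

For instance, MillimetersToPoints(297) gives 842 and MillimetersToPoints(420) gives 1191, the two values assigned to HtmlPageHeight and HtmlPageWidth above.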

Optional HTML Configuration Properties

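Optionally, configure the conversion with the HTML properties of the GdPictureDocumentConverter object, such as the HtmlPageHeight, HtmlPageWidth, HtmlPageMarginTop, HtmlPageMarginBottom, HtmlPageMarginLeft, and HtmlPageMarginRight properties set in the example above.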

Optional PDF Configuration Properties

Optionally, configure the conversion with the following properties of the GdPictureDocumentConverter object:

  • PdfBitonalImageCompression is a member of the PdfCompression enumeration that specifies the compression scheme used for bitonal images in the output PDF file.

  • PdfColorImageCompression is a member of the PdfCompression enumeration that specifies the compression scheme used for color images in the output PDF file.

  • PdfEnableColorDetection is a Boolean value that specifies whether to use automatic color detection during the conversion that preserves image quality and reduces the output file size.

  • PdfEnableLinearization is a Boolean value that specifies whether to linearize the output PDF to enable Fast Web View mode.

  • PdfImageQuality is an integer from 0 to 100 that specifies the image quality in the output PDF file.

The example below creates a PDF document from an HTML file with a custom configuration:

using GdPictureDocumentConverter gdpictureDocumentConverter = new GdPictureDocumentConverter();
// Load the source document.
gdpictureDocumentConverter.LoadFromFile(@"C:\temp\source.html", GdPicture14.DocumentFormat.DocumentFormatHTML);
// Configure the conversion.
gdpictureDocumentConverter.PdfColorImageCompression = PdfCompression.PdfCompressionJPEG;
gdpictureDocumentConverter.PdfImageQuality = 50;
// Save the output in a new PDF document.
gdpictureDocumentConverter.SaveAsPDF(@"C:\temp\output.pdf");

Using gdpictureDocumentConverter As GdPictureDocumentConverter = New GdPictureDocumentConverter()
    ' Load the source document.
    gdpictureDocumentConverter.LoadFromFile("C:\temp\source.html", GdPicture14.DocumentFormat.DocumentFormatHTML)
    ' Configure the conversion.
    gdpictureDocumentConverter.PdfColorImageCompression = PdfCompression.PdfCompressionJPEG
    gdpictureDocumentConverter.PdfImageQuality = 50
    ' Save the output in a new PDF document.
    gdpictureDocumentConverter.SaveAsPDF("C:\temp\output.pdf")
End Using

GdPicture.NET SDK includes the ability to convert any supported file type into Word.

To save an RTF document to a Word document (DOCX), first use the SaveAsPDF method of the GdPictureDocumentConverter class to convert it to PDF. Then use the SaveAsDOCX method to convert it to a DOCX.

The SaveAsPDF method uses the following parameter:

  • Stream — A stream object where the current document is saved as a PDF file. The stream must be initialized before it’s passed to this method, and it should stay open for subsequent use. If the output stream isn’t open for both reading and writing, the method fails and returns the GdPictureStatus.InvalidParameter status.

  • FilePath (overload) — The file path where the converted file will be saved. If the specified file already exists, it’ll be overwritten. You have to specify a full file path, including the file extension.

The SaveAsDOCX method uses the following parameter:

  • Stream, or the overload FilePath

Warning

The output stream must be open for both reading and writing. You’re responsible for closing or disposing of it, but only after processing is complete and the document has been released with the CloseDocument method.

How to Convert RTF to DOCX

  1. Create a GdPictureDocumentConverter object.

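  2. Convert the source RTF file to PDF with GdPictureDocumentConverter.SaveAsPDF(Stream, PdfConformance) or, as in the example below, with a single call to the ConvertToPDF method, which takes the input stream, the source DocumentFormat, the output stream, and a PdfConformance value. Recommended: Specify the source document format with a member of the DocumentFormat enumeration.

  3. Load the newly generated PDF, either from a stream with the LoadFromStream method, as in the example below, or by passing its path to the LoadFromFile method.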

  4. Save the PDF file as a DOCX using SaveAsDOCX.

The following example converts and saves an RTF document to a DOCX file (it can also be saved as a stream):

using GdPictureDocumentConverter converter = new();

using Stream inputStream = File.Open(@"input.rtf", System.IO.FileMode.Open);
using Stream outputStream = new MemoryStream();

GdPictureStatus status = converter.ConvertToPDF(inputStream, GdPicture14.DocumentFormat.DocumentFormatRTF, outputStream, PdfConformance.PDF1_5);
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

status = converter.LoadFromStream(outputStream);
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

status = converter.SaveAsDOCX("output.docx");
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

Console.WriteLine("The input document has been converted to a docx file");

Optional PDF Configuration Properties

Optionally, configure the conversion with the following properties of the GdPictureDocumentConverter object:

  • PdfBitonalImageCompression is a member of the PdfCompression enumeration that specifies the compression scheme used for bitonal images in the output PDF file.

  • PdfColorImageCompression is a member of the PdfCompression enumeration that specifies the compression scheme used for color images in the output PDF file.

  • PdfEnableColorDetection is a Boolean value that specifies whether to use automatic color detection during the conversion that preserves image quality and reduces the output file size.

  • PdfEnableLinearization is a Boolean value that specifies whether to linearize the output PDF to enable Fast Web View mode.

  • PdfImageQuality is an integer from 0 to 100 that specifies the image quality in the output PDF file.

The example below creates a PDF document from an RTF file with a custom configuration:

using GdPictureDocumentConverter gdpictureDocumentConverter = new GdPictureDocumentConverter();
// Load the source document.
gdpictureDocumentConverter.LoadFromFile(@"C:\temp\source.rtf", GdPicture14.DocumentFormat.DocumentFormatRTF);
// Configure the conversion.
gdpictureDocumentConverter.PdfColorImageCompression = PdfCompression.PdfCompressionJPEG;
gdpictureDocumentConverter.PdfImageQuality = 50;
// Save the output in a new PDF document.
gdpictureDocumentConverter.SaveAsPDF(@"C:\temp\output.pdf");

Using gdpictureDocumentConverter As GdPictureDocumentConverter = New GdPictureDocumentConverter()
    ' Load the source document.
    gdpictureDocumentConverter.LoadFromFile("C:\temp\source.rtf", GdPicture14.DocumentFormat.DocumentFormatRTF)
    ' Configure the conversion.
    gdpictureDocumentConverter.PdfColorImageCompression = PdfCompression.PdfCompressionJPEG
    gdpictureDocumentConverter.PdfImageQuality = 50
    ' Save the output in a new PDF document.
    gdpictureDocumentConverter.SaveAsPDF("C:\temp\output.pdf")
End Using

GdPicture.NET supports converting 100+ file types to Word, Excel, or PowerPoint.

Converting PDF to MS Office

To convert PDF files to MS Office, refer to our separate PDF-to-Word, PDF-to-Excel, and PDF-to-PowerPoint guides.

Converting Other File Types to MS Office

To save a file to Word, Excel, or PowerPoint format, first use the SaveAsPDF method of the GdPictureDocumentConverter class to convert it to PDF. Then use the SaveAsDOCX method to convert it to a DOCX, the SaveAsXLSX method to convert it to XLSX, or the SaveAsPPTX method to convert it to PPTX.

The SaveAsPDF method uses the following parameters:

  • Stream — A stream object where the current document is saved as a PDF file. The stream must be initialized before it’s passed to this method, and it should stay open for subsequent use. If the output stream isn’t open for both reading and writing, the method fails and returns the GdPictureStatus.InvalidParameter status.

  • FilePath (overload) — The file path where the converted file will be saved. If the specified file already exists, it’ll be overwritten. You have to specify a full file path, including the file extension.

  • Conformance — A member of the PdfConformance enumeration. This specifies the required conformance to the PDF or PDF/A standard of the saved PDF document. You can use the value of PdfConformance.PDF to save the file as a common PDF document.

The SaveAsDOCX, SaveAsXLSX, and SaveAsPPTX methods use the following parameter:

  • Stream, or the overload FilePath

Warning

The output stream must be open for both reading and writing. You’re responsible for closing or disposing of it, but only after processing is complete and the document has been released with the CloseDocument method.

How to Convert Any File to MS Office

  1. Create a GdPictureDocumentConverter object.

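  2. Convert the source file to PDF with GdPictureDocumentConverter.SaveAsPDF(Stream, PdfConformance) or, as in the example below, with a single call to the ConvertToPDF method, which takes the input stream, the source DocumentFormat, the output stream, and a PdfConformance value. Recommended: Specify the source document format with a member of the DocumentFormat enumeration.

  3. Load the newly generated PDF, either from a stream with the LoadFromStream method, as in the example below, or by passing its path to the LoadFromFile method.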

  4. Save the PDF file as a DOCX using SaveAsDOCX, as an XLSX using SaveAsXLSX, or as a PPTX using SaveAsPPTX.

The following example converts and saves an RTF document to a DOCX file (it can also be saved as a stream):

using GdPictureDocumentConverter converter = new();

using Stream inputStream = File.Open(@"input.rtf", System.IO.FileMode.Open);
using Stream outputStream = new MemoryStream();

GdPictureStatus status = converter.ConvertToPDF(inputStream, GdPicture14.DocumentFormat.DocumentFormatRTF, outputStream, PdfConformance.PDF1_5);
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

status = converter.LoadFromStream(outputStream);
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

status = converter.SaveAsDOCX("output.docx");
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

Console.WriteLine("The input document has been converted to a docx file");

Optional File Type Configuration Properties

The following file types have optional configuration properties for greater precision:

Optional PDF Configuration Properties

Optionally, configure the conversion with the following properties of the GdPictureDocumentConverter object:

  • PdfBitonalImageCompression is a member of the PdfCompression enumeration that specifies the compression scheme used for bitonal images in the output PDF file.

  • PdfColorImageCompression is a member of the PdfCompression enumeration that specifies the compression scheme used for color images in the output PDF file.

  • PdfEnableColorDetection is a Boolean value that specifies whether to use automatic color detection during the conversion that preserves image quality and reduces the output file size.

  • PdfEnableLinearization is a Boolean value that specifies whether to linearize the output PDF to enable Fast Web View mode.

  • PdfImageQuality is an integer from 0 to 100 that specifies the image quality in the output PDF file.

The example below creates a PDF document from an RTF file with a custom configuration:

using GdPictureDocumentConverter gdpictureDocumentConverter = new GdPictureDocumentConverter();
// Load the source document.
gdpictureDocumentConverter.LoadFromFile(@"C:\temp\source.rtf", GdPicture14.DocumentFormat.DocumentFormatRTF);
// Configure the conversion.
gdpictureDocumentConverter.PdfColorImageCompression = PdfCompression.PdfCompressionJPEG;
gdpictureDocumentConverter.PdfImageQuality = 50;
// Save the output in a new PDF document.
gdpictureDocumentConverter.SaveAsPDF(@"C:\temp\output.pdf");

Using gdpictureDocumentConverter As GdPictureDocumentConverter = New GdPictureDocumentConverter()
    ' Load the source document.
    gdpictureDocumentConverter.LoadFromFile("C:\temp\source.rtf", GdPicture14.DocumentFormat.DocumentFormatRTF)
    ' Configure the conversion.
    gdpictureDocumentConverter.PdfColorImageCompression = PdfCompression.PdfCompressionJPEG
    gdpictureDocumentConverter.PdfImageQuality = 50
    ' Save the output in a new PDF document.
    gdpictureDocumentConverter.SaveAsPDF("C:\temp\output.pdf")
End Using