Convert HTML to Word in C#
Nutrient .NET SDK (formerly GdPicture.NET) can convert HTML to DOCX by first converting the HTML to PDF and then loading the PDF and saving it as DOCX.
To save an HTML file as a Word document (DOCX), first use the ConvertToPDF method of the GdPictureDocumentConverter class to convert it to PDF. Then load the resulting PDF with the LoadFromStream method and use the SaveAsDOCX method to save it as DOCX.
The ConvertToPDF method uses the following parameters:
- InputStream: A stream object containing the source document. This stream must be initialized before it can be sent into this method and should stay open for subsequent use.
- DocumentFormat: A member of the DocumentFormat enumeration. This specifies the source document format. Use DocumentFormat.DocumentFormatHTML for HTML input.
- OutputStream: A stream object where the generated PDF is written. This stream must be initialized before it can be sent into this method and should stay open for subsequent use. It must be open for writing.
- Conformance: A member of the PdfConformance enumeration. This specifies the required conformance to the PDF or PDF/A standard of the saved PDF document. You can use the value of PdfConformance.PDF to save the file as a common PDF document.
The SaveAsDOCX method uses the following parameters:
- Stream, or the overload FilePath: A stream or file path where the currently loaded PDF is saved as a DOCX file. When using a stream, it must be open for reading, writing, and seeking.
When you reuse the same stream for ConvertToPDF and LoadFromStream, it must be seekable; rewind it before loading. Dispose the streams when you’re done, and call the CloseDocument method to release the loaded document.
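The rewind requirement can be illustrated without the SDK. The following is a minimal sketch using only standard .NET streams; the placeholder bytes stand in for a generated PDF and are purely illustrative. It shows that a stream that was just written to has its position at the end, so a reader sees no data until the position is reset.

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for the PDF bytes that a conversion step would write (illustrative only).
byte[] payload = Encoding.UTF8.GetBytes("%PDF-1.5 placeholder bytes");

using MemoryStream buffer = new();
buffer.Write(payload, 0, payload.Length);

// After writing, the position is at the end of the stream, so nothing is read.
byte[] scratch = new byte[payload.Length];
int readBeforeRewind = buffer.Read(scratch, 0, scratch.Length);

// Rewinding is only possible on seekable streams.
if (!buffer.CanSeek)
{
    throw new InvalidOperationException("The stream must be seekable to be reused.");
}
buffer.Position = 0;

// Now the full payload is available to the next reader.
int readAfterRewind = buffer.Read(scratch, 0, scratch.Length);

Console.WriteLine($"Bytes read before rewind: {readBeforeRewind}");
Console.WriteLine($"Bytes read after rewind: {readAfterRewind}");
```

A MemoryStream is always seekable; for other stream types, checking CanSeek before reuse avoids a NotSupportedException when setting Position.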
How to convert HTML to DOCX
- Create a GdPictureDocumentConverter object.
- Convert the source HTML file to PDF with GdPictureDocumentConverter.ConvertToPDF(Stream, DocumentFormat, Stream, PdfConformance). Recommended: Specify the source document format with a member of the DocumentFormat enumeration.
- Rewind the PDF stream and load it with LoadFromStream using DocumentFormat.DocumentFormatPDF.
- Save the PDF file as a DOCX using SaveAsDOCX.
The following example converts an HTML document and saves it as a DOCX file (the result can also be saved to a stream):
using System;
using System.IO;
using GdPicture14;

using GdPictureDocumentConverter converter = new();

// Set the page size and margins used when rendering the HTML.
converter.HtmlPageHeight = 842; // A3 short edge.
converter.HtmlPageWidth = 1191; // A3 long edge, so the page is landscape.
converter.HtmlPageMarginTop = 10;
converter.HtmlPageMarginBottom = 10;
converter.HtmlPageMarginLeft = 10;
converter.HtmlPageMarginRight = 10;

using Stream inputStream = File.OpenRead(@"input.html");
using MemoryStream outputStream = new();

// Step 1: Convert the HTML source to an intermediate PDF held in memory.
GdPictureStatus status = converter.ConvertToPDF(inputStream, GdPicture14.DocumentFormat.DocumentFormatHTML, outputStream, PdfConformance.PDF1_5);
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

// Step 2: Rewind the PDF stream and load it into the converter.
outputStream.Position = 0;
status = converter.LoadFromStream(outputStream, GdPicture14.DocumentFormat.DocumentFormatPDF);
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

// Step 3: Save the loaded PDF as a DOCX file.
status = converter.SaveAsDOCX("output.docx");
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

// Step 4: Release the loaded document.
status = converter.CloseDocument();
if (status != GdPictureStatus.OK && status != GdPictureStatus.Aborted)
{
    throw new Exception(status.ToString());
}

Console.WriteLine("The input document has been converted to a DOCX file");

Related API reference:
- DocxImageQuality Property
- GdPictureDocumentConverter Class
- GdPictureDocumentConverter Members
- CloseDocument Method
- ConvertToPDF Method
- LoadFromStream Method
- RasterizationDPI Property
- HtmlEmulationType Property
- HtmlPageHeight Property
- HtmlPageMarginBottom Property
- HtmlPageMarginLeft Property
- HtmlPageMarginRight Property
- HtmlPageMarginTop Property
- HtmlPageWidth Property
- HtmlPreferCSSPageSize Property
- HtmlPreferOnePage Property
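The stream overload of SaveAsDOCX makes it possible to keep the whole conversion in memory. The following is a sketch only, not the SDK's prescribed pattern: ConvertHtmlToDocx and Check are hypothetical helper names introduced for illustration, and the code assumes the SDK is installed and licensed in the host application. It uses the same calls as the example above, with a MemoryStream as the DOCX target because that stream type supports reading, writing, and seeking.

```csharp
using System;
using System.IO;
using GdPicture14;

// Usage: read the HTML from disk, convert in memory, write the DOCX bytes out.
byte[] docx = ConvertHtmlToDocx(File.ReadAllBytes("input.html"));
File.WriteAllBytes("output.docx", docx);

// Hypothetical helper: converts HTML bytes to DOCX bytes without temporary files.
static byte[] ConvertHtmlToDocx(byte[] html)
{
    using GdPictureDocumentConverter converter = new();
    using MemoryStream htmlStream = new(html);
    using MemoryStream pdfStream = new();
    using MemoryStream docxStream = new();

    // HTML to intermediate PDF.
    Check(converter.ConvertToPDF(htmlStream, GdPicture14.DocumentFormat.DocumentFormatHTML, pdfStream, PdfConformance.PDF1_5));

    // Rewind before handing the same stream to LoadFromStream.
    pdfStream.Position = 0;
    Check(converter.LoadFromStream(pdfStream, GdPicture14.DocumentFormat.DocumentFormatPDF));

    // PDF to DOCX, written to a seekable read/write stream.
    Check(converter.SaveAsDOCX(docxStream));

    converter.CloseDocument();
    return docxStream.ToArray();
}

// Hypothetical helper: turns a non-OK status into an exception.
static void Check(GdPictureStatus status)
{
    if (status != GdPictureStatus.OK)
    {
        throw new InvalidOperationException($"Conversion failed: {status}");
    }
}
```

Returning a byte array keeps the helper easy to call from a web handler or a test; for very large documents, writing to a FileStream opened with read/write access may be a better fit.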
Optional HTML configuration properties
Optionally, configure the conversion with the following properties of the GdPictureDocumentConverter object:
- HtmlEmulationType
- HtmlPageHeight
- HtmlPageMarginBottom
- HtmlPageMarginLeft
- HtmlPageMarginRight
- HtmlPageMarginTop
- HtmlPageWidth
- HtmlPreferCSSPageSize
- HtmlPreferOnePage
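The values 842 and 1191 used in the example match the A3 paper dimensions (297 mm by 420 mm) expressed in points, where one point is 1/72 inch. Assuming the page-size properties take points, which those values suggest but which should be checked against the API reference, a small helper can derive the numbers for other paper sizes. MmToPoints is a hypothetical name introduced here and is not part of the SDK.

```csharp
using System;

// Hypothetical helper: converts millimetres to whole points
// (1 inch = 25.4 mm, 1 point = 1/72 inch).
static int MmToPoints(double millimetres) => (int)Math.Round(millimetres / 25.4 * 72.0);

// A3 is 297 mm x 420 mm; A4 is 210 mm x 297 mm.
int a3Short = MmToPoints(297);
int a3Long = MmToPoints(420);
int a4Short = MmToPoints(210);
int a4Long = MmToPoints(297);

Console.WriteLine($"A3: {a3Short} x {a3Long} points"); // A3: 842 x 1191 points
Console.WriteLine($"A4: {a4Short} x {a4Long} points"); // A4: 595 x 842 points
```

Assigning the short edge to HtmlPageHeight and the long edge to HtmlPageWidth, as the example does, yields a landscape page; swapping the two yields portrait.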
Optional PDF configuration properties
Optionally, configure the intermediate PDF generated from the HTML source with the following properties of the GdPictureDocumentConverter object:
- PdfBitonalImageCompression is a member of the PdfCompression enumeration that specifies the compression scheme used for bitonal images in the output PDF file.
- PdfColorImageCompression is a member of the PdfCompression enumeration that specifies the compression scheme used for color images in the output PDF file.
- PdfEnableColorDetection is a Boolean value that specifies whether to use automatic color detection during the conversion that preserves image quality and reduces the output file size.
- PdfEnableLinearization is a Boolean value that specifies whether to linearize the output PDF to enable Fast Web View mode.
- PdfImageQuality is an integer from 0 to 100 that specifies the image quality in the output PDF file.
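As a rough sketch, these properties would be set on the converter before calling ConvertToPDF. The property names and types come from the list above; the enumeration members PdfCompressionJBIG2 and PdfCompressionJPEG are assumptions about the PdfCompression enumeration and should be verified against the API reference, and the chosen values are illustrative rather than recommended.

```csharp
using GdPicture14;

using GdPictureDocumentConverter converter = new();

// Compression schemes for the intermediate PDF.
// Assumption: these member names exist in the PdfCompression enumeration.
converter.PdfBitonalImageCompression = PdfCompression.PdfCompressionJBIG2;
converter.PdfColorImageCompression = PdfCompression.PdfCompressionJPEG;

// Let the converter detect color automatically to reduce file size.
converter.PdfEnableColorDetection = true;

// Linearize the PDF for Fast Web View mode.
converter.PdfEnableLinearization = true;

// Image quality from 0 to 100 (illustrative value).
converter.PdfImageQuality = 75;

// Continue with ConvertToPDF, LoadFromStream, and SaveAsDOCX as in the main example.
```

Because the PDF is only an intermediate step here, these settings matter mainly through their effect on the images that end up in the DOCX output.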