Convert HTML to Word in C#
Nutrient .NET SDK (formerly GdPicture.NET) includes the ability to convert any supported file type into Word.
To save an HTML file as a Word document (DOCX), first convert it to PDF with the SaveAsPDF method of the GdPictureDocumentConverter class. Then load the resulting PDF and convert it to DOCX with the SaveAsDOCX method.
The SaveAsPDF method uses the following parameters:
Stream, or the overload FilePath:
Stream is a stream object where the current document is saved as a PDF file. This stream object must be initialized before it's passed to this method, and it should stay open for subsequent use. If the output stream isn't open for both reading and writing, the method fails and returns the GdPictureStatus.InvalidParameter status.
FilePath is the full path where the converted file is saved, including the file extension. If the specified file already exists, it's overwritten.
Conformance: A member of the PdfConformance enumeration that specifies the required conformance of the saved document to the PDF or PDF/A standard. Use PdfConformance.PDF to save the file as a common PDF document.
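As a rough illustration of the two SaveAsPDF overloads, the following sketch loads an HTML file and saves it once to a file path and once to a FileStream opened with FileAccess.ReadWrite, which satisfies the read/write requirement of the stream overload. It assumes the GdPicture14 namespace is referenced; the class HtmlToPdfSketch, the method SaveHtmlAsPdf, and the paths passed to it are hypothetical.

```csharp
using System;
using System.IO;
using GdPicture14;

public static class HtmlToPdfSketch
{
    // Hypothetical helper: loads an HTML file and saves it as PDF twice,
    // once through the FilePath overload and once through the Stream overload.
    public static void SaveHtmlAsPdf(string htmlPath, string pdfPath, string streamedPdfPath)
    {
        using GdPictureDocumentConverter converter = new GdPictureDocumentConverter();

        GdPictureStatus status = converter.LoadFromFile(htmlPath, GdPicture14.DocumentFormat.DocumentFormatHTML);
        if (status != GdPictureStatus.OK)
        {
            throw new Exception($"LoadFromFile failed: {status}");
        }

        // FilePath overload: an existing file at pdfPath is overwritten.
        status = converter.SaveAsPDF(pdfPath, PdfConformance.PDF);
        if (status != GdPictureStatus.OK)
        {
            throw new Exception($"SaveAsPDF (file) failed: {status}");
        }

        // Stream overload: the stream is opened for both reading and writing,
        // since a write-only stream makes SaveAsPDF return InvalidParameter.
        using FileStream pdfStream = new FileStream(streamedPdfPath, FileMode.Create, FileAccess.ReadWrite);
        status = converter.SaveAsPDF(pdfStream, PdfConformance.PDF);
        if (status != GdPictureStatus.OK)
        {
            throw new Exception($"SaveAsPDF (stream) failed: {status}");
        }

        // Release the loaded HTML document; the using declarations dispose of the rest.
        converter.CloseDocument();
    }
}
```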
The SaveAsDOCX method uses the following parameters:
Stream, or the overload FilePath: The same as for SaveAsPDF, except that the current document is saved as a DOCX file.
Note that the output stream must be open for both reading and writing. Once processing is complete, close or dispose of the stream yourself, and release the loaded document with the CloseDocument method.
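A minimal sketch of the SaveAsDOCX stream overload, assuming a PDF already exists at the path you pass in. A MemoryStream is readable and writable by default, so it meets the stream requirement; the caller owns the stream (disposed here by a using declaration), while CloseDocument releases the loaded PDF. PdfToDocxSketch and ToDocxBytes are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using GdPicture14;

public static class PdfToDocxSketch
{
    // Hypothetical helper: converts a PDF file to DOCX in memory and returns the bytes.
    public static byte[] ToDocxBytes(string pdfPath)
    {
        using GdPictureDocumentConverter converter = new GdPictureDocumentConverter();

        GdPictureStatus status = converter.LoadFromFile(pdfPath, GdPicture14.DocumentFormat.DocumentFormatPDF);
        if (status != GdPictureStatus.OK)
        {
            throw new Exception($"LoadFromFile failed: {status}");
        }

        // MemoryStream is open for both reading and writing by default.
        using MemoryStream docxStream = new MemoryStream();
        status = converter.SaveAsDOCX(docxStream);
        if (status != GdPictureStatus.OK)
        {
            throw new Exception($"SaveAsDOCX failed: {status}");
        }

        // Release the loaded PDF; ToArray copies the whole stream regardless of its position.
        converter.CloseDocument();
        return docxStream.ToArray();
    }
}
```

For example, File.WriteAllBytes("output.docx", PdfToDocxSketch.ToDocxBytes("input.pdf")) writes the result to disk; both file names are placeholders.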
How to convert HTML to DOCX
- Create a GdPictureDocumentConverter object.
- Load the source HTML file and convert it to PDF with GdPictureDocumentConverter.SaveAsPDF(Stream, PdfConformance). Recommended: specify the source document format with a member of the DocumentFormat enumeration, such as DocumentFormat.DocumentFormatHTML.
- Load the newly generated PDF by passing its path to the LoadFromFile method (or its stream to LoadFromStream). SaveAsDOCX only supports PDF documents as its source, which is why this intermediate step is needed.
- Save the PDF as a DOCX file using SaveAsDOCX.
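The four steps above can be sketched with file paths instead of streams, which keeps the intermediate PDF on disk for inspection. This is a sketch under assumptions, not the SDK's reference sample: the HtmlToDocxSketch class, the Check helper, and the paths are hypothetical, and PdfConformance.PDF is chosen because a common PDF is sufficient as an intermediate format.

```csharp
using System;
using GdPicture14;

public static class HtmlToDocxSketch
{
    // Throws if a GdPicture call did not return GdPictureStatus.OK.
    private static void Check(GdPictureStatus status, string step)
    {
        if (status != GdPictureStatus.OK)
        {
            throw new Exception($"{step} failed: {status}");
        }
    }

    public static void ConvertHtmlToDocx(string htmlPath, string pdfPath, string docxPath)
    {
        // Step 1: create the converter.
        using GdPictureDocumentConverter converter = new GdPictureDocumentConverter();

        // Step 2: load the HTML with its format declared, then save it as PDF.
        Check(converter.LoadFromFile(htmlPath, GdPicture14.DocumentFormat.DocumentFormatHTML), "LoadFromFile (HTML)");
        Check(converter.SaveAsPDF(pdfPath, PdfConformance.PDF), "SaveAsPDF");

        // Release the HTML document, then load the intermediate PDF (step 3).
        converter.CloseDocument();
        Check(converter.LoadFromFile(pdfPath, GdPicture14.DocumentFormat.DocumentFormatPDF), "LoadFromFile (PDF)");

        // Step 4: save the loaded PDF as DOCX.
        Check(converter.SaveAsDOCX(docxPath), "SaveAsDOCX");
        converter.CloseDocument();
    }
}
```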
The following example converts an HTML document to PDF in memory with the ConvertToPDF method, then loads the PDF and saves it as a DOCX file (you can also save it to a stream):
using System;
using System.IO;
using GdPicture14;

using GdPictureDocumentConverter converter = new();
// Set the page layout used when rendering the HTML.
// 1191 x 842 points corresponds to an A3 page in landscape orientation.
converter.HtmlPageHeight = 842;
converter.HtmlPageWidth = 1191;
converter.HtmlPageMarginTop = 10;
converter.HtmlPageMarginBottom = 10;
converter.HtmlPageMarginLeft = 10;
converter.HtmlPageMarginRight = 10;

using Stream inputStream = File.Open(@"input.html", FileMode.Open);
using Stream outputStream = new MemoryStream();

// Convert the HTML input stream to PDF in the in-memory output stream.
GdPictureStatus status = converter.ConvertToPDF(inputStream, GdPicture14.DocumentFormat.DocumentFormatHTML, outputStream, PdfConformance.PDF1_5);
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

// Rewind the in-memory PDF and load it as the current document.
outputStream.Position = 0;
status = converter.LoadFromStream(outputStream, GdPicture14.DocumentFormat.DocumentFormatPDF);
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

// Save the loaded PDF as a DOCX file.
status = converter.SaveAsDOCX("output.docx");
if (status != GdPictureStatus.OK)
{
    throw new Exception(status.ToString());
}

Console.WriteLine("The input document has been converted to a DOCX file.");
See also
DocxImageQuality Property
GdPictureDocumentConverter Class
GdPictureDocumentConverter Members
CloseDocument Method
RasterizationDPI Property
HtmlEmulationType Property
HtmlPageHeight Property
HtmlPageMarginBottom Property
HtmlPageMarginLeft Property
HtmlPageMarginRight Property
HtmlPageMarginTop Property
HtmlPageWidth Property
HtmlPreferCSSPageSize Property
HtmlPreferOnePage Property
Related topics
Optional HTML configuration properties
Optionally, configure the conversion with the following properties of the GdPictureDocumentConverter object:
HtmlEmulationType
HtmlPageHeight
HtmlPageMarginBottom
HtmlPageMarginLeft
HtmlPageMarginRight
HtmlPageMarginTop
HtmlPageWidth
HtmlPreferCSSPageSize
HtmlPreferOnePage
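The sketch below applies a subset of these properties to render HTML on A4 portrait pages (about 595 x 842 points) with uniform margins. It assumes that HtmlPreferCSSPageSize and HtmlPreferOnePage are Boolean properties and that page dimensions are expressed in points, consistent with the A3 values in the HTML-to-DOCX example; HtmlEmulationType is left at its default because its enumeration values aren't covered here. HtmlOptionsSketch and ApplyA4Layout are hypothetical names.

```csharp
using GdPicture14;

public static class HtmlOptionsSketch
{
    // Hypothetical helper: A4 portrait layout with the same margin on every side.
    // Assumption: HtmlPreferCSSPageSize and HtmlPreferOnePage are Boolean properties.
    public static void ApplyA4Layout(GdPictureDocumentConverter converter, int margin)
    {
        converter.HtmlPageWidth = 595;   // A4 width in points (rounded)
        converter.HtmlPageHeight = 842;  // A4 height in points (rounded)
        converter.HtmlPageMarginTop = margin;
        converter.HtmlPageMarginBottom = margin;
        converter.HtmlPageMarginLeft = margin;
        converter.HtmlPageMarginRight = margin;

        // Let @page rules in the HTML's CSS take precedence over the size above.
        converter.HtmlPreferCSSPageSize = true;
        // Allow long content to flow onto multiple pages.
        converter.HtmlPreferOnePage = false;
    }
}
```

Call ApplyA4Layout before loading or converting the HTML so the layout is in effect when the page is rendered.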
Optional PDF configuration properties
Optionally, configure the conversion with the following properties of the GdPictureDocumentConverter object:
PdfBitonalImageCompression is a member of the PdfCompression enumeration that specifies the compression scheme used for bitonal images in the output PDF file.
PdfColorImageCompression is a member of the PdfCompression enumeration that specifies the compression scheme used for color images in the output PDF file.
PdfEnableColorDetection is a Boolean value that specifies whether to use automatic color detection during the conversion, which preserves image quality while reducing the output file size.
PdfEnableLinearization is a Boolean value that specifies whether to linearize the output PDF to enable Fast Web View mode.
PdfImageQuality is an integer from 0 to 100 that specifies the image quality in the output PDF file.
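As a hedged example of combining these properties, the sketch below configures a converter for smaller, web-friendly PDFs: JPEG compression for color images at quality 75, automatic color detection, and linearization for Fast Web View. The quality value of 75 is an arbitrary choice, the bitonal compression setting is left at its default, and PdfOptionsSketch and ApplyWebPreset are hypothetical names.

```csharp
using GdPicture14;

public static class PdfOptionsSketch
{
    // Hypothetical helper: settings aimed at compact PDFs for web delivery.
    public static void ApplyWebPreset(GdPictureDocumentConverter converter)
    {
        // Lossy JPEG compression for color images, as in the example that follows.
        converter.PdfColorImageCompression = PdfCompression.PdfCompressionJPEG;
        // Arbitrary middle ground; higher values favor quality over file size.
        converter.PdfImageQuality = 75;
        // Detect each image's actual color content to avoid storing it at excess depth.
        converter.PdfEnableColorDetection = true;
        // Linearize the output so viewers can use Fast Web View.
        converter.PdfEnableLinearization = true;
    }
}
```

Call ApplyWebPreset after creating the converter and before SaveAsPDF, for example when the intermediate PDF of an HTML-to-DOCX conversion is also kept for web delivery.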
The examples below (C# and VB.NET) create a PDF document from an HTML file with a custom configuration:
using GdPictureDocumentConverter gdpictureDocumentConverter = new GdPictureDocumentConverter();
// Load the source document.
gdpictureDocumentConverter.LoadFromFile(@"C:\temp\source.html", GdPicture14.DocumentFormat.DocumentFormatHTML);
// Configure the conversion.
gdpictureDocumentConverter.PdfColorImageCompression = PdfCompression.PdfCompressionJPEG;
gdpictureDocumentConverter.PdfImageQuality = 50;
// Save the output in a new PDF document.
gdpictureDocumentConverter.SaveAsPDF(@"C:\temp\output.pdf");
Using gdpictureDocumentConverter As GdPictureDocumentConverter = New GdPictureDocumentConverter()
    ' Load the source document.
    gdpictureDocumentConverter.LoadFromFile("C:\temp\source.html", GdPicture14.DocumentFormat.DocumentFormatHTML)
    ' Configure the conversion.
    gdpictureDocumentConverter.PdfColorImageCompression = PdfCompression.PdfCompressionJPEG
    gdpictureDocumentConverter.PdfImageQuality = 50
    ' Save the output in a new PDF document.
    gdpictureDocumentConverter.SaveAsPDF("C:\temp\output.pdf")
End Using