Convert PDF to HTML
Document Engine can convert PDF documents to HTML. You can use this output for downstream processing, web display, or content reuse.
This guide explains how to convert a PDF to HTML with the /api/build endpoint.
- Ensure Document Engine is up and running.
- Send a multipart POST request with instructions to Document Engine’s /api/build endpoint.
For details on the /api/build endpoint and all the actions you can perform on PDFs with Document Engine, refer to the API reference.
For an overview of multipart requests, refer to the brief tour of multipart requests blog post.
Basic conversion
Start with a standard Build API request.
To convert a PDF to HTML, send a PDF as input and set the output type to html.
    curl -X POST http://localhost:5000/api/build \
      -H "Authorization: Token token=<API token>" \
      -F document=@/path/to/document.pdf \
      -F instructions='{
        "parts": [{ "file": "document" }],
        "output": { "type": "html" }
      }' \
      -o result.html

The same request as raw HTTP:

    POST /api/build HTTP/1.1
    Content-Type: multipart/form-data; boundary=customboundary
    Authorization: Token token=<API token>

    --customboundary
    Content-Disposition: form-data; name="document"; filename="document.pdf"
    Content-Type: application/pdf

    <PDF data>
    --customboundary
    Content-Disposition: form-data; name="instructions"
    Content-Type: application/json

    {
      "parts": [{ "file": "document" }],
      "output": { "type": "html" }
    }
    --customboundary--

The response is an HTML document with the text/html content type.
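For callers who prefer a script over curl, the multipart request above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_multipart_request` and `convert_pdf_to_html` are hypothetical, and the base URL and token placeholder are taken from the curl example and are assumed to match your deployment.

```python
import json
import urllib.request
import uuid


def build_multipart_request(pdf_bytes, instructions,
                            base_url="http://localhost:5000",
                            api_token="<API token>"):
    """Build (but do not send) a multipart POST request for /api/build.

    Hypothetical helper: the part names "document" and "instructions"
    mirror the curl example in this guide.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    body = b"\r\n".join([
        delimiter,
        b'Content-Disposition: form-data; name="document"; filename="document.pdf"',
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        delimiter,
        b'Content-Disposition: form-data; name="instructions"',
        b"Content-Type: application/json",
        b"",
        json.dumps(instructions).encode("utf-8"),
        delimiter + b"--",  # closing delimiter
        b"",
    ])
    return urllib.request.Request(
        f"{base_url}/api/build",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token token={api_token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def convert_pdf_to_html(pdf_path, html_path, **request_options):
    """Send the PDF at pdf_path to /api/build and save the HTML response."""
    instructions = {"parts": [{"file": "document"}], "output": {"type": "html"}}
    with open(pdf_path, "rb") as pdf_file:
        request = build_multipart_request(pdf_file.read(), instructions,
                                          **request_options)
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(request) as response:
        html = response.read()
    with open(html_path, "wb") as html_file:
        html_file.write(html)
```

Calling `convert_pdf_to_html("/path/to/document.pdf", "result.html", api_token="...")` would then play the role of the curl command, assuming Document Engine is reachable at the default base URL.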
HTML layout options
Choose a layout based on how you want the HTML to represent the source PDF.
The HTML output supports two layouts:
- page: Preserves the original page structure of the PDF.
- reflow: Produces a continuous flow of text without page breaks.
If you don’t specify a layout, Document Engine uses page by default.
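The layout choice only changes the `output` object of the instructions. As an illustration, the following sketch builds that JSON in Python and rejects unknown layout names before a request is sent. The function name `html_instructions` and the client-side validation are assumptions added for this example. Document Engine performs its own validation.

```python
import json

# The two layouts described in this guide.
VALID_LAYOUTS = ("page", "reflow")


def html_instructions(layout=None, part="document"):
    """Return Build API instructions for PDF-to-HTML conversion.

    Hypothetical helper. When layout is None, the "layout" key is
    omitted so that Document Engine applies its default (page).
    """
    if layout is not None and layout not in VALID_LAYOUTS:
        raise ValueError(f"layout must be one of {VALID_LAYOUTS}, got {layout!r}")
    output = {"type": "html"}
    if layout is not None:
        output["layout"] = layout
    return {"parts": [{"file": part}], "output": output}


print(json.dumps(html_instructions("reflow")))
```

The printed JSON string can be passed as the `instructions` form field of the multipart request.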
Page layout
Use page when you want the generated HTML to stay close to the original PDF page structure.
    curl -X POST http://localhost:5000/api/build \
      -H "Authorization: Token token=<API token>" \
      -F document=@/path/to/document.pdf \
      -F instructions='{
        "parts": [{ "file": "document" }],
        "output": { "type": "html", "layout": "page" }
      }' \
      -o result.html

Reflow layout
Use reflow when you want the content to read as continuous HTML instead of page-based output.
    curl -X POST http://localhost:5000/api/build \
      -H "Authorization: Token token=<API token>" \
      -F document=@/path/to/document.pdf \
      -F instructions='{
        "parts": [{ "file": "document" }],
        "output": { "type": "html", "layout": "reflow" }
      }' \
      -o result.html

If your input file is hosted remotely, you can send the same request as JSON:
    curl -X POST http://localhost:5000/api/build \
      -H "Authorization: Token token=<API token>" \
      -H "Content-Type: application/json" \
      -d '{
        "parts": [
          { "file": { "url": "https://www.nutrient.io/api/assets/downloads/samples/pdf/document.pdf" } }
        ],
        "output": { "type": "html", "layout": "reflow" }
      }' \
      -o result.html

Apply operations before conversion
You can modify the document before Document Engine generates the final HTML output.
Because PDF-to-HTML conversion uses the Build API, you can apply operations such as these before conversion:
- Assemble a PDF from multiple parts.
- Rotate pages.
- Add a watermark.
- Import annotations.
    curl -X POST http://localhost:5000/api/build \
      -H "Authorization: Token token=<API token>" \
      -F document=@/path/to/document.pdf \
      -F instructions='{
        "parts": [{ "file": "document" }],
        "actions": [
          { "type": "rotatePages", "rotation": 90, "pages": { "start": 0, "end": 0 } }
        ],
        "output": { "type": "html", "layout": "page" }
      }' \
      -o result.html

This approach helps you make sure the exported HTML reflects the processed document.
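When requests combine several parts and actions, it can help to assemble the instructions programmatically. The sketch below is one possible way to do that in Python. The helpers `rotate_pages_action` and `build_instructions` are hypothetical, and the action fields are copied from the curl request above rather than from the full schema, so check the API reference before relying on them. The sketch also assumes that each part name matches the name of a file field in the multipart request.

```python
import json


def rotate_pages_action(rotation, start, end):
    """Mirror the rotatePages action used in this guide's curl request."""
    return {
        "type": "rotatePages",
        "rotation": rotation,
        "pages": {"start": start, "end": end},
    }


def build_instructions(part_names, actions=(), layout="page"):
    """Combine parts, pre-conversion actions, and the HTML output settings.

    Hypothetical helper: part_names is assumed to list the multipart
    field names of the uploaded files, in the order they are assembled.
    """
    if not part_names:
        raise ValueError("at least one part is required")
    return {
        "parts": [{"file": name} for name in part_names],
        "actions": list(actions),
        "output": {"type": "html", "layout": layout},
    }


# Rebuild the instructions from the curl request: rotate page index 0 by 90.
instructions = build_instructions(["document"], [rotate_pages_action(90, 0, 0)])
print(json.dumps(instructions, indent=2))
```

Keeping actions as plain dictionaries means that further operations, such as a watermark, can be appended to the same list once their schema has been confirmed in the API reference.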
API schema reference
Refer to the API reference for the full schema details of the /api/build request and its HTML output options.
Next steps
You can now choose the layout mode that fits your use case:
- Use page when visual structure matters.
- Use reflow when you need continuous HTML output.
For related workflows, refer to the following guides:
- Convert HTML to PDF
- PDF and document conversion server
- Convert PDF to image
- Convert PDF to Office
- Convert documents to PDF/A formats
- Convert PDFs to PDF/UA-1