PDF-to-HTML server-side conversion

Document Engine can convert PDF documents to HTML. You can use this output for downstream processing, web display, or content reuse.

This guide explains how to convert a PDF to HTML with the /api/build endpoint.

For details on the /api/build endpoint and all the actions you can perform on PDFs with Document Engine, refer to the API reference.

For an overview of multipart requests, refer to the brief tour of multipart requests blog post.

Basic conversion

Start with a standard Build API request.

To convert a PDF to HTML, send a PDF as input and set the output type to html.

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/document.pdf \
  -F instructions='{
    "parts": [{ "file": "document" }],
    "output": {
      "type": "html"
    }
  }' \
  -o result.html

The response is an HTML document with the text/html content type.
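A failed request may return an error body instead of HTML, so it can help to check the response type before using the saved file. The following is a minimal sketch, not part of Document Engine: `is_html_response` is a hypothetical helper, and the commented usage relies on curl's `-w '%{content_type}'` write-out variable.

```shell
# Hypothetical helper: succeeds when a Content-Type header value denotes HTML.
# Parameters such as "; charset=utf-8" are tolerated.
is_html_response() {
  case "$1" in
    text/html | "text/html;"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch (assumes the same local server and token as the request above):
# content_type=$(curl -s -X POST http://localhost:5000/api/build \
#   -H "Authorization: Token token=<API token>" \
#   -F document=@/path/to/document.pdf \
#   -F instructions='{"parts": [{ "file": "document" }], "output": { "type": "html" }}' \
#   -o result.html -w '%{content_type}')
# is_html_response "$content_type" || echo "Unexpected response type: $content_type" >&2
```

The helper only inspects the header value. It does not validate the HTML itself.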

HTML layout options

Choose a layout based on how you want the HTML to represent the source PDF.

The HTML output supports two layouts:

  • page — Preserves the original page structure of the PDF.
  • reflow — Produces a continuous flow of text without page breaks.

If you don’t specify a layout, Document Engine uses page by default.
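The layout rules above can be sketched as a small shell helper that builds the instructions JSON. `html_instructions` is a hypothetical name, not part of Document Engine. It omits the layout key when called without an argument, so the server default (page) applies, and it rejects anything other than the two documented layouts.

```shell
# Hypothetical helper: prints Build API instructions for HTML output.
# $1 is optional: "page" or "reflow". Without it, no layout key is sent.
html_instructions() {
  case "${1-}" in
    "")
      printf '{"parts":[{"file":"document"}],"output":{"type":"html"}}\n' ;;
    page | reflow)
      printf '{"parts":[{"file":"document"}],"output":{"type":"html","layout":"%s"}}\n' "$1" ;;
    *)
      echo "unsupported layout: $1" >&2
      return 1 ;;
  esac
}

# Usage sketch (same server, token, and file path as the other requests):
# curl -X POST http://localhost:5000/api/build \
#   -H "Authorization: Token token=<API token>" \
#   -F document=@/path/to/document.pdf \
#   -F instructions="$(html_instructions reflow)" \
#   -o result.html
```

Validating the layout locally is a design choice for catching typos before a request is sent. It is not something the API requires.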

Page layout

Use page when you want the generated HTML to stay close to the original PDF page structure.

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/document.pdf \
  -F instructions='{
    "parts": [{ "file": "document" }],
    "output": {
      "type": "html",
      "layout": "page"
    }
  }' \
  -o result.html

Reflow layout

Use reflow when you want the content to read as continuous HTML instead of page-based output.

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/document.pdf \
  -F instructions='{
    "parts": [{ "file": "document" }],
    "output": {
      "type": "html",
      "layout": "reflow"
    }
  }' \
  -o result.html

If your input file is hosted remotely, you can send the same request as a JSON body that references the file by URL instead of uploading it:

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [
      {
        "file": {
          "url": "https://www.nutrient.io/api/assets/downloads/samples/pdf/document.pdf"
        }
      }
    ],
    "output": {
      "type": "html",
      "layout": "reflow"
    }
  }' \
  -o result.html
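When the URL comes from a variable, the JSON body for the request above can be assembled by a helper. This is a rough sketch: `remote_html_body` is a hypothetical function, and its guard against quotes and backslashes is a simple safety check, not full JSON escaping.

```shell
# Hypothetical helper: prints a JSON request body for a remotely hosted PDF.
# $1 = URL of the PDF, $2 = layout ("page" or "reflow").
remote_html_body() {
  case "${1-}" in
    *\"* | *\\*)
      echo "URL must not contain quotes or backslashes" >&2
      return 1 ;;
    http://* | https://*) ;;
    *)
      echo "expected an http(s) URL" >&2
      return 1 ;;
  esac
  case "${2-}" in
    page | reflow) ;;
    *)
      echo "unsupported layout: ${2-}" >&2
      return 1 ;;
  esac
  printf '{"parts":[{"file":{"url":"%s"}}],"output":{"type":"html","layout":"%s"}}\n' "$1" "$2"
}

# Usage sketch:
# curl -X POST http://localhost:5000/api/build \
#   -H "Authorization: Token token=<API token>" \
#   -H "Content-Type: application/json" \
#   -d "$(remote_html_body "$PDF_URL" reflow)" \
#   -o result.html
```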

Apply operations before conversion

You can modify the document before Document Engine generates the final HTML output.

Because PDF-to-HTML conversion uses the Build API, you can apply operations such as these before conversion:

  • Assemble a PDF from multiple parts.
  • Rotate pages.
  • Add a watermark.
  • Import annotations.

The following request rotates the first page by 90 degrees before converting the document to HTML:

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/document.pdf \
  -F instructions='{
    "parts": [{ "file": "document" }],
    "actions": [
      {
        "type": "rotatePages",
        "rotation": 90,
        "pages": {
          "start": 0,
          "end": 0
        }
      }
    ],
    "output": {
      "type": "html",
      "layout": "page"
    }
  }' \
  -o result.html

Because the operations run before conversion, the exported HTML reflects the processed document, not the original input.
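Assembling a document from multiple parts means listing several entries in the parts array. The sketch below assumes each part refers to a multipart field by name, following the `document` field in the requests above. `parts_json` and the field names `cover` and `body` are hypothetical, and names are inserted without JSON escaping, so they should be simple identifiers.

```shell
# Hypothetical helper: prints a "parts" array that references each given
# multipart field name, in order. Uses the global variables out and name.
parts_json() {
  out=""
  for name in "$@"; do
    # Append a comma only when out already holds an entry.
    out="${out:+$out,}{\"file\":\"$name\"}"
  done
  printf '[%s]\n' "$out"
}

# Usage sketch (assumed field names and file paths):
# curl -X POST http://localhost:5000/api/build \
#   -H "Authorization: Token token=<API token>" \
#   -F cover=@/path/to/cover.pdf \
#   -F body=@/path/to/body.pdf \
#   -F instructions="{\"parts\": $(parts_json cover body), \"output\": {\"type\": \"html\"}}" \
#   -o result.html
```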

API schema reference

Refer to the API reference for the full schema details of the Build API instructions and the HTML output options.

Next steps

You can now choose the layout mode that fits your use case:

  • Use page when visual structure matters.
  • Use reflow when you need continuous HTML output.

For related workflows, refer to the following guides: