Convert PDF to HTML

Document Engine can convert PDF documents to HTML. You can use this output for downstream processing, web display, or content reuse.

This guide explains how to convert a PDF to HTML with the /api/build endpoint.

Ensure Document Engine is up and running.
Send a multipart POST request with instructions to Document Engine’s /api/build endpoint.

For more information about the /api/build endpoint and all the actions you can perform on PDFs with Document Engine, refer to the API reference.

For an overview of multipart requests, refer to the brief tour of multipart requests blog post.
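To show what a multipart request to /api/build contains, the following shell sketch assembles the same two parts by hand: a document part that carries the PDF bytes, and an instructions part that carries the JSON. The function name build_multipart_body and its arguments are hypothetical and not part of Document Engine. curl -F builds an equivalent body for you, so this is mainly useful for inspecting or debugging what gets sent.

```shell
#!/bin/sh
# Hypothetical helper (not part of Document Engine): prints a multipart/form-data
# body with a "document" part (the PDF bytes) and an "instructions" part (JSON).
# Multipart framing uses CRLF line endings, so every header line ends in \r\n.
build_multipart_body() {
  pdf_path=$1       # path to the input PDF
  instructions=$2   # Build API instructions as a JSON string
  boundary=$3       # must match the boundary in the Content-Type header

  # Part 1: the PDF, sent under the form field name "document".
  printf '%s\r\n' "--$boundary"
  printf 'Content-Disposition: form-data; name="document"; filename="%s"\r\n' "$(basename "$pdf_path")"
  printf 'Content-Type: application/pdf\r\n\r\n'
  cat "$pdf_path"

  # Part 2: the JSON instructions, sent under the form field name "instructions".
  printf '\r\n%s\r\n' "--$boundary"
  printf 'Content-Disposition: form-data; name="instructions"\r\n'
  printf 'Content-Type: application/json\r\n\r\n'
  printf '%s\r\n' "$instructions"

  # Closing delimiter: the boundary followed by "--".
  printf '%s\r\n' "--$boundary--"
}
```

You could write the body to a file (for example, body.bin) and send it with curl --data-binary @body.bin, passing the same boundary in a Content-Type: multipart/form-data; boundary=customboundary header.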

Basic conversion

Start with a standard Build API request.

To convert a PDF to HTML, send a PDF as input and set the output type to html.

SHELL

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/document.pdf \
  -F instructions='{
    "parts": [{ "file": "document" }],
    "output": {
      "type": "html"
    }
  }' \
  -o result.html

HTTP

POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="document"; filename="document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [{ "file": "document" }],
  "output": {
    "type": "html"
  }
}
--customboundary--

Document Engine returns the converted HTML document with the text/html content type. In the curl example, -o saves it to result.html.
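If you script the conversion, it can help to confirm that the response is actually HTML before you use the file, because without extra flags curl also saves error responses to result.html. The sketch below wraps the basic curl request: --fail makes curl exit with an error on HTTP error statuses, and -w '%{content_type}' prints the response's Content-Type. The function names and the DOCUMENT_ENGINE_URL and API_TOKEN environment variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a Content-Type value denotes HTML.
# Media types are case-insensitive and may carry parameters such as "; charset=utf-8".
is_html_content_type() {
  media_type=$(printf '%s' "${1%%;*}" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
  [ "$media_type" = "text/html" ]
}

# Hypothetical wrapper around the basic conversion request shown above.
# DOCUMENT_ENGINE_URL and API_TOKEN are assumed environment variables.
convert_pdf_to_html() {
  input=$1
  output=$2
  # --fail: exit non-zero on HTTP errors.
  # -o writes the response body to the output file; -w prints the Content-Type to stdout.
  content_type=$(curl --silent --show-error --fail \
    -X POST "${DOCUMENT_ENGINE_URL:-http://localhost:5000}/api/build" \
    -H "Authorization: Token token=${API_TOKEN}" \
    -F document=@"$input" \
    -F instructions='{"parts":[{"file":"document"}],"output":{"type":"html"}}' \
    -o "$output" \
    -w '%{content_type}') || return 1
  is_html_content_type "$content_type"
}
```

For example, API_TOKEN='<API token>' convert_pdf_to_html /path/to/document.pdf result.html returns a non-zero status if the request fails or the response isn't HTML.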

HTML layout options

Choose a layout based on how you want the HTML to represent the source PDF.

The HTML output supports two layouts, which you select with the layout field of the output object:

page — Preserves the original page structure of the PDF.
reflow — Produces a continuous flow of text without page breaks.

If you don’t specify a layout, Document Engine uses page by default.
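If you generate requests from a script, a small helper can build the instructions JSON for either layout and reject any other value. This is a sketch: the function name html_instructions is hypothetical, and when no layout is given it sends page explicitly, which matches the documented default.

```shell
#!/bin/sh
# Hypothetical helper: prints instructions JSON that converts the "document"
# part to HTML with the given layout. Usage: html_instructions [page|reflow]
html_instructions() {
  layout=${1:-page}   # fall back to page, the documented default
  case "$layout" in
    page|reflow) ;;   # the two documented layouts
    *) echo "unsupported layout: $layout" >&2; return 1 ;;
  esac
  printf '{"parts":[{"file":"document"}],"output":{"type":"html","layout":"%s"}}\n' "$layout"
}
```

You could then pass the result to curl with -F instructions="$(html_instructions reflow)".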

Page layout

Use page when you want the generated HTML to stay close to the original PDF page structure.

SHELL

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/document.pdf \
  -F instructions='{
    "parts": [{ "file": "document" }],
    "output": {
      "type": "html",
      "layout": "page"
    }
  }' \
  -o result.html

Reflow layout

Use reflow when you want the content to read as continuous HTML instead of page-based output.

SHELL

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/document.pdf \
  -F instructions='{
    "parts": [{ "file": "document" }],
    "output": {
      "type": "html",
      "layout": "reflow"
    }
  }' \
  -o result.html

If your input file is hosted remotely, you can reference it by URL and send the request as JSON instead of multipart form data:

SHELL

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -H "Content-Type: application/json" \
  -d '{
    "parts": [
      {
        "file": {
          "url": "https://www.nutrient.io/api/assets/downloads/samples/pdf/document.pdf"
        }
      }
    ],
    "output": {
      "type": "html",
      "layout": "reflow"
    }
  }' \
  -o result.html
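When the URL comes from a variable, building the JSON body with a helper avoids shell-quoting mistakes. The sketch below uses the hypothetical function name remote_html_request and escapes backslashes and double quotes in the URL so the body stays valid JSON. It doesn't validate the layout, so pass page or reflow.

```shell
#!/bin/sh
# Hypothetical helper: prints a JSON request body that converts a remotely
# hosted PDF to HTML. Usage: remote_html_request URL [LAYOUT]
# LAYOUT defaults to page and is not validated here.
remote_html_request() {
  url=$1
  layout=${2:-page}
  # Escape backslashes first, then double quotes, so the URL is a valid JSON string.
  escaped_url=$(printf '%s' "$url" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"parts":[{"file":{"url":"%s"}}],"output":{"type":"html","layout":"%s"}}\n' \
    "$escaped_url" "$layout"
}
```

For example, send it with -H "Content-Type: application/json" -d "$(remote_html_request "$PDF_URL" reflow)", where PDF_URL is a hypothetical variable holding the file's URL.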

Apply operations before conversion

You can modify the document before Document Engine generates the final HTML output.

Because PDF-to-HTML conversion uses the Build API, you can apply operations such as these before conversion:

Assemble a PDF from multiple parts.
Rotate pages.
Add a watermark.
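Import annotations.

For example, the following request rotates the first page by 90 degrees and then converts the document to HTML:
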
SHELL

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/document.pdf \
  -F instructions='{
    "parts": [{ "file": "document" }],
    "actions": [
      {
        "type": "rotatePages",
        "rotation": 90,
        "pages": {
          "start": 0,
          "end": 0
        }
      }
    ],
    "output": {
      "type": "html",
      "layout": "page"
    }
  }' \
  -o result.html

Because Document Engine applies the actions before conversion, the exported HTML reflects the processed document rather than the original file.
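To rotate a different page range or by a different angle, you can generate the instructions instead of editing the JSON by hand. This sketch uses the rotatePages action exactly as in the request above; the function name rotate_then_html_instructions is hypothetical. It only checks that its numeric arguments are integers so the output is valid JSON. Which rotation angles and page indices Document Engine accepts is defined by the API reference, not by this helper.

```shell
#!/bin/sh
# Hypothetical helper: prints Build API instructions that apply a rotatePages
# action to pages START..END and then convert the result to HTML.
# Usage: rotate_then_html_instructions ROTATION START END [LAYOUT]
rotate_then_html_instructions() {
  rotation=$1
  start=$2
  end=$3
  layout=${4:-page}

  # Accept only (optionally negative) integers so the output is valid JSON.
  # Whether Document Engine accepts a given value is up to the API, not this check.
  for value in "$rotation" "$start" "$end"; do
    case ${value#-} in
      ''|*[!0-9]*) echo "not an integer: $value" >&2; return 1 ;;
    esac
  done

  printf '{"parts":[{"file":"document"}],"actions":[{"type":"rotatePages","rotation":%s,"pages":{"start":%s,"end":%s}}],"output":{"type":"html","layout":"%s"}}\n' \
    "$rotation" "$start" "$end" "$layout"
}
```

You could pass the result to curl with -F instructions="$(rotate_then_html_instructions 180 0 2 reflow)".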

API schema reference

Refer to these API reference entries for the full schema details:

Next steps

You can now choose the layout mode that fits your use case:

Use page when visual structure matters.
Use reflow when you need continuous HTML output.

For related workflows, refer to the following guides:
