Support large documents
If your application works with large uploads, such as scanned document bundles, architectural drawings, or PDFs with many pages, annotations, or form fields, plan for large-document support from the start.
Document Engine doesn’t have a separate “large document mode.” Instead, support depends on a combination of:
- The document itself, including file size, page count, annotation count, form fields, and page dimensions
- The storage backend you choose
- The upload and processing timeouts across your stack
- The available CPU, memory, local disk, and network throughput in your deployment
Use object storage for large documents
For production deployments that need to handle large documents, use object storage.
Avoid using the built-in database-backed storage for large documents in production.
Why:
- The built-in storage backend stores assets in PostgreSQL, and PostgreSQL has a hard 1 GB limit for a field value. Because Document Engine stores built-in assets in a bytea column, that limit applies to stored assets. For details, see the PostgreSQL limits documentation.
- It’s not a good fit for large-object production workloads.
- Object storage performs better and scales more predictably for large PDFs and attachments.
If you expect uploads larger than 1 GB, object storage isn’t just recommended; it’s required.
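For illustration, a minimal S3-backed configuration might look like the following. Treat the ASSET_STORAGE_S3_* variable names, bucket, and region here as assumptions to verify against the configuration options guide; depending on your deployment, credentials may also come from an instance role instead of explicit keys:

```
ASSET_STORAGE_BACKEND=s3
# Hypothetical bucket and region values; replace with your own.
ASSET_STORAGE_S3_BUCKET=my-documents-bucket
ASSET_STORAGE_S3_REGION=us-east-1
# Explicit credentials may be unnecessary if the instance role grants bucket access.
ASSET_STORAGE_S3_ACCESS_KEY_ID=<access key id>
ASSET_STORAGE_S3_SECRET_ACCESS_KEY=<secret access key>
```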
Increase the right limits
Large-document support usually requires adjusting several settings together.
The most relevant settings are:
- ASSET_STORAGE_BACKEND — Choose s3 or azure for large documents in production.
- MAX_UPLOAD_SIZE_BYTES — Sets the largest upload Document Engine will accept.
- SERVER_REQUEST_TIMEOUT — Sets the HTTP API request timeout for upload and post-upload processing.
- PSPDFKIT_WORKER_TIMEOUT — Sets how long PDF-related processing can run before timing out.
- FILE_UPLOAD_TIMEOUT_MS — Sets the timeout for uploads to S3 or Azure Blob Storage.
- ASSET_STORAGE_CACHE_SIZE — Sets the size of the local asset cache.
Depending on how your integration works, these can also matter:
- READ_ANNOTATION_BATCH_TIMEOUT — Useful for annotation-heavy documents if annotation-related requests time out.
- ASSET_STORAGE_CACHE_TIMEOUT — Useful if fetching large assets from storage times out.
- REMOTE_URL_FETCH_TIMEOUT — Useful if you add documents from remote URLs instead of direct uploads.
Refer to the configuration options guide for the full list.
Configure the whole request path
Large uploads can still fail even when Document Engine is configured correctly if another layer times out first.
Make sure the whole path is aligned:
Client timeout -> reverse proxy or ingress timeout and body-size limit -> Document Engine upload limit and request timeout -> object storage upload timeout

In practice, this means:
- Your reverse proxy or ingress must allow the same upload size as Document Engine.
- Proxy timeouts should be at least as long as the Document Engine request timeout.
- Object storage timeouts must be long enough for your largest uploads.
If any layer has a smaller limit, that layer becomes the effective maximum.
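As one concrete sketch of aligning a proxy layer, the relevant directives for an NGINX reverse proxy in front of Document Engine might look like this. The values are illustrative assumptions; match them to your own upload limit and request timeout:

```nginx
# Allow request bodies as large as Document Engine's upload limit (here ~2 GB).
client_max_body_size 2g;

# Stream large uploads to the upstream instead of buffering them to disk first.
proxy_request_buffering off;

# Keep proxy timeouts at least as long as the Document Engine request timeout.
proxy_read_timeout 600s;
proxy_send_timeout 600s;
```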
Size your environment for your workload
Large documents put pressure on more than just upload bandwidth.
Plan for enough:
- Local disk for temporary upload files and the asset cache
- RAM and CPU for the operations your app performs after upload
- Network throughput between Document Engine and your object storage
For example, a deployment that only uploads and stores large PDFs has different needs from one that also renders pages, extracts text, or exports annotated copies.
Validate with your own documents
The best way to tune large-document support is to test with representative files from your own application.
We recommend this workflow:
- Choose object storage and set an upload limit that’s larger than your largest supported document.
- Increase request and storage timeouts based on an educated estimate.
- Upload representative large documents and run the operations your app uses most often.
- Monitor Document Engine logs and proxy logs for timeouts and warnings.
- Adjust settings until the warnings and timeout errors disappear for your supported workload.
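One way to turn the "educated estimate" in the second step into a number is to work backward from upload size and network bandwidth. This sketch computes a rough transfer time for a 2 GB upload on a 100 Mbit/s link and doubles it for headroom; both figures are assumptions, not measurements from your environment:

```shell
size_mb=2048        # largest supported upload, in MB
bandwidth_mbps=100  # sustained client-to-server bandwidth, in Mbit/s

# Transfer time in seconds: megabytes * 8 bits per byte / megabits per second.
upload_s=$(( size_mb * 8 / bandwidth_mbps ))

# Double it to leave headroom for slow clients and post-upload processing.
timeout_s=$(( upload_s * 2 ))

echo "upload ~${upload_s}s, suggested timeout ${timeout_s}s"
# prints "upload ~163s, suggested timeout 326s"
```

Whatever estimate you start from, the monitoring and adjustment steps above are what produce the final values.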
Typical validation steps include:
- Uploading the document
- Opening it in the viewer
- Loading annotations or form fields
- Adding a comment
- Exporting or saving changes
Example starting point
The exact values here depend on your environment, so use this only as a checklist:
- ASSET_STORAGE_BACKEND=s3
- MAX_UPLOAD_SIZE_BYTES=<larger than your largest supported upload>
- SERVER_REQUEST_TIMEOUT=<long enough for upload and processing>
- PSPDFKIT_WORKER_TIMEOUT=<long enough for post-upload PDF processing>
- FILE_UPLOAD_TIMEOUT_MS=<long enough for your object storage>
- ASSET_STORAGE_CACHE_SIZE=<large enough for your working set and local disk budget>

Depending on your workload, you may also need:
- READ_ANNOTATION_BATCH_TIMEOUT=<if annotation-heavy requests time out>
- ASSET_STORAGE_CACHE_TIMEOUT=<if fetching large assets times out>
- REMOTE_URL_FETCH_TIMEOUT=<if you import large documents from URLs>

For ASSET_STORAGE_CACHE_SIZE, a useful starting point is:
average document size x number of documents you expect to be active concurrently x safety factor

For the safety factor, start with 1.5 and increase it if your workload has frequent cache churn.
For example:
- If your average large document is 500 MB and you expect one instance to keep about 20 active documents cached, start around 500 MB x 20 x 1.5 = 15 GB.
- If your average large document is 2 GB and you expect 10 active documents per instance, start around 2 GB x 10 x 1.5 = 30 GB.
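The cache-size arithmetic can also be scripted for your own numbers. This sketch uses the 500 MB / 20-document figures from the first example, with the 1.5 safety factor written as 3/2 to stay in shell integer arithmetic:

```shell
avg_mb=500       # average large-document size, in MB
active_docs=20   # documents you expect to be cached concurrently per instance

# average size x active documents x 1.5 safety factor (expressed as 3/2).
cache_mb=$(( avg_mb * active_docs * 3 / 2 ))

echo "ASSET_STORAGE_CACHE_SIZE starting point: ${cache_mb} MB"
# prints "ASSET_STORAGE_CACHE_SIZE starting point: 15000 MB"
```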
Keep in mind that ASSET_STORAGE_CACHE_SIZE only covers the local asset cache. You should also leave additional disk space for:
- Temporary upload files
- Other application data
- The operating system and container runtime
In practice, this means the machine or container host should have more free disk space than the cache size alone.