Support large documents
If your application works with large uploads, such as scanned document bundles, architectural drawings, or PDFs with many pages, annotations, or form fields, plan for large-document support from the start.
Document Engine doesn’t have a separate “large document mode.” Instead, support depends on a combination of:
- The document itself, including file size, page count, annotation count, form fields, and page dimensions
- The storage backend you choose
- The upload and processing timeouts across your stack
- The available CPU, memory, local disk, and network throughput in your deployment
Use object storage for large documents
For production deployments that need to handle large documents, use object storage.
Avoid using the built-in database-backed storage for large documents in production.
Why:
- The built-in storage backend stores assets in PostgreSQL, and PostgreSQL has a hard 1 GB limit for a field value. Because Document Engine stores built-in assets in a bytea column, that limit applies to stored assets. For details, see the PostgreSQL limits documentation.
- It’s not a good fit for large-object production workloads.
- Object storage performs better and scales more predictably for large PDFs and attachments.
If you expect uploads larger than 1 GB, object storage isn’t just recommended; it’s required.
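For illustration, a minimal S3-backed configuration might look like the following. Treat the ASSET_STORAGE_S3_* variable names, bucket, and region here as assumptions to verify against the configuration options guide; depending on your deployment, credentials may also come from an instance role instead of explicit keys:

```
ASSET_STORAGE_BACKEND=s3
# Hypothetical bucket and region values; replace with your own.
ASSET_STORAGE_S3_BUCKET=my-documents-bucket
ASSET_STORAGE_S3_REGION=us-east-1
# Explicit credentials may be unnecessary if the instance role grants bucket access.
ASSET_STORAGE_S3_ACCESS_KEY_ID=<access key id>
ASSET_STORAGE_S3_SECRET_ACCESS_KEY=<secret access key>
```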
Increase the right limits
Large-document support usually requires adjusting several settings together.
The most relevant settings are:
- ASSET_STORAGE_BACKEND — Choose s3 or azure for large documents in production.
- MAX_UPLOAD_SIZE_BYTES — Sets the largest upload Document Engine will accept.
- SERVER_REQUEST_TIMEOUT — Sets the HTTP API request timeout for upload and post-upload processing.
- PSPDFKIT_WORKER_TIMEOUT — Sets how long PDF-related processing can run before timing out.
- FILE_UPLOAD_TIMEOUT_MS — Sets the timeout for uploads to S3 or Azure Blob Storage.
- ASSET_STORAGE_CACHE_SIZE — Sets the size of the local asset cache.
Depending on how your integration works, these can also matter:
- READ_ANNOTATION_BATCH_TIMEOUT — Useful for annotation-heavy documents if annotation-related requests time out.
- ASSET_STORAGE_CACHE_TIMEOUT — Useful if fetching large assets from storage times out.
- REMOTE_URL_FETCH_TIMEOUT — Useful if you add documents from remote URLs instead of direct uploads.
Refer to the configuration options guide for the full list.
Configure the whole request path
Large uploads can still fail even when Document Engine is configured correctly if another layer times out first.
Make sure the whole path is aligned:
Client timeout -> reverse proxy or ingress timeout and body-size limit -> Document Engine upload limit and request timeout -> object storage upload timeout

In practice, this means:
- Your reverse proxy or ingress must allow the same upload size as Document Engine.
- Proxy timeouts should be at least as long as the Document Engine request timeout.
- Object storage timeouts must be long enough for your largest uploads.
If any layer has a smaller limit, that layer becomes the effective maximum.
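As one concrete sketch of aligning a proxy layer, the relevant directives for an NGINX reverse proxy in front of Document Engine might look like this. The values are illustrative assumptions; match them to your own upload limit and request timeout:

```nginx
# Allow request bodies as large as Document Engine's upload limit (here ~2 GB).
client_max_body_size 2g;

# Stream large uploads to the upstream instead of buffering them to disk first.
proxy_request_buffering off;

# Keep proxy timeouts at least as long as the Document Engine request timeout.
proxy_read_timeout 600s;
proxy_send_timeout 600s;
```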
Size your environment for your workload
Large documents put pressure on more than just upload bandwidth.
Plan for enough:
- Local disk for temporary upload files and the asset cache
- RAM and CPU for the operations your app performs after upload
- Network throughput between Document Engine and your object storage
For example, a deployment that only uploads and stores large PDFs has different needs from one that also renders pages, extracts text, or exports annotated copies.
Validate with your own documents
The best way to tune large-document support is to test with representative files from your own application.
We recommend this workflow:
- Choose object storage and set an upload limit that’s larger than your largest supported document.
- Increase request and storage timeouts based on an educated estimate.
- Upload representative large documents and run the operations your app uses most often.
- Monitor Document Engine logs and proxy logs for timeouts and warnings.
- Adjust settings until the warnings and timeout errors disappear for your supported workload.
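One way to turn the "educated estimate" in the second step into a number is to work backward from upload size and network bandwidth. This sketch computes a rough transfer time for a 2 GB upload on a 100 Mbit/s link and doubles it for headroom; both figures are assumptions, not measurements from your environment:

```shell
size_mb=2048        # largest supported upload, in MB
bandwidth_mbps=100  # sustained client-to-server bandwidth, in Mbit/s

# Transfer time in seconds: megabytes * 8 bits per byte / megabits per second.
upload_s=$(( size_mb * 8 / bandwidth_mbps ))

# Double it to leave headroom for slow clients and post-upload processing.
timeout_s=$(( upload_s * 2 ))

echo "upload ~${upload_s}s, suggested timeout ${timeout_s}s"
# prints "upload ~163s, suggested timeout 326s"
```

Whatever estimate you start from, the monitoring and adjustment steps above are what produce the final values.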
Typical validation steps include:
- Uploading the document
- Opening it in the viewer
- Loading annotations or form fields
- Adding a comment
- Exporting or saving changes
Example starting point
The exact values here depend on your environment, so use this only as a checklist:
- ASSET_STORAGE_BACKEND=s3
- MAX_UPLOAD_SIZE_BYTES=<larger than your largest supported upload>
- SERVER_REQUEST_TIMEOUT=<long enough for upload and processing>
- PSPDFKIT_WORKER_TIMEOUT=<long enough for post-upload PDF processing>
- FILE_UPLOAD_TIMEOUT_MS=<long enough for your object storage>
- ASSET_STORAGE_CACHE_SIZE=<large enough for your working set and local disk budget>

Depending on your workload, you may also need:
- READ_ANNOTATION_BATCH_TIMEOUT=<if annotation-heavy requests time out>
- ASSET_STORAGE_CACHE_TIMEOUT=<if fetching large assets times out>
- REMOTE_URL_FETCH_TIMEOUT=<if you import large documents from URLs>

For ASSET_STORAGE_CACHE_SIZE, a useful starting point is:
average document size x number of documents you expect to be active concurrently x safety factor

For the safety factor, start with 1.5 and increase it if your workload has frequent cache churn.
For example:
- If your average large document is 500 MB and you expect one instance to keep about 20 active documents cached, start around 500 MB x 20 x 1.5 = 15 GB.
- If your average large document is 2 GB and you expect 10 active documents per instance, start around 2 GB x 10 x 1.5 = 30 GB.
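The cache-size arithmetic can also be scripted for your own numbers. This sketch uses the 500 MB / 20-document figures from the first example, with the 1.5 safety factor written as 3/2 to stay in shell integer arithmetic:

```shell
avg_mb=500       # average large-document size, in MB
active_docs=20   # documents you expect to be cached concurrently per instance

# average size x active documents x 1.5 safety factor (expressed as 3/2).
cache_mb=$(( avg_mb * active_docs * 3 / 2 ))

echo "ASSET_STORAGE_CACHE_SIZE starting point: ${cache_mb} MB"
# prints "ASSET_STORAGE_CACHE_SIZE starting point: 15000 MB"
```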
Keep in mind that ASSET_STORAGE_CACHE_SIZE only covers the local asset cache. You should also leave additional disk space for:
- Temporary upload files
- Other application data
- The operating system and container runtime
In practice, this means the machine or container host should have more free disk space than the cache size alone.