1.17 release notes
Before attempting to upgrade to Document Engine 1.17, make sure your application runs as expected on your current version. If you’re on version 1.6.1 or later, you can upgrade directly to 1.17. If you’re on an earlier version, follow the step-by-step upgrade path outlined in our upgrade guide.
Highlights
Remote URL fetch controls
Customer-controlled remote URL fetches now use a central policy with safe defaults and explicit configuration for self-hosted deployments that need to fetch from known private or internal destinations.
Async Build API processing
The Build API can now accept multipart build requests for background processing with Prefer: respond-async. Document Engine returns 202 Accepted with status and result URLs and supports Idempotency-Key for safe retries. Async requests can use uploaded files, remote URL inputs, and Document Engine document inputs, and clients can request an allowed processing queue.
Async results are retained for 1 day by default, while job metadata is retained for 1 week by default. For setup, limitations, and examples, see the async Build API processing section.
Faster text search with a cached full-text index
Plain text search can now be served from a cached full-text search (FTS) index instead of rescanning the source PDF on every query, turning multi-second searches on large documents into sub-second ones. In our large-file benchmarks, whole-document search on a 5,000-page PDF went from several seconds to well under a second. The feature is opt-in and the /search response shape is unchanged.
Faster DOM printing for Nutrient Web SDK
Document Engine now renders a document’s pages for printing in a single batched, streamed request instead of one request per page. This cuts print-preparation time for Nutrient Web SDK’s server-backed DOM printing (PrintMode.DOM), with the largest gains for high-quality prints and on higher-latency connections.
Breaking changes
This release includes breaking changes that may require updates to existing deployments and integrations.
Verified TLS connections now require SAN certificates
Document Engine now runs on Erlang/OTP versions that follow RFC 9525 hostname verification behavior and no longer fall back to the certificate subject Common Name when a certificate has no matching Subject Alternative Name entry. Erlang/OTP marks this as a potential incompatibility related to CVE-2026-42790 in the SSL/Public Key changelog.
The most common startup-impact case is self-hosted deployments that connect Document Engine to PostgreSQL with PGSSL=true and certificate verification enabled. A PostgreSQL server certificate that only identifies the server with CN=<host> can fail hostname verification after the upgrade.
The same certificate requirement can also affect other verified TLS connections made by Document Engine, such as remote HTTPS downloads when certificate verification is configured, Redis TLS, signing service or timestamp authority URLs, and customer-managed HTTPS asset storage endpoints. Public cloud and Nutrient-hosted endpoints are expected to already use Subject Alternative Name certificates.
Regenerate affected server certificates with matching DNS or IP Subject Alternative Name entries, for example DNS:db.example.com or IP:10.0.0.5, before upgrading.
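To see why a CN-only certificate now fails, the following Python sketch models SAN-only hostname matching. It is a simplified illustration, not Erlang/OTP's implementation: the function names, the `DNS:`/`IP:` entry strings (borrowed from the notation above), and the rule that a wildcard covers exactly one left-most label are assumptions made for the example.

```python
import ipaddress
from typing import Iterable


def _dns_match(host: str, pattern: str) -> bool:
    """Compare a hostname with one DNS SAN value, case-insensitively."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower().rstrip(".")
    if pattern.startswith("*."):
        # Assumption: a wildcard covers exactly one left-most label.
        first, sep, rest = host.partition(".")
        return bool(first) and bool(sep) and rest == pattern[2:]
    return host == pattern


def matches_san(host: str, san_entries: Iterable[str]) -> bool:
    """Return True if host matches a SAN entry; the subject CN is never consulted."""
    try:
        host_ip = ipaddress.ip_address(host)
    except ValueError:
        host_ip = None  # not an IP literal, so treat it as a DNS name

    for entry in san_entries:
        kind, _, value = entry.partition(":")
        kind, value = kind.strip().upper(), value.strip()
        if host_ip is not None and kind == "IP":
            try:
                if ipaddress.ip_address(value) == host_ip:
                    return True
            except ValueError:
                continue  # skip malformed IP entries
        elif host_ip is None and kind == "DNS" and _dns_match(host, value):
            return True
    return False
```

A certificate whose only identity is CN=db.example.com has an empty SAN list in this model, so `matches_san("db.example.com", [])` is false, while the same call with `["DNS:db.example.com"]` succeeds.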
PGSSL_CERT_COMMON_NAME can still configure the hostname Document Engine expects, but it does not restore Common Name fallback. As a temporary workaround only, deployments can disable hostname verification with PGSSL_DISABLE_HOSTNAME_VERIFY=true or disable PostgreSQL certificate verification with PGSSL_DISABLE_VERIFY=true.
Remote URL fetches now allow only public destinations by default
Customer-controlled remote URL fetches now use the public_only policy by default. This applies to document creation from a remote URL, Build API URL file inputs, remote document assets, and Office template image URLs.
With the default policy, Document Engine resolves DNS and rejects local, private, link-local, metadata-service, and other blocked address ranges. This can reject URLs that previously worked in self-hosted deployments, such as URLs pointing to RFC1918 intranet hosts.
If your deployment intentionally fetches from controlled internal destinations, configure the new remote URL fetch options before upgrading. Use REMOTE_URL_FETCH_POLICY=allowlist_only with REMOTE_URL_FETCH_ALLOWED_HOSTS or REMOTE_URL_FETCH_ALLOWED_CIDRS for the strictest configuration for non-critical destinations, or REMOTE_URL_FETCH_POLICY=allow_private when the deployment must allow RFC1918 IPv4 or IPv6 ULA destinations. Localhost, link-local, and cloud metadata addresses remain blocked unless REMOTE_URL_FETCH_POLICY=allow_all is used.
Invalid type values on Instant JSON exports are rejected
The Instant JSON export endpoints (GET /api/documents/:documentId/document.json and the layer variant) now validate the type query parameter. A request with a type value other than full or diff is rejected with a 400 response. Such requests previously ignored the unknown parameter and returned the full export with a 200. Requests without a type parameter are unaffected.
Async job status responses use a unified shape
GET /api/async/jobs/{jobId} now returns the same response shape for all async jobs, including bulk document deletion and document asset migration jobs.
Previously, existing async jobs returned only status and optional details:
```json
{
  "data": {
    "status": "failed",
    "details": [{ "description": "failed" }]
  }
}
```

They now return job metadata and normalized error information:

```json
{
  "data": {
    "job_id": "01J...",
    "status": "failed",
    "created_at": "2026-05-26T10:00:00Z",
    "updated_at": "2026-05-26T10:00:00Z",
    "error": {
      "reason": "async_job_failed",
      "description": "failed"
    }
  }
}
```

Completed jobs now include the same metadata. Fields that do not apply to the current job state are omitted:

```json
{
  "data": {
    "job_id": "01J...",
    "status": "completed",
    "created_at": "2026-05-26T10:00:00Z",
    "updated_at": "2026-05-26T10:00:00Z",
    "expires_at": "2026-05-28T10:00:00Z"
  }
}
```

Clients that read data.details from async job status responses should switch to data.error.description. Clients that need a machine-readable failure reason should use data.error.reason.
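Clients that must talk to both older and upgraded servers during a rollout may want to read either shape. The helper below is a minimal sketch of that idea; the function name `failure_info` is hypothetical, and it only relies on the fields shown in the status examples in this section.

```python
from typing import Optional, Tuple


def failure_info(payload: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (reason, description) for a failed job, or (None, None) otherwise."""
    data = payload.get("data", {})
    if data.get("status") != "failed":
        return None, None

    error = data.get("error")
    if isinstance(error, dict):
        # 1.17 shape: normalized error object.
        return error.get("reason"), error.get("description")

    # Earlier shape: optional list of {"description": ...} entries, no reason.
    details = data.get("details") or []
    description = details[0].get("description") if details else None
    return None, description
```

With the older failed response this returns `(None, "failed")`; with the 1.17 failed response it returns `("async_job_failed", "failed")`.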
Async job polling requires a job access token
Async job admission responses now include a signed job_access_token. Token values are prefixed with jat_. Clients must send this token in the X-Async-Job-Token header when calling GET /api/async/jobs/{jobId} or GET /api/async/jobs/{jobId}/result.
Async jobs created before this upgrade do not have a job_access_token; wait for existing async jobs to finish before upgrading, or resubmit those operations after upgrade.
This applies to async Build API jobs, bulk document deletion jobs, and document asset migration jobs. The token is scoped to one job and is returned again when an async Build API request is replayed with the same Idempotency-Key.
Bulk document deletion and document asset migration responses keep their existing jobId field name for compatibility. Their new access token field uses job_access_token, matching async Build API admission responses.
Deprecations
This release doesn’t include any deprecations.
Async Build API processing preview
The Build API can now process build requests asynchronously when the request includes the Prefer: respond-async header.
```shell
curl -X POST "http://localhost:5000/api/build" \
  -H "Authorization: Token token=<secret>" \
  -H "Prefer: respond-async" \
  -H "Idempotency-Key: <unique-key>" \
  -F instructions='{"parts":[{"file":"document"}]}' \
  -F 'document=@document.pdf'
```

The initial response is 202 Accepted and includes Location, Preference-Applied, a status URL, a result URL, a signed job_access_token prefixed with jat_, and the job expiry time. Uploaded multipart files are stored durably before the 202 response is returned.
Async Build supports uploaded file inputs, remote URL inputs, Document Engine document inputs, input PDF passwords in Build instructions, and output PDF password fields. Worker-only secrets, such as remote URLs, input PDF passwords, and output PDF passwords, are encrypted in async job metadata before they are stored in the database. Configure ASYNC_JOB_ENCRYPTION_KEYS with a dedicated keyring before accepting async Build jobs that contain these values. If the keyring is not configured, Document Engine rejects those requests with 422 async_job_encryption_keys_required before creating a job.
Document Engine document inputs are checked during async admission. The referenced document and layer must exist and be accessible to the authenticated caller before an async job is created. These inputs remain live sources: If the document or layer is deleted before processing, the job fails with source_unavailable; if the source document, selected layer, or source PDF changes after admission, the job fails with source_changed. Selected-layer changes include document records such as annotations and form data, even when the underlying source PDF asset is unchanged.
Async admission also supports Idempotency-Key. Keys are scoped by tenant, operation, method, and route. Reusing the same key with the same request payload, relevant headers, and queue selection returns the original job response, while reusing it with a different request returns 409 Conflict. Keys can be reused after the async job metadata retention window expires. Replayed admissions return the same job_access_token.
The background worker restores the persisted request, runs the Build API operation, stores the generated output, and removes staged input files after the job reaches a terminal state. Clients can poll the status URL until the job reports completed or failed.
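The polling step can be sketched in Python as follows. This is an illustration under stated assumptions rather than an official client: the function names, the polling interval, and the attempt limit are hypothetical, and `cancelled` is treated as terminal because the configuration notes name it as a terminal state. The status fetch is passed in as a callable so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable

TERMINAL_STATES = {"completed", "failed", "cancelled"}


def status_request(base_url: str, job_id: str, api_token: str,
                   job_access_token: str) -> urllib.request.Request:
    """Build the authenticated GET request for /api/async/jobs/{jobId}."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/async/jobs/{job_id}",
        headers={
            "Authorization": f"Token token={api_token}",
            "X-Async-Job-Token": job_access_token,
        },
    )


def fetch_status(request: urllib.request.Request) -> dict:
    """Perform the request and decode the JSON body (network call)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_until_terminal(fetch: Callable[[], dict], interval: float = 2.0,
                        max_attempts: int = 150,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until data.status is terminal; return the "data" object."""
    for _ in range(max_attempts):
        data = fetch().get("data", {})
        if data.get("status") in TERMINAL_STATES:
            return data
        sleep(interval)
    raise TimeoutError("job did not reach a terminal state in time")
```

A caller would typically combine the pieces as `poll_until_terminal(lambda: fetch_status(status_request(base_url, job_id, api_token, job_access_token)))`.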
When a job completes, the status response includes result metadata and a result_url. Clients can download the result through:
```shell
curl -X GET "http://localhost:5000/api/async/jobs/<job_id>/result" \
  -H "Authorization: Token token=<secret>" \
  -H "X-Async-Job-Token: <job_access_token>" \
  --output result.pdf
```

The result endpoint returns:
- 200 with the result body when the job completed successfully.
- 404 with async_job_not_found when the job does not exist, access is denied, or the job access token is missing, invalid, or expired.
- 404 with async_result_unavailable when the authorized job exists but no result asset is available.
- 409 when the job has not completed or the job failed or was cancelled.
- 410 when the result has expired.
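One way a client might act on these result-endpoint responses is sketched below. The action names are hypothetical, and the sketch assumes the client has already extracted the error code (for example async_result_unavailable) from the response; these notes do not specify where in the body that code appears.

```python
from typing import Optional


def next_action(http_status: int, error_code: Optional[str] = None) -> str:
    """Map a result-endpoint response to a coarse client action."""
    if http_status == 200:
        return "save_result"
    if http_status == 404:
        if error_code == "async_result_unavailable":
            return "no_result_available"
        # async_job_not_found: unknown job, denied access, or bad job token.
        return "check_job_id_and_token"
    if http_status == 409:
        # Not completed yet, or the job failed or was cancelled.
        return "check_job_status"
    if http_status == 410:
        # The result has expired and can no longer be downloaded.
        return "resubmit_request"
    return "unexpected_response"
```

Because a 409 covers both unfinished and failed jobs, the sketch sends the client back to the status endpoint rather than retrying the download blindly.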
The current preview rejects PDF password headers and document- or layer-scoped build requests before creating an async job. Put PDF passwords in the Build API instructions instead of nutrient-pdf-password or pspdfkit-pdf-password headers.
Clients can also request a processing queue using the Prefer parameter:
```
Prefer: respond-async;queue=custom-queue
```

Queue names use the same validation as async job queue configuration: 1 to 127 characters, starting with a lowercase letter, followed by lowercase letters, digits, underscores, or hyphens. If no queue is provided, the default queue is used. Requested queues, including the default queue, must be listed in ASYNC_ADMISSION_QUEUES. The default admission queue is default; set ASYNC_ADMISSION_QUEUES to an empty value to disable async operation admission.
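The queue-name rule can be checked on the client before a request is sent. The sketch below encodes the rule as a regular expression and builds the admission headers; the function names are hypothetical, and generating a UUID as the Idempotency-Key is just one reasonable choice, not a requirement stated in these notes.

```python
import re
import uuid
from typing import Dict, Optional

# 1 to 127 characters: a lowercase letter, then lowercase letters,
# digits, underscores, or hyphens.
QUEUE_NAME = re.compile(r"[a-z][a-z0-9_-]{0,126}")


def is_valid_queue_name(name: str) -> bool:
    return QUEUE_NAME.fullmatch(name) is not None


def prefer_header(queue: Optional[str] = None) -> str:
    """Build the Prefer value, optionally selecting a processing queue."""
    if queue is None:
        return "respond-async"
    if not is_valid_queue_name(queue):
        raise ValueError(f"invalid queue name: {queue!r}")
    return f"respond-async;queue={queue}"


def admission_headers(api_token: str, queue: Optional[str] = None,
                      idempotency_key: Optional[str] = None) -> Dict[str, str]:
    """Headers for an async Build request; reuse the same key when retrying."""
    return {
        "Authorization": f"Token token={api_token}",
        "Prefer": prefer_header(queue),
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
```

Passing the previously generated `idempotency_key` on a retry keeps the request eligible for replay of the original job response.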
Async Build admission also requires persistent database storage. When persistent database storage is not available, Document Engine rejects Prefer: respond-async Build requests with 503 async_admission_disabled.
Admitting work and processing work are configured separately. At least one Document Engine node must process the selected queue with ASYNC_WORKER_QUEUES, for example ASYNC_WORKER_QUEUES=default=1. Single-node deployments process the default queue by default. If no node processes the selected queue, an accepted job remains not_started. Worker-only async nodes should set CLUSTERING_ENABLED=false; otherwise clustered document routing can forward document-scoped customer requests to them.
For complete setup guidance, response schemas, polling examples, result-download behavior, and troubleshooting status codes, see the Async Jobs API reference.
Async job observability
Document Engine now emits async-job-specific metrics and structured logs for background job lifecycle events. Operators can monitor queued, running, completed, failed, retried, expired, and missing-worker async jobs without relying only on generic Oban telemetry.
The new metrics include async job queue time, execution duration, status transitions, retry and failure counters, missing-worker counters, and current queue-state gauges grouped by queue, worker, and Oban state. Metric labels are bounded to operational dimensions such as queue, worker, type, state, previous status, and error kind; async job IDs and Oban job IDs are available in structured logs instead of metric labels.
Queue-state gauges are reported by the Oban leader so clustered deployments do not multiply the same database-backed counts by the number of nodes. Non-leader nodes clear previously emitted queue-state groups to avoid stale gauge series during leadership changes.
Configuration options
Async job processing
Document Engine now centralizes async job configuration and documents the operator-facing environment variables used by background processing.
ASYNC_JOBS_TTL keeps its existing name, but it now describes async job result retention more precisely: the value is applied after an async job reaches a terminal state, such as completed, failed, or cancelled. The default is now 1 day.
This release adds the following async job configuration options:
- ASYNC_JOB_TIMEOUT for the maximum async job processing time, in seconds.
- ASYNC_JOB_MAX_RETRIES for the maximum number of retries after the initial async job execution attempt fails. The default is 2 retries.
- ASYNC_JOB_RETRY_BACKOFF_SECONDS for the base retry backoff used when failed async jobs are scheduled again. The default is 15 seconds.
- ASYNC_JOBS_METADATA_TTL for the amount of time async job metadata is retained. The default is 1 week. It must be greater than or equal to ASYNC_JOBS_TTL.
- ASYNC_ADMISSION_QUEUES for selecting which background job queues can be used when admitting new async operations. The default is default. Set it to an empty value to disable async operation admission.
- ASYNC_WORKER_QUEUES for selecting which background job queues a node processes and with what concurrency. The default is default=5. Set it to an empty value to make the node skip async background processing.
- ASYNC_ADMISSION_QUEUE_LIMITS for optional per-queue admission limits. Unset queues are unlimited. By default, the default queue is unlimited. Limits are best-effort admission backpressure, not strict concurrency ceilings.
- ASYNC_JOB_ENCRYPTION_KEYS for encrypting async job worker metadata, such as remote URLs and PDF passwords, with a dedicated keyring.
These options are intended for deployments that run separate API and worker nodes, or that need to tune background processing separately from request handling. Async operations still require persistent database storage for job metadata. When persistent database storage is not available, async Build admission returns 503 async_admission_disabled. Result-producing operations require a configured asset storage backend for output assets. The full syntax is documented in the configuration reference.
ASYNC_JOB_ENCRYPTION_KEYS accepts comma-separated key_id:base64_32_byte_key[:current] entries. Exactly one entry must be marked :current; older keys must stay configured until all jobs encrypted with them have expired. Async Build requests that contain remote URLs, input PDF passwords, or output PDF passwords require this keyring. If the keyring is not configured, Document Engine rejects those requests with 422 async_job_encryption_keys_required before creating a job. If an old async encryption key is removed too early, affected jobs can fail because worker metadata cannot be decrypted. Restore the missing key and retry the worker before the final async job retry is exhausted.
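Before deploying a keyring, it can help to validate the value offline. The parser below is a sketch of the documented entry format; the function name is hypothetical, and it assumes standard (not URL-safe) base64, which these notes do not confirm.

```python
import base64
import binascii
from typing import Dict, Tuple


def parse_keyring(value: str) -> Tuple[str, Dict[str, bytes]]:
    """Parse comma-separated key_id:base64_32_byte_key[:current] entries.

    Returns (current_key_id, {key_id: raw_key}); raises ValueError on bad input.
    """
    keys: Dict[str, bytes] = {}
    current = []
    for position, entry in enumerate(value.split(","), start=1):
        parts = entry.strip().split(":")
        if len(parts) not in (2, 3) or (len(parts) == 3 and parts[2] != "current"):
            raise ValueError(f"entry {position} is malformed")
        key_id, encoded = parts[0], parts[1]
        if not key_id or key_id in keys:
            raise ValueError(f"entry {position} has a missing or duplicate key id")
        try:
            raw = base64.b64decode(encoded, validate=True)
        except binascii.Error as exc:
            raise ValueError(f"key {key_id!r} is not valid base64") from exc
        if len(raw) != 32:
            raise ValueError(f"key {key_id!r} must decode to 32 bytes")
        keys[key_id] = raw
        if len(parts) == 3:
            current.append(key_id)
    if len(current) != 1:
        raise ValueError("exactly one entry must be marked :current")
    return current[0], keys
```

A new key for such a keyring could be produced with `base64.b64encode(secrets.token_bytes(32)).decode()` from the Python standard library. Error messages deliberately avoid echoing key material.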
At least one node must process every queue that receives jobs. Existing async endpoints such as bulk document deletion and document asset migration submit work to the default queue. Single-node deployments do this by default. Split deployments that disable worker queues on API nodes need at least one worker node configured with, for example, ASYNC_WORKER_QUEUES=default=1. If no node processes the queue, jobs can be accepted but remain not_started. Async result and metadata cleanup also runs on the default queue.
Worker-only async nodes should also set CLUSTERING_ENABLED=false. Document Engine clustering can route document-scoped customer requests to any routable cluster member, so a node that only exists to process background queues must stay out of the clustering ring unless it is also intended to serve document requests.
Diff export type for document.json
The Instant JSON export endpoints (GET /api/documents/:documentId/document.json and GET /api/documents/:documentId/layers/:layerName/document.json) now accept an optional type query parameter.
The default, type=full, keeps the existing behavior: the export includes all form fields and widget annotations, including unmodified ones that originate from the uploaded PDF.
With type=diff, form fields and widget annotations that originate from the uploaded PDF and were never modified through Document Engine are excluded, so the exported form fields only describe changes made on top of the uploaded PDF: user-created form fields and modified base-PDF form fields. Use this when applying the export to a copy of the original PDF (for example via apply_instant_json), where the unmodified form fields are already part of the PDF.
The type parameter only affects form fields and their widget annotations. Other record types (annotations, bookmarks, comments) and form field values are included in both export types. Copying a document resets the copy’s modification tracking, so changes made before the copy are treated as part of the copy’s baseline.
An invalid type value is rejected with a 400 response.
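Because unknown type values are now rejected, clients may prefer to validate the value before sending the request. The URL builder below is a minimal sketch; the function name and its parameters are hypothetical, and the paths are the two export routes named above.

```python
from typing import Optional
from urllib.parse import quote, urlencode

EXPORT_TYPES = ("full", "diff")


def instant_json_export_url(base_url: str, document_id: str,
                            export_type: Optional[str] = None,
                            layer: Optional[str] = None) -> str:
    """Build the document.json export URL, rejecting unsupported type values."""
    if export_type is not None and export_type not in EXPORT_TYPES:
        raise ValueError(f"type must be one of {EXPORT_TYPES}, got {export_type!r}")
    path = f"/api/documents/{quote(document_id, safe='')}"
    if layer is not None:
        path += f"/layers/{quote(layer, safe='')}"
    path += "/document.json"
    # Omitting the parameter keeps the default behavior (a full export).
    query = f"?{urlencode({'type': export_type})}" if export_type else ""
    return f"{base_url.rstrip('/')}{path}{query}"
```

Leaving `export_type` unset produces a URL without a type parameter, which the server treats as a full export.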
HTTP header limit configuration
Document Engine now documents MAX_HEADER_VALUE_LENGTH and adds MAX_HEADERS for deployments that run behind trusted reverse proxies or load balancers that legitimately add large or numerous HTTP request headers.
MAX_HEADER_VALUE_LENGTH controls the maximum accepted size of an individual request header value and defaults to 8192 bytes. MAX_HEADERS controls the maximum accepted number of request headers and defaults to 100.
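When deciding whether these limits need raising, it can be useful to measure the headers a proxy actually forwards. The check below is a rough sketch with a hypothetical function name; it assumes values exactly at the limit are accepted and measures value size as UTF-8 bytes, neither of which is spelled out in these notes.

```python
from typing import Iterable, List, Tuple


def header_limit_violations(headers: Iterable[Tuple[str, str]],
                            max_value_length: int = 8192,
                            max_headers: int = 100) -> List[str]:
    """List the ways a set of request headers exceeds the configured limits."""
    headers = list(headers)
    problems = []
    if len(headers) > max_headers:
        problems.append(f"{len(headers)} headers exceed MAX_HEADERS={max_headers}")
    for name, value in headers:
        size = len(value.encode("utf-8"))
        if size > max_value_length:
            problems.append(
                f"{name}: {size} bytes exceed MAX_HEADER_VALUE_LENGTH={max_value_length}"
            )
    return problems
```

An empty list means the captured headers fit within the defaults.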
Preferred annotation fonts
Document Engine now supports PSPDFKIT_PREFERRED_FONTS, a comma-separated list of font names or family names to try before the general annotation font fallback algorithm.
Document Engine first tries the font requested by the annotation. If that font is unavailable, or if it cannot render the annotation text, Document Engine tries the configured preferred fonts in order. If none of the preferred fonts can render the text, Document Engine continues with the default fallback behavior.
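The selection order described above can be sketched as follows. The function names and the `can_render` predicate are hypothetical stand-ins: the predicate represents "the font is available and can render the annotation text", and returning `None` stands for handing over to the default fallback behavior.

```python
from typing import Callable, List, Optional


def parse_preferred_fonts(value: str) -> List[str]:
    """Split a comma-separated PSPDFKIT_PREFERRED_FONTS value into names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def choose_font(requested: str, preferred: List[str],
                can_render: Callable[[str], bool]) -> Optional[str]:
    """Return the first usable font, or None to signal the default fallback."""
    # The annotation's own font is always tried first, then the preferred
    # fonts in their configured order.
    for candidate in [requested, *preferred]:
        if can_render(candidate):
            return candidate
    return None
```

For example, with only "Inter" and "Noto Sans" installed, a request for an unavailable custom font and a preferred list of "Inter, Noto Sans" resolves to "Inter".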
This is useful for workflows where Web SDK standalone is configured with custom fonts and the resulting annotation JSON is later processed by Document Engine. Configure the same fonts in Document Engine and include them in PSPDFKIT_PREFERRED_FONTS to keep server-side annotation output closer to the standalone rendering.
Remote URL fetch controls
Document Engine now applies a central policy to customer-controlled remote URL fetches, such as document creation from a remote URL, Build API URL file inputs, remote document assets, and Office template image URLs.
By default, REMOTE_URL_FETCH_POLICY is public_only. In this mode, Document Engine resolves DNS, rejects local, private, link-local, metadata-service, and other blocked address ranges, and connects to the resolved address while preserving the original Host header and TLS SNI.
Self-hosted deployments that intentionally fetch from controlled internal destinations can configure the policy explicitly:
- REMOTE_URL_FETCH_ALLOWED_HOSTS accepts comma-separated hostnames or wildcard DNS patterns, such as assets.example.com or *.assets.example.com.
- REMOTE_URL_FETCH_ALLOWED_CIDRS accepts comma-separated CIDR ranges.
- REMOTE_URL_FETCH_POLICY accepts allowlist_only, public_only, allow_private, or allow_all.
- REMOTE_URL_FETCH_ALLOW_EMBEDDED_CREDENTIALS controls whether URLs such as https://user:pass@example.com/file.pdf are accepted. It is disabled by default.
Use allowlist_only to deny all remote destinations except configured hosts or CIDRs, while still keeping critical SSRF ranges blocked. Use allow_private only when a deployment must fetch from RFC1918 IPv4 or IPv6 ULA destinations; localhost, link-local, and cloud metadata addresses remain blocked. allow_all is intended only as an emergency escape hatch for controlled self-hosted deployments.
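To reason about which destinations each policy would admit, the following Python sketch models the four policy values against an already resolved IP address. It is an approximation for planning purposes, not Document Engine's implementation: the function names are hypothetical, the metadata addresses are illustrative examples, the exact blocked ranges are not listed in these notes, and host-name allowlisting is not modeled.

```python
from ipaddress import ip_address, ip_network
from typing import Iterable

RFC1918 = [ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
IPV6_ULA = ip_network("fc00::/7")
# Illustrative metadata-service addresses; the real block list may differ.
METADATA = {ip_address("169.254.169.254"), ip_address("fd00:ec2::254")}


def _is_critical(ip) -> bool:
    """Localhost, link-local, and metadata ranges stay blocked unless allow_all."""
    return ip.is_loopback or ip.is_link_local or ip.is_unspecified or ip in METADATA


def _is_private(ip) -> bool:
    """RFC1918 IPv4 or IPv6 ULA."""
    return ip in IPV6_ULA or any(ip in net for net in RFC1918)


def is_allowed(address: str, policy: str = "public_only",
               allowed_cidrs: Iterable[str] = ()) -> bool:
    ip = ip_address(address)
    if policy == "allow_all":
        return True
    if _is_critical(ip):
        return False
    if policy == "allowlist_only":
        return any(ip in ip_network(cidr) for cidr in allowed_cidrs)
    if policy == "allow_private":
        return ip.is_global or _is_private(ip)
    if policy == "public_only":
        return ip.is_global
    raise ValueError(f"unknown policy: {policy!r}")
```

In this model, `is_allowed("10.0.0.5")` is false under the default policy but true under `allow_private`, while `169.254.169.254` stays blocked in every mode except `allow_all`.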
Faster text search with a cached full-text index
Document Engine can now answer plain text search requests from a cached full-text search (FTS) index instead of rescanning the source PDF on every query. For large documents this turns multi-second searches into ones that complete in well under a second.
Previously, every /search request reopened the source PDF and scanned the requested pages from scratch. That cost grows with document size, so search on large blueprint- or manual-style PDFs could take several seconds per query. The new path builds an FTS index once per source PDF, persists it, and reuses it across requests and nodes.
The search response shape is unchanged, so SDK clients need no changes.
How it works
- When enabled, an eligible text search resolves the document’s current source PDF, fetches the cached FTS index (building it in the background if it is missing), queries the index, and returns results in the existing /search JSON shape.
- The index is content-addressed by the source PDF, so two documents that share the same source PDF share one index, and replacing a document’s source PDF naturally invalidates its index.
- Indexes are built by background jobs and persisted to your configured asset storage (S3, Azure Blob Storage, or PostgreSQL). A node restart or a request routed to a different node reuses the persisted index rather than rebuilding it.
- Newly persisted indexes are also recorded in the database with their source PDF hash, index format version, and asset storage backend. This metadata is internal and prepares the stored artifacts for future cleanup.
- The index build never blocks upload, source-PDF replacement, or document deletion. If a build fails or an index is missing, the request transparently falls back to the previous per-query scan.
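The content-addressing idea in the list above can be illustrated with a toy in-memory cache. This is a sketch, not the actual storage layer: the class and method names are hypothetical, SHA-256 is assumed from the source_pdf_sha256 index name in the migration notes, and the build here runs synchronously, whereas Document Engine builds in the background and falls back to a scan in the meantime.

```python
import hashlib
from typing import Callable, Dict


class IndexCache:
    """Toy in-memory stand-in for the persisted, content-addressed index store."""

    def __init__(self, build: Callable[[bytes], object]) -> None:
        self._build = build
        self._indexes: Dict[str, object] = {}
        self.builds = 0  # how many times an index was actually built

    @staticmethod
    def key(source_pdf: bytes) -> str:
        """Derive the cache key from the PDF bytes, not from a document ID."""
        return hashlib.sha256(source_pdf).hexdigest()

    def get_or_build(self, source_pdf: bytes) -> object:
        cache_key = self.key(source_pdf)
        if cache_key not in self._indexes:
            self._indexes[cache_key] = self._build(source_pdf)
            self.builds += 1
        return self._indexes[cache_key]
```

Two documents with identical source PDFs map to the same key and share one index, and a replaced source PDF yields a new key, so the old index is simply never looked up again.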
Known limitation: persisted indexes are not yet reclaimed
Persisted index artifacts are not garbage-collected in this release. When a document is deleted or its source PDF is replaced, the old index left in asset storage is not removed automatically, so index storage grows over time. Newly created index artifacts are tracked in the database so they can be found and removed by a future cleanup job, but that cleanup job is not part of this release.
Because indexes are shared by content (every document with the same source PDF shares one index), safe reclamation requires reference counting; that reaper is planned as a follow-up. The artifacts are small relative to the source PDFs, and local on-disk caches are still evicted under their normal size cap — only the asset-storage copies accumulate.
What uses the new path
To keep behavior safe and predictable, FTS handles only plain text search. These requests continue to use the previous search path:
- Searches that include annotations (include_annotations=true).
- Regex and preset searches.
- Case-sensitive searches (case_sensitive=true).
- Any document without a built or reachable index (a build is enqueued and the request falls back for now).
- Very broad queries that reach the current FTS result cap. These fall back to the previous scan so page-window semantics remain correct.
The start and limit page-window semantics are preserved exactly: as before, they select the range of pages to search, not a result offset.
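The routing rules above can be summarized as a single predicate. The sketch below is illustrative only: the `SearchRequest` class, its `search_type` field and values, and the function name are hypothetical, while include_annotations and case_sensitive mirror the parameters named in the list.

```python
from dataclasses import dataclass


@dataclass
class SearchRequest:
    query: str
    search_type: str = "text"  # hypothetical values: "text", "regex", "preset"
    include_annotations: bool = False
    case_sensitive: bool = False


def uses_fts_path(request: SearchRequest, fts_enabled: bool,
                  index_available: bool, hit_result_cap: bool = False) -> bool:
    """Return True when a search would be served from the cached FTS index."""
    return (
        fts_enabled
        and request.search_type == "text"
        and not request.include_annotations
        and not request.case_sensitive
        and index_available
        and not hit_result_cap
    )
```

Any request for which this returns false is handled by the previous per-query scan, so results stay correct either way; only latency differs.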
Enabling it
This feature is opt-in and disabled by default; deploying this release does not change search behavior until you enable it. It is controlled by environment variables:
- SEARCH_FTS_ENABLED (default false): route eligible text searches through the FTS index.
- SEARCH_FTS_BUILD_ON_WRITE (default false): build the index in the background when a document is created or its source PDF is replaced.
- SEARCH_FTS_ENRICHMENT (default daemon): how matches are turned into preview text and highlight rects. daemon is the supported value for this release.
Built indexes are always persisted to your configured asset storage so they survive restarts and node hops; this happens automatically once an index is built.
A typical rollout enables build-on-write first so indexes are available, then enables SEARCH_FTS_ENABLED for the query path.
Measured improvements
Internal benchmark runs in a development environment showed substantial latency reductions for large-document text searches when the request was served by the cached FTS index. The exact improvement depends on hardware, concurrent load, document characteristics, query term frequency, and whether the query can complete on the FTS path without falling back.
As one illustration, on a large, content-heavy document an example whole-document text search dropped from roughly 4 seconds on the previous scan to around 220 milliseconds when served by the cached index. This is a single data point, not a guaranteed result: other documents, queries, and deployments will see different figures.
The improvement is most visible on large documents where a repeated scan is expensive. For small documents, where a scan is already fast, the two paths perform similarly.
Database migrations
This release includes database migrations that add async operation metadata columns to the existing async_jobs table:
- request
- result
- error
- owner
- metadata_expires_at
Existing async job rows remain readable during rolling deploys, but the public status response shape changes as described in the breaking changes notes.
Separate concurrent-index migrations add async job indexes used by cleanup and admission backpressure:
- async_jobs_reaper_idx
- async_jobs_metadata_reaper_idx
- async_jobs_active_queue_idx
This release also adds a private async_job_admissions table used while admitting async operation requests. The table reserves a job ID, tracks durably staged input blobs, and stores idempotency reservation metadata until the public async_jobs row can be committed. It is internal implementation state and is not part of the public async job API.
This release also adds a private search_index_blobs table used to track persisted full-text search index artifacts. Search index bytes continue to live in the configured asset storage backend, but this table records the stored artifact hash, source PDF hash, search index format version, and backend metadata needed to identify those artifacts later.
The migration creates:
- search_index_blobs
- search_index_blobs_source_pdf_sha256_index
- search_index_blobs_sha256_backend_info_index
The table is internal implementation state and does not change the public search API. It does not delete existing or newly created search index artifacts by itself; it only records newly persisted artifacts so future cleanup can find them.