503 service unavailable (overload)
Document Engine includes a back-pressure system that prevents a node from accepting more work than it can safely process. When the internal processing queue becomes full, the node responds with an HTTP 503 service unavailable error. This overload condition is intentional: it protects node stability and signals to clients that they must reduce request concurrency and retry failed operations with backoff.
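On the client side, one way to honor this contract is to treat 503 as a retriable signal and back off exponentially between attempts. The following TypeScript sketch illustrates the pattern; the retry count, delay cap, and use of the global fetch API are illustrative assumptions, not part of Document Engine's documented client requirements.

```typescript
// Minimal sketch: retry a request when the server signals overload (503),
// backing off exponentially with full jitter. maxRetries and the 30s delay
// cap are illustrative values; tune them to your workload.
async function requestWithBackoff(
  url: string,
  init: RequestInit = {},
  maxRetries = 5,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, init);

    // Anything other than an overload signal (or the final attempt) is
    // returned to the caller unchanged.
    if (response.status !== 503 || attempt >= maxRetries) {
      return response;
    }

    // Exponential backoff with full jitter: base doubles each attempt.
    const cap = Math.min(500 * 2 ** attempt, 30_000);
    const delayMs = Math.random() * cap;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

Combining this with a cap on in-flight requests per client prevents a retry storm from re-saturating the queue as soon as a node recovers.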
Why overload can occur
Although overload reflects expected back-pressure rather than an unexpected failure, several conditions can increase the likelihood of queue saturation.
Local caching and initial workload
Document Engine maintains a local file system cache on each node. When a document is accessed for the first time on a node, the system must fetch, parse, and prepare it before serving additional operations. This parsing work is CPU-intensive and occupies workers. New nodes, or nodes that restart, begin with empty caches and therefore perform this initial work more often.
Repeated work across multiple nodes
In distributed or autoscaling deployments, multiple nodes may be required to process the same documents before their caches have warmed. Since the local file system cache isn’t shared between nodes, each node repeats the same initial processing steps. During peaks, this duplicated work can increase processing load across the cluster.
Scaling behavior in dynamic environments
Deployments that scale out rapidly can end up with many new instances starting with empty caches at the same time. If high traffic arrives before these caches have warmed, the additional initial work performed on each node can increase overall processing load and make queue saturation more likely.
Strategies that improve stability
The following configuration patterns help reduce repeated work and improve stability under load. These approaches are grounded in the system’s documented behavior and have been effective in real deployments.
Effective use of local caching
Ensuring that frequently accessed documents remain in the local asset cache reduces the amount of repeated fetch-and-parse work. Document Engine’s caching behavior is documented in the cache configuration guide. You can adjust the size of the on-disk document cache using the ASSET_STORAGE_CACHE_SIZE environment variable; see the limits and timeouts configuration options for details and defaults.
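A complementary tactic is to warm the cache deliberately: shortly after a node starts, touch the documents you know will be requested most. The sketch below assumes an info-style read endpoint and token authentication; ENGINE_URL, API_TOKEN, and the endpoint path are placeholders for illustration, so check the Document Engine API reference for the exact calls and auth scheme your version exposes.

```typescript
// Hypothetical pre-warm script: reading frequently used documents right after
// a node starts so the fetch-and-parse work happens before peak traffic.
// ENGINE_URL, API_TOKEN, the endpoint path, and the auth header format are
// assumptions for illustration only.
const ENGINE_URL = process.env.ENGINE_URL ?? "http://localhost:5000";
const API_TOKEN = process.env.API_TOKEN ?? "secret";

async function warmDocuments(documentIds: string[]): Promise<void> {
  for (const id of documentIds) {
    // A lightweight read is enough to make the node fetch and parse the file
    // into its local cache.
    const response = await fetch(
      `${ENGINE_URL}/api/documents/${id}/document_info`,
      { headers: { Authorization: `Token token=${API_TOKEN}` } },
    );
    console.log(`warmed ${id}: HTTP ${response.status}`);
  }
}

warmDocuments(["frequently-used-doc-1", "frequently-used-doc-2"]).catch(
  console.error,
);
```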
Shared rendering cache
When supported in the deployment environment, using a shared cache for rendered pages allows multiple nodes to reuse completed rendering work. This reduces the per-node CPU load, especially during periods of high activity.
Node stability and cache locality
Some workloads benefit from maintaining a smaller number of stable, appropriately resourced nodes rather than rapidly creating many short-lived instances. A node that remains active with a sufficiently large cache can avoid repeated initialization work and handle load more consistently.
Observability and diagnostics
Understanding overload events requires insight into cache behavior, queue status, and request patterns. Deployments should ensure that metrics remain available even when nodes scale up rapidly, as these metrics provide essential visibility into performance.
Useful metrics include:
- cache.fs_hit / cache.fs_miss show the effectiveness of the local file-system cache.
- cache.fs_size / cache.fs_free indicate cache usage and cleanup frequency.
- http_server.req_end (with status tags) reveals the distribution of successful and error responses.
- Rendering cache metrics are useful when using a shared cache backend.
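Two derived numbers often make these raw counters easier to act on: the local cache hit ratio and the share of requests answered with 503. The TypeScript sketch below only does the arithmetic; how you collect the counter values depends on your metrics backend (StatsD, Prometheus, or similar), which is an assumption here.

```typescript
// Sketch: derive a cache hit ratio and an overload ratio from counters such as
// cache.fs_hit, cache.fs_miss, and status-tagged http_server.req_end. The raw
// values are passed in as plain numbers; querying them from your metrics
// backend is outside the scope of this example.
interface OverloadIndicators {
  cacheHitRatio: number; // share of document reads served from the local cache
  overloadRatio: number; // share of requests answered with HTTP 503
}

function deriveIndicators(
  fsHit: number,
  fsMiss: number,
  totalRequests: number,
  overloadedRequests: number,
): OverloadIndicators {
  return {
    cacheHitRatio: fsHit / Math.max(fsHit + fsMiss, 1),
    overloadRatio: overloadedRequests / Math.max(totalRequests, 1),
  };
}

// A falling hit ratio alongside a rising 503 share during scale-out usually
// points at cold caches rather than raw traffic growth.
console.log(deriveIndicators(9_200, 800, 50_000, 1_250));
```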
Summary
HTTP 503 service unavailable is the signal Document Engine emits when a node’s internal processing queue is full. New or frequently restarted nodes, repeated work across multiple nodes, and limited cache warmth can all increase the likelihood of overload. Ensuring adequate caching, maintaining observability, and applying client-side throttling reduce load pressure and improve resiliency during peak periods.
If overload errors persist, contact Support with structured logs (LOG_STRUCTURED=true) from a peak period and your deployment configuration.