Document Engine includes a back-pressure system that prevents a node from accepting more work than it can safely process. When the internal processing queue becomes full, the node responds with an HTTP 503 Service Unavailable error. This overload condition is intentional: it protects node stability and signals to clients that they must reduce request concurrency and retry failed operations with backoff.
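
As an illustration of client-side backoff, the sketch below (TypeScript, using the standard fetch API) retries a request only when it receives a 503, waiting exponentially longer between attempts with added jitter. The retry count and delay values are illustrative assumptions, not recommended settings.

```typescript
// Minimal sketch of retry-with-backoff on HTTP 503, assuming the standard
// fetch API available in Node 18+ and modern browsers. The retry count and
// base delay are illustrative values, not recommendations.
async function requestWithBackoff(
  url: string,
  init: RequestInit = {},
  maxRetries = 5,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, init);

    // Hand the response back unless it signals overload and retries remain.
    if (response.status !== 503 || attempt >= maxRetries) {
      return response;
    }

    // 503 means the node's queue is full: back off exponentially, with jitter
    // so that retries from many clients don't all arrive at the same moment.
    const baseDelayMs = 500 * 2 ** attempt;
    const jitterMs = Math.random() * baseDelayMs;
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs + jitterMs));
  }
}
```

Combining backoff with a cap on the number of in-flight requests per client further reduces the chance that retries themselves keep the queue saturated.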

Why overload can occur

Although overload reflects expected back-pressure rather than an unexpected failure, several conditions can increase the likelihood of queue saturation.

Local caching and initial workload

Document Engine maintains a local file system cache on each node. When a document is accessed for the first time on a node, the system must fetch, parse, and prepare it before serving additional operations. This parsing work is CPU-intensive and occupies workers. New nodes, or nodes that restart, begin with empty caches and therefore perform this initial work more often.

Repeated work across multiple nodes

In distributed or autoscaling deployments, multiple nodes may be required to process the same documents before their caches have warmed. Since the local file system cache isn’t shared between nodes, each node repeats the same initial processing steps. During peaks, this duplicated work can increase processing load across the cluster.

Scaling behavior in dynamic environments

Deployments that rapidly create new nodes may result in multiple instances starting with empty caches. If high traffic arrives before these caches have warmed, the additional initial work performed on each node can increase overall processing load and make queue saturation more likely.

Strategies that improve stability

The following configuration patterns help reduce repeated work and improve stability under load. These approaches are grounded in the system’s documented behavior and have been effective in real deployments.

Effective use of local caching

Ensuring that frequently accessed documents remain in the local asset cache reduces the amount of repeated fetch-and-parse work. Document Engine’s caching behavior is documented in the cache configuration guide. You can adjust the size of the on-disk document cache with the ASSET_STORAGE_CACHE_SIZE environment variable; see the limits and timeouts configuration options for details and defaults.
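
As a concrete illustration of cache warming, the hypothetical script below requests a known set of frequently used documents once on a freshly started node so that the fetch-and-parse work happens before production traffic arrives. The base URL, authentication header, endpoint path, and document IDs are assumptions to adapt to your own deployment, not documented API contracts.

```typescript
// Hypothetical warm-up sketch: touch hot documents on a new node so their
// initial fetch-and-parse work is done before real traffic arrives.
// BASE_URL, API_TOKEN, the endpoint path, and the document IDs are assumptions.
const BASE_URL = process.env.DOCUMENT_ENGINE_URL ?? "http://localhost:5000";
const API_TOKEN = process.env.API_AUTH_TOKEN ?? "secret";

const hotDocumentIds = ["invoice-template", "terms-v3"]; // illustrative IDs

async function warmUp(): Promise<void> {
  // Warm sequentially so the warm-up itself doesn't saturate the queue.
  for (const id of hotDocumentIds) {
    const response = await fetch(`${BASE_URL}/api/documents/${id}/document_info`, {
      headers: { Authorization: `Token token=${API_TOKEN}` },
    });
    console.log(`warm ${id}: HTTP ${response.status}`);
  }
}

warmUp().catch((error) => {
  console.error("Warm-up failed:", error);
  process.exitCode = 1;
});
```

Whether a given read is enough to populate the cache depends on the operation; the point is to move the first-access cost outside the traffic peak.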

Shared rendering cache

When supported in the deployment environment, using a shared cache for rendered pages allows multiple nodes to reuse completed rendering work. This reduces the per-node CPU load, especially during periods of high activity.
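
Document Engine manages its rendering cache itself; the sketch below only illustrates the general idea of a shared rendering cache, using a hypothetical key/value store interface rather than any real backend API.

```typescript
// Conceptual sketch only: illustrates why a shared rendering cache reduces
// per-node CPU load. A node consults a shared key/value store before doing
// the expensive rendering work, so a page rendered once can be reused by all.
type RenderedPage = Uint8Array;

interface SharedCache {
  get(key: string): Promise<RenderedPage | undefined>;
  set(key: string, value: RenderedPage): Promise<void>;
}

async function renderPage(
  cache: SharedCache,
  documentId: string,
  pageIndex: number,
  renderLocally: () => Promise<RenderedPage>,
): Promise<RenderedPage> {
  const key = `render:${documentId}:${pageIndex}`;

  // Another node may already have rendered this page.
  const cached = await cache.get(key);
  if (cached) {
    return cached;
  }

  // Cache miss: pay the CPU cost once, then share the result.
  const rendered = await renderLocally();
  await cache.set(key, rendered);
  return rendered;
}
```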

Node stability and cache locality

Some workloads benefit from maintaining a smaller number of stable, appropriately resourced nodes rather than rapidly creating many short-lived instances. A node that remains active with a sufficiently large cache can avoid repeated initialization work and handle load more consistently.

Observability and diagnostics

Understanding overload events requires insight into cache behavior, queue status, and request patterns. Deployments should ensure that metrics remain available even when nodes scale up rapidly; these metrics provide the visibility needed to diagnose overload.

Useful metrics include the following (a short monitoring sketch based on them appears after this list):

  • cache.fs_hit / cache.fs_miss — Show the effectiveness of the local file-system cache
  • cache.fs_size / cache.fs_free — Indicate cache usage and cleanup frequency
  • http_server.req_end (status tags) — Reveals the distribution of successful and error responses
  • Rendering cache metrics — When using a shared cache backend
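
The sketch below is illustrative only: it assumes the counters above are already collected by your monitoring stack and arrive as plain numbers, and it derives two values worth watching, the file-system cache hit ratio and the share of 503 responses. The thresholds and the evaluate helper are assumptions, not documented guidance.

```typescript
// Illustrative only: derive alert-worthy ratios from the metrics listed above.
// How the counters reach this code (StatsD, Prometheus, logs) depends on your
// monitoring stack; the thresholds below are assumptions, not recommendations.
interface OverloadSnapshot {
  fsHit: number;          // cache.fs_hit
  fsMiss: number;         // cache.fs_miss
  responses503: number;   // http_server.req_end tagged with status 503
  responsesTotal: number; // http_server.req_end across all statuses
}

function evaluate(snapshot: OverloadSnapshot): string[] {
  const warnings: string[] = [];

  const lookups = snapshot.fsHit + snapshot.fsMiss;
  const hitRatio = lookups === 0 ? 1 : snapshot.fsHit / lookups;
  if (hitRatio < 0.8) {
    // A low hit ratio means nodes keep re-fetching and re-parsing documents,
    // which is exactly the work that fills the processing queue.
    warnings.push(`fs cache hit ratio is ${(hitRatio * 100).toFixed(1)}%`);
  }

  const rejectedShare =
    snapshot.responsesTotal === 0 ? 0 : snapshot.responses503 / snapshot.responsesTotal;
  if (rejectedShare > 0.01) {
    warnings.push(`${(rejectedShare * 100).toFixed(1)}% of requests were rejected with 503`);
  }

  return warnings;
}

// Example: a node with a cold cache during a traffic peak.
console.log(evaluate({ fsHit: 120, fsMiss: 480, responses503: 35, responsesTotal: 900 }));
```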

Summary

HTTP 503 Service Unavailable is the signal Document Engine emits when a node’s internal queue is full. New or frequently restarted nodes, repeated work across multiple nodes, and limited cache warmth all increase the likelihood of overload. Ensuring adequate caching, maintaining observability, and using appropriate client-side throttling reduce load pressure and improve resiliency during peak periods.

If overload errors persist, contact Support with structured logs (LOG_STRUCTURED=true) from a peak period and your deployment configuration.