Document Engine metrics reference

This page lists the internal metrics Document Engine exports.

Document Engine exports metrics in the DogStatsD protocol format, which is a variant of StatsD. It sends a metric update to a compatible monitoring agent when an event happens, such as sending an HTTP response, or when it collects a periodic measurement, such as memory usage.

The agent aggregates metrics in fixed time windows and forwards them to your monitoring system for storage and analysis. Aggregation depends on the metric type and the agent implementation. For example, Telegraf and the CloudWatch agent can aggregate metrics differently.

Refer to the integration guide to learn how to export Document Engine metrics in different environments and deployment setups.

Metric types

Document Engine exports three metric types:

  • Counters — Each update increases a counter by the reported value.
    • Example: A file system cache hit increments a counter every time Document Engine finds an item in the cache.
  • Gauges — Each update reports the latest value of a measurement.
    • Example: Memory usage.
  • Timings — Each update reports how long an event took.
    • Example: HTTP request duration.
    • Agents usually aggregate timings into count, minimum, maximum, mean, percentiles, and similar values.
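
To make the three metric types concrete, the sketch below parses a single DogStatsD line of the general form `<name>:<value>|<type>|#<tag>:<value>,...`, where the type codes `c`, `g`, and `ms` stand for counter, gauge, and timing. The `MetricUpdate` class and `parse_datagram` function are hypothetical names for illustration, and the sample line in the usage note is made up: the exact names and tags Document Engine puts on the wire may differ (for example, an agent or exporter may add a prefix).

```python
from dataclasses import dataclass, field

# DogStatsD type codes for the three metric types described above.
TYPE_CODES = {"c": "counter", "g": "gauge", "ms": "timing"}


@dataclass
class MetricUpdate:
    """One parsed metric update (hypothetical helper type)."""
    name: str
    value: float
    type: str
    tags: dict = field(default_factory=dict)


def parse_datagram(line: str) -> MetricUpdate:
    """Parse one DogStatsD line into a MetricUpdate."""
    head, *sections = line.strip().split("|")
    if not sections:
        raise ValueError(f"missing metric type in {line!r}")
    name, _, raw_value = head.partition(":")

    tags = {}
    for section in sections[1:]:
        # Only the tag section starts with '#'; other optional sections
        # (such as a sample rate starting with '@') are ignored here.
        if section.startswith("#"):
            for tag in section[1:].split(","):
                key, _, tag_value = tag.partition(":")
                tags[key] = tag_value

    metric_type = TYPE_CODES.get(sections[0], sections[0])
    return MetricUpdate(name, float(raw_value), metric_type, tags)
```

For example, `parse_datagram("http_server.req_end:12.5|ms|#status:200,method:GET,group:standard")` yields a timing update with the value 12.5 and three tags.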

Tags

Each metric update includes tags in addition to the metric name and value. Use these tags to group and filter measurements.

These tags are attached to every metric exported by Document Engine:

Tag | Description
host | Hostname of the Document Engine container
node | Unique ID of the Document Engine node in the cluster
family | Always set to pspdfkit-document-engine

Metrics reference

The sections below describe the metrics Document Engine exports, grouped by category.

HTTP performance

Name | Type | Unit
http_server.req_end | timing | millisecond

This metric reports how long Document Engine takes to process an HTTP request and send the response.

Tag | Description
status | HTTP response status
method | HTTP request method
group | standard for regular HTTP requests, or long_poll for long-polling requests

When you analyze HTTP performance, separate metrics by the group tag and focus on group=standard. Long-polling requests stay open by design, so their durations don’t reflect performance.
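
As a rough illustration of that advice, the sketch below keeps only `group=standard` samples before computing a latency percentile. It assumes the samples have already been collected as `(duration_ms, tags)` pairs. The function names are hypothetical, and the nearest-rank percentile used here is just one common definition, so your monitoring agent may compute percentiles differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def standard_request_p95(samples):
    """Return the 95th percentile duration of group=standard requests.

    samples: iterable of (duration_ms, tags) pairs for http_server.req_end.
    Returns None when no standard requests are present.
    """
    durations = [
        duration for duration, tags in samples
        if tags.get("group") == "standard"
    ]
    return percentile(durations, 95) if durations else None
```

With this filter in place, a 30-second long-polling request no longer inflates the reported latency.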

PostgreSQL performance

Name | Type | Unit
pg_client.query | timing | millisecond
pg_client.queue | timing | millisecond
pg_client.decode | timing | millisecond
pg_client.result_size | gauge | -

These metrics describe SQL query performance between Document Engine and PostgreSQL:

  • pg_client.query — Time spent executing the query
  • pg_client.queue — Time spent waiting for a database connection from the pool
  • pg_client.decode — Time spent decoding the query result
  • pg_client.result_size — Number of rows returned per query

To estimate total database query time, add pg_client.query, pg_client.queue, and pg_client.decode.
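
The sum can be sketched as follows. The inputs are assumed to be aggregated values (for example, means) of the three timings over the same time window. The function name and the returned structure are hypothetical.

```python
def query_time_breakdown(query_ms, queue_ms, decode_ms):
    """Estimate total database query time and each phase's share of it.

    All arguments are durations in milliseconds, aggregated over the
    same time window.
    """
    total = query_ms + queue_ms + decode_ms
    phases = {"query": query_ms, "queue": queue_ms, "decode": decode_ms}
    shares = {
        name: (value / total if total else 0.0)
        for name, value in phases.items()
    }
    return {"total_ms": total, "shares": shares}
```

A large `queue` share suggests that requests spend much of their time waiting for a pooled connection rather than running SQL.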

Tag | Description
result | success or error, depending on whether the query succeeded
command | SQL command that ran: select, update, delete, insert, begin, commit, or rollback
error_code | PostgreSQL error code. Present only when result is error
severity | Error severity. Present only when result is error

Asset storage

Name | Type | Unit
assets.fetch_asset | timing | millisecond
assets.store_asset | timing | millisecond

These metrics track how long it takes to fetch or store an asset in asset storage. Document Engine fetches an asset from storage only if it isn’t already in the file system cache.

Tag | Description
result | success or error, depending on whether the operation succeeded

File system cache

Name | Type | Unit
cache.fs_hit | counter | -
cache.fs_miss | counter | -
cache.fs_size | gauge | byte
cache.fs_free | timing | millisecond

These metrics describe the file system cache for document source files:

  • cache.fs_hit and cache.fs_miss count cache hits and misses.
  • cache.fs_size reports the current cache size.
  • cache.fs_free reports how long it takes to clear a full cache.

The cache size is limited by the ASSET_STORAGE_CACHE_SIZE configuration option.
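
One common way to read the hit and miss counters together is as a hit ratio. The sketch below assumes you have the increase of `cache.fs_hit` and `cache.fs_miss` over the same time window. The function name is hypothetical, and the ratio itself is a derived value, not a metric Document Engine exports.

```python
def cache_hit_ratio(hits, misses):
    """Fraction of cache lookups that were hits, or None with no lookups.

    hits and misses are counter increases over the same time window.
    """
    lookups = hits + misses
    return hits / lookups if lookups else None
```

A ratio that stays low while `cache.fs_size` sits near its limit may be a hint to review the ASSET_STORAGE_CACHE_SIZE setting.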

In-memory cache

Name | Type | Unit
cache.memory_hit | counter | -
cache.memory_miss | counter | -

These metrics describe the in-memory cache for PDF metadata:

  • cache.memory_hit counts cache hits.
  • cache.memory_miss counts cache misses.

Redis cache

Name | Type | Unit
cache.redis_hit | timing | millisecond
cache.redis_miss | timing | millisecond
cache.redis_set | timing | millisecond
cache.redis_error | timing | millisecond

These metrics describe the optional Redis cache used to share rendered-page cache entries across multiple Document Engine instances.

  • cache.redis_hit reports how long it takes to fetch an item from Redis on a cache hit.
  • cache.redis_miss reports how long the Redis request takes on a cache miss.
  • cache.redis_set reports how long it takes to store an item in Redis.
  • cache.redis_error reports how long a failed Redis operation takes.

Tag | Description
op | Redis operation that ran. Present only on cache.redis_error

Remote documents

Name | Type | Unit
remote_doc.response_start | timing | millisecond
remote_doc.response_end | timing | millisecond

These metrics describe how long Document Engine takes to fetch documents from remote URLs:

  • remote_doc.response_start — Time between sending the request and receiving the first byte
  • remote_doc.response_end — Time spent transferring data after the remote server starts responding

To estimate total remote document fetch time, add both metrics.

Tag | Description
result | success, error, or timeout, depending on the fetch result

Document conversion

Name | Type | Unit
document_conversion.convert | timing | millisecond

This metric reports how long Office document conversion takes.

Tag | Description
result | success or error, depending on the conversion result

PDF processing

Name | Type | Unit
pspdfkitd.queue | timing | millisecond
pspdfkitd.exec | timing | millisecond

These metrics cover Document Engine operations that work with PDFs, including rendering, content extraction, and preparing PDFs for download.

  • pspdfkitd.queue reports how long an operation waits for an available worker.
  • pspdfkitd.exec reports how long the operation takes once it starts.

Tag | Description
request | PDF operation that ran

Signing service

Name | Type | Unit
signing_service.sign | timing | millisecond

This metric reports how long the signing service takes to respond to a signing request.

Tag | Description
result | success or error, depending on whether the signing request succeeded

Instant

Name | Type | Unit | Description
layer.sync.hooks | timing | millisecond | Duration spent across the hooks registered for sync operations
layer.sync.total | timing | millisecond | Total duration of the sync operation

These metrics track Instant sync. For failed operations, Document Engine emits only layer.sync.total.

Tag | Description
result | Present only on layer.sync.total. success or error, depending on the sync result

Memory total

Name | Type | Unit
vm_memory.total | gauge | byte

This metric reports the total memory allocated by the Document Engine process. The total memory used by the container is usually higher, because other processes also run inside the container.

Compute resources utilization

Name | Type | Unit
vm_scheduler_wall_time.active | timing | millisecond
vm_scheduler_wall_time.total | timing | millisecond

These metrics describe Erlang VM scheduler usage:

  • vm_scheduler_wall_time.active — Time the Erlang VM spent actively doing work during the last interval
  • vm_scheduler_wall_time.total — Total Erlang VM uptime during the same interval

If you divide active time by total time, the result shows how much of the assigned compute capacity Document Engine used.

These metrics describe only the Document Engine process. Container-level CPU utilization can differ because other processes may also be running.

Tag | Description
scheduler_number | Internal Document Engine scheduler number

Document Engine starts as many schedulers as there are logical CPU cores available. In most cases, average these metrics across schedulers.
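
The active-over-total ratio and the averaging across schedulers can be sketched as below. The input is assumed to be one interval's readings keyed by the `scheduler_number` tag, and the function name is hypothetical.

```python
def scheduler_utilization(readings):
    """Compute per-scheduler and average utilization for one interval.

    readings: {scheduler_number: (active_ms, total_ms)}
    Returns (per_scheduler, average); average is None if nothing is usable.
    """
    per_scheduler = {
        number: active_ms / total_ms
        for number, (active_ms, total_ms) in readings.items()
        if total_ms > 0  # skip schedulers with no elapsed time reported
    }
    if not per_scheduler:
        return per_scheduler, None
    average = sum(per_scheduler.values()) / len(per_scheduler)
    return per_scheduler, average
```

For two schedulers that were active for 250 ms and 750 ms of a 1000 ms interval, the per-scheduler values are 0.25 and 0.75, and the average is 0.5.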

Clustering

When clustering is enabled and Prometheus export is configured, Document Engine emits cluster health and routing metrics.

Name | Type | Unit
cluster_ring_size | gauge | -
cluster_peer_changes_total | counter | -
cluster_redirect_total | counter | -
cluster_redirect_retries_total | counter | -
cluster_task_supervisor_overloaded_total | counter | -

These metrics help you monitor cluster membership and document-routing behavior:

  • cluster_ring_size — Current number of routable nodes in the cluster.
  • cluster_peer_changes_total — Count of cluster membership changes.
  • cluster_redirect_total — Count of routing redirects by outcome.
  • cluster_redirect_retries_total — Count of retries caused by stale routing information.
  • cluster_task_supervisor_overloaded_total — Count of times the clustering task supervisor reached its child-process limit.

Tag | Description
erlang_node | Full Erlang node name, for example pssync@10.2.130.162
event | Present on cluster_peer_changes_total; values include added and removed
outcome | Present on cluster_redirect_total; identifies the redirect outcome

Use cluster_ring_size as the first health signal. For example, alert when it stays below the expected replica count for several minutes. If routing looks wrong, inspect cluster_redirect_total and cluster_redirect_retries_total first.
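
The alert condition described above can be sketched as a small check over recent `cluster_ring_size` samples. In practice you would normally express this as a rule in your monitoring system. The function name, the sample format, and the thresholds below are assumptions for illustration only.

```python
def ring_size_alert(samples, expected_size, min_duration_s):
    """Return True if the ring has been undersized for long enough.

    samples: list of (timestamp_s, ring_size) pairs ordered by time.
    The alert fires when every sample since some point has been below
    expected_size and that stretch spans at least min_duration_s up to
    the latest sample.
    """
    below_since = None
    for timestamp, size in samples:
        if size < expected_size:
            if below_since is None:
                below_since = timestamp  # start of the undersized stretch
        else:
            below_since = None  # ring recovered, reset the stretch
    if below_since is None:
        return False
    return samples[-1][0] - below_since >= min_duration_s
```

Resetting on recovery means a brief dip during a rolling restart does not trigger the alert, while a node that never rejoins does.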

HTTP/2 shared rendering telemetry

When you enable HTTP/2 shared rendering, Document Engine emits additional telemetry events for the shared rendering path. These events use the same monitoring pipeline as the metrics above.

Event | Meaning
http2.stream.worker_checkout | Document Engine checked out a rendering worker for a shared session
http2.stream.render_tile | Document Engine rendered a tile through the shared rendering path
http2.stream.render_process.new | Document Engine started a new connection-scoped rendering process
http2.stream.render_process.exists | A tile request reused an existing rendering process on the connection

When clustering is also enabled, the rendering process runs on the owner node for the document. Refer to the HTTP/2 shared rendering guide for details.