Metrics reference
This page lists the internal metrics Document Engine exports.
Document Engine uses the DogStatsD protocol format, which is a variant of StatsD. It sends metric updates to a compatible monitoring agent when an event happens, such as sending an HTTP response, or when it collects a periodic measurement, such as memory usage.
The agent aggregates metrics in fixed time windows and forwards them to your monitoring system for storage and analysis. Aggregation depends on the metric type and the agent implementation. For example, Telegraf and the CloudWatch agent can aggregate metrics differently.
Refer to the integration guide to learn how to export Document Engine metrics in different environments and deployment setups.
Metric types
Document Engine exports three metric types:
- Counters: Each update increases a counter by the reported value.
  - Example: A file system cache hit increments a counter every time Document Engine finds an item in the cache.
- Gauges: Each update reports the latest value of a measurement.
  - Example: Memory usage.
- Timings: Each update reports how long an event took.
  - Example: HTTP request duration.
  - Agents usually aggregate timings into count, minimum, maximum, mean, percentiles, and similar values.
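To make the timing aggregation concrete, here is a minimal Python sketch of how an agent might summarize the timing updates it receives in one window. The function name `aggregate_timings` and the nearest-rank percentile method are assumptions for illustration. Real agents such as Telegraf or the CloudWatch agent may compute percentiles differently.

```python
import math
from statistics import mean


def aggregate_timings(samples_ms, percentiles=(50, 95, 99)):
    """Summarize the timing samples collected in one aggregation window."""
    if not samples_ms:
        raise ValueError("no samples in this window")
    ordered = sorted(samples_ms)
    summary = {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": mean(ordered),
    }
    for p in percentiles:
        # Nearest-rank percentile (an assumption): the smallest sample such
        # that at least p percent of all samples are at or below it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        summary[f"p{p}"] = ordered[rank - 1]
    return summary
```

Because the result depends on the window length and the percentile method, compare aggregated values only between systems that use the same agent configuration.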
Tags
Each metric update includes tags in addition to the metric name and value. Use these tags to group and filter measurements.
These tags are attached to every metric exported by Document Engine:
| Tag | Description |
|---|---|
| host | Hostname of the Document Engine container |
| node | Unique ID of the Document Engine node in the cluster |
| family | Always set to pspdfkit-document-engine |
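As an illustration of how the name, value, type, and tags travel together, the sketch below parses a single datagram in the general DogStatsD layout `name:value|type|#tag:value,tag:value`. The function name `parse_datagram` and the sample values used with it are hypothetical. The exact datagrams Document Engine emits aren't shown on this page, so treat this as an approximation for debugging rather than a specification.

```python
def parse_datagram(line):
    """Split one DogStatsD-style datagram into name, value, type, and tags."""
    head, *sections = line.strip().split("|")
    # The metric name may contain dots, so split on the last colon.
    name, _, raw_value = head.rpartition(":")
    if not name or not sections:
        raise ValueError(f"malformed datagram: {line!r}")
    tags = {}
    for section in sections[1:]:
        # Sections starting with '#' carry tags; others (such as '@' sample
        # rates) are ignored in this sketch.
        if section.startswith("#"):
            for tag in section[1:].split(","):
                key, _, value = tag.partition(":")
                tags[key] = value
    return {
        "name": name,
        "value": float(raw_value),
        "type": sections[0],
        "tags": tags,
    }
```

For example, calling `parse_datagram("vm_memory.total:1048576|g|#host:de-1,family:pspdfkit-document-engine")` with this made-up input returns a gauge update whose `tags` dictionary contains `host` and `family`.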
Metrics reference
The sections below describe the metrics Document Engine exports, grouped by category.
HTTP performance
| Name | Type | Unit |
|---|---|---|
| http_server.req_end | timing | millisecond |
This metric reports how long Document Engine takes to process an HTTP request and send the response.
| Tag | Description |
|---|---|
| status | HTTP response status |
| method | HTTP request method |
| group | standard for regular HTTP requests, or long_poll for long-polling requests |
When you analyze HTTP performance, separate metrics by the group tag and focus on group=standard. Long-polling requests stay open by design, so their durations don’t reflect performance.
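The following Python sketch shows one way to separate request durations by the group tag before summarizing them. The sample structure (a dictionary with `value_ms` and `tags` keys) and the function names are assumptions for illustration. In practice, your monitoring system's query language usually does this filtering.

```python
from collections import defaultdict
from statistics import median


def durations_by_group(samples):
    """Group http_server.req_end durations by their group tag."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["tags"]["group"]].append(sample["value_ms"])
    return dict(grouped)


def standard_median_ms(samples):
    """Median duration of regular requests, ignoring long-polling ones."""
    standard = durations_by_group(samples).get("standard", [])
    return median(standard) if standard else None
```

Mixing both groups in one summary would let a few long-polling requests dominate the maximum and the upper percentiles.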
PostgreSQL performance
| Name | Type | Unit |
|---|---|---|
| pg_client.query | timing | millisecond |
| pg_client.queue | timing | millisecond |
| pg_client.decode | timing | millisecond |
| pg_client.result_size | gauge | - |
These metrics describe SQL query performance between Document Engine and PostgreSQL:
- pg_client.query: Time spent executing the query
- pg_client.queue: Time spent waiting for a database connection from the pool
- pg_client.decode: Time spent decoding the query result
- pg_client.result_size: Number of rows returned per query
To estimate total database query time, add pg_client.query, pg_client.queue, and pg_client.decode.
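The sum described above can be sketched in Python as follows. The function name `database_time_breakdown` and the idea of reporting each component's share are assumptions for illustration. The inputs are expected to be values aggregated the same way over the same window, for example three means.

```python
def database_time_breakdown(query_ms, queue_ms, decode_ms):
    """Estimate total database time and the share of each component."""
    parts = {"query": query_ms, "queue": queue_ms, "decode": decode_ms}
    total = sum(parts.values())
    # Avoid dividing by zero when no queries ran in the window.
    shares = {
        name: (value / total if total else 0.0)
        for name, value in parts.items()
    }
    return {"total_ms": total, "shares": shares}
```

A growing queue share can indicate that requests wait longer for a pooled connection, since pg_client.queue measures exactly that wait. Note that adding percentiles of the three metrics does not give a percentile of the total, so means are the safer input.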
| Tag | Description |
|---|---|
| result | success or error, depending on whether the query succeeded |
| command | SQL command that ran: select, update, delete, insert, begin, commit, or rollback |
| error_code | PostgreSQL error code. Present only when result is error |
| severity | Error severity. Present only when result is error |
Asset storage
| Name | Type | Unit |
|---|---|---|
| assets.fetch_asset | timing | millisecond |
| assets.store_asset | timing | millisecond |
These metrics track how long it takes to fetch or store an asset in asset storage. Document Engine fetches an asset from storage only if it isn’t already in the file system cache.
| Tag | Description |
|---|---|
| result | success or error, depending on whether the operation succeeded |
File system cache
| Name | Type | Unit |
|---|---|---|
| cache.fs_hit | counter | - |
| cache.fs_miss | counter | - |
| cache.fs_size | gauge | byte |
| cache.fs_free | timing | millisecond |
These metrics describe the file system cache for document source files:
- cache.fs_hit and cache.fs_miss count cache hits and misses.
- cache.fs_size reports the current cache size.
- cache.fs_free reports how long it takes to clear a full cache.
The cache size is limited by the ASSET_STORAGE_CACHE_SIZE configuration option.
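A common way to read hit and miss counters together is a hit ratio. This page doesn't define one, so the Python sketch below is an assumption about how you might derive it from the cache.fs_hit and cache.fs_miss counts in one window. The function name `cache_hit_ratio` is hypothetical.

```python
def cache_hit_ratio(hits, misses):
    """Fraction of cache lookups that were hits, or None without lookups."""
    lookups = hits + misses
    if lookups == 0:
        # No lookups in this window, so a ratio would be meaningless.
        return None
    return hits / lookups
```

The same calculation applies to any pair of hit and miss counters, as long as both counts cover the same time window.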
In-memory cache
| Name | Type | Unit |
|---|---|---|
| cache.memory_hit | counter | - |
| cache.memory_miss | counter | - |
These metrics describe the in-memory cache for PDF metadata:
- cache.memory_hit counts cache hits.
- cache.memory_miss counts cache misses.
Redis cache
| Name | Type | Unit |
|---|---|---|
| cache.redis_hit | timing | millisecond |
| cache.redis_miss | timing | millisecond |
| cache.redis_set | timing | millisecond |
| cache.redis_error | timing | millisecond |
These metrics describe the optional Redis cache used to share rendered-page cache entries across multiple Document Engine instances.
- cache.redis_hit reports how long it takes to fetch an item from Redis on a cache hit.
- cache.redis_miss reports how long the Redis request takes on a cache miss.
- cache.redis_set reports how long it takes to store an item in Redis.
- cache.redis_error reports how long a failed Redis operation takes.
| Tag | Description |
|---|---|
| op | Redis operation that ran. Present only on cache.redis_error |
Remote documents
| Name | Type | Unit |
|---|---|---|
| remote_doc.response_start | timing | millisecond |
| remote_doc.response_end | timing | millisecond |
These metrics describe how long Document Engine takes to fetch documents from remote URLs:
- remote_doc.response_start: Time between sending the request and receiving the first byte
- remote_doc.response_end: Time spent transferring data after the remote server starts responding
To estimate total remote document fetch time, add both metrics.
| Tag | Description |
|---|---|
| result | success, error, or timeout, depending on the fetch result |
Document conversion
| Name | Type | Unit |
|---|---|---|
| document_conversion.convert | timing | millisecond |
This metric reports how long Office document conversion takes.
| Tag | Description |
|---|---|
| result | success or error, depending on the conversion result |
PDF processing
| Name | Type | Unit |
|---|---|---|
| pspdfkitd.queue | timing | millisecond |
| pspdfkitd.exec | timing | millisecond |
These metrics cover Document Engine operations that work with PDFs, including rendering, content extraction, and preparing PDFs for download.
- pspdfkitd.queue reports how long an operation waits for an available worker.
- pspdfkitd.exec reports how long the operation takes once it starts.
| Tag | Description |
|---|---|
| request | PDF operation that ran |
Signing service
| Name | Type | Unit |
|---|---|---|
| signing_service.sign | timing | millisecond |
This metric reports how long the signing service takes to respond to a signing request.
| Tag | Description |
|---|---|
| result | success or error, depending on whether the signing request succeeded |
Instant
| Name | Type | Unit | Description |
|---|---|---|---|
| layer.sync.hooks | timing | millisecond | Duration spent across the hooks registered for sync operations |
| layer.sync.total | timing | millisecond | Total duration of the sync operation |
These metrics track Instant sync. For failed operations, Document Engine emits only layer.sync.total.
| Tag | Description |
|---|---|
| result | Present only on layer.sync.total. success or error, depending on the sync result |
Memory total
| Name | Type | Unit |
|---|---|---|
| vm_memory.total | gauge | byte |
This metric reports the total memory allocated by the Document Engine process. The total memory used by the container is usually higher, because other processes also run inside the container.
Compute resources utilization
| Name | Type | Unit |
|---|---|---|
| vm_scheduler_wall_time.active | timing | millisecond |
| vm_scheduler_wall_time.total | timing | millisecond |
These metrics describe Erlang VM scheduler usage:
- vm_scheduler_wall_time.active: Time the Erlang VM spent actively doing work during the last interval
- vm_scheduler_wall_time.total: Total Erlang VM uptime during the same interval
If you divide active time by total time, the result shows how much of the assigned compute capacity Document Engine used.
These metrics describe only the Document Engine process. Container-level CPU utilization can differ because other processes may also be running.
| Tag | Description |
|---|---|
| scheduler_number | Internal Document Engine scheduler number |
Document Engine starts as many schedulers as there are logical CPU cores available. In most cases, average these metrics across schedulers.
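The utilization calculation can be sketched in Python as follows: divide active time by total time for each scheduler, then average the ratios. The function name `scheduler_utilization` and the input shape (a mapping from scheduler_number to an active and total pair) are assumptions for illustration.

```python
def scheduler_utilization(readings):
    """Average active/total ratio across schedulers for one interval.

    readings maps scheduler_number to (active_ms, total_ms).
    """
    ratios = [
        active_ms / total_ms
        for active_ms, total_ms in readings.values()
        if total_ms > 0  # skip schedulers without a usable interval
    ]
    if not ratios:
        return None
    return sum(ratios) / len(ratios)
```

A result close to 1.0 suggests the Document Engine process is using nearly all of its assigned compute capacity, while container-level CPU figures can still differ.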
Clustering
When clustering is enabled and Prometheus export is configured, Document Engine emits cluster health and routing metrics.
| Name | Type | Unit |
|---|---|---|
| cluster_ring_size | gauge | - |
| cluster_peer_changes_total | counter | - |
| cluster_redirect_total | counter | - |
| cluster_redirect_retries_total | counter | - |
| cluster_task_supervisor_overloaded_total | counter | - |
These metrics help you monitor cluster membership and document-routing behavior:
- cluster_ring_size: Current number of routable nodes in the cluster.
- cluster_peer_changes_total: Count of cluster membership changes.
- cluster_redirect_total: Count of routing redirects by outcome.
- cluster_redirect_retries_total: Count of retries caused by stale routing information.
- cluster_task_supervisor_overloaded_total: Count of times the clustering task supervisor reached its child-process limit.
| Tag | Description |
|---|---|
| erlang_node | Full Erlang node name — for example, pssync@10.2.130.162 |
| event | Present on cluster_peer_changes_total; values include added and removed |
| outcome | Present on cluster_redirect_total; identifies the redirect outcome |
Use cluster_ring_size as the first health signal. For example, alert when it stays below the expected replica count for several minutes. If routing looks wrong, inspect cluster_redirect_total and cluster_redirect_retries_total first.
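The alert condition above can be expressed as a small Python check over a series of cluster_ring_size readings. This is a sketch under stated assumptions: the function name `ring_size_alert`, the input shape, and the five-minute default are all hypothetical, and most monitoring systems offer an equivalent built-in rule.

```python
def ring_size_alert(samples, expected_nodes, min_duration_s=300):
    """Return True if ring size stayed below expected_nodes long enough.

    samples is a list of (timestamp_s, ring_size) pairs, oldest first.
    """
    below_since = None
    for timestamp_s, ring_size in samples:
        if ring_size < expected_nodes:
            if below_since is None:
                # Start of a new below-threshold streak.
                below_since = timestamp_s
            if timestamp_s - below_since >= min_duration_s:
                return True
        else:
            # The ring recovered, so the streak resets.
            below_since = None
    return False
```

Requiring a sustained drop avoids alerting on brief membership changes, such as a node restarting during a rolling deployment.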
HTTP/2 shared rendering telemetry
When you enable HTTP/2 shared rendering, Document Engine emits additional telemetry events for the shared rendering path. These events use the same monitoring pipeline as the metrics above.
| Event | Meaning |
|---|---|
| http2.stream.worker_checkout | Document Engine checked out a rendering worker for a shared session |
| http2.stream.render_tile | Document Engine rendered a tile through the shared rendering path |
| http2.stream.render_process.new | Document Engine started a new connection-scoped rendering process |
| http2.stream.render_process.exists | A tile request reused an existing rendering process on the connection |
When clustering is also enabled, the rendering process runs on the owner node for the document. Refer to the HTTP/2 shared rendering guide for details.