Asset storage configuration

Document Engine supports multiple storage backends for PDFs and other assets. This guide is organized around four questions:

What storage options are available?
Which backend should you choose?
How do you configure each backend?
How do you work across multiple backends or with existing file stores?

Storage options overview

Document Engine supports four asset storage modes: running without storage, the built-in database-backed store, S3-compatible object storage, and Azure Blob Storage. The sections below describe each option.

Without storage

Document Engine doesn’t require asset storage to operate. Without storage, only stateless operations are available, primarily the Build API.

Built-in asset storage

By default, Document Engine stores assets as Binary Large Objects (BLOBs) in the database.

This is the simplest option to run, and it can work well if your document set is relatively small and stable. For production environments with larger PDFs, we recommend using object storage. If you plan to support large documents, refer to our large documents guide.

S3 object storage

Document Engine can store assets in Amazon S3 and S3-compatible object storage services.

Use this when you want your deployment to be backed by object storage, including AWS S3 or S3-compatible solutions such as Garage, Ceph, SeaweedFS, or Google Cloud Storage in interoperability mode.

Azure Blob Storage

Document Engine can also store assets in Azure Blob Storage.

Which storage backend should I use?

The right backend depends mostly on document size, change frequency, and how much infrastructure you want to operate.

If you have a relatively stable set of small PDF files that change only occasionally, you can safely use built-in storage. The main advantages are:

You don’t need separate object storage infrastructure.
Backing up the Document Engine PostgreSQL instance also backs up your assets.

The same coupling also has operational implications. Large or frequently changing assets increase the size of the primary database, write-ahead log, replication stream, and database backups. PostgreSQL stores large bytea values through TOAST, which transparently compresses and stores oversized values out of line. TOAST makes large values work inside normal rows, but it doesn’t turn PostgreSQL into object storage: The data is still written, replicated, vacuumed, backed up, and restored as part of the database.

For larger or more frequently changing files, we recommend object storage, typically S3, which handles concurrent uploads and downloads more efficiently.

Using an S3-compatible backend means you need a separate backup routine for your assets. Keep the following in mind:

Because Document Engine stores files by their SHA checksums, an object’s contents don’t change once it’s written, so a daily incremental backup usually suffices.
Unless you use a backup solution that orchestrates a point-in-time backup across different storage types, such as AWS Backup, schedule the asset storage backup right after the PostgreSQL database backup to minimize drift between the two.
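The backup ordering described above can be sketched as a small script. This is a minimal sketch rather than a supported tool: it assumes `pg_dump` and the AWS CLI are installed, and `DE_DATABASE_URL`, `ASSET_BUCKET`, `BACKUP_BUCKET`, and `BACKUP_DIR` are hypothetical variable names for your database connection string, asset bucket, backup bucket, and local dump directory.

```shell
#!/usr/bin/env bash
# Minimal sketch: dump PostgreSQL first, then copy the asset bucket right
# afterward, so the asset backup is never older than the database snapshot.
# DE_DATABASE_URL, ASSET_BUCKET, BACKUP_BUCKET, and BACKUP_DIR are
# hypothetical variables; adapt them to your environment.
backup_document_engine() (
  set -euo pipefail
  stamp=$(date -u +%Y%m%dT%H%M%SZ)

  # 1. Database snapshot in pg_dump's custom format (restore with pg_restore).
  pg_dump --format=custom \
    --file "${BACKUP_DIR:?}/document-engine-${stamp}.dump" \
    "${DE_DATABASE_URL:?}"

  # 2. Asset copy immediately after the dump. `aws s3 sync` only copies
  #    objects that are new or changed, so repeated runs stay incremental.
  aws s3 sync "s3://${ASSET_BUCKET:?}" "s3://${BACKUP_BUCKET:?}/assets"
)
```

You could schedule this from a nightly cron job. Because assets are keyed by checksum, each run mostly copies objects added since the previous run.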

Configuring storage backends

The ASSET_STORAGE_BACKEND environment variable selects which backend Document Engine uses. The sections below cover the configuration for each supported value.

Built-in asset storage

Set ASSET_STORAGE_BACKEND to built-in to use the built-in asset storage.

When deploying with Helm, use the pspdfkit.storage.assetStorageBackend value.

S3 object storage

Set ASSET_STORAGE_BACKEND to s3. Other configuration options depend on whether you’re using AWS S3 or an S3-compatible storage provider.

Here are the available parameters as Helm values:

assetStorage:
  # `ASSET_STORAGE_BACKEND`: `built-in`, `s3` or `azure`
  assetStorageBackend: s3
  # S3 backend storage settings, in case `pspdfkit.storage.assetStorageBackend` is set to `s3`
  s3:
    # `ASSET_STORAGE_S3_ACCESS_KEY_ID`
    accessKeyId: "<...>"
    # `ASSET_STORAGE_S3_SECRET_ACCESS_KEY`
    secretAccessKey: "<...>"
    # `ASSET_STORAGE_S3_BUCKET`
    bucket: "<...>"
    # `ASSET_STORAGE_S3_REGION`
    region: "<...>"
    # `ASSET_STORAGE_S3_HOST`
    #host: "os.local"
    # `ASSET_STORAGE_S3_PORT`
    port: 443
    # `ASSET_STORAGE_S3_SCHEME`
    #scheme: "https://"
    # External secret name
    #externalSecretName: ""

AWS S3

When running on AWS S3, you must set ASSET_STORAGE_S3_BUCKET and ASSET_STORAGE_S3_REGION.

If you’re running on AWS, Document Engine tries to resolve access credentials in the following order of precedence:

ASSET_STORAGE_S3_ACCESS_KEY_ID and ASSET_STORAGE_S3_SECRET_ACCESS_KEY
IAM Role for Service Accounts
ECS Task Role
EC2 Instance Role

We don’t recommend configuring static access keys directly. Prefer role-based permission management when your platform supports it.
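If you want deployment tooling to catch a missing bucket or region before Document Engine starts, a pre-flight check can mirror the rules above. This is a hypothetical helper, not part of Document Engine, and the function name is illustrative. It also warns when static keys are set, because they come first in the credential precedence and override any role.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the AWS S3 backend; not part of Document Engine.
# Returns non-zero if a required variable is missing.
check_aws_s3_env() {
  local var missing=0
  for var in ASSET_STORAGE_S3_BUCKET ASSET_STORAGE_S3_REGION; do
    if [ -z "${!var:-}" ]; then   # indirect expansion: value of the variable named $var
      echo "error: $var must be set when using AWS S3" >&2
      missing=1
    fi
  done
  # Static keys are first in the credential precedence, so they override any IAM role.
  if [ -n "${ASSET_STORAGE_S3_ACCESS_KEY_ID:-}" ] || [ -n "${ASSET_STORAGE_S3_SECRET_ACCESS_KEY:-}" ]; then
    echo "warning: static access keys are set and take precedence over role-based credentials" >&2
  fi
  return "$missing"
}
```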

AWS S3 bucket and key policy

If you’re using AWS S3, the IAM identity used by Document Engine needs the following permissions:

s3:ListBucket on the configured bucket
s3:PutObject on all objects in the bucket (<bucket-arn>/*)
s3:GetObjectAcl on all objects in the bucket (<bucket-arn>/*)
s3:GetObject on all objects in the bucket (<bucket-arn>/*)
s3:DeleteObject on all objects in the bucket (<bucket-arn>/*)

If you’re using server-side encryption with AWS Key Management Service (SSE-KMS), the following actions must be allowed on the encryption key:

kms:Decrypt
kms:Encrypt
kms:GenerateDataKey
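The permissions above can be expressed as a single IAM policy document. The following sketch prints one for a given bucket and, optionally, a KMS key ARN. The function name and arguments are illustrative, and you should review the output against your organization’s IAM conventions before attaching it.

```shell
#!/usr/bin/env bash
# Prints an IAM policy granting the S3 permissions listed above on BUCKET and,
# if KMS_KEY_ARN is given, the KMS permissions on that key.
# Usage: s3_policy_json BUCKET [KMS_KEY_ARN]
s3_policy_json() {
  local bucket="$1" kms_key_arn="${2:-}" kms_statement=""
  if [ -n "$kms_key_arn" ]; then
    # Optional statement for SSE-KMS; the leading comma joins it to the list.
    kms_statement=$(cat <<EOF
,
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
      "Resource": "${kms_key_arn}"
    }
EOF
)
  fi
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObjectAcl", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }${kms_statement}
  ]
}
EOF
}
```

For example, `s3_policy_json my-asset-bucket > policy.json`, followed by `aws iam put-role-policy --role-name <role> --policy-name document-engine-assets --policy-document file://policy.json`, attaches it inline to the role Document Engine runs as. The role and policy names are placeholders.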

Other S3-compatible storage providers

When using an object storage provider other than Amazon S3, you must always set ASSET_STORAGE_S3_ACCESS_KEY_ID and ASSET_STORAGE_S3_SECRET_ACCESS_KEY.

You can also configure the following options:

ASSET_STORAGE_S3_HOST — Host name of the storage service
ASSET_STORAGE_S3_PORT — Port used to access the storage service. The default port is 443
ASSET_STORAGE_S3_SCHEME — URL scheme used when accessing the service, either http:// or https://. The default is https://

For more details about using Google Cloud Storage as the storage backend, take a look at the Google Cloud Storage interoperability guide.

Required APIs for S3-compatible providers

When using an object storage provider other than Amazon S3, the provider must support at least the APIs in the following table. These are the object-level read, write, copy, metadata, delete, and multipart upload operations that Document Engine’s S3 backend uses in its runtime asset-storage path. Bucket-listing and ACL APIs aren’t required for these operations. The broader AWS S3 policy earlier in this guide also grants s3:ListBucket and s3:GetObjectAcl; treat those as AWS-specific or optional permissions for deployments that also perform bucket-level checks or ACL inspection, not as part of the core runtime path documented here.

S3 API	Why Document Engine needs it
`PutObject`	Writes small objects directly, specifically the `LAYOUT_VERSION` marker used to initialize and migrate the storage layout.
`CreateMultipartUpload`	Starts multipart uploads for stored assets. Document Engine uses streamed multipart uploads for PDFs and attachments rather than single-request uploads.
`UploadPart`	Uploads each multipart chunk for PDFs and attachments during asset storage.
`CompleteMultipartUpload`	Finalizes multipart uploads after all parts have been uploaded.
`GetObject`	Reads stored objects back from S3. Used for direct fetches and for downloading asset contents to local temp files.
`HeadObject`	Checks whether an object exists and reads metadata such as `Content-Length` before download. Also used before trash/restore flows.
`CopyObject`	Moves assets into and out of the trash area by copying `sources/...` or `attachments/...` objects to `trash/...` and back.
`DeleteObject`	Deletes original objects after trashing, deletes trashed objects during cleanup, and handles unsafe direct deletes.
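Before pointing Document Engine at a new S3-compatible provider, you can check that it implements each API in the table. The following hypothetical smoke test, not shipped with Document Engine, uses the AWS CLI’s `s3api` commands against a custom endpoint. It assumes the AWS CLI can find credentials and a region for the provider (for example, through `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION`) and that the bucket already exists. Run it against a test bucket, because it writes and deletes objects under a `document-engine-smoke-test/` prefix.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test, not shipped with Document Engine: calls each S3 API
# from the table above against an S3-compatible endpoint via the AWS CLI.
# Usage: s3_compat_smoke_test <endpoint-url> <existing-test-bucket>
s3_compat_smoke_test() (
  set -euo pipefail
  endpoint="${1:?endpoint URL required}"
  bucket="${2:?bucket name required}"
  prefix="document-engine-smoke-test"
  key="$prefix/sources/asset"
  tmp=$(mktemp -d)
  trap 'rm -rf "$tmp"' EXIT

  api() { aws --endpoint-url "$endpoint" s3api "$@"; }
  printf 'marker\n' > "$tmp/marker"
  printf 'asset contents\n' > "$tmp/part"

  # PutObject: small direct write, like the LAYOUT_VERSION marker.
  api put-object --bucket "$bucket" --key "$prefix/LAYOUT_VERSION" --body "$tmp/marker" > /dev/null

  # CreateMultipartUpload, UploadPart, CompleteMultipartUpload: streamed asset upload.
  upload_id=$(api create-multipart-upload --bucket "$bucket" --key "$key" \
    --query UploadId --output text)
  etag=$(api upload-part --bucket "$bucket" --key "$key" --part-number 1 \
    --upload-id "$upload_id" --body "$tmp/part" --query ETag --output text)
  etag=${etag//\"/}  # S3 returns the ETag wrapped in double quotes
  api complete-multipart-upload --bucket "$bucket" --key "$key" --upload-id "$upload_id" \
    --multipart-upload "{\"Parts\":[{\"ETag\":\"$etag\",\"PartNumber\":1}]}" > /dev/null

  # HeadObject and GetObject: metadata check, then download to a temp file.
  api head-object --bucket "$bucket" --key "$key" > /dev/null
  api get-object --bucket "$bucket" --key "$key" "$tmp/download" > /dev/null

  # CopyObject: copy into a trash/ prefix, mirroring how assets are trashed.
  api copy-object --bucket "$bucket" --key "$prefix/trash/asset" \
    --copy-source "$bucket/$key" > /dev/null

  # DeleteObject: clean up everything the test created.
  for k in "$key" "$prefix/trash/asset" "$prefix/LAYOUT_VERSION"; do
    api delete-object --bucket "$bucket" --key "$k" > /dev/null
  done
  echo "all S3 APIs used by Document Engine succeeded"
)
```

For example, after sourcing the script: `s3_compat_smoke_test http://localhost:9000 my-test-bucket`. The multipart upload uses a single small part, which S3 allows because the last part of an upload may be smaller than the minimum part size.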

Example S3-compatible providers

Common S3-compatible providers include Garage, Ceph, and SeaweedFS.

Use the standard S3-compatible configuration shown above and set the provider-specific values for:

ASSET_STORAGE_S3_HOST
ASSET_STORAGE_S3_PORT
ASSET_STORAGE_S3_SCHEME
ASSET_STORAGE_S3_ACCESS_KEY_ID
ASSET_STORAGE_S3_SECRET_ACCESS_KEY
ASSET_STORAGE_S3_BUCKET
ASSET_STORAGE_S3_REGION, if your provider requires it

For example, in docker-compose.yml:

environment:
  ASSET_STORAGE_BACKEND: s3
  ASSET_STORAGE_S3_BUCKET: <bucket name>
  ASSET_STORAGE_S3_ACCESS_KEY_ID: <access key>
  ASSET_STORAGE_S3_SECRET_ACCESS_KEY: <secret access key>
  ASSET_STORAGE_S3_SCHEME: http://
  ASSET_STORAGE_S3_HOST: <s3-compatible host>
  ASSET_STORAGE_S3_PORT: 9000
  ASSET_STORAGE_S3_REGION: us-east-1

Refer to your storage provider’s documentation for the exact endpoint, port, credential, and region requirements.

Azure Blob Storage

To configure Azure Blob Storage as the default asset store, set ASSET_STORAGE_BACKEND to azure in your Document Engine configuration.

You also need to provide the following configuration options:

AZURE_STORAGE_ACCOUNT_NAME
AZURE_STORAGE_ACCOUNT_KEY
AZURE_STORAGE_DEFAULT_CONTAINER

Alternatively, instead of providing AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY, you can supply a connection string by setting AZURE_STORAGE_ACCOUNT_CONNECTION_STRING.
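The credential rules above, including the connection string alternative, can be checked before startup with a small helper. This is a hypothetical sketch, not part of Document Engine. It treats the connection string as taking priority over the account name and key, matching the priority noted for connectionString in the Helm values.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the Azure backend; not part of Document Engine.
# Prints which credential form will be used; returns non-zero if configuration is incomplete.
check_azure_env() {
  local status=0
  if [ -z "${AZURE_STORAGE_DEFAULT_CONTAINER:-}" ]; then
    echo "error: AZURE_STORAGE_DEFAULT_CONTAINER must be set" >&2
    status=1
  fi
  if [ -n "${AZURE_STORAGE_ACCOUNT_CONNECTION_STRING:-}" ]; then
    # The connection string takes priority over account name and key.
    echo "using connection string"
  elif [ -n "${AZURE_STORAGE_ACCOUNT_NAME:-}" ] && [ -n "${AZURE_STORAGE_ACCOUNT_KEY:-}" ]; then
    echo "using account name and key"
  else
    echo "error: set AZURE_STORAGE_ACCOUNT_CONNECTION_STRING, or both AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY" >&2
    status=1
  fi
  return "$status"
}
```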

Here they are as Helm values:

assetStorage:
  # `ASSET_STORAGE_BACKEND`: `built-in`, `s3` or `azure`
  assetStorageBackend: azure
  # Azure backend storage settings, in case `pspdfkit.storage.assetStorageBackend` is set to `azure`
  azure:
    # `AZURE_STORAGE_ACCOUNT_NAME`
    accountName: "<...>"
    # `AZURE_STORAGE_ACCOUNT_KEY`
    accountKey: "<...>"
    # `AZURE_STORAGE_DEFAULT_CONTAINER`
    container: "<...>"
    # `AZURE_STORAGE_ACCOUNT_CONNECTION_STRING`, takes priority over `accountName` and `accountKey`
    #connectionString: ""
    # `AZURE_STORAGE_API_URL` for custom endpoints
    #apiUrl: ""
    # External secret name
    #externalSecretName: ""

Azurite

Azurite is an open source emulator from Microsoft for testing Azure Blob Storage operations in development and test environments. If you use Azure Blob Storage in production, we recommend running Document Engine against Azurite in development and test environments, with ASSET_STORAGE_BACKEND set to azure, to keep your environments close to dev/prod parity.

When using Azurite, set AZURE_STORAGE_API_URL to the address of the Azurite deployment so Document Engine sends Azure Blob Storage requests there.

To run Azurite in Docker, run the following commands:

docker pull mcr.microsoft.com/azure-storage/azurite
docker run -p 10000:10000 mcr.microsoft.com/azure-storage/azurite

You can then configure Document Engine to use the default storage account on the Azurite instance. Microsoft’s Azurite documentation describes this default account and its well-known key.

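For example, your Docker Compose file could contain the following. If Document Engine and Azurite run as services in the same Compose project, replace localhost in AZURE_STORAGE_API_URL with the Azurite service name, because localhost inside a container refers to that container itself: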

environment:
  AZURE_STORAGE_ACCOUNT_NAME: devstoreaccount1
  AZURE_STORAGE_ACCOUNT_KEY: Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
  AZURE_STORAGE_DEFAULT_CONTAINER: pspdfkit-dev
  AZURE_STORAGE_API_URL: http://localhost:10000/devstoreaccount1

Upload timeouts for object storage

All upload operations to S3-compatible object storage and Azure Blob Storage have a timeout of 30 seconds.

Working with multiple storage backends

This section covers per-document storage, fallbacks, per-document uploads, and document-level asset migration.

In addition to configuring a default storage backend for all documents with ASSET_STORAGE_BACKEND, you can upload documents to specific storage backends as long as those backends are enabled as fallbacks in your Document Engine configuration.

Enabling fallbacks for asset storage

To use multiple asset stores in your Document Engine instance, configure the main asset store by setting ASSET_STORAGE_BACKEND to built-in, azure, or s3.

Once configured, the backend set in ASSET_STORAGE_BACKEND will be used as the default storage for all documents. To store a specific document in a different backend, that backend must also be enabled as a fallback.

For example, if ASSET_STORAGE_BACKEND is set to azure, all documents and their assets will be stored in Azure Blob Storage by default. However, you can configure a specific document to be stored in AWS S3 when uploading the document. To do this, S3 needs to be enabled as a fallback asset store.

To enable fallback asset storage, set ENABLE_ASSET_STORAGE_FALLBACK to true. After that, enable the specific fallbacks you want by setting any of the following to true:

ENABLE_ASSET_STORAGE_FALLBACK_POSTGRES
ENABLE_ASSET_STORAGE_FALLBACK_S3
ENABLE_ASSET_STORAGE_FALLBACK_AZURE

In addition to enabling the specific fallback, you also need to set the relevant configuration options for every backend you enable. For example, if you enable S3 as an asset fallback, you need to provide the relevant configuration for S3, including the default S3 bucket.
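The rule that a document can only target the default backend or an enabled fallback can be sketched as a check against the same environment variables. The helper is hypothetical; it assumes `true` is the only enabling value and treats an unset ASSET_STORAGE_BACKEND as built-in, the default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if BACKEND (built-in, s3, or azure) is the
# default backend or an enabled fallback, based on the variables above.
backend_available() {
  local backend="$1" flag
  # The default backend is always available.
  [ "${ASSET_STORAGE_BACKEND:-built-in}" = "$backend" ] && return 0
  # Otherwise fallbacks must be switched on globally...
  [ "${ENABLE_ASSET_STORAGE_FALLBACK:-false}" = "true" ] || return 1
  # ...and the specific fallback must be enabled.
  case "$backend" in
    built-in) flag=ENABLE_ASSET_STORAGE_FALLBACK_POSTGRES ;;
    s3) flag=ENABLE_ASSET_STORAGE_FALLBACK_S3 ;;
    azure) flag=ENABLE_ASSET_STORAGE_FALLBACK_AZURE ;;
    *) return 1 ;;
  esac
  [ "${!flag:-false}" = "true" ]
}
```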

Enabling and using fallback storage backends introduces a slight decrease in performance when fetching assets.

Uploading documents to different storage

You can specify the storage option when uploading a document to Document Engine. This way, documents can be stored in different backends, as long as the selected backend is enabled as either the default storage or as a fallback. Learn more about the available options from our API reference.

Here’s an example of a request uploading a document and specifying the S3 bucket to use for that document:

# With Document Engine running on http://localhost:5000.

curl -X POST http://localhost:5000/api/documents \
    -H "Authorization: Token token=secret" \
    -H "Content-Type: multipart/form-data" \
    -F 'file=@blank.pdf' \
    -F 'storage={
      "backend": "s3",
      "bucketName": "a-different-bucket-from-default-s3-bucket",
      "bucketRegion": "us-west-2"
      }'
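If you upload to several buckets from scripts, you might wrap the request above in a helper. This sketch builds the same storage form field; `DE_URL` and `DE_AUTH_TOKEN` are placeholder variables, and the function names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the upload request above.
# DE_URL and DE_AUTH_TOKEN are placeholder variables.

# Builds the `storage` form field for an S3 bucket and region.
s3_storage_json() {
  printf '{"backend": "s3", "bucketName": "%s", "bucketRegion": "%s"}' "$1" "$2"
}

# Uploads FILE to Document Engine, storing it in BUCKET in REGION.
upload_to_s3_bucket() {
  local file="$1" bucket="$2" region="$3"
  curl --fail -X POST "${DE_URL:-http://localhost:5000}/api/documents" \
    -H "Authorization: Token token=${DE_AUTH_TOKEN:?set DE_AUTH_TOKEN}" \
    -F "file=@${file}" \
    -F "storage=$(s3_storage_json "$bucket" "$region")"
}
```

The request only succeeds if S3 is the default backend or an enabled fallback, and if Document Engine’s credentials can access the target bucket.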

Migrating a document’s assets to different storage

You can migrate all the assets associated with a document (PDFs, images, file attachments, etc.) and all its layers to another storage backend by making a request to /api/documents/{documentId}/migrate_assets.

Here’s an example curl request to migrate a document’s assets to the built-in storage:

# With Document Engine running on http://localhost:5000.

curl -X POST http://localhost:5000/api/documents/{documentId}/migrate_assets \
    -H "Authorization: Token token=secret" \
    -H "Content-type: application/json" \
    -d '{
      "storage": {
        "backend": "built-in"
      }
      }'

Learn more about migrating assets from our API reference.

Multiple S3 buckets

Documents can be uploaded (or migrated) to many different S3 buckets so long as the instance role associated with your Document Engine nodes (or the AWS credentials configured for Document Engine) has the required permissions to access all the buckets you intend to upload or migrate documents to.

This feature is currently only available for S3. With Azure Blob Storage, all documents need to be stored in the default configured storage account, AZURE_STORAGE_ACCOUNT_NAME.

Migrating between default storage backends

It’s possible to migrate from one default storage backend to another by executing the migration command described below. To prevent data loss, a migration doesn’t delete files from the original backend.

Asset storage backend migrations are incremental. You can interrupt the migration process at any time and resume it later. This is useful when you have many documents and want to perform the migration only during periods of low load. You can perform the migration while Document Engine is running.

Before you start the migration process, make sure to set ENABLE_ASSET_STORAGE_FALLBACK to true and specify the storage fallbacks you want enabled. This enables Document Engine to serve assets that haven’t yet been migrated from the old backend.

Remember to set ENABLE_ASSET_STORAGE_FALLBACK back to false when you’ve finished migrating all documents, as fallbacks slightly decrease asset fetch performance.

At any point, you can inspect how many documents are stored in each backend from the Storage tab in the Document Engine dashboard.

All configuration options mentioned in this section are also configurable in the Helm chart values.

Migrating to S3 from built-in storage

To migrate from built-in asset storage to S3, follow these steps:

Set ENABLE_ASSET_STORAGE_FALLBACK to true.
Enable the built-in database storage as a fallback by setting ENABLE_ASSET_STORAGE_FALLBACK_POSTGRES to true.
Set ASSET_STORAGE_BACKEND to s3 and configure the rest of the S3 options.
Run the migration script by executing pspdfkit assets:migrate:from-built-in-to-s3 in the Document Engine container.
- If you use docker-compose, run the following command in the directory where you have your docker-compose.yml file: docker-compose run pspdfkit pspdfkit assets:migrate:from-built-in-to-s3.
- If you don’t use docker-compose, first find the name of the Document Engine container using docker ps -a, which lists all containers (running and stopped) with their names. Then run the following command, replacing <container name> with the actual Document Engine container name: docker exec <container name> pspdfkit assets:migrate:from-built-in-to-s3.
When all your documents have been migrated, set ENABLE_ASSET_STORAGE_FALLBACK back to false.

Migrating to built-in storage from S3

To migrate from S3 asset storage to built-in storage, follow these steps:

Set ENABLE_ASSET_STORAGE_FALLBACK to true.
Enable S3 asset storage as a fallback by setting ENABLE_ASSET_STORAGE_FALLBACK_S3 to true.
Set ASSET_STORAGE_BACKEND to built-in. Do not remove any of the S3 configuration options.
Run the migration script by executing pspdfkit assets:migrate:from-s3-to-built-in in the Document Engine container.
- If you use docker-compose, run the following command in the directory where you have your docker-compose.yml file: docker-compose run pspdfkit pspdfkit assets:migrate:from-s3-to-built-in.
- If you don’t use docker-compose, first find the name of the Document Engine container using docker ps -a, which lists all containers (running and stopped) with their names. Then run the following command, replacing <container name> with the actual Document Engine container name: docker exec <container name> pspdfkit assets:migrate:from-s3-to-built-in.
When all your documents have been migrated, set ENABLE_ASSET_STORAGE_FALLBACK back to false and remove all S3 configuration options.
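Because migrations are incremental, you can rerun the same command in each low-load window until every document has moved. The following hypothetical wrapper picks between the docker-compose and docker exec forms shown in both procedures above. `DE_CONTAINER` is a placeholder for your container name, and the compose service name `pspdfkit` matches the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the migration commands above.
# DIRECTION is "from-built-in-to-s3" or "from-s3-to-built-in".
run_asset_migration() {
  local direction="$1"
  case "$direction" in
    from-built-in-to-s3|from-s3-to-built-in) ;;
    *) echo "unsupported direction: $direction" >&2; return 2 ;;
  esac
  if [ -f docker-compose.yml ]; then
    # Run from the directory containing docker-compose.yml.
    docker-compose run pspdfkit pspdfkit "assets:migrate:${direction}"
  else
    # DE_CONTAINER is a placeholder for the running Document Engine container.
    docker exec "${DE_CONTAINER:?set DE_CONTAINER to the Document Engine container name}" \
      pspdfkit "assets:migrate:${direction}"
  fi
}
```

For example, run `run_asset_migration from-built-in-to-s3` from your deployment directory during each low-load window, and check the Storage tab in the dashboard to see how many documents remain.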

Migrating to and from Azure Blob Storage

We currently don’t support batch migrations of assets to or from Azure Blob Storage. That said, you can still migrate an individual document’s assets to or from Azure, as described in the section on migrating a document’s assets to different storage.
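Until batch migration is available for Azure Blob Storage, one approach is to call the per-document migrate_assets endpoint in a loop. This is a minimal sketch: it reads document IDs from standard input, `DE_URL` and `DE_AUTH_TOKEN` are placeholder variables, and it assumes `azure` is accepted as a `backend` value in the same way `built-in` is in the earlier request. Check the API reference for any Azure-specific storage fields.

```shell
#!/usr/bin/env bash
# Minimal sketch: migrate documents one at a time through the per-document
# migrate_assets endpoint. Reads document IDs from stdin, one per line.
# DE_URL and DE_AUTH_TOKEN are placeholder variables.
migrate_documents_to() {
  local backend="$1" id failed=0
  while IFS= read -r id; do
    [ -n "$id" ] || continue   # skip blank lines
    # </dev/null keeps curl from consuming the ID list on stdin.
    if curl --fail --silent --show-error -X POST \
        "${DE_URL:-http://localhost:5000}/api/documents/${id}/migrate_assets" \
        -H "Authorization: Token token=${DE_AUTH_TOKEN:?set DE_AUTH_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "{\"storage\": {\"backend\": \"${backend}\"}}" < /dev/null > /dev/null; then
      echo "migrated ${id}"
    else
      echo "failed ${id}" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `printf '%s\n' doc-1 doc-2 | migrate_documents_to azure`. How you obtain the list of document IDs depends on your application.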

Working with existing file storage in your infrastructure

If you already have a storage solution for PDF files in your infrastructure, Document Engine can integrate with it as long as the PDF files can be accessed via an HTTP endpoint. When integrating Document Engine and the file store, you’ll need to add documents from a URL.

All PDF URLs should be treated as permalinks: Document Engine fetches the file whenever it needs it and keeps only a local cached copy, which can expire at any time.
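Adding a document that lives in your file store then comes down to passing its URL to Document Engine. This is a minimal sketch that assumes POST /api/documents accepts a JSON body with a `url` field; confirm the exact request format in the API reference. `DE_URL` and `DE_AUTH_TOKEN` are placeholder variables, and the URL is embedded without JSON escaping, so only pass URLs your own systems generate.

```shell
#!/usr/bin/env bash
# Minimal sketch, assuming a JSON body with a "url" field is accepted by
# POST /api/documents. DE_URL and DE_AUTH_TOKEN are placeholder variables.
# PDF_URL is not JSON-escaped: pass only URLs generated by your own systems.
add_document_from_url() {
  local pdf_url="$1"
  curl --fail -X POST "${DE_URL:-http://localhost:5000}/api/documents" \
    -H "Authorization: Token token=${DE_AUTH_TOKEN:?set DE_AUTH_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"${pdf_url}\"}"
}
```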

Avoid accepting arbitrary user input as a URL for a PDF. Document Engine 1.17 and later blocks local, private, link-local, and cloud metadata destinations by default for customer-controlled remote URL fetches. If your deployment intentionally fetches from internal file stores, configure REMOTE_URL_FETCH_POLICY with explicit allowed hosts or CIDRs.

To achieve the best possible performance, ensure Document Engine instances and the file store sit in the same network, whether physical or virtual. This minimizes latency and maximizes download speed.

As of version 2019.4, it’s possible to perform a document editing operation on a document with a remote URL, but the resulting PDF file will need to be stored with one of the supported storage strategies. If you need to copy the transformed file back to the file store, you’ll need to do that manually by fetching the transformed file first.

If your file store requires authentication, we recommend introducing an internal proxy. When adding a document with a URL, the URL would point to the proxy endpoint, where your custom logic could support the required authentication options and redirect to the file store URL of the PDF file. For more information and sample code, visit the relevant guide article.
