Asset storage configuration
Document Engine supports multiple storage backends for PDFs and other assets. This guide is organized around four questions:
- What storage options are available?
- Which backend should you choose?
- How do you configure each backend?
- How do you work across multiple backends or with existing file stores?
Storage options overview
Document Engine supports four asset storage modes: running without storage, the built-in database-backed store, S3-compatible object storage, and Azure Blob Storage. The sections below describe each option.
Without storage
Document Engine doesn’t require asset storage to operate. Without storage, only stateless operations are available, primarily the Build API.
Built-in asset storage
By default, Document Engine stores assets as Binary Large Objects (BLOBs) in the database.
This is the simplest option to run, and it can work well if your document set is relatively small and stable. For production environments with larger PDFs, we recommend using object storage. If you plan to support large documents, refer to our large documents guide.
S3 object storage
Document Engine can store assets in Amazon S3 and S3-compatible object storage services.
Use this when you want your deployment to be backed by object storage, including AWS S3 or S3-compatible solutions such as Garage, Ceph, SeaweedFS, or Google Cloud Storage in interoperability mode.
Azure Blob Storage
Document Engine can also store assets in Azure Blob Storage.
Which storage backend should I use?
The right backend depends mostly on document size, change frequency, and how much infrastructure you want to operate.
If you have a relatively stable set of small PDF files that change only occasionally, you can safely use built-in storage. The main advantages are:
- You don’t need separate object storage infrastructure.
- Backing up the Document Engine PostgreSQL instance also backs up your assets.
The same coupling also has operational implications. Large or frequently changing assets increase the size of the primary database, write-ahead log, replication stream, and database backups. PostgreSQL stores large bytea values through TOAST, which transparently compresses and stores oversized values out of line. TOAST makes large values work inside normal rows, but it doesn’t turn PostgreSQL into object storage: The data is still written, replicated, vacuumed, backed up, and restored as part of the database.
For larger and more frequently changing files, we recommend object storage, typically S3, which handles concurrent uploads and downloads more efficiently.
Using the S3-compatible backend means you need a separate backup routine for your assets. Keep the following in mind:
- Because Document Engine stores files by their SHA checksums, a daily incremental backup is sufficient in most cases.
- Unless you use a backup solution that orchestrates a point-in-time backup across different storage types, such as AWS Backup, schedule the asset storage backup to run right after the PostgreSQL database backup so the two don’t drift apart.
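The ordering above can be sketched as a small shell script. This is an illustration under stated assumptions, not a Nutrient tool: it assumes an AWS S3 asset bucket, a second bucket that receives the backups, and the pg_dump and AWS CLI tools on the path. The function name backup_document_engine and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch: back up the database first, then the asset bucket, in that order.
# Bucket names and the connection URL are placeholders.
# Set DRY_RUN=1 to print the commands instead of running them.
backup_document_engine() {
  local db_url="$1" asset_bucket="$2" backup_bucket="$3"
  local stamp
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"

  run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

  # 1. PostgreSQL dump first.
  run pg_dump --format=custom --file="document-engine-${stamp}.dump" "$db_url"

  # 2. Assets right afterward, so the copy includes every asset created before the dump.
  #    `aws s3 sync` only copies objects that are missing or changed at the destination,
  #    and without --delete it keeps objects that older dumps may still reference.
  run aws s3 sync "s3://${asset_bucket}" "s3://${backup_bucket}/assets"
}
```

Syncing into a fixed prefix keeps the asset backup incremental: objects already present in the backup bucket aren’t copied again on the next run.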
Configuring storage backends
The ASSET_STORAGE_BACKEND environment variable selects which backend Document Engine uses. The sections below cover the configuration for each supported value.
Built-in asset storage
Set ASSET_STORAGE_BACKEND to built-in to use the built-in asset storage.
When deploying with Helm, use the pspdfkit.storage.assetStorageBackend value.
S3 object storage
Set ASSET_STORAGE_BACKEND to s3. Other configuration options depend on whether you’re using AWS S3 or an S3-compatible storage provider.
Here are the available parameters as Helm values:
```yaml
assetStorage:
  # `ASSET_STORAGE_BACKEND`: `built-in`, `s3` or `azure`
  assetStorageBackend: s3
  # S3 backend storage settings, in case `pspdfkit.storage.assetStorageBackend` is set to `s3`
  s3:
    # `ASSET_STORAGE_S3_ACCESS_KEY_ID`
    accessKeyId: "<...>"
    # `ASSET_STORAGE_S3_SECRET_ACCESS_KEY`
    secretAccessKey: "<...>"
    # `ASSET_STORAGE_S3_BUCKET`
    bucket: "<...>"
    # `ASSET_STORAGE_S3_REGION`
    region: "<...>"
    # `ASSET_STORAGE_S3_HOST`
    #host: "os.local"
    # `ASSET_STORAGE_S3_PORT`
    port: 443
    # `ASSET_STORAGE_S3_SCHEME`
    #scheme: "https://"
    # External secret name
    #externalSecretName: ""
```

AWS S3
When running on AWS S3, you must set ASSET_STORAGE_S3_BUCKET and ASSET_STORAGE_S3_REGION.
If you’re running on AWS, Document Engine will try to resolve access credentials with the following precedence:
- ASSET_STORAGE_S3_ACCESS_KEY_ID and ASSET_STORAGE_S3_SECRET_ACCESS_KEY
- IAM Role for Service Accounts
- ECS Task Role
- EC2 Instance Role
We don’t recommend setting static access keys directly. Prefer role-based permission management when your platform supports it.
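To check which of these sources a given container is likely to use, you can inspect its environment. The sketch below is our own heuristic that approximates the precedence list above; it isn’t Document Engine’s resolution code. It relies on AWS conventions: EKS sets AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE for IAM Roles for Service Accounts, and ECS sets AWS_CONTAINER_CREDENTIALS_RELATIVE_URI in tasks with a task role. The function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Report which credential source the documented precedence would most likely
# pick, based on environment variables alone. Heuristic only: it doesn't
# contact AWS or inspect Document Engine.
likely_s3_credential_source() {
  if [ -n "${ASSET_STORAGE_S3_ACCESS_KEY_ID:-}" ] && [ -n "${ASSET_STORAGE_S3_SECRET_ACCESS_KEY:-}" ]; then
    echo "Access keys"
  elif [ -n "${AWS_ROLE_ARN:-}" ] && [ -n "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]; then
    # EKS injects these two variables into pods that use IAM Roles for Service Accounts.
    echo "IAM Role for Service Accounts"
  elif [ -n "${AWS_CONTAINER_CREDENTIALS_RELATIVE_URI:-}" ]; then
    # ECS sets this variable in tasks that have a task role.
    echo "ECS Task Role"
  else
    # Nothing else matched: the remaining option is the EC2 instance role.
    echo "EC2 Instance Role"
  fi
}
```

Run it in a shell that has the same environment as Document Engine, for example inside the container. A result of Access keys means static keys win over any role that’s also attached.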
AWS S3 bucket and key policy
If you’re using AWS S3, the IAM identity used by Document Engine needs the following permissions:
- s3:ListBucket on the configured bucket
- s3:PutObject on all objects in the bucket (<bucket-arn>/*)
- s3:GetObjectAcl on all objects in the bucket (<bucket-arn>/*)
- s3:GetObject on all objects in the bucket (<bucket-arn>/*)
- s3:DeleteObject on all objects in the bucket (<bucket-arn>/*)
If you’re using server-side encryption with Key Management Service, the following actions must be allowed on the encryption key:
- kms:Decrypt
- kms:Encrypt
- kms:GenerateDataKey
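The permissions above can be expressed as an IAM policy document. The following sketch prints one for a single bucket and optionally adds the KMS statement. The function name is hypothetical; you may want to narrow the KMS resource further, or drop s3:ListBucket and s3:GetObjectAcl if your deployment doesn’t need them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print an IAM policy with the S3 permissions listed above for one bucket.
# $1: bucket name. $2 (optional): KMS key ARN, only needed for SSE-KMS.
document_engine_s3_policy() {
  local bucket="$1" kms_key_arn="${2:-}"
  local bucket_arn="arn:aws:s3:::${bucket}"
  local kms_statement=""

  if [ -n "$kms_key_arn" ]; then
    # Extra statement for the key; starts with a comma to follow the previous statement.
    kms_statement=$(cat <<EOF
,
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey"],
      "Resource": "${kms_key_arn}"
    }
EOF
)
  fi

  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "${bucket_arn}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:GetObjectAcl", "s3:DeleteObject"],
      "Resource": "${bucket_arn}/*"
    }${kms_statement}
  ]
}
EOF
}
```

For example, document_engine_s3_policy my-assets-bucket > policy.json followed by aws iam put-role-policy --role-name <role name> --policy-name document-engine-assets --policy-document file://policy.json attaches it as an inline policy to the role Document Engine uses.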
Other S3-compatible storage providers
When using an object storage provider other than Amazon S3, you must always set ASSET_STORAGE_S3_ACCESS_KEY_ID and ASSET_STORAGE_S3_SECRET_ACCESS_KEY.
You can also configure the following options:
- ASSET_STORAGE_S3_HOST: Host name of the storage service.
- ASSET_STORAGE_S3_PORT: Port used to access the storage service. The default port is 443.
- ASSET_STORAGE_S3_SCHEME: URL scheme used when accessing the service, either http:// or https://. The default is https://.
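Assuming these three options combine in the usual scheme, host, and port form, the sketch below derives the endpoint URL from them and applies the defaults listed above. This is handy for pointing other S3 tools, such as the AWS CLI’s --endpoint-url option, at the same service. The function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the endpoint URL implied by the S3-compatible options, applying the
# documented defaults for port (443) and scheme (https://).
s3_endpoint_url() {
  local scheme="${ASSET_STORAGE_S3_SCHEME:-https://}"
  local port="${ASSET_STORAGE_S3_PORT:-443}"
  local host="${ASSET_STORAGE_S3_HOST:-}"

  if [ -z "$host" ]; then
    echo "ASSET_STORAGE_S3_HOST is not set" >&2
    return 1
  fi
  # Only the two schemes the guide lists are accepted.
  case "$scheme" in
    http:// | https://) ;;
    *) echo "ASSET_STORAGE_S3_SCHEME must be http:// or https://" >&2; return 1 ;;
  esac
  echo "${scheme}${host}:${port}"
}
```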
For more details about using Google Cloud Storage as the storage backend, take a look at the Google Cloud Storage interoperability guide.
Required APIs for S3-compatible providers
When using an object storage provider other than Amazon S3, the provider must support the S3 APIs in the following table. They’re the minimum set Document Engine’s S3 backend uses on the runtime asset-storage path: object-level read, write, copy, metadata, delete, and multipart upload operations. Bucket-listing and ACL APIs aren’t required on this path. The s3:ListBucket and s3:GetObjectAcl permissions in the AWS S3 policy above are AWS-specific or optional, intended for deployments that also perform bucket-level checks or ACL inspection.
| S3 API | Why Document Engine needs it |
|---|---|
| PutObject | Writes small objects directly, specifically the LAYOUT_VERSION marker used to initialize and migrate the storage layout. |
| CreateMultipartUpload | Starts multipart uploads for stored assets. Document Engine uses streamed multipart uploads for PDFs and attachments rather than single-request uploads. |
| UploadPart | Uploads each multipart chunk for PDFs and attachments during asset storage. |
| CompleteMultipartUpload | Finalizes multipart uploads after all parts have been uploaded. |
| GetObject | Reads stored objects back from S3. Used for direct fetches and for downloading asset contents to local temp files. |
| HeadObject | Checks whether an object exists and reads metadata such as Content-Length before download. Also used before trash/restore flows. |
| CopyObject | Moves assets into and out of the trash area by copying sources/... or attachments/... objects to trash/... and back. |
| DeleteObject | Deletes original objects after trashing, deletes trashed objects during cleanup, and handles unsafe direct deletes. |
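Before pointing Document Engine at a new provider, you can exercise the APIs in the table with the AWS CLI. The sketch below is an assumption-laden smoke test, not a Nutrient tool: it expects an existing bucket and an AWS CLI configured with the same credentials you give Document Engine, and it writes and then deletes a few objects under a hypothetical document-engine-smoke-test/ prefix. It relies on the AWS CLI’s default 8 MB multipart threshold so that aws s3 cp goes through the multipart calls. The function name and DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Exercise the S3 APIs from the table against an S3-compatible endpoint.
# Usage: s3_compat_smoke_test <endpoint URL> <bucket>
# Set DRY_RUN=1 to print the commands instead of running them.
s3_compat_smoke_test() {
  local endpoint="$1" bucket="$2"
  local key="document-engine-smoke-test/object" tmp k
  tmp="$(mktemp -d)"

  run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }
  s3api() { run aws --endpoint-url "$endpoint" s3api "$@"; }

  printf 'hello' > "$tmp/small"
  # About 9 MB: above the AWS CLI's default 8 MB multipart threshold, so this upload
  # uses CreateMultipartUpload, UploadPart, and CompleteMultipartUpload.
  head -c 9000000 /dev/zero > "$tmp/large"
  run aws --endpoint-url "$endpoint" s3 cp "$tmp/large" "s3://$bucket/$key-multipart"

  s3api put-object --bucket "$bucket" --key "$key" --body "$tmp/small"   # PutObject
  s3api head-object --bucket "$bucket" --key "$key"                      # HeadObject
  s3api get-object --bucket "$bucket" --key "$key" "$tmp/downloaded"     # GetObject
  s3api copy-object --bucket "$bucket" --key "$key-copy" \
    --copy-source "$bucket/$key"                                         # CopyObject
  for k in "$key" "$key-copy" "$key-multipart"; do
    s3api delete-object --bucket "$bucket" --key "$k"                    # DeleteObject
  done

  rm -rf "$tmp"
}
```

For example, s3_compat_smoke_test http://<s3-compatible host>:9000 <bucket name>. Because of set -e, the first failing command stops the script, which points at the API your provider is missing or rejecting.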
Example S3-compatible providers
Common S3-compatible providers include Garage, Ceph, and SeaweedFS.
Use the standard S3-compatible configuration shown above and set the provider-specific values for:
- ASSET_STORAGE_S3_HOST
- ASSET_STORAGE_S3_PORT
- ASSET_STORAGE_S3_SCHEME
- ASSET_STORAGE_S3_ACCESS_KEY_ID
- ASSET_STORAGE_S3_SECRET_ACCESS_KEY
- ASSET_STORAGE_S3_BUCKET
- ASSET_STORAGE_S3_REGION, if your provider requires it
For example, in docker-compose.yml:
```yaml
environment:
  ASSET_STORAGE_BACKEND: s3
  ASSET_STORAGE_S3_BUCKET: <bucket name>
  ASSET_STORAGE_S3_ACCESS_KEY_ID: <access key>
  ASSET_STORAGE_S3_SECRET_ACCESS_KEY: <secret access key>
  ASSET_STORAGE_S3_SCHEME: http://
  ASSET_STORAGE_S3_HOST: <s3-compatible host>
  ASSET_STORAGE_S3_PORT: 9000
  ASSET_STORAGE_S3_REGION: us-east-1
```

Refer to your storage provider’s documentation for the exact endpoint, port, credential, and region requirements.
Azure Blob Storage
To configure Azure Blob Storage as the default asset store, set ASSET_STORAGE_BACKEND to azure in your Document Engine configuration.
You also need to provide the following configuration options:
- AZURE_STORAGE_ACCOUNT_NAME
- AZURE_STORAGE_ACCOUNT_KEY
- AZURE_STORAGE_DEFAULT_CONTAINER
Alternatively, instead of providing AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY, you can supply a connection string by setting AZURE_STORAGE_ACCOUNT_CONNECTION_STRING.
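If you use the connection string, it follows Azure’s standard semicolon-separated key=value format. The sketch below builds one from an account name and key; the function name is hypothetical, and in practice you can also copy the string from the Azure portal or get it with az storage account show-connection-string.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a value for AZURE_STORAGE_ACCOUNT_CONNECTION_STRING from an account
# name and key. The optional third argument overrides the endpoint suffix,
# which defaults to the public Azure cloud.
azure_connection_string() {
  local name="$1" key="$2" suffix="${3:-core.windows.net}"
  printf 'DefaultEndpointsProtocol=https;AccountName=%s;AccountKey=%s;EndpointSuffix=%s\n' \
    "$name" "$key" "$suffix"
}
```

For example: export AZURE_STORAGE_ACCOUNT_CONNECTION_STRING="$(azure_connection_string <account name> "$AZURE_STORAGE_ACCOUNT_KEY")".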
Here they are as Helm values:
```yaml
assetStorage:
  # `ASSET_STORAGE_BACKEND`: `built-in`, `s3` or `azure`
  assetStorageBackend: azure
  # Azure backend storage settings, in case `pspdfkit.storage.assetStorageBackend` is set to `azure`
  azure:
    # `AZURE_STORAGE_ACCOUNT_NAME`
    accountName: "<...>"
    # `AZURE_STORAGE_ACCOUNT_KEY`
    accountKey: "<...>"
    # `AZURE_STORAGE_DEFAULT_CONTAINER`
    container: "<...>"
    # `AZURE_STORAGE_ACCOUNT_CONNECTION_STRING`, takes priority over `accountName` and `accountKey`
    #connectionString: ""
    # `AZURE_STORAGE_API_URL` for custom endpoints
    #apiUrl: ""
    # External secret name
    #externalSecretName: ""
```

Azurite
We recommend using Azurite with Document Engine in development and test environments when ASSET_STORAGE_BACKEND is set to azure.
When using Azurite, you can configure the URL for the Azure Blob Storage service by setting AZURE_STORAGE_API_URL to the address of the Azurite deployment.
Azurite is an open source emulator from Microsoft for testing Azure Blob Storage operations in development and test environments. If you use Azure Blob Storage in production, we recommend using Azurite in development to get closer to dev/prod parity.
To run Azurite in Docker, run the following commands:
```shell
docker pull mcr.microsoft.com/azure-storage/azurite
docker run -p 10000:10000 mcr.microsoft.com/azure-storage/azurite
```

You can then configure Document Engine to use the default storage account on the Azurite instance. Learn more about the default storage account in Microsoft’s Azurite documentation.
In your Docker Compose file, for example, you can have this:
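```yaml
environment:
  AZURE_STORAGE_ACCOUNT_NAME: devstoreaccount1
  AZURE_STORAGE_ACCOUNT_KEY: Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==
  AZURE_STORAGE_DEFAULT_CONTAINER: pspdfkit-dev
  AZURE_STORAGE_API_URL: http://localhost:10000/devstoreaccount1
```

If Document Engine itself runs in a container, localhost refers to that container, so point AZURE_STORAGE_API_URL at the host name of the Azurite service instead.

Upload timeouts for object storage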
All upload operations to S3-compatible object storage and Azure Blob Storage have a timeout of 30 seconds.
Working with multiple storage backends
This section covers per-document storage, fallbacks, per-document uploads, and document-level asset migration.
In addition to configuring a default storage backend for all documents with ASSET_STORAGE_BACKEND, you can upload documents to specific storage backends as long as those backends are enabled as fallbacks in your Document Engine configuration.
Enabling fallbacks for asset storage
To use multiple asset stores in your Document Engine instance, configure the main asset store by setting ASSET_STORAGE_BACKEND to built-in, azure, or s3.
Once configured, the backend set in ASSET_STORAGE_BACKEND will be used as the default storage for all documents. To store a specific document in a different backend, that backend must also be enabled as a fallback.
For example, if ASSET_STORAGE_BACKEND is set to azure, all documents and their assets will be stored in Azure Blob Storage by default. However, you can configure a specific document to be stored in AWS S3 when uploading the document. To do this, S3 needs to be enabled as a fallback asset store.
To enable fallback asset storage, set ENABLE_ASSET_STORAGE_FALLBACK to true. After that, enable the specific fallbacks you want by setting any of the following to true:
- ENABLE_ASSET_STORAGE_FALLBACK_POSTGRES
- ENABLE_ASSET_STORAGE_FALLBACK_S3
- ENABLE_ASSET_STORAGE_FALLBACK_AZURE
In addition to enabling the specific fallback, you also need to set the relevant configuration options for every backend you enable. For example, if you enable S3 as an asset fallback, you need to provide the relevant configuration for S3, including the default S3 bucket.
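A pre-flight check can catch a fallback that’s enabled without its settings. The sketch below is hypothetical and only checks that the variables named in this guide are present; it doesn’t validate their values or contact any backend. Extend the S3 branch with region or credentials as your provider requires.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-flight check: every enabled fallback has the settings this guide names.
check_fallback_config() {
  local missing=()

  if [ "${ENABLE_ASSET_STORAGE_FALLBACK:-false}" != "true" ]; then
    echo "fallbacks disabled"
    return 0
  fi

  if [ "${ENABLE_ASSET_STORAGE_FALLBACK_S3:-false}" = "true" ]; then
    [ -n "${ASSET_STORAGE_S3_BUCKET:-}" ] || missing+=("ASSET_STORAGE_S3_BUCKET")
  fi

  if [ "${ENABLE_ASSET_STORAGE_FALLBACK_AZURE:-false}" = "true" ]; then
    [ -n "${AZURE_STORAGE_DEFAULT_CONTAINER:-}" ] || missing+=("AZURE_STORAGE_DEFAULT_CONTAINER")
    # A connection string replaces the account name and key.
    if [ -z "${AZURE_STORAGE_ACCOUNT_CONNECTION_STRING:-}" ]; then
      [ -n "${AZURE_STORAGE_ACCOUNT_NAME:-}" ] || missing+=("AZURE_STORAGE_ACCOUNT_NAME")
      [ -n "${AZURE_STORAGE_ACCOUNT_KEY:-}" ] || missing+=("AZURE_STORAGE_ACCOUNT_KEY")
    fi
  fi

  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing: ${missing[*]}"
    return 1
  fi
  echo "ok"
}
```

Run it with the same environment you pass to Document Engine, for example before starting the container.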
Enabling and using fallback storage backends introduces a slight decrease in performance when fetching assets.
Uploading documents to different storage
You can specify the storage option when uploading a document to Document Engine. This way, documents can be stored in different backends, as long as the selected backend is enabled as either the default storage or as a fallback. Learn more about the available options from our API reference.
Here’s an example of a request uploading a document and specifying the S3 bucket to use for that document:
```shell
# With Document Engine running on `http://localhost:5000`.
curl -X POST http://localhost:5000/api/documents \
  -H "Authorization: Token token=secret" \
  -H "Content-Type: multipart/form-data" \
  -F 'file=@blank.pdf' \
  -F 'storage={ "backend": "s3", "bucketName": "a-different-bucket-from-default-s3-bucket", "bucketRegion": "us-west-2" }'
```

Migrating a document’s assets to different storage
You can migrate all the assets associated with a document (PDFs, images, file attachments, etc.) and all its layers to another storage backend by making a request to /api/documents/{documentId}/migrate_assets.
Here’s an example curl request to migrate a document’s assets to the built-in storage:
```shell
# With Document Engine running on `http://localhost:5000`.
curl -X POST "http://localhost:5000/api/documents/{documentId}/migrate_assets" \
  -H "Authorization: Token token=secret" \
  -H "Content-Type: application/json" \
  -d '{ "storage": { "backend": "built-in" } }'
```

Learn more about migrating assets in our API reference.
Multiple S3 buckets
Documents can be uploaded (or migrated) to many different S3 buckets so long as the instance role associated with your Document Engine nodes (or the AWS credentials configured for Document Engine) has the required permissions to access all the buckets you intend to upload or migrate documents to.
This feature is currently only available for S3. With Azure Blob Storage, all documents need to be stored in the default configured storage account, AZURE_STORAGE_ACCOUNT_NAME.
Migrating between default storage backends
It’s possible to migrate from one default storage backend to another by executing the migration command described below. To prevent data loss, a migration doesn’t delete files from the original backend.
Asset storage backend migrations are incremental. You can interrupt the migration process at any time and resume it later. This is useful when you have many documents and want to perform the migration only during periods of low load. You can perform the migration while Document Engine is running.
Before you start the migration process, make sure to set ENABLE_ASSET_STORAGE_FALLBACK to true and specify the storage fallbacks you want enabled. This enables Document Engine to serve assets that haven’t yet been migrated from the old backend.
Remember to set ENABLE_ASSET_STORAGE_FALLBACK back to false when you’ve finished migrating all documents, as fallbacks introduce a slight decrease in asset fetch performance.
At any point, you can inspect how many documents are stored in each backend from the Storage tab in the Document Engine dashboard.
All configuration options mentioned in this section are also configurable in the Helm chart values.
Migrating to S3 from built-in storage
To migrate from built-in asset storage to S3, follow these steps:
- Set ENABLE_ASSET_STORAGE_FALLBACK to true.
- Enable the built-in database storage as a fallback by setting ENABLE_ASSET_STORAGE_FALLBACK_POSTGRES to true.
- Set ASSET_STORAGE_BACKEND to s3 and configure the rest of the S3 options.
- Run the migration script by executing pspdfkit assets:migrate:from-built-in-to-s3 in the Document Engine container.
  - If you use docker-compose, run the following command in the directory that contains your docker-compose.yml file: docker-compose run pspdfkit pspdfkit assets:migrate:from-built-in-to-s3.
  - If you don’t use docker-compose, first find the name of the Document Engine container using docker ps, which lists running containers and their names. Then run the following command, replacing <container name> with the actual Document Engine container name: docker exec <container name> pspdfkit assets:migrate:from-built-in-to-s3.
- When all your documents have been migrated, set ENABLE_ASSET_STORAGE_FALLBACK back to false.
Migrating to built-in storage from S3
To migrate from S3 asset storage to built-in storage, follow these steps:
- Set ENABLE_ASSET_STORAGE_FALLBACK to true.
- Enable S3 asset storage as a fallback by setting ENABLE_ASSET_STORAGE_FALLBACK_S3 to true.
- Set ASSET_STORAGE_BACKEND to built-in. Don’t remove any of the S3 configuration options.
- Run the migration script by executing pspdfkit assets:migrate:from-s3-to-built-in in the Document Engine container.
  - If you use docker-compose, run the following command in the directory that contains your docker-compose.yml file: docker-compose run pspdfkit pspdfkit assets:migrate:from-s3-to-built-in.
  - If you don’t use docker-compose, first find the name of the Document Engine container using docker ps, which lists running containers and their names. Then run the following command, replacing <container name> with the actual Document Engine container name: docker exec <container name> pspdfkit assets:migrate:from-s3-to-built-in.
- When all your documents have been migrated, set ENABLE_ASSET_STORAGE_FALLBACK back to false and remove all S3 configuration options.
Migrating to and from Azure Blob Storage
We currently don’t support batch migrations of assets to or from Azure Blob Storage. That said, you can still migrate an individual document’s assets to or from Azure. Learn more about this here.
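Because per-document migration does work with Azure, you can script it over a list of document IDs. The following sketch loops over the migrate_assets endpoint shown earlier in this guide. The function name, the DOCUMENT_ENGINE_URL and DOCUMENT_ENGINE_TOKEN variables, and the assumption that {"backend": "azure"} is a sufficient storage object are ours, so check the API reference for the fields your setup needs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Migrate the assets of each listed document, one request per document.
# Usage: migrate_documents_to <backend> <document ID>...
# DOCUMENT_ENGINE_URL and DOCUMENT_ENGINE_TOKEN are hypothetical variable names.
# Set DRY_RUN=1 to print the curl commands instead of sending them.
migrate_documents_to() {
  local backend="$1"
  shift
  local base="${DOCUMENT_ENGINE_URL:-http://localhost:5000}"
  local token="${DOCUMENT_ENGINE_TOKEN:-secret}"
  local id

  for id in "$@"; do
    local cmd=(curl --fail --silent --show-error -X POST
      "$base/api/documents/$id/migrate_assets"
      -H "Authorization: Token token=$token"
      -H "Content-Type: application/json"
      -d "{\"storage\": {\"backend\": \"$backend\"}}")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      # --fail makes curl exit non-zero on HTTP errors, so set -e stops the loop.
      "${cmd[@]}"
    fi
  done
}
```

For example, migrate_documents_to azure <document ID> <document ID>. The loop stops at the first failed request, so you can inspect the error before continuing with the remaining IDs.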
Working with existing file storage in your infrastructure
If you already have a storage solution for PDF files in your infrastructure, Document Engine can integrate with it as long as the PDF files can be accessed via an HTTP endpoint. When integrating Document Engine and the file store, you’ll need to add documents from a URL.
Treat all PDF URLs as permalinks: Document Engine fetches the file whenever it needs it and keeps only a local cached copy, which can expire at any time.
Never accept arbitrary user input as a URL for a PDF. Malicious users might leverage this to make Document Engine perform a request on their behalf. This kind of attack, known as Server-Side Request Forgery (SSRF), can be used to interact with services that assume the local network is secure, e.g. cloud automation infrastructure.
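One common mitigation is to accept only URLs that point at hosts you control. The following minimal sketch checks a URL against a hypothetical allowlist before you pass it to Document Engine. It requires https, rejects user-info tricks such as https://trusted@evil.example, and compares host names exactly. It doesn’t resolve DNS, so the allowlist should contain only host names you control. The function and array names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical allowlist of file store hosts that Document Engine may fetch from.
ALLOWED_PDF_HOSTS=("files.internal.example")

# Succeeds only for https URLs whose host is on the allowlist.
is_allowed_pdf_url() {
  local url="$1" host allowed
  # The host may contain only letters, digits, dots, and hyphens, so "user@host" forms don't match.
  local re='^https://([A-Za-z0-9.-]+)(:[0-9]+)?(/.*)?$'
  [[ $url =~ $re ]] || return 1
  host="${BASH_REMATCH[1]}"
  for allowed in "${ALLOWED_PDF_HOSTS[@]}"; do
    # Exact, case-sensitive comparison; ports and paths aren't checked.
    if [ "$host" = "$allowed" ]; then
      return 0
    fi
  done
  return 1
}
```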
To achieve the best possible performance, ensure Document Engine instances and the file store sit in the same network, whether physical or virtual. This minimizes latency and maximizes download speed.
As of version 2019.4, it’s possible to perform a document editing operation on a document with a remote URL, but the resulting PDF file will need to be stored with one of the supported storage strategies. If you need to copy the transformed file back to the file store, you’ll need to do that manually by fetching the transformed file first.
If your file store requires authentication, we recommend introducing an internal proxy. When adding a document with a URL, the URL would point to the proxy endpoint, where your custom logic could support the required authentication options and redirect to the file store URL of the PDF file. For more information and sample code, visit the relevant guide article.