Creating documents from URLs with Document Engine
PSPDFKit Server has been deprecated and replaced by Document Engine. To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).
When you already have an existing data store for your files or prefer not to store them with PSPDFKit Server, you can create a document from a URL.
When operating on a document from a URL, PSPDFKit Server will fetch the file using the provided URL and cache it in the node file system.
- Your service sends a document’s URL to PSPDFKit Server, which makes a request to the URL to retrieve the document.
- The document service returns the document, and PSPDFKit Server saves it and its metadata in the asset storage and PostgreSQL.
- Your service receives the document ID back, which it can use to reference the document later.
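The flow above can be sketched from the calling service's side. This is a minimal illustration, not the official client: the `/api/documents` path, the `Token token=...` authorization header, the `{"url": ...}` request body, and the `data.document_id` response field are assumptions here and should be checked against the API reference for your version. The function names are hypothetical.

```python
import json
import urllib.request


def build_create_request(base_url: str, api_token: str, document_url: str) -> urllib.request.Request:
    """Build (but do not send) a request asking the server to fetch a document from a URL."""
    body = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/documents",  # assumed endpoint path
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token token={api_token}",  # assumed auth scheme
        },
    )


def create_document_from_url(base_url: str, api_token: str, document_url: str) -> str:
    """Send the request and return the document ID from the response."""
    request = build_create_request(base_url, api_token, document_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["data"]["document_id"]  # assumed response shape
```

Splitting request construction from sending keeps the network call in one place, so the request can be inspected or logged before anything leaves your service.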
Security Considerations
Creating a document from a URL requires PSPDFKit Server to retrieve data from the specified URL on the server side. As such, this feature comes with inherent security risks.
For increased security at the expense of ease of integration, consider disabling document creation from a URL and using the document creation from upload architecture instead, so that the data PSPDFKit Server processes is sourced from a Postgres database or S3-compatible storage.
The document creation from a URL feature is designed to let PSPDFKit Server integrate easily with other known and trusted services by retrieving data directly over HTTPS.
You should never send an untrusted URL directly to PSPDFKit Server. If your service is working with user input or other untrusted data sources, your service needs to implement checks to prevent untrusted URLs from being directly sent to PSPDFKit Server so as to mitigate the risk of server-side request forgery.
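One possible form of such a check is a host allowlist applied in your service before a URL is forwarded. The sketch below is an assumption about how you might structure it, not a complete SSRF defense: `is_trusted_url` and `ALLOWED_HOSTS` are hypothetical names, and the example host is a placeholder.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts your service expects documents to come from.
ALLOWED_HOSTS = frozenset({"files.example.com"})


def is_trusted_url(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased by urlsplit; None if missing
    except ValueError:
        # Malformed URLs (for example, a broken IPv6 literal) are rejected.
        return False
    if parts.scheme != "https":
        return False
    # Reject embedded credentials, which are often used to disguise the real host.
    if parts.username is not None or parts.password is not None:
        return False
    return host is not None and host in allowed_hosts
```

An exact-match allowlist is stricter than a suffix or substring match, which a lookalike host such as `files.example.com.evil.test` could satisfy. It doesn't cover redirects issued by an allowed host or changes in DNS resolution, so treat it as one layer rather than the whole mitigation.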
Because document creation from a URL requires an authorized API call, PSPDFKit Server doesn’t place limitations on the network sources it’ll attempt to resolve and retrieve data from. As an exercise in the principle of least privilege, consider limiting outbound network traffic at the container or network firewall level so that PSPDFKit Server can only communicate outbound to the sources it’s expected to retrieve data from.