Large document triage
Use this recipe when a large document is slow to inspect, hard to validate manually, or needs a support-ready diagnostics packet.
This workflow is optimized for support and triage:
- Upload the document.
- Capture document metadata.
- Check document properties.
- Verify text extraction and search behavior.
- Render a first-page preview image.
- Bundle the results into an escalation packet.
Triage checklist
Use these endpoints in order:
- Upload the document: POST /viewer/documents
- Fetch document information: GET /viewer/documents/{documentId}/document_info
- Fetch document properties: GET /viewer/documents/{documentId}/properties
- Fetch page text: GET /viewer/documents/{documentId}/pages/{pageIndex}/text
- Search the document: GET /viewer/documents/{documentId}/search
- Render a first-page preview: GET /viewer/documents/{documentId}/pages/{pageIndex}/image
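If you script the checklist, you can collect the endpoints above into small URL builders. This is a minimal Node.js sketch. The helper names (`documentUrl`, `triageEndpoints`) are hypothetical, and the base URL is the one used in the curl examples in this guide.

```javascript
// Sketch: URL builders for the triage endpoints (helper names are hypothetical).
// The base URL matches the curl examples in this guide.
const BASE_URL = "https://api.nutrient.io";

function documentUrl(documentId, suffix) {
  return `${BASE_URL}/viewer/documents/${encodeURIComponent(documentId)}/${suffix}`;
}

const triageEndpoints = {
  upload: () => `${BASE_URL}/viewer/documents`,
  documentInfo: (id) => documentUrl(id, "document_info"),
  properties: (id) => documentUrl(id, "properties"),
  pageText: (id, pageIndex) => documentUrl(id, `pages/${pageIndex}/text`),
  search: (id, query) => {
    const url = new URL(documentUrl(id, "search"));
    // URL-encodes the search term, similar to curl --data-urlencode.
    url.searchParams.set("q", query);
    return url.toString();
  },
  pageImage: (id, pageIndex, width) => {
    const url = new URL(documentUrl(id, `pages/${pageIndex}/image`));
    if (width !== undefined) url.searchParams.set("width", String(width));
    return url.toString();
  },
};
```

For example, `triageEndpoints.search(id, "invoice")` produces the same URL that `curl -G` with `--data-urlencode "q=invoice"` builds.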
Recommended workflow
Follow these steps to understand a document’s shape, identify potential issues, and gather the necessary data for support escalation if needed.
1. Upload the document
Start by uploading the file and saving the returned document_id:
```bash
curl -X POST https://api.nutrient.io/viewer/documents \
  -H "Authorization: Bearer <api_key>" \
  -H "Content-Type: application/pdf" \
  --data-binary @large-document.pdf \
  --fail
```
Response:
```json
{
  "data": {
    "document_id": "<document_id>",
    "title": "large-document"
  }
}
```
2. Capture document information
Fetch top-level document information:
```bash
curl -X GET "https://api.nutrient.io/viewer/documents/<document_id>/document_info" \
  -H "Authorization: Bearer <api_key>" \
  --fail
```
Use this response to confirm:
- Page count
- Page dimensions
- Permissions
- Metadata such as author, title, producer, and modification dates
This is the fastest way to understand the document’s shape before deeper inspection.
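Steps 1 and 2 can also be scripted. The following is a minimal Node.js 18+ sketch that relies on the global `fetch`. It assumes the upload response has the `{ "data": { "document_id": ... } }` shape shown in step 1. The function names and the injectable `fetchImpl` parameter are hypothetical; `fetchImpl` exists only so the functions can be exercised without network access.

```javascript
// Sketch of steps 1 and 2 (Node.js 18+, global fetch).
// Assumes the upload response shape shown in step 1; names are hypothetical.
const { readFile } = require("node:fs/promises");

const BASE_URL = "https://api.nutrient.io/viewer/documents";

async function uploadDocument(filePath, apiKey, fetchImpl = fetch) {
  const body = await readFile(filePath);
  const res = await fetchImpl(BASE_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/pdf",
    },
    body,
  });
  // Mirror curl --fail: treat any non-2xx status as an error.
  if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);
  const json = await res.json();
  return json.data.document_id;
}

async function fetchDocumentInfo(documentId, apiKey, fetchImpl = fetch) {
  const url = `${BASE_URL}/${encodeURIComponent(documentId)}/document_info`;
  const res = await fetchImpl(url, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`document_info failed with HTTP ${res.status}`);
  // Returned as parsed JSON; save it as-is for the escalation packet.
  return res.json();
}
```

A typical call chains the two: `const id = await uploadDocument("large-document.pdf", apiKey);` followed by `await fetchDocumentInfo(id, apiKey);`.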
3. Capture document properties
Fetch document properties:
```bash
curl -X GET "https://api.nutrient.io/viewer/documents/<document_id>/properties" \
  -H "Authorization: Bearer <api_key>" \
  --fail
```
This is useful for support triage because it includes details such as:
- Byte size
- Password-protection status
- Source PDF SHA-256
- Storage type
- Created-at timestamp
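One way to use the source PDF SHA-256 is to compare it with a hash of the local file, which confirms that the bytes you uploaded are the bytes support is looking at. This Node.js sketch uses the built-in `crypto` module. The exact field name of the hash in the properties response is not given here, so the sketch takes the reported hash as an argument instead of guessing a JSON path; the function names are hypothetical.

```javascript
// Sketch: compare a local file's SHA-256 with the hash reported in the
// properties response. Extracting that hash from the JSON is left to the
// caller because its field name is an assumption.
const { createHash } = require("node:crypto");
const { readFile } = require("node:fs/promises");

function sha256Hex(bytes) {
  return createHash("sha256").update(bytes).digest("hex");
}

async function localFileMatchesSourceHash(filePath, reportedSha256) {
  const localHash = sha256Hex(await readFile(filePath));
  // Normalize case and surrounding whitespace before comparing hex strings.
  return localHash === String(reportedSha256).trim().toLowerCase();
}
```

A mismatch suggests the file was altered or truncated before or during upload, which is worth noting in the escalation.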
4. Check text extraction and search
If the issue involves missing or suspicious text, inspect the first page’s extracted text:
```bash
curl -X GET "https://api.nutrient.io/viewer/documents/<document_id>/pages/0/text" \
  -H "Authorization: Bearer <api_key>" \
  --fail
```
Then run a targeted search against a known word or phrase from the file:
```bash
curl -G "https://api.nutrient.io/viewer/documents/<document_id>/search" \
  -H "Authorization: Bearer <api_key>" \
  --data-urlencode "q=invoice" \
  --fail
```
Use these together:
- If text extraction is empty or clearly wrong, the issue may lie in the source file, the OCR quality, or the embedded text layer.
- If the page text looks correct but search results are unexpected, include both responses in your escalation.
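The two checks in step 4 can be captured together so both raw responses end up in the packet. This is a minimal Node.js 18+ sketch; the function names and the injectable `fetchImpl` parameter are hypothetical, and the page index and search term are supplied by the caller.

```javascript
// Sketch of step 4 (Node.js 18+): fetch page text and a search result and
// keep both raw bodies for the escalation packet. Names are hypothetical.
const BASE_URL = "https://api.nutrient.io/viewer/documents";

async function getText(url, apiKey, fetchImpl) {
  const res = await fetchImpl(url, { headers: { Authorization: `Bearer ${apiKey}` } });
  if (!res.ok) throw new Error(`GET ${url} failed with HTTP ${res.status}`);
  return res.text(); // raw body, saved unmodified for support
}

async function captureTextAndSearch(documentId, { apiKey, pageIndex = 0, query, fetchImpl = fetch }) {
  const docUrl = `${BASE_URL}/${encodeURIComponent(documentId)}`;
  const searchUrl = new URL(`${docUrl}/search`);
  searchUrl.searchParams.set("q", query); // URL-encodes the term, like --data-urlencode
  // The two requests are independent, so run them concurrently.
  const [pageText, search] = await Promise.all([
    getText(`${docUrl}/pages/${pageIndex}/text`, apiKey, fetchImpl),
    getText(searchUrl.toString(), apiKey, fetchImpl),
  ]);
  return { pageText, search };
}
```

Keeping the bodies as raw text, rather than parsing and re-serializing them, means support sees exactly what the API returned.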
5. Render a first-page preview image
Render page 0 as a PNG preview:
```bash
curl -X GET "https://api.nutrient.io/viewer/documents/<document_id>/pages/0/image?width=1600" \
  -H "Authorization: Bearer <api_key>" \
  -H "Accept: image/png" \
  --fail \
  -o page-0.png
```
This preview helps support confirm whether the problem is visible in server-side rendering, without needing the full browser integration.
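When scripting the preview, it helps to check the PNG file signature before saving, so an unexpected response body is not attached to the packet as if it were an image. This is a minimal Node.js 18+ sketch; the function names and the injectable `fetchImpl` parameter are hypothetical. The eight-byte signature is the one defined by the PNG specification.

```javascript
// Sketch of step 5 (Node.js 18+): download a page preview, verify the PNG
// signature, and write it to disk. Names are hypothetical.
const { writeFile } = require("node:fs/promises");

const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

function looksLikePng(bytes) {
  return bytes.length >= PNG_SIGNATURE.length &&
    bytes.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE);
}

async function savePreview(documentId, outPath, { apiKey, pageIndex = 0, width = 1600, fetchImpl = fetch }) {
  const url = `https://api.nutrient.io/viewer/documents/${encodeURIComponent(documentId)}` +
    `/pages/${pageIndex}/image?width=${width}`;
  const res = await fetchImpl(url, {
    headers: { Authorization: `Bearer ${apiKey}`, Accept: "image/png" },
  });
  if (!res.ok) throw new Error(`Render failed with HTTP ${res.status}`);
  const bytes = Buffer.from(await res.arrayBuffer());
  if (!looksLikePng(bytes)) throw new Error("Response is not a PNG image");
  await writeFile(outPath, bytes);
  return bytes.length; // number of bytes written
}
```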
Escalation packet
For a support-ready packet, include:
- The original input file, if shareable
- The returned document_id
- The document_info response
- The properties response
- One page-text response from an affected page
- One representative search response
- The rendered first-page preview image
- A short note describing the expected result versus the observed result
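To make the packet easier to review, you can write a small manifest alongside the collected files that also flags anything missing from the list above. This Node.js sketch is illustrative only: the item keys, file names, and manifest layout are hypothetical, not a format that support requires.

```javascript
// Sketch: record what an escalation packet contains and flag missing items.
// Item keys, file names, and the manifest layout are hypothetical.
const { mkdir, writeFile } = require("node:fs/promises");
const path = require("node:path");

// One key per required packet item; the original file is optional.
const REQUIRED_ITEMS = ["document_info", "properties", "page_text", "search", "preview"];

function missingItems(files) {
  return REQUIRED_ITEMS.filter((key) => !files[key]);
}

async function writePacketManifest(dir, { documentId, files, expected, observed }) {
  await mkdir(dir, { recursive: true });
  const manifest = {
    document_id: documentId,
    files, // e.g. { preview: "page-0.png", search: "search.json" }
    missing: missingItems(files),
    note: { expected, observed },
    created_at: new Date().toISOString(),
  };
  const target = path.join(dir, "manifest.json");
  await writeFile(target, JSON.stringify(manifest, null, 2));
  return target;
}
```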
Complete Node.js example
For a script that automates this workflow and writes the outputs to disk, refer to the Node.js large document triage example.