Node.js large document triage
This example automates a support-oriented triage workflow for large documents.
The script:
- Uploads a local file
- Extracts the returned document_id
- Fetches document_info
- Fetches properties
- Fetches text from page 0
- Optionally runs a search query
- Renders page 0 as a PNG preview
- Saves all outputs to disk
Prerequisites
- A DWS Viewer API key stored in the NUTRIENT_DWS_VIEWER_API_KEY environment variable
- Node.js 18 or later
- A local input file
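The prerequisites above can be checked before any upload is attempted. The following is a minimal sketch, not part of the original script: checkPrerequisites is a hypothetical helper, and its argument names and return shape are assumptions made for illustration. It relies on the fact that the triage script calls the global fetch, which Node.js provides by default from version 18.

```javascript
// Hypothetical pre-flight helper; the name and return shape are illustrative.
function checkPrerequisites({ nodeVersion, env, inputPath }) {
  const problems = [];

  // The triage script uses the global fetch, available by default from Node.js 18.
  const major = Number.parseInt(String(nodeVersion).split(".")[0], 10);
  if (!Number.isInteger(major) || major < 18) {
    problems.push(`Node.js 18 or later is required (found ${nodeVersion})`);
  }

  if (!env.NUTRIENT_DWS_VIEWER_API_KEY) {
    problems.push("NUTRIENT_DWS_VIEWER_API_KEY is not set");
  }

  if (!inputPath) {
    problems.push("No input file path was given");
  }

  return problems;
}

// Example: check the current process and print any problems found.
const problems = checkPrerequisites({
  nodeVersion: process.versions.node,
  env: process.env,
  inputPath: process.argv[2],
});
for (const problem of problems) {
  console.error(problem);
}
```

Returning a list rather than throwing on the first failure lets you report every missing prerequisite in one pass.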
Complete example
The following script can be run with node and accepts command-line arguments for the input file path, MIME type, and an optional search query:
import { mkdir, readFile, writeFile } from "node:fs/promises";
import path from "node:path";

const apiKey = process.env.NUTRIENT_DWS_VIEWER_API_KEY;
const inputPath = process.argv[2];
const mimeType = process.argv[3] ?? "application/pdf";
const searchQuery = process.argv[4] ?? "";

if (!apiKey) {
  throw new Error("Missing NUTRIENT_DWS_VIEWER_API_KEY");
}

if (!inputPath) {
  throw new Error(
    "Usage: node triage.mjs <inputPath> [mimeType] [optionalSearchQuery]",
  );
}

// Upload the local file as the raw request body.
const fileBuffer = await readFile(inputPath);

const uploadResponse = await fetch("https://api.nutrient.io/viewer/documents", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": mimeType,
    "Content-Length": fileBuffer.length.toString(),
  },
  body: fileBuffer,
});

if (!uploadResponse.ok) {
  const errorText = await uploadResponse.text();
  throw new Error(`Upload failed: ${uploadResponse.status} ${errorText}`);
}

const uploadResult = await uploadResponse.json();
const documentId = uploadResult.data?.document_id;

if (!documentId) {
  throw new Error("No document_id found in upload response");
}

// Every output for this document goes into its own directory.
const outputDir = path.resolve(`./dws-triage-${documentId}`);
await mkdir(outputDir, { recursive: true });

// GET a JSON endpoint and return the parsed body.
const requestJson = async (pathname) => {
  const response = await fetch(`https://api.nutrient.io${pathname}`, {
    headers: {
      Authorization: `Bearer ${apiKey}`,
      Accept: "application/json",
    },
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`${pathname} failed: ${response.status} ${errorText}`);
  }

  return response.json();
};

// GET a binary endpoint and return the body as a Buffer.
const requestBinary = async (pathname, accept) => {
  const response = await fetch(`https://api.nutrient.io${pathname}`, {
    headers: {
      Authorization: `Bearer ${apiKey}`,
      Accept: accept,
    },
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`${pathname} failed: ${response.status} ${errorText}`);
  }

  return Buffer.from(await response.arrayBuffer());
};

const documentInfo = await requestJson(
  `/viewer/documents/${documentId}/document_info`,
);
const documentProperties = await requestJson(
  `/viewer/documents/${documentId}/properties`,
);
const firstPageText = await requestJson(
  `/viewer/documents/${documentId}/pages/0/text`,
);

// The search step runs only when a query was passed on the command line.
let searchResults = null;
if (searchQuery) {
  const params = new URLSearchParams({ q: searchQuery });
  searchResults = await requestJson(
    `/viewer/documents/${documentId}/search?${params.toString()}`,
  );
}

const previewImage = await requestBinary(
  `/viewer/documents/${documentId}/pages/0/image?width=1600`,
  "image/png",
);

// Write the triage packet to disk.
await Promise.all([
  writeFile(
    path.join(outputDir, "upload-response.json"),
    JSON.stringify(uploadResult, null, 2),
  ),
  writeFile(
    path.join(outputDir, "document-info.json"),
    JSON.stringify(documentInfo, null, 2),
  ),
  writeFile(
    path.join(outputDir, "document-properties.json"),
    JSON.stringify(documentProperties, null, 2),
  ),
  writeFile(
    path.join(outputDir, "page-0-text.json"),
    JSON.stringify(firstPageText, null, 2),
  ),
  writeFile(path.join(outputDir, "page-0.png"), previewImage),
]);

if (searchResults) {
  await writeFile(
    path.join(outputDir, "search-results.json"),
    JSON.stringify(searchResults, null, 2),
  );
}

console.log(`Saved triage packet to ${outputDir}`);
console.log(`document_id: ${documentId}`);

Run the script
Set the required environment variable, then run the script with its command-line arguments. For example:
export NUTRIENT_DWS_VIEWER_API_KEY=your_api_key_here
node triage.mjs ./large-document.pdf application/pdf invoice

Example output directory:

./dws-triage-<document_id>/
├── document-info.json
├── document-properties.json
├── page-0-text.json
├── page-0.png
├── search-results.json
└── upload-response.json

If you don’t want to run a search check, omit the final argument.
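The output directory shown above can be checked for completeness after a run. The following is a minimal sketch under stated assumptions: missingPacketFiles and reportPacket are hypothetical helpers that are not part of the original script, and the expected file names are copied from the example listing. search-results.json is expected only when a search query was passed.

```javascript
// File names taken from the example output directory.
const EXPECTED_FILES = [
  "document-info.json",
  "document-properties.json",
  "page-0-text.json",
  "page-0.png",
  "upload-response.json",
];

// Hypothetical helper: returns the expected files that are absent from a listing.
function missingPacketFiles(fileNames, { searchRan = false } = {}) {
  const expected = searchRan
    ? [...EXPECTED_FILES, "search-results.json"]
    : EXPECTED_FILES;
  const present = new Set(fileNames);
  return expected.filter((name) => !present.has(name));
}

// Example: list a packet directory and report anything that is missing.
async function reportPacket(packetDir, searchRan) {
  const { readdir } = await import("node:fs/promises");
  const missing = missingPacketFiles(await readdir(packetDir), { searchRan });
  console.log(
    missing.length === 0 ? "Packet is complete" : `Missing: ${missing.join(", ")}`,
  );
  return missing;
}
```

For a run that included a search query, you could call reportPacket("./dws-triage-<document_id>", true), substituting the real directory name.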
What this packet helps you validate
The packet lets you check whether:
- The upload succeeded and returned the expected document_id
- The document metadata and properties look reasonable
- Page 0 exposes usable extracted text
- A representative term can be found with search
- The first page renders correctly as a server-generated image
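The rendering check in the list above can be partly automated by inspecting the saved preview file. The following is a minimal sketch, not part of the original script: readPngSize and checkPreview are hypothetical helpers. They rely only on the PNG file layout (an 8-byte signature followed by an IHDR chunk that stores width and height as big-endian 32-bit integers). A valid header does not prove the page looks right, so a visual check is still needed.

```javascript
// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: returns { width, height } for a PNG buffer, or null
// if the buffer does not start with a PNG signature and an IHDR chunk.
function readPngSize(buffer) {
  if (buffer.length < 24) return null;
  if (!buffer.subarray(0, 8).equals(PNG_SIGNATURE)) return null;
  // Bytes 8-11 hold the chunk length; bytes 12-15 hold the chunk type.
  if (buffer.toString("latin1", 12, 16) !== "IHDR") return null;
  return { width: buffer.readUInt32BE(16), height: buffer.readUInt32BE(20) };
}

// Example: read page-0.png from a packet directory and report its size.
async function checkPreview(packetDir) {
  const { readFile } = await import("node:fs/promises");
  const size = readPngSize(await readFile(`${packetDir}/page-0.png`));
  console.log(
    size ? `page-0.png is ${size.width}x${size.height}` : "page-0.png is not a valid PNG",
  );
  return size;
}
```

The script requests the preview with width=1600, so a reported width far from that value may be worth investigating. Whether the service returns exactly that width is an assumption this sketch does not verify.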