How to merge PDFs using an SDK: A developer’s guide

Hulya Masharipov

November 14, 2024

Merging PDFs is a common requirement in document-heavy applications, enabling users to consolidate multiple files into a single PDF. This capability is helpful for simplifying document management, creating portfolios, organizing files, and more. This post explores how developers can merge PDFs programmatically, comparing Nutrient (previously PSPDFKit), a commercial SDK, with open source alternatives like pdf-lib. It also looks at how to merge PDFs using Nutrient’s DWS API.

Why merge PDFs?

Before diving into the code, it helps to review why merging PDFs benefits developers and users alike.

Document organization — Merging PDFs makes it easy to compile and organize information. For example, merging multiple invoices into one file can make reporting more manageable.
Reducing attachments — Combining documents can reduce clutter, especially when sharing files via email or other platforms that limit the number of attachments.
Creating document portfolios — Consolidating documents, like reports or contracts, into a single PDF can help users create organized portfolios.
Streamlined document processing — For applications that process documents, merging PDFs simplifies workflows by minimizing the number of files handled.

However, challenges like maintaining document integrity, handling different PDF formats, and ensuring compatibility across devices can make PDF merging complex. This post aims to simplify this process with step-by-step examples.

Merging PDFs with Nutrient Web SDK

Nutrient is a powerful PDF SDK offering a wide array of features for document management, including PDF merging, editing, and annotating. Its comprehensive API and extensive documentation make it a popular choice for enterprise companies that require advanced document capabilities. Nutrient’s suite of tools also includes DWS API, which empowers developers to automate and track workflows, making it easier to manage and process documents efficiently within applications.

Steps to merge PDFs with Nutrient Web SDK

Here’s an example of how to merge PDF files using Nutrient. This method is efficient and well-suited for projects that require seamless document management.

To start, install Nutrient Web SDK in your project using npm or Yarn:

npm install pspdfkit
# or
yarn add pspdfkit

Copy the Nutrient Web SDK files to your assets directory:

cp -R ./node_modules/pspdfkit/dist/ ./assets/

Ensure your server has the correct MIME type for WebAssembly (application/wasm).
For the merging process, you should have your PDF files ready. In this case, you’ll be merging two files — document.pdf and imported.pdf.
Add an empty <div> to your HTML file where Nutrient Web SDK will be mounted:

<div id="pspdfkit" style="width: 100%; height: 100vh;"></div>

Add this script tag to your HTML file to load the index.js file:

<script type="module" src="index.js"></script>

In your JavaScript file (index.js), import Nutrient Web SDK and initialize it:

import './assets/pspdfkit.js';

// Define the base URL for the Nutrient Web SDK assets.
const baseUrl = `${window.location.protocol}//${window.location.host}/assets/`;

PSPDFKit.load({
  baseUrl,
  container: '#pspdfkit',
  document: 'document.pdf', // The main document you want to load.
})
  .then((instance) => {
    console.log('PSPDFKit loaded', instance);

    // Now merge another PDF using the `importDocument` operation.
    fetch('imported.pdf')
      .then((res) => {
        if (!res.ok) {
          throw new Error(`Failed to fetch imported.pdf (HTTP ${res.status})`);
        }
        return res.blob();
      })
      .then((blob) => {
        // Perform the `importDocument` operation to merge PDFs.
        // Returning the promise lets the `catch` below report failures.
        return instance.applyOperations([
          {
            type: 'importDocument',
            importedPageIndexes: [2, 4, [7, 8]], // Zero-based pages and ranges to import.
            beforePageIndex: 3, // Insert before the page at zero-based index 3.
            document: blob, // The blob representing the imported document.
            treatImportedDocumentAsOnePage: false, // Treat the imported document as separate pages.
          },
        ]);
      })
      .catch((error) =>
        console.error('Error importing document:', error),
      );
  })
  .catch((error) => {
    console.error('Error loading PSPDFKit:', error.message);
  });

Key parameters in the importDocument operation

beforePageIndex or afterPageIndex — Specifies where the imported document should be added in the current document.
treatImportedDocumentAsOnePage — When set to true, all pages of the imported document are treated as a single page for subsequent operations. If set to false, each page is treated separately.
importedPageIndexes — This array allows you to specify pages or a range of pages to import from the document. If omitted, the entire document is imported.
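
To preview which pages a mixed importedPageIndexes value selects, the array can be expanded into a flat list. The sketch below is only an illustration: expandPageIndexes is a hypothetical helper, not part of the SDK, and it assumes that a [start, end] pair is an inclusive range of zero-based indexes, which is worth confirming in the SDK documentation.

```javascript
// Hypothetical helper: expand a mixed list of zero-based page indexes and
// [start, end] ranges into a flat list. Ranges are assumed to be inclusive.
function expandPageIndexes(indexes) {
  const pages = [];
  for (const entry of indexes) {
    if (Array.isArray(entry)) {
      const [start, end] = entry;
      for (let page = start; page <= end; page++) {
        pages.push(page);
      }
    } else {
      pages.push(entry);
    }
  }
  return pages;
}

console.log(expandPageIndexes([2, 4, [7, 8]])); // [ 2, 4, 7, 8 ]
```

Under that assumption, the example above needs imported.pdf to contain at least nine pages, since the highest index requested is 8.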

Serving your project

Use a simple HTTP server to serve your project’s files. You can use the serve package to serve your project locally:

npm install --global serve

serve -l 8080 .

Navigate to http://localhost:8080 in your browser to view your PDF.

Exporting the merged PDF

After applying the operations (i.e. merging the documents), you can export the final merged document using instance.exportPDF():

instance.exportPDF().then((pdfData) => {
  // You now have the merged PDF data as an ArrayBuffer.
  console.log('Merged PDF data:', pdfData);
  // You can save it, display it, or send it to the server.
});
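
One way to save the exported data is to wrap the ArrayBuffer in a Blob and offer it as a download. The following is a minimal sketch that uses standard browser APIs only; toPdfBlob and downloadBlob are hypothetical helper names introduced for this example.

```javascript
// Wrap the exported ArrayBuffer in a Blob with the PDF MIME type.
function toPdfBlob(pdfData) {
  return new Blob([pdfData], { type: 'application/pdf' });
}

// Browser only: offer the Blob as a download through a temporary link.
function downloadBlob(blob, fileName) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = fileName;
  document.body.appendChild(link);
  link.click();
  link.remove();
  URL.revokeObjectURL(url);
}
```

Inside the exportPDF() callback, this could be used as downloadBlob(toPdfBlob(pdfData), 'merged.pdf').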

By using Nutrient’s importDocument operation, you can easily merge multiple PDF files in the browser, by adding either entire documents or specific pages, and perform additional operations like rotation or text extraction. This solution is fully serverless and leverages WebAssembly for fast, secure, and private PDF rendering and editing directly in the browser.

You can try Nutrient without needing a trial key, although your document will include a watermark during this period. If you prefer to use the SDK without a watermark, you can easily get a 30-day full access trial by requesting a trial key — no additional setup is required.

Merging PDFs using Nutrient DWS API

First, make sure you have the required libraries installed.

Install Axios for making HTTP requests:

npm install axios

Install Form-Data to handle file uploads:

npm install form-data

The fs (file system) module is built into Node.js, so there’s no need to install it separately.
Create a folder for your project (e.g. pdf-merge), and inside this folder, place the PDFs you want to merge (first_half.pdf and second_half.pdf). The folder structure will look like this:

pdf-merge/
  ├── first_half.pdf
  ├── second_half.pdf
  ├── mergePDFs.js

Create a new file named mergePDFs.js in the project folder, and follow the code below.

At the top of your file, import the libraries you’ll use:

// Import the required libraries.
const axios = require('axios'); // For making HTTP requests.
const FormData = require('form-data'); // For handling form data (file uploads).
const fs = require('fs'); // For reading files from the file system.

axios is used for making HTTP requests to the API.
form-data helps construct the multipart/form-data request needed for file uploads.
fs allows you to read files from your local machine.

Create a new FormData object to prepare the data you’ll send in the API request:

// Create a `FormData` instance.
const formData = new FormData();

This will be used to append the PDFs and the instructions for merging.

Add an instruction object to the FormData that tells the API which PDFs to merge:

// Append the instructions that tell the API how to merge the PDFs.
formData.append(
  'instructions',
  JSON.stringify({
    parts: [
      {
        file: 'first_half', // Reference to the first PDF.
      },
      {
        file: 'second_half', // Reference to the second PDF.
      },
    ],
  }),
);

The instructions field tells the API which files to merge and in what order.
"first_half" and "second_half" are part names. Each one must match the form field name under which the corresponding file is attached in the next step.

Now, attach the actual PDF files (first_half.pdf and second_half.pdf) to the FormData. This is done by reading the files and appending them:

// Attach the actual PDF files to the `FormData`.
formData.append('first_half', fs.createReadStream('first_half.pdf')); // Attach the first PDF.
formData.append('second_half', fs.createReadStream('second_half.pdf')); // Attach the second PDF.

fs.createReadStream() reads the file and prepares it to be sent as part of the form data.
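
Because each part name in the instructions must match a form field name, both can be derived from one ordered list. The sketch below shows the instructions half of that idea; buildMergeInstructions is a hypothetical helper, and it produces the same payload shape as the example above.

```javascript
// Hypothetical helper: build the `instructions` payload from an ordered
// list of part names. Each name must match the form field that carries
// the corresponding file.
function buildMergeInstructions(partNames) {
  if (partNames.length === 0) {
    throw new Error('At least one part is required.');
  }
  return { parts: partNames.map((name) => ({ file: name })) };
}

const partNames = ['first_half', 'second_half'];
const instructions = buildMergeInstructions(partNames);
console.log(JSON.stringify(instructions));
// {"parts":[{"file":"first_half"},{"file":"second_half"}]}
```

The same partNames list can then drive the file attachments, for example by calling formData.append(name, fs.createReadStream(`${name}.pdf`)) for each name, so the two never drift apart.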

You’ll now create an async function to send the request to Nutrient DWS API. This function will handle making the request and saving the merged PDF:

// Create an async function to send the request to DWS API.
(async () => {
  try {
    // Send a POST request to the API with the form data and authorization header.
    const response = await axios.post(
      'https://api.nutrient.io/build',
      formData,
      {
        headers: formData.getHeaders({
          Authorization: 'Bearer your_api_key_here', // Replace with your actual API key.
        }),
        responseType: 'stream', // Set the response type to 'stream' to handle large files.
      },
    );

    // Pipe the merged PDF result into a file called "result.pdf" and
    // report success only once the file has been fully written.
    const output = fs.createWriteStream('result.pdf');
    response.data.pipe(output);
    output.on('finish', () => console.log('PDFs merged successfully!'));
    output.on('error', (err) =>
      console.error('Error writing result.pdf:', err.message),
    );
  } catch (e) {
    // Network failures have no response body, so fall back to the error message.
    const errorString = e.response
      ? await streamToString(e.response.data)
      : e.message;
    console.error('Error merging PDFs:', errorString);
  }
})();

axios.post() sends the request to the DWS API.
responseType: "stream" allows you to handle the large merged PDF as a stream.
The merged PDF will be written to a file called result.pdf.

If the request fails, you’ll need a helper function to handle the error response and log it:

// Helper function to handle the response stream and convert it to a string (for error handling).
function streamToString(stream) {
  const chunks = [];
  return new Promise((resolve, reject) => {
    stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    stream.on('error', (err) => reject(err));
    stream.on('end', () =>
      resolve(Buffer.concat(chunks).toString('utf8')),
    );
  });
}

This function listens to the response stream and converts it to a string, allowing you to handle error messages properly.
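
The same conversion can also be written with async iteration, which Node.js readable streams support. This is an alternative sketch rather than the helper used above; streamToText is a hypothetical name, and the in-memory stream stands in for a real error response.

```javascript
const { Readable } = require('node:stream');

// Collect a readable stream into a UTF-8 string using async iteration.
async function streamToText(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString('utf8');
}

// Exercise the helper with an in-memory stream instead of a real response.
const sample = Readable.from([Buffer.from('{"error":'), Buffer.from('"bad"}')]);
streamToText(sample).then((text) => console.log(text));
```

A stream error rejects the returned promise, so the caller can handle it with a regular try/catch around await.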

After you’ve written your mergePDFs.js file, run the script in the terminal:

node mergePDFs.js

This will send a request to DWS API, merge the PDFs, and save the result as result.pdf in your project folder.

After the script has finished running, you’ll see a new file called result.pdf in your folder. Open this file to verify the PDFs were merged correctly.

By following these steps, you can merge PDFs using the Nutrient API. The code makes a POST request to the API, attaches the PDF files, and provides instructions on how to merge them. If the operation is successful, the merged PDF is saved as result.pdf.
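
Rather than hardcoding the API key in mergePDFs.js, the Authorization header could be built from an environment variable. The sketch below assumes a variable named NUTRIENT_API_KEY, which is a hypothetical name chosen for this example, and buildAuthHeader is likewise a hypothetical helper.

```javascript
// Build the Authorization header from an environment variable.
// NUTRIENT_API_KEY is a hypothetical variable name chosen for this sketch.
function buildAuthHeader(env = process.env) {
  const apiKey = env.NUTRIENT_API_KEY;
  if (!apiKey) {
    throw new Error('Set NUTRIENT_API_KEY before running the script.');
  }
  return { Authorization: `Bearer ${apiKey}` };
}
```

The result can be passed to formData.getHeaders(buildAuthHeader()) in place of the inline object, which keeps the key out of source control.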

Open source alternative for merging PDFs: pdf-lib

For those looking for a free, customizable solution, pdf-lib is a lightweight, open source library for PDF manipulation. It supports creating and modifying PDFs, including merging documents.

Step-by-step guide to merging PDFs with pdf-lib

Install the pdf-lib package using npm:

npm install pdf-lib

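Start by importing the PDFDocument class from pdf-lib and the fs module for handling file operations. Because this example uses ES module import syntax, save the file with an .mjs extension or set "type": "module" in your package.json when running it with Node.js: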

import { PDFDocument } from 'pdf-lib';
import fs from 'fs';

Define an async function, mergePDFsWithPDFLib(), to handle the merging process. In this function, load two PDF files from the file system using fs.readFileSync(). These files should be in the same directory as your code, or you can provide a full file path if they’re located elsewhere:

async function mergePDFsWithPDFLib() {
  // Load the PDF files.
  const pdfDoc1 = await PDFDocument.load(fs.readFileSync('first.pdf'));
  const pdfDoc2 = await PDFDocument.load(fs.readFileSync('second.pdf'));

Create a new empty PDF document where you’ll store the merged content of the PDFs. This document will act as the final output for your merged files:

// Create a new PDF document.
const mergedPDF = await PDFDocument.create();

Copy all pages from the first PDF (pdfDoc1) and add them to the new mergedPDF document. This is done using the copyPages() function, which lets you copy specific pages by index:

// Copy pages from the first PDF.
const copiedPages1 = await mergedPDF.copyPages(
  pdfDoc1,
  pdfDoc1.getPageIndices(),
);
copiedPages1.forEach((page) => mergedPDF.addPage(page));

Repeat the same process for the second PDF (pdfDoc2). This copies all pages from pdfDoc2 and adds them to mergedPDF:

// Copy pages from the second PDF.
const copiedPages2 = await mergedPDF.copyPages(
  pdfDoc2,
  pdfDoc2.getPageIndices(),
);
copiedPages2.forEach((page) => mergedPDF.addPage(page));

Once all pages from both PDFs have been added to mergedPDF, save the document to your file system. Use the save() method to get the merged PDF content as a byte array, and then write this to a new file (e.g. merged.pdf):

// Save the merged PDF.
const mergedPDFBytes = await mergedPDF.save();
fs.writeFileSync("merged.pdf", mergedPDFBytes);
}

Now, call the function and handle any errors with .catch(console.error):

mergePDFsWithPDFLib().catch(console.error);

Full code example

Here’s the complete code for merging two PDF files:

import { PDFDocument } from 'pdf-lib';
import fs from 'fs';

async function mergePDFsWithPDFLib() {
  // Load the PDF files.
  const pdfDoc1 = await PDFDocument.load(fs.readFileSync('first.pdf'));
  const pdfDoc2 = await PDFDocument.load(
    fs.readFileSync('second.pdf'),
  );

  // Create a new PDF document.
  const mergedPDF = await PDFDocument.create();

  // Copy pages from the first PDF.
  const copiedPages1 = await mergedPDF.copyPages(
    pdfDoc1,
    pdfDoc1.getPageIndices(),
  );
  copiedPages1.forEach((page) => mergedPDF.addPage(page));

  // Copy pages from the second PDF.
  const copiedPages2 = await mergedPDF.copyPages(
    pdfDoc2,
    pdfDoc2.getPageIndices(),
  );
  copiedPages2.forEach((page) => mergedPDF.addPage(page));

  // Save the merged PDF.
  const mergedPDFBytes = await mergedPDF.save();
  fs.writeFileSync('merged.pdf', mergedPDFBytes);
}

mergePDFsWithPDFLib().catch(console.error);

Explanation of key steps

Load PDFs — PDFDocument.load() reads each PDF and loads it as a document object.
Create a new document — PDFDocument.create() initializes an empty PDF to store merged pages.
Copy pages — copyPages() copies pages from each PDF and inserts them into the new document.
Save the merged PDF — mergedPDF.save() converts the new document into a byte array, which is then saved as merged.pdf.

Using pdf-lib makes it straightforward to merge multiple PDFs into one. With these steps, you can easily extend this process to merge any number of PDFs by following the same approach. This open source solution is lightweight and efficient, ideal for smaller projects or when you need customizable PDF handling without a commercial SDK.
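
Extending the approach to any number of files amounts to wrapping the load, copy, and add steps in a loop. The sketch below is one way to do that, not the only one. mergeAll is a hypothetical helper, and it receives the PDFDocument class as an argument so that it uses only the pdf-lib calls already shown above (create, load, copyPages, getPageIndices, addPage, and save).

```javascript
// Hypothetical helper: merge any number of PDFs, given as byte arrays,
// into a single document. `PDFDocument` is the class exported by pdf-lib
// and is passed in by the caller.
async function mergeAll(PDFDocument, inputs) {
  const merged = await PDFDocument.create();
  for (const bytes of inputs) {
    // Load each source in order and append all of its pages.
    const source = await PDFDocument.load(bytes);
    const pages = await merged.copyPages(source, source.getPageIndices());
    pages.forEach((page) => merged.addPage(page));
  }
  // Resolves to the merged PDF as a byte array.
  return merged.save();
}
```

A call might look like mergeAll(PDFDocument, ['first.pdf', 'second.pdf', 'third.pdf'].map((name) => fs.readFileSync(name))), with the result written out using fs.writeFileSync() as before. The file name third.pdf is only a placeholder.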

Comparison of Nutrient and open source libraries

Feature	Nutrient Web SDK	Nutrient DWS API	pdf-lib
Ease of use	High, easy to integrate	High, easy to integrate with server-side automation	Moderate, requires setup
Features	Comprehensive PDF manipulation and annotation support	Server-side PDF merging, document workflow automation, scalability	Basic PDF manipulation, no annotation support
Performance	Optimized for enterprise use	Highly scalable and optimized for enterprise environments	Good for small projects
Cost	Free license available with watermark, paid license for full features	Free trial with 100 credits, 5 MB file size limit	Free and open source

Nutrient Web SDK offers a free version with a watermark, allowing users to try the solution before purchasing a full license. Nutrient DWS API also provides a free trial, listed in the table above as 100 credits with a 5 MB file size limit, enabling users to test its server-side PDF merging and automation capabilities. pdf-lib remains a free and open source library for basic PDF manipulation tasks.

Best practices for merging PDFs

Merging PDFs effectively requires some best practices to ensure smooth functionality and maintain performance:

Optimize file sizes — Reducing PDF size can improve performance and load times. Use compression libraries or remove unnecessary metadata to keep file sizes manageable.
Handle errors gracefully — When dealing with multiple file types and sources, add error handling to ensure smooth merging, even with problematic files.
Privacy and security — For sensitive documents, use secure file handling methods and consider merging encrypted PDFs when necessary.
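
As an illustration of graceful error handling, inputs can be screened before merging so that one bad file is reported instead of aborting the whole job. The sketch below relies on the fact that a PDF file begins with the bytes "%PDF-". It is a cheap sanity check rather than full validation, and looksLikePdf and partitionInputs are hypothetical helper names.

```javascript
// A PDF file starts with the ASCII bytes "%PDF-".
const PDF_MAGIC = Buffer.from('%PDF-', 'ascii');

// Return true when the byte array begins with the PDF header.
function looksLikePdf(bytes) {
  return (
    bytes.length >= PDF_MAGIC.length &&
    Buffer.from(bytes.subarray(0, PDF_MAGIC.length)).equals(PDF_MAGIC)
  );
}

// Split named inputs into those worth merging and those to report.
function partitionInputs(inputs) {
  const accepted = [];
  const rejected = [];
  for (const input of inputs) {
    (looksLikePdf(input.bytes) ? accepted : rejected).push(input.name);
  }
  return { accepted, rejected };
}

const result = partitionInputs([
  { name: 'a.pdf', bytes: Buffer.from('%PDF-1.7\n...') },
  { name: 'b.pdf', bytes: Buffer.from('<html>not a pdf</html>') },
]);
console.log(result); // { accepted: [ 'a.pdf' ], rejected: [ 'b.pdf' ] }
```

Files that pass this check can still fail to parse, so the merge step itself should remain wrapped in its own error handling.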

Conclusion

In this post, you learned about PDF merging using Nutrient Web SDK and DWS API, alongside the open source pdf-lib. Nutrient Web SDK offers an enterprise-level solution with advanced features, while DWS API focuses on scalable document processing. pdf-lib is a free, flexible option for simpler merging needs. Choose the solution that best fits your requirements.

FAQ

What is Nutrient Web SDK, and how does it compare to DWS API for PDF merging?

Nutrient Web SDK is a comprehensive toolkit for PDF manipulation, including merging, editing, and securing PDFs within an application. DWS API, a cloud-based service, allows for merging PDFs via HTTP requests, making it suitable for automated and server-side operations.

Which is better for merging PDFs: Nutrient Web SDK or DWS API?

The choice depends on your project needs:

Nutrient Web SDK — Ideal for client-side applications that require extensive PDF manipulation features beyond merging.
DWS API — Perfect for cloud-based workflows, allowing multiple PDF operations through HTTP endpoints without managing local resources.

Can pdf-lib handle PDF merging, and is it a good alternative?

Yes, pdf-lib is a popular open source JavaScript library for PDF manipulation, including merging capabilities. While it’s sufficient for basic PDF merging tasks, it lacks some advanced functionalities and optimizations for large files, which SDKs like Nutrient and DWS API provide.

How does the performance of pdf-lib compare with Nutrient Web SDK and DWS API for large PDF files?

For large PDFs:

Nutrient Web SDK — Optimized for handling complex, large files with minimal lag.
DWS API — Suitable for large files processed on the server side, utilizing cloud resources to reduce strain on the client side.
pdf-lib — May struggle with very large files, as it’s designed for smaller applications and may lack advanced optimizations.

Can I use an API to merge PDFs with pdf-lib?

No, pdf-lib is a JavaScript library that operates within a JavaScript environment, and it doesn’t have an HTTP API. For API-based merging, you need to use DWS API or Nutrient’s server-side solutions.

Are there any security settings in Nutrient Web SDK and DWS API when merging PDFs?

Yes, both Nutrient Web SDK and DWS API support security features such as password protection, encryption, and setting permissions on merged PDFs. pdf-lib, however, has limited security settings compared to these SDKs.

Is pdf-lib free, and how does that affect its use for PDF merging?

pdf-lib is free and open source, making it a great option for small-scale applications or when budget constraints exist. However, for larger, more robust applications requiring high performance and security, Nutrient Web SDK or DWS API might be better choices despite their licensing costs.