How PDF Merging Works
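PDF merging combines multiple PDF files into a single document. It does not concatenate the files byte for byte. Instead, it copies each source page object into one new page tree, together with the objects that page references (content streams, fonts, images). Every PDF carries cross-reference data that maps object numbers to byte offsets: a classic xref table, or a cross-reference stream in PDF 1.5 and later. Each source numbers its objects independently, so the copied objects are renumbered to avoid collisions, and new cross-reference data is written for the combined document.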
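The renumbering step is the part that keeps references valid after a merge. The sketch below shows it, with two assumptions. First, it uses a hypothetical in-memory model in which each source PDF is a `Map` from object number to the object's body as text. Second, the helper name `renumberObjects` is illustrative. The sketch rewrites indirect references (`N G R`) with a regular expression, so it ignores the possibility of look-alike text inside strings or streams. It also copies every object, whereas a real merger copies only pages and what they reference.

```javascript
// Matches indirect references such as "2 0 R" (object 2, generation 0).
// Simplification: a real parser would skip strings and stream data.
const REF_PATTERN = /\b(\d+) (\d+) R\b/g;

// sources: array of Map<objectNumber, bodyText>, one per source PDF.
// Returns a single Map with every object given a unique new number.
function renumberObjects(sources) {
  const merged = new Map();
  let nextNumber = 1;

  for (const objects of sources) {
    // Pass 1: assign each object in this source a fresh number.
    const mapping = new Map();
    for (const oldNumber of objects.keys()) {
      mapping.set(oldNumber, nextNumber++);
    }

    // Pass 2: copy bodies, pointing every reference at the new number.
    // New objects all get generation 0.
    for (const [oldNumber, body] of objects) {
      const rewritten = body.replace(REF_PATTERN, (match, num) => {
        const target = mapping.get(Number(num));
        return target === undefined ? match : `${target} 0 R`;
      });
      merged.set(mapping.get(oldNumber), rewritten);
    }
  }
  return merged;
}
```

For example, merging a three-object source with a two-object source leaves the first source's objects as 1 to 3 and turns the second source's `1 0 R` and `2 0 R` into `4 0 R` and `5 0 R`.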
PDF Structure Overview
PDF file structure:
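```text
%PDF-1.7      ← header
1 0 obj       ← catalog object (document root)
2 0 obj       ← pages tree
3 0 obj       ← page 1
4 0 obj       ← page 1 content stream
...
xref          ← cross-reference table
trailer       ← trailer dictionary (/Root points to the catalog)
startxref     ← byte offset of the xref section
%%EOF         ← end of file
```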
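The xref table is a list of byte offsets, one per object, and each entry is exactly 20 bytes long. The hypothetical `buildMinimalPdf` function below is a sketch that writes the one-page file laid out above. It records each object's offset as it goes and then emits the xref, trailer, and startxref sections. It assumes ASCII-only content, so that string length equals byte length. It writes a classic xref table rather than a cross-reference stream.

```javascript
// Builds a minimal one-page PDF as an ASCII string.
function buildMinimalPdf() {
  const content = "0 0 m 200 200 l S"; // page content: stroke one line
  const bodies = [
    "<< /Type /Catalog /Pages 2 0 R >>",
    "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Contents 4 0 R >>",
    `<< /Length ${content.length} >>\nstream\n${content}\nendstream`,
  ];

  let out = "%PDF-1.7\n";
  const offsets = [];
  bodies.forEach((body, i) => {
    offsets.push(out.length); // byte offset of "N 0 obj" (ASCII only)
    out += `${i + 1} 0 obj\n${body}\nendobj\n`;
  });

  const xrefOffset = out.length;
  out += `xref\n0 ${bodies.length + 1}\n`;
  // Object 0 is always free; every entry is 20 bytes incl. " \n".
  out += "0000000000 65535 f \n";
  for (const off of offsets) {
    out += `${String(off).padStart(10, "0")} 00000 n \n`;
  }
  out += `trailer\n<< /Size ${bodies.length + 1} /Root 1 0 R >>\n`;
  out += `startxref\n${xrefOffset}\n%%EOF\n`;
  return out;
}
```

Writing the result to disk with `require("fs").writeFileSync("minimal.pdf", buildMinimalPdf())` should produce a file that a standard PDF viewer opens as a single page with one diagonal line.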
Using pdf-lib in JavaScript
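```javascript
import { PDFDocument } from 'pdf-lib';

// Merge File/Blob objects (e.g. from <input type="file" multiple>)
// in the browser and download the result as merged.pdf.
async function mergePdfs(pdfFiles) {
  const merged = await PDFDocument.create();

  for (const file of pdfFiles) {
    const bytes = await file.arrayBuffer();
    const pdf = await PDFDocument.load(bytes);
    // copyPages copies each page and the objects it references into
    // `merged` under new object numbers; the copies must then be added.
    const pages = await merged.copyPages(pdf, pdf.getPageIndices());
    pages.forEach((page) => merged.addPage(page));
  }

  // Serializes the document, including its cross-reference data.
  const mergedBytes = await merged.save();
  const blob = new Blob([mergedBytes], { type: 'application/pdf' });
  const url = URL.createObjectURL(blob);

  const a = document.createElement('a');
  a.href = url;
  a.download = 'merged.pdf';
  // Some browsers have ignored clicks on links not attached to the page.
  document.body.appendChild(a);
  a.click();
  a.remove();
  // Free the blob once the download has been handed off.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
```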
Common Issues
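- Password-protected PDFs: pdf-lib cannot decrypt encrypted PDFs, so they must be decrypted before merging. By default, PDFDocument.load throws an EncryptedPDFError for an encrypted file. Passing { ignoreEncryption: true } only suppresses that error, so the copied pages may come out blank or garbled. Decrypt the file first with a tool such as qpdf (qpdf --password=secret --decrypt in.pdf out.pdf).
- Font embedding: merging does not embed missing fonts. copyPages copies each page's font resources as they are. A font that was not embedded in a source PDF therefore still depends on the viewer's substitute font in the merged file, and may render differently from the original design.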
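- Large files: browser-based merging loads each source into memory and keeps every copied page in the merged document until it is saved. Memory use therefore grows with the combined size of the inputs. For inputs over roughly 100MB, server-side tools such as pypdf (the maintained successor to PyPDF2) or Ghostscript are more appropriate.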
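Two of these issues can be caught before calling PDFDocument.load. The hypothetical `preflight` helper below is a sketch that works on raw bytes. It flags inputs containing an `/Encrypt` key, which encrypted PDFs carry in their trailer or cross-reference stream dictionary. It also warns when the combined size exceeds a limit that defaults to the 100MB rule of thumb above. The encryption check is a heuristic and not a parser: an unencrypted file whose uncompressed content happens to contain the text `/Encrypt` would be a false positive.

```javascript
// Default threshold taken from the ~100MB rule of thumb above.
const SIZE_LIMIT_BYTES = 100 * 1024 * 1024;
const ENCRYPT_MARKER = Array.from("/Encrypt", (c) => c.charCodeAt(0));

// Naive byte search: index of `needle` in `haystack`, or -1.
function indexOfBytes(haystack, needle) {
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// inputs: array of { name, bytes } where bytes is a Uint8Array.
// Returns human-readable warnings; an empty array means no issues found.
function preflight(inputs, limitBytes = SIZE_LIMIT_BYTES) {
  const warnings = [];
  let total = 0;
  for (const { name, bytes } of inputs) {
    total += bytes.length;
    // Heuristic: an /Encrypt key usually means the file is encrypted.
    if (indexOfBytes(bytes, ENCRYPT_MARKER) !== -1) {
      warnings.push(`${name}: appears to be encrypted; decrypt it before merging`);
    }
  }
  if (total > limitBytes) {
    warnings.push(`combined input is ${total} bytes; consider a server-side tool`);
  }
  return warnings;
}
```

In the browser, each entry could be built as `{ name: file.name, bytes: new Uint8Array(await file.arrayBuffer()) }` before merging.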