In the last post, I covered merging PDFs in the browser using pdf-lib. Today we're doing the opposite: splitting them.
Splitting is surprisingly common:
- A 200-page report where you only need chapters 3 and 7
- A scanned contract where the signature page needs to go to legal
- A PDF that's too large to email, so you split it into two attachments
- Extracting specific pages from a bulk download
Just like merging, we'll do this entirely in the browser — no server uploads, no privacy trade-offs.
Why Browser-Side Splitting Matters
The same privacy argument applies. When you upload a PDF to an online splitter, your file leaves your device. For documents containing personal data, financial records, or client contracts, that's a risk you don't need to take.
Browser-side splitting keeps everything local. The PDF loads into your browser tab, pdf-lib manipulates it in memory, and you download the result. The server never sees the content.
The trade-off is browser memory. A 300MB scanned document will strain a tab, but typical office documents (up to roughly 50-100MB) split within a few seconds.
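One way to act on that trade-off is to check the file size before loading anything. The sketch below is only an illustration: the function name and both thresholds are assumptions taken from the rough figures above, not limits enforced by pdf-lib or the browser.

```javascript
// Hypothetical thresholds based on the rough figures in this post; tune them for your users.
const WARN_BYTES = 100 * 1024 * 1024;   // upper end of a "typical office document"
const REFUSE_BYTES = 300 * 1024 * 1024; // size described above as straining a tab

// Classify a File (or any object with a numeric `size` in bytes) before loading it.
function classifyBySize(file, warnBytes = WARN_BYTES, refuseBytes = REFUSE_BYTES) {
  if (file.size >= refuseBytes) return 'too-large';
  if (file.size > warnBytes) return 'warn';
  return 'ok';
}
```

A UI could show a warning for 'warn' and suggest a desktop or server-side tool for 'too-large' instead of letting the tab run out of memory.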
Basic Split: Extract a Single Page Range
The simplest use case: "Give me pages 5 through 10."
import { PDFDocument } from 'pdf-lib';
async function extractPages(file, startPage, endPage) {
  // Load the source PDF
  const arrayBuffer = await file.arrayBuffer();
  const sourcePdf = await PDFDocument.load(arrayBuffer);
  // Create a new empty PDF
  const newPdf = await PDFDocument.create();
  // Convert the 1-based, inclusive range to 0-based indices
  const pageIndices = [];
  for (let i = startPage - 1; i < endPage; i++) {
    pageIndices.push(i);
  }
  // Copy pages from source to new document
  const copiedPages = await newPdf.copyPages(sourcePdf, pageIndices);
  copiedPages.forEach(page => newPdf.addPage(page));
  // Save and return
  const pdfBytes = await newPdf.save();
  return new Blob([pdfBytes], { type: 'application/pdf' });
}
// Usage: extract pages 5-10
// downloadBlob is assumed to be a helper that triggers a browser download; it is not shown here
const blob = await extractPages(file, 5, 10);
downloadBlob(blob, 'pages-5-to-10.pdf');
What's happening:
- Load the full source PDF into memory
- Create a blank destination PDF
- Copy only the requested pages using copyPages()
- Save the new, smaller document
The source PDF isn't modified — we're creating a new document containing only the pages we want.
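The extractPages function above trusts its caller: it does not check that the requested range actually exists in the document. A minimal sketch of a guard follows; toPageIndices is a hypothetical helper name, and the error types are my choice rather than anything pdf-lib prescribes.

```javascript
// Hypothetical helper: turn a 1-based, inclusive page range into 0-based indices,
// rejecting ranges that fall outside the document.
function toPageIndices(startPage, endPage, pageCount) {
  if (!Number.isInteger(startPage) || !Number.isInteger(endPage)) {
    throw new TypeError('Page numbers must be integers');
  }
  if (startPage < 1 || endPage > pageCount || startPage > endPage) {
    throw new RangeError(
      `Invalid range ${startPage}-${endPage} for a ${pageCount}-page document`
    );
  }
  return Array.from(
    { length: endPage - startPage + 1 },
    (_, k) => startPage - 1 + k
  );
}
```

Inside extractPages, this could replace the index loop, called as toPageIndices(startPage, endPage, sourcePdf.getPageCount()), so that a bad range fails with a readable message before any copying starts.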
Split by Page Ranges: One Input, Multiple Outputs
Real-world scenario: a user uploads a 50-page document and wants three separate files — pages 1-10, 11-30, and 31-50.
async function splitByRanges(file, ranges) {
  // ranges = [{ name: 'intro', start: 1, end: 10 }, ...]
  const arrayBuffer = await file.arrayBuffer();
  const sourcePdf = await PDFDocument.load(arrayBuffer);
  const results = [];
  for (const range of ranges) {
    const newPdf = await PDFDocument.create();
    const pageIndices = [];
    for (let i = range.start - 1; i < range.end; i++) {
      pageIndices.push(i);
    }
    const copiedPages = await newPdf.copyPages(sourcePdf, pageIndices);
    copiedPages.forEach(page => newPdf.addPage(page));
    const pdfBytes = await newPdf.save();
    results.push({
      name: range.name,
      blob: new Blob([pdfBytes], { type: 'application/pdf' })
    });
  }
  return results;
}
// Usage
const ranges = [
  { name: 'chapter-1', start: 1, end: 10 },
  { name: 'chapter-2', start: 11, end: 30 },
  { name: 'chapter-3', start: 31, end: 50 }
];
const files = await splitByRanges(pdfFile, ranges);
files.forEach(f => downloadBlob(f.blob, `${f.name}.pdf`));
Memory note: We load the source PDF once and reuse it for every range. Each new document holds only the pages it needs, but the source stays in memory for the whole run and every output Blob is kept in the results array, so peak usage is roughly the input size plus the combined output size.
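In a real UI the ranges usually arrive as text typed into a field rather than as a ready-made array. Here is a sketch of a parser that produces the { name, start, end } shape splitByRanges expects. The input syntax ("1-10, 11-30, 31-50") and the generated names are assumptions for illustration only.

```javascript
// Hypothetical parser for input such as "1-10, 11-30, 31-50" or "7".
// Returns objects in the { name, start, end } shape used by splitByRanges.
function parseRanges(input) {
  return input
    .split(',')
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .map(part => {
      const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
      if (!match) {
        throw new SyntaxError(`Cannot parse page range "${part}"`);
      }
      const start = Number(match[1]);
      // A bare number like "7" means a single-page range
      const end = match[2] === undefined ? start : Number(match[2]);
      if (start < 1 || end < start) {
        throw new RangeError(`Invalid page range "${part}"`);
      }
      return { name: `pages-${start}-${end}`, start, end };
    });
}
```

The parser only checks the syntax and ordering of each range. Checking the upper bound against the document's page count still has to happen once the PDF is loaded.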
Advanced: Split by File Size (Email-Friendly Chunks)
Here's a trickier problem: "Split this PDF into chunks under 20MB each so I can email it."
Unlike page-range splitting, we don't know in advance how many pages will fit in each chunk. We need to accumulate pages until adding one more would exceed the limit, then start a new chunk.
async function splitBySize(file, maxSizeBytes = 20 * 1024 * 1024) {
  const arrayBuffer = await file.arrayBuffer();
  const sourcePdf = await PDFDocument.load(arrayBuffer);
  const totalPages = sourcePdf.getPageCount();
  const chunks = [];
  // 1-based number of the first page in the current chunk
  let currentStartPage = 1;
  for (let i = 0; i < totalPages; i++) {
    // Build a test chunk: every page already in the chunk plus page i
    const testChunk = await PDFDocument.create();
    const pagesToCopy = [];
    for (let j = currentStartPage - 1; j <= i; j++) {
      pagesToCopy.push(j);
    }
    const copiedPages = await testChunk.copyPages(sourcePdf, pagesToCopy);
    copiedPages.forEach(page => testChunk.addPage(page));
    const testBytes = await testChunk.save();
    // The second condition lets a single oversized page form its own chunk
    if (testBytes.length > maxSizeBytes && currentStartPage <= i) {
      // This page would push us over the limit. Finalize the chunk without it.
      const finalChunk = await PDFDocument.create();
      const finalPages = [];
      for (let j = currentStartPage - 1; j < i; j++) {
        finalPages.push(j);
      }
      const finalCopied = await finalChunk.copyPages(sourcePdf, finalPages);
      finalCopied.forEach(page => finalChunk.addPage(page));
      const finalBytes = await finalChunk.save();
      chunks.push(new Blob([finalBytes], { type: 'application/pdf' }));
      // Start new chunk with current page
      currentStartPage = i + 1;
    }
  }
  // Don't forget the last chunk
  if (currentStartPage <= totalPages) {
    const finalChunk = await PDFDocument.create();
    const finalPages = [];
    for (let j = currentStartPage - 1; j < totalPages; j++) {
      finalPages.push(j);
    }
    const finalCopied = await finalChunk.copyPages(sourcePdf, finalPages);
    finalCopied.forEach(page => finalChunk.addPage(page));
    const finalBytes = await finalChunk.save();
    chunks.push(new Blob([finalBytes], { type: 'application/pdf' }));
  }
  return chunks;
}
Why this is expensive: We save a test document on every page to check the size. For a 100-page document, that's at least 100 save operations, and each one re-copies every page already in the chunk. In practice, I optimize this by:
- Estimating page size from the first few pages
- Using binary search instead of linear checking
- Adding a small buffer (aim for 18MB instead of exactly 20MB)
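The binary search idea from the list above can be sketched without tying it to pdf-lib. In this version, measure(start, end) is a hypothetical async callback that builds a chunk from 0-based pages [start, end) and resolves to its saved size in bytes. The search also assumes a chunk never gets smaller when pages are added.

```javascript
// Find the largest `end` (exclusive) so that pages [start, end) fit in maxBytes.
// A single page is always accepted, even if it is over the limit on its own.
async function findChunkEnd(start, totalPages, maxBytes, measure) {
  let lo = start + 1;
  let hi = totalPages;
  let best = start + 1;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    const size = await measure(start, mid);
    if (size <= maxBytes) {
      best = mid;    // fits: try a longer chunk
      lo = mid + 1;
    } else {
      hi = mid - 1;  // too big: try a shorter chunk
    }
  }
  return best;
}

// Plan every chunk as a [start, end) pair of 0-based page indices.
async function planChunks(totalPages, maxBytes, measure) {
  const plan = [];
  let start = 0;
  while (start < totalPages) {
    const end = await findChunkEnd(start, totalPages, maxBytes, measure);
    plan.push([start, end]);
    start = end;
  }
  return plan;
}
```

With pdf-lib, measure would copy the pages into a fresh PDFDocument, call save(), and return the byte length. Each chunk then costs on the order of log2(remaining pages) saves instead of one save per page.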
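For production, a simpler heuristic works well: if the original is 45MB and 50 pages, each page is roughly 0.9MB. Splitting every 20 pages gives chunks of about 18MB, which leaves a margin under the 20MB limit. This assumes pages are roughly equal in size, so check the result when a document mixes text pages with large scans.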
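The arithmetic behind that heuristic fits in a few lines. Both function names below are hypothetical, and the estimate assumes pages are roughly equal in size, which is often not true for mixed scans.

```javascript
// Estimate how many pages fit in one chunk, assuming pages are roughly equal in size.
// targetBytes should sit a little below the real limit to leave headroom.
function pagesPerChunk(totalBytes, totalPages, targetBytes) {
  if (totalBytes <= 0 || totalPages <= 0) {
    throw new RangeError('totalBytes and totalPages must be positive');
  }
  // Same as floor(targetBytes / averagePageSize), arranged to avoid rounding drift
  return Math.max(1, Math.floor((targetBytes * totalPages) / totalBytes));
}

// Turn a fixed chunk length into 1-based, inclusive ranges for splitByRanges.
function fixedSizeRanges(totalPages, perChunk) {
  const ranges = [];
  for (let start = 1; start <= totalPages; start += perChunk) {
    const end = Math.min(start + perChunk - 1, totalPages);
    ranges.push({ name: `part-${ranges.length + 1}`, start, end });
  }
  return ranges;
}

const MB = 1024 * 1024;
pagesPerChunk(45 * MB, 50, 18 * MB); // 20
pagesPerChunk(45 * MB, 50, 20 * MB); // 22
```

Aiming at 18MB rather than the full 20MB gives 20 pages per chunk, so fixedSizeRanges(50, 20) yields three parts: pages 1-20, 21-40, and 41-50.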
Extracting Every Nth Page
Another common pattern: "I only need the odd pages" or "Extract every 5th page for a summary."
async function extractEveryNthPage(file, n, offset = 0) {
  // A step below 1 would make the loop below run forever
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  const arrayBuffer = await file.arrayBuffer();
  const sourcePdf = await PDFDocument.load(arrayBuffer);
  const totalPages = sourcePdf.getPageCount();
  const newPdf = await PDFDocument.create();
  // offset is a 0-based index, so offset 0 starts at page 1
  const pageIndices = [];
  for (let i = offset; i < totalPages; i += n) {
    pageIndices.push(i);
  }
  const copiedPages = await newPdf.copyPages(sourcePdf, pageIndices);
  copiedPages.forEach(page => newPdf.addPage(page));
  const pdfBytes = await newPdf.save();
  return new Blob([pdfBytes], { type: 'application/pdf' });
}
// Extract odd pages only (pages 1, 3, 5, ...)
const oddPages = await extractEveryNthPage(file, 2, 0);
// Extract even pages only (pages 2, 4, 6, ...)
const evenPages = await extractEveryNthPage(file, 2, 1);
Handling Page Rotation and Metadata
When you split a PDF, you might want to preserve or modify metadata:
async function splitWithMetadata(file, startPage, endPage) {
  const arrayBuffer = await file.arrayBuffer();
  const sourcePdf = await PDFDocument.load(arrayBuffer);
  const newPdf = await PDFDocument.create();
  // Copy pages
  const pageIndices = [];
  for (let i = startPage - 1; i < endPage; i++) {
    pageIndices.push(i);
  }
  const copiedPages = await newPdf.copyPages(sourcePdf, pageIndices);
  copiedPages.forEach(page => newPdf.addPage(page));
  // Preserve the original author; replace the other fields
  const author = sourcePdf.getAuthor();
  newPdf.setTitle(`Extracted Pages ${startPage}-${endPage}`);
  newPdf.setAuthor(author || 'Unknown');
  newPdf.setCreator('sotool PDF Splitter');
  newPdf.setProducer('pdf-lib');
  newPdf.setCreationDate(new Date());
  const pdfBytes = await newPdf.save();
  return new Blob([pdfBytes], { type: 'application/pdf' });
}
This is useful when the recipient needs to know where the extract came from.
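The example above covers metadata only. For rotation, copied pages should keep whatever rotation they had in the source, as far as I can tell from pdf-lib's behavior, so the usual need is to change it, for example to fix a scan that came in sideways. The sketch below uses PDFPage's getRotation() and setRotation(). The rotatePages name is hypothetical, and pdf-lib's degrees helper is passed in as an argument purely to keep the snippet free of imports.

```javascript
// Add `delta` degrees (a multiple of 90) to each page's existing rotation.
// `pages` are PDFPage objects, e.g. the array returned by copyPages().
// `degrees` is the helper exported by pdf-lib, passed in by the caller.
function rotatePages(pages, delta, degrees) {
  if (!Number.isInteger(delta) || delta % 90 !== 0) {
    throw new RangeError('Rotation must be a multiple of 90 degrees');
  }
  for (const page of pages) {
    const current = page.getRotation().angle;
    // Normalize into 0..359 so repeated rotations never drift
    const next = (((current + delta) % 360) + 360) % 360;
    page.setRotation(degrees(next));
  }
}

// Usage inside a split function, after copyPages():
//   import { PDFDocument, degrees } from 'pdf-lib';
//   rotatePages(copiedPages, 90, degrees);
```

Adding to the existing angle rather than overwriting it means a page that was already rotated in the source ends up where the user expects.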
Performance: The 200-Page Test
I tested the page-range splitter on a 200-page, 85MB scanned document:
| Operation | Time | Memory Peak |
|---|---|---|
| Load PDF | 2.1s | 180MB |
| Extract pages 50-100 | 1.8s | 95MB |
| Split into 4 chunks | 4.2s | 210MB |
| Extract every 10th page | 0.9s | 45MB |
The key insight: copyPages() is fast because it copies only the requested pages and the resources they reference, such as fonts, images, and content streams. You're not cloning the entire document for each output.
Optimization tip: If you're splitting a PDF into many small chunks, load the source once and reuse the PDFDocument instance. Don't reload it for each chunk.
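To take similar timings on your own documents, a small wrapper around each operation is enough. This is a hypothetical helper, not the harness used for the table above, and it measures elapsed time only, not memory.

```javascript
// Hypothetical helper: run an async operation and report how long it took.
async function timed(label, operation) {
  const startedAt = performance.now();
  const result = await operation();
  const elapsedMs = performance.now() - startedAt;
  console.log(`${label}: ${(elapsedMs / 1000).toFixed(2)}s`);
  return { result, elapsedMs };
}

// Usage:
//   const { result: blob } = await timed('Extract pages 50-100',
//     () => extractPages(file, 50, 100));
```

Run each measurement a few times and in a fresh tab where possible, since caching and garbage collection can skew a single run.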
Limitations (Honest Assessment)
| Limitation | Why | Workaround |
|---|---|---|
| Bookmarks lost | Splitting creates new document trees | Acceptable for most extracts |
| Internal links break | Page references change | Rarely an issue for extracts |
| Large scans choke | Browser memory limits | Split server-side for 500MB+ files |
| Form data complexity | Some forms have cross-page dependencies | Test thoroughly with fillable PDFs |
For 95% of use cases — extracting a chapter, splitting a contract, creating email-friendly chunks — these limitations don't matter.
Try the Live Tool
If you want to test browser-based PDF splitting without writing code:
Features:
- Extract specific page ranges
- Split by fixed page count (every N pages)
- Split by file size (email-friendly chunks)
- Visual page thumbnails for selection
- Pure browser-side processing
- Free, no signup
The code is open source if you want to see how the Vue 3 + pdf-lib integration works.
What's Next?
If you enjoyed this post, check out the companion piece on merging PDFs in the browser.
Next up in the series:
- Compressing PDFs by downsampling embedded images
- Adding watermarks with text and image overlays
- Encrypting and password-protecting PDFs client-side
Have you built browser-based document tools? What's your approach to handling large files or complex page operations? Let me know in the comments.