sunshey

Posted on Jun 4

How to Split PDF Files in the Browser (No Server Required)

#webdev #javascript #pdf #tutorial

In the last post, I covered merging PDFs in the browser using pdf-lib. Today we're doing the opposite: splitting them.

Splitting is surprisingly common:

A 200-page report where you only need chapters 3 and 7
A scanned contract where the signature page needs to go to legal
A PDF that's too large to email, so you split it into two attachments
Extracting specific pages from a bulk download

Just like merging, we'll do this entirely in the browser — no server uploads, no privacy trade-offs.

Why Browser-Side Splitting Matters

The same privacy argument applies. When you upload a PDF to an online splitter, your file leaves your device. For documents containing personal data, financial records, or client contracts, that's a risk you don't need to take.

Browser-side splitting keeps everything local. The PDF loads into your browser tab, pdf-lib manipulates it in memory, and you download the result. The server never sees the content.

The trade-off: Browser memory. A 300MB scanned document will strain a tab. For typical office documents (under 50-100MB), splitting is instant.
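Since memory is the limiting factor, it can help to look at the file size before loading anything. The sketch below is only an illustration: classifyPdfSize and both thresholds are hypothetical, picked from the rough figures in this post rather than from measured browser limits.

```javascript
// Assumed thresholds, taken from the rough numbers above (not measured limits).
const COMFORTABLE_BYTES = 100 * 1024 * 1024; // upper end of "typical office documents"
const RISKY_BYTES = 300 * 1024 * 1024;       // the size said to strain a tab

// Classify a file size before handing the file to pdf-lib.
function classifyPdfSize(sizeInBytes) {
  if (sizeInBytes <= COMFORTABLE_BYTES) return 'ok';
  if (sizeInBytes < RISKY_BYTES) return 'slow';
  return 'risky';
}

// With a File from <input type="file">: classifyPdfSize(file.size)
```

A UI could use the result to show a warning, or to suggest splitting server-side, before the tab runs out of memory.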

Basic Split: Extract a Single Page Range

The simplest use case: "Give me pages 5 through 10."

import { PDFDocument } from 'pdf-lib';

async function extractPages(file, startPage, endPage) {
  // Load the source PDF
  const arrayBuffer = await file.arrayBuffer();
  const sourcePdf = await PDFDocument.load(arrayBuffer);

  // Create a new empty PDF
  const newPdf = await PDFDocument.create();

  // Convert to 0-based indices
  const pageIndices = [];
  for (let i = startPage - 1; i < endPage; i++) {
    pageIndices.push(i);
  }

  // Copy pages from source to new document
  const copiedPages = await newPdf.copyPages(sourcePdf, pageIndices);
  copiedPages.forEach(page => newPdf.addPage(page));

  // Save and return
  const pdfBytes = await newPdf.save();
  return new Blob([pdfBytes], { type: 'application/pdf' });
}

// Usage: extract pages 5-10
const blob = await extractPages(file, 5, 10);
downloadBlob(blob, 'pages-5-to-10.pdf');

What's happening:

Load the full source PDF into memory
Create a blank destination PDF
Copy only the requested pages using copyPages()
Save the new, smaller document

The source PDF isn't modified — we're creating a new document containing only the pages we want.
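The usage snippet above calls downloadBlob(), which is not part of pdf-lib and isn't defined in this post. A minimal sketch of such a helper, assuming a standard browser environment (the one-second delay before revoking the URL is my own cautious choice, not a requirement):

```javascript
// Trigger a browser download for a Blob using a temporary object URL.
function downloadBlob(blob, filename) {
  const url = URL.createObjectURL(blob);

  // A hidden <a download> element is the usual way to name the saved file
  const link = document.createElement('a');
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  link.remove();

  // Release the object URL after giving the browser a moment to start the download
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}
```

Revoking the URL matters when a page produces many downloads, because each unreleased object URL keeps its Blob alive in memory.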

Split by Page Ranges: One Input, Multiple Outputs

Real-world scenario: a user uploads a 50-page document and wants three separate files — pages 1-10, 11-30, and 31-50.

async function splitByRanges(file, ranges) {
  // ranges = [{ name: 'intro', start: 1, end: 10 }, ...]
  const arrayBuffer = await file.arrayBuffer();
  const sourcePdf = await PDFDocument.load(arrayBuffer);

  const results = [];

  for (const range of ranges) {
    const newPdf = await PDFDocument.create();

    const pageIndices = [];
    for (let i = range.start - 1; i < range.end; i++) {
      pageIndices.push(i);
    }

    const copiedPages = await newPdf.copyPages(sourcePdf, pageIndices);
    copiedPages.forEach(page => newPdf.addPage(page));

    const pdfBytes = await newPdf.save();
    results.push({
      name: range.name,
      blob: new Blob([pdfBytes], { type: 'application/pdf' })
    });
  }

  return results;
}

// Usage
const ranges = [
  { name: 'chapter-1', start: 1, end: 10 },
  { name: 'chapter-2', start: 11, end: 30 },
  { name: 'chapter-3', start: 31, end: 50 }
];

const files = await splitByRanges(pdfFile, ranges);
files.forEach(f => downloadBlob(f.blob, `${f.name}.pdf`));

Memory note: We load and parse the source PDF once and reuse it for every range. The source stays in memory for the whole run, and each new document adds only the pages it copies, so peak usage is roughly the source plus whatever outputs are still held in results.
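The ranges array above is written by hand. In a real UI it usually comes from something the user typed, such as "1-10, 11-30, 31-50". Here is a rough sketch of turning that text into the shape splitByRanges expects. parseRanges and the generated names are hypothetical, and the accepted syntax (comma-separated pages or start-end pairs) is an assumption.

```javascript
// Parse text like "1-10, 11-30, 45" into [{ name, start, end }, ...] (1-based, inclusive).
function parseRanges(input, pageCount) {
  return input
    .split(',')
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .map(part => {
      // Accept either a single page ("7") or a start-end pair ("11-30")
      const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
      if (!match) throw new SyntaxError(`Cannot parse page range "${part}"`);

      const start = Number(match[1]);
      const end = match[2] === undefined ? start : Number(match[2]);

      // Reject reversed ranges and pages the document does not have
      if (start < 1 || end < start || end > pageCount) {
        throw new RangeError(`Range "${part}" is outside pages 1-${pageCount}`);
      }
      return { name: `pages-${start}-${end}`, start, end };
    });
}
```

The pageCount argument would come from sourcePdf.getPageCount(), so bad input is rejected before any pages are copied.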

Advanced: Split by File Size (Email-Friendly Chunks)

Here's a trickier problem: "Split this PDF into chunks under 20MB each so I can email it."

Unlike page-range splitting, we don't know in advance how many pages fit in each chunk. We need to accumulate pages until adding one more would exceed the limit, then start a new chunk.

async function splitBySize(file, maxSizeBytes = 20 * 1024 * 1024) {
  const arrayBuffer = await file.arrayBuffer();
  const sourcePdf = await PDFDocument.load(arrayBuffer);
  const totalPages = sourcePdf.getPageCount();

  const chunks = [];
  let currentStartPage = 1; // 1-based page number where the current chunk begins

  for (let i = 0; i < totalPages; i++) {
    // Try adding this page to the current chunk
    const testChunk = await PDFDocument.create();

    // Copy all pages currently in the chunk plus the new one
    const pagesToCopy = [];
    for (let j = currentStartPage - 1; j <= i; j++) {
      pagesToCopy.push(j);
    }

    const copiedPages = await testChunk.copyPages(sourcePdf, pagesToCopy);
    copiedPages.forEach(page => testChunk.addPage(page));

    const testBytes = await testChunk.save();

    if (testBytes.length > maxSizeBytes && currentStartPage <= i) {
      // This page would push us over the limit. Finalize current chunk.
      const finalChunk = await PDFDocument.create();
      const finalPages = [];
      for (let j = currentStartPage - 1; j < i; j++) {
        finalPages.push(j);
      }
      const finalCopied = await finalChunk.copyPages(sourcePdf, finalPages);
      finalCopied.forEach(page => finalChunk.addPage(page));

      const finalBytes = await finalChunk.save();
      chunks.push(new Blob([finalBytes], { type: 'application/pdf' }));

      // Start new chunk with current page
      currentStartPage = i + 1;
    }
  }

  // Don't forget the last chunk
  if (currentStartPage <= totalPages) {
    const finalChunk = await PDFDocument.create();
    const finalPages = [];
    for (let j = currentStartPage - 1; j < totalPages; j++) {
      finalPages.push(j);
    }
    const finalCopied = await finalChunk.copyPages(sourcePdf, finalPages);
    finalCopied.forEach(page => finalChunk.addPage(page));

    const finalBytes = await finalChunk.save();
    chunks.push(new Blob([finalBytes], { type: 'application/pdf' }));
  }

  return chunks;
}

Why this is expensive: We rebuild and save a test document for every page just to check the size. For a 100-page document, that's 100 save operations, and each one re-copies every page already in the current chunk. In practice, I optimize this by:

Estimating page size from the first few pages
Using binary search instead of linear checking
Adding a small buffer (aim for 18MB instead of exactly 20MB)
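To make the binary-search idea concrete, here is a minimal sketch. It is not the code from the live tool. largestFittingEnd is a hypothetical name, and measureBytes is an assumed callback that builds and saves pages [start, end) of the source and returns the byte length. The search also assumes that adding pages never makes the saved file smaller.

```javascript
// Find the largest `end` (exclusive, 0-based) such that pages [start, end) fit in maxBytes.
// Always returns at least start + 1, so an oversized single page still gets its own chunk.
async function largestFittingEnd(start, totalPages, maxBytes, measureBytes) {
  let lo = start + 1;   // smallest candidate: one page
  let hi = totalPages;  // largest candidate: everything that is left
  let best = start + 1;

  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    if ((await measureBytes(start, mid)) <= maxBytes) {
      best = mid;   // fits, so try to take more pages
      lo = mid + 1;
    } else {
      hi = mid - 1; // too big, so try fewer pages
    }
  }
  return best;
}
```

Each chunk then costs about log2(n) saves instead of one save per page. Passing 18MB as maxBytes for a 20MB limit covers the buffer from the list above.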

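For production, a simpler heuristic works well: if the original is 45MB and 50 pages, each page averages roughly 0.9MB. Splitting every 20 pages gives chunks of about 18MB, safely under 20MB. This assumes pages are similar in size, so check the outputs when a few pages carry large images.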
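The arithmetic behind that heuristic fits in two small functions. This is a sketch under the same assumption (pages of roughly equal size). pagesPerChunk and fixedSizeRanges are hypothetical names, and the returned objects use the { name, start, end } shape from splitByRanges. With an 18MB budget, a 45MB, 50-page file works out to 20 pages per chunk.

```javascript
// Estimate how many pages fit in one chunk from the average page size.
function pagesPerChunk(totalBytes, pageCount, budgetBytes) {
  // Same as budgetBytes / (totalBytes / pageCount), written as a single division
  const estimate = Math.floor((budgetBytes * pageCount) / totalBytes);
  return Math.max(1, estimate); // never return zero pages per chunk
}

// Build 1-based, inclusive ranges of `perChunk` pages each; the last one may be shorter.
function fixedSizeRanges(pageCount, perChunk) {
  const ranges = [];
  for (let start = 1; start <= pageCount; start += perChunk) {
    const end = Math.min(start + perChunk - 1, pageCount);
    ranges.push({ name: `part-${ranges.length + 1}`, start, end });
  }
  return ranges;
}

const MB = 1024 * 1024;
const perChunk = pagesPerChunk(45 * MB, 50, 18 * MB); // 20
const emailRanges = fixedSizeRanges(50, perChunk);    // pages 1-20, 21-40, 41-50
```

The resulting array can go straight into splitByRanges, which avoids the repeated test saves entirely.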

Extracting Every Nth Page

Another common pattern: "I only need the odd pages" or "Extract every 5th page for a summary."

async function extractEveryNthPage(file, n, offset = 0) {
  const arrayBuffer = await file.arrayBuffer();
  const sourcePdf = await PDFDocument.load(arrayBuffer);
  const totalPages = sourcePdf.getPageCount();

  const newPdf = await PDFDocument.create();
  const pageIndices = [];

  for (let i = offset; i < totalPages; i += n) {
    pageIndices.push(i);
  }

  const copiedPages = await newPdf.copyPages(sourcePdf, pageIndices);
  copiedPages.forEach(page => newPdf.addPage(page));

  const pdfBytes = await newPdf.save();
  return new Blob([pdfBytes], { type: 'application/pdf' });
}

// Extract odd pages only
const oddPages = await extractEveryNthPage(file, 2, 0);

// Extract even pages only
const evenPages = await extractEveryNthPage(file, 2, 1);
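One thing to watch: with n = 0 the loop in extractEveryNthPage never advances, so it keeps pushing the same index until the tab runs out of memory. A small guard around the index calculation avoids that. everyNthIndices is a hypothetical helper, shown here as a sketch:

```javascript
// Return the 0-based indices offset, offset + n, offset + 2n, ... below totalPages.
function everyNthIndices(totalPages, n, offset = 0) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n must be a positive integer');
  }
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError('offset must be a non-negative integer');
  }

  const indices = [];
  for (let i = offset; i < totalPages; i += n) {
    indices.push(i);
  }
  return indices;
}
```

Because offset is a 0-based index, offset 0 with n = 2 selects pages 1, 3, 5 and so on, which is why the "odd pages" call above passes 0.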

Handling Page Rotation and Metadata

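When you split a PDF, pages copied with copyPages() keep their original rotation, so the part that usually needs attention is metadata, which you can preserve or modify: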

async function splitWithMetadata(file, startPage, endPage) {
  const arrayBuffer = await file.arrayBuffer();
  const sourcePdf = await PDFDocument.load(arrayBuffer);
  const newPdf = await PDFDocument.create();

  // Copy pages
  const pageIndices = [];
  for (let i = startPage - 1; i < endPage; i++) {
    pageIndices.push(i);
  }
  const copiedPages = await newPdf.copyPages(sourcePdf, pageIndices);
  copiedPages.forEach(page => newPdf.addPage(page));

  // Carry over the original author; the remaining fields describe the new file
  const author = sourcePdf.getAuthor();

  newPdf.setTitle(`Extracted Pages ${startPage}-${endPage}`);
  newPdf.setAuthor(author || 'Unknown');
  newPdf.setCreator('sotool PDF Splitter');
  newPdf.setProducer('pdf-lib');
  newPdf.setCreationDate(new Date());

  const pdfBytes = await newPdf.save();
  return new Blob([pdfBytes], { type: 'application/pdf' });
}

This is useful when the recipient needs to know where the extract came from.
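For the rotation side, scanned pages sometimes come in sideways and are worth fixing during the split. The sketch below uses pdf-lib's page.getRotation() and page.setRotation(). rotatedAngle and rotatePages are hypothetical helpers, and degrees is pdf-lib's exported helper, passed in as an argument only so the example has no imports.

```javascript
// Add a turn to an existing rotation and normalize the result to 0, 90, 180, or 270.
function rotatedAngle(currentAngle, turnBy) {
  if (!Number.isInteger(turnBy) || turnBy % 90 !== 0) {
    throw new RangeError('turnBy must be a multiple of 90');
  }
  return (((currentAngle + turnBy) % 360) + 360) % 360;
}

// `pages` are pdf-lib PDFPage objects, for example the array returned by copyPages().
function rotatePages(pages, turnBy, degrees) {
  for (const page of pages) {
    const current = page.getRotation().angle; // rotation already stored on the page
    page.setRotation(degrees(rotatedAngle(current, turnBy)));
  }
}
```

In splitWithMetadata this could run on copiedPages before they are added, as rotatePages(copiedPages, 90, degrees), with degrees imported from 'pdf-lib' alongside PDFDocument.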

Performance: The 200-Page Test

I tested the page-range splitter on a 200-page, 85MB scanned document:

Operation	Time	Memory Peak
Load PDF	2.1s	180MB
Extract pages 50-100	1.8s	95MB
Split into 4 chunks	4.2s	210MB
Extract every 10th page	0.9s	45MB

The key insight: copyPages() is fast because it copies only the objects the selected pages reference (content streams, fonts, images), not the whole source document. You're not deep-cloning the entire document for each output.

Optimization tip: If you're splitting a PDF into many small chunks, load the source once and reuse the PDFDocument instance. Don't reload it for each chunk.
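If you want to collect timings like the ones in the table for your own files, a small wrapper around performance.now() is enough. This is a sketch, not the harness behind the numbers above. timeIt is a hypothetical helper, and it measures elapsed time only, not memory.

```javascript
// Run an async task, log how long it took, and hand back its result.
async function timeIt(label, task) {
  const startedAt = performance.now();
  const result = await task();
  const elapsedMs = performance.now() - startedAt;

  console.log(`${label}: ${(elapsedMs / 1000).toFixed(2)}s`);
  return { result, elapsedMs };
}

// Example, assuming `file` comes from a file input:
// const { result: blob } = await timeIt('Extract pages 50-100', () => extractPages(file, 50, 100));
```

Run each measurement a few times and ignore the first one, since the first load often includes one-off costs such as reading the file from disk.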

Limitations (Honest Assessment)

Limitation	Why	Workaround
Bookmarks lost	Splitting creates new document trees	Acceptable for most extracts
Internal links break	Page references change	Rarely an issue for extracts
Large scans choke	Browser memory limits	Split server-side for 500MB+ files
Form data complexity	Some forms have cross-page dependencies	Test thoroughly with fillable PDFs

For most everyday use cases (extracting a chapter, splitting a contract, creating email-friendly chunks), these limitations don't matter.

Try the Live Tool

If you want to test browser-based PDF splitting without writing code:

👉 en.sotool.top/split

Features:

Extract specific page ranges
Split by fixed page count (every N pages)
Split by file size (email-friendly chunks)
Visual page thumbnails for selection
Pure browser-side processing
Free, no signup

The tool is open source if you want to see how the Vue 3 + pdf-lib integration works.

What's Next?

If you enjoyed this post, check out the companion piece on merging PDFs in the browser.

Next up in the series:

Compressing PDFs by downsampling embedded images
Adding watermarks with text and image overlays
Encrypting and password-protecting PDFs client-side

Have you built browser-based document tools? What's your approach to handling large files or complex page operations? Let me know in the comments.
