monkeymore studio

Posted on Apr 9

Building a Browser-Based PDF Splitting Tool with pdf-lib and JSZip

#webdev #javascript #tutorial #frontend

In this article, we'll explore how to implement a pure client-side PDF splitting tool that runs entirely in the browser. The tool can split a PDF into smaller files by size, or cut every page in half vertically or horizontally. That makes it useful for managing large documents and creating printer-friendly layouts.

Why Browser-Based PDF Splitting?

Traditional PDF splitting typically requires:

Uploading large files to a server
Backend processing with storage limitations
Downloading multiple split files

Browser-based processing avoids these issues:

✅ Files never leave your computer - complete privacy
✅ No upload size limits (the practical ceiling is the memory available to the browser tab)
✅ No network round-trips before processing starts
✅ Zero server costs
✅ Automatic ZIP packaging for multiple output files

The Challenge: Multiple Splitting Strategies

This tool supports three different splitting approaches:

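By File Size: Split large PDFs into smaller chunks (e.g., 20MB each); the "pages" option reuses this with a limit of 0, which puts every page in its own file
Vertical Split: Cut each page into a top half and a bottom half, each becoming a separate page
Horizontal Split: Cut each page into a left half and a right half, each becoming a separate page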

Each strategy requires different PDF manipulation techniques.

Architecture Overview

Key Data Structures

Split Type Options

// Split strategy types
const splitTypes = [
  { name: "pages", title: "Pages" },           // One page per file (size limit reset to 0)
  { name: "horizontal", title: "Horizontal" }, // Left/right halves of each page
  { name: "vertical", title: "Vertical" },     // Top/bottom halves of each page
  { name: "size", title: "By Size" },          // Chunks up to a user-entered size
] as const;

// Size units
const sizeUnits = ["MB", "KB"] as const;
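Because both arrays are declared `as const`, TypeScript can derive literal union types from them, which is handy for validating the plain string a radio group hands back. A minimal sketch; `SplitType`, `SizeUnit`, `isSplitType`, and `isSizeUnit` are hypothetical names, not part of the original code:

```typescript
// Re-declared (names only) so this sketch is self-contained
const splitTypes = [
  { name: "pages" },
  { name: "horizontal" },
  { name: "vertical" },
  { name: "size" },
] as const;
const sizeUnits = ["MB", "KB"] as const;

// Hypothetical derived unions: "pages" | "horizontal" | "vertical" | "size"
type SplitType = (typeof splitTypes)[number]["name"];
// "MB" | "KB"
type SizeUnit = (typeof sizeUnits)[number];

// Hypothetical type guard for untyped strings (e.g. a radio onValueChange value)
function isSplitType(value: string): value is SplitType {
  return splitTypes.some((entry) => entry.name === value);
}

// Same idea for the unit toggle
function isSizeUnit(value: string): value is SizeUnit {
  return (sizeUnits as readonly string[]).includes(value);
}
```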

WorkerFunctions Interface

// hooks/usepdflib.ts
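interface WorkerFunctions {
  // Size-based split: resolves to a ZIP archive containing the parts
  split: (file: File, maxSizeKb: number) => Promise<ArrayBuffer | null>;
  // Half-page splits: each resolves to a single PDF with two pages per original page
  splitPagesVertically: (file: File) => Promise<ArrayBuffer | null>;
  splitPagesHorizontally: (file: File) => Promise<ArrayBuffer | null>;
  // ... other functions
}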

Split Result Structure

// Internal structure for split PDFs
interface SplitPdfInfo {
  name: string;        // Filename with page range
  bytes: Uint8Array;   // PDF bytes
}

// Example output:
// { name: "part1_pages1-5.pdf", bytes: Uint8Array }
// { name: "part2_pages6-10.pdf", bytes: Uint8Array }
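The naming scheme in the example output can be written as a tiny helper. This is a sketch; `partName` is a hypothetical name, since the worker builds these strings inline with template literals.

```typescript
// Hypothetical helper reproducing the worker's naming scheme:
// part number plus the 1-based, inclusive page range of the chunk
function partName(partNum: number, firstPage: number, lastPage: number): string {
  return `part${partNum}_pages${firstPage}-${lastPage}.pdf`;
}

// Shape of one entry, as in SplitPdfInfo
interface SplitPdfInfo {
  name: string;
  bytes: Uint8Array;
}

const example: SplitPdfInfo = {
  name: partName(1, 1, 5),
  bytes: new Uint8Array(0), // Placeholder bytes for illustration
};
console.log(example.name); // part1_pages1-5.pdf
```

A chunk holding a single page gets a name such as `part3_pages7-7.pdf`, because the worker always writes both ends of the range.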

Implementation Deep Dive

1. User Interface Component

The split component provides multiple splitting options:

// app/[locale]/_components/qpdf/split.tsx
import { useState } from "react";
import { useTranslations } from "next-intl"; // Assumed: translations via next-intl
// PdfPage, Radio, useInputValue, usePdflib and autoDownloadBlob are
// project-internal helpers; their import paths are not shown in the original.

export const Split = () => {
  const [files, setFiles] = useState<File[]>([]);
  const [splitType, setSplitType] = useState("pages");
  const t = useTranslations("Split");

  const {
    value: splitSize,
    onChange: handleSplitSizeChange,
    setValue: setSplitSize,
  } = useInputValue<number>(0);
  const [splitSizeUnit, setSplitSizeUnit] = useState("MB");

  const { split, splitPagesVertically, splitPagesHorizontally } = usePdflib();

  // Not shown in the original excerpt; assumed to receive the selected files
  const onPdfFiles = (selected: File[]) => setFiles(selected);

  const splitInMain = async () => {
    const file = files[0];
    if (!file) return; // Nothing selected yet
    console.log("Processing PDF split:", file.name);

    if (splitType === "vertical") {
      // Top and bottom halves of every page, returned as a single PDF
      const outputFile = await splitPagesVertically(file);
      if (outputFile) {
        autoDownloadBlob(new Blob([outputFile]), "split.pdf");
      }
    } else if (splitType === "horizontal") {
      // Left and right halves of every page, returned as a single PDF
      const outputFile = await splitPagesHorizontally(file);
      if (outputFile) {
        autoDownloadBlob(new Blob([outputFile]), "split.pdf");
      }
    } else if (splitType === "pages" || splitType === "size") {
      // The worker expects kilobytes. "pages" resets the size to 0,
      // which puts every page in its own file.
      const maxSizeKb = splitSizeUnit === "MB" ? splitSize * 1024 : splitSize;
      const outputFile = await split(file, maxSizeKb);
      if (outputFile) {
        autoDownloadBlob(new Blob([outputFile]), "split.zip");
      }
    }
  };

  const changeUnit = () => {
    setSplitSizeUnit(splitSizeUnit === "MB" ? "KB" : "MB");
  };

  return (
    <PdfPage
      process={splitInMain}
      onFiles={onPdfFiles}
      multiple={false}
      title={t("title")}
      desp={t("desp")}
    >
      <div className="p-5">
        <Radio
          defaultValue="pages"
          values={[
            { name: "pages", title: t("pages") },
            { name: "horizontal", title: t("horizontal") },
            { name: "vertical", title: t("vertical") },
            { name: "size", title: t("size") },
          ]}
          onValueChange={(e) => {
            setSplitType(e);
            if (e === "pages") {
              setSplitSize(0); // Size 0: one page per output file
            }
          }}
        />

        {splitType === "size" && (
          <>
            <label className="label">{t("size_desp")}</label>
            <div className="flex">
              <input
                type="number"
                className="input validator"
                required
                placeholder="Type a number"
                onChange={handleSplitSizeChange}
                min="1"
                max="1000"
              />
              <button
                type="button"
                className="btn btn-primary join-item"
                onClick={changeUnit}
              >
                {splitSizeUnit}
              </button>
            </div>
            <p className="validator-hint">{t("size_value")}</p>
          </>
        )}
      </div>
    </PdfPage>
  );
};

Key features:

Radio button selection for split type
Size input with MB/KB unit toggle
Different output formats: one PDF for the vertical and horizontal splits, a ZIP archive for the page and size splits
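The component's mode handling boils down to a small decision table, which can be pulled out as a pure function and tested without React. A sketch under that assumption; `toKb`, `planSplit`, and `SplitPlan` are hypothetical names, while the KB conversion, the size-0 reset for "pages", and the download names come from the component.

```typescript
type SplitType = "pages" | "horizontal" | "vertical" | "size";
type SizeUnit = "MB" | "KB";

// Hypothetical description of what the component does for each mode
interface SplitPlan {
  worker: "split" | "splitPagesVertically" | "splitPagesHorizontally";
  maxSizeKb?: number; // Only used by the size-based worker
  downloadName: "split.zip" | "split.pdf";
}

// Same conversion as the component: the worker expects kilobytes
function toKb(value: number, unit: SizeUnit): number {
  return unit === "MB" ? value * 1024 : value;
}

function planSplit(type: SplitType, size: number, unit: SizeUnit): SplitPlan {
  switch (type) {
    case "pages":
      // The component resets the size to 0 for "pages"
      return { worker: "split", maxSizeKb: 0, downloadName: "split.zip" };
    case "size":
      return { worker: "split", maxSizeKb: toKb(size, unit), downloadName: "split.zip" };
    case "vertical":
      return { worker: "splitPagesVertically", downloadName: "split.pdf" };
    case "horizontal":
      return { worker: "splitPagesHorizontally", downloadName: "split.pdf" };
    default: {
      // Compile-time exhaustiveness check
      const unreachable: never = type;
      throw new Error(`Unknown split type: ${String(unreachable)}`);
    }
  }
}
```

The `never` check makes the compiler flag any new split type that the switch does not handle yet.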

2. Size-Based Splitting Algorithm

Splits a PDF into multiple files based on maximum size:

// hooks/pdflib.worker.js
import { PDFDocument } from "pdf-lib";

async function splitPdf(inputFile, maxSizeKb) {
  // Convert KB to bytes, keeping a 5% safety margin below the limit
  const MAX_SIZE_BYTES = maxSizeKb * 1024 * 0.95;

  // Read the input PDF
  const pdfBytes = await inputFile.arrayBuffer();
  const originalPdf = await PDFDocument.load(pdfBytes);
  const totalPages = originalPdf.getPageCount();
  const splitPdfs = [];

  // Pre-calculate size of each individual page
  const pageSizes = [];
  for (let i = 0; i < totalPages; i++) {
    const tempPdf = await PDFDocument.create();
    const [copiedPage] = await tempPdf.copyPages(originalPdf, [i]);
    tempPdf.addPage(copiedPage);
    pageSizes.push((await tempPdf.save()).length);
  }

  // Split logic: Group pages into chunks that fit the size limit
  let currentPdf = await PDFDocument.create();
  let currentTotalSize = 0;
  let partNum = 1;
  let startPageIdx = 0;

  for (let i = 0; i < totalPages; i++) {
    const currentPageSize = pageSizes[i];

    // Check if adding this page would exceed the limit
    if (
      currentTotalSize + currentPageSize > MAX_SIZE_BYTES &&
      currentPdf.getPageCount() > 0
    ) {
      // Save current chunk
      const pdfBytes = await currentPdf.save();
      splitPdfs.push({
        name: `part${partNum}_pages${startPageIdx + 1}-${i}.pdf`,
        bytes: pdfBytes,
      });

      // Start new chunk
      currentPdf = await PDFDocument.create();
      currentTotalSize = 0;
      partNum++;
      startPageIdx = i;
    }

    // Add current page to chunk
    const [copiedPage] = await currentPdf.copyPages(originalPdf, [i]);
    currentPdf.addPage(copiedPage);
    currentTotalSize += currentPageSize;

    // Handle last page
    if (i === totalPages - 1) {
      const pdfBytes = await currentPdf.save();
      splitPdfs.push({
        name: `part${partNum}_pages${startPageIdx + 1}-${i + 1}.pdf`,
        bytes: pdfBytes,
      });
    }
  }

  return splitPdfs;
}

Algorithm explanation:

Pre-calculation: Measure each page by saving it as a standalone one-page PDF
Greedy grouping: Add pages to the current chunk until the next page would push it over the limit
Chunk creation: Save the current chunk and start a new one
Naming: Generate descriptive filenames with 1-based page ranges
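To check the grouping logic without loading a PDF, the same greedy loop can be run over plain byte counts. This is a sketch that assumes the page sizes are already known; `groupPages` and `PageRange` are hypothetical names, while the 5% margin and the "never emit an empty chunk" guard are copied from `splitPdf`.

```typescript
interface PageRange {
  first: number; // 1-based, inclusive
  last: number;  // 1-based, inclusive
}

// Greedy grouping over pre-measured page sizes (bytes), mirroring the
// worker loop but without pdf-lib so it can run anywhere
function groupPages(pageSizes: number[], maxSizeKb: number): PageRange[] {
  const maxBytes = maxSizeKb * 1024 * 0.95; // Same 5% safety margin
  const ranges: PageRange[] = [];
  let start = 0; // 0-based index of the first page in the current chunk
  let total = 0; // Estimated bytes in the current chunk

  pageSizes.forEach((size, i) => {
    // Close the current chunk if this page would overflow it,
    // but only if the chunk already holds at least one page
    if (total + size > maxBytes && i > start) {
      ranges.push({ first: start + 1, last: i });
      start = i;
      total = 0;
    }
    total += size;
  });

  // The final chunk always holds the remaining pages
  if (pageSizes.length > 0) {
    ranges.push({ first: start + 1, last: pageSizes.length });
  }
  return ranges;
}
```

`groupPages([400, 400, 400], 1)` returns `[{ first: 1, last: 2 }, { first: 3, last: 3 }]`: the limit works out to 972.8 bytes, so the third page starts a new chunk.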

3. ZIP Packaging

Packages multiple PDFs into a single ZIP file:

// hooks/pdflib.worker.js
import JSZip from "jszip";
import * as Comlink from "comlink";

async function zipSplitPdfs(splitPdfs, originalFileName) {
  // originalFileName is not used in this excerpt
  const zip = new JSZip();

  // Add each PDF to the ZIP
  for (const pdf of splitPdfs) {
    zip.file(pdf.name, pdf.bytes);
  }

  // Generate the archive as an ArrayBuffer with DEFLATE compression
  const zipBuffer = await zip.generateAsync({
    type: "arraybuffer",
    compression: "DEFLATE",
    compressionOptions: { level: 6 },
  });

  console.log(
    `ZIP generated: ${splitPdfs.length} PDF files`,
    zipBuffer.byteLength,
  );

  // Transfer (not copy) the buffer back to the main thread
  return Comlink.transfer(zipBuffer, [zipBuffer]);
}

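JSZip configuration:

type: "arraybuffer": Output as an ArrayBuffer, which can be transferred to the main thread without copying
compression: "DEFLATE": Standard ZIP compression (JSZip stores files uncompressed by default)
level: 6: Balanced compression on a scale from 1 (fastest) to 9 (smallest output)

Because PDFs usually contain streams that are already compressed, the size reduction from DEFLATE is typically modest.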
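The `Comlink.transfer(...)` call at the end of `zipSplitPdfs` marks the buffer for transfer, so `postMessage` moves the bytes to the main thread instead of copying them, and the worker's reference becomes detached. The sketch below observes the same effect without a worker by calling `structuredClone` with a `transfer` list, which uses the same structured-clone-with-transfer mechanism as `postMessage` (available in modern browsers and Node 17+). It is an illustration, not Comlink's internals.

```typescript
// A 1 KiB buffer standing in for the generated ZIP
const zipBuffer = new ArrayBuffer(1024);

// Clone with a transfer list, as postMessage does for transferables
const received = structuredClone(zipBuffer, { transfer: [zipBuffer] });

console.log(received.byteLength);  // 1024: the receiver owns the bytes
console.log(zipBuffer.byteLength); // 0: the sender's buffer is detached
```

This is also why the worker must not touch the buffer after returning it.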

4. Vertical Page Splitting

Cuts each page into a top half and a bottom half, each becoming its own page:

// hooks/pdflib.worker.js
import { PDFDocument, StandardFonts } from "pdf-lib";

async function splitPagesVertically(file) {
  const inputPdfBytes = await file.arrayBuffer();
  const inputPdfDoc = await PDFDocument.load(inputPdfBytes);
  const outputPdfDoc = await PDFDocument.create();

  // Embed every source page as a Form XObject so it can be drawn at any offset
  const copiedPages = await outputPdfDoc.embedPages(inputPdfDoc.getPages());
  const helveticaFont = await outputPdfDoc.embedFont(StandardFonts.Helvetica);

  copiedPages.forEach((originalPage, pageIndex) => {
    // Get original page dimensions
    const { width: originalWidth, height: originalHeight } = inputPdfDoc
      .getPage(pageIndex)
      .getSize();

    // New page dimensions: same width, half height
    const newPageWidth = originalWidth;
    const newPageHeight = originalHeight / 2;

    // Create top half page
    const topPage = outputPdfDoc.addPage([newPageWidth, newPageHeight]);
    topPage.drawPage(originalPage, {
      x: 0,
      y: -newPageHeight,  // Offset to show top half
      width: originalWidth,
      height: originalHeight,
    });
    topPage.drawText(` ${pageIndex * 2 + 1} `, {
      x: 10,
      y: newPageHeight - 20,
      size: 10,
      font: helveticaFont,
    });

    // Create bottom half page
    const bottomPage = outputPdfDoc.addPage([newPageWidth, newPageHeight]);
    bottomPage.drawPage(originalPage, {
      x: 0,
      y: 0,  // No offset, shows bottom half
      width: originalWidth,
      height: originalHeight,
    });
    bottomPage.drawText(` ${pageIndex * 2 + 2} `, {
      x: 10,
      y: newPageHeight - 20,
      size: 10,
      font: helveticaFont,
    });
  });

  return outputPdfDoc.save();
}

Visual explanation:

Original Page (A4):
+------------------+
|                  |
|    TOP HALF      |  --> New Page 1
|                  |
+------------------+
|                  |
|   BOTTOM HALF    |  --> New Page 2
|                  |
+------------------+

5. Horizontal Page Splitting

Cuts each page into a left half and a right half, each becoming its own page:

// hooks/pdflib.worker.js
import { PDFDocument, StandardFonts } from "pdf-lib";

async function splitPagesHorizontally(file) {
  const inputPdfBytes = await file.arrayBuffer();
  const inputPdfDoc = await PDFDocument.load(inputPdfBytes);
  const outputPdfDoc = await PDFDocument.create();

  const copiedPages = await outputPdfDoc.embedPages(inputPdfDoc.getPages());
  const helveticaFont = await outputPdfDoc.embedFont(StandardFonts.Helvetica);

  copiedPages.forEach((originalPage, pageIndex) => {
    const { width: originalWidth, height: originalHeight } = inputPdfDoc
      .getPage(pageIndex)
      .getSize();

    // New page dimensions: half width, same height
    const newPageWidth = originalWidth / 2;
    const newPageHeight = originalHeight;

    // Create left half page
    const leftPage = outputPdfDoc.addPage([newPageWidth, newPageHeight]);
    leftPage.drawPage(originalPage, {
      x: 0,
      y: 0,
      width: originalWidth,
      height: originalHeight,
    });
    leftPage.drawText(` ${pageIndex * 2 + 1} `, {
      x: 10,
      y: newPageHeight - 20,
      size: 10,
      font: helveticaFont,
    });

    // Create right half page
    const rightPage = outputPdfDoc.addPage([newPageWidth, newPageHeight]);
    rightPage.drawPage(originalPage, {
      x: -newPageWidth,  // Offset to show right half
      y: 0,
      width: originalWidth,
      height: originalHeight,
    });
    rightPage.drawText(` ${pageIndex * 2 + 2} `, {
      x: 10,
      y: newPageHeight - 20,
      size: 10,
      font: helveticaFont,
    });
  });

  return outputPdfDoc.save();
}

Visual explanation:

Original Page (A4):
+--------+--------+
|        |        |
|  LEFT  | RIGHT  |
|  HALF  |  HALF  |
|        |        |
+--------+--------+
    |         |
    v         v
 New Page 1  New Page 2
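The offsets are the subtle part of both functions, so here is a dependency-free sketch of the geometry. `halfPages` and `visibleRegion` are hypothetical helpers that mirror the numbers passed to `addPage()` and `drawPage()` in `splitPagesVertically` and `splitPagesHorizontally`; the sketch assumes unrotated pages whose MediaBox starts at the origin.

```typescript
type SplitDirection = "vertical" | "horizontal"; // Naming as in the article

interface HalfPage {
  width: number;  // Size of the new page
  height: number;
  x: number;      // Where drawPage() places the full original page
  y: number;
}

// Hypothetical helper: new page size and drawPage() offsets for both halves
function halfPages(width: number, height: number, dir: SplitDirection): [HalfPage, HalfPage] {
  if (dir === "vertical") {
    const h = height / 2;
    return [
      { width, height: h, x: 0, y: -h }, // Top half: shift the page down
      { width, height: h, x: 0, y: 0 },  // Bottom half: no shift
    ];
  }
  const w = width / 2;
  return [
    { width: w, height, x: 0, y: 0 },  // Left half: no shift
    { width: w, height, x: -w, y: 0 }, // Right half: shift the page left
  ];
}

// Rectangle of the ORIGINAL page that is visible on a half page.
// PDF user space has its origin at the bottom-left corner.
function visibleRegion(p: HalfPage) {
  return { x0: 0 - p.x, y0: 0 - p.y, x1: p.width - p.x, y1: p.height - p.y };
}
```

For a 600 by 800 point page, the top half of a vertical split shows original y from 400 to 800, and the right half of a horizontal split shows original x from 300 to 600.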

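Complete Processing Flow

The user picks a PDF and a split mode in the component, which calls the matching worker function returned by usePdflib(). The worker loads the file with pdf-lib, builds the output (packing size-based parts into a ZIP with JSZip), and returns the bytes, which autoDownloadBlob() saves as split.pdf or split.zip.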

Key Technical Decisions

1. Why Pre-calculate Page Sizes?

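Pre-calculating individual page sizes allows for:

A size estimate for every page before any chunk is built
Greedy page grouping in a single pass
Chunks that stay under the limit, unless a single page is larger than the limit on its own (that page then forms a chunk by itself)

The estimates are approximate, since each standalone save includes its own file overhead. Without pre-calculation, we would have to re-save the growing chunk after each added page to check its size, which is exact but gets more expensive as chunks grow.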

2. Why JSZip?

JSZip provides:

Pure JavaScript ZIP creation (no server needed)
Compression to reduce download size
Browser-compatible ArrayBuffer output
Folder support through zip.folder(), should the parts need grouping

3. Why embedPages()?

The embedPages() method from pdf-lib:

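Embeds existing pages into the new document as Form XObjects
Preserves the visual page content (text, vector graphics, images)
Allows drawing each embedded page at an x/y offset with drawPage(), which is what makes the half-page crops possible
Reuses the existing content streams instead of redrawing page content manually

Interactive annotations such as links and form fields live outside the page's content stream, so they are not carried over to the split pages.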

4. Page Numbering

Each split page gets a visible page number:

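// Inside the forEach loop: `page` stands for topPage/bottomPage (or
// leftPage/rightPage), and rgb is imported from "pdf-lib"
page.drawText(` ${pageIndex * 2 + 1} `, {
  x: 10,
  y: newPageHeight - 20,
  size: 10,
  font: helveticaFont,
  color: rgb(0, 0, 0), // Black, which is also pdf-lib's default
});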

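Because the numbers run sequentially through the output (1 and 2 for the first original page, 3 and 4 for the second, and so on), they help users keep track of the original reading order.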
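Since the labels are plain sequential numbers, mapping an output page back to its source is simple arithmetic. A small sketch with a hypothetical `sourceOf` helper that inverts the `pageIndex * 2 + 1` / `pageIndex * 2 + 2` formula; "first" means the top half for a vertical split and the left half for a horizontal one.

```typescript
// Hypothetical helper: map a 1-based output page number back to the
// 1-based original page and the half it was cut from
function sourceOf(outputPage: number): { originalPage: number; half: "first" | "second" } {
  return {
    originalPage: Math.ceil(outputPage / 2),
    half: outputPage % 2 === 1 ? "first" : "second",
  };
}

console.log(sourceOf(4)); // { originalPage: 2, half: 'second' }
```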

Benefits of This Architecture

Privacy First: Files never leave the browser
No Upload Limits: File size is bounded only by the memory available to the browser
Multiple Strategies: Size-based, vertical, or horizontal splitting
Automatic Packaging: ZIP file for multiple outputs
Page Tracking: Visible page numbers on split pages
Responsive UI: Web Workers prevent blocking

Try It Yourself

Want to split your PDFs without uploading them to a server? Try our free browser-based tool:

Split PDF Online →

All processing happens locally in your browser - your files never leave your computer!

Conclusion

Building a browser-based PDF splitting tool demonstrates how pdf-lib combined with JSZip can handle complex PDF manipulation tasks entirely client-side. The three different splitting strategies (size-based, vertical, horizontal) showcase the flexibility of the pdf-lib library.

This approach is ideal for:

Splitting large PDFs for email attachments
Creating printer-friendly page layouts
Managing document archives
Preparing documents for mobile viewing

The automatic ZIP packaging makes it easy to download multiple split files, while the visible page numbering helps users maintain document organization.

DEV Community