Process PDFs in Browser Without Uploading: A Practical Guide

#javascript #webdev #tutorial #productivity

I built this because I watched a lawyer upload a client's contract to a "free PDF tool" with a .ru domain. Never again.

What We're Building

A browser-based PDF processor that extracts text, merges pages, and adds watermarks. Zero server round trips. The PDF never leaves the machine.

Step 1: The Library


// pdf-lib handles manipulation, pdfjs-dist handles extraction
import { PDFDocument } from 'pdf-lib';
import * as pdfjs from 'pdfjs-dist';

// pdfjs needs its worker loaded manually in most bundlers
pdfjs.GlobalWorkerOptions.workerSrc = `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjs.version}/pdf.worker.min.js`;

Step 2: Read Without Uploading

async function processLocalPDF(file) {
  // File stays in browser memory only
  const arrayBuffer = await file.arrayBuffer();

  // Hand pdf.js its own copy of the bytes: it may transfer the buffer to
  // its worker, which would leave the original detached for later pdf-lib calls
  const pdfJsDoc = await pdfjs.getDocument({ data: new Uint8Array(arrayBuffer.slice(0)) }).promise;

  // Extract text from page 1 (pdf.js page numbers start at 1)
  const page = await pdfJsDoc.getPage(1);
  const textContent = await page.getTextContent();

  return textContent.items.map(item => item.str).join(' ');
}
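
Joining every item with a single space throws away line breaks. Here is a minimal sketch of one way to keep them, assuming a recent pdf.js version whose text items carry a `hasEOL` flag and include their own whitespace items. `itemsToLines` is a hypothetical helper name, not part of pdf.js.

```javascript
// Hypothetical helper: rebuilds lines of text from pdf.js text items.
// Assumes each text item has a string `str` and an optional boolean `hasEOL`,
// as returned by page.getTextContent(). Items without `str` are skipped.
function itemsToLines(items) {
  const lines = [];
  let current = '';

  for (const item of items) {
    if (typeof item.str !== 'string') continue;
    current += item.str;

    // hasEOL marks the last item on a line
    if (item.hasEOL) {
      lines.push(current.trim());
      current = '';
    }
  }

  // Flush whatever is left if the final item had no hasEOL flag
  if (current.trim() !== '') lines.push(current.trim());

  // Drop lines that turned out to be empty
  return lines.filter(line => line !== '');
}
```

In `processLocalPDF`, this would replace the final `map`/`join` with `itemsToLines(textContent.items).join('\n')`. Older pdf.js builds that don't set `hasEOL` would need a different approach, such as comparing the y coordinate in each item's `transform`.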

Step 3: Modify and Download

// drawText expects a pdf-lib color object, which rgb() builds
import { rgb } from 'pdf-lib';

async function watermarkAndSave(pdfBytes, watermarkText) {
  const pdf = await PDFDocument.load(pdfBytes);
  const pages = pdf.getPages();

  // Add watermark near the top-left of each page
  // (PDF coordinates start at the bottom-left corner)
  pages.forEach(page => {
    page.drawText(watermarkText, {
      x: 50,
      y: page.getHeight() - 50,
      size: 12,
      color: rgb(0.9, 0.1, 0.1),
      opacity: 0.5,
    });
  });

  const modified = await pdf.save();

  // Trigger download, no server involved
  const blob = new Blob([modified], { type: 'application/pdf' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = 'processed.pdf';
  a.click();
  // Revoke on a later tick so the browser has started the download first
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
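
A corner watermark is easy to crop out, so a common variation is to run the text along the page diagonal. The sketch below only does the geometry. `diagonalWatermarkPlacement` is a hypothetical helper, and it ignores text height, so the baseline (not the visual middle of the glyphs) passes through the page center.

```javascript
// Hypothetical helper: computes where text should start so that it runs
// along the bottom-left to top-right diagonal, centered on the page.
// All lengths are in PDF points; textWidth is the rendered width of the text.
function diagonalWatermarkPlacement(pageWidth, pageHeight, textWidth) {
  // Angle of the page diagonal, measured counterclockwise from the x axis
  const angle = Math.atan2(pageHeight, pageWidth);

  // Step back half the text width from the page center, along the diagonal
  const x = pageWidth / 2 - (textWidth / 2) * Math.cos(angle);
  const y = pageHeight / 2 - (textWidth / 2) * Math.sin(angle);

  return { x, y, angleDegrees: (angle * 180) / Math.PI };
}
```

With pdf-lib, the text width can come from an embedded font's `widthOfTextAtSize(text, size)`, and the angle goes into `drawText` as `rotate: degrees(angleDegrees)`, with `degrees` imported from `pdf-lib`. Pages that carry their own rotation setting may need an extra adjustment that this sketch does not handle.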

The Gotcha
PDFs with embedded fonts can take 10x their file size or more in memory. A 5MB PDF can balloon to 80MB when pdf-lib parses it. I cap processing at 50MB input files. Above that, I warn users that their tab might crash.
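
The size check itself can be a few lines that run before any parsing starts. This is a minimal sketch of the 50MB cap, not the exact code from my project. `checkPdfSize` and `MAX_INPUT_BYTES` are hypothetical names, and the warning text is a placeholder.

```javascript
// 50MB input cap (hypothetical constant name)
const MAX_INPUT_BYTES = 50 * 1024 * 1024;

// Hypothetical helper: decides whether a file is small enough to process.
// `file` only needs a numeric `size` in bytes, like a browser File object.
function checkPdfSize(file, maxBytes = MAX_INPUT_BYTES) {
  if (file.size > maxBytes) {
    const sizeMB = (file.size / (1024 * 1024)).toFixed(1);
    return {
      ok: false,
      message: `This PDF is ${sizeMB}MB. Processing it may crash the tab.`,
    };
  }
  return { ok: true, message: '' };
}
```

Calling `checkPdfSize(file)` before `file.arrayBuffer()` means an oversized file is flagged before its bytes are read into memory.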
One Thing I'd Do Differently
I initially tried to parse PDFs with regex. Don't. The spec is 800 pages of chaos. Use the libraries. pdf.js is battle-tested by Mozilla, and both libraries are maintained by people who've read the spec so you don't have to.
Question
Has anyone solved client-side PDF creation from scratch (not manipulation) at reasonable speeds? Generating a 100-page report from JSON data takes 4 seconds in my tests. Acceptable, but feels wrong.
