I built this because I watched a lawyer upload a client's contract to a "free PDF tool" with a .ru domain. Never again.
What We're Building
A browser-based PDF processor that extracts text, merges pages, and adds watermarks. Zero server roundtrips for the document: the PDF never leaves the machine.
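In pdf-lib, merging means copying pages from each source into a new document with `copyPages`, which takes zero-based page indices. The sketch below uses real pdf-lib calls (`PDFDocument.create`, `load`, `getPageCount`, `copyPages`, `addPage`, `save`). `parsePageRange`, `mergeSelectedPages`, and the `{ bytes, range }` input shape are hypothetical. pdf-lib is imported lazily, so the range parser has no dependencies of its own.

```javascript
// Hypothetical helper: turn a user-typed range like "1-3, 5" into the
// zero-based page indices that pdf-lib's copyPages() expects.
function parsePageRange(spec, pageCount) {
  const indices = [];
  for (const part of spec.split(',')) {
    const token = part.trim();
    if (token === '') continue;
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (!match) throw new Error(`Invalid page range: "${token}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start > end) throw new Error(`Range "${token}" runs backwards`);
    if (start < 1 || end > pageCount) {
      throw new Error(`Range "${token}" is outside pages 1-${pageCount}`);
    }
    for (let page = start; page <= end; page++) indices.push(page - 1);
  }
  return indices;
}

// Sketch of the merge itself, assuming pdf-lib is installed.
// sources: array of { bytes, range } objects (a hypothetical shape).
async function mergeSelectedPages(sources) {
  const { PDFDocument } = await import('pdf-lib');
  const merged = await PDFDocument.create();
  for (const { bytes, range } of sources) {
    const src = await PDFDocument.load(bytes);
    const indices = parsePageRange(range, src.getPageCount());
    const copied = await merged.copyPages(src, indices);
    copied.forEach(page => merged.addPage(page));
  }
  return merged.save(); // resolves to a Uint8Array, ready to wrap in a Blob
}
```

The merged bytes can then go through the same Blob download used in Step 3.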
Step 1: The Library
// pdf-lib handles manipulation, pdfjs-dist handles extraction
import { PDFDocument } from 'pdf-lib';
import * as pdfjs from 'pdfjs-dist';
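// pdfjs needs its worker loaded manually in most bundlers.
// This fetches the worker *script* from a CDN; the PDF itself is never sent anywhere.
// pdfjs-dist v4+ ships the worker as pdf.worker.min.mjs (v2/v3 used pdf.worker.min.js).
pdfjs.GlobalWorkerOptions.workerSrc = `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${pdfjs.version}/pdf.worker.min.mjs`;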
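To keep the worker filename in sync with whichever pdfjs-dist version is installed, a small helper can pick the file. This sketch assumes cdnjs's URL layout. It also assumes that pdfjs-dist v4 and later ship only `pdf.worker.min.mjs`, while v2/v3 ship `pdf.worker.min.js`. `pdfWorkerUrl` is a hypothetical name. If you'd rather not fetch even the worker script from a third party, most modern bundlers can serve the worker file from your own origin instead.

```javascript
// Hypothetical helper: build a cdnjs worker URL matching the pdf.js version.
// Assumption: v4+ uses the .mjs worker, older majors use the .js worker.
function pdfWorkerUrl(version) {
  const major = Number.parseInt(String(version).split('.')[0], 10);
  if (Number.isNaN(major)) {
    throw new Error(`Unrecognized pdf.js version: ${version}`);
  }
  const file = major >= 4 ? 'pdf.worker.min.mjs' : 'pdf.worker.min.js';
  return `https://cdnjs.cloudflare.com/ajax/libs/pdf.js/${version}/${file}`;
}

// Usage: pdfjs.GlobalWorkerOptions.workerSrc = pdfWorkerUrl(pdfjs.version);
```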
Step 2: Read Without Uploading
async function processLocalPDF(file) {
  // File stays in browser memory only
  const arrayBuffer = await file.arrayBuffer();
  // Extract text from page 1. pdf.js alone is enough for reading; also loading the
  // bytes into pdf-lib here would parse the file twice for no benefit.
  const pdfJsDoc = await pdfjs.getDocument({ data: arrayBuffer }).promise;
  const page = await pdfJsDoc.getPage(1);
  const textContent = await page.getTextContent();
  const text = textContent.items.map(item => item.str).join(' ');
  // Release the worker-side copy of the document once the text is out
  await pdfJsDoc.destroy();
  return text;
}
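`processLocalPDF` reads only page 1 and joins every text item with a space, which flattens line breaks. The sketch below walks every page instead and uses the `hasEOL` flag that recent pdf.js versions set on text items. Check that your version exposes it. `itemsToText` and `extractAllText` are hypothetical names. The loop expects an already-loaded pdf.js document, such as the one `pdfjs.getDocument(...).promise` resolves to in Step 2.

```javascript
// Hypothetical helper: turn pdf.js text items into plain text,
// emitting a newline wherever pdf.js marks an end of line.
function itemsToText(items) {
  let text = '';
  for (const item of items) {
    // Marked-content entries (only present with includeMarkedContent) carry no str
    if (typeof item.str !== 'string') continue;
    text += item.str + (item.hasEOL ? '\n' : ' ');
  }
  return text
    .replace(/ {2,}/g, ' ') // collapse runs of spaces
    .replace(/ +\n/g, '\n') // drop spaces left before line breaks
    .trim();
}

// Hypothetical helper: extract text from every page, one string per page.
async function extractAllText(pdfJsDoc) {
  const pages = [];
  for (let n = 1; n <= pdfJsDoc.numPages; n++) {
    const page = await pdfJsDoc.getPage(n); // pdf.js page numbers are 1-based
    const { items } = await page.getTextContent();
    pages.push(itemsToText(items));
    page.cleanup(); // release this page's cached resources before the next one
  }
  return pages;
}
```

Processing one page at a time and cleaning up as you go keeps peak memory closer to a single page than to the whole document.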
Step 3: Modify and Download
import { rgb } from 'pdf-lib'; // pdf-lib colors must be built with rgb(), not plain objects

async function watermarkAndSave(pdfBytes, watermarkText) {
  const pdf = await PDFDocument.load(pdfBytes);
  const pages = pdf.getPages();
  // Add watermark to each page, near the top-left corner
  pages.forEach(page => {
    page.drawText(watermarkText, {
      x: 50,
      y: page.getHeight() - 50,
      size: 12,
      color: rgb(0.9, 0.1, 0.1),
      opacity: 0.5,
    });
  });
  const modified = await pdf.save();
  // Trigger download, no server involved
  const blob = new Blob([modified], { type: 'application/pdf' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = 'processed.pdf';
  a.click();
  // Revoking right after click() can cancel the download in some browsers, so defer it
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}
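The watermark above is a small line of text near the top-left corner. A centered stamp needs only some extra arithmetic. pdf-lib measures from the page's bottom-left corner, and `PDFFont.widthOfTextAtSize` and `heightAtSize` report how large the text will be. This is a sketch: `centeredTextOrigin` is a hypothetical helper, and the centering is approximate because `drawText` positions the text's baseline.

```javascript
// Hypothetical helper: lower-left (x, y) that roughly centers text on a page.
// pdf-lib's coordinate origin is the page's bottom-left corner.
function centeredTextOrigin(pageWidth, pageHeight, textWidth, textHeight) {
  return {
    x: (pageWidth - textWidth) / 2,
    y: (pageHeight - textHeight) / 2,
  };
}

// In watermarkAndSave, with StandardFonts imported from pdf-lib:
//   const font = await pdf.embedFont(StandardFonts.Helvetica); // once, before the loop
//   const size = 48;
//   // then, per page:
//   const { x, y } = centeredTextOrigin(
//     page.getWidth(), page.getHeight(),
//     font.widthOfTextAtSize(watermarkText, size), font.heightAtSize(size));
//   page.drawText(watermarkText, { x, y, size, font, opacity: 0.2 });
```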
The Gotcha
PDFs with embedded fonts can take 10x or more their file size in memory: a 5MB PDF can balloon to 80MB when pdf-lib parses it. I set a soft cap of 50MB on input files. Above that, I warn users that their tab might crash.
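A pre-flight check along those lines can run on `file.size` before anything is parsed, because a `File`'s size is known without reading its contents. This sketch makes two assumptions: "50MB" means 50 MiB, and the 5MB-to-80MB ratio above (16x) works as a rough multiplier. `checkPdfSize` and both constants are hypothetical, and the real ratio will vary from document to document.

```javascript
// Hypothetical constants: soft limit from the text, multiplier from 5MB -> 80MB.
const SOFT_LIMIT_BYTES = 50 * 1024 * 1024;
const ASSUMED_MEMORY_MULTIPLIER = 16;

// Hypothetical pre-flight check: accepts a File (or anything with a size in bytes).
function checkPdfSize(file) {
  const estimatedBytes = file.size * ASSUMED_MEMORY_MULTIPLIER;
  const toMB = bytes => (bytes / (1024 * 1024)).toFixed(0);
  if (file.size > SOFT_LIMIT_BYTES) {
    return {
      ok: false,
      estimatedBytes,
      message: `This ${toMB(file.size)}MB file may need ~${toMB(estimatedBytes)}MB of memory and could crash the tab.`,
    };
  }
  return { ok: true, estimatedBytes, message: '' };
}
```

Calling it on the `File` from an `<input type="file">` before `processLocalPDF` lets you show the warning while the file is still unread.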
One Thing I'd Do Differently
I initially tried to parse PDFs with regex. Don't. The spec is 800 pages of chaos. Use the libraries: pdf.js is battle-tested by Mozilla (it is the PDF viewer built into Firefox), and both libraries are maintained by people who've read the spec so you don't have to.
Question?
Has anyone solved client-side PDF creation from scratch (not manipulation) at reasonable speeds? Generating a 100-page report from JSON data takes 4 seconds in my tests. Acceptable, but it feels wrong.