I've been building browser-based PDF tools for the past year and I want to talk about something that bugs me every time I use an online PDF merger: why am I uploading my tax return to some random server?
Seriously. Think about it. You need to merge two PDFs — maybe a contract and an addendum — and the first thing every tool asks you to do is upload your files to their infrastructure. For a merge operation. That's like mailing your diary to someone so they can staple two pages together.
So I stopped using servers
About a year ago I started building PDFGem, a set of PDF tools that process everything in the browser. No uploads, no server, nothing leaves your machine. The tab is the server. Close it, data's gone.
The core idea is dead simple: WebAssembly lets you run serious code in the browser now. PDF manipulation, OCR, image conversion — stuff that used to require a beefy backend can run on your laptop's browser tab.
Here's roughly how a merge works:
```javascript
import { PDFDocument } from 'pdf-lib';

async function mergePDFs(files) {
  const merged = await PDFDocument.create();
  for (const file of files) {
    const bytes = await file.arrayBuffer();
    const doc = await PDFDocument.load(bytes);
    const pages = await merged.copyPages(doc, doc.getPageIndices());
    pages.forEach(page => merged.addPage(page));
  }
  return await merged.save(); // Uint8Array, stays in browser memory
}
```
That save() call returns a Uint8Array. It never touches a network socket. You turn it into a Blob, create an object URL, and the user downloads it. Done.
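The download step looks roughly like this. A minimal sketch — the function names are mine, not part of pdf-lib:

```javascript
// Wrap the Uint8Array from merged.save() in a Blob.
function toPdfBlob(pdfBytes) {
  return new Blob([pdfBytes], { type: 'application/pdf' });
}

// Browser-only: point a temporary object URL at the Blob and click
// a synthetic link to trigger the download. Nothing hits the network.
function downloadBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = filename;
  a.click();
  URL.revokeObjectURL(url); // release the reference once the download starts
}
```

Usage would be something like `downloadBlob(toPdfBlob(await mergePDFs(files)), 'merged.pdf')`.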
The stack (and why I picked it)
- Astro 5 — static site gen, ships zero JS by default. Tool pages only load React when the user actually interacts.
- React 19 — islands architecture. Each tool is a self-contained island.
- pdf-lib — pure JS PDF manipulation. MIT license. Handles merging, splitting, watermarks, page numbers, signatures, metadata... basically everything except rendering.
- pdfjs-dist — Mozilla's PDF renderer. I use it for preview thumbnails and for text extraction.
- Tesseract.js — Wasm port of the Tesseract OCR engine. This one deserves its own section.
- Cloudflare Pages — static hosting. The bill is literally $0 for the hosting part because there's no compute.
28 tools total right now. Most are client-side only. A few (Word→PDF, Excel→PDF, PPT→PDF) need server-side rendering because browser engines can't faithfully reproduce Office layouts — for those I use Cloudflare Browser Rendering, which is a different tradeoff but at least the files get deleted immediately after conversion.
OCR was the hard part
Everything else was straightforward compared to OCR. Tesseract.js works — same engine as server-side Tesseract, same accuracy — but the DX is... rough.
First load downloads about 4-12MB of language data depending on the language pack (English fast model is ~4MB, the "best" quality model is closer to 12MB). It's cached after the first run, but that initial load is painful on slow connections. I ended up adding a progress bar just for the model download, which felt ridiculous but users actually appreciated it.
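For what it's worth, that progress bar doesn't take much code. Tesseract.js reports `{ status, progress }` objects to a `logger` callback; a helper like this turns them into a label. The exact status strings vary by version, so treat the matching below as an assumption:

```javascript
// Hypothetical helper: map Tesseract.js logger events to a UI label.
function ocrProgressLabel(m) {
  const pct = Math.round((m.progress || 0) * 100);
  if (m.status && m.status.includes('loading language')) {
    return `Downloading language data… ${pct}%`;
  }
  if (m.status === 'recognizing text') {
    return `Recognizing… ${pct}%`;
  }
  return `${m.status || 'working'}… ${pct}%`;
}

// Wire-up in the browser (not run here):
//   const worker = await createWorker('eng', 1, {
//     logger: (m) => { progressBar.textContent = ocrProgressLabel(m); },
//   });
```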
Speed-wise, expect 2-5 seconds per page on a modern machine. A 10-page scanned document takes 30-40 seconds. Server-side Tesseract does the same in maybe 8-10 seconds. So yeah, it's slower. But the alternative is uploading your scanned medical records to a server in who-knows-where, and I think that's a worse tradeoff for most people.
The actual Tesseract.js API is fine:
```javascript
import { createWorker } from 'tesseract.js';

const worker = await createWorker('eng');
const { data: { text } } = await worker.recognize(imageBlob);
// text is your OCR result
await worker.terminate();
```
Simple enough. The pain is in the edge cases — memory leaks if you don't terminate workers, Safari being Safari with large canvases, and the occasional segfault in the Wasm module that you can't really debug because, well, it's Wasm.
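For the worker-leak problem specifically, the fix that worked for me is boring: always pair creation with `terminate()`, even when `recognize()` throws. A generic sketch — `makeWorker` and `task` are placeholders for whatever your tool does:

```javascript
// Guarantee worker.terminate() runs on both success and failure paths.
async function withWorker(makeWorker, task) {
  const worker = await makeWorker();
  try {
    return await task(worker);
  } finally {
    await worker.terminate(); // runs even if task() threw
  }
}

// Usage in the browser (not run here):
//   const text = await withWorker(
//     () => createWorker('eng'),
//     async (w) => (await w.recognize(imageBlob)).data.text
//   );
```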
Memory: the real enemy
Here's something nobody warns you about with client-side file processing: memory.
A 200MB PDF eats 200MB+ of browser RAM. Process it, and now you've got the input AND output in memory simultaneously. On mobile Safari, you'll hit the memory limit and the tab just dies. No error, no warning — just gone.
I handle this by processing pages in chunks:
```javascript
async function processInChunks(pdf, chunkSize = 50) {
  const totalPages = pdf.getPageCount();
  const results = [];
  for (let i = 0; i < totalPages; i += chunkSize) {
    const end = Math.min(i + chunkSize, totalPages);
    // processPageRange / mergeChunks are app-specific helpers, not pdf-lib APIs
    const chunk = await processPageRange(pdf, i, end);
    results.push(chunk);
    await new Promise(r => setTimeout(r, 0)); // yield to the event loop so the UI can update
  }
  return mergeChunks(results);
}
```
That setTimeout(0) isn't a hack — it's essential. Without it, the UI freezes for the entire operation and the user thinks the page crashed. With it, you can update a progress bar between chunks.
Even with chunking, there's a practical limit around 500MB for most browsers. I'm upfront about that in the UI. If your file is bigger than that, yeah, you probably need a server.
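Being upfront in the UI is just a size check before processing starts. Something like this — the 500MB ceiling is the practical limit above, but the exact number is a judgment call:

```javascript
// Reject oversized files before loading them into memory.
const MAX_BYTES = 500 * 1024 * 1024; // ~500MB practical browser limit

function checkFileSize(file) {
  if (file.size > MAX_BYTES) {
    const mb = (file.size / 1024 / 1024).toFixed(0);
    return {
      ok: false,
      message: `This file is ${mb}MB; browser processing tops out around 500MB. You'll need a server-side tool for this one.`,
    };
  }
  return { ok: true };
}
```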
Same approach works for images too
I applied the same architecture to Vizua — browser-based image tools. Compression, format conversion, resizing, metadata stripping. Same principle, but honestly easier because individual images are usually 1-20MB instead of the 100MB+ monsters you see with PDFs.
The Canvas API handles a lot of image transformations natively, so you don't even need Wasm for everything. Resize? Canvas. Crop? Canvas. Format conversion? Canvas toBlob() with a different MIME type. It's only when you need codec-level control (like specific JPEG quality settings or WebP encoding with fine-tuned parameters) that Wasm codecs become necessary.
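A Canvas-only conversion is short enough to sketch in full. This is browser-only and the names are illustrative; the `extensionFor` helper just picks a download extension for the output:

```javascript
// Map output MIME type to a file extension for the download name.
function extensionFor(mime) {
  const map = { 'image/jpeg': 'jpg', 'image/png': 'png', 'image/webp': 'webp' };
  return map[mime] || 'bin';
}

// Decode the image, draw it onto a canvas, re-encode as the target format.
async function convertImage(file, mime, quality = 0.85) {
  const bitmap = await createImageBitmap(file);
  const canvas = document.createElement('canvas');
  canvas.width = bitmap.width;
  canvas.height = bitmap.height;
  canvas.getContext('2d').drawImage(bitmap, 0, 0);
  // quality only applies to lossy encoders (JPEG, WebP); PNG ignores it
  return new Promise((resolve) => canvas.toBlob(resolve, mime, quality));
}
```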
Is it actually faster than server-side?
Depends on the operation and the file size. For most things under 50MB — merging, splitting, rotating, adding watermarks — client-side is faster because you skip the upload entirely. A 10MB file on a typical broadband connection takes 2-3 seconds just to upload. Client-side processing of that same file takes under a second.
For OCR and heavy conversions, servers still win on raw processing speed. But you have to factor in upload time, and for sensitive documents, the privacy benefit isn't just a nice-to-have — it's the whole point.
I don't have rigorous benchmarks to share (I should really set those up) but the general rule I've found: if the file is under ~50MB, client-side wins. Above that, it depends on your connection speed vs. your CPU.
What I'd do differently
If I started over:
Web Workers from day one. I bolted them on later for some tools but not all. Having every tool run in a Worker from the start would've made the UI consistently smooth.
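The day-one version I wish I'd written is roughly this. The worker filename and `mergePDFs` call are hypothetical; the promise wrapper is the reusable part:

```javascript
// Hypothetical Worker file (pdf-worker.js, browser only):
//   self.onmessage = async (e) => {
//     const result = await mergePDFs(e.data.files); // heavy work off the main thread
//     self.postMessage(result, [result.buffer]);    // transfer the buffer, don't copy it
//   };

// Main thread: wrap the Worker's message round-trip in a promise so
// every tool can simply `await runInWorker(worker, payload)`.
function runInWorker(worker, payload) {
  return new Promise((resolve, reject) => {
    worker.onmessage = (e) => resolve(e.data);
    worker.onerror = (e) => reject(e);
    worker.postMessage(payload);
  });
}
```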
Better error boundaries. Wasm crashes are ugly. The module just dies and you get a cryptic error. I've wrapped most things in try/catch now, but early on, users were seeing raw Wasm errors and thinking the tool was broken.
Streaming for large files. Right now the whole file loads into memory. The Streams API could theoretically let you process chunks without holding the entire file in RAM, but pdf-lib doesn't support streaming yet, so this is still aspirational.
TL;DR
WebAssembly makes it practical to process PDFs and images entirely in the browser. It's not always faster than server-side, but it's private by design — your files physically cannot leak because they never leave your device. The tricky parts are memory management and mobile browser limitations, but for documents under ~50MB (which covers 95% of real-world use), it works great.
If you're building a tool that handles user files, seriously consider whether you need that server. Your users' files will be better off without it.
Got questions about any of this? I'm happy to dig into specifics in the comments. Especially if you've hit the same Safari memory issues — I have opinions.