How WebAssembly turned a decades-old PDF compression engine into a privacy-first browser tool
A few weeks ago a friend sent me a scanned contract. It was 18MB — too large to attach to a reply email, too sensitive to upload to a random compression website.
The "too sensitive" part is the one most people skip past. They just upload it. Most free PDF compressors work by sending your file to their servers, running compression remotely, and sending the result back. That's a reasonable architecture. It's also a data flow in which your document passes through infrastructure you don't control, is stored temporarily on someone else's disk, and is processed by systems whose security posture you can't verify.
For a contract, a tax return, an NDA — that matters.
So I built the compression tool inside ihatepdf.cv differently. Everything runs in your browser tab. Your file never leaves your device. Here's exactly how.
The engine: Ghostscript compiled to WebAssembly
Ghostscript is not new software. It has been a mainstay of PostScript and PDF processing since 1988. Professional print shops use it, and many PDF tools run it under the hood. It is written in C and has been battle-tested on hundreds of millions of documents.
The key insight: Ghostscript can be compiled to WebAssembly. That means the same engine that runs on servers can run inside a browser tab, at near-native speed, with no server required.
The compression pipeline uses a Web Worker so the main thread stays responsive while Ghostscript processes the PDF:
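```javascript
// `blobUrl` is the Blob URL of the uploaded PDF; `config` holds the
// chosen compression settings. Both come from the upload step.
const worker = new Worker('/background-worker.js');

// Attach the handler before posting so the reply cannot be missed.
worker.onmessage = async (e) => {
  // e.data is a URL pointing at the compressed output.
  const response = await fetch(e.data);
  const compressedBlob = await response.blob();
  // download the result
};

worker.postMessage({
  data: {
    psDataURL: blobUrl,
    config: config,
  },
  target: 'wasm',
});
```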
The user uploads a file, it becomes a Blob URL, the Blob URL is handed to the worker, Ghostscript processes it inside WebAssembly, and the result comes back as another Blob. Nothing goes over the network. Nothing is stored on any server. The whole thing happens inside the browser's sandboxed environment.
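For a sense of what the worker might hand to Ghostscript, here is a minimal sketch. The flags are standard Ghostscript command-line options for the pdfwrite device. The function name, the placeholder file names, and the idea that the WASM build accepts the same argument list as the desktop binary are assumptions for illustration, not the tool's actual worker code.

```javascript
// Hypothetical helper: builds the argument list for one compression run.
// The flags are standard Ghostscript options; the file names are placeholders
// for paths inside the WASM module's in-memory file system.
function buildGhostscriptArgs(pdfSettings, inputPath = 'input.pdf', outputPath = 'output.pdf') {
  const allowed = ['/screen', '/ebook', '/printer', '/prepress'];
  if (!allowed.includes(pdfSettings)) {
    throw new Error(`Unknown PDFSETTINGS preset: ${pdfSettings}`);
  }
  return [
    '-sDEVICE=pdfwrite',            // write a new PDF
    '-dCompatibilityLevel=1.4',     // target PDF version
    `-dPDFSETTINGS=${pdfSettings}`, // quality preset
    '-dNOPAUSE',                    // do not prompt between pages
    '-dBATCH',                      // exit when the input is done
    '-dQUIET',                      // suppress startup messages
    `-sOutputFile=${outputPath}`,
    inputPath,
  ];
}

console.log(buildGhostscriptArgs('/ebook').join(' '));
```

Keeping the argument list in one small function makes it easy to see exactly which options a given run used.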
The five optimizations Ghostscript applies simultaneously
This is what separates proper PDF compression from naive approaches that just re-export the file.
- Image downsampling: Photos and raster graphics embedded in PDFs are often stored at their original resolution, such as 600 DPI from a scanner or 300 DPI from a camera. For screen viewing, 150 DPI usually looks the same. Ghostscript can resample images using bicubic interpolation, one of the higher-quality downsampling methods it supports. The result is smaller, with little or no perceptible difference at normal viewing sizes.
- JPEG recompression: After downsampling, images are re-encoded at a quality level matched to the compression preset:

  ```javascript
  const qualityToJpegQuality = {
    '/screen': 40,   // 72 DPI, maximum compression
    '/ebook': 60,    // 150 DPI, balanced
    '/printer': 80,  // 300 DPI, print quality
    '/prepress': 92, // 300 DPI, professional print
  };
  ```
- Font subsetting: This one surprises people. Embedded fonts in PDFs often contain the entire font: every character, every glyph, including ones your document never uses. A single embedded font can be 200–400KB. Font subsetting trims the embedded data to only the characters actually present in your document. For documents using common fonts, this alone can reduce file size by 20–30%.
- Metadata stripping: Most PDFs created by Word, Acrobat, Google Docs, or similar tools embed metadata: author name, creation software, revision history, thumbnail previews. This data adds size and exposes information you probably don't need to include when sharing. Because Ghostscript rewrites the file from scratch, extras such as thumbnail previews and old revisions are dropped during compression.
- Stream recompression: The content streams that encode page content are recompressed losslessly with Flate, the same algorithm used in zip files. Text and vector graphics are otherwise unaffected: they are never converted to pixels, which is why text stays perfectly sharp at every compression level.
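To see why image downsampling usually accounts for most of the savings, a quick back-of-the-envelope calculation helps. This sketch counts raw pixels for a full-page scan at two resolutions. The US Letter page size and the helper name are assumptions for illustration, and real file sizes also depend on JPEG quality and image content.

```javascript
// Raw pixel count of a page-sized image at a given resolution.
function pixelCount(widthInches, heightInches, dpi) {
  return Math.round(widthInches * dpi) * Math.round(heightInches * dpi);
}

// A full US Letter page (8.5 x 11 inches), assumed for illustration.
const scanAt600 = pixelCount(8.5, 11, 600); // 5100 * 6600 pixels
const scanAt150 = pixelCount(8.5, 11, 150); // 1275 * 1650 pixels

console.log(scanAt600);             // 33660000
console.log(scanAt150);             // 2103750
console.log(scanAt600 / scanAt150); // 16
```

Quartering the resolution cuts the pixel count by a factor of sixteen, before JPEG recompression is even applied.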
Why text never gets blurry
This is the most common concern and the most important thing to understand about PDF compression.
Text in a PDF is not stored as an image. It is stored as vector instructions: draw this glyph, at these coordinates, with this font, at this size. Vector data has no resolution. It is mathematically perfect at every zoom level.
Lossy compression only affects raster content: photos, scanned pages, embedded images. A document that is entirely text and vector graphics will gain almost nothing from image downsampling, because there are no raster images to downsample. But it will still benefit from font subsetting and stream recompression.
The three presets and when to use each
Rather than exposing Ghostscript's raw configuration options, the tool maps to three practical use cases:
Light (20–30% reduction): Uses /printer quality — 300 DPI images, 80% JPEG quality. Output is indistinguishable from the original. Use this for design portfolios, documents you will print, or any case where you want the smallest reduction in quality.
Medium (40–50% reduction): Uses /ebook quality — 150 DPI images, 60% JPEG quality. This is the sweet spot for CVs, contracts, reports, and email attachments. Looks identical on screen. Most people cannot tell the difference between a Medium-compressed document and the original when reading normally.
Heavy (60–70% reduction): Uses /screen quality — 72 DPI images, 40% JPEG quality. Maximum compression. Text stays perfectly sharp; photos become noticeably softer at high zoom. Use this for archiving, upload portals with strict size limits, or anywhere you need to get under 1–2MB.
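The mapping from the tool's three presets to Ghostscript settings can be sketched as a small lookup table. The numbers are the ones quoted above; the names `PRESETS` and `resolvePreset` are hypothetical and not taken from the tool's source.

```javascript
// Hypothetical lookup table built from the preset numbers quoted above.
const PRESETS = new Map([
  ['light',  { pdfSettings: '/printer', dpi: 300, jpegQuality: 80 }],
  ['medium', { pdfSettings: '/ebook',   dpi: 150, jpegQuality: 60 }],
  ['heavy',  { pdfSettings: '/screen',  dpi: 72,  jpegQuality: 40 }],
]);

// Resolve a user-facing preset name (case-insensitive) to its settings.
function resolvePreset(name) {
  const preset = PRESETS.get(String(name).toLowerCase());
  if (!preset) {
    throw new Error(`Unknown preset: ${name}`);
  }
  return preset;
}

console.log(resolvePreset('Medium'));
```

Using a `Map` rather than a plain object avoids accidental matches on inherited property names such as `constructor`.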
The memory management problem
Processing large PDFs in a browser is harder than it sounds. A 50MB PDF with high-resolution images can consume 200–300MB of RAM during processing; overhead of 4–6× the file size is normal. On mobile devices with 2–4GB total RAM, this matters.
The tool picks a file-size limit based on the device before starting:
```javascript
const getDeviceCapabilities = () => {
  const isMobile = /Android|iPhone/i.test(navigator.userAgent);
  const deviceMem = navigator.deviceMemory || 4;
  if (isMobile && screen.width < 768) {
    return { maxFileSize: 50 * 1024 * 1024 }; // 50MB on phones
  }
  if (deviceMem < 4) {
    return { maxFileSize: 100 * 1024 * 1024 }; // 100MB on low memory
  }
  return { maxFileSize: 150 * 1024 * 1024 }; // 150MB on desktop
};
```
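navigator.deviceMemory is not available in Safari or Firefox, so the fallback assumes 4GB. That value skips the low-memory branch, so in those browsers the phone check is the only limit below the desktop one; in practice this has been conservative enough to handle most cases without crashing.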
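The same check can be written as a pure function, which makes it easy to test and to pair with a rough memory estimate. This is a sketch, not the tool's code: `maxFileSizeFor`, `checkFile`, and the default `overheadFactor` of 5 are assumptions for illustration.

```javascript
const MB = 1024 * 1024;

// Hypothetical pure variant of the capability check: inputs are passed in
// instead of being read from `navigator` and `screen`.
function maxFileSizeFor({ userAgent, screenWidth, deviceMemory = 4 }) {
  const isMobile = /Android|iPhone/i.test(userAgent);
  if (isMobile && screenWidth < 768) return 50 * MB; // phones
  if (deviceMemory < 4) return 100 * MB;             // low-memory devices
  return 150 * MB;                                   // everything else
}

// Reports whether a file fits under the limit, plus a rough peak-memory
// estimate. `overheadFactor` is an assumed multiplier, not a measured value.
function checkFile(fileSizeBytes, device, overheadFactor = 5) {
  const limitBytes = maxFileSizeFor(device);
  return {
    allowed: fileSizeBytes <= limitBytes,
    limitBytes,
    estimatedPeakBytes: fileSizeBytes * overheadFactor,
  };
}

const phone = { userAgent: 'Mozilla/5.0 (iPhone)', screenWidth: 390 };
console.log(checkFile(18 * MB, phone));
```

In the browser, the device object would be filled from `navigator.userAgent`, `screen.width`, and `navigator.deviceMemory`; leaving `deviceMemory` undefined triggers the 4GB default.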
The privacy architecture — verifiable, not just claimed
Most privacy claims are policies. "We delete files after 2 hours." "We don't share data." These are statements you have to trust.
The architecture here makes the claim verifiable. Open DevTools → Network tab → compress a PDF. Watch the network requests. You will see the Ghostscript WASM file load once and get cached. You will see zero upload requests for your document. No bytes of your PDF travel over the network.
After first load, the tool also works fully offline. The service worker caches the WebAssembly libraries:
```javascript
self.addEventListener('install', (e) => {
  e.waitUntil(
    caches.open('ihatepdf-v1').then((cache) => cache.addAll([
      '/',
      'https://unpkg.com/pdf-lib@1.17.1/dist/pdf-lib.min.js',
      // ghostscript wasm + other libraries
    ]))
  );
});
```
Disconnect from WiFi. Reload the page. Compress a PDF. It works. There is no server to connect to because there is no server involved.
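Caching files at install time is only half of offline support; the service worker also has to answer requests from that cache. The source does not show that part, so the following is a minimal cache-first sketch under that assumption. `cacheFirst` is a hypothetical helper name, and the tool's real fetch handler may differ.

```javascript
// Cache-first lookup: serve the cached response if there is one, otherwise
// fall back to the network. The cache and fetch function are injected so the
// logic can be exercised outside a service worker.
async function cacheFirst(request, cacheStorage, fetchFn) {
  const cached = await cacheStorage.match(request);
  return cached || fetchFn(request);
}

// Register the handler only when actually running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    event.respondWith(
      cacheFirst(event.request, caches, (req) => fetch(req))
    );
  });
}
```

With a handler along these lines, a reload while offline is served from the cache populated during install, and only uncached requests ever reach the network.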
The honest trade-off
A cloud server with a dedicated CPU will compress a 150MB PDF faster than Ghostscript-WASM running in a browser tab on a four-year-old laptop.
If you are compressing large volumes of very large files and speed is the priority, a server-based tool is genuinely better for that use case.
For everything else (privacy, offline use, no account, no size limits beyond what your device's memory can handle, no watermarks on the output) the local approach wins. The Ghostscript engine is identical either way. The compression quality is the same. The only difference is where the processing happens.
Try it
ihatepdf.cv/compress-pdf
No account. No upload. No watermark. The compression quality comes from the same Ghostscript engine that professional tools use; it just runs on your device instead of theirs.
If you process sensitive documents and have questions about how the architecture works, or if something is broken, I read comments.
This is part of an ongoing series on building a privacy-first PDF toolkit entirely in the browser using WebAssembly. The technical deep-dive on the full architecture is at ihatepdf.cv/technical-blog