Most PDF processing tools send your file to a server. The browser makes a multipart POST request, the server runs LibreOffice or Ghostscript or a Python library, and you download the result.
There's a better approach for many PDF operations: process the file entirely in the browser using PDF-lib and Web Workers. No server involved. No file transmission. Works offline once the page is loaded.
This post explains how browser-side PDF compression works and when it's the right choice.
## The Architecture
Client-side PDF processing in the browser uses three components:
- PDF-lib — a pure JavaScript PDF manipulation library. Reads and writes PDF structure without any native dependencies.
- Web Workers — moves the CPU-intensive processing off the main thread so the UI stays responsive.
- Canvas API — used to re-encode embedded images at reduced quality.
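Before handing a file to this pipeline, it's worth verifying that the browser actually provides all three pieces. A minimal capability check; the `supportsClientSidePdf` helper name is my own, not from any library:

```javascript
// Hypothetical helper: report which of the required browser APIs exist.
function supportsClientSidePdf() {
  return {
    webWorkers: typeof Worker !== 'undefined',
    offscreenCanvas: typeof OffscreenCanvas !== 'undefined',
    fileReader: typeof FileReader !== 'undefined',
  };
}

// Fall back to a server-side path (or disable the feature) if anything is missing.
const caps = supportsClientSidePdf();
const canCompressLocally = Object.values(caps).every(Boolean);
```

All evergreen browsers pass this check today; the main gap historically was `OffscreenCanvas` in older Safari.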
The processing flow:
```
User drops PDF
→ FileReader reads it as ArrayBuffer
→ ArrayBuffer passed to Web Worker via postMessage()
→ Worker uses PDF-lib to parse the PDF
→ Worker extracts embedded image data
→ Worker re-encodes images via Canvas API at lower quality
→ Worker re-embeds compressed images into PDF-lib document
→ Worker returns the new ArrayBuffer
→ Main thread creates a Blob URL for download
```
## What "Compression" Actually Means for PDFs
PDFs don't have a single compression knob. What most PDF compressors actually do:
Image re-encoding: The biggest wins come from re-encoding embedded JPEG images at lower quality. A JPEG at quality 90 is roughly 3× larger than the same image at quality 60 with minimal perceptible difference for most document content.
Content stream compression: PDF-lib applies zlib compression to content streams. Most PDFs already have this applied, so gains here are minimal.
Font subsetting: Embedding only the glyphs actually used from a font, rather than the full font file. Significant savings for PDFs using large CJK fonts.
Object deduplication: Some PDF generators embed the same resource multiple times. Deduplication finds and merges these.
For typical office documents, image re-encoding is where 80-90% of the compression gain comes from.
## Code: PDF Image Compression with PDF-lib
```javascript
import { PDFDocument, PDFName, PDFRawStream } from 'pdf-lib';

async function compressPdf(inputArrayBuffer, quality = 0.7) {
  const pdfDoc = await PDFDocument.load(inputArrayBuffer);

  // Walk every indirect object in the file and pick out the image XObjects.
  for (const [ref, obj] of pdfDoc.context.enumerateIndirectObjects()) {
    if (!(obj instanceof PDFRawStream)) continue;
    if (obj.dict.get(PDFName.of('Subtype')) !== PDFName.of('Image')) continue;

    // Only handle JPEG images here: the raw bytes of a DCTDecode stream
    // are a complete JPEG file, so no manual pixel decoding is needed.
    if (obj.dict.get(PDFName.of('Filter')) !== PDFName.of('DCTDecode')) continue;

    // Skip images with a soft mask (transparency); re-encoding would drop it.
    if (obj.dict.has(PDFName.of('SMask'))) continue;

    // Decode the JPEG, redraw it, and re-encode at the target quality.
    const bitmap = await createImageBitmap(
      new Blob([obj.contents], { type: 'image/jpeg' })
    );
    const canvas = new OffscreenCanvas(bitmap.width, bitmap.height);
    canvas.getContext('2d').drawImage(bitmap, 0, 0);
    const blob = await canvas.convertToBlob({ type: 'image/jpeg', quality });
    const compressed = new Uint8Array(await blob.arrayBuffer());

    // Only swap the stream in if re-encoding actually made it smaller.
    if (compressed.length >= obj.contents.length) continue;
    pdfDoc.context.assign(
      ref,
      pdfDoc.context.stream(compressed, {
        Type: 'XObject',
        Subtype: 'Image',
        Width: bitmap.width,
        Height: bitmap.height,
        ColorSpace: 'DeviceRGB', // canvas re-encodes to RGB
        BitsPerComponent: 8,
        Filter: 'DCTDecode',
      })
    );
  }

  return await pdfDoc.save();
}
```
Note: The above is a simplified illustration. Real-world PDF image extraction requires handling various filter types (DCTDecode for JPEG, FlateDecode for PNG-embedded images, JPXDecode for JPEG 2000), colour space conversions, and different stream encoding combinations. The Compress PDF tool at Ultimate Tools handles these edge cases.
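Those filter types are the main branching point in a fuller implementation. A sketch of the dispatch logic; the handling strategies are illustrative, not PDF-lib API:

```javascript
// Hypothetical dispatch: decide how to treat an image stream by its /Filter.
function classifyImageFilter(filterName) {
  switch (filterName) {
    case 'DCTDecode': // stream bytes are a complete JPEG
      return { format: 'jpeg', recompressable: true };
    case 'JPXDecode': // JPEG 2000; browsers cannot decode this natively
      return { format: 'jpeg2000', recompressable: false };
    case 'FlateDecode': // zlib-compressed raw pixels; needs ColorSpace handling
      return { format: 'raw-pixels', recompressable: true };
    default:
      return { format: 'unknown', recompressable: false };
  }
}
```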
## Web Worker Integration
Running PDF-lib on the main thread freezes the UI for large PDFs. Move it to a Worker:
```javascript
// compress-worker.js
// Assumes compressPdf() from the previous section lives in ./compress.js
import { compressPdf } from './compress.js';

self.onmessage = async ({ data: { buffer, quality } }) => {
  try {
    const result = await compressPdf(buffer, quality); // Uint8Array
    // Transfer the underlying buffer back instead of copying it
    self.postMessage({ success: true, buffer: result }, [result.buffer]);
  } catch (err) {
    self.postMessage({ success: false, error: err.message });
  }
};
```
```javascript
// Main thread
const worker = new Worker('/compress-worker.js', { type: 'module' });

worker.postMessage({ buffer: pdfArrayBuffer, quality: 0.6 }, [pdfArrayBuffer]);

worker.onmessage = ({ data }) => {
  if (data.success) {
    const blob = new Blob([data.buffer], { type: 'application/pdf' });
    const url = URL.createObjectURL(blob);
    // Trigger download
  }
};
```
The [pdfArrayBuffer] in the postMessage call is the transferables list — it transfers ownership of the buffer to the Worker without copying, which matters for large PDFs.
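The transfer is observable: afterwards the source buffer is detached and reports a length of zero. `structuredClone` uses the same transfer semantics, which makes this easy to see without spinning up a worker:

```javascript
const buf = new ArrayBuffer(1024);

// Transferring detaches the source buffer rather than copying it
const moved = structuredClone(buf, { transfer: [buf] });

console.log(buf.byteLength);   // 0 -- the original is now unusable
console.log(moved.byteLength); // 1024
```

Any attempt to read or post a detached buffer afterwards throws, so transfer the buffer only once you no longer need it on the sending side.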
## When to Use Server-Side vs Client-Side
| Factor | Client-Side | Server-Side |
|---|---|---|
| File privacy | Better — file never leaves device | Depends on service |
| Large files (>100MB) | Limited by browser memory | No limit |
| Scanned PDFs (image-heavy) | Works well | Works well |
| Font subsetting | Limited in PDF-lib | Full support (Ghostscript) |
| Compression ratio | Good (60-80% for image PDFs) | Better (80-90% possible) |
| Infrastructure cost | Zero | Server required |
| Works offline | Yes (after page load) | No |
| GDPR/privacy compliance | Easier | More complex |
For documents under ~50MB with embedded images, browser-side compression typically achieves 40–70% reduction — sufficient for most use cases.
## Try It
The Compress PDF tool at Ultimate Tools implements this approach — PDF-lib + Web Worker + Canvas re-encoding. Free, no upload, works in your browser.
Source patterns worth studying: the PDF-lib documentation and the PDF-lib GitHub examples cover the full API including encryption, form filling, and page manipulation.
Other browser-based PDF tools using the same architecture: Merge PDF, Split PDF, Remove Pages from PDF.