Your PDFs contain sensitive data. Tax forms, contracts, medical records. Yet most online tools want you to upload them to mysterious servers who-knows-where.
I built Kreotar's PDF compressor to solve this exact paranoia. Everything happens in your browser. Here's exactly how you can implement the same architecture.
Step 1: The Architecture Decision
We use PDF-lib (client-side JS) combined with custom WASM modules for image compression. The key is handling everything in a Web Worker so the UI stays responsive during heavy processing.
// pdf-processor.worker.js
import { PDFDocument } from 'pdf-lib';
import { createImageCompressionWasm } from './wasm-image-compress';

self.onmessage = async (event) => {
  const { fileBuffer, quality = 0.7 } = event.data;
  try {
    const pdfDoc = await PDFDocument.load(fileBuffer);
    const pages = pdfDoc.getPages();
    let totalSaved = 0;

    // Process each page
    for (let i = 0; i < pages.length; i++) {
      const page = pages[i];

      // Extract images from the page (custom helper; pdf-lib has no
      // built-in image extraction)
      const images = await extractImagesFromPage(page);

      for (const image of images) {
        const originalSize = image.data.length;

        // Compress using WASM (mozjpeg compiled to WASM)
        const compressed = await createImageCompressionWasm({
          data: image.data,
          quality: quality * 100,
          format: 'jpeg'
        });

        totalSaved += (originalSize - compressed.length);

        // Replace the image in the PDF (custom helper)
        await replaceImageInPage(page, image.ref, compressed);
      }

      // Report progress after each page
      self.postMessage({
        type: 'progress',
        current: i + 1,
        total: pages.length
      });
    }

    const pdfBytes = await pdfDoc.save();
    self.postMessage({
      type: 'complete',
      result: pdfBytes,
      // Fraction of the original byte size that was saved
      // (fileBuffer is an ArrayBuffer, so use byteLength, not length)
      compressionRatio: totalSaved / fileBuffer.byteLength
    });
  } catch (error) {
    self.postMessage({ type: 'error', message: error.message });
  }
};
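The worker above leans on an `extractImagesFromPage` helper that isn't part of pdf-lib. As a rough sketch of what such a helper can do underneath: pdf-lib exposes the document's indirect objects, and image data lives in stream objects whose dictionary marks them with an `Image` subtype. The `enumerateObjects` parameter below is a hypothetical stand-in for that accessor, so the filtering logic is easy to follow in isolation:

```javascript
// Walk a PDF's indirect objects and keep the streams whose dictionary
// identifies them as images. `enumerateObjects` is a hypothetical
// accessor returning [ref, object] pairs; in a real pdf-lib setup you
// would adapt this to the library's low-level object model.
const collectImageStreams = (enumerateObjects) => {
  const images = [];
  for (const [ref, obj] of enumerateObjects()) {
    // Image streams carry Subtype = Image and raw stream contents
    if (obj.dict?.Subtype === 'Image' && obj.contents) {
      images.push({ ref, data: obj.contents });
    }
  }
  return images;
};
```

Keeping the filter as a pure function over an enumerator makes it trivial to unit-test without loading a real PDF.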
Step 2: The React Integration
Here's how to wire it up in your frontend:
import { useState, useRef, useCallback } from 'react';
import PdfWorker from './pdf-processor.worker?worker';

const PdfCompressor = () => {
  const [status, setStatus] = useState('idle');
  const [progress, setProgress] = useState(0);
  const [compressionStats, setCompressionStats] = useState(null);
  const workerRef = useRef(null);

  const processPdf = useCallback(async (file) => {
    setStatus('processing');
    setProgress(0);

    // Initialize worker
    const worker = new PdfWorker();
    workerRef.current = worker;

    // Read file as ArrayBuffer
    const arrayBuffer = await file.arrayBuffer();

    return new Promise((resolve, reject) => {
      worker.onmessage = (e) => {
        const { type, current, total, result, compressionRatio, message } = e.data;
        switch (type) {
          case 'progress':
            setProgress((current / total) * 100);
            break;
          case 'complete':
            setStatus('complete');
            setCompressionStats({
              originalSize: file.size,
              newSize: result.length,
              // compressionRatio is already the fraction saved
              ratio: compressionRatio * 100
            });
            // Create download blob
            const blob = new Blob([result], { type: 'application/pdf' });
            const url = URL.createObjectURL(blob);
            // Auto-download
            const a = document.createElement('a');
            a.href = url;
            a.download = `compressed-${file.name}`;
            a.click();
            URL.revokeObjectURL(url);
            resolve(result);
            worker.terminate();
            break;
          case 'error':
            setStatus('error');
            reject(new Error(message));
            worker.terminate();
            break;
        }
      };

      // Start processing
      worker.postMessage({
        fileBuffer: arrayBuffer,
        quality: 0.7 // Compression quality
      }, [arrayBuffer]); // Transfer ownership for performance
    });
  }, []);

  return (
    <div className="pdf-compressor">
      <input
        type="file"
        accept=".pdf"
        onChange={(e) => e.target.files?.[0] && processPdf(e.target.files[0])}
        disabled={status === 'processing'}
      />
      {status === 'processing' && (
        <div className="progress-bar">
          <div style={{ width: `${progress}%` }} />
          <span>{Math.round(progress)}% processed</span>
        </div>
      )}
      {compressionStats && (
        <div className="stats">
          <p>Original: {(compressionStats.originalSize / 1024).toFixed(2)} KB</p>
          <p>Saved: {compressionStats.ratio.toFixed(1)}%</p>
        </div>
      )}
    </div>
  );
};
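The stats panel above always reports kilobytes, which gets awkward for multi-megabyte PDFs. A small formatter (a hypothetical helper, not part of the component above) keeps the display readable at any size:

```javascript
// Format a byte count with an appropriate binary unit (B, KB, MB, GB).
const formatBytes = (bytes) => {
  const units = ['B', 'KB', 'MB', 'GB'];
  let value = bytes;
  let i = 0;
  // Step up one unit for every factor of 1024
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i += 1;
  }
  // Whole numbers for bytes, two decimals for larger units
  return `${value.toFixed(i === 0 ? 0 : 2)} ${units[i]}`;
};
```

Swap it into the stats markup, e.g. `<p>Original: {formatBytes(compressionStats.originalSize)}</p>`.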
Step 3: Handling the Gotchas
Memory limits: A browser tab will crash once it exhausts its memory budget, often somewhere around 2 GB depending on the browser and platform. For large PDFs, I process pages in chunks:
// Handle large PDFs in chunks to avoid memory crashes
const processInChunks = async (pages, chunkSize = 5) => {
  const results = [];
  for (let i = 0; i < pages.length; i += chunkSize) {
    const chunk = pages.slice(i, i + chunkSize);

    // Yield between chunks. You can't force garbage collection from JS,
    // but pausing gives the engine a window to reclaim memory.
    if (i > 0) {
      await new Promise(resolve => setTimeout(resolve, 100));
    }

    const processed = await Promise.all(chunk.map(processPage));
    results.push(...processed);
  }
  return results;
};
Cross-origin isolation: If your WASM build relies on SharedArrayBuffer (e.g. for threading), the page must be cross-origin isolated, which means serving your own documents with:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

And note that once COEP is set to require-corp, any WASM or script loaded from a CDN must itself be served with CORS headers or Cross-Origin-Resource-Policy: cross-origin, or the browser will refuse to load it.
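How you set these headers depends on your server or bundler. As one minimal sketch, assuming a Vite setup (adjust for your own tooling), the dev server can send them like this:

```javascript
// vite.config.js — serve the app cross-origin isolated so
// SharedArrayBuffer-backed WASM is allowed to run.
export default {
  server: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'require-corp',
    },
  },
};
```

In production you'd set the same two headers on whatever serves your HTML.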
The Privacy Win
Your file never leaves your device. Open the Network tab in DevTools while compressing: zero upload requests. That's the magic of client-side processing.
I made the mistake early on of trying to use serverless functions for this. The latency killed the UX. Plus, who wants to upload their tax documents to a random Lambda function?
Try it yourself: Kreotar PDF Compressor
What other PDF operations are you trying to run client-side? I might already have a tool for it on Kreotar.