DEV Community

Kushagra

🕵️‍♂️ I Built a Privacy-First PDF Toolkit Because Most “Free” Tools Are Sketchy

Whenever you Google “merge pdf online free”, you’re basically playing Russian roulette with your data.

Most of the top tools upload your files to their servers.
Contracts. Bank statements. ID scans. Confidential PDFs.

Even if they say “we delete it after 1 hour” — that’s still 1 hour too long.

So I decided to build something different.

I built PixelDoc — a 100% client-side PDF toolkit that runs entirely in your browser.

👉 Live: https://pixeldoc.netlify.app


🧠 The Moment That Triggered This

I needed to merge large bank statements for a loan application.

Problems I hit:
• Upload failed because of weak internet
• Files over 50MB locked behind a paywall
• “Upgrade to Pro” wall for basic features
• Privacy anxiety

That’s when it hit me:

Modern browsers are powerful enough to handle PDF processing locally.

Why are we still uploading everything to random servers?


🔐 The Core Philosophy

PixelDoc follows one strict rule:

If your browser can do it — the server shouldn’t.

There is no backend processing.
No document uploads.
No temporary storage.
No tracking scripts spying on files.

Everything runs in your device's memory.


🛠 Tech Stack Breakdown

This wasn’t just a weekend HTML project.
I wanted performance + privacy + good UX.

⚛️ Frontend
• React + Vite (fast builds + lightning-fast HMR)
• Tailwind CSS (clean utility design)
• Framer Motion (smooth transitions)

📄 PDF Processing
• pdf-lib (pure JavaScript PDF manipulation)

🧠 AI / ML (In Browser)
• @imgly/background-removal (WebAssembly + ONNX)
• tesseract.js (OCR in Web Workers)

🖼 Image Tools
• browser-image-compression
• Canvas API

🚀 Hosting
• Netlify (static deployment)


🔎 How It Works (Under the Hood)

The interesting part?
There is no backend API.

Each file is read into an ArrayBuffer and processed directly in memory.


1️⃣ Merging PDFs Locally

Instead of uploading files to a server, the browser reads them as ArrayBuffers and merges them in memory:

import { PDFDocument } from 'pdf-lib';

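async function mergePDFs(files) {
  // Start from an empty document that will receive every page
  const mergedPdf = await PDFDocument.create();

  for (const file of files) {
    // Read the File into memory; nothing is uploaded
    const arrayBuffer = await file.arrayBuffer();
    const pdf = await PDFDocument.load(arrayBuffer);
    // Pages must be copied into the target document before they can be added
    const copiedPages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());
    copiedPages.forEach((page) => mergedPdf.addPage(page));
  }

  // save() resolves to a Uint8Array containing the merged PDF
  const savedBytes = await mergedPdf.save();
  // Trigger download (for example, via a Blob and a temporary link)
  return savedBytes;
}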

No network request.
No server storage.
Just browser memory.
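
The `mergePDFs` snippet stops at a "Trigger download" comment. Here is a minimal sketch of how that step could work, assuming the bytes come from `mergedPdf.save()`. The helper names `pdfBytesToBlob` and `triggerDownload` and the default filename are made up for illustration, not taken from PixelDoc's code.

```javascript
// Wrap the Uint8Array returned by mergedPdf.save() in a Blob tagged as a PDF.
function pdfBytesToBlob(bytes) {
  return new Blob([bytes], { type: 'application/pdf' });
}

// Browser-only: offer the Blob as a file download through a temporary link.
// (Hypothetical helper; the filename default is an assumption.)
function triggerDownload(blob, filename = 'merged.pdf') {
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  link.remove();
  // Release the object URL shortly after the click so the memory can be freed.
  setTimeout(() => URL.revokeObjectURL(url), 1000);
}
```

Usage would be something like `triggerDownload(pdfBytesToBlob(savedBytes))`. The object URL points at memory inside the page, so the file still never leaves the device.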


2️⃣ AI Background Removal (Fully Client-Side)

This was the hardest part.

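The ONNX model is downloaded on first use and cached.
Inference runs locally through ONNX Runtime Web. The WebAssembly backend executes on the user's CPU, and GPU acceleration is only available where the library and the browser support it.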

import imglyRemoveBackground from "@imgly/background-removal";

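const removeBg = async (imageSrc) => {
  try {
    // Runs the model locally and resolves to a Blob of the cut-out image
    const blob = await imglyRemoveBackground(imageSrc);
    // Create a local object URL so the result can be displayed
    const url = URL.createObjectURL(blob);
    setResult(url); // React state setter defined in the surrounding component
  } catch (error) {
    console.error("AI Error:", error);
  }
};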

This means:
• No cloud AI
• No image uploads
• No API keys
• No usage limits


✨ Current Features

PixelDoc is no longer “just merge”.

📄 PDF Tools
• Merge
• Split
• Compress
• Protect / Unlock
• OCR
• PDF → Text

🖼 Image Tools
• AI Background Removal
• HEIC → JPG (iPhone support)
• Compress
• Resize

📑 Converters
• Word → PDF
• Excel → PDF


⚡ Performance Optimizations

Because it’s client-side, initial load time is critical.

So I implemented:

🧩 Code Splitting

Heavy libraries like tesseract.js and pdf-lib are lazy-loaded, so their code is fetched only when a tool actually needs them.
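
Here is a rough sketch of that lazy loading with dynamic `import()`, which Vite splits into separate chunks. The `lazy` helper and the `loadPdfLib` / `loadTesseract` / `mergeOnDemand` names are hypothetical. This shows the general pattern, not necessarily how PixelDoc wires it up.

```javascript
// Memoize a loader so each heavy module is requested at most once.
// The loader must return a promise, which import() always does.
function lazy(loader) {
  let promise = null;
  return () => {
    if (promise === null) {
      promise = loader().catch((error) => {
        promise = null; // allow a retry after a failed chunk download
        throw error;
      });
    }
    return promise;
  };
}

// Nothing is fetched here; the import() only runs when the loader is called.
const loadPdfLib = lazy(() => import('pdf-lib'));
const loadTesseract = lazy(() => import('tesseract.js'));

// Example: pdf-lib is pulled in the first time the merge tool is used.
async function mergeOnDemand(files) {
  const { PDFDocument } = await loadPdfLib();
  const merged = await PDFDocument.create();
  for (const file of files) {
    const source = await PDFDocument.load(await file.arrayBuffer());
    const pages = await merged.copyPages(source, source.getPageIndices());
    pages.forEach((page) => merged.addPage(page));
  }
  return merged.save();
}
```

Caching the promise rather than the resolved module means two quick clicks on the same tool share one network request.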

🧠 Service Worker Caching

Models and large WASM files are cached after first use.
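
For illustration, a cache-first service worker along these lines might look like the sketch below. The cache name and the file extensions in `HEAVY_ASSET` are assumptions (the real model and WASM URLs depend on each library), and `isHeavyAsset` / `cacheFirst` are hypothetical helper names.

```javascript
// Hypothetical cache name; bump the version to invalidate old entries.
const CACHE_NAME = 'heavy-assets-v1';

// Assumed extensions for WASM binaries, ONNX models and OCR language data.
const HEAVY_ASSET = /\.(wasm|onnx|traineddata)(\.gz)?$/i;

// Decide from the URL path (ignoring the query string) whether to cache.
function isHeavyAsset(url) {
  return HEAVY_ASSET.test(new URL(url).pathname);
}

// Serve from the cache when possible; otherwise fetch and store a copy.
async function cacheFirst(request) {
  const cache = await caches.open(CACHE_NAME);
  const cached = await cache.match(request);
  if (cached) return cached;
  const response = await fetch(request);
  if (response.ok) await cache.put(request, response.clone());
  return response;
}

// Register the fetch handler only when running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('fetch', (event) => {
    const { request } = event;
    if (request.method === 'GET' && isHeavyAsset(request.url)) {
      event.respondWith(cacheFirst(request));
    }
  });
}
```

Only successful responses are stored, so a failed download is retried on the next request instead of being cached.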

📈 SEO Strategy

Even though it’s a React SPA, I added:
• Rich JSON-LD (FAQPage schema)
• Structured How-to guides
• Tool-specific landing content

Trying to avoid the “SPA = SEO disaster” problem.
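
As a small sketch of the JSON-LD part, here is one way to build a schema.org `FAQPage` object from a list of questions and inject it into the page. The function names and the shape of the `faqs` input are made up for this example.

```javascript
// Build a schema.org FAQPage object from [{ question, answer }] pairs.
function buildFaqJsonLd(faqs) {
  return {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    mainEntity: faqs.map(({ question, answer }) => ({
      '@type': 'Question',
      name: question,
      acceptedAnswer: { '@type': 'Answer', text: answer },
    })),
  };
}

// Browser-only: add the structured data to <head> as a JSON-LD script tag.
function injectJsonLd(data) {
  const script = document.createElement('script');
  script.type = 'application/ld+json';
  script.textContent = JSON.stringify(data);
  document.head.appendChild(script);
}
```

Each tool page could call `injectJsonLd(buildFaqJsonLd(faqs))` with its own questions. Rendering the same tag into the static HTML at build time is an alternative if crawlers should not need to run JavaScript.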


💭 What I Learned
1. Browsers are insanely powerful now.
2. Privacy-first tools are possible — we just don’t prioritize them.
3. WebAssembly changes what’s possible in frontend engineering.
4. Most SaaS tools add servers where they aren’t necessary.
5. Performance thinking makes you a better engineer.


Tell me where it fails.

👉 https://pixeldoc.netlify.app

If you found this interesting, drop a ❤️
I’m planning to write more deep-dive breakdowns on privacy-first engineering.
