Last year I needed to merge two PDFs before sending them to my landlord. I Googled "merge PDF online," clicked the first result, and uploaded my lease agreement and bank statement to some random website.
Then I thought about what I'd just done.
I'd handed a stranger my full legal name, address, bank account number, and monthly income. The site's privacy policy was 4,000 words of legalese that boiled down to "we can do whatever we want with your data."
That bugged me enough to build my own PDF tools. They run entirely in the browser. Your files never touch a server.
## How Bad Is It, Really?
I tested the top 10 PDF tools from Google search results. Here's what I found:
- 8 out of 10 upload your file to their server for processing
- 3 store files for "up to 24 hours" (their words)
- 2 had privacy policies that allowed sharing data with "partners"
- 1 didn't even have HTTPS on the upload endpoint (yikes)
Think about the kinds of documents people put through PDF tools:
- Legal: contracts, NDAs, court filings
- Medical: lab results, insurance claims, prescriptions
- Financial: tax returns, bank statements, pay stubs
- Business: internal memos, hiring docs, IP filings
Every one of those is a goldmine for anyone who intercepts or scrapes them. And you're trusting that some free PDF tool with ads plastered everywhere is handling your data responsibly.
## Client-Side PDF Processing Is Real
Here's the thing — you don't actually need a server for most PDF operations. The pdf-lib library lets you manipulate PDFs entirely in JavaScript, running in the browser.
### Merging PDFs
Merging is probably the most common operation. With pdf-lib, it's straightforward:
```javascript
import { PDFDocument } from 'pdf-lib';

async function mergePDFs(pdfFiles) {
  const merged = await PDFDocument.create();
  for (const file of pdfFiles) {
    const bytes = await file.arrayBuffer();
    const doc = await PDFDocument.load(bytes);
    const pages = await merged.copyPages(doc, doc.getPageIndices());
    pages.forEach(page => merged.addPage(page));
  }
  const mergedBytes = await merged.save();
  return new Blob([mergedBytes], { type: 'application/pdf' });
}
```
That's it. The file goes from `<input type="file">` to `arrayBuffer()` to pdf-lib, and back out as a downloadable blob. At no point does it leave the browser.
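Before handing the bytes to pdf-lib, it's worth a quick sanity check that the file really is a PDF. A minimal sketch (the `looksLikePDF` helper is my own, not part of pdf-lib) just checks for the `%PDF-` magic bytes at the start of the buffer:

```javascript
// Returns true if the buffer starts with the "%PDF-" magic bytes.
// Some real-world PDFs have junk before the header, so treat this as
// a quick sanity check, not full validation.
function looksLikePDF(arrayBuffer) {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  const bytes = new Uint8Array(arrayBuffer.slice(0, 5));
  return magic.every((b, i) => bytes[i] === b);
}
```

Failing fast here gives the user a clear error instead of a cryptic parse failure from deep inside the library.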
### Splitting PDFs
Splitting works the same way — you're just copying specific pages instead of all of them:
```javascript
async function splitPDF(pdfFile, pageRanges) {
  const bytes = await pdfFile.arrayBuffer();
  const source = await PDFDocument.load(bytes);
  const results = [];
  for (const range of pageRanges) {
    const newDoc = await PDFDocument.create();
    // Page ranges are 1-based; pdf-lib page indices are 0-based.
    const indices = [];
    for (let i = range.start - 1; i < range.end; i++) {
      indices.push(i);
    }
    const pages = await newDoc.copyPages(source, indices);
    pages.forEach(page => newDoc.addPage(page));
    results.push(await newDoc.save());
  }
  return results;
}
```
Want pages 1-3 as one file and pages 4-10 as another? Pass `[{start: 1, end: 3}, {start: 4, end: 10}]` and you get two separate PDFs back.
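In practice users type ranges into a text box, so you need to turn a string like `"1-3, 4-10"` into that array. A small parser sketch (this `parsePageRanges` helper is my own, not part of pdf-lib):

```javascript
// Parses "1-3, 5, 7-10" into [{start: 1, end: 3}, {start: 5, end: 5}, {start: 7, end: 10}].
// Throws on anything that isn't a positive, non-descending range.
function parsePageRanges(input) {
  return input.split(',').map(part => {
    const m = part.trim().match(/^(\d+)(?:-(\d+))?$/);
    if (!m) throw new Error(`Invalid range: "${part.trim()}"`);
    const start = parseInt(m[1], 10);
    const end = m[2] ? parseInt(m[2], 10) : start;
    if (start < 1 || end < start) throw new Error(`Invalid range: "${part.trim()}"`);
    return { start, end };
  });
}
```

A bare number like `"5"` becomes a one-page range, which keeps the `splitPDF` input format uniform.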
### Compressing PDFs
Compression is trickier because pdf-lib doesn't have a built-in "compress" function. But most PDF bloat comes from embedded images. The approach I took:
- Parse the PDF and find all embedded images
- Decode each image
- Re-encode it at lower quality using Canvas + toBlob
- Replace the original image data in the PDF
```javascript
async function compressImage(imageBytes, quality = 0.7) {
  const blob = new Blob([imageBytes]);
  const bitmap = await createImageBitmap(blob);
  const canvas = document.createElement('canvas');
  canvas.width = bitmap.width;
  canvas.height = bitmap.height;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(bitmap, 0, 0);
  return new Promise(resolve => {
    canvas.toBlob(resolve, 'image/jpeg', quality);
  });
}
```
This typically shrinks image-heavy PDFs by 40-60%. Text-only PDFs don't change much, but those are usually small anyway.
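To show users what they actually gained, I report the size delta after re-encoding. A trivial helper for that (the `formatSavings` name is mine, purely illustrative):

```javascript
// Formats the size reduction as a human-readable percentage,
// e.g. a 1 MB file compressed to 450 KB reads as "55% smaller".
function formatSavings(originalBytes, compressedBytes) {
  if (compressedBytes >= originalBytes) return 'no savings';
  const pct = Math.round((1 - compressedBytes / originalBytes) * 100);
  return `${pct}% smaller`;
}

formatSavings(1000000, 450000); // "55% smaller"
```

The "no savings" branch matters: re-encoding an already-optimized JPEG can produce a *larger* file, in which case you should keep the original image instead.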
### PDF to Image
For rendering PDF pages as images, I use pdf.js (Mozilla's PDF renderer). It draws each page onto a Canvas element:
```javascript
async function pdfPageToImage(pdfDoc, pageNum, scale = 2) {
  const page = await pdfDoc.getPage(pageNum);
  const viewport = page.getViewport({ scale });
  const canvas = document.createElement('canvas');
  canvas.width = viewport.width;
  canvas.height = viewport.height;
  await page.render({
    canvasContext: canvas.getContext('2d'),
    viewport
  }).promise;
  return canvas.toDataURL('image/png');
}
```
The scale factor matters a lot. Scale 1 gives you screen-resolution output, scale 2 gives you print quality, and scale 3 is for when you really need sharp text. Keep in mind that doubling the scale doubles both dimensions, which quadruples the pixel count, so don't go crazy with it.
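PDF page sizes are specified in points (72 per inch), and pdf.js's viewport multiplies that by the scale, so scale 1 renders at roughly 72 DPI, scale 2 at 144 DPI, and so on. A small helper (my own, just to make the math concrete) that predicts the output size before you render:

```javascript
// PDF pages are measured in points: 72 points = 1 inch.
// pdf.js renders at (page size in points) * scale, so effective DPI = 72 * scale.
function outputSize(widthPts, heightPts, scale) {
  return {
    width: Math.floor(widthPts * scale),
    height: Math.floor(heightPts * scale),
    dpi: 72 * scale,
  };
}

// A US Letter page is 612 x 792 points, so at scale 2:
outputSize(612, 792, 2); // { width: 1224, height: 1584, dpi: 144 }
```

Running this before rendering also lets you warn the user (or clamp the scale) when a huge page would blow past canvas size limits.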
## The Verification Problem
Here's the hard part: how does a user actually verify their files aren't being uploaded? They shouldn't have to take my word for it.
A few things I did to make this transparent:
**No network requests after page load.** Open DevTools, go to the Network tab, and watch. After the initial page load, there's zero network activity when you process a file. You can even disconnect your WiFi and it still works.

**Open-source approach.** All the processing logic is in the client-side JavaScript. View source and you can trace exactly what happens to your file. There's no minified blob hiding a sneaky fetch() call.

**No analytics on file content.** I track page views (everyone does), but I have zero visibility into what files users process. I literally couldn't access your data if I wanted to.
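One extra guard you could add, and which a skeptical user can read right in the page source, is to hard-disable the network primitives once the page has loaded, so even a compromised dependency couldn't quietly phone home. A sketch, assuming you call it after all assets are in (the `disableNetwork` helper is hypothetical, not something the tools necessarily ship):

```javascript
// After the page and its libraries have loaded, replace the network
// primitives with throwing stubs. Any later attempt to make a request
// fails loudly instead of silently exfiltrating data.
function disableNetwork(global = globalThis) {
  const blocked = () => {
    throw new Error('Network access is disabled after page load');
  };
  global.fetch = blocked;
  if (global.XMLHttpRequest) {
    global.XMLHttpRequest = function () { blocked(); };
  }
}
```

It's not bulletproof (a determined script could stash a reference to the originals first), but it turns "trust me" into something inspectable.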
## Why Don't More Tools Work This Way?
Money, mostly. Server-side processing means:
- You can gate features behind paywalls more easily
- You can collect data for advertising
- You can impose artificial limits to push upgrades
- You control the processing pipeline
Client-side tools give up all of that. The tradeoff is that users get better privacy and faster processing (no upload/download wait), but the developer has fewer monetization levers.
I'm fine with that tradeoff. The tools are free, they work offline, and your files stay yours.
## Try Them Out
All the PDF tools are live:
Drop your files in, get your result out. Nothing uploaded, nothing stored, nothing tracked. If you want to verify, open DevTools and watch the Network tab — you'll see exactly zero outbound requests during processing.