sunshey

Posted on May 30

How I Built a Privacy-First PDF Toolkit with Vue 3 and WebAssembly

#javascript #privacy #showdev #vue

A year ago, I got fed up with online PDF tools.

Every time I needed to merge a contract or compress a resume, I had to upload my file to someone else's server. As a developer, I knew exactly what that meant — my file was sitting on a cloud server I didn't control, processed by code I couldn't see, and potentially logged or cached somewhere I couldn't audit.

So I decided to build an alternative: a PDF toolkit that runs entirely in the browser. No server uploads. No registration walls. No privacy trade-offs.

This post is a recap of the technical decisions, the libraries I chose, and the traps I fell into along the way.

The Goal

I wanted something simple on the surface but technically non-trivial under the hood:

20+ PDF tools (merge, split, compress, rotate, watermark, convert, encrypt, etc.)
Pure browser-side processing — files never leave the user's device
Free and registration-free — open the page, use the tool, download the result
Fast and responsive — no upload/download round-trips

The live project is at en.sotool.top and the code is open source on GitHub.

Tech Stack

Frontend: Vue 3 + Vite

I chose Vue 3 because the Composition API makes complex file state management surprisingly clean. When you're juggling multiple uploaded files, their processing status, progress bars, and download URLs, having reactive state organized into composables is a massive productivity boost.

Vite was a no-brainer. The build speed and HMR experience demolish Webpack for this type of project. The only wrinkle: some heavy libraries need to be listed explicitly in optimizeDeps.include, or the dev server's dependency pre-bundling step can stall.
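For reference, here is a minimal sketch of what that configuration can look like. Which packages actually need to be listed is an assumption on my part (I picked the PDF and Office parsing libraries from the table below); the right list depends on what stalls in your own setup.

```javascript
// vite.config.js
import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'

export default defineConfig({
  plugins: [vue()],
  optimizeDeps: {
    // Pre-bundle these dependencies up front instead of letting Vite
    // discover them lazily. The package list here is illustrative.
    include: ['pdf-lib', 'pdfjs-dist', 'mammoth', 'xlsx'],
  },
})
```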

PDF Processing Libraries

Library	Role
pdf-lib	PDF creation, modification, merge, split, encryption, watermarking
pdfjs-dist (Mozilla's PDF.js)	PDF rendering and page-to-image conversion (Canvas → PNG/JPEG)
html2pdf.js + jspdf	HTML-to-PDF conversion for the Word/Excel-to-PDF preview pipeline
mammoth	DOCX parsing to HTML
xlsx	Excel spreadsheet parsing

pdf-lib is the star of the show. Its API is modern and predictable: you can merge PDFs, add watermarks, and manipulate pages without touching a server. One caveat on passwords: upstream pdf-lib does not encrypt on save, so password protection depends on a fork or a companion library. The fact that it runs in a browser with zero polyfills is still kind of magical to me.
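To give a feel for the API, here is a minimal merge sketch. It is not the project's actual code. The function name mergePdfs is hypothetical, and pdf-lib's PDFDocument class is passed in as a parameter purely so the snippet runs without a bundler; in an app you would import it from 'pdf-lib' as usual.

```javascript
// Sketch: merge several PDFs into one with pdf-lib.
// In a real app: import { PDFDocument } from 'pdf-lib'
// `inputs` is an array of ArrayBuffer / Uint8Array, e.g. from file.arrayBuffer().
async function mergePdfs(PDFDocument, inputs) {
  const merged = await PDFDocument.create()
  for (const bytes of inputs) {
    const source = await PDFDocument.load(bytes)
    // copyPages clones the pages (and the resources they reference) into `merged`
    const pages = await merged.copyPages(source, source.getPageIndices())
    pages.forEach((page) => merged.addPage(page))
  }
  return merged.save() // resolves to a Uint8Array
}
```

The returned bytes can be wrapped in a Blob with type 'application/pdf' and handed to URL.createObjectURL to produce the download link.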

pdfjs-dist handles the rendering side. Extracting a PDF page to a Canvas element and then exporting it as a high-DPI image is straightforward, but getting the worker file to load correctly in production was... less straightforward (more on that below).
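The rendering step can be sketched roughly like this. The function name renderPageToCanvas and the 144 DPI default are my own choices for illustration. It assumes pdf is the document object you get from pdfjs-dist's getDocument(...).promise, and it uses the canvasContext render parameter from the 3.x/4.x API, so check the signature against the version you install.

```javascript
// Sketch: render one PDF page onto a canvas at a chosen DPI.
// PDF page sizes are expressed in points (72 per inch), so scale = dpi / 72.
async function renderPageToCanvas(pdf, pageNumber, canvas, dpi = 144) {
  const page = await pdf.getPage(pageNumber) // page numbers are 1-based
  const viewport = page.getViewport({ scale: dpi / 72 })

  // Size the backing store to the scaled page before drawing
  canvas.width = Math.ceil(viewport.width)
  canvas.height = Math.ceil(viewport.height)

  await page.render({ canvasContext: canvas.getContext('2d'), viewport }).promise
  return canvas
}
```

From there, canvas.toBlob(callback, 'image/png') or 'image/jpeg' gives you the image to download.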

The Traps I Fell Into

1. Browser Memory Limits

A 50MB scanned PDF is nothing for a desktop app. For a browser tab, it can be a crash trigger.

Browsers cap memory per tab aggressively. When users dropped massive scanned documents into the tool, the tab would freeze or the out-of-memory killer would step in.

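The fix: For operations that don't need the full file (splitting, deleting pages, extracting ranges), I copy only the necessary pages into a new document instead of carrying the entire source through to the output. That keeps the saved result, and the memory needed to serialize it, proportional to the pages that are kept. One caveat: pdf-lib still parses the whole source file on load and writes the whole output on save rather than appending an incremental update, so very large inputs can still hit the limit.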
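As an illustration of the "only the pages you need" idea, here is a small helper that turns a user-entered range such as "1-3, 7" into the zero-based indices that pdf-lib's copyPages expects. The helper name parsePageRanges and the range syntax are assumptions for this sketch, not something taken from the project.

```javascript
// Sketch: convert "1-3, 7" (1-based, inclusive) into sorted, de-duplicated
// zero-based page indices, validating against the document's page count.
function parsePageRanges(spec, pageCount) {
  const indices = new Set()
  for (const part of spec.split(',')) {
    const match = part.trim().match(/^(\d+)(?:\s*-\s*(\d+))?$/)
    if (!match) throw new RangeError(`Invalid page range: "${part}"`)
    const start = Number(match[1])
    const end = Number(match[2] ?? match[1]) // a single page is a range of one
    if (start < 1 || end < start || end > pageCount) {
      throw new RangeError(`Page range out of bounds: "${part.trim()}"`)
    }
    for (let page = start; page <= end; page++) indices.add(page - 1)
  }
  return [...indices].sort((a, b) => a - b)
}
```

The result plugs straight into something like out.copyPages(source, parsePageRanges('1-3, 7', source.getPageCount())), so only the selected pages end up in the output document.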

2. Chinese Font Embedding

This was a surprise. When merging PDFs that contained Chinese text, the output would occasionally show garbled characters or tofu blocks.

The root cause: out of the box, pdf-lib can only draw text with the 14 standard PDF fonts, and their encodings cannot represent CJK characters. Non-Latin text (Chinese, Japanese, emoji) needs a custom font that is registered through fontkit and embedded in the file, and embedding a full Chinese font file balloons the output size by several megabytes.

The fix: Where possible, preserve the original document's font references instead of re-embedding. When embedding is unavoidable, use subsetted font files that only include the glyphs actually used in the document.
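Here is a rough sketch of the embedding side. It assumes pdfDoc is a pdf-lib PDFDocument, fontkit is the default export of @pdf-lib/fontkit, and fontBytes holds a TTF/OTF file you fetched yourself. Both function names are hypothetical. pdf-lib's embedFont accepts a subset option, but how well subsetting works varies by font, so treat it as something to test rather than a guarantee.

```javascript
// Sketch: collect the distinct characters a document actually uses.
// Handy if you pre-build a trimmed font file with an external subsetting tool.
function collectGlyphs(strings) {
  const seen = new Set()
  for (const text of strings) {
    for (const ch of text) seen.add(ch) // for...of walks code points, so emoji stay whole
  }
  return [...seen].sort().join('')
}

// Sketch: embed a custom font and ask pdf-lib to subset it on save.
async function embedSubsetFont(pdfDoc, fontkit, fontBytes) {
  pdfDoc.registerFontkit(fontkit) // required before embedding non-standard fonts
  return pdfDoc.embedFont(fontBytes, { subset: true })
}
```

The font object returned by embedSubsetFont is what you pass as the font option to page.drawText when stamping a Chinese watermark.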

3. Web Worker Path Hell

pdfjs-dist uses a Web Worker to parse PDFs off the main thread. In development, this works flawlessly. In production, the worker file would 404 because Vite's bundler moved or renamed it.

The fix: Two options work. You can use Vite's ?worker import syntax (import PdfWorker from 'pdfjs-dist/build/pdf.worker?worker'; the exact worker file name under pdfjs-dist/build differs between versions), or you can set the worker source path explicitly to a URL that your build pipeline emits, so the file is requested from where it actually lands.
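A sketch of the second option, using Vite's ?url suffix to get the final asset URL and handing it to pdf.js. The setup file name is hypothetical, and the worker path shown assumes a pdfjs-dist 4.x layout; older releases ship pdf.worker.min.js instead, so check what is in node_modules/pdfjs-dist/build for your version.

```javascript
// src/pdfjs-setup.js (hypothetical file name)
import * as pdfjsLib from 'pdfjs-dist'
// `?url` makes Vite emit the worker as an asset and return its hashed URL,
// so the path stays correct after the production build renames the file.
import workerUrl from 'pdfjs-dist/build/pdf.worker.min.mjs?url'

pdfjsLib.GlobalWorkerOptions.workerSrc = workerUrl

export { pdfjsLib }
```

Importing pdfjsLib from this one module everywhere else keeps the worker configuration in a single place.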

4. Batch Processing UI Lockups

When processing dozens of files in a batch, the main thread gets busy and the UI freezes. The progress bar stops updating. Users think the page crashed.

The fix: Split the work into separate tasks. Process one file, update the progress bar, then yield control back to the browser with setTimeout(..., 0) or requestIdleCallback before starting the next file. Microtasks (resolved promises on their own) don't help here, because the browser doesn't repaint between them. Yielding adds a few milliseconds per file but keeps the UI responsive.
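The pattern is small enough to show in full. This is a generic sketch rather than the project's code: processBatch, processOne, and onProgress are hypothetical names, and processOne stands in for whatever per-file PDF operation you run.

```javascript
// Resolve on a new task (not a microtask) so the browser gets a chance
// to repaint the progress bar and handle input between files.
const yieldToBrowser = () => new Promise((resolve) => setTimeout(resolve, 0))

// Process files one at a time, reporting progress as a fraction from 0 to 1.
async function processBatch(files, processOne, onProgress) {
  const results = []
  for (let i = 0; i < files.length; i++) {
    results.push(await processOne(files[i]))
    onProgress((i + 1) / files.length)
    await yieldToBrowser()
  }
  return results
}
```

In a Vue component, onProgress would typically just assign to a ref that the progress bar is bound to. Note that this only keeps the UI alive between files; a single very slow file still blocks the main thread while it is being processed.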

The SEO Problem (and Solution)

SPAs and search engines don't play nice. A crawler that doesn't execute JavaScript hits your site and sees an empty <div id="app"></div> and nothing else.

I wrote a prerender script using Playwright. During the build step, it launches a headless browser, navigates to every route, waits for Vue to mount, and saves the rendered HTML as static files. Each tool page and each guide article gets its own index.html.
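A stripped-down sketch of such a prerender step is below. It is not the project's script: the function name prerender, its options, and the '#app > *' selector used as the "Vue has mounted" signal are assumptions for illustration. The browser argument is a Playwright Browser, for example from chromium.launch().

```javascript
// Sketch: visit each route in a headless browser and save the rendered HTML
// as <outDir>/<route>/index.html.
async function prerender(browser, { baseUrl, routes, outDir }) {
  // Dynamic imports keep the snippet usable from both CommonJS and ESM scripts
  const { mkdir, writeFile } = await import('node:fs/promises')
  const path = await import('node:path')

  const written = []
  for (const route of routes) {
    const page = await browser.newPage()
    await page.goto(new URL(route, baseUrl).href, { waitUntil: 'networkidle' })
    await page.waitForSelector('#app > *') // assumed signal that the app rendered
    const html = await page.content()
    await page.close()

    const file = path.join(outDir, route, 'index.html')
    await mkdir(path.dirname(file), { recursive: true })
    await writeFile(file, html, 'utf8')
    written.push(file)
  }
  return written
}
```

In a build script this would run after vite build, with vite preview (or any static server) serving the output at baseUrl, and with browser.close() called once all routes are done.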

The result: Google can index every individual tool page, and users sharing specific features get proper Open Graph previews.

The Trade-offs

Browser-side processing isn't perfect. Here are the honest limitations:

Limitation	Why it happens
Large files struggle	Browser memory caps (300MB+ files can choke)
No OCR	Would require Tesseract.js or a server-side pipeline
Bookmarks/links lost	Merging restructures the document tree; preserving internal links is complex
CPU-bound	Heavy operations block the tab unless they are moved off the main thread into a Web Worker

For 95% of daily PDF tasks — merging a few contracts, compressing a resume, converting a page to an image — these limitations don't matter. For the remaining 5%, desktop software still wins.

What's Next

The project is live at en.sotool.top. The Chinese version is at sotool.top.

If you're building browser-based tools, I'd love to hear about your approach to handling large files, workers, or SEO. Drop a comment below.

This post is part of my #buildinpublic journey. Follow along for more technical deep dives and the occasional painful lesson learned.

DEV Community