A year ago, I got fed up with online PDF tools.
Every time I needed to merge a contract or compress a resume, I had to upload my file to someone else's server. As a developer, I knew exactly what that meant — my file was sitting on a cloud server I didn't control, processed by code I couldn't see, and potentially logged or cached somewhere I couldn't audit.
So I decided to build an alternative: a PDF toolkit that runs entirely in the browser. No server uploads. No registration walls. No privacy trade-offs.
This post is a recap of the technical decisions, the libraries I chose, and the traps I fell into along the way.
The Goal
I wanted something simple on the surface but technically non-trivial under the hood:
- 20+ PDF tools (merge, split, compress, rotate, watermark, convert, encrypt, etc.)
- Pure browser-side processing — files never leave the user's device
- Free and registration-free — open the page, use the tool, download the result
- Fast and responsive — no upload/download round-trips
The live project is at en.sotool.top and the code is open source on GitHub.
Tech Stack
Frontend: Vue 3 + Vite
I chose Vue 3 because the Composition API makes complex file state management surprisingly clean. When you're juggling multiple uploaded files, their processing status, progress bars, and download URLs, having reactive state organized into composables is a massive productivity boost.
Vite was a no-brainer. The build speed and HMR experience demolish Webpack for this type of project. The only wrinkle: some heavy libraries need explicit optimizeDeps.include configuration or the dev server pre-bundling step hangs forever.
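For reference, the pre-bundling tweak looks roughly like this. It is a minimal sketch, not my exact config: the package list is an assumption, so include whichever dependencies stall in your own dev server.

```typescript
// vite.config.ts
import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'

export default defineConfig({
  plugins: [vue()],
  optimizeDeps: {
    // Force these heavy dependencies through pre-bundling up front.
    // The list is illustrative (an assumption), not a required set.
    include: ['pdf-lib', 'pdfjs-dist', 'mammoth', 'xlsx'],
  },
})
```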
PDF Processing Libraries
| Library | Role |
|---|---|
| pdf-lib | PDF creation, modification, merge, split, encryption, watermarking |
| pdfjs-dist (Mozilla's PDF.js) | PDF rendering and page-to-image conversion (Canvas → PNG/JPEG) |
| html2pdf.js + jspdf | HTML-to-PDF conversion for the Word/Excel-to-PDF preview pipeline |
| mammoth | DOCX parsing to HTML |
| xlsx | Excel spreadsheet parsing |
pdf-lib is the star of the show. Its API is modern and predictable — you can merge PDFs, add watermarks, set passwords, and manipulate pages without touching a server. The fact that it runs in a browser with zero polyfills is still kind of magical to me.
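To give a feel for the API, here is a minimal merge sketch. The function name mergePdfs is hypothetical and error handling is left out; the pdf-lib calls themselves (PDFDocument.create, load, copyPages, addPage, save) are the library's documented ones.

```typescript
import { PDFDocument } from 'pdf-lib'

// Merge several PDFs (given as raw bytes) into one document, in input order.
export async function mergePdfs(inputs: Uint8Array[]): Promise<Uint8Array> {
  const merged = await PDFDocument.create()
  for (const bytes of inputs) {
    const source = await PDFDocument.load(bytes)
    // copyPages clones the pages, plus the resources they reference, into `merged`.
    const pages = await merged.copyPages(source, source.getPageIndices())
    for (const page of pages) {
      merged.addPage(page)
    }
  }
  return merged.save()
}
```

In the browser, each input would typically come from a dropped file, for example new Uint8Array(await file.arrayBuffer()), and the result can be wrapped in a Blob for download.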
pdfjs-dist handles the rendering side. Extracting a PDF page to a Canvas element and then exporting it as a high-DPI image is straightforward, but getting the worker file to load correctly in production was... less straightforward (more on that below).
The Traps I Fell Into
1. Browser Memory Limits
A 50MB scanned PDF is nothing for a desktop app. For a browser tab, it can be a crash trigger.
Browsers cap memory per tab aggressively. When users dropped massive scanned documents into the tool, the tab would freeze or the out-of-memory killer would step in.
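The fix: For operations that don't need the full file (splitting, deleting pages, extracting ranges), I copy only the necessary pages into a new document instead of rebuilding the entire file, and I drop references to the source bytes as soon as the output is saved. One caveat: as far as I can tell, upstream pdf-lib still parses the whole file on load and writes a complete document on save rather than an incremental update, so the savings come from keeping the output and the intermediate copies small.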
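One way to keep the output small when extracting a range is to copy just the requested pages into a fresh document. This is a sketch under assumptions: extractPages is a hypothetical helper name, and note that PDFDocument.load still has to parse the source file before any pages can be copied.

```typescript
import { PDFDocument } from 'pdf-lib'

// Copy only the requested pages (zero-based indices) into a new document.
export async function extractPages(
  sourceBytes: Uint8Array,
  pageIndices: number[],
): Promise<Uint8Array> {
  const source = await PDFDocument.load(sourceBytes)
  const pageCount = source.getPageCount()
  // Silently skip indices that fall outside the source document.
  const valid = pageIndices.filter(
    (i) => Number.isInteger(i) && i >= 0 && i < pageCount,
  )
  const output = await PDFDocument.create()
  const copied = await output.copyPages(source, valid)
  for (const page of copied) {
    output.addPage(page)
  }
  // Only the copied pages and their resources end up in the saved bytes.
  return output.save()
}
```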
2. Chinese Font Embedding
This was a surprise. When merging PDFs that contained Chinese text, the output would occasionally show garbled characters or tofu blocks.
The root cause: out of the box, pdf-lib can only draw text with the 14 standard PDF fonts, which cover Latin (WinAnsi) characters only. Non-Latin characters (Chinese, Japanese, emoji) need a custom font that is registered through fontkit and embedded in the document. Embedding a full Chinese font file balloons the output size by several megabytes.
The fix: Where possible, preserve the original document's font references instead of re-embedding. When embedding is unavoidable, use subsetted font files that only include the glyphs actually used in the document.
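When embedding can't be avoided, pdf-lib can subset the font at embed time. The sketch below assumes the @pdf-lib/fontkit package and a CJK font file you supply yourself; stampText and the coordinates are hypothetical. Subsetting is worth verifying against your actual fonts, since results can vary by font format.

```typescript
import { PDFDocument, rgb } from 'pdf-lib'
import fontkit from '@pdf-lib/fontkit'

// Draw text on the first page with a custom font, embedding only the glyphs used.
export async function stampText(
  pdfBytes: Uint8Array,
  fontBytes: Uint8Array, // e.g. a Noto Sans SC file fetched by the app (assumption)
  text: string,
): Promise<Uint8Array> {
  const doc = await PDFDocument.load(pdfBytes)
  // Custom fonts require fontkit to be registered on the document first.
  doc.registerFontkit(fontkit)
  // subset: true asks pdf-lib to embed only the glyphs that get drawn.
  const font = await doc.embedFont(fontBytes, { subset: true })
  const page = doc.getPage(0)
  page.drawText(text, {
    x: 40,
    y: 40,
    size: 18,
    font,
    color: rgb(0.5, 0.5, 0.5),
    opacity: 0.5,
  })
  return doc.save()
}
```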
3. Web Worker Path Hell
pdfjs-dist uses a Web Worker to parse PDFs off the main thread. In development, this works flawlessly. In production, the worker file would 404 because Vite's bundler moved or renamed it.
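The fix: Two options work. You can use Vite's ?worker import syntax (import PdfWorker from 'pdfjs-dist/build/pdf.worker?worker') and hand an instance to GlobalWorkerOptions.workerPort, or you can explicitly set GlobalWorkerOptions.workerSrc to a URL that your build pipeline is guaranteed to emit. Either way, check the worker file name for your pdfjs-dist version, since it has changed between major releases.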
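A variant of the explicit-path approach is to let Vite emit the worker as an asset and pass its final URL to pdfjs-dist. This is a sketch under assumptions: the file name pdf-setup.ts is hypothetical, and the worker path shown matches a 4.x build of pdfjs-dist, so adjust it to whatever your installed version ships.

```typescript
// pdf-setup.ts (hypothetical file name)
import * as pdfjsLib from 'pdfjs-dist'
// Vite's `?url` suffix emits the file as a hashed asset and returns its public URL.
// Assumption: a pdfjs-dist 4.x layout; older versions ship a differently named worker file.
import workerUrl from 'pdfjs-dist/build/pdf.worker.min.mjs?url'

// Point pdfjs-dist at the URL Vite actually produced, in dev and in production.
pdfjsLib.GlobalWorkerOptions.workerSrc = workerUrl

export { pdfjsLib }
```

Importing pdfjsLib from this one module everywhere else keeps the worker configured before the first getDocument call.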
4. Batch Processing UI Lockups
When processing dozens of files in a batch, the main thread gets busy and the UI freezes. The progress bar stops updating. Users think the page crashed.
The fix: Chunk the work into separate tasks. Process one file, update the progress bar, then yield control back to the browser with setTimeout(..., 0) or requestIdleCallback before starting the next file. Promise microtasks alone aren't enough here, because they run before the browser gets a chance to repaint. Yielding adds a few milliseconds per file but keeps the UI responsive.
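The yield-between-files pattern can be sketched as a small generic helper. The names processBatch and yieldToBrowser are hypothetical; processOne stands in for whatever per-file work the tool does.

```typescript
// Resolve on a later task (not a microtask) so the browser can repaint
// and handle input before the next file starts.
const yieldToBrowser = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 0))

export async function processBatch<T, R>(
  items: T[],
  processOne: (item: T) => Promise<R> | R,
  onProgress: (done: number, total: number) => void,
): Promise<R[]> {
  const results: R[] = []
  let done = 0
  for (const item of items) {
    results.push(await processOne(item))
    done += 1
    // Report progress, then give the UI a chance to show it.
    onProgress(done, items.length)
    await yieldToBrowser()
  }
  return results
}
```

In a Vue component, onProgress would typically just write to a reactive ref that drives the progress bar. Yielding only helps between files; a single very heavy file still blocks until it finishes.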
The SEO Problem (and Solution)
SPAs and search engines don't play nice. A crawler or link-preview bot that doesn't execute JavaScript sees an empty <div id="app"></div> and nothing else.
I wrote a prerender script using Playwright. During the build step, it launches a headless browser, navigates to every route, waits for Vue to mount, and saves the rendered HTML as static files. Each tool page and each guide article gets its own index.html.
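A stripped-down version of such a prerender step might look like the following. It is a sketch, not the project's actual script: prerender, outputPathFor, the route list, and the assumption that the built site is already being served at baseUrl are all illustrative.

```typescript
import { chromium } from 'playwright'
import { mkdir, writeFile } from 'node:fs/promises'
import { dirname, join } from 'node:path'

// Map a route such as "/merge-pdf" to "<outDir>/merge-pdf/index.html".
export function outputPathFor(outDir: string, route: string): string {
  return join(outDir, route, 'index.html')
}

// Visit each route in a headless browser and save the rendered HTML.
export async function prerender(
  baseUrl: string,
  routes: string[],
  outDir: string,
): Promise<void> {
  const browser = await chromium.launch()
  try {
    const page = await browser.newPage()
    for (const route of routes) {
      await page.goto(new URL(route, baseUrl).toString(), { waitUntil: 'networkidle' })
      // Wait until Vue has rendered at least one element inside the mount point.
      await page.waitForSelector('#app > *', { state: 'attached' })
      const file = outputPathFor(outDir, route)
      await mkdir(dirname(file), { recursive: true })
      await writeFile(file, await page.content(), 'utf8')
    }
  } finally {
    await browser.close()
  }
}
```

A typical run would serve the build output locally (for example with vite preview) and then call prerender('http://localhost:4173', ['/', '/merge-pdf'], 'dist'), where the port and routes are placeholders.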
The result: Google can index every individual tool page, and users sharing specific features get proper Open Graph previews.
The Trade-offs
Browser-side processing isn't perfect. Here are the honest limitations:
| Limitation | Why it happens |
|---|---|
| Large files struggle | Browser memory caps (300MB+ files can choke) |
| No OCR | Would require Tesseract.js or a server-side pipeline |
| Bookmarks/links lost | Merging restructures the document tree; preserving internal links is complex |
| CPU-bound | Heavy operations block the tab unless they are moved off the main thread into Web Workers |
For 95% of daily PDF tasks — merging a few contracts, compressing a resume, converting a page to an image — these limitations don't matter. For the remaining 5%, desktop software still wins.
What's Next
The project is live at en.sotool.top. The Chinese version is at sotool.top.
If you're building browser-based tools, I'd love to hear about your approach to handling large files, workers, or SEO. Drop a comment below.
This post is part of my #buildinpublic journey. Follow along for more technical deep dives and the occasional painful lesson learned.