DEV Community

Tony Larry

I built an open-source, privacy-first PDF toolkit (80+ tools) to replace Adobe. Here is the stack.

The "Why"

If you are a developer, you probably hate uploading sensitive documents (tax forms, contracts, bank statements) to random "Free PDF Merger" websites. You know that "Free" usually means "You are the product."

I wanted a tool that:

  1. Runs 100% locally (files never leave the browser).
  2. Is Open Source (so I can verify the code).
  3. Doesn't suck (clean UI, no ads, no "3 files per day" limits).

So, I built PDFCraft. It's an MIT-licensed, client-side PDF toolkit built with Next.js and WebAssembly.

The Tech Stack

I chose a modern stack to ensure performance and maintainability:

  • Frontend Framework: Next.js (React). I needed the static site generation (SSG) capabilities for SEO and fast initial load times.
  • Styling: Tailwind CSS. For a clean, responsive UI that works on mobile.
  • Core Engine (The Heavy Lifting):
      • WebAssembly (Wasm): This is key. It allows us to run heavy image and PDF processing logic in the browser at near-native speed.
      • Libraries: pdf-lib for manipulation, pdf.js for rendering, and tesseract.js for client-side OCR.

Architecture: Zero-Server Processing

The most interesting part of PDFCraft is what it doesn't have: a backend API for file processing.

In a traditional architecture:
User Uploads File -> Server (AWS/GCP) Processes it -> User Downloads

In PDFCraft:
User Selects File -> Browser (Wasm/Web Workers) Processes it -> User Downloads
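
The in-browser flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not PDFCraft's actual code: `looksLikePdf`, `processLocally`, and `Transform` are hypothetical names, and the transform stands in for whichever tool is running (merge, compress, OCR).

```typescript
// Hypothetical sketch of the in-browser flow: bytes in, bytes out, no network call.
// In a browser, the input would come from `new Uint8Array(await file.arrayBuffer())`
// and the result would be offered for download via
// `URL.createObjectURL(new Blob([output], { type: "application/pdf" }))`.

type Transform = (input: Uint8Array) => Uint8Array;

// A well-formed PDF begins with the ASCII bytes "%PDF-".
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

function looksLikePdf(bytes: Uint8Array): boolean {
  return PDF_MAGIC.every((b, i) => bytes[i] === b);
}

function processLocally(input: Uint8Array, transform: Transform): Uint8Array {
  if (!looksLikePdf(input)) {
    throw new Error("Not a PDF: missing %PDF- header");
  }
  // The transform runs in the page (or in a Web Worker); nothing is uploaded.
  return transform(input);
}
```

Because the whole pipeline is a function from bytes to bytes, there is no request for a server to log or store.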

Why this matters for devs:

  1. Privacy: I literally cannot see your files. There is no database to hack.
  2. Cost: Hosting is dirt cheap (static files only).
  3. Speed: No network latency for uploading large files.

Key Features Implemented

It started as a simple merger but grew into a suite of 80+ tools. Here are a few technical highlights:

  • Client-Side OCR: Using tesseract.js and Web Workers to extract text from scanned PDFs without freezing the main thread.
  • Conversion: Converting PDF to Office formats (Word/Excel) and Images (JPG/PNG/HEIC) directly in the browser.
  • Security: AES encryption and decryption handled entirely on the client.
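
For the OCR case, one way to keep the main thread free is to split a document's pages into batches and hand each batch to its own Web Worker. The helper below is a hypothetical sketch of that partitioning step only (`partitionPages` and `PageBatch` are illustrative names, not part of PDFCraft or tesseract.js); the actual recognition would run inside each worker.

```typescript
// Hypothetical helper: split pages 1..pageCount into at most `workerCount`
// contiguous batches of near-equal size, one batch per Web Worker.
interface PageBatch {
  firstPage: number; // 1-based, inclusive
  lastPage: number; // 1-based, inclusive
}

function partitionPages(pageCount: number, workerCount: number): PageBatch[] {
  if (pageCount <= 0 || workerCount <= 0) return [];
  const batches: PageBatch[] = [];
  // Never create more batches than there are pages.
  const used = Math.min(pageCount, workerCount);
  const base = Math.floor(pageCount / used);
  const extra = pageCount % used; // the first `extra` batches get one more page
  let next = 1;
  for (let i = 0; i < used; i++) {
    const size = base + (i < extra ? 1 : 0);
    batches.push({ firstPage: next, lastPage: next + size - 1 });
    next += size;
  }
  return batches;
}
```

In the browser, each batch could be sent to a worker with `worker.postMessage(batch)`, and `navigator.hardwareConcurrency` is a reasonable starting point for the worker count.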

Self-Hosting & Extension

Since privacy is the main goal, I made sure you can run it yourself.
You can clone the repo and deploy it anywhere (Vercel, Netlify, or your own Docker container).

I also included a Chrome Extension (you can find the zip in the repo) for quick access without opening a new tab.

Open Source & Roadmap

The project is fully open source (MIT License).

I'm currently looking for contributors to help with:

  • Improving the PDF Viewer performance for huge files (500MB+).
  • Adding more language support for OCR.
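
For the viewer item, one common technique is to render only the pages near the viewport instead of the whole document. This is an assumption about a possible approach, not something the project has committed to. A minimal sketch of the range calculation, assuming uniform page heights and a hypothetical `visiblePages` helper:

```typescript
// Hypothetical sketch: which pages should be rendered for the current scroll
// position? Assumes every page has the same height, which real PDFs may not.
function visiblePages(
  scrollTop: number,
  viewportHeight: number,
  pageHeight: number,
  pageCount: number,
  overscan = 1, // extra pages kept rendered above and below the viewport
): { first: number; last: number } {
  const first = Math.floor(scrollTop / pageHeight) - overscan;
  const last =
    Math.floor((scrollTop + viewportHeight - 1) / pageHeight) + overscan;
  // Clamp to valid 0-based page indices.
  return {
    first: Math.max(0, first),
    last: Math.min(pageCount - 1, last),
  };
}
```

This reduces rendering cost for documents with many pages; it does not by itself reduce the memory needed to hold a 500MB file.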

If you are interested in WebAssembly or PDF manipulation, check out the code!

🔗 GitHub: https://github.com/PDFCraftTool/pdfcraft

(If you find it useful, a star ⭐ would keep me motivated!)

Top comments (2)

prins premnath

I tried it and it works. Good job!

reactjsfav

Love this project! Do you have any plans to add a feature to sign PDFs with a digital ID in the future?