Hey everyone! I’m gearing up to launch a new project I’ve been pouring a lot of love into. It's called Cloak.
The Problem
We constantly feed data into LLMs, but scrubbing sensitive Personally Identifiable Information (PII) manually is tedious and risky. I wanted a robust tool that could instantly redact sensitive data, but with one absolute rule: the data must never leave the device.
Enter Cloak
Cloak is a privacy-first web application designed to redact PII from text, images, and PDFs instantly. I wanted to nail a highly immersive, Apple-inspired interface, so making the experience feel native, fluid, and heavily polished was a massive priority during development.
Here are the core features:
- Zero Server Uploads: Drag and drop text, images, or PDFs into the app. Everything is processed entirely within your browser.
- On-Device AI Detection: It uses standard regex patterns for predictable formats (like SSNs, credit cards, and bank accounts), but also includes an optional "Deep Scan". This utilizes an on-device NER model (Xenova/bert-base-NER) running via Web Workers to catch trickier entities.
- Client-Side OCR: It extracts and redacts text directly from images utilizing tesseract.js.
- LLM Response Restorer: Instead of just blacking out text, Cloak can generate a "Synthetic" version of your document. It swaps real names and IDs for fake ones. Once your LLM generates a response using the fake data, Cloak’s restorer maps your original data back into the output.
- Visual Redaction Styles: You can toggle between Black Box, Blur, or Pixelate styles for image and text redactions.
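To make the regex pass and the swap-and-restore idea concrete, here is a minimal TypeScript sketch. It is not Cloak's actual code: the function names (`detect`, `toSynthetic`, `restore`), the two regex patterns, and the bracketed placeholder tokens are all assumptions made for illustration. Cloak swaps in realistic fake values, while this sketch uses simple tokens like `[SSN_1]` so the mapping is easy to follow. It covers only SSN-shaped and card-shaped strings, with a Luhn checksum to drop digit runs that merely look like card numbers.

```typescript
// Hypothetical sketch: names, patterns, and token format are illustrative only.
type Detection = { kind: "SSN" | "CARD"; value: string };

// Luhn checksum: true when the digit string could be a valid card number.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleIt = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// Regex pass: collect SSN-shaped strings and Luhn-valid 13 to 19 digit runs.
function detect(text: string): Detection[] {
  const found: Detection[] = [];
  for (const value of text.match(/\b\d{3}-\d{2}-\d{4}\b/g) || []) {
    found.push({ kind: "SSN", value });
  }
  for (const value of text.match(/\b\d(?:[ -]?\d){12,18}\b/g) || []) {
    if (passesLuhn(value.replace(/[ -]/g, ""))) {
      found.push({ kind: "CARD", value });
    }
  }
  return found;
}

type Swap = { synthetic: string; tokenToOriginal: Record<string, string> };

// Replace every detected value with a stable token and remember the mapping.
function toSynthetic(text: string): Swap {
  const tokenToOriginal: Record<string, string> = {};
  const originalToToken: Record<string, string> = {};
  const counters: Record<string, number> = {};
  let synthetic = text;
  for (const d of detect(text)) {
    if (originalToToken[d.value] !== undefined) continue; // already swapped
    counters[d.kind] = (counters[d.kind] || 0) + 1;
    const token = `[${d.kind}_${counters[d.kind]}]`;
    originalToToken[d.value] = token;
    tokenToOriginal[token] = d.value;
    synthetic = synthetic.split(d.value).join(token);
  }
  return { synthetic, tokenToOriginal };
}

// Map the tokens in the LLM's response back to the original values.
function restore(llmOutput: string, tokenToOriginal: Record<string, string>): string {
  let restored = llmOutput;
  for (const token of Object.keys(tokenToOriginal)) {
    restored = restored.split(token).join(tokenToOriginal[token]);
  }
  return restored;
}

// Usage: only `synthetic` is pasted into the LLM; the mapping stays local.
const swap = toSynthetic("SSN 123-45-6789, card 4111 1111 1111 1111.");
const reply = "I see an SSN " + "[SSN_1]" + " and a card " + "[CARD_1]" + ".";
const restoredReply = restore(reply, swap.tokenToOriginal);
```

The mapping never has to leave the browser, which is what keeps this compatible with the zero-upload rule. Anything the detector misses passes straight through into the synthetic text, so a pass like this only covers predictable formats and still needs NER and a human review on top.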
The Stack
Building a heavy computational tool that stays completely client-side meant relying on some great libraries:
- Framework: Next.js
- Styling & Animations: Tailwind CSS v4 alongside Framer Motion for buttery-smooth, native-feeling transitions and pill menus.
- Database: dexie for saving your session history locally via IndexedDB.
- Document Handling: pdf-lib and pdfjs-dist for client-side PDF parsing and rendering.
I’m finalizing the build and polishing the final animations before the official launch. I’d love to hear your thoughts on building local-first tools or dealing with PII in the age of LLMs. Let me know what you think in the comments!
Top comments (4)
Cool project! Using the browser for PII redaction without relying on server-side processing is a solid approach for privacy. I've worked on similar client-side processing pipelines for confidential computing—browser WebAssembly can be a great tool for keeping data local. If you're handling sensitive data, maybe consider how GPU acceleration (like via WebGPU or VoltageGPU) could speed things up for heavier workloads.
Hey, thanks so much!
Really appreciate the feedback. You bring up a fantastic point.
Right now, I'm offloading the heavier processing like the bert-base-NER model and the tesseract.js OCR to Web Workers just to ensure the main thread stays unblocked and the UI stays fluid.
But you're completely right: as I start testing with larger, multi-page PDFs or heavier image batches, CPU-bound processing starts to show its limits. Transitioning the AI models to leverage WebGPU is a major milestone on my roadmap, and I'll definitely look into VoltageGPU as part of that research to help push the performance boundaries.
Thanks again for the suggestion and for checking the project out ❤️
The synthetic swap and restore is the clever piece, way better than black boxes that leave the document useless to the LLM. The thing I'd worry about most is a missed entity, since the whole promise breaks the first time the NER model doesn't recognize a name and it sails through unredacted. bert-base-NER is solid but it'll miss unusual names and anything far from what it was trained on. Do you surface a confidence pass or let people eyeball what got caught before it leaves, or is regex plus NER the whole net? For a privacy tool the false negative is the scary case, not the false positive.
Hey Nazar, thanks so much, I really appreciate the feedback.
You hit the absolute most critical point: a false negative is the worst-case scenario for a privacy tool. You are 100% right that bert-base-NER (and standard regex) won't catch every edge case or highly unusual name.
To combat this, the entire workflow is built around a "human-in-the-loop" philosophy. It is not a blind pass-through. Here is how Cloak handles the safety net:
The AI and Regex are really just there to do 95% of the tedious heavy lifting, but the user always gets the final visual sign-off before hitting that "Copy Synthetic" button.
Thanks again for bringing this up, it’s the exact right mindset to have when evaluating security tools ❤️