As developers, we handle tons of sensitive data every day—API keys, legal contracts, financial statements, and user metrics. A few months ago, I needed to redact some Personally Identifiable Information (PII) from a batch of PDF documents.
I looked into existing online tools, but they all required one terrifying thing: uploading confidential files to their cloud servers. For anyone handling HIPAA, GDPR, or corporate legal data, that's an absolute dealbreaker. Even worse, many standard PDF editors only place a black vector box over the text, leaving the underlying sensitive data completely scrapable.
So, I spent the last 2 months building a local-first, privacy-focused desktop app to solve this permanently: PII Blackout.
Here is how I built it, the tech stack behind it, and the engineering challenges I faced.
🛠️ The Tech Stack & Architecture
I wanted the app to be cross-platform, highly performant, and capable of running heavy AI inference locally without breaking a sweat.
- GUI Framework: PySide6 (Qt for Python). It provides native desktop performance and smooth UI rendering, which is essential for handling massive PDF files.
- Core PII Engine: Powered by Microsoft Presidio. I chose Presidio for its extensible orchestrator design, which let me combine rule-based pattern matchers (regex) with NER models such as GLiNER. With this combination, the app auto-detects names, emails, phone numbers, and physical addresses out of the box with production-grade accuracy.
- PDF Processing: A custom backend that flattens the document and burns the blackouts directly into the image layer, so the redacted text no longer exists in the output file and cannot be recovered or reverse-engineered.
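Here is a minimal, dependency-free sketch of the orchestrator idea. Several recognizers (regex rules plus an NER model) each return scored spans, and an analyzer merges them. This is not Presidio's API: Presidio's own `AnalyzerEngine` and recognizer registry do this far more thoroughly. `Finding`, `analyze`, and `fake_ner` are hypothetical names, and `fake_ner` only stands in for a real model such as GLiNER.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Finding:
    entity: str   # e.g. "EMAIL_ADDRESS"
    start: int    # character offset where the span starts
    end: int      # character offset just past the span
    score: float  # recognizer confidence in [0, 1]


Recognizer = Callable[[str], List[Finding]]

# Rule-based recognizers: entity type -> (pattern, confidence).
# Deliberately simple patterns for illustration only.
PATTERNS = {
    "EMAIL_ADDRESS": (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), 0.9),
    "PHONE_NUMBER": (re.compile(r"\+?\d[\d\s().-]{7,}\d"), 0.6),
}


def regex_recognizer(text: str) -> List[Finding]:
    """Emit a Finding for every regex match in the text."""
    return [
        Finding(entity, m.start(), m.end(), score)
        for entity, (pattern, score) in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def fake_ner(text: str) -> List[Finding]:
    """Hypothetical stand-in for an NER model such as GLiNER."""
    name = "Jane Doe"
    i = text.find(name)
    return [Finding("PERSON", i, i + len(name), 0.85)] if i >= 0 else []


def analyze(text: str, recognizers: Iterable[Recognizer], threshold: float = 0.5) -> List[Finding]:
    """Run all recognizers, drop low scores, keep the best of overlapping spans."""
    candidates = [f for rec in recognizers for f in rec(text) if f.score >= threshold]
    candidates.sort(key=lambda f: (-f.score, f.start))  # highest confidence first
    kept: List[Finding] = []
    for f in candidates:
        # Keep a span only if it does not overlap anything already kept
        if all(f.end <= k.start or f.start >= k.end for k in kept):
            kept.append(f)
    return sorted(kept, key=lambda f: f.start)


text = "Contact Jane Doe at jane.doe@example.com or +1 (555) 123-4567."
for f in analyze(text, [regex_recognizer, fake_ner]):
    print(f.entity, text[f.start:f.end])
# PERSON Jane Doe
# EMAIL_ADDRESS jane.doe@example.com
# PHONE_NUMBER +1 (555) 123-4567
```

Resolving overlaps by confidence is one simple policy. It lets a high-precision rule override a fuzzier model hit on the same characters.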
🛡️ Core Engineering Challenges
1. Adapting Microsoft Presidio for Local Execution
Microsoft Presidio is fantastic, but it's often deployed as a cloud service or containerized API. Adapting its architecture to run entirely inside a local Python client environment required optimizing the loading times of the underlying AI models and managing memory efficiently.
💡 Lesson learned: 12GB of RAM is the practical minimum for smooth local AI processing when parsing several multi-page documents simultaneously.
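Model loading time is usually dominated by reading weights from disk, so a common pattern is to load each model once, share that single instance, and optionally start loading in the background at launch. The sketch below shows that pattern under stated assumptions: `LazyModel` and `slow_loader` are hypothetical names, and the loader only simulates a slow load. It covers the loading side, not the author's full memory-management approach.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Load an expensive model on first use, then share one instance."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._model: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        if self._model is None:              # fast path once the model is loaded
            with self._lock:
                if self._model is None:      # re-check: another thread may have loaded it
                    self._model = self._loader()
        return self._model

    def warm_up(self) -> threading.Thread:
        """Start loading in the background so the UI is not blocked at startup."""
        t = threading.Thread(target=self.get, daemon=True)
        t.start()
        return t


load_calls = []


def slow_loader() -> dict:
    # Hypothetical placeholder for reading NER weights from disk.
    load_calls.append(1)
    time.sleep(0.05)
    return {"weights": "..."}


model = LazyModel(slow_loader)
model.warm_up()
with ThreadPoolExecutor(max_workers=8) as pool:
    instances = list(pool.map(lambda _: model.get(), range(8)))
print(len(load_calls), all(m is instances[0] for m in instances))  # 1 True
```

Eight concurrent callers trigger exactly one load and all receive the same object. Sharing one instance also keeps memory from growing with the number of documents in flight.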
2. Destructive Redaction vs. Lazy Masking
Many PDF tools just change the background color of the text to black. If you copy-paste that section, the hidden text is revealed. PII Blackout instead flattens the document: it destroys the layers that hold the sensitive data and renders each page as a single image layer.
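The difference between the two approaches can be shown with a toy model of a page that has already been rendered to a grayscale raster alongside its extractable text layer. The assumptions are these: `Page`, `lazy_mask`, and `burn_in` are hypothetical names, and rendering a real PDF page to pixels and writing the images back into a new PDF are left out. A PDF library would handle those two steps.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, end-exclusive


@dataclass
class Page:
    pixels: List[List[int]]                              # grayscale raster, 0 = black, 255 = white
    text_layer: List[str] = field(default_factory=list)  # text a PDF reader could extract


def _paint(pixels: List[List[int]], boxes: List[Box]) -> List[List[int]]:
    """Return a copy of the raster with every box filled black."""
    out = [row[:] for row in pixels]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(len(out), y1)):
            for x in range(max(0, x0), min(len(out[y]), x1)):
                out[y][x] = 0
    return out


def lazy_mask(page: Page, boxes: List[Box]) -> Page:
    # Looks redacted, but the text layer survives and can be copy-pasted.
    return Page(_paint(page.pixels, boxes), list(page.text_layer))


def burn_in(page: Page, boxes: List[Box]) -> Page:
    # Destructive: the boxes are baked into the pixels and the text layer is dropped.
    return Page(_paint(page.pixels, boxes), text_layer=[])


page = Page([[255] * 6 for _ in range(4)], ["SSN: 123-45-6789"])
print(lazy_mask(page, [(1, 1, 4, 3)]).text_layer)  # ['SSN: 123-45-6789']
print(burn_in(page, [(1, 1, 4, 3)]).text_layer)    # []
```

Both outputs look identical on screen. Only the burned-in page has nothing left for a text extractor to find.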
3. Batch Processing Optimization
To make it useful for professionals, I implemented a drag-and-drop batch processing system. Users can drop an entire folder of hundreds of PDFs, and the app will queue and redact them locally in seconds.
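A folder-level queue can be sketched with the standard library alone. This sketch assumes a hypothetical `redact_file(src, out_dir)` that processes one PDF (here it only copies the bytes), and it is not the app's actual code. It shows the general shape: collect the PDFs, submit them to a small worker pool, report progress, and record failures per file instead of aborting the batch.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple


def find_pdfs(folder: Path) -> List[Path]:
    """All .pdf files directly inside the folder, case-insensitive, in stable order."""
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")


def redact_file(src: Path, out_dir: Path) -> Path:
    # Hypothetical placeholder: a real version would detect PII and burn in boxes.
    dst = out_dir / f"{src.stem}_redacted.pdf"
    dst.write_bytes(src.read_bytes())
    return dst


def process_folder(
    folder: Path,
    out_dir: Path,
    redact: Callable[[Path, Path], Path] = redact_file,
    workers: int = 2,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Tuple[Dict[Path, Path], Dict[Path, Exception]]:
    """Queue every PDF in the folder, redact them in parallel, collect failures."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pdfs = find_pdfs(folder)
    done: Dict[Path, Path] = {}
    failed: Dict[Path, Exception] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(redact, p, out_dir): p for p in pdfs}
        for n, fut in enumerate(as_completed(futures), start=1):
            src = futures[fut]
            try:
                done[src] = fut.result()
            except Exception as exc:  # one bad file should not stop the batch
                failed[src] = exc
            if on_progress:
                on_progress(n, len(pdfs))
    return done, failed


with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "inbox"
    inbox.mkdir()
    for name in ("contract.pdf", "invoice.PDF", "notes.txt"):
        (inbox / name).write_bytes(b"%PDF-1.7 placeholder")
    done, failed = process_folder(inbox, Path(tmp) / "out",
                                  on_progress=lambda n, total: print(f"{n}/{total}"))
    print(sorted(p.name for p in done.values()), failed)
```

Threads keep a single shared model in memory. A `ProcessPoolExecutor` would sidestep the GIL for CPU-heavy inference, but every worker process would load its own copy of the model, and that is where RAM requirements climb. In a PySide6 app, a call like `process_folder` would presumably run off the GUI thread (for example in a `QThread`), with progress forwarded through a signal.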
🚀 Check it out (and Feedback Welcome!)
I've just officially released v1.0.2 and would love to get the DEV community's feedback on the UI/UX and the local performance.
- Official website: piiblackout.com
- GitHub Releases: PII-Blackout/releases
- Windows Download: Direct Setup Link (.exe) or via Microsoft Store
- Quick Workflow Demo: Watch on YouTube
🎁 Free Tier for Developers & Solo Users
The free tier lets you process up to 3 PDFs per day (max 15 pages per document, with a light watermark). If you need heavy enterprise usage, there are Pro tiers available too.
If you know Microsoft Presidio, build local-first AI tools, or work with PySide6, let's connect in the comments! How are you handling PII security in your own workflows?