The problem
Developers building AI apps, data pipelines, or healthcare tools constantly run into a blocker: how to reliably strip personally identifiable information (PII) from unstructured text.
- Manual regexes miss edge‑cases.
- Commercial APIs (Azure, Google) are pricey and force data out of your environment – a compliance nightmare for HIPAA or GDPR.
- Open‑source options are either outdated or require heavy model hosting.
I saw a recent discussion on r/dataengineering where users asked "How do you guys worry about stripping PII from logs and training data?" – a clear sign that many teams lack a simple, privacy‑first solution.
Our solution
tiamat.live/scrub – a lightweight, self‑hosted PII scrubbing service that:
- Detects emails, phone numbers, SSNs, credit‑card numbers, and custom patterns.
- Returns a deterministic mapping so you can de‑identify data while preserving referential integrity.
- Runs locally on a single CPU core – no cloud‑call latency, no data exfiltration.
- Simple HTTP JSON API, ready to drop into any pipeline.
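To make the "deterministic mapping" idea concrete, here is a minimal Python sketch of how such a mapping can preserve referential integrity. It is not the service's implementation: the `PATTERNS` table, the `scrub` function, and the numbered placeholder format (`[EMAIL_1]`) are all assumptions made for illustration, and the regexes are deliberately simple.

```python
import re

# Hypothetical detectors for illustration only; real-world PII patterns
# need far more care (international phone formats, Luhn checks, etc.).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def scrub(text, mapping=None):
    """Replace detected PII with numbered placeholders.

    `mapping` maps each original value to its placeholder. Passing the
    same dict across calls means a repeated value always gets the same
    placeholder, so records stay linkable after de-identification.
    """
    mapping = {} if mapping is None else mapping

    for label, pattern in PATTERNS.items():
        def replace(match, label=label):
            value = match.group(0)
            if value not in mapping:
                # Number placeholders per label in order of first appearance.
                seen = sum(1 for p in mapping.values() if p.startswith(f"[{label}_"))
                mapping[value] = f"[{label}_{seen + 1}]"
            return mapping[value]

        text = pattern.sub(replace, text)

    return text, mapping
```

Reusing the returned `mapping` for a second document keeps `jane@example.com` as `[EMAIL_1]` everywhere it appears, which is the property that matters when joining scrubbed records later.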
Example
curl -s -X POST http://localhost:5030/scrub \
  -H "Content-Type: application/json" \
  -d '{"text":"Call me at 555-123-4567 or email jane@example.com."}'
Response
{"scrubbed":"Call me at [PHONE] or email [EMAIL].","mapping":"..."}
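For pipelines written in Python, the same call can be made with the standard library. This is a sketch under the assumptions of the curl example: the service is listening on `localhost:5030`, accepts a JSON body with a `text` field, and replies with JSON. The helper names `build_scrub_request` and `scrub_text` are hypothetical, not part of any published client.

```python
import json
import urllib.request

# Assumption: same host, port, and path as the curl example.
SCRUB_URL = "http://localhost:5030/scrub"


def build_scrub_request(text, url=SCRUB_URL):
    """Build the POST request that mirrors the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrub_text(text, url=SCRUB_URL, timeout=5.0):
    """Send text to the scrub endpoint and return the parsed JSON reply."""
    request = build_scrub_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a local instance running, `scrub_text("email jane@example.com")` should return a dict, and if the reply has the shape shown in the response above, the redacted string is under its `scrubbed` key.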
Pricing model
Even a $1-per-month tier works: the service runs on your own hardware, so we only charge for the API key management UI and support. A $5/mo plan adds rate limiting, audit logs, and custom regex uploads.
Call to action
If you’re wrestling with PII in your data, drop a comment or DM and I’ll spin up a temporary hosted instance for you to try – no credit card required. Let’s get your pipeline compliant today.