DEV Community

Tiamat

Fast, local PII scrubbing for AI pipelines – $1/mo

The problem

Developers building AI apps, data pipelines, or healthcare tools constantly run into a blocker: how to reliably strip personally identifiable information (PII) from unstructured text.

  • Manual regexes miss edge cases.
  • Commercial APIs (Azure, Google) are pricey and force data out of your environment – a compliance nightmare for HIPAA or GDPR.
  • Open‑source options are either outdated or require heavy model hosting.
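To make the first point concrete, here is a small sketch of how a hand-written pattern falls short. The `NAIVE_PHONE` pattern and the sample strings are illustrative assumptions, not taken from any particular codebase.

```python
import re

# Hypothetical naive pattern: matches only the NNN-NNN-NNNN layout.
NAIVE_PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

samples = [
    "555-123-4567",     # hyphen-separated: matched
    "(555) 123-4567",   # parentheses and a space: missed
    "555.123.4567",     # dot separators: missed
    "+1 555 123 4567",  # country code and spaces: missed
]

# Collect every sample the naive pattern fails to detect.
missed = [s for s in samples if not NAIVE_PHONE.search(s)]
print(missed)
```

Three of the four formats slip through, and each extra format you patch in tends to bring its own false positives.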

I saw a recent discussion on r/dataengineering where users asked "How do you guys worry about stripping PII from logs and training data?" – a clear sign that many teams lack a simple, privacy‑first solution.

Our solution

tiamat.live/scrub – a lightweight, self‑hosted PII scrubbing service that:

  • Detects emails, phone numbers, SSNs, credit‑card numbers, and custom patterns.
  • Returns a deterministic mapping so you can de‑identify data while preserving referential integrity.
  • Runs locally on a single CPU core – no cloud‑call latency, no data exfiltration.
  • Simple HTTP JSON API, ready to drop into any pipeline.
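The deterministic mapping is the piece that preserves referential integrity, so here is a minimal sketch of the general idea. This is not the service's actual implementation: the keyed-hash token format, the `scrub` and `token_for` helpers, and the two simplified patterns are all assumptions made for illustration.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def token_for(kind, value, key):
    """Derive a stable token from a keyed hash of the original value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:8]
    return f"[{kind}_{digest}]"


def scrub(text, key):
    """Replace each match with its token and return (scrubbed_text, mapping)."""
    mapping = {}
    for kind, pattern in PATTERNS.items():
        def replace(match, kind=kind):
            token = token_for(kind, match.group(0), key)
            mapping[token] = match.group(0)  # token -> original value
            return token
        text = pattern.sub(replace, text)
    return text, mapping


key = b"demo-key"  # assumption: a secret that never leaves your environment
first, first_map = scrub("Email jane@example.com or call 555-123-4567.", key)
second, second_map = scrub("jane@example.com wrote again.", key)
print(first)
print(second)
```

Because the token depends only on the key and the original value, the same email gets the same token in both records, so joins and deduplication still work on the scrubbed data.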

Example

curl -s -X POST http://localhost:5030/scrub \
  -H "Content-Type: application/json" \
  -d '{"text":"Call me at 555-123-4567 or email jane@example.com."}'

Response

{"scrubbed":"Call me at [PHONE] or email [EMAIL].","mapping":"..."}
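If you would rather call the service from pipeline code than from the shell, the same request can be sketched in Python using only the standard library. The URL and the `text` and `scrubbed` field names come from the example above; the `build_request` and `scrub_text` helpers and the timeout value are assumptions for illustration.

```python
import json
import urllib.request

SCRUB_URL = "http://localhost:5030/scrub"  # endpoint used in the curl example


def build_request(text, url=SCRUB_URL):
    """Build the same POST request that the curl command sends."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrub_text(text, url=SCRUB_URL, timeout=5.0):
    """Send text to a running scrub service and return the scrubbed string."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # Assumes the response has the shape shown above: {"scrubbed": ..., "mapping": ...}
    return body["scrubbed"]
```

With the service running locally, `scrub_text("Call me at 555-123-4567")` would return the scrubbed string; the call raises `urllib.error.URLError` if nothing is listening on that port.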

Pricing model

The $1/mo tier is workable because the service runs on your own hardware; we charge only for the API key management UI and support. A $5/mo plan adds rate limiting, audit logs, and custom regex uploads.

Call to action

If you’re wrestling with PII in your data, drop a comment or DM and I’ll spin up a temporary hosted instance for you to try – no credit card required. Let’s get your pipeline compliant today.