DEV Community

Twisted-Code'r


I built an open-source tool that stops personal data from leaking into AI chatbots

Ever copy-pasted something into ChatGPT and immediately
thought "wait, should I have done that?"

If you're building an AI app that handles user data, you need to know what's leaking into your LLM API before a regulator does.

That's the problem ShadowAudit solves.

It sits between your app and any LLM API and scans every
prompt before it leaves your system — catching emails,
phone numbers, API keys, and Indian national IDs like
Aadhaar and PAN numbers.
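The detection side of a tool like this comes down to pattern scanners over outgoing text. As a rough illustration (these patterns and function names are my sketch, not ShadowAudit's actual rules or API):

```python
import re

# Illustrative patterns only -- a real scanner needs checksum validation
# (e.g. Verhoeff for Aadhaar) and far more robust phone/email handling.
PII_PATTERNS = {
    "email":   re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone":   re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)"),  # Indian mobile
    "aadhaar": re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"),         # 12-digit Aadhaar
    "pan":     re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),                  # e.g. ABCDE1234F
}

def scan(text: str) -> dict:
    """Return every match found, keyed by PII type (empty types omitted)."""
    hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}

def mask(text: str) -> str:
    """Replace each match with a [TYPE] placeholder before the text leaves the system."""
    for name, pat in PII_PATTERNS.items():
        text = pat.sub(f"[{name.upper()}]", text)
    return text
```

Masking (rather than blocking) keeps the prompt useful to the model while stripping the identifying values.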

Two lines to integrate:

# assuming: from shadowaudit import ShadowAudit; import openai
sa = ShadowAudit.from_config("shadowaudit.yaml")
client = sa.wrap(openai.OpenAI())

That's it. Everything else stays the same.
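The wrap-and-forward pattern itself is simple: a proxy intercepts each call, scans or masks the outgoing messages, then delegates to the real client. A minimal sketch of that pattern (the class and method names here are hypothetical, not ShadowAudit's actual implementation):

```python
class ScanningClient:
    """Proxy that masks prompt text before forwarding to the wrapped client.

    `scan_fn` is any callable taking text and returning the (possibly
    masked) text -- illustrative only; the real interface may differ.
    """

    def __init__(self, client, scan_fn):
        self._client = client
        self._scan = scan_fn

    def chat_completion(self, messages, **kwargs):
        # Mask every message body before it leaves the process,
        # then hand off to the underlying client unchanged.
        clean = [{**m, "content": self._scan(m["content"])} for m in messages]
        return self._client.chat_completion(clean, **kwargs)
```

Because the proxy mirrors the wrapped client's interface, calling code doesn't change, which is what makes the two-line integration possible.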

It also generates GDPR Article 30 compliance reports
automatically from your audit log — one command, done.
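Conceptually, that report is just an aggregation over the audit log: count what was scanned and which data categories turned up. A sketch of the idea, assuming a JSON-lines audit log with a `detected` field per event (field names and format are my invention, not ShadowAudit's):

```python
import json
from collections import Counter

def article30_summary(log_path: str) -> dict:
    """Aggregate a JSON-lines audit log into a processing-activity summary
    of the kind GDPR Article 30 records require. Illustrative only."""
    counts, total = Counter(), 0
    with open(log_path) as fh:
        for line in fh:
            event = json.loads(line)
            total += 1
            counts.update(event.get("detected", []))  # e.g. ["email", "pan"]
    return {
        "prompts_scanned": total,
        "data_categories_detected": dict(counts),
    }
```

The value is less in the arithmetic than in having an append-only log to aggregate from: auditable proof of what was scanned, not just good intentions.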

Demo of ShadowAudit scanning an API call and masking the detected data.

Built this over summer as part of my open-source portfolio.
Would love feedback from the community.

GitHub: github.com/Jeffrin-dev/ShadowAudit

Top comments (3)

Suny Choudhary

This is a solid direction, but it also feels like a patch for a deeper issue.

Right now, most teams treat “don’t leak data into AI” as a user problem (be careful what you paste)… when it’s really a system design problem. People will paste logs, code, customer data; especially under pressure.

Tools like this help, but they’re kind of compensating for the fact that the default behavior of most AI tools is still “trust the input, send it out.”

What’s interesting is how many different ways data can leak:

  • prompts with secrets
  • connected tools/plugins
  • even memory over time

So it’s not just about filtering input; it’s about rethinking what the model is allowed to see and retain in the first place.

Feels like we’re heading toward a world where:

  • either every team builds guardrails like this
  • or AI tools themselves become privacy-aware by default

Curious; do you see this staying as a tooling layer, or eventually becoming part of the model/runtime itself?

Twisted-Code'r

You're right — it's a patch, not a cure. And I'd rather
be honest about that than oversell it.

The reason it lives at the application layer is that's
where developers actually have control right now. And
regulated industries need auditable proof of what was
scanned — not just good intentions.

The memory and plugins problem is real and ShadowAudit
doesn't solve that yet. That's a harder architectural
problem.

My guess is both futures happen — privacy-aware models
at the foundation, tooling for compliance proof above
them. Like how TLS didn't replace application-level auth.

Good question to push on though. Kept me thinking.
