DEV Community

Twisted-Code'r


I built an open-source tool that stops personal data from leaking into AI chatbots

Ever copy-pasted something into ChatGPT and immediately
thought "wait, should I have done that?"

If you're building an AI app that handles user data, you need to know what's leaking into your LLM API before a regulator does.

That's the problem ShadowAudit solves.

It sits between your app and any LLM API and scans every
prompt before it leaves your system — catching emails,
phone numbers, API keys, and Indian national IDs like
Aadhaar and PAN numbers.
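The detection side of a tool like this comes down to pattern scanners over outgoing text. As a rough illustration (these patterns and function names are my sketch, not ShadowAudit's actual rules or API):

```python
import re

# Illustrative patterns only -- a real scanner needs checksum validation
# (e.g. Verhoeff for Aadhaar) and far more robust phone/email handling.
PII_PATTERNS = {
    "email":   re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone":   re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)"),  # Indian mobile
    "aadhaar": re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"),         # 12-digit Aadhaar
    "pan":     re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),                  # e.g. ABCDE1234F
}

def scan(text: str) -> dict:
    """Return every match found, keyed by PII type (empty types omitted)."""
    hits = {name: pat.findall(text) for name, pat in PII_PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}

def mask(text: str) -> str:
    """Replace each match with a [TYPE] placeholder before the text leaves the system."""
    for name, pat in PII_PATTERNS.items():
        text = pat.sub(f"[{name.upper()}]", text)
    return text
```

Masking (rather than blocking) keeps the prompt useful to the model while stripping the identifying values.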

Two lines to integrate:

# assuming: from shadowaudit import ShadowAudit; import openai
sa = ShadowAudit.from_config("shadowaudit.yaml")
client = sa.wrap(openai.OpenAI())

That's it. Everything else stays the same.
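The wrap-and-forward pattern itself is simple: a proxy intercepts each call, scans or masks the outgoing messages, then delegates to the real client. A minimal sketch of that pattern (the class and method names here are hypothetical, not ShadowAudit's actual implementation):

```python
class ScanningClient:
    """Proxy that masks prompt text before forwarding to the wrapped client.

    `scan_fn` is any callable taking text and returning the (possibly
    masked) text -- illustrative only; the real interface may differ.
    """

    def __init__(self, client, scan_fn):
        self._client = client
        self._scan = scan_fn

    def chat_completion(self, messages, **kwargs):
        # Mask every message body before it leaves the process,
        # then hand off to the underlying client unchanged.
        clean = [{**m, "content": self._scan(m["content"])} for m in messages]
        return self._client.chat_completion(clean, **kwargs)
```

Because the proxy mirrors the wrapped client's interface, calling code doesn't change, which is what makes the two-line integration possible.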

It also generates GDPR Article 30 compliance reports
automatically from your audit log — one command, done.
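Conceptually, that report is just an aggregation over the audit log: count what was scanned and which data categories turned up. A sketch of the idea, assuming a JSON-lines audit log with a `detected` field per event (field names and format are my invention, not ShadowAudit's):

```python
import json
from collections import Counter

def article30_summary(log_path: str) -> dict:
    """Aggregate a JSON-lines audit log into a processing-activity summary
    of the kind GDPR Article 30 records require. Illustrative only."""
    counts, total = Counter(), 0
    with open(log_path) as fh:
        for line in fh:
            event = json.loads(line)
            total += 1
            counts.update(event.get("detected", []))  # e.g. ["email", "pan"]
    return {
        "prompts_scanned": total,
        "data_categories_detected": dict(counts),
    }
```

The value is less in the arithmetic than in having an append-only log to aggregate from: auditable proof of what was scanned, not just good intentions.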

Demo of ShadowAudit scanning an API call and masking the detected data.

Built this over summer as part of my open-source portfolio.
Would love feedback from the community.

GitHub: github.com/Jeffrin-dev/ShadowAudit

Top comments (3)

Suny Choudhary

This is a solid direction, but it also feels like a patch for a deeper issue.

Right now, most teams treat “don’t leak data into AI” as a user problem (be careful what you paste)… when it’s really a system design problem. People will paste logs, code, customer data; especially under pressure.

Tools like this help, but they’re kind of compensating for the fact that the default behavior of most AI tools is still “trust the input, send it out.”

What’s interesting is how many different ways data can leak:

  • prompts with secrets
  • connected tools/plugins
  • even memory over time

So it’s not just about filtering input; it’s about rethinking what the model is allowed to see and retain in the first place.

Feels like we’re heading toward a world where:

  • either every team builds guardrails like this
  • or AI tools themselves become privacy-aware by default

Curious; do you see this staying as a tooling layer, or eventually becoming part of the model/runtime itself?

Twisted-Code'r

You're right — it's a patch, not a cure. And I'd rather
be honest about that than oversell it.

The reason it lives at the application layer is that's
where developers actually have control right now. And
regulated industries need auditable proof of what was
scanned — not just good intentions.

The memory and plugins problem is real and ShadowAudit
doesn't solve that yet. That's a harder architectural
problem.

My guess is both futures happen — privacy-aware models
at the foundation, tooling for compliance proof above
them. Like how TLS didn't replace application-level auth.

Good question to push on though. Kept me thinking.
