Mehdi


I Let Attackers Train My AI-powered Security Proxy (Here’s What Happened)

Every SOC team I've talked to says the same thing: too many alerts, too much data, still getting breached. Your WAF throws thousands of daily alerts. Your IDS never stops. And attackers? They evolve faster than your rules can keep up.

I got frustrated with this problem. All the security tools I've used feel stuck in the past—they match signatures, write rules, alert on known patterns. But the moment something new appears, you're blind. The real threat drowns in noise.
So I built something different.

The Problem I Was Trying to Solve
Traditional security is reactive. You know what past attacks look like, so you build rules for them. But zero-days, novel attack chains, new techniques? You miss them. It's like trying to catch attackers by looking for fingerprints when they've never been caught before.

I started thinking: what if we used AI to actually reason about requests instead of just pattern matching? What if a system could look at a request and say "this doesn't make sense for this endpoint" instead of "does it match this signature?"

And then the idea: what if instead of just blocking attacks, you confused them? Send back fake data, log everything, learn from the attempt. Turn attackers into teachers.

Meet IFRIT Proxy
IFRIT is an AI-powered reverse proxy that sits in front of your APIs. It intercepts requests and makes decisions through a four-stage pipeline:

Stage 0: Whitelist Check — Is this IP/path whitelisted? Pass it through instantly.
Stage 1: Local Rules — Does it match obvious attack patterns? SQL injection syntax, path traversal? Block it fast.
Stage 2: Database Patterns — Have we seen this attack before? Use cached response. No API calls. This is where costs drop 90%.
Stage 3: LLM Analysis — Is this something new? Send it to Claude for actual reasoning. Not just "does it match X" but "is this reconnaissance? Exploitation? Fuzzing?"

If it passes all four stages, it's legitimate. Forward to your backend.

Here's the thing: legitimate traffic never hits the LLM. It goes straight through. Only suspicious requests get analyzed. And the ones matching learned patterns? Under 10 milliseconds.
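The four stages above can be sketched in a few lines. This is a minimal illustration, not IFRIT's actual code: every name here (`decide`, `normalize`, the verdict strings, the rule format) is my assumption about how such a pipeline could look.

```python
import hashlib
import re
from dataclasses import dataclass

@dataclass
class Request:
    ip: str
    path: str
    body: str

def normalize(req: Request) -> str:
    # Hash the request's "shape" (volatile digits stripped) so variants
    # of the same attack share one cache key
    shape = re.sub(r"\d+", "N", req.path + " " + req.body)
    return hashlib.sha256(shape.encode()).hexdigest()

def decide(req, whitelist, local_rules, pattern_cache, classify):
    # Stage 0: whitelisted IP/path pairs bypass all analysis
    if (req.ip, req.path) in whitelist:
        return "forward"
    # Stage 1: cheap local rules catch obvious attack syntax
    if any(re.search(rule, req.body) for rule in local_rules):
        return "deceive"  # serve a fake response, log everything
    # Stage 2: previously learned patterns -> cached verdict, no API call
    key = normalize(req)
    if key in pattern_cache:
        return pattern_cache[key]  # the sub-10ms path
    # Stage 3: only novel traffic reaches the LLM
    verdict = "forward" if classify(req) == "benign" else "deceive"
    pattern_cache[key] = verdict
    return verdict

rules = [r"' *OR *'", r"\.\./"]  # SQLi fragment, path traversal
cache, wl = {}, {("10.0.0.5", "/health")}
print(decide(Request("10.0.0.5", "/health", ""), wl, rules, cache, lambda r: "benign"))
print(decide(Request("1.2.3.4", "/login", "admin' OR '1'='1"), wl, rules, cache, lambda r: "exploit"))
```

The design point this sketch captures: the LLM call sits behind both the rule check and the pattern cache, so legitimate and repeated traffic never pays its latency or cost.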

How It Actually Works
When an attack is detected, IFRIT doesn't just block it. It generates a fake response mimicking what the attacker was looking for. Fake database schema? Check. Vulnerable endpoint that doesn't exist? Generated on the fly. Admin access they think they got? Completely fabricated.

Everything is logged. Attacker IP, attack pattern, timing, tool signatures, progression. Your team gets full visibility.
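A structured record for one of those events might look like this. The field names and values are my illustrative assumptions, not IFRIT's actual log schema:

```python
import json
import time

# Hypothetical detection event; every field here is an assumption for illustration
event = {
    "ts": time.time(),
    "src_ip": "203.0.113.7",         # attacker IP (RFC 5737 example address)
    "path": "/api/login",
    "category": "sqli",              # matched attack pattern
    "tool_signature": "sqlmap",      # inferred from headers/timing, if any
    "stage": 1,                      # pipeline stage that flagged the request
    "action": "deceive",             # fake response served instead of a block
}
print(json.dumps(event))
```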
Let me show you with SQL injection:

Attacker payload (login field): admin' OR '1'='1

IFRIT Response:
{
  "status": "success",
  "user_id": 12847,
  "username": "admin",
  "role": "admin",
  "permissions": ["users.read", "users.write", "admin.access"],
  "session_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "mfa_enabled": false
}

From the attacker's perspective, they've won. They got admin access. They have a session token.
In reality? All fake. That token leads nowhere. Those permissions don't exist. Meanwhile, you're watching every move.
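A payload like the one above could be fabricated with very little code. This is a sketch, not IFRIT's generator; the field names simply mirror the example response:

```python
import json
import secrets

def fake_admin_login() -> str:
    # Fabricate a plausible "successful login" for a detected SQL injection.
    # The token merely starts like a JWT; it is valid nowhere.
    return json.dumps({
        "status": "success",
        "user_id": 10000 + secrets.randbelow(90000),
        "username": "admin",
        "role": "admin",
        "permissions": ["users.read", "users.write", "admin.access"],
        "session_token": "eyJ" + secrets.token_urlsafe(32),
        "mfa_enabled": False,
    })
```

Randomizing fields like `user_id` and the token matters: identical responses across probes would tip off an attentive attacker that the target is a decoy.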

Here are some other examples

Cross-site scripting (note how the response was served from cached patterns)

screenshot showing attack

screenshot showing reverse proxy response

Prototype pollution (with dynamic response generation)

screenshot showing attack

screenshot showing reverse proxy response

Why This Matters
Three things happen:
First, noise goes away. Your team doesn't get thousands of alerts. They get context. They know what the attacker was trying to do and why.
Second, asymmetric defense. Attackers expect to either get blocked or get in. They don't expect to get in and then realize hours later that everything was fake. That confusion is powerful.
Third, it learns. Every attack teaches the system. After the learning phase, costs drop by roughly 90% because patterns are cached: you pay once per attack type, then serve cached responses forever.
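Back-of-envelope math on that caching claim, with made-up traffic numbers:

```python
# Illustrative cost model: these numbers are assumptions, not measurements
attacks_per_day = 5000
unique_attack_types = 500        # distinct patterns after normalization

llm_calls_naive = attacks_per_day        # every attack analyzed by the LLM
llm_calls_cached = unique_attack_types   # pay once per pattern, then cache
savings = 1 - llm_calls_cached / llm_calls_naive
print(f"{savings:.0%} fewer LLM calls")  # 90% fewer LLM calls
```

The real savings depend on how well normalization collapses attack variants into shared patterns: the more diverse the traffic, the more cache misses.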

Open Source & Looking for Feedback
IFRIT is on GitHub (github.com/0tSystemsPublicRepos/IfritProxy) under Apache 2.0. It's a functional beta that works for my use case. I'm genuinely curious whether it works for yours.
Test it. Break it. Tell me what's missing. If you have ideas for new LLM providers, SIEM integrations, better detection, or completely different approaches, I'm listening!

Top comments (1)

Cyber Safety Zone

Wow — this was a fascinating (and slightly terrifying) read. 😅 Letting attackers “train” your AI-powered security proxy is such a bold and clever experiment. It really highlights how adversarial behavior can reshape model outputs in unexpected ways.

I especially liked how you broke down the lessons learned — particularly around model drift and the danger of feedback loops when the “training data” comes from hostile actors. It’s a great reminder that AI security tools can themselves become attack surfaces if not tightly monitored.

Curious — did you end up finding an effective method to isolate or “sandbox” malicious inputs before they influenced the model’s learning process?

Great write-up — this kind of hands-on experimentation is exactly what the AI security community needs more of. 👏