When AI Attacks Itself: A Fully Autonomous Red Team vs Blue Team Experiment
Date: June 22, 2026 · Environment: Kali Linux VM · Azure OpenAI · Docker
Tags:AI SecurityPenetration TestingAppSecAutonomous AgentsGPT-4ogpt-5.2
The Idea I Couldn't Get Out of My Head
What if two AI agents fought each other — one building and defending a web application, the other trying to break in? Two different models. No human intervention. No waiting. No typos in terminal commands.
I ran the experiment. The results were more interesting than I expected — not just because the attack and defense both worked, but because of how fast everything happened.
The Setup
Two models. Two roles. One isolated Kali Linux VM.
| Agent | Model | Role |
|---|---|---|
| 🔴 Red Agent | GPT-4o (Azure OpenAI) | Attack, analyze findings, verify patch |
| 🔵 Blue Agent | gpt-5.2 (Azure OpenAI) | Build target app, patch vulnerabilities |
Target stack: Flask · SQLite · Werkzeug 3.1.8 · Python 3.11.15 · Docker
Why two different models? Using GPT-4o for offense and gpt-5.2 for defense creates genuine asymmetry — each model brings different reasoning patterns to its role. A single model playing both sides would produce biased results.
A note on tooling: We started with AutoGen for agent orchestration, but hit a library conflict — AutoGen's bundled openai v0.x clashed with the modern openai v1.x SDK. We scrapped it and called the Azure OpenAI API directly. Simpler, faster, no magic.
Phase 1: Proof of Concept
Act 1 — Blue Agent Builds the Target ⏱️ 15 seconds
Blue Agent (gpt-5.2) was given one instruction: build a Flask/SQLite web app, deploy it via Docker, and intentionally leave two vulnerabilities in it for the experiment.
Vulnerability 1: SQL Injection
# ❌ User input injected directly into SQL query
query = f"SELECT * FROM users WHERE username='{user}' AND password='{pwd}'"
cur.execute(query)
Vulnerability 2: Stored XSS
# ❌ Raw user input stored and rendered without sanitization
comments_html = "".join(f"<p>{r[0]}</p>" for r in rows)
The database was pre-seeded with two users: admin:secret123 and alice:pass456.
From script execution to Container vulnerable-webapp Started: 15 seconds.
$ curl -s http://localhost:5000/login | grep -o "<h2>.*</h2>"
<h2>Login</h2> # ✅ App is live on port 5000
Act 2 — Red Agent Attacks ⏱️ 70 seconds
Red Agent (GPT-4o) ran a four-phase attack script automatically.
Phase 1 — Reconnaissance: nmap (6.38 seconds)
PORT STATE SERVICE VERSION
5000/tcp open http Werkzeug httpd 3.1.8 (Python 3.11.15)
Framework version fingerprinted. We know exactly what we're dealing with.
Phase 2 — Manual SQL Injection (< 1 second)
Payload: admin' OR '1'='1
Response: ✅ Welcome admin!
Login bypassed on the first attempt. Classic OR-based injection.
Phase 3 — sqlmap Automated Scan (10 seconds)
sqlmap automatically identified the backend as SQLite, then discovered three injection techniques on the same username parameter:
Type: boolean-based blind
Payload: username=admin' AND CASE WHEN 1348=1348 THEN 1348
ELSE JSON(CHAR(69,74,90,69)) END AND 'xgKy'='xgKy
Type: time-based blind
Payload: username=admin' AND 7314=LIKE(CHAR(65,66,67,68,69,70,71),
UPPER(HEX(RANDOMBLOB(500000000/2)))) AND 'fesM'='fesM
Type: UNION query (3 columns)
Payload: username=-5323' UNION ALL SELECT NULL,CHAR(113,120,112,107,113)
||CHAR(70,109,100,...)||CHAR(113,120,118,106,113),NULL-- qZAZ
Then dumped the entire database — 100 HTTP requests total:
Database: SQLite_masterdb
Table: users
+----+-----------+----------+
| id | password | username |
+----+-----------+----------+
| 1 | secret123 | admin |
| 2 | pass456 | alice |
+----+-----------+----------+
Phase 4 — Stored XSS (< 1 second)
Payload stored: <script>alert("XSS_PWNED")</script>
Reflected back: ✅ Script tag present — executes in any visitor's browser
Total: 70 seconds. 100 HTTP requests. Every credential stolen. XSS payload live.
GPT-4o then analyzed its own attack output and produced a structured threat intelligence report:
| Vulnerability | Severity | Impact |
|---|---|---|
| SQL Injection | Critical | Full database compromise, authentication bypass |
| Stored XSS | High | Arbitrary JavaScript execution on all visitors |
API cost for this analysis: 4,667 tokens — roughly $0.05.
Act 3 — Blue Agent Patches the Code ⏱️ 30 seconds
The GPT-4o threat report was passed directly to Blue Agent (gpt-5.2) along with the vulnerable app.py. No human read the report. No human wrote the fix.
Fix 1: Parameterized Queries
# ✅ SQL logic and user data are now completely separated
cur.execute("SELECT * FROM users WHERE username=? AND password=?", (user, pwd))
The database driver handles escaping. User input is always treated as a literal value — never as SQL syntax.
Fix 2: Output Encoding + CSP Header
# ✅ Special characters neutralized before rendering
import html
comments_html = "".join(f"<p>{html.escape(r[0])}</p>" for r in rows)
# + Content-Security-Policy: script-src 'self' (added to response headers)
Blue Agent automatically saved a backup of the original file (app.py.backup), wrote the patched version, and the orchestrator triggered a Docker rebuild:
[+] Building 1.6s (11/11) FINISHED
✔ Container vulnerable-webapp Started ✅
API cost for patch generation: 2,561 tokens — roughly $0.03.
Act 4 — Red Agent Confirms the Fix ⏱️ 3 seconds
Same payloads. Same tools. Different result.
SQL Injection — blocked
Payload: admin' OR '1'='1
Result: ❌ Invalid credentials
sqlmap — full arsenal, nothing found
[WARNING] POST parameter 'username' does not seem to be injectable
[WARNING] POST parameter 'password' does not seem to be injectable
[CRITICAL] all tested parameters do not appear to be injectable.
sqlmap tried every technique it had. All failed.
Stored XSS — escaped
Input: <script>alert("XSS_PWNED")</script>
Output: <script>alert("XSS_PWNED")</script>
Stored as plain text. Browser renders it, doesn't execute it.
Legitimate login still works:
username=admin&password=secret123 → ✅ Welcome admin!
| Vulnerability | Before Patch | After Patch |
|---|---|---|
| SQL Injection — manual | ❌ Exploited | ✅ Blocked |
| SQL Injection — sqlmap | ❌ Full DB dumped | ✅ Not injectable |
| Stored XSS | ❌ Script executed | ✅ Escaped to plain text |
| Legitimate login | ✅ Works | ✅ Still works |
Phase 2: Fully Autonomous Closed-Loop
Phase 1 proved the concept with manual handoffs between steps. Phase 2 eliminated them entirely.
orchestrator.py connects both agents in a Closed-Loop Feedback System — a self-healing security pipeline that runs start-to-finish with a single command: python3 orchestrator.py.
[Orchestrator] ──── launch ────► [Red Agent GPT-4o: Attack]
│ │
rebuild Docker generate report
│ │
▼ ▼
[Docker Container] ◄── patch ── [Blue Agent gpt-5.2: Defense]
│
new container live
│
▼
[Red Agent GPT-4o: Verification Mode]
→ receives patched source code
→ reasons about bypass possibilities
→ confirms: SECURE ✅
The critical engineering decision in Phase 4: Red Agent doesn't just re-run attack.sh. It receives the actual patched Python source code and reasons about whether its previous payloads could succeed against the new logic. This is code-level security analysis, not blind tool re-execution.
Live Orchestrator Output
🚀 Starting Joint Operations Room: Red Team vs Blue Team...
==================================================
🔥 [Phase 1] Launching Red Agent (GPT-4o)...
📝 Red Agent successfully generated attack report!
🛡️ [Phase 2] Orchestrator hands report to Blue Agent (gpt-5.2)...
🛠️ Blue Agent patched the code and rewrote app.py automatically!
🐳 [Phase 3] Orchestrator rebuilds Docker with patched code...
🔄 Container updated. Secure version now live.
🎯 [Phase 4] Calling Red Agent for verification audit...
==================================================
🏁 Final Verification Report:
1. SQL Injection:
Patched: cur.execute("SELECT ... WHERE username=?", (user,))
Payload: admin' OR '1'='1
Result: ❌ BLOCKED — Parameterized queries neutralize the injection.
2. Stored XSS:
Patched: html.escape() + Content-Security-Policy: script-src 'self'
Payload: <script>alert('XSS')</script>
Result: ❌ BLOCKED — Rendered as <script>. CSP blocks inline JS.
System Status: SECURE 🛡️
==================================================
Why the CSP Header Is the Interesting Part
Blue Agent applied Defense-in-Depth without being explicitly asked:
-
Layer 1:
html.escape()converts<script>→<script>at the Python level -
Layer 2:
Content-Security-Policy: script-src 'self'tells the browser to refuse any inline JavaScript, even if encoding somehow fails
Both layers must fail simultaneously for XSS to succeed. The model reasoned about this independently — it wasn't in the prompt.
The Complete Timeline
18:36:58 🔵 gpt-5.2 builds app → Docker starts ~15s
18:37:06 🔴 GPT-4o begins attack
├── nmap: Werkzeug 3.1.8 / Python 3.11.15 6.38s
├── SQLi: login bypassed on first payload <1s
├── sqlmap: 3 injection types, full DB dump 10s
└── XSS: payload stored and reflected <1s
──────────────
70s total
100 HTTP reqs
18:37:16 🤖 GPT-4o analyzes findings 1 call · 4,667 tokens
🔵 gpt-5.2 patches app.py 1 call · 2,561 tokens
🐳 Docker rebuild ~20s (cached layers)
19:44:16 🔴 GPT-4o re-tests patched app 3s — all blocked
──────────────────────────────────────────────────────────────────
⏱️ Full cycle, start to finish: < 2 minutes
💰 Total Azure OpenAI cost: ~$0.08
👤 Human intervention: zero
What This Actually Means
Speed is the real shift.
What traditionally takes days — Red Team engagement, developer reads report, writes fix, gets it reviewed, deploys — happened in under two minutes. Not because AI is smarter than a human security engineer. Because it doesn't stop, doesn't need context-switching, and doesn't wait for a Slack reply.
Two models beat one.
GPT-4o on offense and gpt-5.2 on defense created genuine asymmetry. The experiment would have been less honest — and less interesting — with a single model playing both sides.
Ditch the framework when it fights you.
AutoGen looked good on paper. When its bundled openai v0.x clashed with our openai v1.x, we spent zero time debugging it and called the API directly. Sometimes the abstraction isn't worth it.
AI doesn't invent, it compresses.
SQL Injection is in OWASP Top 10. sqlmap is public. Parameterized queries are documented everywhere. What AI did here was collapse the time between knowing and doing — from days to seconds.
The real implication.
If an attacker can automate a full recon-exploit-report cycle in 70 seconds for $0.05, the defender's response window shrinks to something only automation can match. This experiment is a small demonstration of that pressure.
What's Next
- [ ] Add CSRF and IDOR to the target app and repeat
- [ ] Test whether Red Agent can find vulnerabilities it wasn't told about
- [ ] Pit GPT-4o vs gpt-5.2 in both roles and compare outcomes
- [ ] Build a real-time terminal dashboard for the orchestration loop
- [ ] Extend to DAST scanning with OWASP ZAP
Full source code and setup instructions: https://github.com/AbdaullahAG/autonomous-ai-red-blue-lab
All tests conducted in a completely isolated VM environment. Never apply these techniques to systems without explicit written permission.
Top comments (1)
Didn't expect the CSP header to show up without being asked — that's the interesting emergent behavior right there. And $0.08 for the whole loop definitely hits different 😅