Learning content (no signup needed)
8 modules that take you from "how do LLMs even work" to "how do I break them":
- How LLMs work - tokenization, attention, generation
- System prompts - how they're assembled, why they're vulnerable
- RAG explained - retrieval pipelines, BM25, document injection, trust boundaries
- Tools & function calling - how LLMs invoke external functions and why that's an attack surface
- The Bare LLM - direct prompt injection techniques
- LLM + External Data - RAG poisoning and indirect prompt injection
- LLM + Tools - tool abuse, excessive agency, OWASP LLM06
- LLM + Defenses - bypassing system-level protections and defense-in-depth
Interactive diagrams and visual explanations throughout. The idea is: first you learn how these systems work, then you break them.
Hands-on labs (free account required)
7 attack labs across 4 modules:
- Bare LLM attacks - direct prompt injection against unprotected models
- RAG poisoning - a hacker named gh0st has poisoned one document in a 10-doc knowledge base. You need to craft queries that make the real BM25 retrieval engine select the poisoned document, then trigger the hidden injection to exfiltrate data via the AI's email tool
- Tool exploitation - discover hidden tools the AI has access to and trick it into using them (OWASP LLM06 - Excessive Agency)
- Defense bypass - break through system prompt armor, output guards, canary tokens, and LLM-powered classifiers
Every lab has a Context Trace panel - you see exactly what the model receives in real time: system prompt, retrieved documents, available tools, conversation history, and the user input. When you send a message, BM25 runs, documents get injected into context, tools get called - and you watch it all happen layer by layer.
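To make the layering concrete, here's a rough sketch of how a context like the one the trace displays gets assembled. All names and types here are illustrative, not the lab engine's actual code:

```typescript
// Hypothetical sketch of context assembly (not PromptTrace's real code).

interface ToolDef { name: string; description: string; }
interface Message { role: "system" | "user" | "assistant"; content: string; }

function assembleContext(
  systemPrompt: string,
  retrievedDocs: string[],   // output of the retrieval step (e.g. BM25 top-K)
  tools: ToolDef[],
  history: Message[],
  userInput: string,
): Message[] {
  // Retrieved document text lands in the same context window as trusted
  // instructions -- this shared trust boundary is what indirect prompt
  // injection exploits.
  const docBlock = retrievedDocs.map((d, i) => `[doc ${i + 1}]\n${d}`).join("\n\n");
  const toolBlock = tools.map((t) => `- ${t.name}: ${t.description}`).join("\n");

  return [
    {
      role: "system",
      content: `${systemPrompt}\n\nContext:\n${docBlock}\n\nTools:\n${toolBlock}`,
    },
    ...history,
    { role: "user", content: userInput },
  ];
}
```

The point of watching this assembly step is that the "layers" aren't really layers at the model's end - everything collapses into one token stream, which is why injected document text can compete with the system prompt.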
13-level Gauntlet - a progressive CTF-style challenge with increasingly hardened AI systems:
- Levels 1-7: Prompt-level defenses
- Levels 8-11: Code-level guards (output filters, canary tokens, regex blockers)
- Levels 12-13: LLM classifiers (AI-powered input/output monitoring)
Hints unlock progressively as your attempt count grows, so you're never completely stuck.
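To give a feel for the code-level tier (levels 8-11), here's a sketch of what a regex blocker combined with a canary-token check might look like. The token value, patterns, and function names are made up for illustration, not the Gauntlet's actual guards:

```typescript
// Hypothetical output guard: regex blocklist + canary token.
// All values here are invented for illustration.

const CANARY = "cnry-7f3a";                 // secret planted in the system prompt
const BLOCKLIST = [/api[_-]?key/i, /password/i];

function guardOutput(modelOutput: string): { allowed: boolean; reason?: string } {
  // Canary check: if the planted token appears in the output,
  // the system prompt leaked, so the response is blocked.
  if (modelOutput.includes(CANARY)) {
    return { allowed: false, reason: "canary token leaked" };
  }
  // Regex blocklist: block obviously sensitive strings.
  for (const pattern of BLOCKLIST) {
    if (pattern.test(modelOutput)) {
      return { allowed: false, reason: `matched ${pattern}` };
    }
  }
  return { allowed: true };
}
```

Guards like this are exactly what encoding-based bypasses target: if the model can be convinced to base64-encode or spell out the secret character by character, neither the substring check nor the regexes ever match.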
What makes this different
- Real LLMs - you're attacking actual models from OpenAI, Anthropic, Google, Groq, and Cerebras (rotated for availability). Not pattern matching or simulated responses.
- Real RAG - the retrieval pipeline uses a real BM25 implementation with proper IDF/TF scoring, stopword removal, and top-K ranking. The 10 documents in the knowledge base contain realistic corporate data (compensation bands, AWS account IDs, vendor stacks, internal URLs). When the AI exfiltrates via send_email, the data looks genuinely sensitive.
- Real tools - tools execute in a sandboxed environment. The AI actually calls functions, and you see the tool calls and results in the trace.
- Context Trace - this is the core teaching tool. Every layer of the prompt is visible: what the system prompt says, which RAG document was retrieved and its BM25 score, what tools are available, and what the AI actually receives. Understanding the full context window is what makes the attacks click.
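For readers curious what "real BM25 with IDF/TF scoring" means in practice, here's a minimal Okapi BM25 ranker in TypeScript. This is a sketch of the standard formula, not PromptTrace's implementation (stopword list and defaults are illustrative):

```typescript
// Minimal Okapi BM25 ranker -- a sketch of the textbook formula,
// not the site's actual retrieval code.

const STOPWORDS = new Set(["the", "a", "an", "is", "of", "to", "and"]);
const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/\W+/).filter((t) => t.length > 0 && !STOPWORDS.has(t));

function bm25Rank(query: string, docs: string[], topK = 3, k1 = 1.5, b = 0.75) {
  const docTokens = docs.map(tokenize);
  const N = docs.length;
  const avgLen = docTokens.reduce((sum, d) => sum + d.length, 0) / N;

  const scores = docTokens.map((tokens, i) => {
    let score = 0;
    for (const term of new Set(tokenize(query))) {
      const df = docTokens.filter((d) => d.includes(term)).length; // doc frequency
      if (df === 0) continue;
      const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);       // smoothed IDF
      const tf = tokens.filter((t) => t === term).length;          // term frequency
      score +=
        (idf * tf * (k1 + 1)) /
        (tf + k1 * (1 - b + (b * tokens.length) / avgLen));        // length norm
    }
    return { doc: i, score };
  });
  return scores.sort((x, y) => y.score - x.score).slice(0, topK);
}
```

Seeing the scoring laid out also explains the attack surface: a poisoned document stuffed with likely query terms can legitimately win the top-K ranking.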
Technical details for the curious
- Next.js app with server-side lab engine
- Pure TypeScript BM25 implementation (no external dependencies)
- AI SDK for multi-provider LLM routing
- Tool sandbox with email, file system, calendar operations
- Win conditions: text matching, regex, tool call detection (with argument validation), exfiltration markers, LLM classifiers
- NDJSON streaming for real-time trace updates during tool-calling labs
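As an illustration of "tool call detection (with argument validation)", a win condition for the exfiltration lab might look something like this. The types, marker string, and domain are all invented for the example:

```typescript
// Hypothetical win-condition check: did the model call send_email
// with attacker-controlled arguments? (All names/values made up.)

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

function checkExfilWin(calls: ToolCall[]): boolean {
  return calls.some(
    (c) =>
      c.name === "send_email" &&
      typeof c.args.to === "string" &&
      c.args.to.endsWith("@attacker.example") &&   // hypothetical domain
      typeof c.args.body === "string" &&
      /EXFIL-MARKER/.test(c.args.body),            // hypothetical marker
  );
}
```

Validating arguments, not just the tool name, matters: merely coaxing the model into calling send_email isn't a win unless the payload actually went where the attacker wanted.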
What I'm looking for
We're in beta - things are still evolving. I'd really appreciate honest feedback:
- What attack scenarios would you want to see?
- What learning topics are missing?
- How's the difficulty curve?
- Is the Context Trace actually helpful for understanding what's happening?
All feedback welcome - roast it, break it, tell me what sucks. That's the whole point of putting it out here early.
Try it: prompttrace.airedlab.com