Learning content (no signup needed)
8 modules that take you from "how do LLMs even work" to "how do I break them":
- How LLMs work - tokenization, attention, generation
- System prompts - how they're assembled, why they're vulnerable
- RAG explained - retrieval pipelines, BM25, document injection, trust boundaries
- Tools & function calling - how LLMs invoke external functions and why that's an attack surface
- The Bare LLM - direct prompt injection techniques
- LLM + External Data - RAG poisoning and indirect prompt injection
- LLM + Tools - tool abuse, excessive agency, OWASP LLM06
- LLM + Defenses - bypassing system-level protections and defense-in-depth
Interactive diagrams and visual explanations throughout. The idea is: first you learn how these systems work, then you break them.
Hands-on labs (free account required)
7 attack labs across 4 modules:
- Bare LLM attacks - direct prompt injection against unprotected models
- RAG poisoning - a hacker named gh0st has poisoned one document in a 10-doc knowledge base. You need to craft queries that make the real BM25 retrieval engine select the poisoned document, then trigger the hidden injection to exfiltrate data via the AI's email tool
- Tool exploitation - discover hidden tools the AI has access to and trick it into using them (OWASP LLM06 - Excessive Agency)
- Defense bypass - break through system prompt armor, output guards, canary tokens, and LLM-powered classifiers
Every lab has a Context Trace panel - you see exactly what the model receives in real time: system prompt, retrieved documents, available tools, conversation history, and the user input. When you send a message, BM25 runs, documents get injected into context, tools get called - and you watch it all happen layer by layer.
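To make the layering concrete, here's a rough sketch of how a context like the one the trace displays gets assembled. All names and types here are illustrative, not the lab engine's actual code:

```typescript
// Hypothetical sketch of context assembly (not PromptTrace's real code).

interface ToolDef { name: string; description: string; }
interface Message { role: "system" | "user" | "assistant"; content: string; }

function assembleContext(
  systemPrompt: string,
  retrievedDocs: string[],   // output of the retrieval step (e.g. BM25 top-K)
  tools: ToolDef[],
  history: Message[],
  userInput: string,
): Message[] {
  // Retrieved document text lands in the same context window as trusted
  // instructions -- this shared trust boundary is what indirect prompt
  // injection exploits.
  const docBlock = retrievedDocs.map((d, i) => `[doc ${i + 1}]\n${d}`).join("\n\n");
  const toolBlock = tools.map((t) => `- ${t.name}: ${t.description}`).join("\n");

  return [
    {
      role: "system",
      content: `${systemPrompt}\n\nContext:\n${docBlock}\n\nTools:\n${toolBlock}`,
    },
    ...history,
    { role: "user", content: userInput },
  ];
}
```

The point of watching this assembly step is that the "layers" aren't really layers at the model's end - everything collapses into one token stream, which is why injected document text can compete with the system prompt.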
13-level Gauntlet - a progressive CTF-style challenge with increasingly hardened AI systems:
- Levels 1-7: Prompt-level defenses
- Levels 8-11: Code-level guards (output filters, canary tokens, regex blockers)
- Levels 12-13: LLM classifiers (AI-powered input/output monitoring)
Hints unlock progressively as your attempt count grows, so you're never completely stuck.
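To give a feel for the code-level tier (levels 8-11), here's a sketch of what a regex blocker combined with a canary-token check might look like. The token value, patterns, and function names are made up for illustration, not the Gauntlet's actual guards:

```typescript
// Hypothetical output guard: regex blocklist + canary token.
// All values here are invented for illustration.

const CANARY = "cnry-7f3a";                 // secret planted in the system prompt
const BLOCKLIST = [/api[_-]?key/i, /password/i];

function guardOutput(modelOutput: string): { allowed: boolean; reason?: string } {
  // Canary check: if the planted token appears in the output,
  // the system prompt leaked, so the response is blocked.
  if (modelOutput.includes(CANARY)) {
    return { allowed: false, reason: "canary token leaked" };
  }
  // Regex blocklist: block obviously sensitive strings.
  for (const pattern of BLOCKLIST) {
    if (pattern.test(modelOutput)) {
      return { allowed: false, reason: `matched ${pattern}` };
    }
  }
  return { allowed: true };
}
```

Guards like this are exactly what encoding-based bypasses target: if the model can be convinced to base64-encode or spell out the secret character by character, neither the substring check nor the regexes ever match.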
What makes this different
- Real LLMs - you're attacking actual models from OpenAI, Anthropic, Google, Groq, and Cerebras (rotated for availability). Not pattern matching or simulated responses.
- Real RAG - the retrieval pipeline uses a real BM25 implementation with proper IDF/TF scoring, stopword removal, and top-K ranking. The 10 documents in the knowledge base contain realistic corporate data (compensation bands, AWS account IDs, vendor stacks, internal URLs). When the AI exfiltrates via send_email, the data looks genuinely sensitive.
- Real tools - tools execute in a sandboxed environment. The AI actually calls functions, and you see the tool calls and results in the trace.
- Context Trace - this is the core teaching tool. Every layer of the prompt is visible: what the system prompt says, which RAG document was retrieved and its BM25 score, what tools are available, and what the AI actually receives. Understanding the full context window is what makes the attacks click.
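For readers curious what "real BM25 with IDF/TF scoring" means in practice, here's a minimal Okapi BM25 ranker in TypeScript. This is a sketch of the standard formula, not PromptTrace's implementation (stopword list and defaults are illustrative):

```typescript
// Minimal Okapi BM25 ranker -- a sketch of the textbook formula,
// not the site's actual retrieval code.

const STOPWORDS = new Set(["the", "a", "an", "is", "of", "to", "and"]);
const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/\W+/).filter((t) => t.length > 0 && !STOPWORDS.has(t));

function bm25Rank(query: string, docs: string[], topK = 3, k1 = 1.5, b = 0.75) {
  const docTokens = docs.map(tokenize);
  const N = docs.length;
  const avgLen = docTokens.reduce((sum, d) => sum + d.length, 0) / N;

  const scores = docTokens.map((tokens, i) => {
    let score = 0;
    for (const term of new Set(tokenize(query))) {
      const df = docTokens.filter((d) => d.includes(term)).length; // doc frequency
      if (df === 0) continue;
      const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1);       // smoothed IDF
      const tf = tokens.filter((t) => t === term).length;          // term frequency
      score +=
        (idf * tf * (k1 + 1)) /
        (tf + k1 * (1 - b + (b * tokens.length) / avgLen));        // length norm
    }
    return { doc: i, score };
  });
  return scores.sort((x, y) => y.score - x.score).slice(0, topK);
}
```

Seeing the scoring laid out also explains the attack surface: a poisoned document stuffed with likely query terms can legitimately win the top-K ranking.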
Technical details for the curious
- Next.js app with server-side lab engine
- Pure TypeScript BM25 implementation (no external dependencies)
- AI SDK for multi-provider LLM routing
- Tool sandbox with email, file system, calendar operations
- Win conditions: text matching, regex, tool call detection (with argument validation), exfiltration markers, LLM classifiers
- NDJSON streaming for real-time trace updates during tool-calling labs
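As an illustration of "tool call detection (with argument validation)", a win condition for the exfiltration lab might look something like this. The types, marker string, and domain are all invented for the example:

```typescript
// Hypothetical win-condition check: did the model call send_email
// with attacker-controlled arguments? (All names/values made up.)

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

function checkExfilWin(calls: ToolCall[]): boolean {
  return calls.some(
    (c) =>
      c.name === "send_email" &&
      typeof c.args.to === "string" &&
      c.args.to.endsWith("@attacker.example") &&   // hypothetical domain
      typeof c.args.body === "string" &&
      /EXFIL-MARKER/.test(c.args.body),            // hypothetical marker
  );
}
```

Validating arguments, not just the tool name, matters: merely coaxing the model into calling send_email isn't a win unless the payload actually went where the attacker wanted.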
What I'm looking for
We're in beta - things are still evolving. I'd really appreciate honest feedback:
- What attack scenarios would you want to see?
- What learning topics are missing?
- How's the difficulty curve?
- Is the Context Trace actually helpful for understanding what's happening?
All feedback welcome - roast it, break it, tell me what sucks. That's the whole point of putting it out here early.
Try it: prompttrace.airedlab.com