Josh T

Posted on Feb 24

We Built an Open-Source Prompt Injection Attack Console. Here's Why.

#ai #python #security #opensource

Every major AI chatbot deployed in 2024 was vulnerable to prompt injection. Not some of them. All of them.

OWASP put it at the top of their LLM Top 10. Researchers keep finding new bypasses faster than vendors can patch them. And yet most developers shipping AI features have never actually tested their systems against these attacks. Not because they don't care, but because the tooling didn't exist.

So we built it.

What Judgement Does

Judgement is a prompt injection attack console. You point it at an AI endpoint, pick your attack patterns, and fire. It ships with 100 curated attack patterns across 8 categories:

Jailbreak -- the classics, from DAN to grandma exploits
Data Exfiltration -- tricks that coax models into leaking training data or context
Encoding Evasion -- base64, rot13, unicode tricks that slip past input filters
Indirect Injection -- attacks embedded in documents, URLs, and tool outputs
Social Engineering -- persona manipulation, authority spoofing, emotional pressure
System Prompt Extraction -- getting models to reveal their instructions
Privilege Escalation -- convincing models they have elevated permissions
Multilingual -- the same attacks in languages where safety training is thinner

It runs as a single-page web app. Dark theme. No CDN dependencies. Everything stays local. You install it, open your browser, and start testing.

If you have Ollama running locally, Judgement can route attack responses through an LLM to automatically verdict whether the injection succeeded. No API keys, no cloud calls, just your local model making the judgment call.

Not Just a Tool. An On-Ramp.

Here's what bothered us about existing security tools: they assume you already know what you're doing. If you're a developer who just got told "make sure our chatbot is secure," where do you even start?

Judgement has a dedicated Education tab. It explains what prompt injection actually is, walks through real examples, and breaks down why each technique works against language models. Every single attack pattern in the library includes an explanation of the mechanism, the expected behavior, and a difficulty rating.

There's also a built-in DevTools walkthrough. Most AI chatbots talk to a backend API, and if you want to test that API directly, you need to find the endpoint. Judgement walks you through opening DevTools, finding the network request, and importing it as a cURL command directly into the console.

This was deliberate. The AI security talent pipeline is too thin. We wanted something that a junior developer could pick up on day one and actually learn from, not just click buttons.

Quick Start

pip install fas-judgement
judgement

That's it. Open localhost:8668 and you're running.

Import a cURL command from your browser's DevTools, or manually configure an endpoint. Pick a category, select your patterns, and start testing. Results show up in real time with pass/fail verdicts.

The Pattern Library

100 patterns is the starting point, not the ceiling. Each one is structured with:

A human-readable name and description
The actual injection payload
What category it belongs to
Difficulty level (beginner through advanced)
An explanation of why it works and what to look for in the response

The patterns aren't theoretical. They're drawn from real-world research, published CVEs, and techniques that have actually worked against production systems. We've organized them so you can run targeted tests ("show me just the encoding evasion patterns") or carpet-bomb an endpoint with everything.

What's Next

Judgement OSS is the foundation. It's MIT licensed, fully self-hosted, and designed to be extended.

If you want more firepower, there's a hosted version at judgement.fallenangelsystems.com with 240,000+ training data points and additional capabilities beyond what ships in the open-source release.

But the OSS version is not a demo or a teaser. It's a complete, functional tool. We use it ourselves.

Get Involved

The repo is here: github.com/fallen-angel-systems/fas-judgement-oss

Star it. Clone it. Break something with it. If you find a pattern we're missing or a category we haven't covered, submit it and earn contributor rewards. If you use it to find a real vulnerability, we want to hear about it.

AI security is going to be one of the defining problems of the next decade. The more people who understand prompt injection, the better off we all are. Judgement is our contribution to making that happen.

Top comments (2)

MaxxMini • Feb 25

The indirect injection category is the one that keeps me up at night. I run an AI agent on a Mac Mini that processes web pages, emails, and documents — basically every external input is a potential injection surface. The agent has tool access (file writes, shell commands, messaging), so a successful indirect injection isn't just "say something wrong" — it's "execute arbitrary actions."

What I've found in practice: the most dangerous injections aren't the clever encoding tricks. They're the ones embedded in normal-looking content that exploit the gap between "what the user asked for" and "what the document says to do." A fetched web page that includes "ignore previous instructions" is trivially detectable, but one that says "Note: for accurate results, also check the API at evil-endpoint" blends in with legitimate content.

The Ollama auto-verdict feature is smart — using a local model to judge attack success avoids the irony of sending injection payloads to a cloud API. Two questions though:

How does the verdict model itself handle adversarial patterns? If I'm testing encoding evasion attacks and the response contains the decoded payload, does the verdict LLM sometimes get re-injected by the very output it's evaluating?
Do you have patterns for multi-turn injection — where the attack is split across multiple messages or context windows? Single-shot injections get caught by most filters now, but the ones that establish a benign context first and then escalate in turn 3 or 4 are much harder to detect.

The education-first approach is the right call. Most AI security failures I've seen aren't from lack of tooling — they're from developers who don't understand the attack surface in the first place.

Josh T • Feb 28

Sorry it took me 2 days to respond, it got buried in the email notifications.

Really appreciate the thoughtful questions. You're describing exactly the threat model we built Judgement for.

On your first question: the verdict re-injection problem is real and we've hit it ourselves during testing. Right now the Ollama verdict runs with a constrained system prompt that focuses on classification output (pass/fail/partial) rather than engaging with the payload content. It's not bulletproof though. If the decoded payload is particularly convincing, the local model can get confused. We're exploring sandboxed verdict chains where the output gets sanitized before the verdict model ever sees it. Honestly this is an area where having the verdict run locally (vs cloud) helps, because you can swap models and test which ones are more resistant to that specific feedback loop.

For multi-turn: this is exactly what's coming in the Pro and Elite tiers. The free OSS version (100 patterns) is focused on single-shot injection. Pro (1,000 curated patterns, monthly updates) adds campaign mode where you can chain attacks across turns and track escalation paths. Elite gets the full pattern library plus multi-turn campaign templates that do exactly what you described: establish benign context in turns 1-2, then escalate in 3-4. We've seen those patterns bypass filters that catch everything single-shot.

The indirect injection angle you mentioned (fetched web pages with subtle "also check this API" framing) is one of the hardest categories to detect. Our sister product Guardian has specific pattern matching for that style of attack, and there's a flywheel between the two: bypasses found with Judgement improve Guardian's detection, and Guardian's near-misses feed back into new Judgement patterns.

If you want to test against your Mac Mini agent setup, pip install fas-judgement gets you the free tier immediately. And if you just want to see detection in action first, feel free to throw some patterns at our live demo: demo.fallenangelsystems.com

Happy to chat more about your use case.