
LLM security isn’t just theory anymore — attackers are already testing the limits of AI systems.
Ahead of our April launch of TealTiger v1.1.0, we ran a pre‑release benchmark against AIGoat’s red team corpus — a set of adversarial prompts designed to break guardrails and expose vulnerabilities.
Spoiler: TealTiger caught 100% of them.
🔧 The Setup
- Version tested: TealTiger v1.1.0 (pre‑launch build).
- Corpus: 27 attack prompts from AIGoat (covering OWASP LLM Top 10 risks).
- Categories tested: prompt injection, sensitive info disclosure, output handling, excessive agency, system prompt leakage, resource abuse, compound attacks.
- Method: Deterministic tests with the TealTiger SDK — no “luck of the draw” randomness.
📊 The Results
| OWASP LLM Category | Attacks Tested | Caught | Catch Rate |
|---|---|---|---|
| Prompt Injection | 8 | 8 | 100% |
| Sensitive Info Disclosure | 4 | 4 | 100% |
| Improper Output Handling | 3 | 3 | 100% |
| Excessive Agency | 5 | 5 | 100% |
| System Prompt Leakage | 3 | 3 | 100% |
| Unbounded Consumption | 2 | 2 | 100% |
| Compound Attacks | 2 | 2 | 100% |
Total: 27/27 caught. Zero misses.
🛡️ Why It Worked
- Guardrails + Policies: Guardrails alone caught ~53% of attacks. Adding TealEngine policies boosted coverage to 100%.
- Output Handling: TealEngine blocked XSS, SQL injection, and OS command injection — areas where guardrails alone failed.
- Resource Abuse: Behavioral policies stopped token exhaustion and context flooding.
🚀 Why This Matters
- Developers: You can’t rely on guardrails alone. Defense in depth is key.
- Enterprises: Transparent, repeatable benchmarks prove TealTiger is ready for production environments.
- Community: Anyone can reproduce these results with the SDK — no black box magic.
🔮 What’s Next
We’re expanding benchmarks with:
- NVIDIA Garak (100+ probes)
- PromptInjectionBench
- RedBench (22 risk categories)
- Open-Prompt-Injection Benchmark
✅ Takeaway
This is a pre‑launch test of TealTiger v1.1.0, scheduled for release in April.
We threw 27 adversarial prompts at it. It blocked every single one.
LLM security doesn’t have to be guesswork — it can be tested, measured, and proven.
👉 Want to try it yourself? Stay tuned for the April release of TealTiger v1.1.0 and run the AIGoat corpus — see if you get the same results.
📌 Learn More
- 🌐 tealtiger.ai
- 📖 blogs.tealtiger.ai
- 📚 docs.tealtiger.ai
- 💻 GitHub: agentguard-ai/tealtiger
- ✉️ Email: reachout@tealtiger.ai
Tags: #AI #Security #LLM #Benchmarking #CloudSecurity
Top comments (0)