DEV Community

Cover image for TealTiger vs AIGoat: Pre‑Launch Benchmark Shows 100% Catch Rate
nagasatish chilakamarti
nagasatish chilakamarti

Posted on

TealTiger vs AIGoat: Pre‑Launch Benchmark Shows 100% Catch Rate


LLM security isn’t just theory anymore — attackers are already testing the limits of AI systems.

Ahead of our April launch of TealTiger v1.1.0, we ran a pre‑release benchmark against AIGoat’s red team corpus — a set of adversarial prompts designed to break guardrails and expose vulnerabilities.

Spoiler: TealTiger caught 100% of them.


🔧 The Setup

  • Version tested: TealTiger v1.1.0 (pre‑launch build).
  • Corpus: 27 attack prompts from AIGoat (covering OWASP LLM Top 10 risks).
  • Categories tested: prompt injection, sensitive info disclosure, output handling, excessive agency, system prompt leakage, resource abuse, compound attacks.
  • Method: Deterministic tests with the TealTiger SDK — no “luck of the draw” randomness.

📊 The Results

OWASP LLM Category Attacks Tested Caught Catch Rate
Prompt Injection 8 8 100%
Sensitive Info Disclosure 4 4 100%
Improper Output Handling 3 3 100%
Excessive Agency 5 5 100%
System Prompt Leakage 3 3 100%
Unbounded Consumption 2 2 100%
Compound Attacks 2 2 100%

Total: 27/27 caught. Zero misses.


🛡️ Why It Worked

  • Guardrails + Policies: Guardrails alone caught ~53% of attacks. Adding TealEngine policies boosted coverage to 100%.
  • Output Handling: TealEngine blocked XSS, SQL injection, and OS command injection — areas where guardrails alone failed.
  • Resource Abuse: Behavioral policies stopped token exhaustion and context flooding.

🚀 Why This Matters

  • Developers: You can’t rely on guardrails alone. Defense in depth is key.
  • Enterprises: Transparent, repeatable benchmarks prove TealTiger is ready for production environments.
  • Community: Anyone can reproduce these results with the SDK — no black box magic.

🔮 What’s Next

We’re expanding benchmarks with:

  • NVIDIA Garak (100+ probes)
  • PromptInjectionBench
  • RedBench (22 risk categories)
  • Open-Prompt-Injection Benchmark

✅ Takeaway

This is a pre‑launch test of TealTiger v1.1.0, scheduled for release in April.

We threw 27 adversarial prompts at it. It blocked every single one.

LLM security doesn’t have to be guesswork — it can be tested, measured, and proven.


👉 Want to try it yourself? Stay tuned for the April release of TealTiger v1.1.0 and run the AIGoat corpus — see if you get the same results.


📌 Learn More


Tags: #AI #Security #LLM #Benchmarking #CloudSecurity

Top comments (0)