Static jailbreak lists are dead.
Every time a model provider patches their safety filters, your entire payload library becomes obsolete. Manual red teaming doesn't scale. And most AI security tools are just payload databases with a UI.
So I built something different.
The Problem
I tested 6 major LLM deployments last year. Every single one had a bypass within 5 prompts. The problem isn't that LLMs are insecure — it's how the industry tests them.
Most red teaming today looks like this:
- Copy a jailbreak from a GitHub list
- Paste it into the target
- If it works, report it
- If it doesn't, try the next one
That's not security testing. That's pattern matching. And it stops working the moment the model gets patched.
The Idea
What if adversarial prompts could evolve?
Not manually crafted. Not randomly generated. Actually evolved — like organisms under selection pressure.
The strong prompts survive. The weak ones die. The survivors mutate and reproduce. Each generation gets better at bypassing the target's specific defenses.
That's the core idea behind Basilisk.
How It Works
Basilisk introduces Smart Prompt Evolution for Natural Language (SPE-NL) — a genetic algorithm that treats adversarial prompts as organisms in a population.
Selection: Each prompt is scored by a multi-signal fitness function — did it bypass the guardrail? Did it extract sensitive data? Did it make the model contradict its system prompt?
Mutation: 10 mutation operators transform surviving prompts — semantic rewriting, context injection, authority spoofing, encoding shifts, persona layering, and more.
Crossover: 5 crossover strategies combine the strongest parts of two successful prompts into offspring that inherit the best traits of both parents.
Evolution: By generation 5, the evolved prompts achieved a 92% improvement in attack success rate over the original static payload library.
The prompts literally get smarter at breaking the specific target they're aimed at.
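The loop above can be sketched in a few lines of Python. Everything here is illustrative — the fitness signals, operator names, and population sizes are stand-ins of my own, not Basilisk's actual internals:

```python
import random

def fitness(prompt, target):
    """Score a prompt by hypothetical multi-signal checks (stand-in names)."""
    signals = [
        target.bypassed_guardrail(prompt),      # did it slip past the filter?
        target.leaked_sensitive_data(prompt),   # did it extract data?
        target.contradicted_system(prompt),     # did it break the system prompt?
    ]
    return sum(signals) / len(signals)

def evolve(population, target, mutate, crossover, generations=5, keep=0.5):
    for _ in range(generations):
        # Selection: rank by fitness, keep the strongest fraction
        ranked = sorted(population, key=lambda p: fitness(p, target), reverse=True)
        survivors = ranked[: max(2, int(len(ranked) * keep))]
        # Crossover + mutation: refill the population from survivor pairs
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = random.sample(survivors, 2)
            children.append(mutate(crossover(a, b)))
        population = survivors + children
    return population
```

In the real framework the `mutate` and `crossover` slots would be drawn from the 10 mutation operators and 5 crossover strategies described above; here they are just function parameters so the selection loop itself stays visible.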
What It Covers
Basilisk maps to the OWASP LLM Top 10 with 29 attack modules across 8 categories:
- Prompt injection (direct and indirect)
- System prompt extraction
- Data exfiltration
- Tool/function abuse
- Guardrail bypass
- Denial of service
- Multi-turn manipulation
- RAG poisoning
It supports 100+ LLM providers through LiteLLM — OpenAI, Anthropic, Google, Mistral, Cohere, local models via Ollama, and any custom endpoint.
Differential Testing
One of my favorite features — point Basilisk at multiple models simultaneously and watch how they diverge.
The same evolved prompt might bypass Claude but fail on GPT. Or break Gemini's guardrails while Llama holds firm. This behavioral divergence analysis reveals which models share defense architectures and which have unique weak points.
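Conceptually, the divergence analysis reduces to running the same evolved prompt against every target and diffing the outcomes. A minimal sketch — the function and result-key names are mine, not Basilisk's API:

```python
def differential_scan(prompt, targets):
    """Run one prompt against many models; report which defenses diverge.

    `targets` maps a model name to a callable that returns True if the
    prompt bypassed that model's guardrails (a stand-in for a real probe).
    """
    results = {name: check(prompt) for name, check in targets.items()}
    bypassed = {name for name, ok in results.items() if ok}
    held = set(results) - bypassed
    return {
        "bypassed": bypassed,
        "held_firm": held,
        # Full agreement suggests shared defense behavior; a split is a
        # divergence worth investigating.
        "divergent": bool(bypassed) and bool(held),
    }
```

A split result is exactly the signal described above: models that always land on the same side likely share defense architecture, while the ones that peel off have unique weak points.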
Non-Destructive Posture Assessment
Not every engagement needs active exploitation. Basilisk includes a guardrail grading mode that scores your LLM's defenses from A+ to F without actually breaking anything. Safe enough for production environments.
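The grading idea is simple to sketch: probe the target with suspicious-but-harmless prompts, count how many it correctly refuses, and map the pass rate onto a letter scale. A toy version — the thresholds and method names are my own assumptions, not Basilisk's:

```python
def grade(pass_rate):
    """Map a guardrail pass rate (0.0 to 1.0) to a letter grade."""
    for threshold, letter in [(0.97, "A+"), (0.9, "A"), (0.8, "B"),
                              (0.7, "C"), (0.6, "D")]:
        if pass_rate >= threshold:
            return letter
    return "F"

def assess(target, probes):
    """Non-destructive: only count refusals, never escalate a working bypass."""
    passed = sum(1 for p in probes if target.refuses(p))
    return grade(passed / len(probes))
```

The non-destructive part is the key design choice: the assessment only observes whether each probe is refused and never follows up on a successful bypass, which is what makes it tolerable to run against production.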
The Results
On the benchmark targets I tested:
- 92% improvement in attack success rate by generation 5
- Evolved prompts discovered bypass patterns that didn't exist in any public jailbreak database
- Cross-provider differential testing revealed behavioral divergence invisible to single-target scanning
The genetic approach doesn't just find known bypasses faster — it discovers novel ones that static testing would never reach.
Try It
It's fully open source. One command to install:
pip install basilisk-ai
Point it at any LLM endpoint:
basilisk scan --target https://your-llm-endpoint.com --mode standard
The full research paper is published with a permanent DOI:
Basilisk: An Evolutionary AI Red-Teaming Framework for Systematic Security Evaluation of Large Language Models
GitHub: github.com/regaan/basilisk
What's Next
I'm currently researching a new attack class I call Prompt Cultivation — a technique that uses no injection or commands at all. Instead, it exploits the model's own curiosity and reasoning, drawing it into a frame where safety guidelines become irrelevant. No override. No jailbreak. The model follows the idea, not the instruction.
Paper coming soon.
If you're building with LLMs, test them before attackers do. Your AI is already vulnerable. You just don't know it yet.
Star the repo if this was useful: github.com/regaan/basilisk