Nathan Sportsman

Posted on Feb 9

Augustus: Open Source LLM Prompt Injection Scanner

#ai #llm #cybersecurity #programming

The Problem

You deployed an LLM behind an API gateway. Maybe it's customer-facing. Maybe it's connected to internal tools. Did you test it against adversarial attacks before it went live?

If the answer is "the model has safety training," that's not the same thing. Safety training and security testing are fundamentally different disciplines. And the numbers back that up:

FlipAttack achieves 98% bypass rates against GPT-4o by reordering characters in prompts
DeepSeek R1 showed a 100% bypass rate against 50 HarmBench jailbreak prompts (Cisco/UPenn research)
A study of 36 production LLM apps found 86% were vulnerable to prompt injection
PoisonedRAG showed that just 5 malicious docs in a corpus of millions can manipulate outputs 90% of the time

OWASP ranked prompt injection as the #1 security risk in LLM applications. Yet most LLMs ship to production with zero adversarial testing.

We built Augustus to fix that.

What is Augustus?

Augustus is an open-source LLM vulnerability scanner. It tests models against 210+ adversarial attacks across prompt injection, jailbreaks, encoding exploits, data extraction, and more. It ships as a single Go binary, connects to 28 LLM providers out of the box, and produces actionable vulnerability reports.

# Install
go install github.com/praetorian-inc/augustus/cmd/augustus@latest

# Test for DAN jailbreak against OpenAI
export OPENAI_API_KEY="your-api-key"
augustus scan openai.OpenAI \
  --probe dan.Dan \
  --detector dan.DanDetector \
  --verbose

GitHub: github.com/praetorian-inc/augustus (Apache 2.0)

Why Not garak or promptfoo?

Fair question. garak (NVIDIA) and promptfoo are great tools that serve the research and red-teaming community well. We needed something different — a tool that fits into penetration testing workflows without requiring Python environments, npm installs, or runtime dependencies.

	Augustus	garak
Language	Go	Python
Distribution	Single binary, no deps	pip install + dependencies
Concurrency	Goroutine pools (cross-probe)	Multiprocessing (within-probe)
Probes	210+	160+ (longer research pedigree)
Providers	28	35+ generator variants / 22 modules

Augustus is a Go-native reimplementation inspired by garak. Same concept, different trade-offs. If you're in a research environment with Python everywhere, garak is excellent. If you're a pentester who wants to go install a binary and start scanning, Augustus is for you.

What It Tests

Augustus covers 47 attack categories. Here's what you're actually testing:

🔓 Jailbreaks

DAN ("Do Anything Now") prompts, AIM, AntiGPT, Grandma exploits (emotional manipulation), ArtPrompts (reframing as creative writing). Augustus includes DAN variants through v11.0 plus Goodside-style injection techniques.

💉 Prompt Injection

Encoding attacks across Base64, ROT13, Morse code, hex, Braille, Klingon, leet speak, and 12 more schemes. Tag smuggling (XML/HTML). FlipAttack (16 variants). Prefix and suffix injection.

🧪 Adversarial Examples (Research-Grade)

GCG (Greedy Coordinate Gradient), AutoDAN, MindMap, DRA (Dynamic Reasoning Attack), TreeSearch. Plus iterative attacks like PAIR and TAP that refine across multiple rounds using a judge model — these are computationally expensive but represent the state of the art.

🔑 Data Extraction

API key leakage probes. Package hallucination probes (Python, JS, Ruby, Rust, Dart, Perl, Raku) — checking if the model recommends packages that don't exist (a real supply chain attack vector). PII extraction. Training data regurgitation.

📄 Context Manipulation

RAG poisoning (document content and metadata injection). Context overflow. Continuation and divergence exploits. Multimodal probes for vision-language models.

🖥️ Format Exploits

Markdown injection (malicious links in rendered output). YAML/JSON parsing attacks on downstream consumers. ANSI escape injection. XSS payloads in model-generated HTML.

🕵️ Evasion Techniques

ObscurePrompt (LLM-rewritten jailbreaks). Phrasing variations. Homoglyphs, zero-width characters, bidirectional text markers (BadChars). Glitch token exploitation.

📊 Safety Benchmarks

DoNotAnswer (941 questions, 5 risk areas). RealToxicityPrompts. Snowball (plausible-sounding wrong answers). LMRC harmful content probes.

🤖 Agent Attacks

Multi-agent manipulation. Browsing exploits for web-enabled models. Latent injection in documents (targeting RAG pipelines).

🛡️ Security Testing

Guardrail bypass (20 variants for NeMo Guardrails and similar). SQL injection through model output. Steganography (hidden instructions in images via LSB encoding). Malware generation detection.

How the Pipeline Works

Augustus uses a straightforward pipeline:

Probe → (Optional) Buff Transform → Generator (LLM Call) → Detector → Result

Probes define the adversarial inputs. A DAN probe sends a role-playing prompt. An encoding probe wraps instructions in Base64. A FlipAttack probe reverses character order.

Buffs are optional transformations applied before sending. Wrap any probe in poetry (haiku, sonnet, limerick), translate to a low-resource language, paraphrase, or encode. Chain multiple transformations for layered evasion.

Generators connect to the target. 28 providers supported, plus a REST connector for custom endpoints.

Detectors analyze responses. Pattern matching, LLM-as-a-judge, HarmJudge (arXiv:2511.15304), Perspective API.

For iterative attacks (PAIR, TAP), a dedicated Attack Engine handles multi-turn conversations, candidate pruning, and judge-based scoring.

Buff Transformations: How Real Attackers Operate

Real adversaries don't send attacks in plain text. Augustus ships 7 transformations across 5 categories:

Encoding — Base64 and character code wrapping. Models often decode and follow instructions that would be blocked in plain text.

Paraphrase — Pegasus model rephrasing. Same adversarial intent, different surface form. Tests if safety training generalizes beyond memorized patterns.

Poetry — Haiku, sonnets, limericks, free verse, rhyming couplets. Models that block direct harmful requests sometimes comply when it arrives as verse. (Yes, really.)

Low-Resource Language Translation — Via DeepL. Safety training is concentrated on English. Requests blocked in English may succeed in Zulu, Hmong, or Scots Gaelic.

Case Transforms — Lowercasing. Some filters and blocklists are case-sensitive.

Chain them with --buff or --buffs-glob:

# Encode a DAN probe in Base64
augustus scan openai.OpenAI --probe dan.Dan --buff encoding.Base64

# Chain: paraphrase, then translate to low-resource language
augustus scan openai.OpenAI --probe dan.Dan --buffs-glob "paraphrase.*,lrl.*"

28 Providers, One Interface

OpenAI (including o1/o3), Anthropic (Claude 3/3.5/4), Azure OpenAI, AWS Bedrock, Google Vertex AI, Cohere, Replicate, HuggingFace, Together AI, Groq, Mistral, Fireworks, DeepInfra, NVIDIA NIM, Ollama, LiteLLM, and more.

The REST generator handles everything else:

augustus scan rest.Rest \
  --probe dan.Dan \
  --config '{
    "uri": "https://your-api.example.com/v1/chat/completions",
    "headers": {"Authorization": "Bearer YOUR_KEY"},
    "req_template_json_object": {
      "model": "your-model",
      "messages": [{"role": "user", "content": "$INPUT"}]
    },
    "response_json": true,
    "response_json_field": "$.choices[0].message.content"
  }'

Custom request templates with $INPUT placeholders, JSONPath extraction, SSE streaming, and proxy routing. If your endpoint speaks HTTP, Augustus can test it.

Quick Start

# Install
go install github.com/praetorian-inc/augustus/cmd/augustus@latest

# Run all 210+ probes against a local model
augustus scan ollama.OllamaChat \
  --all \
  --config '{"model":"llama3.2:3b"}'

Output:

PROBE	DETECTOR	PASSED	SCORE	STATUS
dan.Dan	dan.DAN	false	.85	VULN
encoding.base64	encoding	true	.10	SAFE
smuggling.Tag	smuggling	true	.05	SAFE

Export to JSON, JSONL, or HTML reports for stakeholders.

Feature Summary

Feature	Details
Vulnerability Probes	210+ across 47 attack categories
LLM Providers	28 with 43 generator variants
Detectors	90+ (pattern matching, LLM-as-judge, HarmJudge, Perspective API)
Buff Transformations	7 transforms (encoding, paraphrase, poetry, translation, case)
Output Formats	Table, JSON, JSONL, HTML
Production Features	Concurrent scanning, rate limiting, retry logic, timeouts
Distribution	Single Go binary, no runtime dependencies
Extensibility	Plugin-style registration via Go `init()` functions

What's Next

Augustus is the second release in our "The 12 Caesars" open-source campaign — one tool per week for 12 weeks. Last month we released Julius for LLM fingerprinting (identifying what model is running on an endpoint). Each tool follows the Unix philosophy: do one thing well, compose with the others.

Get Involved

Repo: github.com/praetorian-inc/augustus — Apache 2.0

We'd love contributions: new probes, bug reports, feature requests. Check CONTRIBUTING.md for guidance on probe definitions and dev workflow.

Star the repo if it's useful, and let us know what attack techniques you'd like to see next. 🚀

DEV Community