ToxSec

Posted on May 22 • Originally published at toxsec.com

How to Run STRIDE-AI on Your AI Stack in One Pass

#security #ai #llm #threatmodeling

STRIDE-GPT takes your architecture description and spits out a full STRIDE threat model in one shot. But the tool only works if you know which assets to point it at. AI applications carry assets traditional threat modeling never covered: system prompts, RAG documents, tool descriptions, embedding stores, agent reasoning chains. Point STRIDE-GPT at the wrong diagram and you get a traditional app threat model with an LLM bolted on. Here's how to run it right.

What Changes When You Add an LLM

Traditional STRIDE assumes deterministic execution. Same input, same output. Clear trust boundaries between user, app, and data store. An LLM context window breaks all of that simultaneously. Developer instructions and attacker payloads both arrive as tokens through the same attention pipeline. There's no ring separation, no kernel mode, no privilege boundary the model actually enforces.

Your threat model needs to treat these as first-class assets:

System prompt (it will leak, design like it already has)
RAG retrieval corpus and every document inside it
Tool descriptions in any connected MCP server
Vector embeddings (treat them as plaintext, they can be inverted)
Agent reasoning chains and the full tool call sequence

Every place untrusted text can reach the context window is a trust boundary. Mark all of them before you run STRIDE.

Setting Up STRIDE-GPT

STRIDE-GPT is open-source and generates a STRIDE pass against your written architecture description with explicit OWASP LLM Top 10 support.

git clone https://github.com/mrwadams/stride-gpt.git
cd stride-gpt
pip install -r requirements.txt
streamlit run main.py

Write your architecture description before you open the tool. Include:

Every component: user, API gateway, orchestrator, model provider, tool set, data stores
Every data flow: where user input enters, how it reaches the model, what the model can write to
Every trust boundary: anywhere you'd draw a line between trusted and not trusted
Every tool the agent can invoke, including MCP servers and their descriptions
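One way to keep the checklist above honest is to fill it in as structured data first and only render the prose once nothing is empty. This is a minimal sketch, not part of STRIDE-GPT: the `Architecture` class and its field names are hypothetical, and the rendered text is just something to paste into the tool's description box.

```python
from dataclasses import dataclass, field


@dataclass
class Architecture:
    """Hypothetical container for the four checklist sections."""
    components: list[str] = field(default_factory=list)
    data_flows: list[str] = field(default_factory=list)
    trust_boundaries: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of checklist sections that are still empty."""
        return [name for name, items in vars(self).items() if not items]

    def render(self) -> str:
        """Render a plain-text description, refusing if a section is empty."""
        missing = self.missing_sections()
        if missing:
            raise ValueError(f"empty sections: {', '.join(missing)}")
        lines = []
        for name, items in vars(self).items():
            lines.append(name.replace("_", " ").title() + ":")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


arch = Architecture(
    components=["user", "API gateway", "FastAPI orchestrator", "GPT-4o backend"],
    data_flows=["user input -> gateway -> orchestrator -> model"],
    trust_boundaries=["retrieved RAG chunks entering the context window"],
    tools=["MCP file write endpoint"],
)
description = arch.render()
```

Raising on an empty section is a deliberate choice here: a description with no trust boundaries listed is the case most likely to produce a generic threat model.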

"An AI chatbot with RAG" gets you generic output. "A FastAPI app with a Pinecone RAG corpus, three MCP tools including a file write endpoint, and a GPT-4o backend behind an API gateway" gets you a threat model you can actually act on.

Covering Repudiation: Log the Full Context

Most agent frameworks log the final answer. That's not enough for any post-incident reconstruction worth running. For every agent decision you need a structured trace with six fields minimum:

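import hashlib
import json
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Built-in hash() is salted per process for strings, so use SHA-256 for a stable digest.
with tracer.start_as_current_span("agent_decision") as span:
    span.set_attribute("system_prompt_hash", hashlib.sha256(system_prompt.encode()).hexdigest())
    span.set_attribute("retrieved_context_ids", json.dumps(chunk_ids))
    span.set_attribute("tool_calls", json.dumps(tool_calls))
    span.set_attribute("model_output", response)
    span.set_attribute("session_id", session_id)
    span.set_attribute("user_id", user_id)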

Langfuse and Phoenix both wrap OpenTelemetry for LLM-native tracing. Sign or hash entries that touch privileged operations. Without the full context window logged, an attacker who poisons your agent's memory leaves no trace of when the state changed. The tampered state just sits there across sessions looking normal.
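For the "sign or hash entries" part, one common approach is an HMAC chain: each record's signature covers the previous signature, so editing or dropping an earlier entry breaks every signature after it. The sketch below uses only the standard library. The function names and the record layout are assumptions for illustration, not anything Langfuse or Phoenix provide, and in practice the key would live somewhere the agent process cannot read.

```python
import hashlib
import hmac
import json


def sign_entry(key: bytes, prev_sig: str, entry: dict) -> str:
    """HMAC-SHA256 over the previous signature plus this entry's canonical JSON."""
    payload = prev_sig.encode() + json.dumps(entry, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(log: list[dict], key: bytes, entry: dict) -> None:
    """Append an entry whose signature chains to the last record in the log."""
    prev_sig = log[-1]["sig"] if log else ""
    log.append({"entry": entry, "sig": sign_entry(key, prev_sig, entry)})


def verify_log(log: list[dict], key: bytes) -> bool:
    """Recompute the chain from the start; any edited record fails the check."""
    prev_sig = ""
    for record in log:
        expected = sign_entry(key, prev_sig, record["entry"])
        if not hmac.compare_digest(expected, record["sig"]):
            return False
        prev_sig = record["sig"]
    return True


key = b"hypothetical-signing-key"
audit_log: list[dict] = []
append_entry(audit_log, key, {"tool": "read_file", "session_id": "s1"})
append_entry(audit_log, key, {"tool": "write_memory", "session_id": "s1"})
```

Calling `verify_log(audit_log, key)` during incident response shows whether the log is intact; the first record that fails marks where the chain was broken.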

Covering Denial of Wallet: Three Layers

Request-based rate limits don't protect against token drain attacks. One multi-step agentic query can cost 500x more than a cached response and still register as one request against your rate limiter. The limiter never fires.

Layer 1: AWS Budgets with budget actions. When spend hits the daily ceiling, the action automatically applies an IAM policy that denies Bedrock invoke permissions. Hard kill, not an alert.

{
  "BudgetName": "bedrock-daily-cap",
  "BudgetLimit": { "Amount": "50", "Unit": "USD" },
  "BudgetType": "COST",
  "BudgetActions": [{
    "ActionType": "APPLY_IAM_POLICY",
    "ActionThreshold": {
      "ActionThresholdValue": 100,
      "ActionThresholdType": "PERCENTAGE"
    }
  }]
}
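The JSON above is abbreviated. As far as I recall the AWS Budgets API, the action is created separately from the budget and also needs the deny policy, the roles it attaches to, an execution role, and an approval model. The helper below only builds the request dictionary, so it runs without AWS credentials. Every ARN, role name, and address is a hypothetical placeholder, and the field names should be checked against the current CreateBudgetAction reference before use.

```python
def budget_action_request(account_id: str, budget_name: str,
                          deny_policy_arn: str, target_roles: list[str],
                          execution_role_arn: str, alert_email: str) -> dict:
    """Build keyword arguments for the Budgets create_budget_action call."""
    return {
        "AccountId": account_id,
        "BudgetName": budget_name,
        "NotificationType": "ACTUAL",          # act on real spend, not forecasts
        "ActionType": "APPLY_IAM_POLICY",
        "ActionThreshold": {
            "ActionThresholdValue": 100.0,     # 100% of the budget limit
            "ActionThresholdType": "PERCENTAGE",
        },
        "Definition": {
            "IamActionDefinition": {
                "PolicyArn": deny_policy_arn,  # policy that denies Bedrock invoke
                "Roles": target_roles,         # roles the policy is attached to
            }
        },
        "ExecutionRoleArn": execution_role_arn,
        "ApprovalModel": "AUTOMATIC",          # no human click needed to fire
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": alert_email}],
    }


request = budget_action_request(
    account_id="123456789012",
    budget_name="bedrock-daily-cap",
    deny_policy_arn="arn:aws:iam::123456789012:policy/DenyBedrockInvoke",
    target_roles=["agent-runtime-role"],
    execution_role_arn="arn:aws:iam::123456789012:role/BudgetActionExecutor",
    alert_email="oncall@example.com",
)
```

With credentials configured, the dictionary would be passed as `boto3.client("budgets").create_budget_action(**request)`.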

Layer 2: AI gateway enforcing per-key token-based rate limits in front of the model provider. Cloudflare AI Gateway, Portkey, and Helicone all support token counting. Count tokens, not requests.
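To make "count tokens, not requests" concrete, here is a rough sketch of a per-key token bucket that is charged in LLM tokens. It is not how any of the gateways named above implement their limits; the class name, the capacity, and the refill rate are all illustrative assumptions.

```python
import time


class TokenBudgetLimiter:
    """Per-key bucket that is charged in LLM tokens rather than requests."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        # api_key -> (tokens currently available, time of last update)
        self._buckets: dict[str, tuple[float, float]] = {}

    def try_spend(self, api_key: str, tokens: int) -> bool:
        """Charge `tokens` against the key's bucket, or refuse if it cannot cover them."""
        now = self.clock()
        available, last = self._buckets.get(api_key, (float(self.capacity), now))
        # Refill for the elapsed time, never beyond capacity.
        available = min(self.capacity, available + (now - last) * self.refill_per_second)
        if tokens > available:
            self._buckets[api_key] = (available, now)
            return False
        self._buckets[api_key] = (available - tokens, now)
        return True


limiter = TokenBudgetLimiter(capacity=1000, refill_per_second=10)
first_call_allowed = limiter.try_spend("key-a", 900)    # one expensive agentic query
second_call_allowed = limiter.try_spend("key-a", 900)   # refused: bucket nearly empty
```

A request-count limiter would have let both calls through; charging by tokens is what makes the second one fail.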

Layer 3: Vendor-side caps at the model provider. OpenAI usage tiers, Anthropic spend limits, Google Cloud quotas. All three layers independently. Any single layer alone is a single point of failure.

Covering Elevation of Privilege: Scope With OPA

The model holds your tools' permissions. Prompt injection inherits them all. The only real fix is scope enforcement outside the model entirely.

Open Policy Agent at tool dispatch checks every invocation against an allowlist tied to the current session's user role:

package tool_dispatch

import rego.v1

# Deny unless the rule below explicitly allows the call.
default allow := false

permitted_tools := ["search", "read_file", "summarize"]

allow if {
  input.tool_name in permitted_tools
  input.session.user_role == "standard"
}

Destructive operations (deletes, writes, payments, external sends) get a requires_human_approval flag enforced at the dispatch layer before the call fires. The model never sees the approval token, so prompt injection can't bypass the gate by telling the model to approve itself.
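The approval gate can be sketched in a few lines of Python at the dispatch layer. This is an assumption-heavy illustration: `delete_file`, the `approvals` set, and the three return values are hypothetical. The point is that approvals are recorded by a human-facing service keyed on a call ID, so nothing the model emits can add to them.

```python
# Tools a "standard" session may invoke; "delete_file" is a hypothetical addition.
PERMITTED_TOOLS = {"search", "read_file", "summarize", "delete_file"}
# Subset that must be approved by a human before the call fires.
REQUIRES_HUMAN_APPROVAL = {"delete_file"}


def dispatch_decision(call_id: str, tool_name: str, user_role: str,
                      approvals: set[str]) -> str:
    """Return "deny", "needs_approval", or "allow" for one tool invocation.

    `approvals` holds call IDs approved out of band; it is never built
    from model output.
    """
    if user_role != "standard" or tool_name not in PERMITTED_TOOLS:
        return "deny"
    if tool_name in REQUIRES_HUMAN_APPROVAL and call_id not in approvals:
        return "needs_approval"
    return "allow"


approvals: set[str] = set()
before = dispatch_decision("call-42", "delete_file", "standard", approvals)
approvals.add("call-42")  # a reviewer approved this specific call
after = dispatch_decision("call-42", "delete_file", "standard", approvals)
```

Keying approval on the call ID rather than the tool name means one approval cannot be replayed for a different destructive call.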

Three Gotchas That Bite People

System prompt exposure. Anything you'd panic about on Pastebin doesn't belong in the prompt template. Pull credentials, internal URLs, and business logic from a real authorization layer at runtime. The prompt will be extracted eventually.
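A cheap pre-commit check can catch the most obvious offenders before a prompt template ships. The patterns below are illustrative assumptions, not a complete ruleset; a maintained secret scanner would cover far more cases.

```python
import re

# Illustrative patterns only; extend or replace with a maintained ruleset.
SUSPICIOUS_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[a-z0-9._\-]{20,}"),
    "private_ip_url": re.compile(
        r"https?://(?:10|192\.168|172\.(?:1[6-9]|2\d|3[01]))\.[\d.]+"
    ),
}


def scan_prompt_template(template: str) -> list[str]:
    """Return the names of every pattern that matches the template text."""
    return [name for name, pattern in SUSPICIOUS_PATTERNS.items()
            if pattern.search(template)]


findings = scan_prompt_template(
    "You are a support bot. Use key AKIAABCDEFGHIJKLMNOP "
    "and call http://10.0.0.5/admin for refunds."
)
```

Anything this flags belongs in a runtime authorization layer, not in the template.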

Embedding inversion. Vector databases store documents as numerical embeddings. Research has shown embeddings can be inverted back into the original text. If your vector store is reachable from any process holding an API key, you have an information disclosure problem regardless of how the documents are stored.

Threat model drift. Every MCP server you bolt on grants capabilities the original model never covered. Re-run STRIDE every time a new tool, RAG corpus, or data source gets connected. Twenty minutes of walkthrough beats a postmortem.
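Drift is easier to catch if the attack surface has a fingerprint that CI can compare against the one recorded at the last STRIDE pass. A minimal sketch, assuming tools are described as dictionaries with a `name` key and data sources as plain strings (both hypothetical shapes):

```python
import hashlib
import json


def surface_fingerprint(tools: list[dict], data_sources: list[str]) -> str:
    """SHA-256 over a canonical, order-independent view of tools and data sources."""
    canonical = json.dumps(
        {
            "tools": sorted(tools, key=lambda tool: tool["name"]),
            "data_sources": sorted(data_sources),
        },
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


def threat_model_is_stale(modeled_fingerprint: str, tools: list[dict],
                          data_sources: list[str]) -> bool:
    """True when the live surface no longer matches what was threat modeled."""
    return surface_fingerprint(tools, data_sources) != modeled_fingerprint


modeled_tools = [
    {"name": "search", "description": "Search the docs corpus"},
    {"name": "read_file", "description": "Read a file from the workspace"},
]
modeled = surface_fingerprint(modeled_tools, ["pinecone:docs"])

# Someone bolts on a new MCP tool after the threat model was written.
live_tools = modeled_tools + [{"name": "write_file", "description": "Write a file"}]
stale = threat_model_is_stale(modeled, live_tools, ["pinecone:docs"])
```

Because tool descriptions are part of the hash, a reworded description on an existing tool also flags the model as stale.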

What You Can Ship Today

Run STRIDE-GPT against a written architecture description with all five AI-specific assets called out explicitly. Set one hard spending cap that kills the key. Add the six-field structured trace to your agent's decision loop. Those three changes close the highest-exposure gaps across Repudiation, Denial of Service, and Elevation of Privilege before anything else gets shipped.

I wrote the full STRIDE-AI breakdown including the seven production red flags, the copy-paste threat model prompt, and the complete three-layer denial-of-wallet circuit breaker over on the ToxSec Substack.

ToxSec covers AI security vulnerabilities, attack chains, and the offensive tools defenders actually need to understand. Run by an AI Security Engineer with hands-on experience at the NSA, Amazon, and across the defense contracting sector. CISSP certified, M.S. in Cybersecurity Engineering.

Top comments (2)

Harjot Singh • May 31

Applying STRIDE to an AI stack is a smart way to make AI security concrete instead of hand-wavy - the classic categories still map, they just get new attack surfaces: Spoofing becomes "which agent/identity is actually making this call," Tampering becomes prompt/context injection, Information Disclosure becomes training-data or context leakage, Elevation of Privilege becomes an agent with one over-broad API key, and DoS becomes token-exhaustion / cost-bombing. Forcing an AI system through a structured threat model is exactly what teams skip when they ship the demo, so a one-pass method that makes it tractable is genuinely useful.

The category I'd flag as the sleeper is Repudiation, because it maps onto the attribution problem in multi-agent systems - when an action goes wrong, can you prove which agent/identity did it? Without that you can't contain or even diagnose. That structural-threat-modeling instinct is core to how I build Moonshift, the thing I work on - a multi-agent pipeline that takes a prompt to a deployed SaaS, where agents get scoped capabilities and a verify layer gates actions, so the STRIDE surface is constrained by design. Multi-model routing keeps a build ~$3 flat, first run free no card. Solid methodology. In your one-pass run, which STRIDE category surfaced the most findings on a real AI stack - injection (Tampering), or the identity/privilege ones? Curious where the actual risk concentrates in practice.

ToxSec • Jun 26

appreciate the breakdown, and yeah, repudiation is the sleeper for exactly the reason you said. attribution in multi-agent systems is brutal. when agent C does something dumb, proving it was C and not B handing C a poisoned context is the whole forensics problem. that's the part nobody logs until they need it and it isn't there.
on your question, tampering lights up the most in practice, injection just has the widest surface. but the identity/privilege findings are the scarier ones because they're quieter. an over-scoped key doesn't throw errors, it just works, right up until it works for the wrong caller.