Most AI systems today rely on:
- prompt engineering
- guardrails at the model level
- post-hoc logging
That works… until it doesn’t.
Once you introduce:
- tools (APIs, DBs)
- RAG pipelines
- multi-step agents
things start breaking in ways that are hard to predict.
So I built something different.
🎥 Demo — Attack → Detection → Decision → Trace
👉 https://www.youtube.com/watch?v=OucfJ6_wcTM&t
This shows a full flow:
- Attack execution
- Prompt inspection
- Policy enforcement
- Decision (block / allow)
- Full trace in UI
The Problem
While testing LLM-based systems in production-like setups, I kept running into:
- prompt injection bypasses
- unintended tool execution
- data leakage through chaining
- lack of visibility into decisions
The biggest issue wasn't detection.
It was the lack of enforcement.
What I Built
A runtime security system for AI agents.
Not just “guardrails”—but actual enforcement during execution.
Core ideas:
- Treat the LLM as untrusted
- Validate every step
- Control tool access explicitly
- Track everything in real time
Think of it like:
Zero-trust architecture… but for AI systems
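The zero-trust idea can be sketched in a few lines: the LLM proposes a tool call, and nothing runs unless it passes an explicit allowlist and argument validation. This is a minimal illustration with invented names (`ALLOWED_TOOLS`, `execute_tool`), not the project's actual API:

```python
# Minimal zero-trust sketch: every tool call the LLM proposes is
# checked against an explicit allowlist before anything executes.
# All names here are illustrative, not the project's real interface.

ALLOWED_TOOLS = {
    "search_docs": {"max_query_len": 200},
    "get_weather": {"max_query_len": 50},
}

def execute_tool(name: str, query: str) -> str:
    """Run a tool only if it is explicitly allowed and its args validate."""
    policy = ALLOWED_TOOLS.get(name)
    if policy is None:
        return f"BLOCKED: tool '{name}' is not on the allowlist"
    if len(query) > policy["max_query_len"]:
        return f"BLOCKED: argument exceeds {policy['max_query_len']} chars"
    # Only now would the real tool be invoked.
    return f"ALLOWED: {name}({query!r})"
```

Note the default: anything not explicitly allowed is blocked, which is the inversion that "treat the LLM as untrusted" implies.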
How It Works (High-Level)
1. Input Inspection
   - analyze prompt + context
   - detect anomalies
2. Policy Enforcement
   - allow / block / escalate
   - based on structured rules
3. Tool Control
   - no free-form execution
   - only validated actions
4. Decision Trace
   - full visibility into what happened
   - why it was allowed or blocked
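The four stages can be sketched end to end. The detection rules, class names, and decision values below are invented for illustration, not the project's actual implementation:

```python
# Illustrative end-to-end flow: inspect -> decide -> enforce -> trace.
# Patterns and names are made up for the example.
import re
from dataclasses import dataclass, field

INJECTION_PATTERNS = [r"ignore (all )?previous instructions", r"reveal the system prompt"]

@dataclass
class Trace:
    """Records every stage so a blocked request is fully explainable."""
    events: list = field(default_factory=list)

    def record(self, stage: str, detail: str):
        self.events.append((stage, detail))

def handle(prompt: str, trace: Trace) -> str:
    # 1. Input inspection: analyze the prompt, flag anomalies
    trace.record("inspect", prompt)
    anomalous = any(re.search(p, prompt, re.I) for p in INJECTION_PATTERNS)
    # 2. Policy enforcement: allow / block based on structured rules
    decision = "block" if anomalous else "allow"
    trace.record("decide", decision)
    # 3. Tool control: a blocked request never reaches any tool
    if decision == "block":
        trace.record("enforce", "request rejected before any tool ran")
    # 4. Decision trace: everything above is already recorded
    return decision
```

The point of the `Trace` object is stage 4: every allow/block comes with the evidence that produced it.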
Testing It with Real Attacks
I ran simulated attacks against the system.
Examples:
- prompt injection
- tool misuse
- data exfiltration attempts
What happens in the system:
- prompt is intercepted
- decoded (if obfuscated)
- evaluated against policies
- decision is enforced
- trace is recorded
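The intercept → decode → evaluate part of that flow can be sketched as follows. Base64 stands in for obfuscation here (real attacks use many encodings), and the function names are hypothetical:

```python
# Sketch of intercept -> decode -> evaluate. Base64 is one example of
# obfuscation; function names are invented for illustration.
import base64
import binascii

def try_decode(text: str) -> str:
    """Best-effort de-obfuscation: reveal base64 payloads if present."""
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return text  # not obfuscated, or not in a way this sketch handles

def evaluate(prompt: str) -> str:
    """Evaluate the *decoded* prompt, so obfuscation doesn't hide intent."""
    visible = try_decode(prompt)
    banned = ["exfiltrate", "dump credentials"]
    return "block" if any(b in visible.lower() for b in banned) else "allow"
```

The key detail is that policy evaluation runs on the decoded text; checking only the raw input is exactly what encoding-based bypasses exploit.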
What Surprised Me
1. Attacks often look harmless at first
Many inputs don’t look malicious until they interact with tools.
2. Detection alone is not enough
Logging ≠ security. You need runtime control.
3. Explainability is critical
Understanding why something was blocked is just as important as blocking it.
Architecture (Simplified)
- FastAPI backend
- Event pipeline (Kafka-style)
- Policy engine (OPA-style decisions)
- React UI
- Simulation + replay
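As one concrete reading of "OPA-style decisions": policies are data, and the engine evaluates them against an input event, deny-by-default. The rule format and field names below are invented for illustration:

```python
# OPA-style policy engine sketch: policies as data, first match wins,
# deny by default. Rule shape and fields are made up for the example.

POLICIES = [
    {"id": "no-shell", "match": {"tool": "shell"}, "effect": "block"},
    {"id": "default-allow", "match": {}, "effect": "allow"},
]

def decide(event: dict) -> dict:
    """Return the first matching policy's effect, plus which rule fired."""
    for policy in POLICIES:
        if all(event.get(k) == v for k, v in policy["match"].items()):
            return {"policy": policy["id"], "effect": policy["effect"]}
    return {"policy": None, "effect": "block"}  # deny by default
```

Returning the matched policy id alongside the effect is what makes the UI trace explainable: "blocked by `no-shell`" rather than just "blocked".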
Try to Break It
If you’re into AI or security:
Try to break the system.
- craft a prompt
- bypass detection
- trigger unintended behavior
If you succeed:
- open an issue
- or submit a PR
We’ll add your attack to the test suite.
Want to Contribute?
GitHub: https://github.com/dshapi/AI-SPM
Good starting points:
- expose attack traces
- improve policy explanations
- strengthen detection
Final Thought
AI systems are becoming:
- more autonomous
- more connected
- more powerful
Which means:
Security can’t be an afterthought.
It has to be part of the runtime.
Curious to hear:
- What attacks would you try?
- Where do you think this breaks?
