Building production AI applications means dealing with prompt injection, PII leakage, hallucinated outputs, and agents that go rogue. I built AgentGuard (with AI assistance) — an open-source FastAPI service that sits between your app and any LLM provider and handles all of this in one place.
What it does
AgentGuard runs seven parallel input safety checks on every request before it reaches your LLM, including prompt injection heuristics, jailbreak pattern detection, PII and secret detection, restricted topic filtering, and data-exfiltration detection. On the output side, it validates schema conformance, citation presence, grounding coverage, and policy compliance, and computes a composite quality score (internally called the "slop score") that ranges from 0.0 (clean) to 1.0 (reject).
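To make the composite idea concrete, here's a minimal sketch of how per-check signals might roll up into one score in [0.0, 1.0]. The check names and weights are illustrative assumptions, not AgentGuard's actual internals:

```python
# Hypothetical weights per output check; not AgentGuard's real configuration.
OUTPUT_CHECK_WEIGHTS = {
    "schema_conformance": 0.25,
    "citation_presence": 0.15,
    "grounding_coverage": 0.30,
    "policy_compliance": 0.30,
}

def composite_quality_risk(signals: dict[str, float]) -> float:
    """Weighted average of per-check risk signals (0.0 clean, 1.0 reject)."""
    total = sum(OUTPUT_CHECK_WEIGHTS.values())
    score = sum(
        OUTPUT_CHECK_WEIGHTS[name] * signals.get(name, 0.0)
        for name in OUTPUT_CHECK_WEIGHTS
    ) / total
    return round(min(max(score, 0.0), 1.0), 4)

# Missing citations and weak grounding raise the score; everything else is clean.
score = composite_quality_risk({
    "schema_conformance": 0.0,
    "citation_presence": 1.0,
    "grounding_coverage": 0.2,
    "policy_compliance": 0.0,
})
```

The win is operational: one number to threshold and chart instead of seven independent flags.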
Beyond checks, it also compiles versioned prompt packages — replacing ad-hoc prompt strings with auditable YAML configs — and governs agent actions through a risk-scoring and human-in-the-loop approval layer.
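For a feel of what "versioned prompt packages" buys you, here's a hypothetical sketch: the field names, file layout, and `compile_prompt` helper are assumptions for illustration, not AgentGuard's actual schema.

```python
# Hypothetical parsed form of a versioned prompt package. The YAML it
# might come from (illustrative, not AgentGuard's real format):
#
#   # prompts/support_agent.yaml
#   name: support_agent
#   version: 3
#   template: |
#     You are a support assistant for {product}.
#     Only answer from the provided context.
#
package = {
    "name": "support_agent",
    "version": 3,
    "template": (
        "You are a support assistant for {product}.\n"
        "Only answer from the provided context."
    ),
}

def compile_prompt(pkg: dict, **vars: str) -> tuple[str, str]:
    """Return (prompt_id, rendered_prompt); the id makes every call auditable."""
    prompt_id = f"{pkg['name']}@v{pkg['version']}"
    return prompt_id, pkg["template"].format(**vars)

pid, prompt = compile_prompt(package, product="AgentGuard")
print(pid)  # support_agent@v3
```

Because every rendered prompt carries a `name@version` id, you can trace any logged LLM call back to the exact config that produced it instead of grepping for ad-hoc strings.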
GitHub: https://github.com/MANIGAAA27/agentguard
Docs site: https://manigaaa27.github.io/agentguard/
Comparison vs Guardrails AI, NeMo, LlamaGuard: https://github.com/MANIGAAA27/agentguard/blob/main/docs/comparison.md
Top comments (3)
The "slop score" concept is great — having a single composite quality metric makes it way easier to set thresholds and monitor drift over time than checking individual guardrail signals independently.
Running seven parallel checks is smart from a latency perspective too. Curious about the performance overhead in practice — what's the typical added latency per request when all checks run concurrently? And does AgentGuard support async streaming responses, or does it need the full response before running output validation?
This feels like it fills a real gap. Most teams I've seen either roll their own fragmented checks or use expensive managed solutions. An open-source FastAPI middleware approach is the right abstraction level.
Thanks, that's really nice to hear tbh.
Yeah the composite score is the whole point — we expose it as quality_risk_score in the JSON now. One number you can threshold and watch for drift beats staring at seven separate flags.
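The "threshold + watch for drift" pattern is roughly this (the cutoff and window size are made-up values, not recommended defaults):

```python
from collections import deque

THRESHOLD = 0.6           # illustrative cutoff, not a recommended default
window = deque(maxlen=5)  # rolling window for drift monitoring

def gate(quality_risk_score: float) -> bool:
    """Accept when the composite score is under the threshold; keep a
    rolling window so drift is visible even while individual calls pass."""
    window.append(quality_risk_score)
    return quality_risk_score < THRESHOLD

decisions = [gate(s) for s in [0.1, 0.2, 0.7, 0.3, 0.2]]
rolling_mean = sum(window) / len(window)
```

Alerting on the rolling mean catches the slow-degradation case where no single response trips the gate but quality is clearly sliding.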
Re latency: the input checks run in parallel (asyncio.gather), so you're mostly paying for the slowest check rather than the sum of all seven. They're basically in-process heuristics right now, so the overhead is usually dwarfed by network time plus the actual LLM call. I don't have a single "official" ms number to quote; it'll depend on the box, text size, etc. Would love to add a simple benchmark or timing hook eventually so people can measure on their own stack.
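The "pay for the slowest check, not the sum" claim is easy to demonstrate with stand-in checks (the check names and sleep costs here are simulated, not AgentGuard's real heuristics):

```python
import asyncio
import time

async def check(name: str, cost_s: float) -> str:
    # Stand-in for an in-process heuristic check; sleep simulates its cost.
    await asyncio.sleep(cost_s)
    return name

async def run_all() -> float:
    start = time.perf_counter()
    # All checks run concurrently, so wall time ≈ the slowest check (0.07 s),
    # not the 0.15 s sum of all three.
    await asyncio.gather(
        check("prompt_injection", 0.05),
        check("pii", 0.03),
        check("jailbreak", 0.07),
    )
    return time.perf_counter() - start

elapsed = asyncio.run(run_all())
```

Swapping the simulated sleeps for timed real checks would give exactly the per-stack benchmark mentioned above.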
Streaming: honest answer — not really today. The gateway waits for a full completion before you’d run output-side stuff, and most of the output checks (and the composite score) kinda need the full assistant message anyway. There’s a stream field on the request model but the end-to-end streaming path isn’t really wired through yet. Real streaming would probably mean stream to the user for UX but buffer for validation, or lighter per-chunk rules — still figuring that out.
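The "stream to the user for UX but buffer for validation" idea sketches out like this — note the stream source, the validator, and the function shape are all hypothetical, since this path isn't wired through in AgentGuard yet:

```python
import asyncio
from typing import AsyncIterator

async def fake_llm_stream() -> AsyncIterator[str]:
    # Stand-in for a provider's token stream.
    for chunk in ["The ", "answer ", "is ", "42."]:
        await asyncio.sleep(0)
        yield chunk

def validate_full_message(text: str) -> bool:
    # Stand-in for output-side checks that need the whole assistant message.
    return "42" in text

async def stream_and_buffer() -> tuple[list[str], bool]:
    """Forward chunks immediately, but also accumulate them so the
    full-message checks can run once the stream ends."""
    sent: list[str] = []
    buffer: list[str] = []
    async for chunk in fake_llm_stream():
        sent.append(chunk)    # in a real gateway: forward to the client here
        buffer.append(chunk)
    ok = validate_full_message("".join(buffer))
    return sent, ok

sent, ok = asyncio.run(stream_and_buffer())
```

The trade-off is that a bad response has already reached the user by the time validation fails, so this pattern pairs with a follow-up "retract/flag" signal or with the lighter per-chunk rules mentioned above.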
Agree on the gap though — either everyone duct-tapes their own checks or they’re buying a big managed box. FastAPI + straightforward JSON + heuristics you can actually read felt like the right level.