DEV Community

yhc

I Spent the Last Few Days Testing AI Agents and Got Scared — So I Built Sentinel v0.3.0

Hey dev.to community 👋

Over the past few days, I’ve been running dozens of AI Agent experiments. The more powerful they got, the more nervous I became.

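They can do amazing things, but they’re also shockingly easy to jailbreak, to push into abusing their tools, or to trick into quietly exfiltrating data. And once they go rogue? Traditional “safety inside the agent” approaches just don’t cut it, because a compromised agent can ignore or disable checks that live in its own process.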

So I decided to solve it differently.

The Solution: Sentinel v0.3.0 “The Shield Release”
I moved the entire security layer outside the agent, into an independent Shield Sidecar process.
The agent cannot see the sidecar or kill it. Every risky action (shell commands, file I/O, API calls, etc.) must request permission from the Shield first.
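
To make the "ask the Shield first" flow concrete, here is a minimal sketch of the idea, not Sentinel's actual implementation. Everything in it is an assumption for illustration: the `request_permission` and `guarded_shell` names, the JSON request shape, and the toy substring policy. It also spawns a fresh policy process per request to stay short, whereas a real sidecar would presumably be long-lived.

```python
import json
import subprocess
import sys

# Hypothetical policy, run in a separate interpreter process so it shares
# no memory with the agent. This is NOT Sentinel's real rule set.
SHIELD_SOURCE = r"""
import json, sys
BLOCKED_TOKENS = ("rm -rf", "curl ", "/etc/passwd")
request = json.loads(sys.stdin.read())
target = request.get("target", "")
hit = next((t for t in BLOCKED_TOKENS if t in target), None)
verdict = {"allow": hit is None, "reason": f"matched {hit!r}" if hit else "ok"}
print(json.dumps(verdict))
"""


def request_permission(kind: str, target: str) -> dict:
    """Ask the out-of-process shield for a verdict; fail closed on any error."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", SHIELD_SOURCE],
            input=json.dumps({"kind": kind, "target": target}),
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
        return json.loads(result.stdout)
    except (OSError, subprocess.SubprocessError, json.JSONDecodeError) as exc:
        # If the shield cannot answer, the action is denied rather than allowed.
        return {"allow": False, "reason": f"shield unavailable: {exc}"}


def guarded_shell(command: str) -> str:
    """Agent-side wrapper: nothing runs unless the shield says yes."""
    verdict = request_permission("shell", command)
    if not verdict["allow"]:
        return f"BLOCKED: {verdict['reason']}"
    # A real caller would execute the command here; the sketch only reports.
    return f"ALLOWED: {command}"


if __name__ == "__main__":
    print(guarded_shell("ls -la"))
    print(guarded_shell("rm -rf /"))
```

The design choice worth noting is the fail-closed default: if the policy process crashes, times out, or returns garbage, the action is refused.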

Key Features
• Shield Sidecar — True out-of-band protection + instant SIGKILL + forensic snapshot
• Deterministic Shadow Sandbox — Preview any action safely before it touches your real system
• Red Team Engine — 34 attack vectors, auto scoring (0-100)
• EU AI Act Compliance — One-click report generation
• Python @protect decorator + LangChain plugin
• Clean dashboard for monitoring + one-click kill
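
For readers wondering what a decorator-based guard looks like in general, here is a rough sketch of the pattern. It is not Sentinel's `@protect` API (check the repository for the real signature); `guarded`, `PolicyViolation`, and `allow_workspace_only` are made-up names, and the policy runs in-process here purely to keep the example short.

```python
import functools
from typing import Callable


class PolicyViolation(Exception):
    """Raised when a guarded action is refused (hypothetical name)."""


def allow_workspace_only(action: str, args: tuple, kwargs: dict) -> bool:
    # Hypothetical policy: the first positional argument is a path, and it
    # must stay under workspace/ without any parent-directory hops.
    path = str(args[0]) if args else ""
    return path.startswith("workspace/") and ".." not in path


def guarded(action: str, policy: Callable[[str, tuple, dict], bool]):
    """Wrap a function so the policy is consulted before every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not policy(action, args, kwargs):
                raise PolicyViolation(f"{action} denied for args={args!r}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@guarded("file_write", policy=allow_workspace_only)
def write_file(path: str, content: str) -> int:
    # Stand-in for a real write: returns the number of characters that
    # would have been written, so the sketch touches no files.
    return len(content)


if __name__ == "__main__":
    print(write_file("workspace/notes.txt", "hello"))
    try:
        write_file("/etc/passwd", "x")
    except PolicyViolation as err:
        print("refused:", err)
```

In a sidecar design, the `policy` callback would forward the request to the external process instead of deciding locally, so the agent never holds the rules itself.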

Full project here: https://github.com/byte271/Sentinel
Still very early (v0.3.0 just dropped), but I’m really happy with the direction: a strong focus on determinism, auditability, and local-first operation.
If you’re building AI Agents, I’d love your honest feedback. What safety problems are you running into the most?
Let me know in the comments! 🔥

Top comments (1)

Harjot Singh

"Tested agents, got scared, built a guardrail" is basically the origin story of every serious agent-safety tool - the fear is well-founded because the failure modes are quiet (an agent confidently does the wrong thing and reports success) and the blast radius scales with the permissions you gave it. Building Sentinel as a response is the right instinct: you don't make agents safe by hoping they behave, you make them safe with an external layer that watches and constrains.

The question that decides whether a sentinel-type tool actually helps: is it observe-only (alerts you after) or enforce (can block/halt before the bad action lands)? Observability is necessary but alerts fire after the damage; the high-value version intercepts - this action exceeds policy, halt. Detection + enforcement, not just detection. That intercept-before-it-counts model is the spine of how I build Moonshift (a multi-agent pipeline that ships a prompt to a deployed SaaS) - gates that block consequential actions, not just log them. Cool project, and the right thing to be building right now. Is Sentinel monitoring/alerting, or can it actually halt an agent mid-action? The enforcement side is where it goes from useful to essential.