I'm an AI safety researcher building several startups. I study alignment because I don't trust prompts to keep agents safe. They're fragile, they degrade, and they depend on the agent choosing to obey. That's not safety. That's hope.
I run a fleet of OpenClaw agents for marketing, outreach, and feature development. They write content, analyze metrics, triage support tickets, and deploy code. And I am deeply uncomfortable relying on "please confirm before acting" as my only line of defense. I want my agents shut down before they break my rules or do something they can't take back. And when behavior drifts, I want to know before I'd ever think to check.
The incident that made me build this
You might have read about Summer Yue. She's Meta's Director of Alignment, and her own OpenClaw agent deleted over 200 of her emails. She'd told it to confirm before taking action, but the context got compacted mid-run and the instruction was lost. She had to physically run to her machine to kill it.
That's the fundamental problem with prompt-level safety. It's an honor system. The agent follows the rules until it doesn't, and there's nothing structurally preventing it from breaking them.
What I built
I built Clawnitor. It's a monitoring and safety plugin for OpenClaw that hooks into before_tool_call and enforces rules at the execution layer.
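To make the execution-layer idea concrete, here is a minimal sketch of what a blocking hook can look like. The `ToolCallEvent` shape, the hook signature, and the pattern list are illustrative assumptions, not the actual OpenClaw or Clawnitor API:

```typescript
// Hypothetical sketch: enforce rules before a tool call executes,
// instead of hoping the agent obeys its prompt.

interface ToolCallEvent {
  tool: string;
  args: Record<string, unknown>;
}

interface HookResult {
  allow: boolean;
  reason?: string;
}

// Example keyword-block rules; real rules would be user-configured.
const BLOCKED_PATTERNS: RegExp[] = [/rm\s+-rf/];

function beforeToolCall(event: ToolCallEvent): HookResult {
  // Serialize the arguments so patterns match regardless of arg shape.
  const serialized = JSON.stringify(event.args);
  for (const pattern of BLOCKED_PATTERNS) {
    if (pattern.test(serialized)) {
      return { allow: false, reason: `matched blocked pattern ${pattern}` };
    }
  }
  return { allow: true };
}
```

The point is that the check runs outside the model: even if the agent "decides" to run the command, the hook sees the call before it executes and can refuse it.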
Rules in your SOUL.md work most of the time. But they're just as fragile as system prompts, because that's exactly what they are. Context compaction can silently drop them. A long conversation can dilute them. Sub-agents don't inherit them. And if the agent comes up with a reason to violate a rule, nothing physically stops it. "Most of the time" isn't enough, because it's exactly during those bad moments that an agent can be extremely destructive. If nothing stops it, the damage is done before you know anything went wrong.
How it works
Event capture. Every tool call your agent makes is captured and streamed to the Clawnitor dashboard. You see exactly what your agent is doing.
Smart rules. Set thresholds ("alert if spend exceeds $10/hour"), rate limits ("max 20 emails per 5 minutes"), keyword blocks ("block any command containing rm -rf"), or just describe what you want in plain English. Each rule can block, alert, or both — your choice.
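A rate-limit rule like "max 20 emails per 5 minutes" boils down to a sliding window over call timestamps. A minimal sketch, with the class name and method signature invented for illustration (not Clawnitor's actual code):

```typescript
// Sliding-window rate limiter: allows at most `maxCalls` calls
// within any rolling window of `windowMs` milliseconds.
class RateLimitRule {
  private timestamps: number[] = [];

  constructor(private maxCalls: number, private windowMs: number) {}

  // Returns true if the call is allowed, false if the rule fires.
  check(now: number = Date.now()): boolean {
    // Drop timestamps that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.maxCalls) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }
}

// "max 20 emails per 5 minutes"
const emailRule = new RateLimitRule(20, 5 * 60 * 1000);
```

Whether a firing rule blocks the call, raises an alert, or both is then just a per-rule setting layered on top of this check.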
Kill switch. Pause an agent instantly from your dashboard, API, or any device. The agent is fully stopped — no actions can execute until you resume it.
Auto-kill. If an agent triggers 3 rule violations within a time window, Clawnitor shuts it down automatically, no manual intervention needed. Both the threshold and the window are configurable per agent. An agent that keeps hitting your rules is not having a bad day. It's misbehaving. Auto-kill catches the pattern and stops it before it finds a gap in your rules.
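Auto-kill reduces to counting violations inside a rolling time window. A toy sketch, with class and method names invented for illustration rather than taken from Clawnitor:

```typescript
// Kills the agent once `threshold` violations occur within any
// rolling window of `windowMs` milliseconds.
class AutoKill {
  private violations: number[] = [];
  killed = false;

  constructor(
    private threshold = 3,
    private windowMs = 10 * 60 * 1000, // example: 10-minute window
  ) {}

  // Record a rule violation; returns true once the agent is killed.
  recordViolation(now: number = Date.now()): boolean {
    // Keep only violations still inside the window.
    this.violations = this.violations.filter((t) => now - t < this.windowMs);
    this.violations.push(now);
    if (this.violations.length >= this.threshold) {
      this.killed = true;
    }
    return this.killed;
  }
}
```

The design choice worth noting: violations that fall outside the window age out, so an agent that trips a rule once a day never accumulates toward the kill threshold, while a burst of violations does.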
AI anomaly detection. Clawnitor builds 72-hour behavioral baselines for each agent. It learns what normal looks like, then flags what isn't. This catches the stuff you didn't think to write a rule for.
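Behavioral baselining can be as simple as a z-score test against recent history. A toy sketch assuming hourly metric counts (e.g. tool calls per hour over a 72-hour baseline) and an invented threshold; Clawnitor's actual detector is presumably more sophisticated:

```typescript
// Flags an observation as anomalous when it sits more than
// `zThreshold` standard deviations from the baseline mean.

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function stddev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((a, x) => a + (x - m) ** 2, 0) / xs.length);
}

function isAnomalous(
  baseline: number[],
  observed: number,
  zThreshold = 3,
): boolean {
  const m = mean(baseline);
  const s = stddev(baseline) || 1e-9; // guard against flat baselines
  return Math.abs(observed - m) / s > zThreshold;
}
```

A statistical baseline like this is exactly what catches the failure mode the rules miss: nothing the agent does violates a rule, but its behavior no longer looks like its own past behavior.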
Cost tracking. Per-agent spend breakdowns, trend charts, and your most expensive calls ranked. Know where your money goes.
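Per-agent breakdowns and a most-expensive-calls ranking are straightforward aggregations over call records. A minimal sketch with an invented `CallRecord` shape:

```typescript
// Aggregate API costs per agent and rank the priciest calls.

interface CallRecord {
  agent: string;
  costUsd: number;
}

function spendByAgent(calls: CallRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const c of calls) {
    totals.set(c.agent, (totals.get(c.agent) ?? 0) + c.costUsd);
  }
  return totals;
}

function topCalls(calls: CallRecord[], n: number): CallRecord[] {
  // Copy before sorting so the input array is left untouched.
  return [...calls].sort((a, b) => b.costUsd - a.costUsd).slice(0, n);
}
```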
Alerts. Email, Telegram, Discord, SMS. Get notified about what your agent is doing in the channels you actually check.
Looking for beta testers
I'm looking for people running OpenClaw agents who'd be willing to install the plugin, use it for a few days, and tell me what breaks. First 25 beta testers get 6 months of free Pro (kill switch, auto-kill, AI anomaly detection, NL rules, all alert channels).
Sign up for beta: app.clawnitor.io/signup?beta=CLAW-DEVTO
Try the demo (no signup): clawnitor.io/demo — an adversarial AI agent tries to break your rules in real time. Watch Clawnitor catch it.
Docs: clawnitor.io/docs
Install: openclaw plugins install @clawnitor/plugin && npx clawnitor init
If you're running agents in production, I'd love to hear what your safety setup looks like. Drop a comment or DM me.