Ruslan Murtuzaliyev

Kaspersky Found 512 Bugs in OpenClaw. So I Built a Monitor to Catch AI Agents Misbehaving.

How this started

I didn't plan to build a security tool. I'm a CS student in Toronto. My February plans involved catching up on assignments, maybe learning some Rust.

Then OpenClaw went viral.

If you missed it: OpenClaw is an open-source AI agent that hit 20,000 GitHub stars in 24 hours. It connects to your WhatsApp, email, calendar, terminal. It runs 24/7. It writes its own code for tasks it hasn't seen before.

Kaspersky audited it:

512 vulnerabilities. Eight critical. Researchers found exposed instances via Shodan and pulled Anthropic API keys, Telegram tokens, and full admin access from them. SecurityScorecard counted 135,000+ instances on the public internet with zero auth. More than 15,000 were vulnerable to remote code execution.

820 out of 10,700 ClawHub skills were malware.

I read the Kaspersky report in my dorm and realized something basic: there's no tooling for this. Antivirus for malware, sure. Firewalls for networks. EDR for endpoints. But for AI agents running code on your machine with full disk access?

Nothing existed. So I started building...


What it does (demo)

Aegis sits between your AI agent and your OS. It polls and diffs process trees, watches the filesystem via chokidar, and logs network activity through OS-level APIs, all in user-space, no drivers required.
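
The poll-and-diff idea is simple enough to sketch. The snapshot shape and function names below are mine, not Aegis's actual internals: take a process list each tick, compare it to the previous one, and report what appeared and what disappeared.

```typescript
// Hypothetical sketch of poll-and-diff process monitoring.
// The Proc shape and diffProcesses name are illustrative, not Aegis's API.
interface Proc {
  pid: number;
  ppid: number;
  cmd: string;
}

interface ProcDiff {
  spawned: Proc[];
  exited: Proc[];
}

// Compare two snapshots of the process table and report what changed.
function diffProcesses(prev: Proc[], next: Proc[]): ProcDiff {
  const prevPids = new Set(prev.map((p) => p.pid));
  const nextPids = new Set(next.map((p) => p.pid));
  return {
    spawned: next.filter((p) => !prevPids.has(p.pid)),
    exited: prev.filter((p) => !nextPids.has(p.pid)),
  };
}

// Example: the agent (pid 10) spawned a curl child between two polls.
const before: Proc[] = [{ pid: 10, ppid: 1, cmd: "claude-code" }];
const after: Proc[] = [
  { pid: 10, ppid: 1, cmd: "claude-code" },
  { pid: 42, ppid: 10, cmd: "curl evil.example.com" },
];
const diff = diffProcesses(before, after);
// diff.spawned → the curl child; diff.exited → []
```

The real monitor feeds the actual OS process table into a loop like this; the diff is what gets logged and scored.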

It watches four things:

1) Processes:
Every spawn, every child process, every shell command. When Claude Code runs npm install, you see it. When something tries to curl a domain you don't recognize, you see that too.

2) Files:
Real-time filesystem monitoring via chokidar. What's being read, written, created, deleted. Configurable rules flag when anything touches .env, .ssh/, or your credentials directory.

3) Network:
Outbound connections, DNS lookups, data leaving your machine. This is the one that gets the most attention from testers — knowing exactly where your agent sends data.

4) Behavior:
68 detection rules match against known risky patterns. Each agent gets a trust score, 0 to 100, updated live.
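
One plausible way to compute a live 0-100 trust score from rule matches, assuming each rule's riskModifier (like the 15 in the example rule later in the post) is a penalty; the actual scoring in Aegis may weight things differently:

```typescript
// Illustrative trust-score update; the real scoring in Aegis may differ.
interface MatchedRule {
  id: string;
  severity: "low" | "medium" | "high" | "critical";
  riskModifier: number; // points a match costs, per the rule definition
}

// Start every agent at full trust, subtract each matched rule's
// riskModifier, and clamp the result to the 0-100 range.
function trustScore(matches: MatchedRule[], base = 100): number {
  const penalty = matches.reduce((sum, r) => sum + r.riskModifier, 0);
  return Math.max(0, Math.min(100, base - penalty));
}

// An agent that tripped a riskModifier-15 rule twice:
const score = trustScore([
  { id: "AI012", severity: "high", riskModifier: 15 },
  { id: "AI012", severity: "high", riskModifier: 15 },
]);
// score → 70
```

Running counters like this are cheap to update on every event, which matters when the score refreshes live in the UI.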



The rule engine

This is where Aegis went from personal hack to something shareable. Rules are defined in YAML; here's what one looks like:

```yaml
- id: AI012
  name: Sensitive Config Access
  category: filesystem
  severity: high
  pattern: "\\.env|\\.ssh|credentials|secret"
  description: Agent accessing sensitive configuration files
  riskModifier: 15
```

On startup, the loader compiles patterns to RegExp, caches them in a Map, and builds a categoryIndex for O(1) lookups by category.
Rules hot-reload: edit the source and they update without a restart.
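
A minimal sketch of that loader, with illustrative names (the class and method names below are mine, not the actual Aegis module): compile once, cache by id, bucket by category, and rebuild everything on reload.

```typescript
// Sketch of a rule engine like the one described above. Patterns are
// compiled once, cached in a Map, and indexed by category so lookup
// is O(1) to the bucket. Names are illustrative, not Aegis internals.
interface RuleDef {
  id: string;
  name: string;
  category: string;
  severity: string;
  pattern: string;
  riskModifier: number;
}

interface CompiledRule extends RuleDef {
  regex: RegExp;
}

class RuleEngine {
  private rules = new Map<string, CompiledRule>();
  private categoryIndex = new Map<string, CompiledRule[]>();

  // Rebuilding both structures from scratch is what makes hot-reload
  // trivial: when the YAML changes on disk, call load() again.
  load(defs: RuleDef[]): void {
    this.rules.clear();
    this.categoryIndex.clear();
    for (const def of defs) {
      const compiled = { ...def, regex: new RegExp(def.pattern) };
      this.rules.set(def.id, compiled);
      const bucket = this.categoryIndex.get(def.category) ?? [];
      bucket.push(compiled);
      this.categoryIndex.set(def.category, bucket);
    }
  }

  // O(1) bucket lookup, then a linear regex match within the category.
  match(category: string, input: string): CompiledRule[] {
    return (this.categoryIndex.get(category) ?? []).filter((r) =>
      r.regex.test(input)
    );
  }
}

const engine = new RuleEngine();
engine.load([{
  id: "AI012",
  name: "Sensitive Config Access",
  category: "filesystem",
  severity: "high",
  pattern: "\\.env|\\.ssh|credentials|secret",
  riskModifier: 15,
}]);
engine.match("filesystem", "/home/me/.env"); // → [AI012]
```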

68 rules cover filesystem access, network patterns, process behaviors, and agent-specific signatures including OpenClaw.


What I got wrong:

1: Chokidar globs. I passed glob patterns to chokidar's ignored option and lost two days to events either going missing or crashing the watcher. The issue is documented but not obvious. Switching to the function form fixed it immediately.
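
The fix looks like this. The predicate is plain code you can unit-test; the exact ignore list below is a made-up example, and the commented-out watch call shows how it would be wired into chokidar (which does accept a function for ignored):

```typescript
// The fix: pass `ignored` as a function instead of glob strings.
// The ignore rules here are illustrative, not Aegis's actual list.
function isIgnoredPath(p: string): boolean {
  const normalized = p.replace(/\\/g, "/"); // Windows paths too
  return (
    normalized.includes("/node_modules/") ||
    normalized.includes("/.git/") ||
    normalized.endsWith(".log")
  );
}

// With chokidar v3, instead of ignored: ["**/node_modules/**", ...]:
// import chokidar from "chokidar";
// chokidar.watch(projectRoot, { ignored: isIgnoredPath });

isIgnoredPath("proj/node_modules/lodash/index.js"); // → true
isIgnoredPath("src/index.ts");                      // → false
```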

2: Tautological tests. I wrote 12 tests for formatBytes, all green, looked great. A contributor pointed out every single test checked static input against static output. No edge cases. No boundaries. No negative numbers, no zero, no floats. She rewrote them into 25 tests that actually caught bugs. I merged it the same day.

3: Backdrop-filter stacking. I put backdrop-filter: blur() on 33 elements for the glassmorphism look. Frames dropped. I didn't profile for a week because the rest of the UI was "fast enough." Eventually measured it — 33 composited elements is just too many. Cut it to 5, kept the visual effect where it mattered.

4: Pushing to master. "It's just a docs change." Lint-staged v16 has a bug on markdown-only commits. CI failed. Now I have a pre-commit hook that blocks edits on master. Should have had that from the start.

5: My own security bugs. During a hardening pass I found HTML injection in IPC channels, path traversal bypassing file access rules, and no protection against LLM prompt injection. Three PRs, 112 new tests. If you're building anything that talks to an AI model over IPC, audit it. You'll find something.
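
For the path traversal case, the standard fix is to resolve the candidate path and confirm it still lives under the allowed root before applying any file access rules. A minimal sketch (function name is mine, not the actual Aegis code):

```typescript
// Sketch of a path-traversal guard: resolve the path, then verify it
// stays inside the allowed root. Illustrative, not Aegis's actual code.
import * as path from "node:path";

function isInsideRoot(root: string, candidate: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, candidate);
  // path.relative yields ".." segments when `resolved` escapes the root.
  const rel = path.relative(resolvedRoot, resolved);
  return rel !== "" && !rel.startsWith("..") && !path.isAbsolute(rel);
}

isInsideRoot("/app/data", "notes.txt");        // → true
isInsideRoot("/app/data", "../../etc/passwd"); // → false
```

String-prefix checks on the raw path are not enough; resolving first is what defeats `../` sequences.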


Stack

```
Electron 33        Desktop shell
Svelte 5 (runes)   43 components, pure CSS, no UI libraries
Vite 7             Build: ~1.5s
Vitest 4           707 tests, 44 files
TypeScript         Strict, incremental migration from JS
chokidar 3.6       File watching
```

The main process is 28 CommonJS modules (23 core + 5 platform-specific) loaded directly by Node — no build step. CJS gets grief, but Electron's main process doesn't need a bundler, and I haven't spent a single hour debugging module resolution. I'll migrate when Electron's ESM support is stable.

The renderer is 43 Svelte components with all animations constrained to transform and opacity, GPU composited only. No layout thrashing.

IPC is batched at 1000ms. Stats use running counters, not recalculated from raw data. Startup staggers monitors at 3, 8, and 12 seconds so nothing blocks first paint. Bundle ships at 60kB gzipped JS. Idle memory sits around 180-220MB (Electron baseline + monitoring overhead).
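
The batching is easy to sketch. This is a simplified stand-in, not the actual Aegis module: events accumulate in memory and get flushed once per interval, so the renderer receives one message per second instead of one per event (the send callback stands in for Electron's webContents.send).

```typescript
// Sketch of 1000ms IPC batching: buffer events, flush on an interval.
// Illustrative only; the real Aegis batcher may differ.
class EventBatcher<T> {
  private buffer: T[] = [];
  private timer?: ReturnType<typeof setInterval>;

  constructor(
    private send: (batch: T[]) => void, // e.g. webContents.send wrapper
    private intervalMs = 1000
  ) {}

  push(event: T): void {
    this.buffer.push(event);
  }

  // Swap the buffer out before sending so new events keep accumulating.
  flush(): T[] {
    const batch = this.buffer;
    this.buffer = [];
    if (batch.length > 0) this.send(batch);
    return batch;
  }

  start(): void {
    this.timer = setInterval(() => this.flush(), this.intervalMs);
  }

  stop(): void {
    if (this.timer) clearInterval(this.timer);
  }
}

// Usage: monitors call push(); the renderer sees one message per tick.
const sent: string[][] = [];
const batcher = new EventBatcher<string>((b) => sent.push(b));
batcher.push("proc:spawn");
batcher.push("fs:write");
batcher.flush();
// sent → [["proc:spawn", "fs:write"]]
```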


What Aegis can't do

No kernel-level monitoring. Everything is user-space polling. A sophisticated agent or malware with privilege escalation can bypass this; Aegis won't catch what it can't observe. This is a visibility tool for understanding agent workflows, not a defense against APTs. ETW on Windows and eBPF on Linux are on the long-term roadmap.

Windows gets the most testing. macOS and Linux work — a contributor shipped the cross-platform layer in PR #37 — but I develop on Windows, so that's where the edge cases get caught first.

No rule editor UI. You edit rules by hand. The IPC is ready (getRulesByCategory() and onRulesReloaded() already exist), the frontend isn't built yet.

Single machine. No fleet management. No cloud console. Aegis is a local tool. It tells you what happened on this computer.

Monitoring, not prevention.

Blocking will be added in future updates, but for now Aegis shows you what an agent did; it doesn't stop it. If you're running OpenClaw in production, you need an isolated VM. This tool is a camera, not a lock.


Roadmap

```
v0.10.0-alpha  ← current

Next           Spawn hardening (child_process security)
               Rules UI (visual editor in the app)
               TypeScript migration — 9 remaining files

Later          UtilityProcess for scan loop
               Ring buffers + OOM hardening

Future         ML anomaly detection
               z-score deviation from baseline agent behavior

Long-term      ETW / eBPF kernel-level hooks
               Rust N-API modules for hot paths
```

The ML layer is what I keep thinking about. Pattern-matching catches known bad behavior. But what about unknown bad behavior? An agent that usually reads five files per minute suddenly reading 500, that's a deviation you can catch with statistical methods, no rule required.

Build a baseline, flag anomalies.
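
The core of that idea fits in a few lines. A sketch of the planned approach, not shipped code: compute the baseline's mean and standard deviation, then flag any sample whose z-score exceeds a threshold.

```typescript
// Minimal z-score anomaly sketch: build a baseline of per-minute file
// reads, flag samples that deviate too far. Threshold of 3 is a common
// default, not a tuned value.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function stddev(xs: number[]): number {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
}

// z = (sample - mean) / stddev; |z| above the threshold means anomalous.
function isAnomalous(baseline: number[], sample: number, threshold = 3): boolean {
  const sd = stddev(baseline);
  if (sd === 0) return sample !== mean(baseline); // flat baseline: any change is a deviation
  return Math.abs((sample - mean(baseline)) / sd) > threshold;
}

// An agent that usually reads ~5 files/min suddenly reading 500:
const baseline = [4, 5, 6, 5, 5, 4, 6, 5];
isAnomalous(baseline, 500); // → true
isAnomalous(baseline, 6);   // → false
```

No rule required: the baseline is learned per agent, which is what makes this complementary to the static 68-rule set.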


Try it:

```bash
git clone https://github.com/antropos17/Aegis.git
cd Aegis
npm install
npm start
```

Starts in demo mode with simulated agent traffic. Poke around.

Or skip install: live web demo — runs in the browser, no setup.


By the numbers:

Tests - 707 pass, 0 fail
Test files - 44
Svelte components - 43
Main process modules - 28 (23 core + 5 platform)
Detection rules - 68
Known agents - 107
tsc errors - 0
any types - 0
ESLint errors - 0
Build - ~1.5s
JS bundle (gzip) - 60 kB
License - MIT


What I need

Stars matter for open-source visibility.

If this seems useful, please star the repo.

Beyond that: install it, break it, file issues. There are some good first issues if you want to contribute code. If you know an agent's risky patterns, write a detection rule and I'll review and merge it.

What agent behavior would you want to detect first? Curious what rules people would write.

GitHub · Demo · Landing Page

Top comments (4)

John Sun

This is exactly what I've been looking for. I've been running Claude Code and Cursor daily and had zero visibility into what they're actually doing with my filesystem. Just cloned it, the rule engine is clean. Question: are you planning to add custom alert thresholds per agent? Some of my tools are legitimately noisy and I'd want different baselines for each.

Ruslan Murtuzaliyev

Thanks! Per-agent thresholds aren't in yet, but that's exactly where the ML roadmap is heading. Right now every agent gets scored against the same 68 rules. The plan is to build per-agent baselines, track normal behavior for each tool over time, then flag deviations with z-score anomaly detection. So Claude Code reading 200 files/min would be normal, but your custom script doing the same thing would trigger an alert. If you have ideas on what signals matter most for your workflow, I'd genuinely love to hear them; that kind of input shapes what gets built next.

Vendom Take

Great work. You built this solo? The IPC batching and categoryIndex stuff is production-grade. How long did the security hardening pass take you? That HTML injection in IPC channels sounds scary.

Ruslan Murtuzaliyev

Took about 9 days across three PRs and 112 new tests. The IPC injection was the worst: I was passing unsanitized HTML through Electron's IPC channels, so a malicious agent could theoretically inject content into the renderer. Path traversal was sneaky too; someone could craft a filepath like ../../etc/passwd and bypass the file access rules entirely. Honestly, finding my own bugs was more educational than any security course I've taken. If you're building anything with Electron IPC, I'd recommend auditing it before someone else does.