I ran my AI coding agent in observe mode for a week. No blocking, no restrictions, just logging every action it took. What I found changed how I think about agent access.
The setup
I wrote a small tool that hooks into Claude Code's pre-execution layer. It logs every tool call, bash command, file read, and web request to an audit database without interfering with anything. The agent doesn't know it's being watched. Nothing gets blocked.
I turned it on and kept coding normally for a week. Then I looked at the logs.
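For a feel of what the observe-mode hook does, here's a minimal sketch. It assumes the hook receives each tool call as a JSON event (field names like `session_id` and `tool_name` are illustrative, not the tool's actual schema) and appends it to a SQLite audit database without ever blocking anything:

```python
import json
import sqlite3
import sys
import time

DB_PATH = "audit.db"  # illustrative location for the audit database


def log_event(event: dict, db_path: str = DB_PATH) -> None:
    """Append one tool-call event to the audit database. Observe only:
    nothing is inspected, filtered, or blocked at this stage."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit ("
        "ts REAL, session TEXT, tool TEXT, detail TEXT)"
    )
    conn.execute(
        "INSERT INTO audit VALUES (?, ?, ?, ?)",
        (
            time.time(),
            event.get("session_id", ""),
            event.get("tool_name", ""),
            json.dumps(event.get("tool_input", {})),
        ),
    )
    conn.commit()
    conn.close()


if __name__ == "__main__":
    # Hooks typically get the event on stdin; exit 0 so the agent proceeds.
    log_event(json.load(sys.stdin))
    sys.exit(0)
```

Because the hook always exits cleanly, the agent's behavior is unchanged; the logs accumulate silently until you go looking.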
What Claude did without being asked
In a normal coding session, Claude doesn't just do what you tell it. It explores. It reads files to understand context. It runs commands to check state. Most of this is helpful. Some of it is not what you'd expect.
In one session I asked it to help push a package to PyPI. It grabbed my .env file on its own to find the token. That made sense from its perspective: it needed the credential to complete the task. But now my PyPI token was sitting in the conversation context. If that had been a Stripe secret key or a database URL, the same thing would have happened.
Across a week of sessions, the analysis flagged:
- Multiple reads to .env and credential files I never asked it to touch
- Shell commands I didn't prompt for, mostly harmless, but some touched git config and system paths
- File access outside the project directory
- A pattern where it read a sensitive file and then made an external web request within the same session
That last one is the one that stuck with me.
The pattern nobody talks about
Most agent security discussions focus on blocking single actions. Don't let it run rm. Don't let it read .env. That's useful but it misses something.
The real risk isn't a single action. It's a sequence. Your agent reads a credential file. Two minutes later it makes an external HTTP request. Each action on its own looks fine. Together they form an exfiltration chain.
I started calling these cross-layer threats. A file read is one layer (what the agent can see). A web request is another (what the agent sends out). Individually they pass any policy. Combined within a time window, they're a problem.
When I added sequence detection to the analysis, sessions that looked clean on a per-action basis suddenly had findings. Not because any single action was malicious, but because the chain of actions told a different story.
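Sequence detection of this kind doesn't require anything fancy. Here's a hedged sketch of the idea (the sensitive-file patterns and the five-minute window are assumptions, not the tool's actual policy): walk the audit log in time order and flag any sensitive file read followed by an outbound request within the window.

```python
from dataclasses import dataclass

# Illustrative patterns and window; real policies would be configurable.
SENSITIVE_PATTERNS = (".env", "id_rsa", "credentials")
WINDOW_SECONDS = 300


@dataclass
class Event:
    ts: float     # timestamp from the audit log
    kind: str     # e.g. "file_read" or "web_request"
    target: str   # file path or URL


def find_exfil_chains(events):
    """Flag every sensitive file read followed by an outbound web request
    within WINDOW_SECONDS. Each action alone passes a per-action policy;
    only the sequence is suspicious."""
    findings = []
    sensitive_reads = []
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.kind == "file_read" and any(
            p in ev.target for p in SENSITIVE_PATTERNS
        ):
            sensitive_reads.append(ev)
        elif ev.kind == "web_request":
            for read in sensitive_reads:
                if ev.ts - read.ts <= WINDOW_SECONDS:
                    findings.append((read, ev))
    return findings
```

Run against a clean session this returns nothing; run against a session with a .env read followed by an HTTP request, it surfaces the pair as one finding.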
The 5 layers of agent security
After digging into this, I started thinking about agent security as 5 distinct layers:
Layer 1: What the agent can DO. Destructive commands like rm, DROP TABLE, git reset --hard, force push. This is where most tools focus. It's the obvious stuff.
Layer 2: What the agent can SEE. Your .env, SSH keys, AWS credentials, service account files. Even if the agent doesn't do anything destructive, the fact that it read your secrets means those secrets are now in the model's context window.
Layer 3: What the agent SENDS OUT. Web requests, API calls, anything that leaves your machine. If the agent read a credential and then makes an outbound request, that's a potential exfiltration path.
Layer 4: What the agent REMEMBERS. Context poisoning and memory injection. Research shows 87% of downstream decisions get compromised within 4 hours of memory poisoning. This one is mostly unsolved.
Layer 5: MCP supply chain. Malicious MCP servers, tool poisoning. A recent audit found 71% of MCP servers scored F on security. Including Anthropic's own reference implementations.
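One way to make the layers concrete is to tag each audited action with the layer it touches. A minimal sketch with illustrative patterns (layers 4 and 5 concern memory and supply chain rather than single actions, so they don't appear here):

```python
import re

# Illustrative (layer, action kind, pattern) rules; the patterns are
# assumptions for the sketch, not a complete or vetted policy.
LAYER_RULES = [
    # Layer 1: destructive things the agent can DO
    (1, "bash", re.compile(r"\brm\b|DROP TABLE|git reset --hard|push\s+--force")),
    # Layer 2: secrets the agent can SEE
    (2, "read", re.compile(r"\.env|id_rsa|\.aws/credentials|service.?account")),
    # Layer 3: what the agent SENDS OUT
    (3, "web",  re.compile(r"^https?://")),
]


def classify(kind: str, target: str):
    """Return the security layer an audited action falls under,
    or None if no rule matches."""
    for layer, rule_kind, pattern in LAYER_RULES:
        if kind == rule_kind and pattern.search(target):
            return layer
    return None
```

Tagging the audit log this way also makes the coverage gap visible: a layer-1-only tool simply never emits findings for layers 2 and 3, even when the events are sitting in its logs.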
Most agent security tools cover layer 1. Some partially cover layer 2. Nobody covers layer 3. Layers 4 and 5 are wide open.
Why observing matters more than blocking
Most developers won't install a security tool because they think they need one. They'll install it because they're curious. Observation has zero friction. It doesn't change your workflow. It doesn't block anything. It just shows you what's happening.
Once you see the logs, you don't need convincing that guardrails are needed. You ask for them.
The scary part isn't what gets blocked. It's what you see in the logs that you never knew was happening.
What I'm doing about it
I've been building a tool around this idea. It started as a simple policy engine for blocking dangerous actions, but the observe-and-analyze workflow turned out to be more valuable than the blocking itself. It works with Claude Code, Codex, Cursor, and a few other agents.
It's open source and still early, but I've been running it on my own workflow daily.
If you're curious: https://github.com/riyandhiman14/Agent-Sec
Would love to hear if anyone else has noticed similar patterns with their agents.