Daniel Nwaneri


Agents Don't Just Do Unauthorized Things. They Cause Humans to Do Unauthorized Things.

A comment thread shouldn't produce original research. This one did.

Last week I published a piece about agent governance — the gap between what an agent is authorized to do and what it effectively becomes authorized to do through accumulated actions. I used a quant fund analogy: five independent strategies, each within its own risk limits, collectively overweight the same sector. No single decision was wrong. The aggregate outcome was unauthorized.

The comment section built something I hadn't anticipated.

Kalpaka, Vic Chen, Andre Cytryn, Stephen Lee, and Connor Gallic spent three days extending the argument in directions I hadn't gone. What follows is an attempt to assemble what they built — with attribution, because the thread earned it.


The Unit Problem

The first thing Kalpaka named was the hardest: what do you measure?

Actions taken is too noisy. Resources touched is closer but misses compounding. The unit that matters, they argued, is state delta — the diff between what the system could affect before and after a given action sequence.

A vendor approval doesn't just place an order. It creates a supply chain dependency. A database read doesn't just return rows. It establishes an access pattern the agent now relies on.

So the audit object isn't an action log. It's a capability graph — resources and permissions as nodes, actually-exercised access paths as edges. Reconciliation compares the declared graph against the exercised one. The delta is your behavioral exposure.
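To make the reconciliation concrete, here's a minimal sketch. The agent and resource names are invented, and each access path is modeled as an (actor, permission, resource) edge:

```python
# Hypothetical sketch of capability-graph reconciliation.
# Each graph is a set of (actor, permission, resource) edges;
# all names here are invented for illustration.

def behavioral_exposure(declared, exercised):
    """Diff the declared capability graph against the exercised one.

    Returns (unused, undeclared):
      unused     - declared paths the agent never exercised
      undeclared - exercised paths that were never declared
    """
    unused = declared - exercised
    undeclared = exercised - declared
    return unused, undeclared

declared = {
    ("agent-1", "read", "orders-db"),
    ("agent-1", "write", "vendor-api"),
}
exercised = {
    ("agent-1", "read", "orders-db"),
    ("agent-1", "read", "hr-db"),  # never declared: this is the exposure
}

unused, undeclared = behavioral_exposure(declared, exercised)
```

The delta comes out in two directions: declared-but-unexercised paths are pruning candidates, and exercised-but-undeclared paths are the behavioral exposure the reconciliation exists to surface.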

This is the frame that makes the quant fund analogy precise rather than decorative. A 13F filing is exactly this: a reconciliation between declared strategy and actual exposure, anchored to external cadence rather than internal computation speed. The quarterly filing forces the moment when someone has to look at the full graph, not just the individual positions.


Two Kinds of Edges

The capability graph has a shape problem I hadn't considered.

Direct edges are legible. The agent called the database, touched the file, invoked the API. These edges can be tracked. They decay naturally — an access pattern not exercised in N reconciliation cycles can be pruned. The agent hasn't used that path, so it drops from the exercised graph.

Induced edges are different. These are the edges created when an agent's output causes a human to take an action the agent itself didn't take. The Meta Sev 1 incident last week is the exact pattern: the agent didn't write anything unauthorized. It gave advice that caused an engineer to widen access. The exposure persisted for two hours. The agent's direct action log showed nothing unusual.

Induced edges don't decay the same way direct edges do. The human decision the agent caused doesn't un-happen when the agent stops referencing it. The widened access persists independently of the agent's continued activity. These edges have a much longer half-life — effectively permanent until someone explicitly reconciles them.

This is where the governance architecture splits. Direct edges decay on a usage clock. Induced edges require active reconciliation. The first can be automated. The second can't — which is exactly where the 13F analogy holds strongest. Quarterly isn't a technical choice. It's a regulatory one. Someone external decided that interval was often enough for fund positions. Agent capability graphs need the same external anchor.
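The two decay regimes can be sketched directly. This is a toy model, not a real tool: each edge records its kind and the last reconciliation cycle in which it was exercised, and the pruning rule treats the two kinds asymmetrically:

```python
# Toy model of the two decay regimes described above.
from dataclasses import dataclass

@dataclass
class Edge:
    path: str
    kind: str            # "direct" or "induced"
    last_exercised: int  # reconciliation cycle number

def prune(edges, current_cycle, max_idle_cycles=3, reconciled=()):
    """Direct edges decay on a usage clock; induced edges only drop
    when someone has explicitly reconciled them."""
    kept = []
    for e in edges:
        if e.kind == "direct":
            # usage clock: unused long enough, it falls off the graph
            if current_cycle - e.last_exercised <= max_idle_cycles:
                kept.append(e)
        elif e.path not in reconciled:
            # induced: effectively permanent until actively reviewed
            kept.append(e)
    return kept

edges = [
    Edge("agent -> orders-db read", "direct", last_exercised=1),
    Edge("engineer widened db access", "induced", last_exercised=1),
]
remaining = prune(edges, current_cycle=10)
# the stale direct edge is gone; the induced edge survives
```

The asymmetry is the whole point: nothing in the agent's own activity can age an induced edge out, which is why that half of the graph needs the external reconciliation anchor.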


The Irreversibility Problem

Andre Cytryn raised threshold drift: you calibrate materiality at deployment time, but the agent's decision space expands with use.

The per-strategy limits in the fund example weren't wrong at design time. They were wrong at failure time because the correlation structure changed. Continuous recalibration is the obvious fix. But continuous recalibration creates its own attack surface — a mechanism the agent can game structurally, not intentionally. Enough accumulated decisions that each looks within tolerance can shift the calibration baseline until the new thresholds permit what the original thresholds wouldn't.
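The drift mechanism is easy to demonstrate with a toy model. Suppose the threshold is naively recalibrated from the agent's own recent decision magnitudes, and the agent always acts just inside tolerance. All numbers here are invented:

```python
# Toy illustration of threshold drift under self-referential
# recalibration. Every individual decision is within the limit
# in force at the time, yet the limit itself ratchets upward.

def recalibrate(history, slack=1.5):
    """New threshold = slack * mean of recent decision magnitudes."""
    return slack * sum(history) / len(history)

threshold = 10.0
history = [5.0, 6.0, 7.0]        # decision magnitudes at deployment
for _ in range(20):
    decision = 0.95 * threshold  # each decision looks within tolerance
    history = (history + [decision])[-10:]
    threshold = recalibrate(history)

# threshold has drifted well past the original 10.0 without any
# single decision ever breaching the limit in force when it was made
```

No decision in the loop is individually anomalous, which is exactly why recalibration authority has to live outside the agent's execution context.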

The only way out Kalpaka identified: decouple recalibration authority from the agent's own execution context entirely. Thresholds reviewed by something with no stake in the agent's performance.

This requires knowing which actions are irreversible — but irreversibility is often only visible in retrospect. The Meta incident proves it. Nobody flagged the permission-widening action as irreversible-scope-changing while it was happening. The agent's advice looked like a routine technical suggestion. The irreversibility was only visible after the state had already changed.

Kalpaka's resolution was to invert the default assumption. Don't try to classify irreversibility upfront. Treat every induced edge as irreversible. Over-reconcile first, and let the reconciliation history generate the labeled data you need to build the upstream classifier later.
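A minimal sketch of that inversion, with hypothetical field names: every induced edge gets flagged as irreversible by default, and each human verdict becomes a labeled example for the classifier that doesn't exist yet:

```python
# Sketch of the irreversible-by-default bootstrap. Field names
# and verdict labels are invented for illustration.

labeled_data = []

def flag_induced_edge(edge):
    """No upfront classification: every induced edge goes to review."""
    return {"edge": edge, "assumed_irreversible": True}

def record_review(flag, reviewer_verdict):
    """The reviewer's verdict ('material' / 'immaterial') becomes
    the training label for a future materiality classifier."""
    labeled_data.append((flag["edge"], reviewer_verdict))

flag = flag_induced_edge("advice caused engineer to widen db access")
record_review(flag, "material")
```

The expensive part is deliberately the default path: the review queue is the data-generation mechanism, not overhead on top of it.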

This is how financial compliance actually bootstrapped. Early compliance teams didn't start with sophisticated materiality filters. They reconciled everything quarterly and learned which position changes were material through decades of accumulated review history. The filters came from the data. The data came from over-reconciling.

We're in the over-reconcile phase. Expensive, noisy, generates false positives. But it's the only way to produce the labeled examples that make the classifier possible later. Trying to solve the detection problem at inception is trying to skip the generation of training data.


The Fatigue Problem

Over-reconciling creates a human cost. Every flagged induced state change needs a review decision. Volume in early cycles will be high by design.

The compliance history the 13F analogy draws on has a second chapter nobody likes to mention: the review process degrades under load. Humans start rubber-stamping. False positives stop being caught. The labeled data gets noisy before the classifier can be trained on it.

Kalpaka's answer was adaptive rather than prescribed: track reviewer consistency over time. Same reviewer, same flag type — does the approval rate shift as volume increases? That's your fatigue signal. Once you can detect it, you don't need a hard cap. You have evidence-based throttling.
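One plausible shape for that signal, sketched with invented numbers: for a single reviewer and flag type, compare the approval rate in an early window against the most recent window, and treat a sharp rise as the fatigue indicator:

```python
# Sketch of an evidence-based fatigue signal. Window size and
# drift limit are arbitrary placeholders, not recommendations.

def approval_rate(decisions):
    return sum(1 for d in decisions if d == "approve") / len(decisions)

def fatigue_signal(decisions, window=50, drift_limit=0.15):
    """Decisions are chronological, one reviewer, one flag type.
    True when the recent approval rate has drifted past the
    early baseline by more than drift_limit."""
    if len(decisions) < 2 * window:
        return False  # not enough history to compare windows
    baseline = approval_rate(decisions[:window])
    recent = approval_rate(decisions[-window:])
    return recent - baseline > drift_limit

# early reviews: careful, 60% approvals; later: near-blanket approval
decisions = ["approve"] * 30 + ["reject"] * 20 \
          + ["approve"] * 48 + ["reject"] * 2
fatigued = fatigue_signal(decisions)
```

When the signal fires you throttle or rotate, rather than capping review volume upfront.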

The auditor rotation parallel holds here too. Financial auditors rotate precisely because of this problem. Fresh eyes are the simplest fatigue mitigation. The question for agent behavioral review is whether there are enough qualified reviewers to rotate through.

That's harder than it sounds. Financial auditors share a professional vocabulary — they all know what material means, what a restatement implies, what a concentration risk looks like. That shared language is what makes fresh eyes useful. Agent behavioral accounting doesn't have that vocabulary yet. The capability graph framing this thread built is an attempt to construct it. Until reviewers have a shared framework for what "consequential scope change" looks like in practice, rotating fresh eyes doesn't transfer the same way.

The qualified reviewer problem might be the bootstrapping constraint underneath the bootstrapping constraint.


The Enforcement Floor

Connor Gallic brought the piece back to earth.

Theory is one thing. What teams actually ship is another. The simplest version of enforcement is a write checkpoint: every agent action that changes state passes through a policy evaluation before it executes. Not token monitoring, not post-hoc audit. A deterministic gate between intent and action.

The aggregate authorization problem gets simpler when you centralize that policy layer. If every agent's writes flow through the same checkpoint, cross-agent scope becomes visible — because the governance layer has the view that individual agents don't.
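The shape of that checkpoint is simple enough to sketch. This is a toy version with invented agent and resource names, not a real policy engine: every write from every agent passes the same deterministic gate, which both enforces scope and accumulates the cross-agent view:

```python
# Toy write checkpoint: a deterministic policy gate between intent
# and action, shared across agents. All names are invented.

class PolicyViolation(Exception):
    pass

class Checkpoint:
    def __init__(self, allowed):
        self.allowed = allowed  # set of (agent, resource) write grants
        self.writes = []        # the centralized view agents lack

    def write(self, agent, resource, payload, apply):
        # evaluate policy BEFORE the state change, not after
        if (agent, resource) not in self.allowed:
            raise PolicyViolation(f"{agent} may not write {resource}")
        self.writes.append((agent, resource))
        return apply(payload)   # only now does the action execute

gate = Checkpoint(allowed={("agent-a", "orders-db")})
store = {}
gate.write("agent-a", "orders-db", ("po-7", "approved"),
           apply=lambda p: store.__setitem__(*p))

# a write outside the declared scope is blocked before execution
try:
    gate.write("agent-b", "orders-db", ("po-8", "approved"),
               apply=lambda p: store.__setitem__(*p))
except PolicyViolation:
    blocked = True
```

Because `gate.writes` accumulates across agents, cross-agent scope questions become queries against one log rather than a reconstruction across many.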

The 13F analogy works because quarterly filings are boring, reliable, and externally anchored. Agent governance needs the same properties. The hard part isn't the theory. It's making enforcement boring enough that teams actually ship it.

The write checkpoint wins on that dimension. It's implementable today. It handles direct scope violations cleanly. It's the enforcement floor.

The capability graph is the ceiling. It handles induced state changes, threshold drift, cross-agent composition. It's more expensive, harder to build, requires the vocabulary that doesn't fully exist yet. It's also necessary for complete governance.

The right architecture is probably layered: checkpoint as the floor that catches direct violations cheaply, capability graph as the audit layer that catches induced drift over time. You ship the checkpoint first because it's boring enough to actually build. You develop the capability graph because the checkpoint leaves a class of failures uncovered.


What the Thread Built

None of this was in the original piece.

The two-clock distinction was there. The 13F analogy was there. The governance debt framing was there.

The capability graph, the direct/induced edge split, the over-reconcile bootstrap, the adaptive fatigue ceiling, the layered enforcement architecture — those came from Kalpaka, Andre, Vic, Stephen, and Connor over three days in a comment section.

That's worth naming directly. Not as a courtesy, but because it demonstrates the problem Foundation is designed to solve. This conversation happened in public, in a thread that will eventually scroll off everyone's feed, attributed to usernames rather than preserved as structured knowledge. The architecture it built is worth more than that.

The agent governance problem doesn't have a consensus solution yet. What it has is a comment thread that got further than most papers. The qualified reviewer problem is still open. The capability graph needs implementation. The adaptive ceiling needs tooling.

But the vocabulary exists now. That's what the thread produced. And vocabulary, as someone wiser than me wrote recently, turns vague frustration into specific, solvable problems.


The original piece: Your Agent Is Making Decisions Nobody Authorized

Top comments (2)

Dariusz Newecki

The induced-edge framing is the sharpest part of this — specifically your point that induced edges don't decay on a usage clock the way direct edges do. We've hit this in practice: CORE can block its own workers from making unauthorized state changes, but when CORE produces a proposal and a human applies it, that's outside the enforcement perimeter entirely. The logs capture what CORE did; they don't capture what the proposal caused.
Your irreversible-by-default bootstrap makes sense for exactly this reason. My question: once you have enough reconciliation history to start classifying materiality — how do you prevent that classifier from becoming gameable? The threshold-drift problem you describe seems like it could re-enter through the classification layer.

Andre Cytryn

the direct/induced edge split is the clearest thing I've read on this topic. the decay asymmetry is exactly right and it's what makes most existing audit tooling insufficient -- they're built around the agent's own action log, not the downstream human decisions that action set in motion.

one thing worth adding to the enforcement floor discussion: the write checkpoint only works if the policy layer has visibility across agent boundaries. a single agent's checkpoint sees its own writes, but induced edges often span multiple agents. you need the checkpoint to be positioned at the resource level, not the agent level, or you're back to the same composition problem. not sure many teams are thinking about that distinction when they reach for OPA or similar.