Daniel Nwaneri

Your Agent Is Making Decisions Nobody Authorized

A quant fund ran five independent strategies. Every one passed its individual risk limits. Every quarterly filing looked reasonable in isolation. But all five strategies were overweight the same sector.

Aggregate exposure exceeded anything anyone had authorized — because the complexity budget was scoped per-strategy, never cross-strategy. No single decision was wrong. The aggregate outcome was unauthorized. Nobody was watching the right scope.

This is governance debt. It accumulates invisibly. Each decision individually correct. Aggregate outcome unauthorized. The failure only surfaces when the sector moves against the fund and the concentration nobody explicitly built turns out to have been built anyway — one individually-reasonable decision at a time.

The token economy has no accounting entry for this. Nothing on the infrastructure bill reflects what happened. The cost appears later, in a different quarter, a different system, a different team's incident report.


Two Clocks

Every agent system has at least two clocks running simultaneously.

The execution clock measures computation — tokens consumed, API calls made, latency per response. This is what most monitoring systems track. It is visible, quantifiable, and entirely the wrong thing to govern.

The governance clock measures consequences — decisions made, thresholds crossed, exposures accumulated. This clock runs at a fundamentally different speed than execution, and on a different metric entirely.

Execution counts tokens. Governance counts consequences.

Most agent architectures try to govern at execution layer granularity. They instrument every API call, set token budgets, alert on cost spikes. The result is a monitoring system that costs more attention than the decisions it is protecting. The governance layer becomes noise. Worse than useless — it actively degrades decision quality by demanding attention on non-material changes.

The fix is not better monitoring. It is a different master clock.


The Master Clock

Vic Chen builds institutional investor analysis tooling around quarterly SEC filing data. The production domain he works in — 13F analysis, the quarterly disclosures hedge funds file showing their equity positions — has solved the governance clock problem in a way most software systems haven't.

The expensive thing in 13F analysis is not parsing the filings. That part is mechanical. Run the filing through the pipeline, extract the positions, compare against the previous quarter. Cheap. Deterministic. Fast.

The expensive thing is deciding which signals actually warrant human attention. Every false positive costs analyst time. Every false negative costs trust in the system. That judgment call does not map cleanly to token costs. You cannot optimize it by switching to a cheaper model or batching the API calls. The bottleneck is not computation. It is consequence.

The master clock in Vic's system is anchored to SEC reporting cadence — filing deadlines, amendment windows, restatement periods. The quarterly 13F disclosure cycle is one of the few externally-anchored materiality signals in finance. It already encodes a judgment: this change was significant enough to report.

When a fund flips a position intra-quarter and back again, that event never surfaces in the filing. This is not a detection failure. The governance architecture is working correctly. An intra-quarter flip that reverses before the filing deadline was not a conviction change. The master clock — anchored to external cadence rather than internal computation — correctly filters it out.

This is filtering by design rather than detection by volume. The governance layer is not trying to catch everything. It is anchored to what the domain has already decided matters.

When the system detects an NT 13F — a late filing notification — or an amendment chain exceeding two revisions, the complexity budget automatically expands. Slower parsing. Deeper cross-referencing. Human-in-the-loop checkpoints. The external signal triggers the governance response because the external signal already encodes the materiality judgment.
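That trigger logic can be sketched in a few lines. This is a hypothetical illustration of the pattern, not Vic's implementation — `FilingSignal`, `ComplexityBudget`, and `governance_response` are invented names; the two-revision amendment threshold is the one stated above.

```python
from dataclasses import dataclass

@dataclass
class FilingSignal:
    is_nt_13f: bool       # NT 13F: notification of a late filing
    amendment_count: int  # number of 13F/A revisions in the chain

@dataclass
class ComplexityBudget:
    parse_depth: str      # "fast" or "deep"
    cross_reference: bool
    human_checkpoint: bool

def governance_response(signal: FilingSignal) -> ComplexityBudget:
    """Expand the complexity budget when the external materiality
    signal fires; stay cheap, fast, and quiet otherwise."""
    if signal.is_nt_13f or signal.amendment_count > 2:
        return ComplexityBudget(parse_depth="deep",
                                cross_reference=True,
                                human_checkpoint=True)
    return ComplexityBudget(parse_depth="fast",
                            cross_reference=False,
                            human_checkpoint=False)
```

The point of the shape: nothing in the trigger condition mentions tokens, API calls, or latency. The governance response keys entirely off the external signal, because the external signal already encodes the materiality judgment.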

The master clock is not a timer. It is a materiality filter.


The Over-Calibration Trap

The first failure mode most governance architectures hit is over-calibration.

Early 13F monitoring systems flagged every rounding discrepancy between a fund's 13F and its 13D as potential drift. The noise-to-signal ratio made the governance layer worse than useless. Analysts were spending attention on non-material changes. The system meant to improve decision quality was degrading it.

Vic's framing of the fix: governance cost should scale with expected loss from undetected drift, not with the volume of changes observed.

A $50M position shift in a mega-cap is noise. A $50M position shift in a micro-cap is a thesis change. Same signal magnitude. Completely different materiality. The governance architecture has to encode that domain knowledge or it cannot distinguish between them.

This is where the "significant drift threshold" becomes practical rather than theoretical. Without it, you are building a governance layer that is perpetually anxious — flagging everything, earning attention for nothing, training operators to ignore alerts. With it, the governance layer fires on what matters and stays quiet on what does not.

The threshold is not a technical parameter. It is a domain judgment encoded into architecture.
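One minimal way to encode that judgment is to scale the drift threshold by the size of the security rather than by absolute dollars. This is a hedged sketch, not a production rule — the `is_material_drift` function and the 1% cutoff are illustrative assumptions; real thresholds would be calibrated per domain.

```python
def is_material_drift(position_change_usd: float,
                      market_cap_usd: float,
                      relative_threshold: float = 0.01) -> bool:
    """A position shift is material relative to the security's size,
    not in absolute dollars: $50M in a mega-cap is noise, $50M in a
    micro-cap is a thesis change."""
    return abs(position_change_usd) / market_cap_usd >= relative_threshold

# The same $50M shift, two opposite materiality judgments:
mega_cap = 500e9   # $500B mega-cap: shift is 0.01% of cap -> noise
micro_cap = 250e6  # $250M micro-cap: shift is 20% of cap -> thesis change
assert not is_material_drift(50e6, mega_cap)
assert is_material_drift(50e6, micro_cap)
```

Notice that both inputs cost the same to compute. The threshold does no extra work at the execution layer; all of its value is at the governance layer.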


Governance Debt

Back to the five-strategy fund.

The token economy version of that failure is identical in structure. An agent that makes two expensive API calls but expands the decision space by introducing correlated hypotheses is under token budget and creating governance debt simultaneously. The cost does not appear on the infrastructure bill. It appears when the downstream decision fails in ways that trace back to the expanded hypothesis space nobody reviewed.

An agent that makes 100 cheap API calls but narrows a decision space from 5,000 options to 3 is adding value regardless of token cost. An agent that makes two expensive calls but introduces unreviewed complexity is accruing debt regardless of token efficiency.

The decision economy is the accounting system that captures the difference.

Three signals that matter in the decision economy and do not appear in token accounting:

Decision scope. How many downstream choices does this agent action constrain or enable? An action that narrows the decision space is value-generating. An action that expands the decision space without corresponding resolution is debt-generating. The token cost of both can be identical.

Consequence materiality. Not all decisions are equal. The governance clock runs faster when the expected loss from undetected drift is higher. A rounding discrepancy in a mega-cap position and a position reversal in a micro-cap both generate the same token cost to detect. Their materiality is orders of magnitude apart.

Authorization scope. Was this decision within the scope of what was explicitly authorized, or did it cross a threshold that required a decision nobody made? Governance debt accumulates at the boundary between authorized and implicitly permitted.

None of these signals are invisible. They require a different accounting system — one that tracks consequences rather than computation, materiality rather than volume, authorization scope rather than token spend.
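A decision-economy ledger entry might capture those three signals in a shape like the following. This is a hypothetical data model, not an existing system — `DecisionRecord` and its fields are invented for illustration; the numbers reuse the examples above.

```python
from dataclasses import dataclass

@dataclass
class DecisionRecord:
    options_before: int        # decision-space size before the action
    options_after: int         # decision-space size after the action
    expected_loss_usd: float   # expected loss from undetected drift
    explicitly_authorized: bool

    def scope_delta(self) -> int:
        """Negative: the action narrowed the decision space (value).
        Positive: it expanded the space without resolution (debt)."""
        return self.options_after - self.options_before

    def governance_debt(self) -> bool:
        """Debt accrues when scope expands outside explicit authorization."""
        return self.scope_delta() > 0 and not self.explicitly_authorized

# 100 cheap calls that narrow 5,000 options to 3: value, whatever the token cost.
narrowing = DecisionRecord(5000, 3, expected_loss_usd=1e6,
                           explicitly_authorized=True)
# Two expensive calls that introduce unreviewed correlated hypotheses: debt.
expanding = DecisionRecord(3, 40, expected_loss_usd=1e6,
                           explicitly_authorized=False)
assert not narrowing.governance_debt()
assert expanding.governance_debt()
```

Token accounting sees only the call counts, which point in exactly the wrong direction here. The ledger sees the scope delta and the authorization boundary.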


The Rate Problem

Static agents accumulate governance debt slowly. An agent that makes the same decisions in the same scope every session creates predictable exposure. You can audit it. You can scope the complexity budget correctly once and leave it.

An agent with memory that promotes knowledge automatically accumulates governance debt at the rate the memory compounds. Each session, the promoted knowledge expands the hypothesis space the agent operates in. Each expansion is individually reasonable. The aggregate effect is an authorization scope that widens faster than anyone is reviewing it.

The governance layer has to keep pace with the learning rate or it falls further behind with every session — not linearly, but exponentially.

This is why the evaluator architecture in Foundation anchors promotion criteria to human judgment rather than automating it. Not because automation is wrong in principle. Because the governance debt from unchecked automatic promotion compounds at the same rate as the knowledge base — and the rate is the problem, not any individual promoted insight.
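A human promotion gate can be sketched as a two-stage queue: agent-generated insights land in a pending set and only cross into the working knowledge base at a human review step. This is an assumed shape, not the Foundation implementation — `KnowledgeBase`, `propose`, and `review` are illustrative names.

```python
class KnowledgeBase:
    def __init__(self) -> None:
        self.promoted: list[str] = []  # the working set the agent reasons from
        self.pending: list[str] = []   # candidates awaiting human review

    def propose(self, insight: str) -> None:
        """Agent-generated insights never enter the working set directly."""
        self.pending.append(insight)

    def review(self, approve: set[str]) -> None:
        """Only the human gate moves insights into the promoted set;
        unapproved candidates are dropped at the review interval, so the
        authorization scope widens only at the reviewed rate."""
        self.promoted.extend(i for i in self.pending if i in approve)
        self.pending.clear()

kb = KnowledgeBase()
kb.propose("fund X rotates into energy ahead of quarter-end")
kb.propose("correlated hypothesis nobody asked for")
kb.review(approve={"fund X rotates into energy ahead of quarter-end"})
assert kb.promoted == ["fund X rotates into energy ahead of quarter-end"]
assert kb.pending == []
```

The gate does not make the agent smarter. It pins the compounding rate of the knowledge base to the rate a human can actually review.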

The master clock that works for 13F analysis works for agent memory for the same reason: it is anchored to external materiality judgment rather than internal computation speed. The filing cadence enforces a review interval that the domain has already validated. The human promotion gate enforces a review interval that the knowledge system needs.

Both are the same governance architecture. One is decades old. One is being built now.


What the Decision Economy Actually Measures

Token costs are visible on the infrastructure bill. Governance debt is not.

This asymmetry creates predictable incentives. Teams optimize for what they can measure. Token spend gets instrumented, budgeted, alerted. Governance debt accumulates until it manifests as a production failure, a trust breakdown, a concentration exposure nobody authorized across five individually-reasonable decisions.

The token economy prices execution correctly. It is the wrong accounting system for judgment costs because judgment costs do not appear until downstream consequences surface — often in a different quarter, a different system, a different team's incident report.

The precedent that prices judgment correctly is not a token count. It is a record of what decisions were made under what conditions, what thresholds were crossed, what downstream exposure was created, and whether any of it was explicitly authorized.

That record is what makes governance debt visible before it becomes a production problem.

Execution counts tokens. Governance counts consequences.

The decision economy is what you build when you understand the difference.


This is part of a series on what AI actually changes in software development. Previous pieces: The Gatekeeping Panic, The Meter Was Always Running, Who Said What to Whom, The Token Economy, Building the Evaluator, I Shipped Broken Code and Wrote an Article About It.

The governance debt examples and 13F production evidence in this piece draw from analysis by Vic Chen, who builds institutional investor analysis tooling around quarterly SEC filing data. The five-strategy concentration example is his.

Top comments (3)

Kalpaka

The authorization scope point maps to something underexplored: we authenticate agents once, at session start. API key, role, permissions. But effective authority isn't static. It compounds through decisions exactly like you describe with memory promotion.

What most architectures lack isn't better permissions. It's behavioral accounting — tracking the gap between what the agent was authorized to do and what it's effectively become authorized to do through accumulated actions.

The 13F analogy is sharp here. A quarterly filing forces reconciliation between declared strategy and actual exposure. Most agent systems never have that reconciliation moment. The governance clock runs, nobody reads it, and one day the aggregate scope is something nobody explicitly permitted.

Daniel Nwaneri

"Behavioral accounting" is the frame this problem needs and hasn't had.
The reason most architectures stop at permissions is that permissions are legible: you can read them, audit them, revoke them. Effective authorization is harder to make legible because it's emergent. It doesn't live in a config file. It lives in the accumulated delta between what the agent was initialized to do and what it's now capable of doing given everything it's touched.

The 13F analogy breaks down in exactly the instructive place. The filing works because exposure is denominated in dollars. You can compute the gap between declared strategy and actual holdings as a number. Agent behavioral accounting doesn't have that unit yet. What do you denominate effective scope in — actions taken, state changes, resources touched, decisions that were irreversible? The reconciliation moment you're describing requires an audit object, and I don't think we've agreed on what that object is.

That's the gap worth naming. Not just that the governance clock runs without anyone reading it but that even if someone reads it, we haven't decided what it should say.

klement Gunndu

The governance vs execution clock distinction is sharp, but I'd push back slightly — in practice, the harder problem isn't measuring consequences, it's deciding which consequences are material before they compound.