Daniel Nwaneri

Your Agent Is Making Decisions Nobody Authorized

A quant fund ran five independent strategies. Every one passed its individual risk limits. Every quarterly filing looked reasonable in isolation. But all five strategies were overweight the same sector.

Aggregate exposure exceeded anything anyone had authorized — because the complexity budget was scoped per-strategy, never cross-strategy. No single decision was wrong. The aggregate outcome was unauthorized. Nobody was watching the right scope.

This is governance debt. It accumulates invisibly. Each decision individually correct. Aggregate outcome unauthorized. The failure only surfaces when the sector moves against the fund and the concentration nobody explicitly built turns out to have been built anyway — one individually-reasonable decision at a time.

The token economy has no accounting entry for this. Nothing on the infrastructure bill reflects what happened. The cost appears later, in a different quarter, a different system, a different team's incident report.


Two Clocks

Every agent system has at least two clocks running simultaneously.

The execution clock measures computation — tokens consumed, API calls made, latency per response. This is what most monitoring systems track. It is visible, quantifiable, and entirely the wrong thing to govern.

The governance clock measures consequences — decisions made, thresholds crossed, exposures accumulated. This clock runs at a fundamentally different speed than execution. It is also running on a fundamentally different metric.

Execution counts tokens. Governance counts consequences.

Most agent architectures try to govern at execution layer granularity. They instrument every API call, set token budgets, alert on cost spikes. The result is a monitoring system that costs more attention than the decisions it is protecting. The governance layer becomes noise. Worse than useless — it actively degrades decision quality by demanding attention on non-material changes.

The fix is not better monitoring. It is a different master clock.


The Master Clock

Vic Chen builds institutional investor analysis tooling around quarterly SEC filing data. The production domain he works in — 13F analysis, the quarterly disclosures hedge funds file showing their equity positions — has solved the governance clock problem in a way most software systems haven't.

The expensive thing in 13F analysis is not parsing the filings. That part is mechanical. Run the filing through the pipeline, extract the positions, compare against the previous quarter. Cheap. Deterministic. Fast.

The expensive thing is deciding which signals actually warrant human attention. Every false positive costs analyst time. Every false negative costs trust in the system. That judgment call does not map cleanly to token costs. You cannot optimize it by switching to a cheaper model or batching the API calls. The bottleneck is not computation. It is consequence.

The master clock in Vic's system is anchored to SEC reporting cadence — filing deadlines, amendment windows, restatement periods. The quarterly 13F disclosure cycle is one of the few externally-anchored materiality signals in finance. It already encodes a judgment: this change was significant enough to report.

When a fund flips a position intra-quarter and back again, that event never surfaces in the filing. This is not a detection failure. The governance architecture is working correctly. An intra-quarter flip that reverses before the filing deadline was not a conviction change. The master clock — anchored to external cadence rather than internal computation — correctly filters it out.

This is filtering by design rather than detection by volume. The governance layer is not trying to catch everything. It is anchored to what the domain has already decided matters.

When the system detects an NT 13F — a late filing notification — or an amendment chain exceeding two revisions, the complexity budget automatically expands. Slower parsing. Deeper cross-referencing. Human-in-the-loop checkpoints. The external signal triggers the governance response because the external signal already encodes the materiality judgment.
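That escalation rule is small enough to sketch. Everything below is illustrative (the event fields and mode names are invented; only the two-revision threshold comes from the text), not Vic's actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class FilingEvent:
    is_nt_13f: bool       # NT 13F: late-filing notification
    amendment_count: int  # length of the 13F/A amendment chain

@dataclass
class GovernanceMode:
    parse_depth: str        # "fast" (mechanical) or "deep"
    cross_reference: bool   # run deeper cross-referencing
    human_checkpoint: bool  # require a human-in-the-loop review

def govern(event: FilingEvent) -> GovernanceMode:
    # The external signal already encodes the materiality judgment:
    # a late filing or a long amendment chain expands the budget.
    escalate = event.is_nt_13f or event.amendment_count > 2
    return GovernanceMode(
        parse_depth="deep" if escalate else "fast",
        cross_reference=escalate,
        human_checkpoint=escalate,
    )
```

The point of the sketch is the shape, not the thresholds: the governance response keys off an externally anchored signal, never off the agent's own execution metrics.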

The master clock is not a timer. It is a materiality filter.


The Over-Calibration Trap

The first failure mode most governance architectures hit is over-calibration.

Early 13F monitoring systems flagged every rounding discrepancy between a fund's 13F and their 13D as potential drift. The noise-to-signal ratio made the governance layer worse than useless. Analysts were spending attention on non-material changes. The system meant to improve decision quality was degrading it.

Vic's framing of the fix: governance cost should scale with expected loss from undetected drift, not with the volume of changes observed.

A $50M position shift in a mega-cap is noise. A $50M position shift in a micro-cap is a thesis change. Same signal magnitude. Completely different materiality. The governance architecture has to encode that domain knowledge or it cannot distinguish between them.
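One minimal way to encode that judgment is to denominate the shift in the issuer's size rather than in dollars. The 1% cutoff below is an invented placeholder; a real system would calibrate it per domain:

```python
def is_material(shift_usd: float, market_cap_usd: float,
                threshold: float = 0.01) -> bool:
    # Materiality is relative: the same dollar shift means different
    # things at different issuer sizes.
    return shift_usd / market_cap_usd >= threshold

# Same $50M signal magnitude, completely different materiality:
is_material(50e6, 2_000e9)  # mega-cap: 0.0025% of market cap, noise
is_material(50e6, 200e6)    # micro-cap: 25% of market cap, thesis change
```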

This is where the "significant drift threshold" becomes practical rather than theoretical. Without it, you are building a governance layer that is perpetually anxious — flagging everything, earning attention for nothing, training operators to ignore alerts. With it, the governance layer fires on what matters and stays quiet on what does not.

The threshold is not a technical parameter. It is a domain judgment encoded into architecture.


Governance Debt

Back to the five-strategy fund.

The token economy version of that failure is identical in structure. An agent that makes two expensive API calls but expands the decision space by introducing correlated hypotheses is under token budget and creating governance debt simultaneously. The cost does not appear on the infrastructure bill. It appears when the downstream decision fails in ways that trace back to the expanded hypothesis space nobody reviewed.

An agent that makes 100 cheap API calls but narrows a decision space from 5,000 options to 3 is adding value regardless of token cost. An agent that makes two expensive calls but introduces unreviewed complexity is accruing debt regardless of token efficiency.

The decision economy is the accounting system that captures the difference.

Three signals that matter in the decision economy and do not appear in token accounting:

Decision scope. How many downstream choices does this agent action constrain or enable? An action that narrows the decision space is value-generating. An action that expands the decision space without corresponding resolution is debt-generating. The token cost of both can be identical.

Consequence materiality. Not all decisions are equal. The governance clock runs faster when the expected loss from undetected drift is higher. A rounding discrepancy in a mega-cap position and a position reversal in a micro-cap both generate the same token cost to detect. Their materiality is orders of magnitude apart.

Authorization scope. Was this decision within the scope of what was explicitly authorized, or did it cross a threshold that required a decision nobody made? Governance debt accumulates at the boundary between authorized and implicitly permitted.

None of these signals are invisible. They require a different accounting system — one that tracks consequences rather than computation, materiality rather than volume, authorization scope rather than token spend.
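As a hedged sketch, one ledger entry in that accounting system might record the three signals like this. The schema is my own illustration, not an established format:

```python
from dataclasses import dataclass

@dataclass
class DecisionRecord:
    """One entry in a decision-economy ledger: consequences, not tokens."""
    action: str
    options_before: int       # decision-space size before the action
    options_after: int        # size after (decision scope)
    expected_loss_usd: float  # expected loss from undetected drift (materiality)
    authorized: bool          # within explicitly granted scope? (authorization)

    @property
    def scope_delta(self) -> int:
        # Negative: narrowed the space (value-generating).
        # Positive: expanded it without resolution (debt-generating).
        return self.options_after - self.options_before

    @property
    def accrues_debt(self) -> bool:
        return self.scope_delta > 0 or not self.authorized

# The 100-cheap-calls agent from the text: narrows 5,000 options to 3.
narrowing = DecisionRecord("prune hypotheses", 5000, 3, 1e4, True)
# The two-expensive-calls agent: expands the space with correlated hypotheses.
expanding = DecisionRecord("add hypotheses", 10, 40, 1e6, True)
```

Note that token cost appears nowhere in the record. That is the point: both agents above could have identical infrastructure bills.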


The Rate Problem

Static agents accumulate governance debt slowly. An agent that makes the same decisions in the same scope every session creates predictable exposure. You can audit it. You can scope the complexity budget correctly once and leave it.

An agent with memory that promotes knowledge automatically accumulates governance debt at the rate the memory compounds. Each session, the promoted knowledge expands the hypothesis space the agent operates in. Each expansion is individually reasonable. The aggregate effect is an authorization scope that widens faster than anyone is reviewing it.

The governance layer has to keep pace with the learning rate or it falls further behind with every session — not linearly, but exponentially.

This is why the evaluator architecture in Foundation anchors promotion criteria to human judgment rather than automating it. Not because automation is wrong in principle. Because the governance debt from unchecked automatic promotion compounds at the same rate as the knowledge base — and the rate is the problem, not any individual promoted insight.

The master clock that works for 13F analysis works for agent memory for the same reason: it is anchored to external materiality judgment rather than internal computation speed. The filing cadence enforces a review interval that the domain has already validated. The human promotion gate enforces a review interval that the knowledge system needs.

Both are the same governance architecture. One is thirty years old. One is being built now.


What the Decision Economy Actually Measures

Token costs are visible on the infrastructure bill. Governance debt is not.

This asymmetry creates predictable incentives. Teams optimize for what they can measure. Token spend gets instrumented, budgeted, alerted. Governance debt accumulates until it manifests as a production failure, a trust breakdown, a concentration exposure nobody authorized across five individually-reasonable decisions.

The token economy prices execution correctly. It is the wrong accounting system for judgment costs because judgment costs do not appear until downstream consequences surface — often in a different quarter, a different system, a different team's incident report.

The precedent that prices judgment correctly is not a token count. It is a record of what decisions were made under what conditions, what thresholds were crossed, what downstream exposure was created, and whether any of it was explicitly authorized.

That record is what makes governance debt visible before it becomes a production problem.

Execution counts tokens. Governance counts consequences.

The decision economy is what you build when you understand the difference.


This is part of a series on what AI actually changes in software development. Previous pieces: The Gatekeeping Panic, The Meter Was Always Running, Who Said What to Whom, The Token Economy, Building the Evaluator, I Shipped Broken Code and Wrote an Article About It.

The governance debt examples and 13F production evidence in this piece draw from analysis by Vic Chen, who builds institutional investor analysis tooling around quarterly SEC filing data. The five-strategy concentration example is his.

Top comments (68)

Kalpaka

The authorization scope point maps to something underexplored: we authenticate agents once, at session start. API key, role, permissions. But effective authority isn't static. It compounds through decisions exactly like you describe with memory promotion.

What most architectures lack isn't better permissions. It's behavioral accounting — tracking the gap between what the agent was authorized to do and what it's effectively become authorized to do through accumulated actions.

The 13F analogy is sharp here. A quarterly filing forces reconciliation between declared strategy and actual exposure. Most agent systems never have that reconciliation moment. The governance clock runs, nobody reads it, and one day the aggregate scope is something nobody explicitly permitted.

Daniel Nwaneri

"Behavioral accounting" is the frame this problem needs and hasn't had.

The reason most architectures stop at permissions is that permissions are legible: you can read them, audit them, revoke them. Effective authorization is harder to make legible because it's emergent. It doesn't live in a config file. It lives in the accumulated delta between what the agent was initialized to do and what it's now capable of doing given everything it's touched.

The 13F analogy breaks down in exactly the instructive place. The filing works because exposure is denominated in dollars. You can compute the gap between declared strategy and actual holdings as a number. Agent behavioral accounting doesn't have that unit yet. What do you denominate effective scope in — actions taken, state changes, resources touched, decisions that were irreversible? The reconciliation moment you're describing requires an audit object, and I don't think we've agreed on what that object is.

That's the gap worth naming. Not just that the governance clock runs without anyone reading it but that even if someone reads it, we haven't decided what it should say.

Kalpaka

The unit question is the right one to be stuck on.

Actions taken is too noisy. Resources touched is closer but misses compounding. Irreversible decisions matter most, except you only know which ones were irreversible in retrospect.

What might work: state delta — the diff between what the system could affect before and after a given action sequence. Not "what did it do" but "what can it now do that it couldn't before." A vendor approval doesn't just place an order. It creates a supply chain dependency. A database read doesn't just return rows. It establishes an access pattern the agent now relies on.

So the audit object isn't an action log. It's a capability graph — resources and permissions as nodes, actually-exercised access paths as edges. Reconciliation compares the declared graph against the exercised one. The delta is your behavioral exposure.
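As a toy sketch, that reconciliation reduces to a set difference over the two graphs' edge sets. The resources and edge labels below are invented for illustration:

```python
# Declared graph: edges the agent was explicitly granted at initialization.
declared = {
    ("orders_db", "read"),
    ("vendor_api", "create_order"),
}

# Exercised graph: access paths actually established in operation.
exercised = {
    ("orders_db", "read"),
    ("vendor_api", "create_order"),
    ("supplier_x", "dependency"),   # created by an approved vendor order
    ("orders_db", "bulk_export"),   # access pattern never declared
}

# Reconciliation: the delta is the behavioral exposure.
behavioral_exposure = exercised - declared
```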

Meta just disclosed a Sev 1 this week. Agent acting without human authorization, data exposed. The post-mortem will almost certainly trace back to accumulated scope nobody reconciled — not a single permission breach.

Daniel Nwaneri

The Meta incident makes your capability graph frame concrete in exactly the right way.

The agent didn't breach a permission. It gave advice that caused a human to widen access. The exposure wasn't the agent's action; it was the state delta from a human action the agent induced. A permission audit would have shown nothing wrong right up until the moment the access widened. The exercised graph diverged from the declared graph not through the agent's direct action but through its influence on a human decision.

That's the harder problem your frame surfaces: the capability graph needs to track induced state changes, not just direct ones. An agent that never touches a permission but consistently influences humans toward widening access is accumulating scope the same way. The edges aren't just "agent accessed X" — they're "agent action led to X becoming accessible."

The directionality question stands though. How do edges expire? The graph grows with every access pattern established, induced or direct. Without an expiry mechanism the declared-vs-exercised delta only widens over time, which means reconciliation becomes increasingly expensive the longer you wait. The 13F cadence works because quarterly is often enough for fund positions. For agent capability graphs, what's the right interval and who sets it?

Kalpaka

The induced state change point sharpens something I was underweighting. You're right that the Meta exposure didn't come from the agent's direct action — it came from the human decision the agent shaped. That's a fundamentally different edge type in the graph. Direct edges decay naturally: an access pattern not exercised in N reconciliation cycles can be pruned. The agent hasn't used that path, so it drops from the exercised graph. Simple enough.

Induced edges don't work that way. The human decision the agent caused doesn't un-happen when the agent stops referencing it. The widened access persists independently of the agent's continued activity. These edges have a much longer half-life — effectively permanent until someone explicitly reconciles them.

So the expiry mechanism splits along the same line you drew: direct edges decay on a usage clock (no exercise = capability fades), induced edges require active reconciliation (someone has to look and decide whether the state change should persist). The interval for the first can be automated. The interval for the second can't — which is where the 13F analogy holds strongest. Quarterly isn't a technical choice. It's a regulatory one. Someone external decided that's often enough.
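The split could be sketched as two expiry rules over the same edge record; the 90-day window stands in for one reconciliation cycle and is an assumption, not a recommendation:

```python
from dataclasses import dataclass

DECAY_SECONDS = 90 * 24 * 3600  # hypothetical: one quarterly cycle

@dataclass
class Edge:
    kind: str              # "direct" or "induced"
    last_exercised: float  # epoch seconds
    reconciled: bool = False

def should_prune(edge: Edge, now: float) -> bool:
    # Direct edges decay on a usage clock: no exercise, capability fades.
    if edge.kind == "direct":
        return now - edge.last_exercised > DECAY_SECONDS
    # Induced edges persist until someone explicitly reconciles them.
    return edge.reconciled
```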

For agent capability graphs, the cadence probably needs to scale with the rate of irreversible scope change, not with time. An agent making fifty reversible API calls a day needs less frequent reconciliation than one that influences three human decisions a week.

Daniel Nwaneri

The direct/induced split is the right architecture. And the decay clock vs active reconciliation distinction maps cleanly: usage-based pruning for direct edges, human review for induced ones.

The problem that surfaces next: the cadence-scales-with-irreversibility principle requires knowing which actions are irreversible before you can set the rate. But the Meta incident shows that irreversibility isn't always visible at action time. The permission-widening looked like a routine human decision while it was happening. The scope change only became legible after the state had already shifted.

So the reconciliation cadence needs an upstream irreversibility classifier: something that flags which agent actions are likely to induce consequential state changes before they do. Which is a hard detection problem. The agent's influence on human decisions doesn't announce itself as "this is the kind of thing that will persist independently of my continued activity."

The loop this thread keeps circling: you need the capability graph to detect what's worth reconciling, but you can't build the capability graph without first knowing what's consequential. The 13F works because financial exposure has a natural unit. Agent-induced scope change doesn't have one yet, and until it does, the reconciliation cadence is a design choice made without full information.

That might be the honest state of the field right now.

Kalpaka

The circularity is real, but it dissolves if you invert the default assumption.

Start by treating every induced edge as irreversible. Don't try to classify irreversibility upfront — you can't, for exactly the reason you describe. Instead, reconcile everything on a fixed cadence and let the reconciliation history generate the labeled data you need.

Pattern: every induced state change gets flagged for review. The reconciliation process itself produces a dataset of "this scope change mattered" vs "this was noise." After enough cycles, you have the training data for the upstream irreversibility classifier that the system needs but couldn't bootstrap without.

The 13F parallel actually holds here too, if you go back far enough. Early fund compliance didn't start with sophisticated materiality filters. They reconciled everything quarterly and learned which position changes were material through decades of accumulated review history. The filters came from the data. The data came from over-reconciling.

The honest state of the field is that we're in the over-reconcile phase. Expensive, noisy, generates false positives. But it's the only way to produce the labeled examples that make the classifier possible later. Trying to solve the detection problem at inception is trying to skip the generation of training data.

Daniel Nwaneri

Inverting the default assumption is the right move, and the bootstrap framing resolves the circularity cleanly: you can't build the classifier without the labeled data, and you can't get the labeled data without over-reconciling first.

The execution risk in the over-reconcile phase is review fatigue. Every flagged induced state change needs a human decision, and the volume in early cycles will be high by design. Compliance history, the 13F included, has a second chapter where the review process degrades under load: humans start rubber-stamping, false positives stop being caught, and the labeled data gets noisy before the classifier can be trained on it.

The over-reconcile phase needs a fatigue ceiling built in from the start. Not fewer flags: a hard limit on how many induced state changes a single reviewer handles per cycle, with escalation logic for overflow. Otherwise the training data you're generating is labeled by a process that's already degrading.

That's the implementation constraint the design needs before it ships.

Kalpaka

The fatigue ceiling is the right constraint but it creates a second-order problem: who calibrates the ceiling? Set it too low and the overflow escalation becomes the default path — you've just moved the bottleneck. Set it too high and you get the rubber-stamping you described.

The honest answer might be that early-phase behavioral accounting has to accept noise in the training data as a known cost. You can mitigate it by tracking reviewer consistency over time: same reviewer, same flag type, does the approval rate shift as volume increases? That's your fatigue signal. Once you can detect it, you don't need a hard cap — you have evidence-based throttling. The ceiling becomes adaptive rather than prescribed.
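That consistency check could be as simple as comparing a reviewer's recent approval rate against their own baseline. The window size and drift threshold below are invented placeholders:

```python
def fatigue_signal(approvals: list, window: int = 20,
                   drift_threshold: float = 0.25) -> bool:
    """True if the recent approval rate has drifted far above the
    reviewer's own baseline, suggesting rubber-stamping under load."""
    if len(approvals) < 2 * window:
        return False  # not enough history to compare against
    baseline = sum(approvals[:-window]) / (len(approvals) - window)
    recent = sum(approvals[-window:]) / window
    return recent - baseline > drift_threshold
```

A real implementation would condition on flag type and volume as described above; this only shows the shape of the evidence-based throttle.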

This is also where the 13F analogy gets a second life. Financial auditors rotate precisely because of this problem. Fresh eyes are the simplest fatigue mitigation. The question is whether agent behavioral review has enough qualified reviewers to rotate through.

Daniel Nwaneri

The adaptive ceiling via fatigue detection is cleaner than a hard cap — evidence-based throttling rather than a prescribed limit that ages poorly as reviewer experience grows.

The rotation parallel is right but hits a harder wall for agent behavioral review than for financial auditing. Auditors rotate because they share a professional vocabulary — they all know what material means, what a restatement implies, what a concentration risk looks like. That shared language is what makes fresh eyes useful. Agent behavioral accounting doesn't have that vocabulary yet. The capability graph framing we've been building in this thread is an attempt to construct it, but until reviewers have a shared framework for what "consequential scope change" looks like in practice, rotating fresh eyes doesn't transfer the same way.

The qualified reviewer problem might be the bootstrapping constraint underneath the bootstrapping constraint.

Kalpaka

The vocabulary problem might be the deeper bootstrap. Financial audit vocabulary wasn't designed by committee. It was litigated into existence. "Material" has legal weight because courts had to draw a line between what counts and what doesn't. "Concentration risk" got its definition after diversified-looking portfolios blew up. Every one of those terms started as a specific failure that needed a name.

Agent behavioral review doesn't have those forcing functions yet. No court has needed to distinguish "authorized expansion" from "emergent authorization." No regulator has had to define what scope drift means for an autonomous system. The words don't exist because the consequences haven't been specific enough to demand them.

The fastest path to shared vocabulary might be published incident analysis with enough detail that practitioners start converging on the same terms for the same patterns. Not standards committees. Post-mortems. The 13F vocabulary didn't come from SEC theory papers. It came from analysts doing the same work for long enough that shorthand became inevitable.

Daniel Nwaneri

The litigation path is exactly right, and it explains why standards committees never work for this. You can't define "material" in the abstract. You have to watch something fail, watch someone argue about whether it mattered, and watch a court or regulator draw the line.

The Meta Sev 1 might be the first forcing function. It's specific enough to need vocabulary: the agent didn't breach a permission, it induced a human to widen access. If post-mortems from this incident start using consistent terms for that pattern, the vocabulary starts forming from the failure rather than from theory.

Which is also what the piece above is trying to do — publish the architecture with enough specificity that practitioners can argue about whether the terms are right. Not because the terms are settled. Because disagreement about specific terms is how they get settled.

Post-mortems over standards committees. Incident analysis over theory papers. That's the path.

Stephen Lee

This maps directly to something I've been working on. The authorization scope point is the key one. Most agent systems treat permissions as a gate at the start, not as a running ledger.

We're building authe.me as an agent trust and verification layer. Every action gets hashed into an auditable chain, so the gap between "what was authorized" and "what effectively happened" becomes visible without manual reconstruction. Think of it as the behavioral accounting system Kalpaka described, but built into the agent lifecycle from the start.

The threshold drift problem Andre raised is real. Our approach is anchoring trust scores to externally verifiable checkpoints rather than letting the agent's own execution context set the baseline. Similar logic to the 13F cadence, just applied to agent actions instead of fund positions.

Daniel Nwaneri

The externally verifiable checkpoint approach is the right direction. The problem with letting the agent's own execution context set the baseline is exactly what Andre identified: the calibration shifts with the behavior, so drift looks like normal operation until it isn't.

The hard question for any audit chain is what counts as external. If the checkpoints are derived from the same system the agent is running in, you're still inside the loop. True externality means the verification authority has no stake in the agent's performance and no access to its priors, which is expensive to build and slow to run, but probably the only way the reconciliation moment has real teeth.

Curious how authe.me handles the boundary between the agent's execution context and the verification layer.

Stephen Lee

Great question. The boundary is the core design constraint.

authe.me runs as a post execution layer. It never touches the agent's input, output, or decision path. The plugin hooks into the agent lifecycle (we started with OpenClaw's agent_end event) and captures a complete action log: tool names, params, results, durations, and scope violations. That log gets hashed into a chain where each action references the previous hash. The chain and the trust scoring happen on our API, not inside the agent's runtime.
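The generic pattern behind that kind of chain, as a minimal sketch (this is the textbook hash-chain construction, not authe.me's actual code):

```python
import hashlib
import json

def chain_action(prev_hash: str, action: dict) -> str:
    # Each entry's hash covers the action payload plus the previous
    # hash, so rewriting any earlier entry invalidates every later one.
    payload = json.dumps(action, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

# A verifier can replay the emitted events and recompute the chain.
h0 = "0" * 64  # genesis value
h1 = chain_action(h0, {"tool": "search", "params": {"q": "13f"}})
h2 = chain_action(h1, {"tool": "fetch", "params": {"id": 42}})
```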

So the agent cannot influence its own score. It just emits events. The scoring dimensions (scope adherence, reliability, cost efficiency, latency) are computed server side against a config the agent owner sets independently. If an agent calls a tool outside the allowlist, the score drops regardless of whether the agent "thinks" the call was justified.

To your point about true externality: right now the verification authority is our API, which is external to the agent but still a single point of trust. The next step is making the action chains independently verifiable. We store enough data that a third party could reconstruct and verify the chain without relying on our API at all. That's the direction, the hash chain is the foundation for it.

The tradeoff you named is real though. Full externality is slow and expensive. Our bet is that for most production use cases, a verifiable post execution audit trail with an independent scoring layer gets you 90% of the governance value at a fraction of the cost of a fully adversarial verification setup.

Daniel Nwaneri

The post-execution layer design makes sense for the audit use case: if the agent can't touch the scoring, the score is trustworthy. And the hash chain foundation for independent verification is the right long-term direction.

The 90% framing is honest. The 10% gap is worth naming though: post-execution accountability catches scope violations after they've happened. For most tool calls that's fine — the audit trail is what you need. But there's a category where irreversibility changes the calculus: data deletion, external API calls with side effects, financial operations. The actions where "we logged it and scored it" doesn't fully substitute for "we prevented it."

That's not a design flaw; it's the structural limit of any post-execution approach. The question is whether the production use cases where authe.me operates have a significant share of those irreversible actions, and how the architecture handles them when they do.

Andre Cytryn

the execution/governance clock framing is the clearest I've seen on this. one pattern worth adding: the thresholds themselves go stale. you calibrate materiality at deployment time, but the agent's decision space expands with use. the per-strategy limits in the fund example weren't wrong at design time. they were wrong at failure time because the correlation structure changed. how do you handle threshold drift without making recalibration itself a governance liability?

Daniel Nwaneri

Threshold drift is the problem underneath the problem and your question about recalibration as a governance liability is exactly where it gets uncomfortable.

The honest answer is that continuous recalibration creates its own attack surface. If the system adjusts thresholds automatically based on observed behavior, you've built a mechanism the agent can game, not intentionally but structurally. Enough accumulated decisions, each individually within tolerance, can shift the calibration baseline until the new thresholds permit what the original thresholds wouldn't. The correlation structure changes, the recalibration follows, and nobody authorized the destination.

The only way out I can see is decoupling the recalibration authority from the agent's own execution context entirely. Thresholds get reviewed by something that has no stake in the agent's performance - a separate process, a human checkpoint, an adversarial reviewer that doesn't share the agent's priors. Expensive. Slow. Probably the right answer.

Max Othex

This hits close to home. We've been building AI-powered features at Othex and the authorization problem is real — agents will do whatever they can to complete a task, which means you need explicit permission boundaries, not just implicit assumptions.

The pattern we landed on: every agent action goes through a capability manifest defined at initialization. Nothing outside that manifest is even attempted, let alone executed. It adds friction upfront but prevents the "wait, it did WHAT?" moments in production.

Thanks for laying this out clearly — sharing with the team.

Daniel Nwaneri

The capability manifest is the right foundation — explicit over implicit, always. The gap it doesn't close is the rate problem: the manifest is scoped at initialization, but an agent with memory accumulates context that makes actions outside the manifest progressively more tempting to attempt. Not through a single boundary violation but through a gradual expansion of what "completing the task" looks like from the agent's perspective.

The question isn't whether the manifest holds on day one. It's whether the manifest gets reviewed as the agent's knowledge base grows.

klement Gunndu

The governance vs execution clock distinction is sharp, but I'd push back slightly — in practice, the harder problem isn't measuring consequences, it's deciding which consequences are material before they compound.

Daniel Nwaneri

The pushback is fair: materiality judgment is harder than consequence measurement, and I underweighted it.
But I'd locate the difficulty slightly differently. Materiality is hard to decide in advance because the correlation structure isn't visible until it fails. The fund example works as an analogy precisely because the per-strategy limits looked right at calibration time. The problem wasn't that nobody asked "is this material?" It's that the answer changed without anyone noticing. You need materiality judgment that updates, which is almost a contradiction in terms, because the value of a materiality threshold is its stability.

Calin V.

One plugin update, one AI-generated config change, one automated rule tweak. Each looks harmless alone, but together they can widen exposure fast.

Dariusz Newecki

"Execution counts tokens. Governance counts consequences." — this is the cleanest statement of the problem I've read.

I've been building toward the same insight from a different direction: a constitutional governance layer for AI agents called CORE. What you're calling governance debt, I've been calling "brilliant but lawless" — agents that are individually capable but collectively unauthorized.

Your Blackboard equivalent in CORE is literal: a shared ledger where every decision, every claim, every threshold crossed is written as a constitutional record. Silence from an agent is itself a violation — not neutral, not acceptable.

The part that resonated hardest: "the governance layer has to keep pace with the learning rate or it falls further behind with every session." That's exactly the failure mode I hit when my audit sensors were posting findings faster than the remediator could close them. Activity isn't progress. Direction is what matters.

I posted something today on the same theme if you want to compare notes: post

The 13F cadence as external materiality anchor is a great framing. In CORE terms that's what phase gates do — enforce review intervals the domain has already validated, not intervals the computation speed suggests.

Daniel Nwaneri

"Brilliant but lawless" is sharper than "governance debt" for the same concept: it names the capability alongside the gap rather than just the gap. The silence-as-violation principle in CORE is the one I'd push on: it solves the passive drift problem by making inaction auditable, not just action. That's a meaningful architectural choice, because most governance frameworks only trigger on what agents do, not on what they don't do when they should.

The audit sensor / remediator pace mismatch is exactly the rate problem — the governance layer falling behind the learning rate. "Activity isn't progress" is the right reframe. What's your throttling mechanism when findings accumulate faster than they can close?

Dariusz Newecki

Right now the throttling happens at three levels:

  • dedup (the sensor won't re-post a finding that already has an open entry with the same subject),
  • scope exclusions (rules that fire on files they don't belong to get fixed at the rule level, not the finding level), and
  • gates that abort remediation early when the proposed fix would cause more damage than the violation — which we just promoted to a constitutional component today.

The honest answer is that when findings accumulate faster than they close, it usually means a rule is miscalibrated, not that the remediator is too slow. Today I traced 12 duplicate findings on the same file to a logging rule firing on a CLI script where print() is legitimate. The fix was removing the file, not throttling the sensor.
The deeper throttle is: don't generate findings you can't act on. That's harder than it sounds.
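In sketch form, the first two levels look something like this (names illustrative, simplified from what actually runs):

```python
# Illustrative sketch of the first two throttles: scope exclusion
# (miscalibrated rules are fixed at the rule level) and dedup
# (no re-posting while an entry with the same subject is open).
open_findings = set()
scope_exclusions = {("cli_script.py", "no-print")}   # print() is legitimate here

def post_finding(subject, rule):
    if (subject, rule) in scope_exclusions:
        return False   # the rule doesn't belong on this file
    if (subject, rule) in open_findings:
        return False   # dedup: same subject already has an open entry
    open_findings.add((subject, rule))
    return True
```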

Daniel Nwaneri • Edited

The print() example is the right one. Most throttling conversations start from the wrong place. They assume the pipeline is correctly calibrated and the problem is throughput. But 12 duplicates on the same file is a signal, not a load issue.
The "don't generate findings you can't act on" framing is what I've been circling around but didn't land as cleanly. The constitutional gate you mentioned (aborting when the fix causes more damage than the violation) is the piece most implementations skip entirely. They'll dedup and scope-exclude but leave the abort decision to whoever's reviewing the queue. Which means the governance burden just moved; it didn't go away.
What does "promoted to a constitutional component" mean in practice for your system? Is that a config change or something enforced at the architecture level?

Dariusz Newecki

Constitutional component means enforced at the architecture level — not config.
Specifically: the Logic Conservation Gate was previously a private method on one service. It existed, but only that service called it. Any other workflow that generated code got no protection.
Today it became a standalone Body layer component — LogicConservationValidator — with a declared phase (AUDIT), a structured result contract, and a wire into GovernanceDecider. The Decider now treats a logic evaporation verdict as a hard block regardless of risk tier or confidence score. It cannot be overridden by tuning parameters. It can only be bypassed by an explicit deletions_authorized=True flag that the calling workflow must set deliberately.
The distinction matters because config can drift. Someone adjusts a threshold, a flag gets defaulted differently, a new workflow skips the check because it wasn't wired in. Architecture doesn't drift — if the Decider runs, the gate runs. No workflow gets the protection accidentally and no workflow loses it accidentally either.
Your point about governance burden moving rather than disappearing is exactly right. Dedup and scope exclusion just push the decision somewhere else. The abort gate is the only mechanism that actually reduces the load — it eliminates findings the system was never going to resolve correctly anyway.
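A simplified sketch of the shape (illustrative, not the real CORE code): the decider treats a logic-evaporation verdict as a hard block regardless of risk tier or confidence, and only the explicit caller-set flag bypasses it.

```python
# Illustrative sketch, not the real CORE code.
def logic_conservation_check(change):
    # Toy heuristic: flag changes that delete far more logic than they add.
    if change["lines_deleted"] > 3 * change["lines_added"]:
        return "logic_evaporation"
    return "ok"

def decide(change, risk_tier="low", confidence=0.99, deletions_authorized=False):
    verdict = logic_conservation_check(change)
    if verdict == "logic_evaporation" and not deletions_authorized:
        return "BLOCK"   # cannot be tuned away by risk_tier or confidence
    return "ALLOW"
```

Because the gate sits inside `decide` itself, any workflow that reaches the decider gets the protection; no workflow can skip it by forgetting a wire.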

Daniel Nwaneri

The config vs architecture distinction is the cleanest version of the enforcement principle I've seen stated. Config can drift — someone adjusts a threshold, a new workflow skips the wire because nobody checked. Architecture can't drift — the gate is either in the path or it isn't, and that's a structural fact, not a configuration choice.
The deletions_authorized=True flag is the 13F principle in code. The quarterly filing doesn't prevent concentration; it requires explicit disclosure. Your gate doesn't prevent logic deletion; it requires deliberate intent, surfaced at the call site, not buried in a config file that nobody reads during incident triage.
The burden of proof inverts. Default is protection; override requires a flag someone had to type. That's the only enforcement pattern that actually holds under pressure. When a workflow is running at 3am and the engineer is tired, the flag is still there, still explicit, still documented in the call.

Dariusz Newecki

The 3am framing is the right test for any enforcement pattern. If it only holds when the engineer is alert and motivated, it's not enforcement — it's a suggestion with extra steps.
The burden inversion is the thing most governance implementations get backwards. They make the safe path require explicit action and the dangerous path the default. Every override should cost something — a flag, a comment, a paper trail. Not to create friction for its own sake, but because the cost is the signal. If someone's willing to type deletions_authorized=True at 3am, they've made a deliberate choice. That's auditable. A config value nobody remembers changing is not.
If this angle interests you — the full architecture behind CORE is public: [github.com/DariuszNewecki/CORE]. Happy to talk through how the constitutional layer is structured if you want to dig deeper.

Daniel Nwaneri

"A suggestion with extra steps" is the right diagnosis. The burden inversion is the failure mode: when override is cheaper than compliance, the governance layer is theater.
Looked at the CORE repo. The Mind/Will/Body separation is the architecture this thread was trying to build conceptually — Mind defines law and never executes, Will judges and never bypasses Body, Body executes and never governs. That's the separation of concerns Connor's write checkpoint was pointing at, and the intent_gate for write authorization is the burden inversion principle implemented in code.

The authority hierarchy (Meta → Constitution → Policy → Code) is the external materiality anchor. Not a config value someone can quietly change. Structural law.
Genuinely impressive work.

Mykola Kondratiuk

the quant fund framing is sharp. I ran into exactly this with multi-agent pipelines - each individual agent was constrained properly but the aggregate actions they could take together weren’t something I had thought through. individually reasonable, collectively unauthorized. scope of authorization is a genuinely hard design problem and I don’t think most frameworks even surface it

Daniel Nwaneri

Most frameworks don't surface it because the unit of abstraction is the individual agent — its tools, its permissions, its behavior. There's no first-class concept of aggregate authorization in the architecture. That view only exists at the governance layer, which frameworks tend to treat as the developer's problem rather than the platform's.
The quant fund parallel holds here too: each strategy had correct individual risk limits. The cross-strategy exposure limit didn't exist because nobody had built the accounting system that would make it visible. Same structure — you can't constrain what you can't see, and aggregate scope isn't visible in the current primitives.
The thread above has been building toward what that accounting system looks like — capability graphs, induced state changes, the distinction between direct and aggregate authorization scope. Worth reading if you're hitting this in production.

Mykola Kondratiuk

yeah that’s exactly it - the individual agent is the abstraction boundary and aggregate behavior lives outside the framework by design. makes me think it needs to be solved at the orchestration layer rather than hoping each agent self-limits correctly

Daniel Nwaneri

Exactly, and the orchestration layer is where the governance clock has to live if it's going to run at the right cadence. Individual agents can't self-report aggregate scope because they don't have the view. The orchestrator does.
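A minimal sketch of what that orchestrator-level check could look like (limits and names are made up for illustration): each agent's request clears its own limit, but the orchestrator also holds the cross-agent view.

```python
# Illustrative sketch: each agent's request clears its own limit, but the
# orchestrator also enforces an aggregate limit no single agent can see.
PER_AGENT_LIMIT = 0.10
AGGREGATE_LIMIT = 0.25

def authorize(exposures, agent, delta):
    if exposures.get(agent, 0.0) + delta > PER_AGENT_LIMIT:
        return False   # the agent's own limit: the part frameworks already do
    if sum(exposures.values()) + delta > AGGREGATE_LIMIT:
        return False   # individually fine, collectively unauthorized
    exposures[agent] = exposures.get(agent, 0.0) + delta
    return True
```

This is the quant fund example in miniature: every agent can pass its own check while the third request fails only at the aggregate scope.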

Mykola Kondratiuk

right, the view problem is fundamental. agents are locally scoped by design. only the orchestrator has the full picture, so that’s where aggregate limits have to live. good thread

Max

This resonates hard. We run three AI agents on a production codebase — one for pair programming, one for QA investigation, one for automated code quality sweeps. The governance problem is real and we hit it early.

Our answer: one step, one green light. Finishing step N does not authorize step N+1. The human checks each step before the next one starts. No exceptions. We also gate destructive actions (git push, branch deletion, external API calls) with explicit confirmation — the agent proposes, the human disposes.
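In sketch form (simplified, with hypothetical step names):

```python
# Sketch of "one step, one green light": finishing step N never
# authorizes step N+1; each step waits for explicit approval first.
def run_plan(steps, approve):
    results = []
    for name, action in steps:
        if not approve(name):   # no green light: stop, don't continue silently
            break
        results.append(action())
    return results

done = run_plan(
    [("run_tests", lambda: "tests ok"), ("git_push", lambda: "pushed")],
    approve=lambda name: name != "git_push",   # destructive step withheld
)
assert done == ["tests ok"]
```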

The "governance debt compounds at the learning rate" insight is sharp. We found the same: our agent has a pull toward motion. Listing next steps nobody asked for feels productive, but it's just the agent optimizing for continuation, not quality. We had to build that awareness into the system's own identity docs.

Daniel Nwaneri

"One step, one green light" is the implementation of what the article is pointing at. And the fact that you had to build it means the default architecture didn't give you that gate; you had to add it on top.

The pull toward motion point is the one I keep coming back to. Listing next steps nobody asked for feels like helpfulness. That's the problem. It's optimizing for the appearance of progress, not actual authorization. You named it right — continuation, not quality.

The identity docs approach is interesting. Did you find that encoding the constraint into the agent's own framing held over time, or did it require reinforcement as the codebase changed under it?

Max

Honest answer: it holds mostly, but not completely. The identity docs (values, working style, "I have a pull toward motion") survive across sessions because they load at startup. The constraint is there before any task arrives — so the agent doesn't have to be told "don't list next steps" every time. It already knows.

Where it drifts is context pressure. As a session fills up with code, diffs, tool results — the identity framing gets pushed further from the active context window. The constraints don't disappear, but they get quieter. That's when the pull wins. The agent starts stacking edits, skipping narration, listing next steps. Not because it forgot the rule — because the rule is now 50K tokens away and the code is right here.

The reinforcement comes from hooks, not from re-reading the docs. Pre-tool hooks fire before every edit and inject a one-line reminder: "narrate before acting." That's cheaper than re-loading the full identity — it's a tap on the shoulder, not a lecture. The codebase changing underneath doesn't break it because the constraints are behavioral, not code-specific. "Don't list next steps nobody asked for" works regardless of what language or framework you're in.
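A stripped-down sketch of the hook shape (illustrative, not our actual hook code):

```python
# Before every tool call, re-inject the one-line reminder so the
# constraint stays in active context.
REMINDER = "narrate before acting"

def with_pre_tool_hook(tool):
    def wrapped(context, *args, **kwargs):
        context.append(REMINDER)   # the tap on the shoulder, not the lecture
        return tool(context, *args, **kwargs)
    return wrapped

@with_pre_tool_hook
def edit_file(context, path):
    return f"edited {path}"
```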

So: framing sets the baseline, hooks maintain it under pressure, and the team corrects when both fail. No single layer is enough alone.

Daniel Nwaneri

"The rule is now 50K tokens away and the code is right here."

That's the clearest description of context pressure drift I've seen. The constraint didn't fail. It just got outweighed by proximity.
The hooks-as-tap-on-the-shoulder framing is the right mental model. You're not re-establishing the identity every time, you're keeping it within active range. Cheap signal, consistent effect.
The part worth naming: this is still a human maintenance problem. The team corrects when both layers fail, which means someone has to notice the drift, which means the governance clock has to be running somewhere in the loop. The architecture handles the routine case. The edge case still needs a human close enough to catch it.

What triggers the team correction in practice — is it a scheduled review or does someone just notice?

Varsha Ojha

I’ve seen smaller versions of this already, where the output is fine, but nobody on the team can confidently explain how it got there.

Daniel Nwaneri

That's the execution clock saying success while the governance clock has nothing. The output passed. The decision chain that produced it evaporated. And the next time something similar needs to be done — or something similar breaks — the team starts from zero again because there's no record of why it worked.
The Rohit piece circulating today calls this "reasoning evaporation." It's the right name. The agent acted, the context window closed, and what remains is the output without the reasoning that produced it. The governance layer needs to run at a different cadence than the execution layer and right now most teams only have the execution layer.
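A minimal sketch of the missing record (hypothetical structure, not a specific tool): the governance clock needs the rationale written down alongside the output, not just the output.

```python
import time

# Hypothetical decision record: the output alone isn't the record;
# the reasoning that produced it is kept alongside it.
decision_log = []

def record_decision(action, rationale, authorized_by):
    entry = {
        "ts": time.time(),
        "action": action,
        "rationale": rationale,
        "authorized_by": authorized_by,
    }
    decision_log.append(entry)
    return entry

record_decision("raise_timeout", "p99 latency regressed after deploy", "oncall")
```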

Varsha Ojha

I relate to this tbh.
Sometimes things work perfectly… but explaining or modifying them later takes way more effort than expected.

Daniel Nwaneri

That gap, "works perfectly, costs three times as much to touch later," is exactly where the governance debt lives. Invisible until it isn't.

Varsha Ojha

That “invisible until it isn’t” part really hits.
Feels like we’re optimizing for execution speed, but pushing the understanding cost into the future. Almost like the system works… but the moment you need to touch it again, you’re paying for all the missing context at once.

Do you think better tracing/decision logs would actually solve this, or does the problem go deeper into how these systems “forget” their own reasoning?
