the boring prerequisites nobody built: identity, permissions, and trust in agent stacks

ten reddit threads in a week, and the pattern is the same every time. builders are shipping agents that work. they're not shipping agents they trust.

the cost threads get the attention — "our agent burned $400 in an afternoon" is a headline. but Jesse Whitney's roundup surfaces the quieter problem underneath: the agent pitch scaled faster than the boring infrastructure. identity hygiene. permissions. trust. cost discipline as harness engineering. these are the things teams keep calling "not the interesting part" and then getting burned by in production.

here's the gap as i see it after building in this space for a while.

the default trust model for most agent stacks is: if the agent has the API key, it has the permission. that's not a trust model, that's an absence of one. API key = permission is how you end up with the "deny-list routing" pattern that's appearing in Reddit threads — teams bolting access controls onto an architecture that was never designed for them.

the interesting thing about the deny-list pattern is what it reveals. deny-listing is a reactive control. you add a rule after something went wrong. permissions done right are proactive: the agent is authorized to do X in context Y, and anything outside that envelope is rejected before it fires. those are fundamentally different architectures, not different configurations of the same one.

the cost discipline threads are the same story wearing different clothes. "iteration is expensive for sustained testing" on r/LocalLLaMA isn't a model complaint — it's a governance complaint. the problem isn't that inference is priced wrong. the problem is that teams have no enforcement point between "agent decides to call a tool" and "invoice arrives." they have dashboards. dashboards tell you what happened. they don't stop the next call from happening.

the three failure modes i see most in production:

no spending identity. each tool call is anonymous. you know money left the account — you don't know which agent session authorized it or whether that session had a budget for it. post-mortems are archaeology.

unbounded retry loops. a failing agent that retries without backoff hits its token ceiling and keeps retrying until something external stops it. the usual external stop is the billing alert. the billing alert fires after the damage is done.

delegation without auditing. multi-agent architectures where agent A delegates to agent B inherit agent A's permissions by default. if agent A is over-authorized, every downstream agent in the chain is too. nobody sees this until something downstream does something unexpected with a permission it never should have had.

the teams getting this right are treating cost discipline as identity infrastructure, not as a dashboard feature. the enforcement point is at the tool call, not the invoice. each agent session has an explicit spending envelope. delegation passes a constrained scope, not a full inheritance. anomalous spend triggers a policy response before the transaction fires — not an alert after.

MnemoPay is the implementation of that pattern. per-invocation budget gates, Agent FICO scoring (300-850 based on behavioral signals across sessions), HITL approval above a spending threshold, and a 3ms P99 decision before any payment call goes out. 672+ tests on the policy evaluation path — the test count matters because every production transaction runs through it.

the reason i keep coming back to the phrase "boring infrastructure" is that it's the right frame. the teams that ship agents their risk teams can sign off on aren't doing anything exotic. they're doing the boring prerequisite work: explicit permissions, traceable delegation, spend limits that enforce rather than alert. the interesting layer on top of that works because the boring layer holds.

if you're building agent stacks and hitting the identity or spend-control wall: https://getbizsuite.com/mnemopay — the documentation covers the trust model and the enforcement architecture. worth a read before the next incident.

— jeremiah

DEV Community

the boring prerequisites nobody built: identity, permissions, and trust in agent stacks

the boring prerequisites nobody built: identity, permissions, and trust in agent stacks

Top comments (0)