An employee at a mid-sized company opens her internal HR agent and types "cancel my contractor agreement with vendor X." The agent works out what to do, calls the procurement API, and the contract is cancelled. She gets a confirmation in chat and goes back to her morning.
Now look at the procurement system's audit log. Its job, set up years before this AI agent existed, is to record who authorised every action affecting supplier contracts, so that finance or legal can later trace any change back to a responsible person. For this cancellation, what does the log show? In the vast majority of agent deployments running today, it shows a request from a service account called something like hr-agent-bot. The bot has standing permissions to cancel contracts because that is what makes it useful. The log records that the bot acted. It does not record which employee asked, what reason she gave, or whether the cancellation she described was the cancellation the agent actually performed. She signed in earlier that morning through single sign-on with a hardware token, and none of that travelled with the procurement call.
I have seen the same pattern in every agentic system I have reviewed for clients over the past year, across retail support agents, internal IT agents, and finance copilots. The chat layer always knows who the user is. The system on the receiving side of the tool call never does. Whatever happened in between gets compressed into a service-account API request that is indistinguishable from any other application talking to that backend.
I picked up the phrase "last mile" from the telecom industry. Running fibre across hundreds of miles of backbone is cheap per metre. The cost balloons when the fibre has to enter someone's actual building and connect to whatever wiring is already there. The agent stack has the same imbalance. Reasoning, protocols, and the language model itself are modern. Most useful actions still have to land on a system that was designed for something else, Salesforce or a Postgres database or a payment processor or a ticketing API, often built before AI agents were a category. They have no native way to carry a delegation chain, no way to attach user intent to an incoming request, and no way to refuse a call on attributes the agent did not provide. They trust the agent the way they would trust any other connected application.
The blind spot is bigger than "who asked." The employee supplied an instruction with constraints baked into it: vendor X, not vendor Y, cancel and not pause, this contract specifically and not the whole relationship. That intent lived in the chat conversation. By the time it reached the procurement system, it had been flattened into cancel_contract(contract_id=12345). If the agent's interpretation drifted, because of a prompt injection in the contract metadata or because it picked the wrong row from a list of three, the procurement system has no way to detect the drift. It receives the call, sees that the calling bot has permission, and writes the row.
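To make the flattening concrete, here is a minimal sketch of what the receiving side would need in order to notice drift. It is not modelled on any real procurement API: `Intent`, `ToolCall`, `CONTRACTS`, and `check_drift` are hypothetical names, and the sketch assumes the chat layer could forward the user's constraints alongside the call, which is exactly what most deployments do not do today.

```python
from dataclasses import dataclass

# Hypothetical contract table standing in for the procurement backend.
CONTRACTS = {
    12345: {"vendor": "vendor X", "status": "active"},
    12346: {"vendor": "vendor Y", "status": "active"},
}


@dataclass(frozen=True)
class Intent:
    """Constraints the user expressed in chat (assumed to be forwarded)."""
    user: str
    vendor: str
    action: str  # "cancel", as opposed to "pause"


@dataclass(frozen=True)
class ToolCall:
    """What the backend receives today: an action name and an ID."""
    action: str
    contract_id: int


def check_drift(intent: Intent, call: ToolCall) -> list[str]:
    """Return every mismatch between the stated intent and the actual call."""
    contract = CONTRACTS.get(call.contract_id)
    if contract is None:
        return [f"unknown contract {call.contract_id}"]
    problems = []
    if contract["vendor"] != intent.vendor:
        problems.append(
            f"vendor mismatch: user asked for {intent.vendor}, "
            f"call targets {contract['vendor']}"
        )
    if call.action != intent.action:
        problems.append(
            f"action mismatch: user asked to {intent.action}, "
            f"call would {call.action}"
        )
    return problems


intent = Intent(user="employee@example.com", vendor="vendor X", action="cancel")
faithful_call = ToolCall(action="cancel", contract_id=12345)
drifted_call = ToolCall(action="cancel", contract_id=12346)  # wrong row picked
```

With only `drifted_call` in hand, the backend sees a permitted bot and a valid contract ID. The mismatch becomes visible only when the `Intent` travels with the request.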
When I started looking into this, I expected an identity-propagation problem with a clean engineering answer. After reading the published reports on Microsoft 365 Copilot's EchoLeak vulnerability, the Slack AI cross-channel exfiltration, Zenity Labs' Copilot Studio attack, Replit's production-database deletion, and the OpenClaw and Moltbook disclosures from early 2026, I now think the situation is more structural. The fix is real. It has been deployed at internet scale for years inside the kind of distributed systems infrastructure that runs hyperscale platforms, and the token-exchange protocol it depends on (RFC 8693) was standardised at the IETF in January 2020. The piece that has not been solved is the call from your agent into the model provider itself. The Python client your application uses to talk to a model provider takes an API key once at construction and reuses it for the lifetime of the process. There is no callback to refresh it per request, no field to attach the current user, and no clean lifecycle hook to drop it on sign-out. The backend leg of the last mile is solvable today with open-source and cloud-native primitives. The agent-to-model leg waits on an SDK contract change none of the providers have made.
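The difference between the two contracts can be sketched in a few lines. Neither class below is a real provider SDK: `StaticKeyClient` only mirrors the construct-once pattern described above, and `PerRequestClient`, `fake_broker`, and the `X-On-Behalf-Of` header are hypothetical illustrations of what a per-request credential hook might look like.

```python
from typing import Callable


class StaticKeyClient:
    """Hypothetical client: the key is fixed at construction and reused."""

    def __init__(self, api_key: str) -> None:
        self._api_key = api_key

    def headers(self) -> dict[str, str]:
        # Every request carries the same long-lived secret and no user.
        return {"Authorization": f"Bearer {self._api_key}"}


class PerRequestClient:
    """Hypothetical alternative: fetch a credential for each request."""

    def __init__(self, credential_provider: Callable[[str], str]) -> None:
        self._provider = credential_provider

    def headers(self, user_id: str) -> dict[str, str]:
        return {
            "Authorization": f"Bearer {self._provider(user_id)}",
            "X-On-Behalf-Of": user_id,  # hypothetical header name
        }


issued: list[str] = []


def fake_broker(user_id: str) -> str:
    """Stand-in for a broker that mints a fresh short-lived token per call."""
    token = f"short-lived-{user_id}-{len(issued)}"
    issued.append(token)
    return token


static_client = StaticKeyClient("long-lived-key")
brokered_client = PerRequestClient(fake_broker)
```

In this sketch `static_client.headers()` returns the same secret on every call, while `brokered_client.headers("alice")` yields a new token each time and names the user it was minted for. That second shape is the contract change the paragraph above is waiting on.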
The four green boxes know who the user is. The two red boxes never will, with the current default. The dotted arrow is the last mile.
tl;dr
- Two things vanish at the same hop in every agent stack: the user's identity (who asked) and the user's intent (what they actually wanted). Both turn into a generic service-account API call by the time the backend sees them.
- Every major agentic security incident from 2024 to mid-2026 fits this shape: EchoLeak, the Slack AI exfiltration, Copilot Studio AIjacking, Replit's production-database deletion, and the 2026 wave around Moltbook and OpenClaw.
- The fix is a credential broker between the agent and the backend. It propagates user identity through RFC 8693 token exchange, evaluates the request against an open-source policy engine like Cedar, Open Policy Agent, Cerbos, OpenFGA, or Teleport's policy engine, and issues short-lived audience-bound credentials through standard token-exchange and workload-identity primitives. The architectural pattern has been deployed at hyperscale for years; BeyondProd's public engineering documentation from 2019 is the canonical write-up of the same pattern in a non-agentic setting.
- For the architecture to close end-to-end, the broker needs a sibling on the model side: a transparent proxy on the inference path that captures every prompt and response, detects prompt injection in real time, and stamps every interaction with the same correlation ID the broker uses. LLMTrace was built for that role.
- The structural blocker: the agent's own call into the model provider still requires a long-lived API key in memory, because the OpenAI, Anthropic, and Google client SDKs take the key once and never offer to refresh it. Until that contract changes, the broker pattern fixes the backend leg of the last mile and leaves the model-provider leg unprotected.
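As a rough illustration of the token-exchange step in the broker bullet, the sketch below builds the form body of an RFC 8693 request. The parameter names and URN values come from the RFC; the function name, token strings, audience URL, and scope are placeholder assumptions, and the sketch stops short of sending anything to a real authorization server.

```python
from urllib.parse import parse_qs, urlencode

# Identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_body(
    user_token: str, agent_token: str, audience: str, scope: str
) -> str:
    """Build the form-encoded body for a token-exchange request.

    subject_token identifies the user the agent acts for;
    actor_token identifies the agent itself.
    """
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": user_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "actor_token": agent_token,
        "actor_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,  # the one backend this credential is for
        "scope": scope,
        "requested_token_type": ACCESS_TOKEN_TYPE,
    }
    return urlencode(params)


# Placeholder values for illustration only.
body = build_token_exchange_body(
    user_token="user-sso-token",
    agent_token="hr-agent-bot-token",
    audience="https://procurement.example.com",
    scope="contracts:cancel",
)
fields = parse_qs(body)
```

The body would be POSTed to the authorization server's token endpoint with content type `application/x-www-form-urlencoded`. The issued token can then name both the user and the agent, which is what lets the backend's audit log record who asked as well as which bot acted.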


