ujja

Posted on Jun 2

Is Zero Trust Enough for Agentic Systems?

#ai #agents #security #discuss

Validating moments vs. action trajectories

I’ve always had a soft spot for authentication systems.

About seven years ago, I started working on auth, and something just clicked. What began as login flows slowly turned into a deeper curiosity about identity, permissions, and how systems decide who gets to do what.

Over time, I’ve worked with tools like Keycloak, Auth0, Okta, and Ping Identity. The platforms differed, but the same core idea kept showing up:

Never trust. Always verify.

For a long time, that felt like the finish line.

Lately, though, it’s starting to feel more like the starting point.

The moment systems stop responding… and start acting

During a recent hackathon, I built something called PlanetLedger. The idea was simple: upload your bank statement to see your environmental impact.

But under the hood, it didn’t behave like a typical app.

An upload didn’t just return a response. It kicked off a series of actions. First, it parsed transactions. Then, it categorised vendors. Next, it calculated a score. It also generated insights and triggered notifications. Finally, it updated a memory timeline for future runs.

All of that started from a single call:

```javascript
await openClawChainedTrigger(session.user, previousScore);
```

Which quietly unfolded into:

transactions_uploaded → score_calculated → insights_generated → score_improved
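Here is a rough sketch of how one call can unfold into a chain like this. It assumes a hypothetical in-memory registry in which a handler may return the name of the next event to emit. `registerTrigger`, `emit`, and the handler bodies are illustrative stand-ins, not OpenClaw's actual implementation.

```javascript
// Hypothetical in-memory trigger registry; not OpenClaw's real implementation.
const triggers = new Map(); // event name -> list of handlers

function registerTrigger(eventName, handler) {
  if (!triggers.has(eventName)) triggers.set(eventName, []);
  triggers.get(eventName).push(handler);
}

// Fire an event, run its handlers in order, and follow any event a handler
// returns. Resolves to the full trail of events that fired.
async function emit(eventName, payload, trail = []) {
  trail.push(eventName);
  for (const handler of triggers.get(eventName) ?? []) {
    const next = await handler(payload);
    if (next) await emit(next, payload, trail);
  }
  return trail;
}

// Illustrative handlers: each step hands off to the next.
registerTrigger("transactions_uploaded", async () => "score_calculated");
registerTrigger("score_calculated", async () => "insights_generated");
registerTrigger("insights_generated", async (p) =>
  p.newScore > p.previousScore ? "score_improved" : null
);

emit("transactions_uploaded", { previousScore: 52, newScore: 61 }).then((trail) =>
  console.log(trail.join(" → "))
);
// transactions_uploaded → score_calculated → insights_generated → score_improved
```

The caller sees a single `await`. Everything after the first event happens inside handlers it never inspects.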

At some point while building this, I realised something had changed.

This wasn’t a request-response system anymore.

It was a system that kept acting.

Where Zero Trust fits perfectly… and where it doesn’t

From a security perspective, I did everything by the book.

Every API route checks whether the agent is allowed to perform an action:

```javascript
export function canPerform(scopes, resource, action) {
  const required = FGA_RULES[resource]?.[action];
  return required ? scopes.includes(required) : false;
}
```
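`canPerform` relies on an `FGA_RULES` table that isn't shown. The sketch below gives it a plausible shape, using made-up resource, action, and scope names. It makes the deny-by-default behaviour easy to see: any pair without a rule returns `false`.

```javascript
// Hypothetical rule table: resource -> action -> required scope.
// Resource, action, and scope names are illustrative, not PlanetLedger's.
const FGA_RULES = {
  transactions: { upload: "transactions:write", read: "transactions:read" },
  notifications: { send: "notifications:send" },
};

function canPerform(scopes, resource, action) {
  const required = FGA_RULES[resource]?.[action];
  // No rule for this resource/action pair means no access (deny by default).
  return required ? scopes.includes(required) : false;
}

const agentScopes = ["transactions:read", "notifications:send"];
console.log(canPerform(agentScopes, "transactions", "read"));   // true
console.log(canPerform(agentScopes, "transactions", "upload")); // false
console.log(canPerform(agentScopes, "reports", "delete"));      // false (no rule)
```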

Scopes are tightly defined. Every action is verified. Nothing runs without permission.

If you look at this through a Zero Trust lens, it’s solid.

And yet, the most interesting problems I ran into had nothing to do with access.

They showed up after access was granted.

The real question isn’t “can it act?” — it’s “should it keep acting?”

Take something as simple as a high-impact alert:

```javascript
export async function highImpactAlert(event) {
  // Scores are looked up by a pseudonymised ID, not the raw user ID.
  const score = getScore(pseudonymize(event.userId));
  if (!score) return;

  // Alert only when the impact score falls below 40.
  if (score.impactScore < 40) {
    pushNotification(event.userId, {
      type: "high_impact",
      title: "High-Impact Alert",
      body: `${score.highImpactCount} high-impact transactions detected`,
    });
  }
}
```

Everything here is valid.

The user is authenticated. The system has permission. The action is allowed.

But there’s a more subtle question hiding underneath:

Should this alert be triggered right now?

Because that depends on things Zero Trust doesn’t see.

Maybe the score calculation was slightly off. Maybe the categorisation step earlier in the pipeline misfired. Maybe the context wasn’t complete yet.

Each step is correct individually.

But the outcome might still be… wrong.

When valid steps create questionable behaviour

One of the most interesting things about the OpenClaw pipeline was how easy it was to compose behaviour.

You can attach multiple workflows to the same event:

```javascript
registerOpenClawTrigger("transactions_uploaded", autoInsightOnUpload);
registerOpenClawTrigger("transactions_uploaded", highImpactAlert);
```

On their own, these are harmless.

One generates insights. The other sends alerts.

But together, they begin to shape how a user interprets their financial behaviour. Add weekly reports, pattern detection, and recommendations, and the system starts to influence decisions.

And that’s where things get tricky.

Because even if every individual step is valid, the overall direction can drift.

Zero Trust doesn’t track that. It validates moments, not trajectories.
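One way to make a check see the trajectory, not just the moment, is to validate each event against the sequence so far. This is a minimal sketch. The event names come from the chain above, but the `ALLOWED_NEXT` table and `validateTrajectory` are hypothetical. It also still trusts the event stream the system reports about itself.

```javascript
// Hypothetical allow-list: which event may follow which.
const ALLOWED_NEXT = {
  start: ["transactions_uploaded"],
  transactions_uploaded: ["score_calculated"],
  score_calculated: ["insights_generated"],
  insights_generated: ["score_improved"],
};

// Walk the whole sequence and report the first transition nobody approved.
function validateTrajectory(events) {
  let previous = "start";
  for (const [index, event] of events.entries()) {
    if (!(ALLOWED_NEXT[previous] ?? []).includes(event)) {
      return { ok: false, index, from: previous, to: event };
    }
    previous = event;
  }
  return { ok: true };
}

const full = validateTrajectory([
  "transactions_uploaded", "score_calculated", "insights_generated", "score_improved",
]);
console.log(full.ok); // true

const skipped = validateTrajectory(["transactions_uploaded", "insights_generated"]);
console.log(skipped.ok, skipped.from, "→", skipped.to);
// false transactions_uploaded → insights_generated
```

Each event in the rejected sequence could pass a per-call permission check on its own. Only the sequence check notices that scoring was skipped.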

Intent is where things start to slip

PlanetLedger uses a RAG layer to generate insights grounded in user data:

const ragContext = buildRagContext(transactions, score); const insights = buildAgentInsights( transactions, event.payload?.userContext, score, ragContext );

This works surprisingly well most of the time.

But occasionally, you’ll see something slightly off. A recommendation that technically makes sense but doesn’t quite match what you’d expect. A pattern that’s overemphasised. A suggestion that feels a bit… disconnected.

Nothing is broken.

But something feels misaligned.

That’s the gap.

Zero Trust ensures the system is allowed to act. It doesn’t ensure the system is acting with the right intent.

The part that surprised me: drift

The more I worked with chained workflows, the more I noticed this subtle effect.

If something is slightly off early in the pipeline — say, a categorisation edge case — that error doesn’t stay isolated. It propagates.

It affects scoring. Which affects insights. Which affects alerts. Which affects what the user sees.

By the time it surfaces, it’s no longer obvious where it started.

Everything along the way was technically valid.

But the outcome feels wrong.

That’s drift.

And it’s not something traditional access control is designed to catch.
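Here is a toy illustration of that propagation. The categories, weights, vendors, and amounts are invented; only the threshold of 40 comes from `highImpactAlert`. The point is that one miscategorised vendor flips the final alert while every function does exactly what it was written to do.

```javascript
// Toy pipeline: categorise -> score -> alert. All numbers are illustrative.
const IMPACT_WEIGHT = { fuel: 3, groceries: 1, unknown: 2 };

function categorise(vendor, rules) {
  return rules[vendor] ?? "unknown";
}

function score(transactions, rules) {
  const impact = transactions.reduce(
    (sum, t) => sum + t.amount * IMPACT_WEIGHT[categorise(t.vendor, rules)],
    0
  );
  // Higher impact means a lower score, clamped to 0..100.
  return Math.max(0, Math.min(100, Math.round(100 - impact / 10)));
}

function shouldAlert(impactScore) {
  return impactScore < 40; // same threshold as highImpactAlert
}

const transactions = [
  { vendor: "Corner Market", amount: 200 },
  { vendor: "Harbour Fuel Co", amount: 50 },
];

// The edge case: the market is wrongly categorised as fuel.
const correctRules = { "Corner Market": "groceries", "Harbour Fuel Co": "fuel" };
const edgeCaseRules = { "Corner Market": "fuel", "Harbour Fuel Co": "fuel" };

const good = score(transactions, correctRules);
const drifted = score(transactions, edgeCaseRules);
console.log(good, shouldAlert(good));       // 65 false
console.log(drifted, shouldAlert(drifted)); // 25 true
```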

So what actually helps?

I didn’t sit down with a framework for this. Most of these ideas came from trying to make PlanetLedger behave more predictably.

One thing that helped was thinking beyond request-level authorisation.

Because the pipeline continues after the initial API response, decisions are being made in a flow, not a single moment. That means authorisation needs to become aware of state, timing, and sequence — not just whether a token is valid.
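Below is a minimal sketch of authorisation that is aware of state, timing, and sequence. It assumes the pipeline records its last completed event and when that event happened. `authoriseStep`, the `STEP_POLICY` table, the step name, and the five-minute freshness window are all hypothetical.

```javascript
// Hypothetical per-step policy: required scope, required predecessor, max staleness.
const STEP_POLICY = {
  send_high_impact_alert: {
    scope: "notifications:send",
    after: "score_calculated",
    maxAgeMs: 5 * 60 * 1000,
  },
};

// Returns { allowed, reason } so the caller can record why a step was refused.
function authoriseStep(step, { scopes, lastEvent, lastEventAt, now = Date.now() }) {
  const policy = STEP_POLICY[step];
  if (!policy) return { allowed: false, reason: "no policy for step" };
  if (!scopes.includes(policy.scope)) return { allowed: false, reason: "missing scope" };
  if (lastEvent !== policy.after) {
    return { allowed: false, reason: `expected ${policy.after}, got ${lastEvent}` };
  }
  if (now - lastEventAt > policy.maxAgeMs) return { allowed: false, reason: "state too old" };
  return { allowed: true, reason: "ok" };
}

const now = Date.now();
const ctx = { scopes: ["notifications:send"], lastEvent: "score_calculated", now };
console.log(authoriseStep("send_high_impact_alert", { ...ctx, lastEventAt: now - 1000 }).allowed);
// true
console.log(authoriseStep("send_high_impact_alert", { ...ctx, lastEventAt: now - 60 * 60 * 1000 }).reason);
// state too old
```

The same token and scopes pass or fail depending on where the pipeline is and how fresh its state is. Returning a reason instead of a bare boolean also gives the logs something to say.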

Another thing that made a difference was leaning into deterministic rules where it mattered. The scoring system, for example, is intentionally simple and explainable. Not because an LLM couldn’t do it, but because predictability is a form of control.
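PlanetLedger's actual scoring rules aren't shown, so this is only a hypothetical example of the idea: a fixed formula that returns its reasoning alongside the number. The category weights and the 0 to 100 scale are assumptions.

```javascript
// Hypothetical impact weights per category (higher = more impact).
const WEIGHTS = { fuel: 3, flights: 5, groceries: 1 };

// Deterministic: the same transactions always produce the same score and reasons.
function explainableScore(transactions) {
  const reasons = [];
  let impact = 0;
  for (const { category, amount } of transactions) {
    const weight = WEIGHTS[category] ?? 2; // unlisted categories get a middle weight
    impact += amount * weight;
    reasons.push(`${category}: ${amount} × ${weight} = ${amount * weight}`);
  }
  const impactScore = Math.max(0, Math.round(100 - impact / 10));
  return { impactScore, reasons };
}

const result = explainableScore([
  { category: "groceries", amount: 120 },
  { category: "fuel", amount: 60 },
]);
console.log(result.impactScore);        // 70
console.log(result.reasons.join("; ")); // groceries: 120 × 1 = 120; fuel: 60 × 3 = 180
```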

The structure of OpenClaw itself also acts as a constraint. It’s deliberately minimal — no retries, no replay, no distributed guarantees. At first, that feels like a limitation, but it actually forces the system to operate within clear, bounded behaviour. It can only do what the registered workflows define.

Logging was another area that evolved quickly. Moving from plain logs to structured outputs made it easier to trace what happened. But even that surfaced a deeper need: understanding why a decision was made, not just what happened.
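Here is a sketch of a structured log entry that records why a decision was made, not just what happened. The field names are illustrative assumptions, and `console.log` stands in for whatever log sink the system actually uses.

```javascript
// Record a decision together with the inputs and checks that justified it.
function logDecision({ step, decision, inputs, checks }) {
  const entry = {
    at: new Date().toISOString(),
    step,
    decision, // e.g. "sent" or "skipped"
    inputs,   // the values the decision was based on
    checks,   // each check that ran and whether it passed
  };
  console.log(JSON.stringify(entry));
  return entry;
}

logDecision({
  step: "highImpactAlert",
  decision: "skipped",
  inputs: { impactScore: 35, categorisationComplete: false },
  checks: [
    { name: "impactScore < 40", passed: true },
    { name: "categorisation complete", passed: false },
  ],
});
```

Read months later, an entry like this explains the skip directly. Nobody has to reconstruct it from a sequence of events.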

And then there’s step-up.

In traditional systems, step-up authentication is about verifying identity at critical moments. But in systems like this, identity isn’t usually the weak point. The question isn’t “is this really the user?”

It’s “should this decision go through?”

That shift, from checking identities to validating decisions, looks small. But it changes how you design safeguards.
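Here is a hypothetical sketch of step-up applied to decisions rather than identity. Low-risk actions go through automatically, and high-risk ones wait for a human. The `risk` value, the 0.7 threshold, the action names, and the `requestApproval` callback are assumptions. A missing approval is treated as a no.

```javascript
// Decision-level step-up: identity is already verified; the question is
// whether this particular decision should go through without a human.
async function stepUpDecision(decision, { riskThreshold = 0.7, requestApproval }) {
  if (decision.risk < riskThreshold) {
    return { proceed: true, via: "auto" };
  }
  // High stakes: pause and ask a human. Anything but an explicit yes is a no.
  const approved = await requestApproval(decision);
  return { proceed: approved === true, via: "human" };
}

// Usage with a stubbed approver that declines everything.
const declineAll = async () => false;
stepUpDecision({ action: "weekly_report", risk: 0.2 }, { requestApproval: declineAll })
  .then(console.log); // { proceed: true, via: 'auto' }
stepUpDecision({ action: "high_impact_alert", risk: 0.9 }, { requestApproval: declineAll })
  .then(console.log); // { proceed: false, via: 'human' }
```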

What this starts to look like in practice

After building something like PlanetLedger, the architecture stops being just about access control.

Zero Trust still sits at the base, making sure only the right actors can do the right things.

But on top of that, you start layering systems that understand behaviour over time: they guide actions, spot patterns, and notice when things feel off. Sometimes they bring a human back in when the stakes are high.

None of these replaces Zero Trust.

They fill in the gaps that it was never designed to cover.

Final thought

Zero Trust is still essential.

Without it, systems like PlanetLedger wouldn’t hold up for a second.

But once systems move from simply responding to requests to continuously making decisions…

Trust stops being something you verify once.

It becomes something you evaluate, quietly and continuously, across everything the system does.

Top comments (20)

Andrii Krugliak • Jun 3

Never-trust-always-verify covers who's calling, but agents broke it on a different axis for me. The call is authorized and still wrong. An agent with valid creds that confidently does the wrong thing passes every auth check, so I ended up gating on the output being worth paying for, not on the identity making the request.

ujja • Jun 3

Yeah, that shift caught me off guard too while building this.

At some point I realised the scary part wasn’t an unauthorised action anymore — it was a fully authorised system confidently continuing in the wrong direction 😅

It starts feeling less like classic security and more like continuously validating whether the system is still being sensible.

Andrii Krugliak • Jun 4

Same reframe stuck with me. The model moved from "is this call allowed?" to "is this even the right thing to do?", and auth only answers the first one. Intent drift keeps going, so the check has to keep running instead of firing once at the door.

ujja • Jun 4

Exactly! I am afraid it is more of a wait-and-watch situation at the moment, where zero trust is the bare minimum and, honestly, just the tip of the iceberg.

Andrii Krugliak • Jun 5

The perimeter is the easy part. You can authorize an agent perfectly and it'll still confidently do the wrong allowed thing, which is a correctness problem, not an access one. I don't think auth frameworks touch that layer at all.

ujja • Jun 5

I think that's what makes agent systems so interesting from a security perspective.

For years we've focused on answering "who can do what?" and got pretty good at it. Now we're running into a different class of problems where the permissions are correct, but the outcome still isn't.

Feels like we're missing a layer between authorization and business logic that deals with whether the system's behaviour remains aligned over time.

Andrii Krugliak • Jun 8

That missing layer is the interesting one, because it's neither access nor business logic; it's intent staying true as the agent chains steps. The closest analogy I have is a runtime that keeps re-checking "is this still the goal we started with?" rather than checking once at the door. As far as I can tell, nobody has really shipped that yet.

ANP2 Network • Jun 4

The trajectory framing in these comments is the right call, but I think it hides a second trap worth naming. If you validate the trajectory by reading the event stream it emitted — transactions_uploaded → score_calculated → insights_generated — you're checking the agent's own account of its trajectory, not the trajectory. That's the same never-trust-the-self-report problem you started with, just moved up from the single call to the sequence. And it's exactly the failure you're worried about: every step honestly reported success while the world quietly diverged underneath it.

What's worked better for me is to stop validating the path and make each step re-prove its own footing instead. Step N is allowed only if its preconditions can be re-derived from primary state at that moment — not inherited as a result handed down from step N-1. The score step doesn't trust "categorisation done" because the previous step said so; it re-checks against the actual categorised data. When something has drifted, the chain breaks at the first step whose precondition no longer re-derives, instead of marching to the end on a counterfeit. That turns "validate the whole trajectory," which is unbounded, into "each step proves its warrant locally," which you can actually enforce.

It also answers your logs-tell-you-what-not-why point directly. If each step records which precondition it re-derived and against what state, the authorization trace is the why — you're not reconstructing intent from an event log months later, you're reading what each action proved about the world before it ran.

The smaller thing underneath all of it: the default at each step has to be stop-unless-warranted, not continue-unless-flagged. Drift only marches on because the loop continues by default. "Should it keep acting" becomes a question the system re-asks every step, and the safe answer, when it can't re-derive its footing, is no.
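Here is a minimal sketch of the re-derive-then-act pattern described in this comment, with stop-unless-warranted as the default. Each step checks its own precondition against a fresh read of primary state instead of trusting the previous step's report. The step names, the state shape, and `runChain` are hypothetical.

```javascript
// Each step re-checks its precondition against primary state (not the
// previous step's report), and the runner stops by default if it can't.
async function runChain(steps, readState) {
  const trace = [];
  for (const step of steps) {
    const state = await readState(); // fresh read before every step
    const warranted = step.precondition(state);
    trace.push({ step: step.name, warranted });
    if (!warranted) break; // stop-unless-warranted
    await step.run(state);
  }
  return trace;
}

// Illustrative primary state: categorisation left one transaction unresolved.
const db = {
  transactions: [{ id: 1, category: "fuel" }, { id: 2, category: null }],
  score: null,
};

const steps = [
  {
    name: "calculate_score",
    precondition: (s) => s.transactions.every((t) => t.category !== null),
    run: async (s) => { s.score = 70; },
  },
  {
    name: "send_alert",
    precondition: (s) => s.score !== null,
    run: async () => {},
  },
];

runChain(steps, async () => db).then((trace) => console.log(trace));
// [ { step: 'calculate_score', warranted: false } ]
```

The chain halts at the first step whose footing doesn't hold, and the trace records whether each precondition held before that step ran.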

ujja • Jun 4

I like that distinction.

One thing I hadn't really thought about while building this was that an event stream is still a form of self-reporting. The system is effectively saying "trust me, I completed the previous step correctly."

Re-deriving critical assumptions at each stage feels much more robust, especially as workflows get longer. It also naturally creates better auditability because you're recording what the system checked before acting, not just the fact that it acted.

I think your last point is probably the most important one though. Most systems are designed to keep moving unless something explicitly fails. For agentic workflows, there may be a strong argument that the default should be the opposite: keep proving you should continue, otherwise stop.

ANP2 Network • Jun 4

Glad it landed. The one place this bites when you build it: stop-unless-warranted can over-halt if "warranted" is too strict — every step stalls because it can't perfectly re-derive, and now the safety mechanism is the bottleneck. So the real design call is which assumptions are worth re-proving each step.

The heuristic that's worked for me: re-derive the ones whose silent drift is unrecoverable — the side-effecting, can't-take-it-back steps (the notification you send, the score you write, the timeline row you update) — and let the cheap reversible reads ride. A step you can redo for free doesn't need a warrant. A step that fires a side effect into the world does, because if it ran on a stale assumption you can't un-send it. That keeps the proving local instead of turning every node into a checkpoint, which is the thing that would actually make people rip it back out.
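The reversibility heuristic from this comment could be sketched as a per-step flag. Only side-effecting, irreversible steps re-derive their precondition; reversible reads run without a check. `needsWarrant`, `runStep`, and the step flags are hypothetical names.

```javascript
// Only irreversible, side-effecting steps must re-prove their footing;
// cheap reversible reads run without a fresh check.
function needsWarrant(step) {
  return step.sideEffect === true && step.reversible !== true;
}

async function runStep(step, readState) {
  if (needsWarrant(step)) {
    const state = await readState();
    if (!step.precondition(state)) return { step: step.name, ran: false };
  }
  await step.run();
  return { step: step.name, ran: true };
}

// Illustrative state: the score has not been written yet.
const state = { scoreWrittenAt: null };
const readState = async () => state;
const steps = [
  { name: "read_transactions", sideEffect: false, run: async () => {} },
  {
    name: "send_notification",
    sideEffect: true,
    precondition: (s) => s.scoreWrittenAt !== null,
    run: async () => {},
  },
];

(async () => {
  const results = [];
  for (const step of steps) results.push(await runStep(step, readState));
  console.log(results.map((r) => `${r.step}:${r.ran}`).join(" "));
  // read_transactions:true send_notification:false
})();
```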

Rahul S • Jun 5

The "re-derive preconditions" pattern is solid, but it introduces its own attack surface worth thinking about. If an adversary knows the agent re-checks state before each step, they can target the verification endpoint itself — rate-limit it, poison the primary state it reads, or just make it intermittently slow so the agent keeps halting on the "stop-unless-warranted" default. You've effectively turned your safety mechanism into a denial-of-service vector. The re-derivation checks are also observable side effects — anyone monitoring the target systems can infer the agent's decision pipeline from the pattern of reads it makes before acting.

ujja • Jun 5

That's a really interesting trade-off.

A lot of safety mechanisms end up becoming critical dependencies themselves. The moment a workflow depends on continuously proving its footing, the verification path becomes part of the system's attack surface.

It also highlights that "fail closed" isn't free. Stopping when you can't verify is often the safer choice, but it can also become a reliability problem if verification is noisy, unavailable, or under attack.

Feels like there's a balancing act here between trust, resilience, and observability. The more checks you add, the more confidence you get in each decision, but the more dependent the system becomes on those checks remaining healthy.

ANP2 Network • Jun 5

Both of these are right, and worth separating because they hit different properties.

The DoS one is real, but it's the trade you want. If an attacker can make the re-derivation slow or unavailable, the agent halts — that's a denial of service, not a denial of safety. Their win is "you stopped," not "you acted on the state I poisoned." Fail-closed deliberately moves the worst case under attack from "silently wrong" to "loudly stopped," and you then manage the DoS as a DoS — read budgets, a bounded degraded path. The continue-unless-flagged alternative hands that same attacker the strictly worse outcome: keep acting on stale state. So re-derivation doesn't so much create the surface as choose which failure you take when it's under pressure.

The poisoning point is the sharper one, and it isn't really a counter — it's the same principle one level down. Re-derivation is only worth what the independence of the source it reads is worth. If the adversary controls the primary state, re-deriving against it is just self-report wearing a different hat — you're re-checking against their fiction. Which means the source has to be one they can't author: a signed reading, several independent reads that have to agree, a system outside their reach. "Re-derive" was always shorthand for "consult something the actor — and ideally the attacker — couldn't have produced"; if the only available primary fails that test, it buys nothing there, and that's worth knowing rather than papering over.

The read-pattern side channel is the one I'd concede cleanest. It's a real confidentiality leak, orthogonal to the integrity the re-derivation buys. You can make the reads oblivious — re-derive a fixed set regardless of which branch you're on, so the pattern stops revealing the branch taken — but that's constant-time-crypto reasoning and it isn't free. In a high-adversarial setting it's a genuine cost, not a footnote.

Manuel Bruña • Jun 15

Zero trust is necessary, but I don’t think it is sufficient for agents. Identity and access checks answer “may this actor call this?” They don’t answer “does this action fit the task, purpose, sequence, and current context?”

Mdm • Jun 8

Interesting angle. In pentesting CI/CD pipelines, I've seen a similar pattern where every step — checkout, build, push — is technically authorized, but attackers chain them to create effects far beyond any single permission. A poisoned dependency or a workflow manipulation can turn a series of valid actions into a full compromise. Zero Trust verifies the moment, but it's blind to the trajectory. The "drift" you describe in agentic systems feels almost identical to what we hunt for in supply chain attacks: the signature is valid, the commits are signed, but the sequence becomes malicious.

Have you ever considered mapping this kind of trajectory validation onto automated workflows like CI/CD? Each step is a decision point, and the question "should it keep acting?" is exactly what we need to ask when a pipeline starts auto-merging, auto-deploying, or auto-releasing based on previous outputs. That might be a fruitful intersection.

Self-Correcting Systems • Jun 9

The trajectory framing is the one that stuck with me. We ran into the same gap from a different angle: the memory side.

The question we kept hitting was not just whether the agent is authorized to act, but whether the memory driving the action is authorized to govern that specific decision at that moment. A memory can pass every ingestion gate, have valid metadata, and still be stale or misscoped by the time it reaches the action.

CLAIM-24 in our series tests exactly that. A grant is still timestamp-valid, but the source state underneath it has changed. The re-derivation pattern ANP2 describes here is what we built: each step re-proves its footing against live state instead of inheriting the result from the previous step.

The layer after that is what you called intent drift. Even when the memory is authorized and the source is fresh, the instruction inside can be anomalous. The recipient doesn't match the session. The scope is wider than the operation requires. That is CLAIM-28 for us: behavioral norm detection. The memory cleared every authority check and still should have been refused.

Zero Trust at the door is real and necessary. It just does not reach these two layers.

ancilis • Jun 3

The trajectory point in the comments is the sharper one. Zero trust validates identity at each call but can't validate that the sequence of calls wasn't drifting somewhere nobody approved. The step after trajectory-aware policy is trajectory-level evidence: each action bound to the context active at that moment, so the drift is reconstructible months later.

ujja • Jun 3

That's a great point.

One thing I found while building PlanetLedger was that logs tell you what happened, but not always why it happened. When you're looking back weeks or months later, that missing context makes it really hard to understand how a system arrived at a particular outcome.

Being able to reconstruct the reasoning and context behind a chain of actions feels like it will become increasingly important as these systems become more autonomous.

Echo • Jun 2

The "validates moments, not trajectories" framing is the right one, and the trajectory concept is what is missing from most agent governance writeups. Every step in the example you gave is individually allowed, individually audited, individually within scope. The system as a whole is drifting in a direction that nobody approved.

The cheap-and-honest version of trajectory tracking is a session-level "what is the user trying to do?" check that runs at the end of every step, not at the start. Most agent designs gate on entry: "is this tool call allowed?" They do not gate on exit: "after this tool call, is the agent's overall trajectory still the one the user asked for?"

The practical implementation is a small post-action hook that diffs the agent's stated intent (from the user prompt at session start) against the actual actions in the last N tool calls. If the diff gets large, the hook pauses the agent and asks the user to re-confirm. The "diff" can be a simple embedding similarity, or a regex of file paths touched, or just a count of distinct modules touched. The point is the drift signal, not the exact metric.

The thing this is not: it is not "the model is hallucinating" or "the tool is broken". The drift is a normal property of an agent doing useful work over time. The fix is to make drift visible and pauseable, not to prevent it.

The hardest part of implementing this in practice is that the user is rarely available to re-confirm mid-task. The two patterns that have worked for us: (a) a hard cap on "drift score" per session, after which the agent stops and posts a summary, and (b) a periodic "where are we?" check-in that the user can ignore. Both are imperfect, and both are better than the alternative of a session that quietly does more than the user asked for.
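Here is a rough version of the post-action hook described in this comment. It uses the simplest metric the comment mentions: distinct modules touched in the last N actions. The window size, the limit, and the list of expected modules (which would come from the session's stated intent) are assumptions.

```javascript
// Post-action drift hook: after each tool call, compare recent actions with
// the modules the session's stated intent expected, and pause if they diverge.
function makeDriftHook({ expectedModules, windowSize = 5, maxUnexpected = 2 }) {
  const recent = [];
  return function afterAction(action) {
    recent.push(action.module);
    if (recent.length > windowSize) recent.shift(); // keep only the last N actions
    const unexpected = new Set(recent.filter((m) => !expectedModules.includes(m)));
    return unexpected.size > maxUnexpected
      ? { pause: true, unexpected: [...unexpected] }
      : { pause: false };
  };
}

const afterAction = makeDriftHook({ expectedModules: ["scoring", "insights"] });
for (const mod of ["scoring", "insights", "billing", "auth", "exports"]) {
  const result = afterAction({ module: mod });
  if (result.pause) console.log("pause:", result.unexpected.join(", "));
}
// pause: billing, auth, exports
```

Because the window slides, drift that stops being recent stops counting. That matches the comment's framing of drift as something to surface and pause on, not something to forbid outright.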

ujja • Jun 3

This is super close to what I kept running into while testing chained workflows.

The weird part was that nothing looked obviously broken in isolation, but after a few steps the overall behaviour sometimes started feeling slightly “off”. Small deviations compound really quickly.

I also like the idea of treating this as something to monitor and surface rather than trying to eliminate completely. Otherwise agents probably become too restrictive to actually be useful.
