Real Agency in Modern Agents
The agent says it will check, continue, verify. Then the process exits. Nothing comes back.
Not because the models are bad. The models are getting better so fast, that complaining about them already feels outdated by the time the sentence is finished. The problem is not that they cannot write code, summarize a document, call a tool, or generate a plan. They can, often brilliantly.
The problem is that most of what we call “agentic” still behaves like just an end-to-end execution triggered by a prompt.
You ask. It answers. It stops.
Maybe it calls a tool somewhere in the middle, or streams a lot of confident text. Maybe the demo looks like the future for thirty seconds.
But when, and it is just a matter of when it loses context, hits a failure, promises to do something later, or quietly stops before the work is actually done, there is often nothing underneath it that makes the system come back.
That is not agency.
That is a very impressive function call.
I do not want better autocomplete
I know what autocomplete is. I've used it before deep agents. It was useful...
What I care about is delegation and agency. Real agency. The kind where I can hand a directive to a system, walk away for a while, and come back to evidence: what was attempted, what changed, what failed, what still needs a human, and why.
That is a very different bar from “the model produced a plausible answer.”
The difference sounds obvious until you watch these systems fail. An assistant can be helpful with one answer. A real agent needs to survive the space between answers. It needs to know that a partial result is not the same as completion. It needs to know that “I will do this” is not the same as doing it. It needs to know when to keep going, when to recover, and when to stop lying to itself and ask for help.
The model is not the agent.
The loop is the agent.
The lie of “I’ll do that”
The most annoying failure mode in agentic systems is annoyingly polite.
The agent says it will check something. It says it will continue or that it will assign the next step. It says it will verify the result. The language sounds like work is about to happen.
Then, for some reason, the process exits.
Nothing comes back. No proof. No correction. No escalation. Just a sentence that had the shape of agency and none of the behavior or outcome.
This is where a lot of teams fool themselves. They see a model produce agentic language and confuse that for agentic execution. But agency is not tone. It is not a persona. It is not saying “I’ll take care of it” with confidence.
Agency is operational and it doesn't need you.
It shows up in the boring parts: state, recovery, validation, evidence.
If the system cannot tell the difference between a promise and a completed action, it does not have agency.
The loop is the agent
The mental model that keeps coming back to me is simple.
One-shot software looks like this:
ask -> answer -> stop
That is fine for questions. It is terrible for work.
Work looks more like this:
goal -> attempt -> observe -> classify -> continue / recover / stop
When I say “the loop is the agent,” I do not mean the loop alone is some magical entity. Obviously the harness, the tools, the model, the state, and the surrounding product all matter. I mean it metaphorically: the thing that creates agency is not the single model call. It is the repeating structure that lets the system attempt, observe, correct, and continue.
Each word matters.
Goal means the system is trying to move toward an outcome, not just complete a text turn.
Attempt means the model gets a bounded chance to act. Not infinite freedom. Not a magical open-ended process. One attempt.
Observe means the system records what actually happened. Not what the model said happened. What happened.
Classify means the system decides whether the attempt is complete, partial, failed, blocked, or just a polite promise with no substance behind it.
Then, and only then, should it decide what comes next.
Continue if progress is real. Recover if the failure is useful. Stop if the system needs a human decision. Do not blindly rerun the same prompt twelve times and call it autonomy.
The shiny feature is the agent doing work.
The real feature is the harness that knows whether the work landed.
What workflow systems already know
None of this is new if you have spent time around reliable systems.
Long-running work has always needed state. Distributed systems have always needed retries. Serious automation has always needed idempotency, logs, limits, and escalation paths. The language is different now because the worker is a model, but the shape of the problem is familiar.
You need a durable trace because without memory, agents are bound to repeat their steps and mistakes.
You need validation because “looks done” is how broken work ships.
You need attempt limits because retrying forever is not sustainable.
You need recovery because partial progress is valuable.
You need a way to stop because some problems really do require a human-in-the-loop.
This is the part I think a lot of agent discourse misses. Everyone wants to talk about reasoning. Reasoning matters. But in real systems, reasoning is only one dimension. A brilliant model inside a fragile harness is still fragile.
If the harness treats the last message as truth, the whole system inherits the model’s worst habit: sounding finished before it is finished.
Skills are how the loop remembers
One pattern I have become increasingly interested in is skills.
Not “skill” as a marketing word. I mean small reusable packages of procedure: instructions, checklists, scripts, examples, validation steps, and recovery paths that teach an agent how to handle a recurring kind of work.
This is important because a serious agent system should not make the same mistake forever.
If a failure happens once, maybe it is noise. If it happens twice, it is a signal. If it happens three times, it should probably become a procedure.
Humans work this way too. We build habits. We write runbooks. We create checklists. We learn which parts of a task are fragile, and we stop trusting ourselves to remember them perfectly every time.
Our brains are not fixed. They change through practice, repetition, feedback, and recovery. That is how we learn. That is how habits form. That is why you can improve at almost any age if you keep pushing into new patterns instead of letting yourself calcify.
Agent systems are not brains. Software is not biology.
But the lesson still carries: improvement requires the system to change after experience.
For agents, that change does not have to be mystical. It can be boring in exactly the right way. A failed run becomes a checklist. A repeated mistake becomes a validator. A fragile workflow gets a recovery step. A confusing handoff gets clearer evidence requirements.
Real autonomy is not doing the task once.
It is changing the system so the next task starts from a better place.
The real feature is continuation
The more I build around agents, the less impressed I am by a single successful run.
A single run can be luck. A demo can hide all the ugly parts. A prompt can work perfectly when the context is fresh, the input is clean, and nobody asks the system to recover from anything.
Continuation is harder.
Can the system resume after an interruption?
Can it notice that the previous answer was only a promise?
Can it preserve enough context to keep going without starting from zero?
Can it surface evidence instead of confidence?
Can it stop before it causes damage?
That is where trust begins.
Not in the first answer. In the second attempt. In the recovery. In the correction. In the trace that lets a human understand what happened without replaying the entire mess from memory.
The agent that impresses me is not the one that sounds the smartest.
It is the one that comes back.
Small societies of agents
I do not think the future is one giant god-agent doing everything.
That framing feels wrong to me. It is too theatrical, too brittle, too obsessed with intelligence as a single point of power.
The future I care about looks more like small societies of bounded agents, each with a job, a context, a set of tools, and a way to be checked. Not because hierarchy is cool. Because boundaries are how systems stay sane.
One agent explores. One writes. One validates. One summarizes. One asks the human when the system is out of its depth. The exact shape will vary, but the principle stays the same: autonomy needs structure.
Without structure, more agents just means more chaos.
With structure, agents can become something closer to teammates if the workflow around them makes accountability possible.
In this dimension, agency is a contract.
Think in loops
The next wave of useful agentic systems will not come only from better prompts.
It will come from better loops.
Loops that observe. Loops that classify. Loops that recover. Loops that validate. Loops that remember what broke last time. Loops that know when to stop and bring a person back in.
That is the part I love about this space right now. It feels like software engineering again. Not just asking a model to be smarter, but building the environment where intelligence can become reliable work.
I am tired of agents with no real agency because I can see the better version trying to exist.
It is not magic.
It is not AGI.
It is a loop with memory, evidence, recovery, and taste.
If your agent cannot come back, it is not autonomous.
It is just a prompt that ended.

Top comments (0)