The guided runtime — push, not pull.
Status: public draft for Dev.to. Public-safe. A thesis with early, observational
evidence, not a validated result. The controlled experiment is described at the end;
the number will follow once it has run.
I watched my agent slowly lose the thread
I'm building a system that has grown complex. Many entry points. Interfaces that call
interfaces. Docs, contracts, and conventions scattered across the repo. The kind of
codebase where you need a minute to remember how a piece fits.
So I'd hand an agent a task and watch. It would start fine. Then, a dozen steps in,
it would begin to slip — re-deriving where it was, missing an interface it had
already seen, grepping for a doc it had read an hour ago, patching a symptom,
hitting the same wall, patching again. It never stopped to ask whether its whole
approach was wrong. It just kept grinding, locally, until it ran out of road.
I did the obvious thing. I improved the context: better indexing, embeddings,
retrieval, a repo map, a search tool. Give the agent more ways to find what it
needs.
It barely helped. And that's when I realized I was solving the wrong problem.
Everyone's solving the wrong problem
Grep, RAG, repo-maps, semantic search — they're useful. I still use them. But
they mainly make more information available, and they rest on the dominant
assumption that the agent fails because it doesn't know enough about the system.
But watch a capable model fail on a long task and that's not what you see. Given
the right frame, the same model executes long, correct, disciplined chains without
complaint. It doesn't lack intelligence, and on my codebase it didn't lack
information — it had a search tool pointed at everything.
It gets lost because it cannot hold its place in a complex system across a long
horizon. That's an attention problem, not just an information problem.
Retrieval helps the fact layer; it does not, by itself, control attention. Worse:
pull asks an already-drifting model to do the exact thing drift has corroded —
navigate the system, decide what's relevant, and assemble its own context. Hand
agents a precise structural tool and leave it optional, and they often don't even
reach for it. They're lost, and being lost is exactly what keeps you from knowing
what to fetch.
We've been answering the wrong question.
The flip: push, not pull
If the agent can't be trusted to find its context, stop making it. Have the
runtime compute the next step and push it — into the loop, before the agent
acts.
That's the whole move: push, not pull. I stopped building a better library for
the agent to search and started building a thing that tells the agent where it is
and what's legal next, every step. I call it a guided runtime.
It's defined by what it is not:
- Not retrieval as the control loop. Search still exists, but it becomes a fact-layer tool. The agent shouldn't have to decide, while drifting, what context to assemble before every action.
- Not error-feedback (post-hoc). Not "you ran the tests, they failed, fix it." It fires before the action, to keep the trajectory on the rails rather than drag it back after it leaves them.
- Not a one-shot plan. It's recomputed every step from current state, not read off a plan written once at the start.
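To make the contrast with a one-shot plan concrete, here is a minimal Python sketch of the push loop. It assumes a toy event log and a hand-written guide function. `Event`, `compute_guide`, `run_pushed_loop`, and `stub_agent` are hypothetical names for illustration, not Aming Claw's API. What matters is the order of operations: the loop recomputes the guide from current state at every step and hands it to the agent before the agent acts.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch: these names are illustrative, not a real runtime's API.

@dataclass(frozen=True)
class Event:
    kind: str      # what happened, e.g. "file_edited" or "tests_passed"
    subject: str   # what it happened to


def compute_guide(log: List[Event]) -> str:
    """Derive the next legal action from the current log, not from a stored plan."""
    seen = {e.kind for e in log}
    if "file_edited" not in seen:
        return "edit"
    if "tests_passed" not in seen:
        return "run_tests"
    return "close"


def run_pushed_loop(agent: Callable[[str], Event], max_steps: int = 10) -> List[Event]:
    log: List[Event] = []
    for _ in range(max_steps):
        guide = compute_guide(log)  # recomputed every step, before the agent acts
        if guide == "close":
            break
        log.append(agent(guide))    # the agent is handed the guide; it does not search for it
    return log


# Usage: a stub agent that simply does what the guide says.
def stub_agent(guide: str) -> Event:
    kind = "file_edited" if guide == "edit" else "tests_passed"
    return Event(kind, "src/app.py")


print([e.kind for e in run_pushed_loop(stub_agent)])  # ['file_edited', 'tests_passed']
```

In a real runtime, the guide would come from evaluating a contract over a fact model rather than from hard-coded `if` checks. The sketch only shows where that computation sits in the loop.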
The mechanism — Facts + Rules → Guide
A guided runtime factors into three layers. Only the outer two are
domain-specific; the middle engine is reusable.
- Facts — a fresh, queryable model of ground truth. For code, a commit-bound dependency/symbol graph. Generally: a projection over an append-only event log (event sourcing). Current state is the fold of events pinned to an explicit watermark. If the world has moved beyond that watermark, the runtime can say so instead of pretending its snapshot is omniscient.
- Rules — the plan, compiled into something checkable. This is the load-bearing idea: a contract is a plan made verifiable. Where a plan says "implement the feature and add tests," a contract says: these files are the fence, these acceptance predicates must hold, these actions are allowed and these forbidden, this evidence must exist before done.
- Guide — guide = rules ∘ facts. Evaluate the rules against the facts and emit the steering signal: the next legal action, the missing piece, a drift alert, a blocked action. The guide tells; gates enforce. Evidence still has to be written by the right actor, and authoritative gates decide what can close.
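Here is a minimal Python sketch of the three layers, assuming a tiny coding domain. The fact layer folds an ordered event log up to an explicit watermark. The contract holds a file fence and the required evidence. The guide evaluates the contract against the facts. All names (`Event`, `Facts`, `project`, `Contract`, `guide`) are hypothetical. A real contract would also carry acceptance predicates and allowed and forbidden actions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical sketch of facts + rules -> guide; not a real library's API.

@dataclass(frozen=True)
class Event:
    seq: int    # position in the append-only log
    kind: str   # "edited" or "evidence"
    name: str   # file path or evidence id


@dataclass(frozen=True)
class Facts:
    watermark: int               # last event folded into this snapshot
    edited: FrozenSet[str]
    evidence: FrozenSet[str]


def project(log: List[Event], watermark: int) -> Facts:
    """Facts layer: fold the log up to an explicit watermark (assumes log sorted by seq)."""
    edited, evidence = set(), set()
    for e in log:
        if e.seq > watermark:
            break
        if e.kind == "edited":
            edited.add(e.name)
        elif e.kind == "evidence":
            evidence.add(e.name)
    return Facts(watermark, frozenset(edited), frozenset(evidence))


@dataclass(frozen=True)
class Contract:
    """Rules layer: a plan compiled into checkable predicates."""
    fence: FrozenSet[str]              # files the work may touch
    required_evidence: FrozenSet[str]  # must exist before the gate can close


def guide(contract: Contract, facts: Facts, log_head: int) -> List[str]:
    """Guide = rules evaluated over facts. It only emits signals; a gate still decides closure."""
    signals = []
    if log_head > facts.watermark:
        signals.append(f"stale: facts pinned at {facts.watermark}, log at {log_head}")
    for path in sorted(facts.edited - contract.fence):
        signals.append(f"drift: {path} is outside the fence")
    missing = sorted(contract.required_evidence - facts.evidence)
    if missing:
        signals.append(f"next: produce evidence {missing[0]}")
    else:
        signals.append("next: submit to gate")
    return signals


log = [
    Event(1, "edited", "src/claims.py"),
    Event(2, "edited", "README.md"),
    Event(3, "evidence", "unit_tests"),
]
contract = Contract(frozenset({"src/claims.py"}), frozenset({"unit_tests", "review_note"}))
print(guide(contract, project(log, watermark=3), log_head=3))
# ['drift: README.md is outside the fence', 'next: produce evidence review_note']
```

Pinning the watermark lets the guide report "stale" instead of silently reasoning over an old snapshot. For example, `project(log, watermark=2)` evaluated with `log_head=3` puts a `stale` signal first.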
If this smells familiar, good: it's the classic facts + rules → inference
architecture — production rule engines, policy-as-code (evaluate policy over
input facts → a decision), event sourcing / CQRS. The engine stands on decades of
battle-tested practice. The novelty isn't the engine.
What does the runtime actually push? Not just a paragraph of advice. A usable
guide looks like an envelope: current contract id, actor role, allowed and
blocked actions, next legal action, required evidence, a payload skeleton, the
state watermark it was computed from, and the gate that will verify it. The
important thing is that the agent no longer has to re-infer "where am I and what
is legal now?" from scratch.
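Here is one way such an envelope could be typed, as a Python sketch. The field names are hypothetical and mirror the list in the paragraph above. The two consistency checks are an assumption about what a runtime might enforce, not a description of Aming Claw.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List

# Hypothetical schema mirroring the envelope fields described above; not a published format.

@dataclass
class GuideEnvelope:
    contract_id: str
    actor_role: str
    allowed_actions: List[str]
    blocked_actions: List[str]
    next_legal_action: str
    required_evidence: List[str]
    payload_skeleton: Dict[str, Any]  # shape the agent's output must fill in
    watermark: int                    # state version the guide was computed from
    verifying_gate: str               # gate that will check the result

    def __post_init__(self) -> None:
        # Reject internally inconsistent guides before they reach the agent.
        if self.next_legal_action not in self.allowed_actions:
            raise ValueError("next_legal_action must be one of allowed_actions")
        if set(self.allowed_actions) & set(self.blocked_actions):
            raise ValueError("an action cannot be both allowed and blocked")

    def to_prompt_block(self) -> str:
        """Serialize the envelope for injection into the agent's context before it acts."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


env = GuideEnvelope(
    contract_id="contract-example",
    actor_role="implementer",
    allowed_actions=["edit_file", "run_tests"],
    blocked_actions=["edit_tests"],
    next_legal_action="run_tests",
    required_evidence=["test_report"],
    payload_skeleton={"test_report": {"passed": None, "failed": None}},
    watermark=42,
    verifying_gate="test_gate",
)
print(env.to_prompt_block())
```

The envelope carries its own watermark and verifying gate, so the agent (and any reviewer) can see which state the advice was computed from and who will judge the result.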
The part I didn't expect: this isn't about code
The guide engine (rules ∘ facts) is domain-independent. Code is just the
instance where the fact layer is easy (a commit and its dependency graph come almost for free).
To stand a guided runtime up in any complex agent-workflow domain — an ops
runbook, an insurance-claims pipeline, a multi-step business process — you supply
exactly two things:
- A fact-layer tool that extracts ground truth as an event-sourced projection (most business state is lower-churn than code: it changes at discrete, modelable transition points, which makes the fact layer easier to keep fresh than code's, not harder).
- The rules — the domain's plans compiled into checkable contracts.
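Here is a Python sketch that illustrates the split between the reusable engine and the per-domain modeling, using a made-up insurance-claims domain. `project` and `evaluate` are the domain-independent part. `claim_fold`, `CLAIM_RULES`, and `INITIAL` are the things a domain supplies. The event types, document names, and rules are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

State = Dict[str, Any]
Fold = Callable[[State, Dict[str, Any]], State]
Rule = Tuple[str, Callable[[State], bool]]  # (signal, predicate that fires it)


# Domain-independent engine: fold events into facts, evaluate rules over facts.
def project(events: Iterable[Dict[str, Any]], fold: Fold, initial: State) -> State:
    state = dict(initial)
    for event in events:
        state = fold(state, event)
    return state


def evaluate(rules: List[Rule], facts: State) -> List[str]:
    return [signal for signal, fires in rules if fires(facts)]


# Per-domain part 1 (hypothetical): a fold over discrete business transitions.
def claim_fold(state: State, event: Dict[str, Any]) -> State:
    new = dict(state)
    if event["type"] == "claim_filed":
        new["status"] = "filed"
    elif event["type"] == "document_received":
        new["documents"] = new["documents"] | {event["doc"]}
    elif event["type"] == "claim_approved":
        new["status"] = "approved"
    return new


# Per-domain part 2 (hypothetical): the claims plan compiled into checkable rules.
CLAIM_RULES: List[Rule] = [
    ("next: request police_report",
     lambda s: s["status"] == "filed" and "police_report" not in s["documents"]),
    ("next: route to adjuster",
     lambda s: s["status"] == "filed" and "police_report" in s["documents"]),
    ("blocked: approved without police_report",
     lambda s: s["status"] == "approved" and "police_report" not in s["documents"]),
]

INITIAL: State = {"status": "none", "documents": frozenset()}

events = [{"type": "claim_filed"}]
print(evaluate(CLAIM_RULES, project(events, claim_fold, INITIAL)))
# ['next: request police_report']
events.append({"type": "document_received", "doc": "police_report"})
print(evaluate(CLAIM_RULES, project(events, claim_fold, INITIAL)))
# ['next: route to adjuster']
```

To move to another domain, you replace `claim_fold`, `CLAIM_RULES`, and `INITIAL`. `project` and `evaluate` stay untouched, which is the "machine is general" half of the claim in miniature.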
Then the same guided runtime steers the agent: it pushes the right next step, the
right gate verifies the result, and the system escalates only when the contract
can't resolve it. The hard part — and the only per-domain cost — is the
modeling: choosing the events and writing the checkable rules. Everything else
is reused. The machine is general; the modeling is where human judgment goes.
That's the bet that makes this more than a coding trick: a guided runtime is a
general way to keep an agent on-trajectory through any complex business you can
model as facts + rules.
Let me be honest about novelty
Every ingredient is prior art. Continuous, structure-derived steering of the next
action is old (grounded decoding, value-guided action selection in robotics).
State-machine-driven context injection into an LLM is old. The facts+rules engine
is older. Event sourcing is decades old.
The bet is the synthesis: guidance that is pushed (not pulled), derived from a
contract evaluated over a freshness-pinned, event-projected fact model, and used
as the primary anti-drift control for an autonomous agent. As far as I can find, no one has fused exactly
that — and several lines of work are converging on it right now, which I take as a
sign the frame is right.
Does it work?
Here's where I keep myself honest, because the internet is full of "new paradigm
solves everything" posts and I don't want to write one.
I built this as a governance layer for coding agents, Aming Claw (public, and still under heavy churn), and dogfooded it hard on its own development. The
observational signal is genuinely encouraging: as the runtime stabilized, the
pathological stuck-loops became visible earlier, and several paths that used to
die in a thrash of repeated blockers started converging cleanly. Some
multi-worker paths that had been painful began to run through the contract
instead of through operator memory.
I will not call that validated. It's observational, not controlled. The work
was longitudinal: the architecture, the bug fixes, and my own guidance all changed
at once, so I cannot yet cleanly separate the runtime's effect from the
operator's. Encouraging trajectory, unproven causality. If anyone shows you a
trend like this and calls their paradigm "validated," they're selling.
Where it actually is today: mid-migration from pull to push. The guide does not
yet compute every legal step. Where a contract hasn't modeled the situation, the
agent still falls back to pulling context and reasoning its way through — and that's
by design during the build. Push doesn't abolish pull; it aims it, and right now a
lot of the space still isn't aimed. Each iteration moves more of it from
agent-pulls to runtime-pushes as more gets modeled into facts and rules. I'm
publishing the thesis now, mid-iteration and openly unstable. Once the push path
converges and the pull fallback shrinks to the genuinely-novel edge, I'll cut a
stable release — and run the experiment below against that, not against
today's moving target.
So here's the experiment I'm running next, and the number I'll publish:
- Operator-free, logged as such — so the runtime's effect is isolated from mine.
- Same model, guided runtime on vs off, plus a strong baseline scaffold — a fixed-model ablation, so the delta is attributable to the harness, not the model.
- An external, objective benchmark with a third-party metric and verified trajectories — not my own success gate.
- Drift/completion delta, stratified by task length, because the thesis is that the runtime's overhead loses on short tasks and its anti-drift effect wins on long ones. The crossover is the claim.
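For the crossover claim, the analysis could look like the Python sketch below. It computes the completion-rate delta (guided minus unguided) per task-length bucket, then finds the first bucket from which the guided runtime stays ahead. `Run`, the bucket edges, and both functions are hypothetical. The sketch takes no data and implies no result, and the real experiment would use the benchmark's own metric.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical analysis sketch; field names and bucketing are assumptions.

@dataclass(frozen=True)
class Run:
    steps: int       # task length, e.g. steps in the verified trajectory
    guided: bool     # same model, runtime on (True) or off (False)
    completed: bool  # as judged by the external benchmark's metric


def completion_delta_by_length(runs: List[Run], edges: List[int]) -> Dict[int, float]:
    """Guided minus unguided completion rate per length bucket.

    Bucket 0 holds steps < edges[0]; bucket i holds edges[i-1] <= steps < edges[i];
    bucket len(edges) holds steps >= edges[-1]. Buckets missing either arm are skipped.
    """
    # bucket -> arm -> [completed, total]
    tally = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for r in runs:
        arm = tally[bisect_right(edges, r.steps)][r.guided]
        arm[0] += int(r.completed)
        arm[1] += 1
    deltas = {}
    for b in sorted(tally):
        on, off = tally[b][True], tally[b][False]
        if on[1] and off[1]:
            deltas[b] = on[0] / on[1] - off[0] / off[1]
    return deltas


def crossover_bucket(deltas: Dict[int, float]) -> Optional[int]:
    """First bucket such that it and every longer bucket show a positive delta, else None."""
    keys = sorted(deltas)
    for i, b in enumerate(keys):
        if all(deltas[k] > 0 for k in keys[i:]):
            return b
    return None
```

A real analysis would also need per-bucket confidence intervals. With only a few runs per bucket, a sign flip in the delta can be noise rather than a crossover.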
If it shows what I think it shows, I'll post the number. If it doesn't, that's
worth knowing too.
The frame, in one line
Your agent isn't lost because it lacks context. It's lost because it can't hold its
place. So stop making search the control loop — push, not pull: build a guided
runtime that computes the next step as rules ∘ facts and hands it to the agent
before it acts. It works for code today; the same shape should work for any
complex workflow you can model as facts and rules.
Tell me where it's wrong.
