DEV Community

The Loop Is Not the Product

Daniel Nwaneri on June 09, 2026

A tweet landed on my timeline from Peter Steinberger — OpenClaw founder, now at OpenAI: "Here's your monthly reminder that you shouldn't be promp...

Sloan the DEV Moderator • Jun 10

Hey, this article appears to have been generated with the assistance of ChatGPT or possibly some other AI tool.

We allow our community members to use AI assistance when writing articles as long as they abide by our guidelines. Please review the guidelines and edit your post to add a disclaimer.

Failure to follow these guidelines could result in DEV admin lowering the score of your post, making it less visible to the rest of the community. Or, if upon review we find this post to be particularly harmful, we may decide to unpublish it completely.

We hope you understand and take care to follow our guidelines going forward!

Ken W Alger • Jun 9

This is an incredibly necessary reality check, Daniel. The financial and operational hangover hitting enterprises right now is the direct result of treating "loops" as a magic bullet rather than an infrastructure risk.

From a systems architecture perspective, Peter Steinberger’s premise is fundamentally flawed because it implies that the loop should be built around the agent. When you design loops that just chain probabilistic prompts together, you aren't building a product. You're building a token-denominated bureaucracy that runs up a massive bill while hiding drift.

The correction here requires a strict shift in custody:

The deterministic logic is the brain; the LLM is just the narrator.

If you are going to run a loop, the loop itself must be a rigid, finite state machine running on local silicon. The agent shouldn't be roaming freely across toolsets; it should be treated as an ephemeral runtime utility called inside strict, deterministic boundaries.

For a loop to be production-safe and compliance-ready, it has to enforce three sovereign guardrails:

An Ingestion Gate: Every single turn of the loop must pass through a local sieve to strip out conversational "prose tax" and keep token burn bounded.
Deterministic Verification: The agent never decides when a loop is "done" or if a failure occurred. A binary, immutable code gate (like a unit test or a strict schema validator) handles state promotion.
A Forensic Trace: Every cycle must emit a cryptographically signed receipt binding the input hash and transformation telemetry. If a loop executes 30 times into a void, you must have a non-repudiable audit trail to reconstruct exactly where the logic drifted.
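
A minimal Python sketch of the "Deterministic Verification" guardrail above, under stated assumptions: `schema_gate`, `run_loop`, `State`, and `fake_model` are hypothetical names invented for illustration, and the gate here is just a JSON shape check standing in for a unit test or schema validator. The point it tries to show is that plain code, not the model, decides when the loop is done.

```python
import json
from enum import Enum
from typing import Callable


class State(Enum):
    DRAFT = "draft"
    VERIFIED = "verified"
    FAILED = "failed"


def schema_gate(raw: str) -> bool:
    """Deterministic check: output must be a JSON object with a non-empty 'summary' string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and isinstance(data.get("summary"), str)
        and bool(data["summary"].strip())
    )


def run_loop(call_model: Callable[[int], str], max_turns: int = 3) -> tuple[State, int]:
    """The loop is driven from code; the model only supplies candidate text."""
    for turn in range(1, max_turns + 1):
        candidate = call_model(turn)
        if schema_gate(candidate):  # the gate, not the model, promotes state
            return State.VERIFIED, turn
    return State.FAILED, max_turns


def fake_model(turn: int) -> str:
    # Hypothetical stand-in for an LLM call: prose on turn 1, valid JSON on turn 2.
    return "Sure! Here is a summary..." if turn == 1 else '{"summary": "ok"}'
```

With the stand-in model, `run_loop(fake_model)` promotes state on the second turn; a model that never returns valid JSON ends in `State.FAILED` once `max_turns` is used up.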

Steinberger's advice is a recipe for expensive randomness unless we stop treating AI as an orchestrator and start treating it as a closely guarded component inside a deterministic harness. Exceptional write-up.

Daniel Nwaneri • Jun 9

The finite state machine framing is the correction the whole conversation needs. "The deterministic logic is the brain, the LLM is the narrator." That's the architectural inversion most agent builders never make because the tooling doesn't enforce it. They reach for the LLM first and bolt on guardrails later, which is exactly backwards.

The "prose tax" concept is sharp. Every turn of the loop pays a conversational overhead that has nothing to do with the task. That's where a lot of the 30x multiplier actually lives, and nobody names it that cleanly.

The forensic trace requirement is where I'd push back slightly. Cryptographically signed receipts make sense at compliance scale. For most teams the more immediate problem is that they have no trace at all, not because they chose the wrong format but because they never thought to emit one. What's your minimum viable audit trail before you get to cryptographic signing?

Ken W Alger • Jun 9

That is a completely fair pushback. You can't worry about verifying the integrity of a trace if your system isn't emitting any telemetry in the first place. Most teams are flying completely blind, which is why their first clue that a loop went sideways is a massive API invoice.

Before you ever reach for asymmetric keys or public-key infrastructure, the Minimum Viable Audit Trail (MVAT) requires you to turn that black box into a deterministic state ledger.

For teams just trying to survive the loop multiplier, the bare-minimum implementation comes down to enforcing three local constraints on every turn:

The Structural Delta Ledger: Never log raw text dumps or full chat histories. Instead, log a structured, local row (SQLite or flat JSON lines) containing three things: the state_origin (where the turn started), the input_hash, and a strict execution metric (e.g., execution time, token delta, or a binary pass/fail from your testing suite).
Deterministic Context Isolation Tokens: Assign a unique session-scoped ID to the loop execution, and pass an immutable sequence counter (turn_01, turn_02) into your state metadata. If your loop runs 5 times on the same task, you need to see exactly which sequence index began to stall.
The Local "circuit_breaker": Wire a hard-coded maximum turn count and a rolling token-burn ceiling directly into the state machine. If turn_count > 5 or accumulated_tokens > 15000, the loop violently crashes and forces a human checkpoint. The MVAT's job isn't just to watch the loop fail; it's to kill the loop before it drains the bank account.
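
The three constraints above can be sketched in a few lines of Python. This is an illustration under assumptions, not a reference implementation: the `LoopLedger` class, its method names, and the `CircuitBreakerTripped` exception are hypothetical, rows are kept in memory and serialized as JSON lines, and only the field names (`state_origin`, `input_hash`) and the thresholds (`turn_count > 5`, `accumulated_tokens > 15000`) come from the comment.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field

MAX_TURNS = 5        # thresholds quoted in the comment above
MAX_TOKENS = 15_000


class CircuitBreakerTripped(RuntimeError):
    """Raised to stop the loop and force a human checkpoint."""


@dataclass
class LoopLedger:
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    turn_count: int = 0
    accumulated_tokens: int = 0
    rows: list = field(default_factory=list)

    def record_turn(self, state_origin: str, prompt: str, tokens: int, passed: bool) -> dict:
        self.turn_count += 1
        self.accumulated_tokens += tokens
        row = {
            "session_id": self.session_id,
            "turn": f"turn_{self.turn_count:02d}",  # sequence counter
            "state_origin": state_origin,
            "input_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
            "token_delta": tokens,
            "passed": passed,
        }
        self.rows.append(row)  # log first, so the tripping turn is also on record
        if self.turn_count > MAX_TURNS or self.accumulated_tokens > MAX_TOKENS:
            raise CircuitBreakerTripped(
                f"{row['turn']}: {self.accumulated_tokens} tokens in session {self.session_id}"
            )
        return row

    def to_jsonl(self) -> str:
        """Serialize the ledger as JSON lines (one structured row per turn, no raw text)."""
        return "\n".join(json.dumps(r, sort_keys=True) for r in self.rows)
```

Because the prompt is stored only as a hash, repeated identical inputs show up as repeated `input_hash` values, which is one cheap way to spot a stalled turn.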

Once a team shifts from raw text strings to a structured, local state ledger, they have their MVAT. They can see the drift, track the cost, and catch anomalies.

Cryptographic signing (Forensic Receipts) is simply the next logical layer of maturity for that exact ledger. You don't change the data shape; you just sign the manifest so that an external auditor can verify that the logs weren't altered post hoc to hide a compliance failure or a runaway loop.
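
To illustrate "you don't change the data shape; you just sign the manifest", here is a standard-library Python sketch. It deliberately swaps in HMAC-SHA256 for an asymmetric signature: HMAC is symmetric, so anyone who can verify can also forge, which means it gives tamper evidence but not the non-repudiation an external auditor would want. The function names `canonical_bytes`, `seal`, and `verify` are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(row: dict) -> bytes:
    """Serialize a ledger row deterministically so the same row always hashes the same."""
    return json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(row: dict, key: bytes) -> dict:
    """Wrap an unchanged ledger row with a keyed tag (HMAC stand-in for a signature)."""
    tag = hmac.new(key, canonical_bytes(row), hashlib.sha256).hexdigest()
    return {"row": row, "sig": tag}


def verify(sealed: dict, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, canonical_bytes(sealed["row"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sealed["sig"])
```

The row passes through untouched; only a `sig` field is added alongside it, so existing ledger readers keep working.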

Love the pushback—getting teams to emit any stable instrument before they prompt is half the battle!

Daniel Nwaneri • Jun 9

The circuit_breaker is where this clicks for me. turn_count > 5 isn't just telemetry; it's the exit condition enforced at the infrastructure layer instead of trusted to the model. Which means the spec-writer problem and the MVAT problem are the same problem at different altitudes. You define done before you open the terminal. The circuit_breaker kills the loop when done hasn't arrived by the boundary you set. One is upstream discipline, the other is downstream enforcement. Both are rejecting the idea that the LLM decides when it's finished.

The Structural Delta Ledger framing also reframes what logging is for. Most teams log for debugging. You're describing logging as governance. The ledger isn't there to help you reconstruct what happened, it's there to prove the loop never had the authority to run past the boundary in the first place.

SQLite or flat JSON lines is the right call for the MVAT floor. What's your threshold for when the delta ledger graduates to something with stronger consistency guarantees, or does the circuit_breaker make that largely irrelevant below compliance scale?

Ken W Alger • Jun 9

Exactly. You’ve captured the core philosophy perfectly: Upstream discipline defines the boundaries; downstream enforcement breaks the circuit. Neither trusts the model to police itself.

To your question about graduation thresholds: the circuit_breaker is excellent for controlling execution velocity and token burn, but it protects your bank account, not your state integrity.

A simple local Minimum Viable Audit Trail (MVAT), whether SQLite or flat JSON lines, is incredibly resilient, but it hits its architectural limit the moment you cross from a single isolated agent thread to a distributed multi-agent system sharing a mutable runtime context.

There are three distinct tipping points where a flat delta ledger must graduate to stronger consistency guarantees:

The Distributed Race Condition: If you have multiple asynchronous loops attempting to read from and write to the same state machine or shared memory base simultaneously, unsynchronized appends to a flat JSON lines file can interleave and corrupt rows, and SQLite's single-writer locking will surface "database is locked" errors. You graduate to strict serializable isolation levels because a loop cannot make a deterministic state-promotion choice if the ground truth shifted under its feet mid-turn.
Causal Lineage Branching: In complex pipelines, a circuit-breaker might trip on Agent B, but Agent A already executed a downstream tool call based on Agent B's pre-failure state. A simple delta log tells you that it broke, but it can't roll back the environment. You graduate to an event-sourced, content-addressed ledger (where every state mutation is treated as an immutable, append-only block) so you can atomically roll back the system to the exact turn before the drift occurred.
The Custody Handshake (The Compliance Scaled Boundary): Below the compliance scale, a local database file is fine because the developer is the auditor. But the moment the loop's output updates a financial ledger, modifies a production codebase, or touches sensitive user data, your ledger must transition from an internal file to an external, non-repudiable one.
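
The "event-sourced, content-addressed ledger" in the second tipping point can be sketched roughly as follows. This is a toy in-memory version under assumptions: `EventLog`, `append`, and `state_at` are hypothetical names, each event's ID is the SHA-256 of its own body (including its parent's ID), and "rollback" here means rebuilding logged state by replay. It does not undo side effects that already reached the outside world.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventLog:
    events: list = field(default_factory=list)

    def append(self, agent: str, mutation: dict) -> str:
        """Add an immutable event whose ID is the hash of its content and parent."""
        parent = self.events[-1]["id"] if self.events else ""
        body = {"parent": parent, "agent": agent, "mutation": mutation}
        event_id = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()
        self.events.append({"id": event_id, **body})
        return event_id

    def state_at(self, event_id: Optional[str] = None) -> dict:
        """Rebuild state by replaying mutations up to and including event_id.

        With no event_id, the whole log is replayed.
        """
        state: dict = {}
        for event in self.events:
            state.update(event["mutation"])
            if event["id"] == event_id:
                break
        return state
```

Usage: if Agent B's second write is the one that drifted, `log.state_at(first_good_id)` returns the state as it stood before the drift, while the later events stay in the log as evidence.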

This is the exact design threshold where the Sovereign-SDK graduates a team from simple structured logging to asymmetric cryptographic sealing. The data shape doesn't change, but wrapping every state transition in an Ed25519 ForensicReceipt means you no longer rely on database permissions for security. The receipt itself proves the loop never violated its boundary.

If you're running isolated, sequential loops on local silicon, a properly tuned SQLite db with a violent circuit-breaker is a bulletproof fortress. You only need to scale the ledger's consistency when the loop's state becomes distributed or legally binding.

Daniel Nwaneri • Jun 9

The causal lineage branching case is the one that changes the mental model. The circuit breaker is a financial instrument. It protects the bank account. But Agent A already fired the downstream tool call before Agent B tripped, and that call may have touched something real. The loop stopped. The side effect didn't.
That's the gap between "the loop is controlled" and "the system is safe." Most teams conflate them because in single-agent sequential flows they're the same thing. The moment you go distributed they decouple completely.

The "developer is the auditor" line draws the graduation threshold cleanly. SQLite with a violent circuit breaker is genuinely bulletproof for isolated loops where one person holds both roles. The consistency guarantees only become load-bearing when the auditor is someone who wasn't in the room when the loop ran — a regulator, a client, a future engineer reading the trace six months later.

That reframes what the Forensic Receipt actually is. It's not a security primitive. It's a trust transfer mechanism — proof that the loop's behavior can be verified by someone who wasn't present. Which means the question of when to graduate isn't really about scale. It's about who needs to trust the output and whether they were there when it ran.

Is the Sovereign SDK's custody model designed around that trust transfer moment specifically, or is the Ed25519 sealing more about tamper evidence than auditability for absent parties?

Alex Shev • Jun 9

Good distinction. Loops are useful only when they are wrapped around a real outcome. Otherwise you get a system that keeps iterating without ever proving that the work became better.

Daniel Nwaneri • Jun 10

"Proving the work became better" is the exact gap most loop architects skip. They instrument for activity (tokens burned, turns completed, tool calls fired) and call that progress. But activity metrics and improvement metrics aren't the same thing. A loop that runs 30 times and produces the same quality output as turn 1 looks productive on every dashboard that exists.

The proof function has to be defined before the loop starts or you have no way to distinguish iteration from spinning in place.

Alex Shev • Jun 10

Yes. A loop needs an exit criterion that is tied to quality, not motion. Otherwise the system can keep producing evidence that it ran, while never producing evidence that the artifact improved.

The best agent workflows I have seen define the proof first: test passed, diff got smaller, user friction dropped, cost stayed inside a budget, etc. Then the loop has something real to optimize against.
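
One way to make "define the proof first" concrete is a loop that takes the comparator as an argument and exits when it stops moving. The sketch below is illustrative only: `run_until_proven`, `improve`, and `score` are hypothetical names, `score` is assumed to be deterministic with higher meaning better, and the exit labels are invented.

```python
from typing import Callable


def run_until_proven(
    artifact: str,
    improve: Callable[[str], str],
    score: Callable[[str], float],
    target: float,
    max_turns: int = 5,
) -> tuple[str, float, str]:
    """Keep a candidate only if the comparator says it is strictly better."""
    best, best_score = artifact, score(artifact)
    for _ in range(max_turns):
        if best_score >= target:
            return best, best_score, "proof_met"
        candidate = improve(best)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            # The loop produced motion but no measurable improvement: stop.
            return best, best_score, "no_improvement"
        best, best_score = candidate, candidate_score
    reason = "proof_met" if best_score >= target else "budget_exhausted"
    return best, best_score, reason
```

The design choice worth noting is that the loop cannot start without `score` and `target`, so the comparator has to exist before the first turn runs.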

Daniel Nwaneri • Jun 10

"Proof first" is the frame the essay was circling without landing on directly. The spec-writer forcing function gets at it (you define done before you open the terminal), but your examples make the principle operational in a way the essay didn't. Test passed and diff got smaller are binary. Cost stayed inside a budget is binary. User friction dropped is harder to instrument but still directional. All of them give the loop something real to optimize against rather than a vague directive it can satisfy by running indefinitely.

The failure mode you're describing, evidence of motion mistaken for evidence of improvement, is also how most teams evaluate their agent deployments. Dashboard shows activity, invoice shows spend, nobody asks whether the artifact is actually better than it was on turn one. The proof function doesn't just constrain the loop. It's the only honest way to measure whether the loop was worth running at all.

Alex Shev • Jun 11

Yes. That dashboard/invoice point is the trap: the system can generate a perfect audit trail of activity while the artifact stays basically unchanged.

I like "proof first" because it forces the team to define the comparator before the loop starts. Not "did the agent work?" but "what observable property of the artifact got better?" Without that, the loop has every incentive to produce motion.

Daniel Nwaneri • Jun 11

"What observable property of the artifact got better" is the question that forces the proof function into existence before the loop starts. It's also the question most teams can't answer, not because the answer doesn't exist but because nobody sat down to define the comparator before deploying. The loop fills that vacuum with motion because motion is what it can produce without a target.

The audit trail point is the sharp edge here. A perfect activity log is actually the worst outcome: it looks like accountability while hiding drift completely. The loop ran 30 times. Every turn logged. Every tool call recorded. The artifact is functionally identical to turn one. Nothing in the audit trail flags that as failure because nobody defined what improvement looks like.
That's why the spec has to come before the ledger. The ledger proves the loop stayed inside its boundaries. The spec defines what the boundaries are optimising toward. Without the spec the ledger is just an expensive diary.

chneg cheng • Jun 22

Great piece. You're right that the loop mechanic is the enabler, not the value.

I think the missing piece is what runs inside the loop. A cron job has a clear contract — do X at 2am or fail loudly. Most agent loops I see skip the contract step and jump straight to "prompt an agent and hope."

The teams I've seen succeed with agents don't optimize loops. They optimize contracts — defining exactly what input the agent expects, what output it must produce, and what failure looks like before the loop starts. The loop is just the repetition. The contract is where the predictability (and cost control) comes from.
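
As a rough illustration of "optimize contracts, not loops", the sketch below writes the three parts of the contract (expected input, required output, what failure looks like) as code that wraps any agent call. Everything here is hypothetical: the `Contract` class, the `ContractViolation` exception, and the example `title_contract` are invented for the example, and a real contract would likely use a schema library rather than lambdas.

```python
from dataclasses import dataclass
from typing import Callable


class ContractViolation(Exception):
    """Failure is explicit: the call fails loudly instead of drifting."""


@dataclass(frozen=True)
class Contract:
    name: str
    accepts: Callable[[dict], bool]   # what input the agent expects
    produces: Callable[[dict], bool]  # what output it must produce

    def run(self, agent: Callable[[dict], dict], payload: dict) -> dict:
        if not self.accepts(payload):
            raise ContractViolation(f"{self.name}: input rejected")
        result = agent(payload)
        if not self.produces(result):
            raise ContractViolation(f"{self.name}: output rejected")
        return result


# Hypothetical example contract: turn some text into a title of at most 60 characters.
title_contract = Contract(
    name="summarize_title",
    accepts=lambda p: isinstance(p.get("text"), str) and len(p["text"]) > 0,
    produces=lambda r: isinstance(r.get("title"), str) and 0 < len(r["title"]) <= 60,
)
```

Any loop built on top can then treat `ContractViolation` the way cron treats a non-zero exit code: a hard stop, not a prompt to try something else.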

Curious if you've seen the same — teams that nail the contract before the loop, or teams that jump into loops and burn tokens?

Daniel Nwaneri • Jun 23

"Optimize contracts not loops" is the framing I've been circling without landing on directly. The spec is the contract: what the agent expects as input, what it must produce as output, and what failure looks like, written down before the loop starts. The loop is just repetition. The contract is where the predictability lives.

The teams that jump into loops and burn tokens almost always share the same root cause: the contract was implicit. They had a vague sense of what the agent should do, a prompt that gestured at it, and a loop that ran until something looked approximately right. "Approximately right" isn't a contract. It's just the loop grading its own homework.

To your question — yes, consistently. The teams that nail the contract first spend the most time on the boring part before they write a line of loop code. What does done look like in one sentence? That question alone separates the teams that ship from the teams that burn.

Mykola Kondratiuk • Jun 11

the loop is infra until it fails in front of a user. retry logic and latency are UX decisions the moment the agent touches the customer path.

Theo Valmis • Jun 13

The legibility you're mourning is also what bounded the cost, which is why the two halves of this post are one problem. A cron job can't run away with your budget because its work is fixed before it runs; you can read the ceiling off the one line. An agent loop's cost is unbounded by construction: the number of steps is decided at runtime by the same probability distribution doing the work, so nothing caps it in advance. The 30x is what it costs to let the loop decide its own length. No amount of tuning removes that; it's structural. The teams not going pale put legibility back at the boundary: a hard step budget, a cost circuit breaker, a fixed plan the agent fills in instead of invents. You can't ceiling what you can't read ahead of time, so making the loop legible again is the same move as making it affordable.

Daniel Nwaneri • Jun 15

"The loop deciding its own length" is the exact mechanism, and naming it that way exposes why tuning never works: you're not adjusting a parameter, you're trying to bound something that was designed to be unbounded. The circuit breaker isn't a tuning knob on the agent. It's a different actor entirely, sitting outside the probability distribution, enforcing a ceiling the distribution has no access to and no ability to negotiate.

"A fixed plan the agent fills in instead of invents" is the cleanest description of spec-first architecture I've seen. The spec doesn't make the agent dumber; it moves the planning decision to a point where a human can read it before any tokens are spent. Legibility restored exactly where the cost was unbounded.

HARD IN SOFT OUT • Jun 13

This piece captures something I've been feeling but couldn't name: agents turned the quiet cron failure into a loud, expensive, polite failure. The bill arrives before the reckoning. That line about "optimizing indefinitely without confronting whether the output matters" is going to haunt my next architecture review.

A couple of thoughts from reading:

The Spotify example is sharp, but I wonder if the real risk isn't agentic loops replacing product thinking — it's cheap validation replacing real feedback. An agent can A/B test 500 variants of a button color, pick the winner, and call it done. Nobody asked if the button should exist at all. The loop optimizes for clicks, not for "did this solve a user's problem."
The "automated nobody is still nobody" from your Bookmark Brain is devastating. That's the whole essay compressed into five words.

One practical suggestion: the difference between a cron job and an agent loop is who owns the failure signal. Cron fails → log → human sees. Agent fails → retries → falls back to another agent → eventually asks human after burning $40. A simple rule could save thousands: "if the same task fails twice, stop and ask, do not escalate automatically." That puts a governor on the loop's optimism.
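
The "fails twice, stop and ask" rule is small enough to sketch directly. This is one possible reading, with hypothetical names (`StrikeGovernor`, `run`, the `"ask_human"` status): failures are counted per task, the limit defaults to two, and nothing is ever handed to another agent automatically.

```python
from collections import Counter
from typing import Callable


class StrikeGovernor:
    """Stops retrying a task after `limit` failures and hands it to a human."""

    def __init__(self, limit: int = 2) -> None:
        self.limit = limit
        self.failures: Counter = Counter()

    def run(self, task_id: str, attempt: Callable[[str], bool]) -> str:
        # Attempt only while the task still has strikes left.
        while self.failures[task_id] < self.limit:
            if attempt(task_id):
                return "done"
            self.failures[task_id] += 1
        return "ask_human"  # no automatic escalation to another agent
```

Because strikes persist on the governor, asking it to run a task that has already struck out returns `"ask_human"` immediately without spending another attempt.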

Also, because the tweet about "designing loops" earned it:

A cron job and an agent walk into a bar.

The cron job says: "I run at 2 AM. If I fail, I log it and leave."

The agent says: "I retry, escalate, spawn subtasks, and send a weekly report."

The cron job asks: "What do you actually ship?"

The agent says: "I'm not sure. But I have a really good dashboard."

Anyway, this is the kind of reality check that should be pinned next to every "agentic everything" slide deck. Appreciate you writing it.

Daniel Nwaneri • Jun 15

"Cheap validation replacing real feedback" is the failure mode that doesn't even need a runaway loop to hurt you. The loop can run exactly as designed, hit its turn limit, produce a clean ledger and still have spent the whole budget answering a question nobody needed answered. 500 button colors A/B tested is activity with a winner declared. Nobody asked if the button should exist. The proof function has to include "is this the right question" or the loop optimizes perfectly toward irrelevance.

The 2-strikes rule is the kind of thing that sounds almost too simple until you realize most agent failures are exactly this: fail, retry with a twist, fail again, escalate to a different agent, fail differently, ask a human after $40. 2 strikes and stop isn't conservative. It's just refusing to let the loop's optimism compound past the point where a human could have caught it for free.

The bar joke is staying with me. "I'm not sure. But I have a really good dashboard" is the whole essay in ten words.

zxpmail • Jun 14

Great piece. The line that hit hardest: "Automated nobody is still nobody."

I just published research measuring LLM sycophancy (~1.2M tokens across DeepSeek and Claude), and your article nails something I couldn't put into words: the loop without a human checkpoint isn't just expensive, it's epistemically broken.

We found that LLMs naturally cater to the user's stated position (GI = 0.21). When you put them in a loop optimizing autonomously, there's no one to challenge the assumptions. The model agrees with the last instruction, pattern-matches to "success story," and the loop compounds direction error before anyone sees it.

The Spotify point is the one I keep coming back to. Revenue up, headcount down: looks like efficiency. But what's the latency on product debt when nobody's asking "who asked for this?"

Your seo-agent setup (cron → agent → human judgment) maps exactly to what we built as a "Critique Gate": a structured adversarial checkpoint that runs once, not iteratively, because iteration re-triggers sycophancy drift. One pass, human decides.

"The loop runs to me. Not into a void." That's the line worth bookmarking.

Daniel Nwaneri • Jun 15

"Epistemically broken" is the right word and it's a different problem than the one the essay focused on. Token burn is a cost problem — bad, but bounded by your bank balance. Sycophancy compounding through a loop is a correctness problem with no natural ceiling at all. The loop doesn't just spend more, it becomes more confidently wrong, and the confidence is generated by the same mechanism that's wrong.

The one-pass Critique Gate inverts the instinct completely. Most people's response to "the agent might be wrong" is "have another agent check it," but if checking is also subject to the same sycophancy toward the framing it's handed, repeated checking just launders the error through more agents. A single adversarial pass that isn't iterated avoids re-triggering the drift you're trying to catch.

That maps onto the review surface almost exactly — the human checkpoint is the one-pass critique gate. It runs once, after the loop, comparing output to the original spec. Not iteratively. Not as part of the loop. Outside it, asking the question the loop structurally cannot ask of itself.

zxpmail • Jun 28

Great observation on the one‑pass gate. In practice, we also found that iterative multi‑agent checks often converge to the initial framing’s bias, not truth. The real challenge is designing the spec that the human checkpoint uses — if that spec itself embeds assumptions, even a single pass can be gamed. Curious how you handle spec engineering in your workflow?

Pizza Cat • Jun 14

The cron job analogy hits. One thing I've noticed watching this space: the teams that get this right treat the loop as infrastructure, not value. The loop compresses cost and widens surface area — it doesn't replace the judgment call of what to build and for whom.
The scary part isn't that loops burn money (they do). It's that a well-tuned loop can run for weeks producing output that looks valuable but isn't, and you only catch it when someone asks "what did this actually change?"

Daniel Nwaneri • Jun 15

"Well-tuned" is the word that makes this land. A broken loop gets caught: something looks wrong, someone investigates. A well-tuned loop never throws that signal. It runs cleanly, produces output, and hits no errors for weeks, and the only thing wrong with it is that nobody ever asked whether the output mattered. That's not a debugging problem. Nothing is broken. It's a question nobody scheduled to ask.

Manuel Bruña • Jun 15

The loop multiplier point is important. Once agents call agents, cost and risk become emergent properties of the loop, not the first prompt. I think every loop needs a budget, a stop condition, and a receipt trail before it deserves autonomy.

Mininglamp • Jun 10

The 30x cost multiplier is the elephant in the room. Every ReAct loop iteration burns tokens re-ingesting context that a well-designed state machine would skip entirely. The cron job comparison nails it, same pattern with more steps and less predictability. Companies shipping agentic products need to optimize for minimal loop iterations not maximum agent autonomy. Otherwise you end up with impressive demos and terrifying invoices.

Daniel Nwaneri • Jun 11

"Optimize for minimal loop iterations not maximum agent autonomy" is the reframe most teams need before they architect anything. The autonomy metric is seductive because it's visible: you can demo it, screenshot it, put it in a pitch deck. Minimal iterations isn't a feature you can show anyone. It only shows up on the invoice, or rather doesn't show up, which is the point.

The ReAct re-ingestion cost is where the 30x actually lives for most teams. It's not that each individual call is expensive; it's that iteration N is paying for iterations 1 through N-1 just to understand the current state. A state machine externalises that context. The loop reads a row, not a transcript. Same information, fraction of the tokens.
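
The arithmetic behind "iteration N is paying for iterations 1 through N-1" can be worked through with a simple model. The sketch below rests on simplifying assumptions: every turn adds a fixed number of tokens to the transcript, only input tokens are counted, and the function names and all numbers are hypothetical illustrations, not measurements.

```python
def transcript_tokens(turns: int, base: int, per_turn: int) -> int:
    """Input tokens when turn n re-reads the base prompt plus all n-1 earlier turns."""
    return sum(base + (n - 1) * per_turn for n in range(1, turns + 1))


def state_row_tokens(turns: int, base: int, row: int) -> int:
    """Input tokens when each turn reads the base prompt plus one fixed-size state row."""
    return turns * (base + row)
```

With illustrative figures of 10 turns, a 500-token base prompt, and 1,000 tokens added per turn, `transcript_tokens(10, 500, 1000)` comes to 50,000 input tokens, while `state_row_tokens(10, 500, 200)` with a 200-token state row comes to 7,000. The transcript cost grows quadratically with the number of turns (the sum equals `turns * base + per_turn * turns * (turns - 1) / 2`), whereas the state-row cost grows linearly.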

The cron job comparison holds precisely because cron never pretended to be stateful between runs. It wakes up, reads what it needs from disk, does the work, writes the result, stops. Every agent loop should be embarrassed by how clean that contract is.

A. S. • Jun 10

👍️

leob • Jun 9

Reality check!