Meta Description: Loop Engineering is redefining how developers work with AI agents — instead of prompting them manually, you design autonomous loops that do it for you. Learn the 6 core primitives, architecture patterns, and risks every engineer must know.
Table of Contents
- Introduction — The Prompt Is Dead, Long Live the Loop
- What Is Loop Engineering? (And What It Isn't)
- The Six Pillars of a Production-Grade Loop
- The /goal Primitive — Self-Terminating Loops
- A Real-World Loop Walkthrough
- Tooling Landscape: Codex vs. Claude Code
- The Three Risks of Loop Engineering
- Best Practices for Loop Engineers
- The Future of Loop Engineering
- Conclusion — Build the Loop, Stay the Engineer
Introduction — The Prompt Is Dead, Long Live the Loop
"I don't prompt Claude anymore. I have loops running that prompt Claude and figure out what to do. My job is to write loops."
— Boris Cherny, Head of Claude Code at Anthropic
The person leading development of one of the most widely used AI coding tools on the planet no longer manually types prompts to his own tool. He architects systems that do the prompting for him. This isn't a quirky personal workflow — it's a directional signal pointing squarely at where software engineering is heading.
For nearly two years, developers engaged with AI coding agents the same way: type a prompt, read the response, refine, repeat. Productive — but fundamentally human-paced. The agent always waited for you. You were the bottleneck.
That model is giving way to a fundamentally different one. Loop Engineering is the practice of removing yourself as the human who triggers the AI agent and designing, in your place, a self-running system that handles triggering, evaluation, and iteration autonomously. It's the shift from holding a tool to building a machine.
In this deep dive, you'll learn what Loop Engineering is, how it differs from prompt and harness engineering, the six core primitives every production-grade loop requires, the real risks autonomous loops introduce, and what this discipline looks like as it matures across the industry.
What Is Loop Engineering? (And What It Isn't)
At its most precise, Loop Engineering is the discipline of designing autonomous, self-cycling systems that discover work, dispatch AI agents to execute it, verify the results, and iterate — without a human in the trigger path.
Addy Osmani defines a loop as "a recursive goal where you define a purpose and the AI iterates until complete." Peter Steinberger frames the shift even more sharply: "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."
To understand what's genuinely new here, look at the evolutionary ladder:
Prompt Engineering was the first discipline. You crafted better inputs to get better outputs. The human was fully in the loop, turn by turn.
Agent Harness Engineering was the next step. You built scaffolding around the agent — system prompts, tool definitions, context policies, sandboxes, feedback hooks. A well-designed harness dramatically improved what any model could accomplish. Osmani's formulation holds: "A decent model with a great harness beats a great model with a bad harness." The human still initiated each session.
Loop Engineering is the current frontier. You're no longer optimizing individual sessions — you're designing systems that run sessions autonomously, on schedules, in parallel, with self-checking and external memory. The human defines the goal and architecture, then steps back.
What Loop Engineering is not: it isn't a cron job wrapping a shell script. A cron job is static. A loop is adaptive — it uses AI to determine what needs doing, uses AI to do it, and uses AI to verify whether it was done correctly. It's not just automation; it's intelligent automation with dynamic decision-making at every stage.
The Six Pillars of a Production-Grade Loop
Every robust Loop Engineering implementation rests on six foundational components. Remove any one of them and the structure becomes unreliable at scale.
Automations — The Heartbeat of the Loop
Automations transform a one-time agent run into an actual loop. Without them, you have a script. With them, you have a self-initiating system.
An automation is a scheduled task that fires on a cadence — hourly, nightly, on every PR merge, on every CI failure. It runs a defined prompt or calls a skill, collects findings, and routes them to the appropriate next step. The automation is the pulse of the loop — everything else is a response to what it surfaces.
In OpenAI's Codex app, automations live in a dedicated Automations tab. You configure the project, the prompt, the schedule, and the execution environment. Results that surface actionable findings go to a Triage inbox; runs that find nothing archive cleanly. OpenAI uses this internally for daily issue triage, CI failure summaries, and commit briefings.
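The routing described above fits in a few lines of Python. This is a minimal sketch, not the Codex implementation. `run_agent` is a hypothetical stand-in for whatever executes the prompt or skill. The `Finding` and `RunRecord` types and the in-memory `triage_inbox` and `archive` lists are illustrative. Scheduling itself (cron, the Automations tab, a CI trigger) is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Finding:
    title: str
    actionable: bool


@dataclass
class RunRecord:
    started_at: datetime
    findings: list[Finding]


def run_automation(
    run_agent: Callable[[str], list[Finding]],  # hypothetical agent runner
    prompt: str,
    triage_inbox: list[RunRecord],
    archive: list[RunRecord],
) -> RunRecord:
    """Execute one scheduled run and route its result.

    Runs with at least one actionable finding go to the triage inbox;
    runs with nothing actionable are archived.
    """
    record = RunRecord(datetime.now(timezone.utc), run_agent(prompt))
    if any(f.actionable for f in record.findings):
        triage_inbox.append(record)
    else:
        archive.append(record)
    return record


if __name__ == "__main__":
    inbox, archived = [], []
    fake_agent = lambda prompt: [Finding("Nightly CI failed in auth", True)]
    run_automation(fake_agent, "$triage", inbox, archived)
    print(len(inbox), len(archived))  # 1 0
```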
In Claude Code, the same capability comes through /loop for cadenced re-runs, scheduled cron tasks, lifecycle hooks that run shell commands at specific points in an agent session, and GitHub Actions for automations that need to keep running after your local session ends.
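When an automation calls a SKILL.md rather than embedding a giant inline prompt, it stays maintainable as your project evolves. You update the skill once, and every automation that calls it picks up the change. That is the difference between a loop whose logic is scattered across prompts and a loop whose logic lives in one versioned file.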
Worktrees — Parallelism Without Chaos
The moment you run more than one agent concurrently, file collision becomes your primary failure mode. Two agents editing the same checkout is like two engineers force-pushing to the same branch without coordination: sooner or later, one silently overwrites the other's work.
The solution is git worktree: a separate working directory on its own branch that shares the same repository history. Each agent gets its own worktree. Their edits are physically isolated — they cannot touch each other's checkouts.
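Here is a minimal Python sketch of one worktree per agent. It assumes a hypothetical `agent/<task-id>` branch naming convention. It builds the standard `git worktree add -b <branch> <path>` and `git worktree remove <path>` commands. With `dry_run=False` it runs them through `subprocess`, in which case it must be called from inside the repository.

```python
import subprocess
from pathlib import Path


def worktree_plan(task_ids: list[str], root: Path) -> list[tuple[str, Path, list[str]]]:
    """Return (branch, path, command) for one isolated worktree per task."""
    plan = []
    for task_id in task_ids:
        branch = f"agent/{task_id}"  # hypothetical naming convention
        path = root / task_id        # each agent gets its own directory
        plan.append((branch, path, ["git", "worktree", "add", "-b", branch, str(path)]))
    return plan


def create_worktrees(task_ids: list[str], root: Path, dry_run: bool = True):
    """Create the worktrees, or only return the plan when dry_run is True."""
    plan = worktree_plan(task_ids, root)
    if not dry_run:
        for _, _, cmd in plan:
            # check=True: a clash (e.g. the branch already exists) fails loudly.
            subprocess.run(cmd, check=True)
    return plan


def remove_worktree_cmd(path: Path) -> list[str]:
    """Command that deletes a worktree once its agent has finished."""
    return ["git", "worktree", "remove", str(path)]


if __name__ == "__main__":
    for branch, path, cmd in create_worktrees(["fix-auth", "fix-lint"], Path("../wt")):
        print(" ".join(cmd))
```

Deriving the branch name from the task ID makes it easy for later steps, such as the state file or the PR step, to find a given agent's work.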
Codex builds worktree support directly into its threading model. Claude Code works with the standard git worktree command. It also adds a --worktree flag and an isolation: worktree setting you attach to sub-agent definitions, so each helper gets a clean, auto-cleaning workspace.
Worktrees solve the mechanical collision problem. But review bandwidth is still the ceiling — you can run twenty parallel agents, but if you can meaningfully review only five PRs in a day, you've built a queue, not a solution. Worktrees must pair with strong verification agents to make parallel output trustworthy.
Skills — Compounding Project Knowledge
Every time an agent starts a session without a skill file, it starts cold. It doesn't know your naming conventions, your preferred testing patterns, the third-party library quirk you worked around six months ago, or why the legacy auth module is structured the way it is. It will guess — confidently and incorrectly.
A Skill is a structured knowledge artifact: a folder containing a SKILL.md with instructions and metadata, plus optional scripts, references, and example assets. Both Codex and Claude Code use this format. When an agent loads a skill, it loads the institutional knowledge your team has codified.
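To make the format concrete, here is a small Python sketch that splits a SKILL.md into its metadata and its instructions. It assumes the frontmatter holds only flat `key: value` pairs such as `name` and `description`. A real loader would use a proper YAML parser. The example skill content is hypothetical.

```python
def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (metadata, instructions).

    Only flat `key: value` frontmatter between two '---' lines is supported.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # end of frontmatter; the rest is instructions
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("unterminated frontmatter block")


EXAMPLE = """---
name: triage
description: Summarize yesterday's CI failures and open bugs
---
Read CI failures first, then open issues tagged `bug`.
Write one entry per finding to TRIAGE.md.
"""

if __name__ == "__main__":
    meta, body = parse_skill(EXAMPLE)
    print(meta["name"], "-", meta["description"])
```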
This directly addresses intent debt — the accumulated cost of an agent filling knowledge gaps with confident assumptions. Every wrong guess creates rework. Skills pay down that debt before it accumulates: write the convention once, and every loop run benefits from it.
Skills also compound. A skill encoding your authentication patterns written three months ago still applies today. Over time, a well-maintained skill library means your loops get smarter as your project grows — the inverse of what happens with pure in-context prompting. When you want to share a skill setup across repos or bundle skills for teammates, you package them as a plugin — one install, full setup.
Plugins & Connectors — Giving the Loop Real Hands
A loop that can only read and write files is a loop with its hands tied behind its back. Real-world software engineering happens across a constellation of connected tools — not just a filesystem.
Plugins and Connectors, built on the Model Context Protocol (MCP), are what give loops genuine reach. MCP is the integration layer that lets agents communicate with external services: issue trackers, CI systems, databases, Slack, GitHub, staging environments. Both Codex and Claude Code speak MCP natively, meaning a connector you write for one typically works in the other.
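Part of why connectors port between clients is that MCP messages are plain JSON-RPC 2.0. The Python sketch below serializes a `tools/call` request, which is the method an MCP client uses to invoke a server's tool. The tool name `list_failing_builds` and its arguments are hypothetical; a real server advertises its tools through `tools/list`. The transport (stdio or HTTP) is not shown.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per session


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


if __name__ == "__main__":
    # Hypothetical CI connector tool; real names come from the server's tools/list.
    print(tools_call("list_failing_builds", {"branch": "main", "since_hours": 24}))
```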
The practical distinction is profound. There's a loop that says "here's what needs to be fixed" — and there's a loop that opens the PR, links the Linear ticket, updates the status, and pings #engineering-alerts once CI is green, all without human intervention. The first loop is a report. The second is an autonomous engineering collaborator.
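The second loop's final step might look like the following minimal Python sketch. `Connectors` is a hypothetical interface standing in for whatever tools your GitHub, Linear, and Slack connectors actually expose. The method names and the `"success"` status string are assumptions. The sketch checks CI once; a real loop would poll or react to a CI webhook. `FakeConnectors` is an in-memory test double.

```python
from typing import Protocol, Tuple


class Connectors(Protocol):
    """Hypothetical facade over MCP-backed GitHub, Linear, and Slack tools."""

    def open_pr(self, branch: str, title: str) -> str: ...
    def link_ticket(self, pr_url: str, ticket_id: str) -> None: ...
    def set_ticket_status(self, ticket_id: str, status: str) -> None: ...
    def ci_status(self, pr_url: str) -> str: ...
    def post_message(self, channel: str, text: str) -> None: ...


def ship_fix(c: Connectors, branch: str, ticket_id: str, title: str) -> Tuple[str, bool]:
    """Open the PR, update the ticket, and announce it only when CI is green."""
    pr_url = c.open_pr(branch, title)
    c.link_ticket(pr_url, ticket_id)
    c.set_ticket_status(ticket_id, "In Review")
    if c.ci_status(pr_url) != "success":
        return pr_url, False  # CI not green yet: don't announce
    c.post_message("#engineering-alerts", f"{title}: {pr_url} (CI green)")
    return pr_url, True


class FakeConnectors:
    """In-memory test double that records every call."""

    def __init__(self, ci: str) -> None:
        self.ci, self.calls = ci, []

    def open_pr(self, branch, title):
        self.calls.append("open_pr")
        return f"https://example.invalid/pulls/{branch}"

    def link_ticket(self, pr_url, ticket_id):
        self.calls.append("link_ticket")

    def set_ticket_status(self, ticket_id, status):
        self.calls.append(f"status:{status}")

    def ci_status(self, pr_url):
        return self.ci

    def post_message(self, channel, text):
        self.calls.append(f"post:{channel}")
```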
Once your loop has MCP connectors into your issue tracker and CI system, the morning automation isn't just reading your repo — it's reading your entire engineering context: open issues, failing builds, recent PRs, stale branches. That breadth is something no single human engineer maintains continuously.
Sub-agents — Separating the Maker from the Checker
This is the single most structurally important architectural decision in Loop Engineering. The agent that produces work must never be the agent that grades it.
The reasoning is simple: a model asked to evaluate its own output is strongly biased toward declaring success, because it already reasoned itself into that implementation. A second agent, given the same codebase and a verification-oriented prompt, will often catch what the first agent confidently overlooked.
The typical sub-agent architecture follows a three-role pattern:
- Explorer agent — reads the codebase, understands the problem space, identifies scope
- Implementer agent — writes the code or makes the change
- Verifier agent — checks the implementation against the spec, tests, conventions, and acceptance criteria
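The three roles can be wired together as in this minimal Python sketch. Each callable is a hypothetical wrapper around a separate agent session. The point it illustrates is that the verifier receives only the spec and the resulting diff, not the implementer's reasoning, so it cannot simply inherit the maker's conclusions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    approved: bool
    reasons: list[str] = field(default_factory=list)


def run_task(
    spec: str,
    explore: Callable[[str], str],            # explorer: spec -> scope summary
    implement: Callable[[str, str], str],     # implementer: (spec, scope) -> diff
    verify: Callable[[str, str], Verdict],    # verifier: (spec, diff) -> verdict
) -> tuple[str, Verdict]:
    """Run explorer, implementer, and verifier as three separate roles.

    The verifier sees only the spec and the diff, never the implementer's
    reasoning, so it has to judge the change from the artifacts alone.
    """
    scope = explore(spec)
    diff = implement(spec, scope)
    return diff, verify(spec, diff)
```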
In Codex, sub-agents are defined as TOML files in .codex/agents/, each with a name, description, instructions, model, and reasoning effort. Your security reviewer can be a high-capability model at maximum reasoning depth; your file explorer can be a fast, lightweight read-only agent. Claude Code uses the same pattern with definitions in .claude/agents/ and agent teams that pass work between members.
Sub-agents burn more tokens, since each runs its own model and tool stack. But the calculus is clear: in a loop running unattended overnight, the cost of a verification agent catching a broken deployment is far smaller than the cost of that deployment reaching production.
Persistent State — The Spine of the Loop
This component surprises engineers most, because it sounds almost embarrassingly simple: a markdown file.
The model has no memory between runs. Every time your automation fires, the new agent has zero knowledge of what previous runs did, what was attempted, what succeeded, and what remains open. Without external state, your loop re-derives the same findings every morning and attempts to fix the same issues it fixed yesterday — very busy, accomplishing very little.
The fix is a state file that lives outside the model's context window: in the repository itself, in a Linear board, in a database — anywhere that persists between sessions. This file is the loop's working memory: what was found, what was tried, what passed verification, what is still open, what is blocked and why.
Osmani's formulation is worth memorizing: "The agent forgets, the repo doesn't."
In practice, this looks like a PROGRESS.md or AGENTS.md committed into the repo. The triage automation writes findings to it. Sub-agents update it as they complete or fail tasks. The next automation run reads it before deciding what to work on. The state file is what transforms a collection of individual agent runs into a coherent, continuous engineering process.
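Here is a minimal Python sketch of that read-before-work cycle. It assumes a hypothetical PROGRESS.md format of one `- [status] task-id: note` line per update, where the status is `open`, `done`, or `blocked`. Later lines override earlier ones, so the file doubles as an append-only log of what every run did.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical line format: "- [done] fix-auth: restored token refresh"
ENTRY = re.compile(r"^- \[(open|done|blocked)\] (\S+): (.*)$")


def read_state(path: Path) -> dict[str, tuple[str, str]]:
    """Map task id -> (status, note). A missing file means a fresh loop."""
    state: dict[str, tuple[str, str]] = {}
    if not path.exists():
        return state
    for line in path.read_text(encoding="utf-8").splitlines():
        match = ENTRY.match(line)
        if match:
            status, task_id, note = match.groups()
            state[task_id] = (status, note)  # later entries win
    return state


def record(path: Path, task_id: str, status: str, note: str) -> None:
    """Append one update; earlier history is never rewritten."""
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- [{status}] {task_id}: {note}\n")


def next_tasks(candidates: list[str], path: Path) -> list[str]:
    """Keep only tasks that no previous run has finished or marked blocked."""
    state = read_state(path)
    return [t for t in candidates if state.get(t, ("open", ""))[0] == "open"]


if __name__ == "__main__":
    progress = Path(tempfile.mkdtemp()) / "PROGRESS.md"
    record(progress, "fix-auth", "done", "restored token refresh")
    record(progress, "bump-deps", "blocked", "needs registry credentials")
    print(next_tasks(["fix-auth", "bump-deps", "flaky-e2e"], progress))  # ['flaky-e2e']
```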
The /goal Primitive — Self-Terminating Loops
Understanding the six pillars tells you how to build a loop. Understanding /goal tells you how to build a loop that knows when it's done.
Both Codex and Claude Code expose a /goal command that takes a verifiable completion condition, such as "all tests in test/auth pass and lint produces zero warnings", and keeps the loop running until an evaluator confirms that the condition holds. After every iteration, a separate, smaller model checks the stopping condition. If it is met, the loop halts cleanly; if not, it continues.
This is the maker/checker split applied to the exit condition itself. The agent doing the work never declares itself done; an independent evaluator holds that responsibility. That means you can walk away from a running loop with reasonable confidence that when it stops, it stopped because an independent check passed, not because the maker talked itself into finishing.
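The control flow can be sketched in Python as shown below. This is not how either tool implements /goal. `work_step` (the maker) and `goal_met` (the independent checker) are hypothetical callables. The `max_iterations` cap is an added safety assumption, so that a goal the checker never accepts cannot run forever.

```python
from typing import Callable


def run_until_goal(
    goal: str,
    work_step: Callable[[str, int], str],
    goal_met: Callable[[str, str], bool],
    max_iterations: int = 20,
) -> tuple[bool, int]:
    """Iterate until an independent checker says the goal holds.

    Returns (goal_met, iterations_used). The maker (work_step) never
    decides when to stop; only the checker (goal_met) does.
    """
    for i in range(1, max_iterations + 1):
        report = work_step(goal, i)    # one maker iteration
        if goal_met(goal, report):     # separate evaluator decides
            return True, i
    return False, max_iterations       # give up and escalate to a human


if __name__ == "__main__":
    # Toy maker: one fewer failing test per iteration.
    maker = lambda goal, i: f"{max(0, 3 - i)} failing"
    checker = lambda goal, report: report.startswith("0 ")
    print(run_until_goal("all tests in test/auth pass", maker, checker))  # (True, 3)
```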
The implication for engineering practice is significant. Before /goal, even a well-designed loop required human monitoring to decide when sufficient progress had been made. With /goal, your role shifts from babysitter to architect — define the acceptance criteria with precision, then trust the system to execute and self-terminate against them.
This also sharpens discipline around acceptance criteria. A vague /goal ("make the auth module better") produces a loop that never terminates or self-certifies arbitrarily. A precise /goal ("all 47 tests in test/auth/ pass, TypeScript compilation succeeds with zero errors, eslint reports no violations") produces a loop that can be trusted to run unattended. The quality of your stopping condition directly determines the quality of your output.
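One way to make a goal machine-checkable is to express it as a list of commands that must all exit with status 0. The Python sketch below does that. The TypeScript commands in `AUTH_GOAL` are a hypothetical translation of the precise goal above, and the test runner in particular is an assumption. Note that an exit code alone does not confirm that exactly 47 tests ran.

```python
import subprocess
import sys


def check_goal(commands: list[list[str]], timeout: float = 600) -> tuple[bool, list[str]]:
    """Return (goal_met, failures); the goal holds only if every command exits 0."""
    failures = []
    for cmd in commands:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired) as exc:
            failures.append(f"{' '.join(cmd)}: {exc}")  # missing binary or hung command
            continue
        if result.returncode != 0:
            failures.append(f"{' '.join(cmd)}: exit {result.returncode}")
    return not failures, failures


# Hypothetical commands for the precise auth goal; adapt to your project's tools.
AUTH_GOAL = [
    ["npx", "vitest", "run", "test/auth"],           # test runner is an assumption
    ["npx", "tsc", "--noEmit"],                      # type-check with zero errors
    ["npx", "eslint", ".", "--max-warnings", "0"],   # treat warnings as failures too
]

if __name__ == "__main__":
    print(check_goal([[sys.executable, "-c", "print('ok')"]]))  # (True, [])
```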
A Real-World Loop Walkthrough
Let's make this concrete. Here's the anatomy of a morning engineering triage loop — the kind that runs while you're having coffee and has a prioritized work queue ready before you open your laptop.
Step 1 — Automation fires at 8:00 AM. A scheduled automation triggers, spawning an agent in an isolated environment.
Step 2 — Triage skill executes. The agent loads a $triage skill that reads: yesterday's CI failures from the CI connector, open bug-tagged issues from the Linear connector, and commits from the last 24 hours via the GitHub connector. All triage logic lives in the skill file — version-controlled, maintainable, never repeated inline.
Step 3 — Findings written to state file. The agent synthesizes findings and appends a structured summary to TRIAGE.md: severity, affected module, relevant commit SHA, and a suggested approach for each item.
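A structured entry keeps each finding parseable by the next run. Here is a minimal Python sketch of one possible TRIAGE.md layout. The field names follow the list above. The heading-plus-bullets format and the 12-character SHA abbreviation are assumptions.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TriageFinding:
    title: str
    severity: str            # e.g. "high", "medium", "low"
    module: str
    commit_sha: str
    suggested_approach: str


def to_markdown(f: TriageFinding) -> str:
    """Render one finding as a TRIAGE.md section (layout is an assumption)."""
    return (
        f"### {f.title}\n"
        f"- Severity: {f.severity}\n"
        f"- Module: {f.module}\n"
        f"- Commit: {f.commit_sha[:12]}\n"
        f"- Suggested approach: {f.suggested_approach}\n"
    )


def append_findings(path: Path, findings: list[TriageFinding]) -> None:
    """Append findings so that entries from earlier runs are preserved."""
    with path.open("a", encoding="utf-8") as out:
        for f in findings:
            out.write(to_markdown(f) + "\n")
```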
Step 4 — Worktrees opened per finding. For each finding that meets the "auto-attempt" threshold (e.g., failing unit tests with a clear root cause), the loop opens a git worktree on its own isolated branch.
Step 5 — Maker sub-agent drafts fixes. An implementer agent runs inside each worktree, reads the triage finding, loads relevant skill files, drafts a fix, and updates the state file with its approach.
Step 6 — Verifier sub-agent reviews. A separate verifier agent, potentially using a stronger model, reviews the implementer's diff: Does it pass the failing tests? Does it introduce lint violations? Does it violate patterns encoded in the skill files?
Step 7 — Connectors act on results. For fixes that pass verification: the connector opens a PR, links it to the relevant Linear ticket, marks it "In Review", and posts a summary to #engineering-alerts on Slack. The loop has done everything except click "Merge."
Step 8 — Residuals land in the human inbox. Findings that didn't meet the auto-attempt threshold, or fixes that failed verification, land in the Triage inbox. The state file records why each item was escalated.
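The decisions in Steps 4, 7, and 8 reduce to a small routing function. This Python sketch assumes a hypothetical threshold under which only failing unit tests with a known root cause are auto-attempted. Each route carries a reason string, so the state file can record why an item was escalated.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    OPEN_PR = "open_pr"
    HUMAN_INBOX = "human_inbox"


@dataclass
class Item:
    kind: str               # e.g. "failing_unit_test", "flaky_e2e"
    root_cause_known: bool
    verified: bool = False  # set by the verifier sub-agent (Step 6)


AUTO_ATTEMPT_KINDS = {"failing_unit_test"}  # hypothetical threshold


def meets_auto_attempt(item: Item) -> bool:
    """Step 4: only well-understood failures get a worktree and a fix attempt."""
    return item.kind in AUTO_ATTEMPT_KINDS and item.root_cause_known


def route(item: Item) -> tuple[Route, str]:
    """Steps 7-8: decide where an item ends up and record why."""
    if not meets_auto_attempt(item):
        return Route.HUMAN_INBOX, "below auto-attempt threshold"
    if item.verified:
        return Route.OPEN_PR, "passed verification"
    return Route.HUMAN_INBOX, "fix failed verification"
```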
Consider what you — the engineer — actually did to make all of this happen. You designed the loop architecture once. You wrote the skill files once. You configured the connectors once. You set the schedule once. You did not prompt a single one of those steps. That's Steinberger's point made concrete.
Tooling Landscape: Codex vs. Claude Code
One of the most striking aspects of the current landscape is how convergent the two leading tools have become. OpenAI's Codex app and Anthropic's Claude Code have independently arrived at the same six-primitive architecture.
| Primitive | Role in Loop | Codex | Claude Code |
|---|---|---|---|
| Automations | Discovery + scheduled triage | Automations tab, Triage inbox, /goal | /loop, cron tasks, lifecycle hooks, GitHub Actions |
| Worktrees | Parallel agent isolation | Built-in per-thread worktrees | git worktree, --worktree flag, isolation: worktree |
| Skills | Codified project knowledge | SKILL.md, invoked with $name or implicitly | SKILL.md via Agent Skills |
| Plugins/Connectors | Real-world tool integration | MCP connectors + plugin distribution | MCP servers + plugins |
| Sub-agents | Maker/checker separation | TOML in .codex/agents/ | Markdown files with YAML frontmatter in .claude/agents/, agent teams |
| State | Cross-session memory | Markdown or Linear connector | AGENTS.md, PROGRESS.md, or Linear via MCP |
The convergence matters for a practical reason: loops built around these six primitives are largely tool-agnostic. If your loop logic lives in SKILL.md files, your state in a markdown file, and your integrations in MCP connectors — your architecture survives a tool migration. You're not betting on a vendor. You're betting on the pattern.
For engineers choosing a starting point: if you're already in the OpenAI ecosystem, Codex's Automations tab provides the lowest-friction entry. If you prefer a terminal-native CLI workflow with deeper hook integration, Claude Code offers more flexibility. Either way, the skills and connectors you build should largely transfer.
The Three Risks of Loop Engineering
Loop Engineering is genuinely powerful. It's also genuinely risky in ways that don't apply to manual agent prompting. Understanding these risks is the precondition for designing loops that remain trustworthy as they scale.
Verification Debt
A loop running unattended is also a loop making mistakes unattended. A well-designed loop produces output fast enough that subtle errors can accumulate faster than you'd notice in a manual workflow. A loop that opens twenty PRs a day but gets the semantics wrong in 15% of them ships three subtly broken changes every day. That isn't ten times the productivity; it's compounding technical debt in a PR-shaped wrapper.
The mitigation is the verifier sub-agent, paired with precise /goal conditions. But even then: "done" is a claim, not a proof. The only complete defense against verification debt is a human engineer who reads the diffs, understands what changed, and takes ownership before it merges. The loop does the work. The review is still yours.
Comprehension Debt
The faster your loop ships code you didn't write, the wider the gap between what exists in your codebase and what you actually understand. Osmani calls this comprehension debt, and it grows with loop efficiency: a loop that ships ten features a week can, if you let it, add ten features' worth of code you don't understand every week.
This is subtler than verification debt because it doesn't manifest immediately. It surfaces when you need to debug something the loop wrote three months ago, when a new engineer asks you to explain a module, when you need to refactor a system you've never internalized. The loop didn't cause the problem — the choice not to read what it produced did.
The mitigation is deliberate: treat loop-generated diffs with the same attention you'd give a senior engineer's PR. Not rubber-stamping — actually understanding what changed and why.
Cognitive Surrender
This is the most dangerous and most invisible risk. When a loop runs reliably and produces good output, there is a powerful psychological pull toward simply accepting whatever it gives you. You stop forming opinions about implementation choices. You stop questioning architectural decisions. You stop being the engineer who understands the system, and start being the engineer who presses go.
Osmani is direct: "Designing the loop is the cure when you do it with judgment and the accelerant when you do it to avoid thinking — same action, opposite result."
Two engineers can build identical loops and get opposite outcomes. One uses the loop to move faster on work they understand deeply. The other uses it to avoid understanding the work at all. The loop can't tell the difference. Only you can.
Best Practices for Loop Engineers
Start with one repeatable task. Before building a full six-primitive loop, automate a single, well-understood, high-frequency task — nightly test failure summaries, weekly dependency audits, automated PR descriptions. Master the automation + skill pattern before adding sub-agents and connectors.
Always split maker and checker. No matter how capable your implementer agent is, give it a verifier. The cost of a second agent is almost always worth the reliability it adds — especially in a loop running without supervision.
Version-control your SKILL.md files like production code. Skills encode your team's conventions, architecture decisions, and hard-won lessons. Treat them with the same rigor as your application code: PR reviews, changelogs, and regular audits for accuracy.
Write explicit /goal conditions. Vague stopping criteria produce vague results. Before any loop runs unattended, write its acceptance criteria as a machine-checkable condition. If you can't articulate it precisely, the loop isn't ready for autonomy.
Read the diffs before merging. Non-negotiable. The loop earns trust by producing good output; you verify that trust by actually reading what it produced. Loop output that goes unreviewed into main is a liability regardless of how good its internal verification is.
Monitor token costs proactively. Token usage in a loop can vary dramatically from run to run. A loop spawning five sub-agents, each making multiple tool calls, can produce a surprising first billing cycle. Instrument your loops with cost logging and set budget thresholds early.
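Here is a minimal Python sketch of per-call cost logging with a hard budget stop. The per-million-token prices are parameters you fill in from your provider's current price sheet. The values in the usage lines are placeholders, not real prices.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised to stop the loop once spending passes the budget."""


@dataclass
class CostTracker:
    budget_usd: float
    usd_per_million_input: float   # fill in from your provider's price sheet
    usd_per_million_output: float
    spent_usd: float = 0.0
    log: list[tuple[str, float]] = field(default_factory=list)

    def record(self, agent: str, input_tokens: int, output_tokens: int) -> float:
        """Log one model call's cost; raise once cumulative spend exceeds the budget."""
        cost = (input_tokens * self.usd_per_million_input
                + output_tokens * self.usd_per_million_output) / 1_000_000
        self.spent_usd += cost
        self.log.append((agent, cost))  # keep the entry even if we stop below
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of a ${self.budget_usd:.2f} budget")
        return cost


if __name__ == "__main__":
    # Placeholder prices, not real rates.
    tracker = CostTracker(budget_usd=5.0, usd_per_million_input=1.0, usd_per_million_output=4.0)
    tracker.record("verifier", 200_000, 20_000)
    print(f"${tracker.spent_usd:.2f}")
```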
Design for failure. Every loop needs a clear answer to: what happens when an agent fails partway through? Ensure your state file captures partial progress, worktrees clean up after failed runs, and your Triage inbox captures anything the loop couldn't resolve. A loop that fails silently is worse than one that fails loudly.
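The three requirements above (record what happened, clean up, escalate) fit naturally into a try/except/finally wrapper. In this Python sketch, `attempt`, `cleanup`, `record`, and `escalate` are hypothetical callables. You would bind them to your agent runner, your worktree removal, your state file, and your Triage inbox, respectively.

```python
import traceback
from typing import Callable


def run_with_cleanup(
    task_id: str,
    attempt: Callable[[], str],               # runs the agent; returns a progress note
    cleanup: Callable[[], None],              # e.g. removes the task's worktree
    record: Callable[[str, str, str], None],  # writes (task_id, status, note) to state
    escalate: Callable[[str, str], None],     # sends (task_id, details) to the inbox
) -> bool:
    """Run one task so that failure is recorded, escalated, and cleaned up."""
    try:
        note = attempt()
    except Exception as exc:  # any agent or tool failure
        record(task_id, "blocked", f"{type(exc).__name__}: {exc}")
        escalate(task_id, traceback.format_exc(limit=3))
        return False
    else:
        record(task_id, "done", note)
        return True
    finally:
        cleanup()  # runs on success and failure alike
```

Because `cleanup` sits in `finally`, a crashed agent cannot leave a stale worktree behind. Because `escalate` runs in the `except` branch, a failure always produces something visible in the inbox.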
The Future of Loop Engineering
We are early. The primitives described in this post are roughly six to twelve months old as mainstream engineering patterns. Several trajectories are worth tracking.
Loops that spawn loops. The current pattern has a human designing the top-level loop. The emerging pattern is loops that dynamically spawn child loops when they encounter problem domains they weren't originally designed for. A triage loop encountering a performance regression might spin up a dedicated profiling loop with its own skill set. This meta-loop pattern increases autonomy but demands far more sophisticated verification infrastructure.
Standardization of MCP and skill formats. The independent convergence of Codex and Claude Code on SKILL.md and MCP is significant. As more tools adopt these formats, a reusable ecosystem of skills and connectors will grow — much as npm grew for JavaScript packages. Engineers will install pre-built skills for common frameworks rather than authoring everything from scratch.
The shift in engineering leverage. The long-term trajectory is clear: the leverage point is moving from writing code to orchestrating systems that write code. The most valuable engineering capability of the next decade won't be fluency in a particular language or framework — it will be the ability to design trustworthy, verifiable, and maintainable autonomous engineering systems.
Intentional human checkpoints as a design pattern. Paradoxically, as loops become more capable, the deliberate placement of human review checkpoints will become more sophisticated. Rather than reviewing everything (impossible at scale) or nothing (reckless), the best loop designs will include smart escalation logic — surfacing precisely the decisions that warrant human judgment while routing everything else to automated verification.
Conclusion — Build the Loop, Stay the Engineer
We began with Boris Cherny's observation that his job is now to write loops, not prompts. Having walked through the full architecture of Loop Engineering, the depth of that statement should be clear. Writing loops isn't easier than writing prompts — it requires understanding automations, worktrees, skills, connectors, sub-agents, state management, verification design, and failure modes. It requires engineering judgment about which tasks deserve autonomous loops and which deserve your direct attention.
What Loop Engineering offers isn't a shortcut. It's a leverage point shift. The work you do once — designing a loop, writing a skill, wiring a connector — compounds indefinitely. Every morning your triage loop fires, it returns value on that design investment. Every time a verifier sub-agent catches a bug before production, it validates the architectural choice you made when you split the maker from the checker.
But Osmani's caution bears repeating: "Two people can build the exact same loop and get completely opposite results. One uses it to move faster on work they understand deeply. The other uses it to avoid understanding the work at all. The loop doesn't know the difference. You do."
Loop Engineering is not an invitation to step back from engineering. It's an invitation to step up — to think at a higher level of abstraction, to encode your expertise into systems that scale it, and to remain the person who understands, owns, and is accountable for what gets built.
Build the loop. But build it like someone who intends to stay the engineer — not just the person who presses go.
Start today: pick one repetitive task in your workflow — a nightly test failure summary, a PR description generator, a CI failure reporter — write a SKILL.md for it, and wrap it in a scheduled automation. That's loop one. The architecture from there is just more of the same, applied with increasing confidence.
References: Addy Osmani — Loop Engineering · Addy Osmani — Agent Harness Engineering · Peter Steinberger on X · Boris Cherny via Rohan Paul on X




