Phil Rentier Digital

Posted on • Originally published at rentierdigital.xyz

The Head of Claude Code Stopped Prompting. That's Not a Tip. That's a Timeline.

"You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."

That cracked me up when I saw it. Peter Steinberger dropped those 16 words on X on June 7, and 6.5 million people read them in 24 hours. And 5 days before that, Boris Cherny, the head of Claude Code, had said the same thing on stage at a WorkOS event, nearly word for word.

It cracked me up 'cause I'd been doing this for months without a name for it.

TLDR: Cherny stopped prompting Claude. His job now is writing the systems that prompt Claude for him. If you've used /goal and walked away until it finished, you were already doing a version of this, without knowing what it's called or how wide the gap is between your version and full loop engineering.

I Called It "Figure It Out Mode"

The workflow was simple, maybe embarrassingly so. I'd set an objective in /goal, drop a CLAUDE.md with the project rules, give Claude Code the repo context, and leave. Come back 20 minutes later, sometimes 2 hours. Either there's a working feature or there's a mess that needs fixing. Both outcomes move the project forward.

I wasn't doing this out of any principled conviction. It just happened when I stopped watching the output stream and started treating Claude Code like a junior dev I could delegate to. Set the objective, give it the context, and leave.

No label for any of it.

Then Steinberger posted, and half my feed was nodding while the other half argued about whether this was actually new or just prompting with extra steps. And Cherny's clip from 5 days earlier started making the rounds. "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops."

Recognition, not discovery. I was already doing a version of this. It just got a name.

That naming matters more than it looks on first read. Without a term for a practice, you can't compare notes on it, you can't deliberately improve the pattern, and you can't tell if you're doing it well or badly. "Figure it out mode" worked fine as a personal shorthand. "Loop engineering" is something you can build a methodology around. The concept didn't change between June 2 and June 7. What changed is that now everyone in the same conversation is using the same word, and the people who weren't doing it yet now know what they're missing.

Which Rung Are You On?

Figure: The 3 Rungs of AI-Assisted Development, from autocomplete to loop engineering. Rung 1: Autocomplete. Rung 2: Parallel Prompting. Rung 3: Loop Engineering. © rentierdigital.xyz

Three Levels of AI Development Automation

What Cherny described at WorkOS Acquired Unplugged on June 2 breaks down into 3 stages of evolution in how developers work with a coding agent.

Rung 1: you use Claude like autocomplete. Smarter than Copilot, but you're still writing code, reviewing every line, holding the tool. The agent assists. You direct every step.

Rung 2: you're prompting 5 or 10 Claudes in parallel. Handing off tasks, reviewing outputs, routing between them manually. You're still in the loop, just a busier traffic manager instead of a driver. A lot of people who think they're "advanced with AI" are here and assume they're at rung 3.

Rung 3: you're not in the loop at all. You built the system that runs the loop for you. Claude isn't waiting for your next message. It's executing against conditions, verification gates, and retry logic you defined once and that now runs without you. Your job shifted from "write the prompt" to "design what happens when the agent fails, succeeds, or hits something you didn't anticipate."

The difference between rung 2 and rung 3 isn't about skill at prompting. It's architectural. You don't get to rung 3 by prompting better. You get there by stopping prompting and encoding the logic into something that runs on its own. Think of it as the tower defense problem: stop defending every position manually and start placing structures that hold without you. Prompting is direct combat. Loop engineering is building your turrets before you leave the base.
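To make "encoding the logic" concrete, here's a minimal sketch of a rung 3 loop in Python. It's not how Claude Code works internally. `run_agent` and `verify` are hypothetical callables you'd supply yourself (one calls your agent, the other returns `None` when the gate passes or a failure message when it doesn't), and `max_attempts` is an arbitrary cap.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopResult:
    succeeded: bool
    attempts: int
    last_feedback: Optional[str]


def run_loop(
    run_agent: Callable[[str], str],          # hypothetical: sends a prompt, returns output
    verify: Callable[[str], Optional[str]],   # hypothetical: None = pass, str = failure reason
    goal: str,
    max_attempts: int = 3,
) -> LoopResult:
    """Prompt the agent, check the output, and retry with feedback until the gate passes."""
    prompt = goal
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        try:
            output = run_agent(prompt)
        except Exception as exc:
            # Something you didn't anticipate: stop and report instead of retrying blindly.
            return LoopResult(False, attempt, f"agent crashed: {exc}")
        feedback = verify(output)
        if feedback is None:
            return LoopResult(True, attempt, None)
        # The loop, not a human, writes the next prompt.
        prompt = f"{goal}\n\nPrevious attempt failed verification: {feedback}"
    return LoopResult(False, max_attempts, feedback)
```

The three branches map to the three cases above: the gate passes, the gate fails and the loop retries with the failure attached, or the agent blows up and the loop stops with a reason you can read later.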

The Gap Is Now Legible

The June 7 post wasn't a trend report. It was a measurement.

When the best practitioners in a field publicly announce a change in their own practice, the gap between people already doing it and everyone else flips from invisible to visible. That's what happened. The practice had been running for months. What changed is the scoreboard became public.

Karpathy's AutoResearch project is the clearest concrete proof on record. He's running 50 ML experiments overnight on a single GPU. The agent modifies the training code, runs it, reads the results, iterates, no human decisions in the loop. He coined "Loopy Era of AI" for exactly this, on a No Priors podcast episode that hit 875K views against a channel average of around 8,500. That's a 100x outlier on a research-level AI pod. The appetite for understanding this isn't theoretical anymore.

Cherny's own number is more direct: 100% of his personal code for the 30 days before December 2025 was written by routines he'd set up, not by him prompting Claude directly. And industry reporting from June 2026 puts Claude Code at close to 4% of all public commits on GitHub. 4% of the entire public GitHub graph is a massive footprint, and it's not happening through manual prompting session by session. That's loops running. At this point, running individual prompts to ship production code is the "it works on my machine" of agentic development.

The reason the timing matters more than the concept is the compounding logic, and this is the part most explainer threads skip entirely. A developer who prompts manually gets better at prompting, with faster iterations and more targeted results over time. It's linear improvement on a linear effort curve, and it's genuinely valuable.

A developer who encodes loop logic is operating in a structurally different model. Each loop they design runs without them. Each improvement to that loop applies to every future run automatically. One trajectory improves the work they already do. The other builds a system that handles that category while they design the next loop.

These 2 trajectories look nearly identical at the start. You can't tell them apart in week 1. The differential becomes visible over weeks, it compounds in the direction of the person who built the loop, and it's not recoverable by prompting faster. That's exactly what the June 7 moment made legible: the scoreboard flipped public, and you can now roughly tell which trajectory you're on just by looking at your last month of output.

The Loop You Didn't Name

Something I noticed when the Steinberger post started circulating: a lot of developers nodding in recognition had no idea they were already at rung 3 for some tasks.

/goal is already a closed loop. You define a stop condition. Claude iterates until it's met or hits a hard error. You're not making decisions between iterations. The feature shipped in Claude Code v2.1.139 in May 2026, and the developers who figured it out early, who set the goal, walked away, and came back to results, were technically already doing loop engineering. I was running a version of this before /goal even existed, just using long sessions with detailed context and hoping Claude stayed on task. Just hadn't named it.

The 3 things that separate "I used /goal and left" from a real production loop: a skill file that encodes the quality rules, a verification step that checks the output against those rules, and a review agent that sees the result fresh before anything ships. You might already have 1 of them without knowing the other 2 exist. A lot of developers have a CLAUDE.md. Not many have connected it to a verification layer. And fewer still have added the review agent, which is where the loop catches the things the build agent rationalized as acceptable.

The full anatomy, as Anthropic demonstrates in their verification video: a SKILL.md that encodes your project's non-negotiables, a browser verification step that checks the rendered output against those rules, and a second agent that reviews before anything gets merged. The CLAUDE.md you already wrote is the foundation, /goal runs against it, and the review agent gates the output before it ships. Connect those 3 and you have a loop that runs without you in the room.
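Here's one way the "connect those 3" part could look, as a rough sketch and not Anthropic's implementation. Everything named here is hypothetical: `rules` stands in for whatever you extracted from your SKILL.md, `verify_rendered` for your browser check, and `review` for your second agent. Both callables return a list of problems, empty when clean.

```python
from typing import Callable, List, NamedTuple


class GateReport(NamedTuple):
    ship: bool
    failures: List[str]


def gate_before_merge(
    diff: str,
    rules: List[str],
    verify_rendered: Callable[[List[str]], List[str]],  # hypothetical browser check
    review: Callable[[str, List[str]], List[str]],      # hypothetical review agent
) -> GateReport:
    """Run verification, then a fresh-eyes review; ship only if both come back clean."""
    failures = [f"verification: {f}" for f in verify_rendered(rules)]
    # The reviewer gets only the diff and the rules, never the build transcript,
    # so it cannot inherit the build agent's rationalizations.
    failures += [f"review: {f}" for f in review(diff, rules)]
    return GateReport(ship=not failures, failures=failures)
```

The design choice that matters is the reviewer's input: it sees the result and the rules, nothing about how the result was produced.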

For anyone who's already made the move from vibe coding to encoding project logic in prompts, the loop is the next layer of the same architecture. It's the same instinct: make implicit project rules explicit and stop eyeballing the output after the fact. The loop just runs that logic in autonomous mode.

Side note that barely connects but I'm putting it here anyway. When I learned to code in the 90s, we had a shared Bull DPS 7000 at school. Old mainframe. 1 compilation slot at a time, first come first served. What I figured out was writing a dumb shell script that polled the compiler queue every 15 seconds and resubmitted my job the instant a slot opened. My code always got compiled. My classmates were refreshing manually. I admitted this to them much later. Sorry guys. Not that sorry, honestly.

The instinct to encode the retry rather than do it yourself by hand is 30 years old. The branding is new.

Your First Loop Doesn't Need a Fleet

Rung 3 doesn't require 100 agents and an orchestration layer. Karpathy's overnight ML experiments sit at the heavy end of what a loop can do. Your first loop is simpler, and you can probably start it today.

Step 1. The minimal production loop:

/goal "Implement the product filtering feature from the spec. Done when the test suite passes and there are no TypeScript errors."

That's already rung 3 for that task. Claude runs until the condition is met or it hits an error it can't resolve. You're not in between. The key is the word "when" in the goal, because the stop condition has to be something the agent can verify automatically, not something you have to look at and judge afterward.
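"Verify automatically" boils down to exit codes. The sketch below is not what /goal does under the hood. It just shows the same stop condition expressed as something a script can answer with yes or no. The two commands are assumptions about a typical TypeScript project (`npm test` for the suite, `npx tsc --noEmit` for a type check without output files), so swap in your own.

```python
import subprocess
from typing import Callable, Sequence

# Assumed commands for a typical TypeScript project; replace with yours.
CHECKS = (
    ("npm", "test"),
    ("npx", "tsc", "--noEmit"),
)


def exit_code(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (0 means success)."""
    return subprocess.run(list(cmd)).returncode


def stop_condition_met(
    checks: Sequence[Sequence[str]] = CHECKS,
    run: Callable[[Sequence[str]], int] = exit_code,
) -> bool:
    """True only if every check exits with 0; stops at the first failure."""
    return all(run(cmd) == 0 for cmd in checks)
```

If you can't write your stop condition as a function like this, one that returns True or False with nobody looking at a screen, it probably isn't a stop condition yet.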

A loop without a verification layer is just automated guessing. The upgrade that makes it production-usable:

Step 2. Add a SKILL.md with your project's non-negotiables: your actual rules, for your actual project, written the way you'd brief a new dev on day 1. The conventions you enforce, the edge cases that always come back, the things you'd catch in code review 3 days later if nobody wrote them down. The more specific the rule, the more the loop behaves like someone who actually read your docs before starting.

Step 3. Add a browser verification step. Claude in Chrome or the Chrome DevTools MCP checks the rendered output against your quality criteria: layout shifts, Core Web Vitals, visual regressions. Things that don't show up in test suites but do show up in production. Anthropic's demo shows a layout shift caught automatically, outside the scope of the original task, because Core Web Vitals were already in the SKILL.md. That's the loop doing work you didn't explicitly ask for, because you encoded what "good" looks like in advance.

Step 4. Add a /code-review agent as a second pass. This agent sees the output fresh, without the history of how it got built. It catches the rationalized decisions the build agent slid past itself, which it will, 'cause the build agent has been staring at the same context for the whole run.

Start with steps 1 and 2 if you want to run something today. Add 3 and 4 when the base loop is stable.

I think the step that trips most people, and maybe I'm wrong on this but it tracks with every loop failure I've seen, is the stop condition. Specifically: setting one that can't be verified automatically. "Make the UI feel polished" is not a stop condition. It's a prayer. "No layout shift above 0.1 CLS" is a stop condition. Save point before the boss door, not after. Set the gate before the loop starts, or you're running the whole dungeon again.
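A gate like "no layout shift above 0.1 CLS" can be written down before the loop ever runs. Here's a small sketch. The `measured` dict is a hypothetical stand-in for whatever your browser verification step reports, and the only threshold in it is the 0.1 CLS example from above.

```python
from typing import Dict, List

# Defined before the loop starts. 0.1 CLS is the example threshold from the text.
THRESHOLDS: Dict[str, float] = {"cls": 0.1}


def gate_failures(
    measured: Dict[str, float],
    thresholds: Dict[str, float] = THRESHOLDS,
) -> List[str]:
    """Return one message per violated threshold; an empty list means the gate passes."""
    failures = []
    for name, limit in thresholds.items():
        if name not in measured:
            # A metric nobody measured is a failure, not a pass.
            failures.append(f"{name}: not measured")
        elif measured[name] > limit:
            failures.append(f"{name}: {measured[name]} exceeds {limit}")
    return failures
```

Treating a missing metric as a failure is deliberate: otherwise the loop can "pass" simply by never running the check.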

A loop without a verification gate doesn't save time. It automates being wrong.

Before any of this works consistently, the scaffold underneath has to be solid. Vague spec, no test coverage, dependencies you inherited but don't really understand: the loop will run against those confidently and ship garbage. The 8-step Blueprint in Vibe Coding, For Real was built for exactly this: getting from broken demo to deployed app before you hand the iteration to an autonomous system. The loop needs something real to run against.

And when you're ready to extend the loop to external systems (trigger a deploy, run a service check, call an API), building that layer with CLIs rather than MCP connectors changes how debuggable and reliable that extension ends up being in production.

When the Scoreboard Went Public

I didn't know "figure it out mode" had a name. Didn't know it put me at rung 3 for some tasks. Didn't know the Bull DPS 7000 era and the Boris Cherny era were running the same instinct 30 years apart.

What the June 7 moment actually was: not the beginning of loop engineering for the people already doing it. The moment the gap between practitioners and everyone else became visible to both sides. Cherny had been running 100% of his code through routines since December 2025. Karpathy had been launching overnight experiments for months. The gap was already there. Steinberger's post just flipped the scoreboard public.

The people already doing it didn't learn anything new on June 7. The people who weren't doing it now know the clock is running.

The compound rate is real and it doesn't wait. Every loop you design runs without you. Every improvement to the loop applies to every future run. That's a structurally different trajectory from getting faster at prompting, and the gap between the 2 becomes measurable faster than most people expect.

At what rung are you right now, and is that the one you want to stay on?


Sources

  • Peter Steinberger (@steipete), X, June 7, 2026
  • Addy Osmani, "Loop Engineering," addyosmani.com, June 7, 2026
  • Andrej Karpathy, Skill Issue: Code Agents, AutoResearch, and the Loopy Era of AI, No Priors podcast
  • explainx.ai, "Loop Engineering: The Claude Code Guide," June 2026
  • datasciencedojo.com, "Agentic Loops: From ReAct to Loop Engineering," June 2026
  • Anthropic, How to get Claude Code to verify its own work, YouTube

This post may contain affiliate links. If you click them, I might earn a small commission (costs you nothing, and helps me keep shipping quality articles every day for your reading pleasure).
