DEV Community

Thomas Landgraf


Claude Code Workflows: The Plan Moves Out of Claude's Head and Into a Script You Can Edit

Subagents, skills, and agent teams all keep the plan in one place: Claude's head. Claude decides, turn by turn, what to spawn next, and every result has to fit back into a single context window before the next decision. That works until the work is too big to hold at once.

A Claude Code workflow moves the plan somewhere else — into code. Claude writes a small JavaScript script that holds the loop, the branching, and the intermediate results, and a runtime executes that script in the background across dozens of agents while your interactive session stays free. The orchestration is no longer something Claude has to remember between turns. It's a file on disk.

I made an 8-minute video that teaches the primitive twice — once on something trivial, once on a real feature build that ran for two hours, hands-off — and this article is the written companion. (Workflow is a research-preview tool, Claude Code v2.1.154+; the official docs are here.)

The mental model: Claude doesn't do the task — it writes a script that does

Here's the move that took me a minute to internalize. When you ask for a workflow, Claude doesn't perform the work in the conversation. It writes a script, and a runtime runs that script later, in the background.

To learn the shape, I asked — in plain English — for "a reusable workflow that lists every README.md in the project and grammar-fixes each one." Claude wrote a file. The skeleton of every workflow is a meta block plus a body that fans out work:

export const meta = {
  name: 'readme-grammar-fix',
  description: 'Grammar-fix every README.md in the repo',
  phases: [{ title: 'Fix' }],
};

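// readmeFiles is assumed to be an array of README.md paths collected earlier in the script (that step is not shown here).
phase('Fix');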
await parallel(
  readmeFiles.map((file) => () =>
    agent(`Fix the grammar in ${file}. Preserve meaning, links, and formatting.`)
  )
);

parallel(...) runs one agent() per file concurrently. I ran it, and the /workflows panel showed the run live: 6 agents, 37 seconds, 5 READMEs corrected. That's the entire mental model in one screen — one agent per unit of work, the script holds the fan-out. Everything else is the same idea at larger scale.
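To see the fan-out shape outside the workflow runtime, here is a minimal sketch in plain JavaScript. It is not the runtime's implementation: `runParallel` and `fakeAgent` are hypothetical stand-ins, and the sketch assumes `parallel` behaves roughly like `Promise.all` over an array of zero-argument functions, which the snippet above suggests but does not confirm.

```javascript
// Hypothetical stand-in for the runtime's agent(): it only simulates latency
// and resolves with a short report. A real agent would edit the file.
function fakeAgent(prompt) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`done: ${prompt}`), 10);
  });
}

// Assumed shape of parallel(): take an array of thunks (zero-argument
// functions), start them all, and resolve with the results in input order.
function runParallel(thunks) {
  return Promise.all(thunks.map((thunk) => thunk()));
}

// Example input; in a workflow this list would come from a discovery step.
const readmeFiles = ['README.md', 'docs/README.md', 'packages/api/README.md'];

// One unit of work per file, all started together.
const fanOut = runParallel(
  readmeFiles.map((file) => () => fakeAgent(`Fix the grammar in ${file}`))
);

fanOut.then((reports) => {
  for (const report of reports) console.log(report);
});
```

Passing thunks rather than already-started promises leaves the decision of when to start each unit with the runner, which is what would let a real runtime cap concurrency or schedule agents in the background.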

The real test: point it at 20 approved specs

The trivial example exists to set up the real one. In my test project I have an approved spec tree — a "Petstore Social Networking" feature with 20 specs sitting in approved status, waiting to be built.

First, the video shows the old way I'd have done this: generate a static implementation checklist (a Custom Implementation Plan (0/20)) and hand it to a coding agent to tick off, one item at a time. It works. But it's a to-do list an agent walks linearly — no fan-out, no structured handoff between phases, and nothing that advances the spec's own lifecycle as it goes.

So instead I vibe-coded a workflow. I described what I wanted in plain English — "find all approved specs, understand them, build dependency-aware work-packages, implement them in parallel, and advance each spec's status from approved to in-development to under-test" — and Claude authored a ~286-line script with four phases:

export const meta = {
  name: 'speclan-implement-approved',
  description: 'Implement every approved spec and advance its lifecycle status',
  phases: [
    { title: 'Discover' },     // find every approved spec
    { title: 'Understand' },   // read each spec + its dependencies
    { title: 'Plan' },         // group into dependency-aware work-packages
    { title: 'Build' },        // implement in parallel waves
  ],
};

phase('Discover');
const discovery = await agent('Find every spec in `approved` status.', {
  schema: DISCOVERY_SCHEMA,
});

phase('Understand');
const understood = await parallel(
  discovery.specs.map((s) => () =>
    agent(`Read spec ${s.id} and its dependencies.`, { schema: UNDERSTANDING_SCHEMA })
  )
);
// Plan → dependency-aware work-packages, then Build them in "maximum safe" parallel waves

Two details in there are the whole reason this scales, and they're the parts worth stealing even if you never touch specs.

1. Structured output is a control surface

Each agent in the Discover and Understand phases is bound to a JSON schema (DISCOVERY_SCHEMA, UNDERSTANDING_SCHEMA). With a schema, the agent has to return the shape you asked for — not prose, not a best-effort summary, but validated JSON. That's what makes a multi-agent run controllable instead of free-form: the Plan phase can rely on the exact structure the Discover phase produced, because the runtime enforced it. Free-text handoffs between agents are where multi-step runs quietly fall apart; schemas are the fix.
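The script's actual schemas are not shown, so the following is only a guess at what a schema of this kind could look like, written in JSON Schema style, together with a tiny validator that covers just the keywords used here. `DISCOVERY_SCHEMA_SKETCH` and `validate` are hypothetical names; only `specs` and `id` appear in the original script, and the `status` field is an assumption.

```javascript
// Hypothetical schema: the article does not show DISCOVERY_SCHEMA.
// `specs` and `id` come from the script; `status` is an assumed field.
const DISCOVERY_SCHEMA_SKETCH = {
  type: 'object',
  required: ['specs'],
  properties: {
    specs: {
      type: 'array',
      items: {
        type: 'object',
        required: ['id', 'status'],
        properties: {
          id: { type: 'string' },
          status: { type: 'string', enum: ['approved'] },
        },
      },
    },
  },
};

// Minimal validator for the subset of keywords used above
// (type, enum, required, properties, items). Returns a list of error strings.
function validate(schema, value, path = '$') {
  const actual = Array.isArray(value) ? 'array' : value === null ? 'null' : typeof value;
  if (schema.type && actual !== schema.type) {
    return [`${path}: expected ${schema.type}, got ${actual}`];
  }
  const errors = [];
  if (schema.enum && !schema.enum.includes(value)) {
    errors.push(`${path}: ${JSON.stringify(value)} not in enum`);
  }
  if (schema.type === 'object') {
    for (const key of schema.required || []) {
      if (!(key in value)) errors.push(`${path}.${key}: missing`);
    }
    for (const [key, sub] of Object.entries(schema.properties || {})) {
      if (key in value) errors.push(...validate(sub, value[key], `${path}.${key}`));
    }
  }
  if (schema.type === 'array' && schema.items) {
    value.forEach((item, i) => {
      errors.push(...validate(schema.items, item, `${path}[${i}]`));
    });
  }
  return errors;
}

const goodReply = { specs: [{ id: 'spec-1', status: 'approved' }] };
const proseReply = 'I found three approved specs.';
console.log(validate(DISCOVERY_SCHEMA_SKETCH, goodReply));
console.log(validate(DISCOVERY_SCHEMA_SKETCH, proseReply));
```

A reply that fits the shape produces an empty error list, while a prose answer is rejected at the root. That rejection is the property the next phase depends on: it can index into `specs` without defensive parsing.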

2. The wave model

The Build phase doesn't blast all 20 specs at once. It groups them into dependency-aware work-packages and runs each wave at "maximum safe" parallelism — a groundwork package first (shared modules, the DB migration), then the independent feature specs concurrently on top of it. The script encodes the ordering, so you're not babysitting which spec can start when.
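One way to picture the wave grouping is as a layered topological sort: a package joins the earliest wave in which everything it depends on has already been built. The sketch below is an illustration of that idea, not the code Claude generated; `planWaves`, the `dependsOn` field, and the package names are all hypothetical.

```javascript
// Group packages into waves. A package is ready once every dependency
// was built in an earlier wave; all ready packages form the next wave.
function planWaves(packages) {
  const remaining = new Map(packages.map((p) => [p.id, p]));
  const built = new Set();
  const waves = [];

  while (remaining.size > 0) {
    const ready = [...remaining.values()].filter((p) =>
      p.dependsOn.every((dep) => built.has(dep))
    );
    if (ready.length === 0) {
      // Nothing can start: a cycle, or a dependency that is not in the list.
      throw new Error(
        'Unresolvable dependencies among: ' + [...remaining.keys()].join(', ')
      );
    }
    for (const p of ready) {
      remaining.delete(p.id);
      built.add(p.id);
    }
    waves.push(ready.map((p) => p.id));
  }
  return waves;
}

// Hypothetical work-packages: one groundwork package, three features on top.
const packages = [
  { id: 'groundwork', dependsOn: [] },
  { id: 'communities', dependsOn: ['groundwork'] },
  { id: 'memorials', dependsOn: ['groundwork'] },
  { id: 'milestones', dependsOn: ['groundwork'] },
];

console.log(planWaves(packages));
// → [ [ 'groundwork' ], [ 'communities', 'memorials', 'milestones' ] ]
```

In a workflow, each inner array would be handed to one parallel fan-out, and the outer array would be walked in sequence, so the groundwork wave finishes before any feature agent starts.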

It declined to auto-run — and that was the right call

When the script was ready, Claude refused to run it automatically. It was about to write code across 20 specs, so it made me launch it explicitly with the command. I like that instinct a lot: a workflow that's about to touch your whole codebase should be a deliberate keystroke, not a side effect of a chat.

I launched it. It ran for about two hours, hands-off (sped up on screen in the video). Partway through I dropped into Source Control to check it wasn't faking — and there was a real 26-file changeset: new lib/social-* modules, a 0028_social_groundwork database migration, schema wiring, and groundwork tests. Not a demo stub. Actual, reviewable work.

And on the left, in the spec tree, the status icons changed colour as each spec was built: approved (blue) → in-development (yellow) → under-test (purple). That colour flip is a status update the workflow itself performs: each Build agent advances the lifecycle of the spec it just implemented. You can glance at the tree and see exactly how far a two-hour run has gotten.

A workflow is just a script — so a bug is a normal edit

The run had a bug. It flipped every requirement's status correctly, but it left the three parent features sitting at approved — it had advanced the leaves but not rolled the status up to the branches.

Because a workflow is just a script, fixing it was a normal edit — and I made it by asking. I told Claude "you updated the requirements but not the parent features, please fix," and it diagnosed the rule (a feature is only "done" once all its children have advanced) and patched the script with a deepest-first parent-status rollup. Next run, the parents flip automatically. This time, I just nudged the three features to under-test by hand in the UI while Claude fixed the recipe — and when it re-checked, it found them "already under-test." Human and agent converging on the same state from two directions.
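A deepest-first rollup of this kind can be sketched as a post-order tree walk. The patched script itself is not shown, so this is only one plausible reading of the rule: a parent takes the least-advanced status found among its children, which means it reaches under-test only when every child has. `rollUpStatus`, the node shape, and the IDs are hypothetical.

```javascript
// Lifecycle order from the article; a higher index means further along.
const LIFECYCLE = ['approved', 'in-development', 'under-test'];

function rank(status) {
  const index = LIFECYCLE.indexOf(status);
  if (index === -1) throw new Error(`Unknown status: ${status}`);
  return index;
}

// Post-order walk: children are resolved before their parent, so the
// deepest nodes settle first. Leaves keep the status their agent set.
// Assumed rule: a parent gets the least-advanced status of its children.
function rollUpStatus(node) {
  const children = node.children || [];
  if (children.length === 0) return node.status;
  const lowest = Math.min(...children.map((child) => rank(rollUpStatus(child))));
  node.status = LIFECYCLE[lowest];
  return node.status;
}

// Hypothetical tree: two features under a root, requirements as leaves.
const tree = {
  id: 'feature-root',
  status: 'approved',
  children: [
    {
      id: 'feature-a',
      status: 'approved',
      children: [
        { id: 'req-1', status: 'under-test' },
        { id: 'req-2', status: 'under-test' },
      ],
    },
    {
      id: 'feature-b',
      status: 'approved',
      children: [
        { id: 'req-3', status: 'under-test' },
        { id: 'req-4', status: 'in-development' },
      ],
    },
  ],
};

rollUpStatus(tree);
console.log(tree.children[0].status, tree.children[1].status, tree.status);
// prints "under-test in-development in-development"
```

Because the walk only reads leaf statuses and rewrites parents, running it again on an already rolled-up tree changes nothing, which matches the "already under-test" result from the re-check.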

That's the part I want to land: you own the script. It's readable code you can patch by talking to it, not a YAML recipe you download and pray over. Control over one workflow you understand beats a pile of git workflows you don't.

Where SPECLAN fits (creator disclosure)

Full disclosure: I'm the creator of SPECLAN, the free VS Code extension that produced the spec tree and the lifecycle-status mechanics in this video. SPECLAN keeps requirements as reviewable Markdown + YAML with stable IDs and an approved → in-development → under-test lifecycle.

I'm disclosing it because of why approved specs turn out to be such a good workflow input — and it generalizes past my tool. A workflow fans out best over a clean, enumerable unit of work. An approved spec is exactly that: stable ID, a clear boundary, and a status the agent can advance. So "one agent per spec" falls out of the data the same way "one agent per README" did. If your unit of work is tickets, test cases, or migration sites, the mechanics above are pure Claude Code and apply directly — the schema and the wave model don't care what the items are.

The payoff

Customer approves a spec tree → a working, review-ready feature in under two hours, hands-off. At the end of the video I open the app on localhost and there it is: a "Your social world" hub — Communities ("find your pack"), Memorials ("in loving memory"), Milestones ("Biscuit's 1st Adoption Anniversary") — built from the approved specs, over a lunch break.

The spec stays the durable, reviewable artifact. The workflow is how you build it at fleet scale. And because the workflow is a script you wrote by describing it, when it's wrong you fix it the same way you wrote it — by asking.


If you've built a workflow that fans out over your own unit of work, I'd genuinely like to know what schema you bound the agents to. That's the part I'm still tuning.
