Every few days another "here's my May 2026 AI coding workflow" post lands on my feed. They're usually polished. They're usually proud of a two- or three-tool stack. And they almost always paper over how much of the showcased workflow has been available for a year, in one tool, on a cheaper subscription.
This isn't an attack on the people writing those posts. I've done versions of it myself. The workflow-as-content format is useful: it surfaces patterns and lets people copy what's working. The problem is that the way these posts are usually framed (here is the May 2026 stack you should be running) flatters the stack at the expense of the workflow underneath. That gets the order backwards.
Workflow engineering is the discipline of building the loop. The tools are the cheapest part of the loop. If you treat tool-stack composition as workflow design, you end up paying twice for one workflow and calling that progress.
The mirroring tax
The clearest tell that a workflow has too many tools in it is the appearance of a third file whose only job is to keep the first two in sync.
The pattern goes like this. You start with Claude Code. You add Cursor for the visual bits. Both tools have a project-level instructions file (CLAUDE.md for one, a Cursor rules file for the other) and they drift apart, because of course they do. The fix you reach for is a shared conventions file at docs/conventions.md or similar, with CLAUDE.md and the Cursor rule both referencing it. One source of truth, two readers.
That sounds clean. It is not clean. It is a mirroring tax. The third file exists because you chose two tools. The tax shows up in every adjacent system. You mirror MCP servers across both tools, you mirror skills, you mirror plan-mode-vs-execute-mode habits. The cost is not the $20 a month for the second subscription. The cost is the cognitive overhead of maintaining parity between two stacks that will keep trying to drift.
The best evidence that this is a real problem and not my aesthetic preference is that the mirroring tax has spawned its own CLI ecosystem. rulesync unifies rule management across Claude Code, Gemini CLI, and Cursor. rule-porter converts Cursor rules into CLAUDE.md, AGENTS.md, and Copilot instructions. cursor2claude is exactly what it sounds like. There are write-ups on doing it with symlinks Unix-style, with a unified .ai/ folder, or with explicit reference syntax. The industry is now converging on AGENTS.md as a unifying standard, which is itself an admission that maintaining N copies of the same conventions across N tools is a cost worth eliminating.
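The symlink flavor of the workaround is the simplest to sketch. A minimal version, assuming a repo root where Cursor reads from .cursor/rules/ (the paths and file names here are illustrative, and Cursor's native rule format carries its own frontmatter, so treat this as the crude Unix-style take):

```shell
# One canonical conventions file, linked into each tool's expected
# location. Paths are illustrative.
mkdir -p docs .cursor/rules
printf '# Conventions\n\nRun tests before committing.\n' > docs/conventions.md

# Both tools now read the same bytes. Symlink targets resolve
# relative to the directory holding the link.
ln -sf docs/conventions.md CLAUDE.md
ln -sf ../../docs/conventions.md .cursor/rules/conventions.md
```

The tax is visible even in the toy version: every new tool means another link, and every rename means auditing every link. The sync CLIs above exist because this doesn't scale past two tools.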
When the workaround for your stack choice is a CLI ecosystem of sync tools, the stack choice is the problem.
If you started from "what's the smallest stack that gets the job done," you would not invent the mirroring file. You would pick one tool and add a diff viewer for the rare case where you want to watch a change land in an editor. That is a cleaner workflow. It is also less photogenic, which is partly why nobody writes the post.
What's actually new in May 2026 (spoiler: not much)
Three changes now matter for this argument. None of them are step changes. All of them get sold as if they were.
/goal is a Ralph loop with a verifier. Geoffrey Huntley named the pattern in 2025: a dumb persistent loop that feeds the model its own output until a completion condition holds, named after Ralph Wiggum because the philosophy is ignorant, persistent, optimistic. People have been running Ralph loops in bash for most of the past year. Anthropic ships a first-party ralph-wiggum plugin in the official anthropics/claude-code repo. They are not pretending the lineage isn't there.
What /goal adds is a separate small model (Haiku by default) as the verifier. The agent doing the work doesn't get to decide it's done. That separation is a real design move, and it's worth crediting. But the loop shape is Ralph, the "completion condition until met" philosophy is Ralph, and the part of /goal that gets the breathless framing in workflow posts (let an agent run for hours or days without supervision) is the part that was already possible. The newness is incremental on a year-old pattern. Treating it as the autonomy unlock is a category mistake. And a dangerous one at that. I wrote about that specific failure mode in Remote Slop with Claude Code.
There's also a regression worth noting. The /goal evaluator doesn't run commands or read files independently. The proof of completion has to live in the conversation. A hand-rolled Ralph can grep test output directly, check git status, count files in a queue. So in some real ways /goal is less capable than a mature bash loop, just much cleaner to use. Fair trade for most cases. Just not a different category of system.
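For concreteness, a hand-rolled Ralph loop is only a few lines of shell. This is a sketch, not /goal's implementation: `agent_step` and `is_done` are placeholders you'd wire to your own agent CLI (say, `claude -p` in headless mode) and your own completion condition (a test suite, a grep over output, a file count).

```shell
# Placeholders: swap in your agent command and completion condition.
# A real loop might use `claude -p "fix one failing test"` and
# `pytest -q` here.
agent_step() { echo "agent does one unit of work"; }
is_done()    { false; }

# The Ralph loop: ignorant, persistent, optimistic. Run the agent
# until an EXTERNAL condition holds. The worker never declares
# itself done -- the part /goal formalizes with a verifier model.
ralph() {
  max=${1:-100} i=0
  until is_done; do
    i=$((i + 1))
    if [ "$i" -gt "$max" ]; then
      echo "gave up after $max iterations" >&2
      return 1
    fi
    agent_step
  done
  echo "done after $i iterations"
}
```

Note what this version can do that /goal's evaluator can't: `is_done` can run commands, grep files, and check git status directly, because it's just shell.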
Claude Design ate the design-mode justification. This one is recent enough that I have some sympathy for posts that don't engage with it. Claude Design launched on April 17, 2026. It's bundled with Pro, Max, Team, and Enterprise, which is the same subscription you're already paying for if you use Claude Code seriously. I wrote about my own experience with it in Vibe Designing. It is not a drop-in replacement for Cursor's in-editor design mode, and I want to be careful here. They're not the same product. But the case for paying for Cursor specifically for design work got a lot weaker in April, and the May 2026 workflow posts that still lean on "Cursor for design mode" as a load-bearing justification haven't caught up.
The pattern that actually works, and that I've been running on Shelfcritter, is to build the design system in Claude Design first and then instruct Claude Code to build on top of it. You drop in screenshots of the parts that bug you, ask Claude Design for a design system and a redesigned flow, iterate until the clickable prototype is something you'd actually want to use, and then hand Claude Code the prompt-with-link that Claude Design generates. Claude Code reads the design tokens and the component hierarchy directly. No copy-pasting CSS variables. No reverse-engineering spacing from screenshots. The handoff is the whole point: the design system as the contract between the design tool and the coding agent.
The same shape works with other stacks. Figma piped into Codex. Design tokens fed straight to Cursor's agent. A screenshot and a prompt into whatever model you trust. The point is the handoff, not the brand. A second editor is not load-bearing in any of them.
The Playwright skill closes the rest of the visual gap. Add a Playwright skill to Claude Code and the agent can launch the running app, screenshot it, click through flows, and iterate against what it sees. That covers most of the "I want to see the diff land" case that the GUI was supposed to handle. It's not as nice as watching a real editor render a change, but it's sufficient for the workflow most developers run.
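At the CLI level, the capability reduces to something like this. The URL and port are assumptions about your dev setup, and the skill wraps this plus the click-through logic; the commands themselves are Playwright's stock CLI:

```shell
# Capture what the agent will look at, assuming the app is already
# running on localhost:3000 (an assumption about your dev server).
npx playwright screenshot http://localhost:3000 home.png
npx playwright screenshot --full-page http://localhost:3000/settings settings.png
```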
What's left after all that? Tab predictions. The inline editor feel. Parallel cloud agents if you genuinely use them. Real things, narrower than the workflow posts imply. If you're a developer who uses tab-prediction loops every day (I'm not), keep paying for the tool that gives you the best one. If your case for the second subscription is "design mode and long-running agents," check whether the case still holds in May 2026. It probably doesn't.
The sunk cost rationalization
Here is what I think is actually going on in most of these workflow posts.
The author built a Claude Code workflow, then added a second tool when the second tool was the obvious move for the visual part of the job. They invested in skills, conventions, MCP setups across both. The world shifted underneath them: Claude Code got /goal, Claude Design got bundled, Playwright skills got good enough. The justifications that made the second tool obvious got softer, one by one.
But the muscle memory is still there. The conventions file is still there. The parallel cloud agents are still firing off before meetings. So the workflow stays the same shape, and what gets written up is "this is how I work in May 2026," when an honest version would be "this is how I work, and most of the load-bearing reasons for the shape are six to twelve months old."
That's a fine way to work. People are allowed to keep workflows they're comfortable with even when the underlying argument for them weakens. But it's a strange thing to recommend to other people as the May 2026 stack, when somebody starting fresh today would pick a smaller one. The fact that I run the Anthropic stack even when their limits had half the internet writing breakup posts is just the practice-what-you-preach bit. I use Claude Code because I'm used to it and it solves the issues I encounter. You might opt for Codex. Or Kimi. Or no AI at all. There is no silver bullet.
The New Stack ran a piece earlier this year titled "Cursor, Claude Code, and Codex are merging into one AI coding stack nobody planned". The headline does most of the work. Convergence is happening, accidentally, and the developer is paying for the lack of planning. The piece itself lands on composability as the answer, and I half-agree: it's nice to have the option to compose tools when you need it. But the more durable move is to learn the concepts underneath the tools and switch when the industry actually requires it. Once you've internalised the patterns, transitioning between tools is cheap. Composing more of them rarely is.
We've been here before
In After the Panic, I told juniors not to sprint after every new agent framework, because I've lived through JavaScript framework fatigue and I know how this ends: the industry always converges on a handful of choices that actually work, and the rest quietly disappear into blog post graveyards.
The shape of AI tool fatigue in 2026 is the same shape as JS framework fatigue was in 2016, with one difference that matters. AI tool fatigue is daily; JS framework fatigue was monthly. The rate at which new tools land, demand evaluation, and get written up as the new best workflow is significantly higher than anything web developers were dealing with a decade ago. Most of these tools, like most of the JS frameworks that briefly mattered in 2014–2018, will be merged, deprecated, or quietly absorbed by the survivors within eighteen months.
The cost of running a two-tool workflow today is not just the mirroring tax. It's also the bet that both tools will still be around, in their current form, when you've finished investing in the parity overhead. That bet looks worse the more closely you look at the history.
The actual May 2026 minimal stack
For most working developers I'd put the floor at this:
- One capable agent (Claude Code, Codex, Kimi, whichever clicks for you) with a small set of skills you've earned by repeating yourself. The Stop Repeating Yourself rule still holds: anything you've copy-pasted twice belongs in a skill, or whatever your agent's equivalent primitive is called.
- A way to let the agent see what it builds: a Playwright skill in my case, or the equivalent in your stack.
- An MCP setup kept tight: four to six servers, rotated when projects change, not twenty mirrored across two clients.
- A design tool that hands off cleanly to the agent, for when you need actual prototyping or visual artifacts.
- An issue tracker if you want parallel agents working through scoped work. I use Beads for this, but anything that produces small, well-scoped units the agent can chew through one at a time will do.
One subscription per layer. One conventions file. One stack.
For me that maps to Claude Code, Playwright, Claude Design, and Beads. The shape is the point; the brand names are negotiable.
If your work really needs a second editor on top of that (a design mode Claude Design can't carry, parallel cloud agents you actually use, a tab-prediction loop you can't live without), then add it. That's a workflow decision, and a defensible one. It is not the default.
This isn't only my preference. BuildMVPFast's piece on AI tool fatigue reports that the developers who seem least stressed about AI tools "picked their stack and stopped looking," and that the most common answer is "I use Claude and that's it." The least-fatigued developers are already running the smaller stack. The workflow posts that read like trophy cases are coming from the other group.
The part that compounds
The leverage in 2026 is not in adding tools. It is in keeping the stack small enough that you can still see what's happening inside the loop. Every additional tool is a piece of the loop you no longer fully control, and the workflow posts that read like trophy cases tend to be the ones where the author has lost the most of that control without noticing.
The thing that compounds is the skill library, the conventions you've codified, the Ralph-shaped loops you've learned to write good completion conditions for, and the review discipline you bring to the agent's output. None of that is tied to a specific editor. All of it survives the next tool change. That is what survived JavaScript framework fatigue too. The people who came out of the 2014–2018 cycle in good shape weren't the ones who learned every framework. They were the ones who learned the things underneath that every framework had to honor. Boring things, mostly.
The editor matters less every quarter. The workflow underneath is what's left when you turn the trophy case off.
I'd rather write a boring workflow post than a loud one.
Reading for later
A few pieces I want to come back to for this argument, beyond what's already linked above:
- Cursor, Claude Code, and Codex are merging into one AI coding stack nobody planned, The New Stack. The accidental-convergence framing.
- AI Tool Fatigue 2026, BuildMVPFast. Source of the "I use Claude and that's it" coping pattern.
- everything is a ralph loop, Geoffrey Huntley. The original Ralph piece, for the /goal lineage section.
- Claude Code's /goal separates the agent that works from the one that decides it's done, VentureBeat. The worker/judge split framed from Anthropic's side.
- Two AIs, One Codebase: Why I Run Claude Code and Cursor Side-by-Side, Stark Insider. A representative example of the two-tool workflow post if you want to engage one directly.
- rulesync, rule-porter, cursor2claude. The mirroring-tax CLI ecosystem. Worth scanning the READMEs to see how the authors describe the problem.