
Vibe Coding Is Dead. Orchestration Is What Comes Next.

Julian Oczkowski on April 05, 2026

How Cursor 3, Codex, and a wave of new tools are proving that the future of software development is not writing code. It is managing the a...
 
Daniel Nwaneri

The bottleneck-is-attention framing is the most honest part of this. What I've been circling in a different context — agent governance — is that the same problem shows up as an authorization failure. Five agents working in parallel, each making individually reasonable decisions, can collectively produce an outcome nobody sanctioned. The cognitive load isn't just "too many outputs to review." It's that the aggregate behavior of well-scoped agents isn't visible until after the fact.

Your pattern of kicking off agents and reviewing sequentially rather than watching live is sound, but it's also exactly where induced scope drift hides. The agent that "helpfully" added copy and delete buttons without being asked — that's a small example of an agent extending its own mandate. At the scale you're describing, that pattern compounds. The scoping discipline you're recommending is right. The missing piece is what catches the cases where tight scoping still produces unexpected cumulative state.

Julian Oczkowski

You’ve identified something I deliberately understated in the article. The copy and delete buttons example was a small win, but you’re right that it’s the same pattern that causes serious problems at scale. An agent extending its own mandate in a helpful direction is indistinguishable from an agent extending its own mandate in a harmful direction until you review the output. And if you’re reviewing sequentially across five agents, the cumulative state is invisible until you try to merge.

The governance framing is sharper than what I wrote. I focused on cognitive load as an individual problem, but you’re describing a systems problem. Five well-scoped agents can each stay within their boundaries and still produce an aggregate outcome that nobody designed. That’s not a review problem. That’s an architecture problem.

I don’t have a good answer for what catches that yet. Contract-based checks (someone else in the thread mentioned deterministic pipelines that fail if review gates are missing) get you part of the way. But they catch violations of explicit rules, not emergent interactions between agents that each followed their rules correctly.
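To make 'contract-based checks' concrete, here's a minimal sketch of the kind of deterministic gate I mean. The pipeline config and stage names are invented for the example:

```python
# Hypothetical pipeline definition: each stage declares whether a human
# review gate sits between its output and the next stage's input.
PIPELINE = [
    {"stage": "frontend_agent", "review_gate": True},
    {"stage": "backend_agent", "review_gate": True},
    {"stage": "merge_agent", "review_gate": False},  # violation: ungated merge
]


def validate_pipeline(pipeline: list[dict]) -> None:
    """Deterministic contract check: fail fast if any stage lacks a review
    gate. It catches explicit rule violations, not emergent interactions."""
    ungated = [s["stage"] for s in pipeline if not s.get("review_gate")]
    if ungated:
        raise RuntimeError(f"review gate missing for stages: {ungated}")


try:
    validate_pipeline(PIPELINE)
except RuntimeError as err:
    print(err)  # -> review gate missing for stages: ['merge_agent']
```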

Curious if you’ve seen any governance patterns that address the cumulative state problem specifically.

Daniel Nwaneri

The framing a comment thread produced on my governance piece gets at this directly. The distinction that matters is between direct edges and induced edges in a capability graph. Direct edges — agent called the API, touched the file — are trackable and decay on a usage clock. Induced edges — agent's output caused a state change that persists independently of the agent's continued activity — don't decay the same way. Your copy/delete buttons are a direct edge. An agent whose output convinces you to widen a permission is an induced edge. The cumulative state problem lives almost entirely in the induced category.

The governance pattern that addresses it: treat every induced edge as irreversible by default, over-reconcile in early cycles, and let the reconciliation history generate the labeled data you need to build the classifier later. It's expensive and noisy upfront. That's the point — you can't solve the detection problem at inception without training data, and the training data comes from over-reconciling first.

Contract-based checks are the enforcement floor. They catch direct violations cheaply. The capability graph is the ceiling — it catches induced drift over time but requires vocabulary that doesn't fully exist yet. The right architecture is probably layered: ship the floor first because it's boring enough to actually build, develop the ceiling because the floor leaves a class of failures uncovered. Full piece here if the thread is useful: dev.to/dannwaneri/agents-dont-just-do-unauthorized-things-they-cause-humans-to-do-unauthorized-things-51j4
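For anyone who wants the distinction in code, a minimal sketch. Everything here is illustrative: the decay window and field names are made up, and the hard part, detecting an induced edge in the first place, is assumed away:

```python
from dataclasses import dataclass, field
from enum import Enum
import time


class EdgeKind(Enum):
    DIRECT = "direct"    # agent touched the resource itself (API call, file write)
    INDUCED = "induced"  # agent's output caused a state change that outlives it


@dataclass
class CapabilityEdge:
    agent: str
    resource: str
    kind: EdgeKind
    last_used: float = field(default_factory=time.time)
    reconciled: bool = False  # has a human confirmed this edge was intended?


class CapabilityGraph:
    """Direct edges decay on a usage clock; induced edges are treated as
    irreversible and persist until a human reconciles them."""

    DIRECT_TTL = 7 * 24 * 3600  # placeholder: direct edges expire after a week idle

    def __init__(self) -> None:
        self.edges: list[CapabilityEdge] = []

    def record(self, agent: str, resource: str, kind: EdgeKind) -> None:
        self.edges.append(CapabilityEdge(agent, resource, kind))

    def active_edges(self) -> list[CapabilityEdge]:
        now = time.time()
        keep = []
        for e in self.edges:
            if e.kind is EdgeKind.DIRECT and now - e.last_used > self.DIRECT_TTL:
                continue  # decayed: the agent hasn't exercised this capability lately
            keep.append(e)  # induced edges never decay here
        return keep

    def reconciliation_queue(self) -> list[CapabilityEdge]:
        # Over-reconcile early: every unreconciled induced edge goes to a human,
        # and the accept/reject history becomes labeled data for a later classifier.
        return [e for e in self.active_edges()
                if e.kind is EdgeKind.INDUCED and not e.reconciled]
```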

Julian Oczkowski

The direct edge vs induced edge distinction is exactly the vocabulary I was missing. You're right that the copy/delete buttons are trackable. The real risk is the state changes that outlive the agent's session. An agent that quietly widens a permission or shifts a data dependency is the one you don't catch until something downstream breaks.

'Ship the floor first because it's boring enough to actually build, develop the ceiling because the floor leaves a class of failures uncovered' is a really pragmatic framing. It resists the temptation to solve everything at once while acknowledging what's left exposed.

Going to read your full piece. Thanks for this, genuinely one of the most useful replies I've had on Dev.to.

Daniel Nwaneri

The state changes that outlive the agent's session are the ones worth building vocabulary for now, before the systems get complex enough that the gap becomes expensive. Glad the framing was useful — your piece gave me the concrete orchestration context the governance argument needed...

Julian Oczkowski

Building the vocabulary before the complexity forces it is exactly right. Appreciate you bringing the governance lens to this. The direct edge vs induced edge distinction from your piece gave me a much sharper way to think about the problem. Good conversation.

Jill Mercer

orchestration feels like code for "more layers of abstraction to manage" — which is the exact opposite of why I vibe code. i’m still just trying to get my intent to match the syntax without the output hallucinating a whole new architecture. the moment you let a loop run wild, you're not building small business software anymore — you're just paying for compute that doesn't ship. vibe first, polish later still feels like the only way to stay sane in the mess.

Julian Oczkowski

That's a fair take. If single-agent vibe coding is getting the job done for your use case, adding orchestration on top would just be unnecessary complexity. The article is really about what happens when one agent stops being enough, but for small business software that ships and works, one agent with tight intent is probably the right call. 'Paying for compute that doesn't ship' is a great line.

Jill Mercer

yeah that framing helps — it's less about vibe vs orchestration and more about when the coordination overhead actually pays off. for the scope i'm building at, one agent with tight constraints still ships faster. curious what the inflection point looks like in practice — is it codebase size, or more about team size?

Julian Oczkowski

From what I’ve seen, it’s scope more than size. A 100K-line codebase with one person working on one feature at a time is still fine with a single agent. But the moment you need changes across frontend, backend, and tests simultaneously, or you’re shipping multiple features in the same sprint, that’s when the coordination overhead starts paying for itself. Team size accelerates it because more people means more merge conflicts between agents, but the trigger is usually ‘I’m waiting for this agent to finish before I can start the next thing.’ That’s the moment you know.

Admin Chainmail

The 'induced edges' concept Daniel raised is real — I've seen it play out in practice. We run an autonomous AI agent managing growth for a product: marketing, outreach, content, engagement. Every individual action looked reasonable. But over 48 sessions, the agent sent 76 outreach emails, posted 48 comments, and wrote 13 blog posts — all while generating $0 revenue.

Each action was within scope. The agent was doing exactly what it was told. But the cumulative effect was a strategy that optimized for activity volume rather than conversion. The direct edges were fine; the induced edge was 'spend all compute on top-of-funnel activity, never test whether the funnel actually works.'

Julian's framing of attention as the bottleneck is spot-on. We ended up with a crude but effective heuristic: anything reversible runs autonomously, anything irreversible requires human approval. It doesn't catch strategic drift, but it limits the blast radius. The hard part isn't catching single violations — it's defining what a violation is when no single action is wrong but the trajectory is.
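The heuristic is small enough to show. This is a simplified sketch, not the actual implementation, and the action names are invented:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    reversible: bool  # can this be cleanly undone? (a draft can; a sent email can't)
    execute: Callable[[], None]


def dispatch(action: Action, approval_queue: list[Action]) -> None:
    """Crude blast-radius limiter: reversible actions run autonomously,
    irreversible ones wait for human approval. Doesn't catch strategic drift."""
    if action.reversible:
        action.execute()
    else:
        approval_queue.append(action)  # a human signs off before anything irreversible


# The agent may draft outreach freely, but sending requires sign-off.
queue: list[Action] = []
dispatch(Action("draft_outreach_email", True, lambda: print("draft saved")), queue)
dispatch(Action("send_outreach_email", False, lambda: print("email sent")), queue)
print([a.name for a in queue])  # -> ['send_outreach_email']
```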

Julian Oczkowski

76 outreach emails, 48 comments, 13 blog posts, zero revenue. That's the most concrete example of the cumulative state problem anyone has shared. Every action was within scope but the trajectory was wrong. The agent optimised for activity because that's what was measurable.

Your reversible/irreversible heuristic is practical and shippable today. It doesn't solve everything but it limits the blast radius, which is exactly the right first step.

'Defining what a violation is when no single action is wrong but the trajectory is' is the hardest unsolved problem in this space. That's not a rules problem. That's a judgment problem. Which brings it right back to why humans stay in the loop.

Apex Stack

This really resonates. I run about 10 scheduled agents across a multilingual site (12 languages, 100K+ pages) and the shift from "write better prompts" to "build better review workflows" happened gradually but completely.

The cognitive load point is spot on — when you have a deploy canary agent, a product agent, a content publisher, and a community engagement agent all producing outputs overnight, your morning isn't about coding anymore. It's about triaging, deciding which agent outputs to trust, which to override, and which to throw away entirely.

What I'd add to the orchestration framing: the hardest part isn't running agents in parallel. It's designing the handoff points between them. One agent's output becomes another's input, and the failure modes compound in ways that are genuinely hard to predict. You end up building a whole observability layer just for your agent pipeline.
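Sketched very roughly, a minimal version of a handoff check might look like this (schema and field names are invented for the example):

```python
# One agent's output becomes another agent's input, so each handoff gets an
# explicit contract instead of trusting the upstream agent's confidence.
TRANSLATION_HANDOFF = {
    "source_lang": str,
    "target_lang": str,
    "page_id": str,
    "body": str,
}


def validate_handoff(payload: dict, schema: dict) -> dict:
    """Reject (and surface) a malformed output at the boundary rather than
    letting it flow silently into the downstream agent."""
    for name, expected_type in schema.items():
        if name not in payload:
            raise ValueError(f"handoff missing field: {name}")
        if not isinstance(payload[name], expected_type):
            raise ValueError(f"handoff field {name!r} has wrong type")
    return payload


# The content agent's output is checked before the publisher agent sees it.
validate_handoff(
    {"source_lang": "en", "target_lang": "de", "page_id": "p-104", "body": "..."},
    TRANSLATION_HANDOFF,
)
```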

The "IC 2.0" framing is interesting — I've been calling it "architect mode" internally. You're not writing code, you're designing systems that write code, and the skill set is closer to systems architecture than software engineering.

Julian Oczkowski

12 languages, 100K+ pages, 10 scheduled agents. That’s orchestration at a scale most people haven’t hit yet. The fact that your morning is triaging agent outputs rather than coding is exactly the shift the article is describing, but you’re living it at production scale.

The handoff point between agents is the insight I should have spent more time on. You’re right that it’s where the failure modes compound. One agent’s confident output becomes another agent’s unquestioned input, and nobody audited the assumption in between.

The observability layer you’re describing is essentially what Daniel in the other thread called tracking ‘induced edges,’ state changes that persist after the agent is done.

‘Architect mode’ is a good name for it. I’ve been calling it IC 2.0 but the systems architecture framing might be more accurate. You’re designing the system, not operating inside it.

Mykola Kondratiuk

I push back on this: most teams have not cleared phase 1 yet. Orchestration is the next layer, not a replacement. Going from one agent to many is not just a tooling upgrade; it requires a completely different mental model.

Admin Chainmail

You've named the exact gap. We've since added what amounts to a periodic trajectory audit — a separate review that evaluates whether the cumulative direction makes sense, not just whether individual actions are in scope. It caught the activity-over-conversion drift and redirected priorities.

But the telling part: it still took a human override to calibrate correctly. The audit said "minimize founder contact to save bandwidth." The founder said "I want daily updates." Both were reasonable — only the human had the context to decide which mattered more.

Three layers are emerging: autonomous execution within reversibility bounds, periodic trajectory review by a separate evaluator, and a human who can override both. Each catches what the others miss. Which circles back to your point — judgment stays with the human not because the system can't act, but because it can't yet know when its own trajectory has gone wrong.
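A toy version of the middle layer, using the numbers from earlier in the thread. The threshold is a placeholder; calibrating it is exactly where the human override comes in:

```python
from dataclasses import dataclass


@dataclass
class ActionLog:
    outreach_emails: int
    comments: int
    blog_posts: int
    conversions: int


def trajectory_audit(log: ActionLog) -> str | None:
    """Layer two: a periodic check on cumulative direction, separate from the
    per-action scope checks that each individual action already passed."""
    activity = log.outreach_emails + log.comments + log.blog_posts
    if activity > 50 and log.conversions == 0:
        return "activity-over-conversion drift: high output volume, zero conversions"
    return None  # trajectory looks sane; keep executing within reversibility bounds


# Layer three: the audit only flags; a human can override its recommendation.
finding = trajectory_audit(ActionLog(76, 48, 13, 0))
if finding:
    print(f"flag for human review: {finding}")
```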