This is Article 1 of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com. Previously: Beyond the Coding Assistant — A New Series. Coming next: Article 2 — Why AI Tools Make Some Teams Slower.
Pick a developer. Pick a Tuesday. Break down their workday.
How many of those hours were actually in the editor? How many were in tickets, dashboards, Slack, meetings, and review? Published survey data on developer time allocation isn't subtle on this question. Stripe's Developer Coefficient report found that roughly 42% of a typical developer's week goes to addressing technical debt and fixing bad code, with only a minority of the week left for the kind of new-feature coding the marketing pictures show (Stripe, 2018). Other analyses of how engineers spend their time put hands-on coding in a similar range — a few hours a day at best, not the bulk of the week.
The AI coding tools helped with those few hours. They did very little for the other five or six. And once you take into account everyone else on the team — and all the non-development tasks in the development process that the coding tools simply don't touch — even the best-case scenario only improves a fraction of the team's time. To realize the enormous gains these tools could deliver, we need to rethink the fundamentals of how we build software in this new era. Some of what was helping us is now holding us back.
What coding assistants actually do well
Let's be clear about the starting point. The current generation of AI coding assistants is impressive, and the breakthroughs are genuine. Inside a single repository, with a human driving, with a well-scoped task, they produce usable code at speeds that would have seemed absurd three years ago. Smart engineers have built real workarounds for the tools' limitations — scripts, custom harnesses, carefully tuned prompts, agents, and multi-session workflows that extend the tools' reach further than the vendors originally imagined.
None of what follows is a takedown of those tools. The argument is that they alone are not enough. They were designed for a specific context — editor, session, one developer, one repo, one well-bounded change — and that context is not where most of the engineering work in real organizations actually happens.
The iceberg
Shipping software is mostly not coding. It is requirements gathering, stakeholder conversations, design docs, architecture reviews, feature flag wiring, secrets provisioning, CI pipeline updates, deploy runbooks, dashboard setup, alert tuning, incident response, post-mortems, migration plans, and deprecation notices. It is the scheduled meeting that becomes a Slack thread that becomes an RFC that becomes a backlog ticket that becomes, eventually, a short coding session and a pull request.
Here's another test. Ask an engineer to take the next ticket off their queue and complete it end-to-end inside a single Docker container — code, build, run, validate, deploy. For 99 of every 100 real tickets in a real enterprise codebase, the answer is no. They need multiple repos. Several services running in dev or staging. Credentials to external systems. Documentation scattered across Confluence and a few other internal sources. And usually a conversation or two with product, QA, or a teammate who knows the corner of the system where the bug lives.
The current tooling treats the session as the unit of work and leaves everything outside the session — which is to say, most of the work — to the engineer.
The rest of the team
Engineering is a team sport. The primary focus of coding agents has been the engineer-writing-code role. That makes sense as a starting point; it's where the highest-volume, most-bounded, most-codifiable work lives. Plenty of clever people have built workarounds to extend that focus to other parts of the team — agents that draft tickets, agents that summarize Slack threads, agents that turn a paragraph of intent into a Figma flow — but these are workarounds, not the design center of the tools, and someone is still there monitoring the agent on every step of the process, giving detailed instructions and often repeating those instructions multiple times.
The next generation of tooling has to make those other roles first-class citizens of the process, not afterthoughts. If the goal is team throughput, then concentrating all the AI investment on one role is ineffective. It's the local optimum of "make the developer faster" rather than the global optimum of "help the team ship more." A tool that only helps developers can only move the bottleneck to whichever role is next in the handoff.
This isn't a hypothetical claim, either. There's already industry data showing that some teams, after rolling out AI coding assistants without changing the rest of their development process, have actually seen overall delivery slow down — the front-end coding step gets faster while review, testing, and coordination start to choke. We will dig into the data behind that observation in more detail in the next article.
A walk through the lifecycle
Every team in every company has a development lifecycle. Some teams write theirs down explicitly. Others make it up as they go. Some are formal and governance-heavy. Others are loose and improvisational. The names vary — ideation, design, specification, implementation, configuration, deployment, refinement, monitoring, debugging, retirement, and others — and so do the boundaries between phases. None of that matters very much for this argument.
What does matter is the coverage pattern. Today's AI tooling concentrates almost entirely in the implementation phase, and even there it misses most of the coordination work between developers and the people they hand off to and from. Whatever phases an organization uses, the next generation of tooling has to support the entire lifecycle if AI is going to deliver on the team-level promise people keep making for it. Anything less is a tool playing in one corner of a much bigger problem.
The engineer as orchestrator
Current coding tools require the engineer to drive at a low level — prompt by prompt, session by session. Different tools have different mechanisms (chat, autocomplete, slash commands, terminal CLIs, IDE plugins) but the underlying interaction is still pretty manual. The engineer asks. The tool responds. The engineer reviews. The engineer decides what to do next. Repeat.
This was an excellent way to start. When the tools were new and went off the rails easily, tight engineer-in-the-loop control was exactly the right design. Since then, several things have improved. We've learned how to keep the tools in line — better prompts, better guardrails, better evaluation. The underlying models have improved at producing quality work. The interfaces have grown new affordances. And yet the fundamental shape of the interaction hasn't changed very much. The engineer is still the orchestrator, asking the tool to perform every step, granting every permission, and reminding the tool of context the tool ought to have remembered on its own.
A whole sub-industry has grown up around helping the tools do the right thing the first time — custom agents, hooks, prompt libraries, role definitions, project context files, MCP servers, on and on. These help. They don't address the fundamental shape of the problem, which is that the tools should be capable of running a process largely on their own, with clear, well-defined points where the engineer's judgment is needed — and only at those points does the engineer have to step in. Until that shape changes, the engineer remains the bottleneck for everything that happens around the coding step.
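For the sake of concreteness, here is a minimal sketch of that shape: the process runs on its own and the engineer is consulted only at declared gates. Everything in it, the phase names, the `run_phase` stub, the `engineer_signs_off` stub, is hypothetical and stands in for whatever a real system would actually do.

```python
# Hypothetical sketch: a workflow that runs phases autonomously and pauses
# only at explicitly declared review gates, instead of asking per step.

PHASES = ["design", "implement", "test", "security_review", "deploy"]
REVIEW_GATES = {"design", "security_review"}   # where engineer judgment is genuinely needed

def run_phase(work_item: dict, phase: str) -> dict:
    """Stub for the agent doing a phase's work; returns whatever artifacts it produced."""
    return {"phase": phase, "artifacts": [f"{phase}-output-for-{work_item['id']}"]}

def engineer_signs_off(result: dict) -> bool:
    """Stub for the human checkpoint: approve, request changes, or reject."""
    return True  # a real system would block here on an actual review

def run_work_item(work_item: dict) -> None:
    for phase in PHASES:
        result = run_phase(work_item, phase)
        work_item.setdefault("history", []).append(result)
        # The engineer is pulled in only at declared gates, not on every step.
        if phase in REVIEW_GATES and not engineer_signs_off(result):
            print(f"{work_item['id']}: paused at {phase} pending engineer input")
            return
    print(f"{work_item['id']}: completed all phases")

run_work_item({"id": "TICKET-123"})
```

Until something like that is the default shape of the interaction, every one of those steps flows back through the engineer.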
That's not a small cost. Research on interrupted work and context switching is consistent and long-standing: it takes around 23 minutes to fully regain deep focus after an interruption (Mark et al., CHI 2008), and the workflow most engineers have with AI tools today is essentially a context-switch generator. Recent measurement work from METR has shown experienced developers running roughly 19% slower at real work in some controlled conditions, in part because of the cognitive overhead of constant prompting and review (Augment Code summary). The "AI fatigue" conversation that has emerged in 2025 and 2026 is the engineer's-eye view of the same phenomenon (Cerbos; ZEN Software).
Why single-session tooling hits a ceiling
The session is the wrong unit of work. If all we cared about was the code, then sure, a session would be fine: start a session, tell the agent to code something up, end the session. We have the code, all is good. But we care about more than just the code. We care about designs, architectures, tests, QA processes, security and performance reviews, and on and on and on.
A work item — the thing that actually gets shipped — persists across many sessions, many agents, many repos, and many days. If the tool's unit is the session, the unwritten assumption is that humans will glue the sessions together into something coherent. They do, and that gluing is where the time goes.
This isn't just an industry observation; it has academic backing. Researchers at MIT Sloan and Microsoft argue in Chaining Tasks, Redefining Work: A Theory of AI Automation that AI's biggest impact comes from reshaping entire workflows — how tasks are sequenced, grouped, and handed off — rather than from speeding up any single task in isolation. Their concept of "task chaining" — clustering AI-friendly steps so AI executes them as a continuous sequence — is exactly the gap that session-bound tooling can't close on its own. They also point out that every handoff between AI and human carries coordination cost: review, validation, adjustment. End-to-end workflows minimize those handoffs; task-level workflows accumulate them. The session-bound coding assistant is structurally a handoff machine (MIT Sloan: How AI is reshaping workflows and redefining jobs).
Amdahl's Law is the right rhetorical anchor here. If the part of the job you're speeding up is 20% of the total, the ceiling on your overall speedup is low no matter how fast you make that part. Even a 10× improvement on the coding step lifts whole-job throughput by only about 1.2× when coding was 20% of the job to begin with. The published data on developer time allocation has been consistently in that range for years. The math is not friendly to "make the coding step faster and call it a day."
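For anyone who wants to check the arithmetic, here is the calculation; the 20% coding share and the 10× speedup are the illustrative numbers from the paragraph above, not measurements.

```python
def amdahl_speedup(fraction_improved: float, factor: float) -> float:
    """Overall speedup when `fraction_improved` of the job gets `factor` times faster."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)

# Coding is ~20% of the job and the coding step gets 10x faster.
print(round(amdahl_speedup(0.20, 10.0), 2))    # 1.22 -- the whole job is only ~22% faster

# Even an infinitely fast coding step caps out at 1 / 0.80 = 1.25x.
print(round(amdahl_speedup(0.20, 1e12), 2))    # 1.25
```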
Practices, structure, and the SDLC as the differentiator
There's one observation that keeps recurring across the team-level studies: the teams that genuinely benefit from AI tools tend to share a cluster of practices. Fast feedback loops. Clear testing standards. Documentation discipline. Shared conventions for how agents are prompted, what context they're given, and what they're expected to return. Architectural ownership. Small, well-bounded work items.
That cluster of practices is what an SDLC actually is, whether or not anyone wrote it down. The teams that have one — explicit or implicit — are the ones absorbing AI tools well. The teams that don't are the ones that struggle. And once again I'll say it: if we simply take our existing practices and try to shoehorn AI coding workflows into them, they won't fit. It's the square peg in the round hole problem.
What changes if we treat the whole lifecycle as the unit
If the work item, not the session, is the unit of work — and if the tooling supports the entire lifecycle, not just the implementation phase — several things shift at once. Coordination becomes a first-class concept rather than a human chore. Artifacts become durable across phases rather than ephemeral within a session. Review gates become part of the workflow rather than a separate meeting. Costs become attributable. Roles beyond the developer get genuine support.
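One way to make that concrete is to imagine the work item itself as the durable record. The sketch below is purely illustrative; the field names (`repos`, `gates`, `cost_cents`, and so on) are invented for this article and do not describe any existing product.

```python
from dataclasses import dataclass, field

# Hypothetical sketch only: a work item as a durable record that outlives any
# single editor session, accumulating artifacts, reviews, and costs across phases.

@dataclass
class ReviewGate:
    phase: str             # e.g. "architecture", "security"
    approver_role: str     # product, QA, security, staff engineer...
    approved: bool = False

@dataclass
class WorkItem:
    id: str
    phase: str = "ideation"
    repos: list[str] = field(default_factory=list)            # real changes usually span several
    artifacts: dict[str, str] = field(default_factory=dict)   # design doc, test plan, runbook...
    gates: list[ReviewGate] = field(default_factory=list)     # review gates as part of the workflow
    sessions: list[str] = field(default_factory=list)         # every agent or human session that touched it
    cost_cents: int = 0                                        # spend attributable to this work item

item = WorkItem(
    id="TICKET-123",
    repos=["payments-api", "payments-web"],
    gates=[ReviewGate("architecture", "staff engineer"), ReviewGate("security", "security engineer")],
)
item.artifacts["design_doc"] = "https://example.internal/rfc/123"
item.sessions.append("agent-session-0042")
```

The specific fields don't matter. What matters is that the record persists across sessions, agents, and repos, and that review and cost live on the work item rather than in someone's head.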
That is the frame shift this series argues for. The productivity gains from better autocomplete are largely tapped. The next order of magnitude is in orchestration across phases, not generation within one — and not just generation for one role out of many.
Coming next
In Why AI Tools Make Some Teams Slower, the team-level data point this article kept hinting at gets the spotlight. DORA's 2024 State of DevOps report found a paradox: AI adoption increased individual productivity but was associated with declines in delivery throughput and stability. The teams losing on that trade are losing for structural reasons, not because the tools are bad — and naming those structural reasons is the setup for everything that comes after.
Sources
- How AI is reshaping workflows and redefining jobs — Kristin Burnham, MIT Sloan Ideas Made to Matter, April 2026
- Chaining Tasks, Redefining Work: A Theory of AI Automation — Shahidi, Demirer, Horton, Immorlica, Lucier (paper PDF)
- The Developer Coefficient — Stripe, 2018
- The Cost of Interrupted Work — Gloria Mark et al., CHI 2008
- Why AI Coding Tools Make Experienced Developers 19% Slower — Augment Code
- The Productivity Paradox of AI Coding Assistants — Cerbos
- AI Fatigue in Software Development — ZEN Software


