DEV Community: Brian Zimbelman

Context as Infrastructure: Agents Need More Than Code

Brian Zimbelman — Wed, 03 Jun 2026 21:22:49 +0000

This is Article 10 of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com/. Previously: Article 9 — Cost as a First-Class Constraint. Coming next: Article 11 — Review Gates: Where Humans Belong, Where AI Reviewers Belong.

When a human does a security review, they take into account a lot of information beyond the code: threat models, compliance frameworks and requirements, prior audit findings, team- and org-specific risk tolerances, and more. Some organizations have reams of documentation on these topics; in others it is sparsely written down, but the information still exists somewhere.

When an agent is given the same task, it usually isn't given this context, which makes a valuable security review of the product somewhere between hard and impossible. The same is true for performance reviews and the other reviews and checks we run on code as it is implemented. Without the context, the agent can't do as good a job as we would like.

Some teams have tried to put this kind of information into agent.md files and the like, but those files usually hold a copy of the living documents that are the real source of truth, and copies drift. That is why a prompt or agent definition file containing some of this information makes things a bit better but doesn't really solve the problem.

The MIT Sloan paper introduced in Article 1 made the structural version of this case: AI's value comes from chaining tasks across workflows, and the cost of every handoff is coordination work — review, validation, adjustment [MIT Sloan: How AI is reshaping workflows and redefining jobs]. Context infrastructure is one of the concrete ways to lower that per-handoff cost. When the next agent in the chain has access to the same documents and decisions the previous agent did, the handoff is much closer to free. When it doesn't, every handoff is a context-rebuild from scratch.

Article 1's iceberg made the same point from another direction. Below the visible coding step are requirements, design docs, architecture reviews, deployment runbooks, monitoring playbooks, security and compliance documents, post-mortems, stakeholder reviews. All of that is context, and all of it is what makes an engineer effective at the visible step. An AI agent with access only to the code is operating with the tip of the iceberg and pretending the rest doesn't matter.

Five types of context

It's useful to break this down so the infrastructure requirements get concrete.

Documents. Security policies, compliance requirements (HIPAA, SOC 2, PCI-DSS as applicable), architecture decisions (ADRs), PRDs, design specs, test plans, coding standards. Things written by humans to be read by humans — now also read by agents.
Procedures. Runbooks, deployment guides, incident-response procedures. Knowledge about how we do things here. Procedures change less often than documents but matter just as much.
Knowledge. ADRs, post-mortems, lessons learned. Knowledge about why we do things this way. The institutional memory that prevents the same lesson from being learned five times.
Shared memory. Cross-agent, cross-session state. Findings from related work. Work-in-progress notes. The substrate that lets two agents collaborate on a single work item across multiple passes.
Preferences. Team conventions, tooling choices, style guidelines. The texture-level details that make work feel "ours" instead of generic.

Each of these has different storage requirements, different update patterns, and different access controls. A context infrastructure that treats them all the same will fail at all of them. And while we would all like to think our project is organized and clean, every project has tribal knowledge that isn't written down anywhere, or at least not in any form we can refer to. The more we have written down, and the better it is organized so that agents can be pointed at the source of truth, the more skilled and capable the agents can be.
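One way to make "different storage, update patterns, and access controls" concrete is a per-type policy table that the orchestrator consults. A minimal sketch, assuming Python; the storage labels, refresh rules, and which types agents may write are illustrative assumptions, not a recommendation from the series.

```python
from dataclasses import dataclass
from enum import Enum


class ContextType(Enum):
    DOCUMENT = "document"
    PROCEDURE = "procedure"
    KNOWLEDGE = "knowledge"
    SHARED_MEMORY = "shared_memory"
    PREFERENCE = "preference"


@dataclass(frozen=True)
class ContextPolicy:
    storage: str          # where the source of truth lives (illustrative labels)
    refresh: str          # when the index should re-check the source
    agent_writable: bool  # may agents write entries of this type?


# Illustrative defaults only; every organization will tune these.
POLICIES: dict[ContextType, ContextPolicy] = {
    ContextType.DOCUMENT: ContextPolicy("git repo or wiki", "on source change", False),
    ContextType.PROCEDURE: ContextPolicy("git repo", "on source change", False),
    ContextType.KNOWLEDGE: ContextPolicy("git repo (ADRs, post-mortems)", "append-only", False),
    ContextType.SHARED_MEMORY: ContextPolicy("findings store", "continuous", True),
    ContextType.PREFERENCE: ContextPolicy("repo config", "rarely", False),
}


def can_agent_write(kind: ContextType) -> bool:
    return POLICIES[kind].agent_writable
```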

Context scoped to the right boundary

Not all context is organization-wide. Some is organization-wide (the security policy applies everywhere). Some is project-scoped (the conventions for this service may differ from the conventions for another). Some is work-item-scoped (the specific requirements being built right now). Some is session-scoped (the agent's working notes for this run).

The scoping matters because it determines what's relevant, what's authoritative, and who's allowed to read or write it. The agent reviewing a PR for the public API needs the API's design constraints (project-scoped) and the company's API style guide (organization-scoped) and the work item's specific requirements (work-item-scoped) — but not the engineering team's debate from three weeks ago about a different API in a different service (irrelevant to this work).

Treating context scope as a first-class concept — organization, project, work item, session — lets the orchestrator hand each agent exactly the context it needs and nothing more. That's both a quality move (less noise, better answers) and a cost move (smaller context windows on the wire).
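A minimal sketch of scope-aware selection, assuming each context entry is tagged with one of the four scopes plus the id of the project, work item, or session it belongs to. `ContextEntry`, `Assignment`, and `context_for` are hypothetical names. The usage reproduces the example above: a reviewer on a (made-up) billing API gets the org-wide style guide, the project constraints, and the work item's requirements, but not another service's debate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContextEntry:
    scope: str             # "organization" | "project" | "work_item" | "session"
    owner: Optional[str]   # id of the project, work item, or session; None when org-wide
    title: str


@dataclass(frozen=True)
class Assignment:
    project: str
    work_item: str
    session: str


def context_for(assignment: Assignment, entries: list[ContextEntry]) -> list[ContextEntry]:
    """Keep only entries whose scope boundary contains this assignment."""
    visible = {
        ("organization", None),
        ("project", assignment.project),
        ("work_item", assignment.work_item),
        ("session", assignment.session),
    }
    return [e for e in entries if (e.scope, e.owner) in visible]


entries = [
    ContextEntry("organization", None, "Company API style guide"),
    ContextEntry("project", "billing-api", "Billing API design constraints"),
    ContextEntry("project", "search-api", "Debate about the search API"),
    ContextEntry("work_item", "WI-42", "Requirements for WI-42"),
]
reviewer = Assignment(project="billing-api", work_item="WI-42", session="s-1")
print([e.title for e in context_for(reviewer, entries)])
# ['Company API style guide', 'Billing API design constraints', 'Requirements for WI-42']
```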

The non-developer context story

Engineering context is not just code and ADRs. It includes:

PRDs from product. The problem being solved, the customer it's being solved for, the success criteria, the explicit non-goals.
Design specs from UX. The mental model the customer is supposed to have. The accessibility requirements. The patterns the broader product uses.
Test plans from QA. What scenarios actually need to work. What broke in similar features last time. What edge cases the team has been bitten by.
Release notes from technical writing. What the customer reads. The framing the company commits to publicly.
Support tickets from customer operations. What real customers are confused by, broken on, or asking for. The most-grounded source of truth about how the product behaves in the wild.

All of this is context that matters to our agents. Unfortunately, it is often not easily accessible to them. Some of us have MCP servers that expose it from, say, Jira, Confluence, or similar tools, but in many cases those documents are not versioned in a way that makes it easy to see what has changed. This is why I strongly advise having agents periodically pull these documents into the repo as documentation for the system. The agents can then compare the repo copy with the current version in Confluence, determine whether it has changed, and if so, work out which parts of the project need to be touched to bring it back in line with the new process.
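A minimal sketch of the mirror-and-compare step, assuming the mirrored copies live under a directory in the repo and that a hypothetical `fetch` callable (for instance, one backed by an MCP server) returns the current text of each page. It writes a fresh copy only when the text differs and returns the ids that changed, which is the trigger for an agent to work out what in the project needs to be brought back in line. Committing the mirror directory means `git log` and `git diff` supply the version history the source tool lacks.

```python
from pathlib import Path
from typing import Callable


def sync_external_docs(
    fetch: Callable[[str], str],
    doc_ids: list[str],
    mirror_dir: Path,
) -> list[str]:
    """Mirror external documents into the repo and return the ids that changed.

    `fetch` is a hypothetical callable returning the current text of a page,
    for example via an MCP server in front of Confluence or Jira.
    """
    mirror_dir.mkdir(parents=True, exist_ok=True)
    changed = []
    for doc_id in doc_ids:
        latest = fetch(doc_id)
        mirror = mirror_dir / f"{doc_id}.md"
        previous = mirror.read_text(encoding="utf-8") if mirror.exists() else None
        if previous != latest:
            mirror.write_text(latest, encoding="utf-8")  # commit this; git diff shows what moved
            changed.append(doc_id)
    return changed
```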

Shared memory as a coordination substrate

Agents working on related tasks need to share findings and state. The security reviewer's findings should be visible to the implementer who will fix them. The test planner's coverage map should be visible to the code generator. The architect's decisions should be visible to everyone downstream.

Shared memory is the substrate for this kind of inter-agent coordination. Without it, every agent is isolated and the coordination has to happen through humans, which is exactly the bottleneck the rest of the series has been arguing against.

A useful shared-memory model has scope (organization, project, work item, session), provenance (which agent wrote this, when, against what spec), and access control (who can read, who can write). It is a system, not a prompt-engineering trick. RAG and vector stores are a piece of it; they are not the whole story. RAG handles "find relevant documents" reasonably well. It does not, on its own, handle "write durable findings, scope them appropriately, attribute them to the right agent, expose them to the right downstream consumers."

The vector store is a tool. The infrastructure is the system around it.
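To make scope, provenance, and access control concrete, here is a minimal in-memory sketch. `Finding`, `SharedMemory`, and the role-to-scope ACL are hypothetical; a real system would sit on durable storage and could use a vector store for the retrieval side. Every write records which agent wrote it, when, and against which spec version, and both reads and writes are checked against the scope.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

Scope = tuple[str, str]  # e.g. ("work_item", "WI-42")


@dataclass(frozen=True)
class Finding:
    scope: Scope
    body: str
    author_agent: str     # provenance: which agent wrote it
    spec_version: str     # provenance: which version of the spec it was checked against
    written_at: datetime  # provenance: when


@dataclass
class SharedMemory:
    writers: dict[str, set[Scope]]  # hypothetical ACL: role -> scopes it may write
    readers: dict[str, set[Scope]]  # hypothetical ACL: role -> scopes it may read
    findings: list[Finding] = field(default_factory=list)

    def write(self, role: str, agent: str, scope: Scope, body: str, spec_version: str) -> Finding:
        if scope not in self.writers.get(role, set()):
            raise PermissionError(f"{role} may not write to {scope}")
        finding = Finding(scope, body, agent, spec_version, datetime.now(timezone.utc))
        self.findings.append(finding)  # in-memory here; a real system would persist this
        return finding

    def read(self, role: str, scope: Scope) -> list[Finding]:
        if scope not in self.readers.get(role, set()):
            raise PermissionError(f"{role} may not read {scope}")
        return [f for f in self.findings if f.scope == scope]
```

For example, granting a security-reviewer role write access to a work item's scope and an implementer role read access lets the implementer see the reviewer's findings with their provenance, while a write attempted by the implementer raises `PermissionError`.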

Access control, provenance, audit

Context is not a free-for-all. Some documents are confidential. Some are authoritative (the published security policy). Some are draft (the architect's working notes). Some are opinions (one engineer's take on a design choice).

Findings need provenance. Which agent wrote this? When? Against which version of the spec? Reviewed by whom? A context system that can't answer those questions will fail any real compliance review, and will produce confusion the first time two agents disagree about whether something is authoritative.
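One hedged way to handle two sources disagreeing is to rank by authority first and recency second, and to flag the case where two equally authoritative sources conflict so a human can settle it. The three-level `Authority` scale, the integer `version` field, and `resolve` are assumptions for illustration; confidentiality is an access-control concern and is deliberately not part of this ranking.

```python
from dataclasses import dataclass
from enum import IntEnum


class Authority(IntEnum):
    # Higher value wins; mirrors the categories in the text.
    OPINION = 1
    DRAFT = 2
    AUTHORITATIVE = 3


@dataclass(frozen=True)
class Claim:
    source: str
    authority: Authority
    version: int  # hypothetical: increases each time the source is revised
    says: str


def resolve(claims: list[Claim]) -> tuple[Claim, bool]:
    """Return (claim to act on, contested?).

    Contested means another claim at the same authority level says something
    different, which is a cue to escalate rather than pick silently.
    """
    if not claims:
        raise ValueError("no claims to resolve")
    winner = max(claims, key=lambda c: (c.authority, c.version))
    contested = any(c.authority == winner.authority and c.says != winner.says for c in claims)
    return winner, contested
```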

This is also where the infrastructure consumes the work the rest of the series has set up. Article 7's agent specs include context requirements. Article 8's pools and reservation primitives apply to context just as much as to compute. Article 11's review gates use context to define their acceptance criteria. Context infrastructure is connective tissue — it shows up everywhere because it's what makes everywhere coherent.

What this implies for tooling

Several things the infrastructure has to do, none of which today's mainstream AI tooling does cleanly:

Index everything, not just code. PRDs, designs, runbooks, post-mortems, support tickets, ADRs, the policy library — all of it.
Scope retrieval to the right boundary. Organization, project, work item, session.
Track provenance. Every finding tied to the agent that produced it, the version of the inputs it saw, the criteria it was applied against.
Enforce access control. Confidential is confidential. Authoritative is distinguished from opinion. Audit logs follow.
Support write-back. Findings, decisions, summaries — durable artifacts the next agent will read.
Stay current. Stale context is worse than no context, because it's confidently wrong. The infrastructure has to know when sources update and refresh accordingly.
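The "stay current" requirement can be sketched as a freshness check at retrieval time: each index entry remembers the version of its source it was built from (a commit hash or page revision, say), and anything whose source has moved on, or disappeared, is withheld and queued for re-indexing instead of being served. The names are hypothetical, and how `current_versions` is obtained depends on the source system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedSource:
    source_id: str
    indexed_version: str  # e.g. commit hash or page revision seen when indexed
    text: str


def split_fresh(index: dict[str, IndexedSource],
                current_versions: dict[str, str]) -> tuple[list[IndexedSource], list[str]]:
    """Return (entries safe to serve, source ids that need re-indexing).

    Entries whose source has changed, or no longer exists, are withheld:
    stale context is worse than no context.
    """
    fresh: list[IndexedSource] = []
    stale: list[str] = []
    for source_id, entry in index.items():
        if current_versions.get(source_id) == entry.indexed_version:
            fresh.append(entry)
        else:
            stale.append(source_id)
    return fresh, stale
```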

This is the infrastructure work the next generation of AI-assisted engineering depends on. It is unsexy. It is also load-bearing.

Closing

If your AI agents don't have access to the same documents your humans rely on, they will make the same mistakes the humans would have made without those documents — only faster, and at a cost you can't ignore.

A reviewer without the policies is not a reviewer. A test planner without the test history is not a planner. A code generator without the architecture decisions is generating code against a model of the system the rest of the team doesn't share. Context is where AI-assisted engineering graduates from novelty to reliability — and it is where the next several years of competitive advantage are going to come from.

Coming next

Article 11: Review Gates. The series has now covered infrastructure (pools and reservations), economics (cost), and context. The remaining piece of the principles part is judgment: where humans belong in the loop, where AI reviewers belong, and how the gates that route between them get designed. The whole-team frame from earlier articles makes its loudest reappearance there.

Sources

Cost as a First-Class Constraint

Brian Zimbelman — Mon, 01 Jun 2026 00:49:37 +0000

This is Article 9 of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com. Previously: Article 8 — Pools and Reservations: Shared Infrastructure for AI Engineering. Coming next: Article 10 — Context as Infrastructure.

Imagine writing SQL against a database that never showed you EXPLAIN, never surfaced a slow query, never enforced a statement timeout, and only billed at month-end. You ask the database to do something. It does it, eventually, and bills you for it, eventually. You learn the cost structure by accident — by watching the invoice arrive.

That is what cost visibility for most AI coding agents looks like today. There is very little insight into the cost of a particular task. Some tools show who used how many tokens over a period of time, but which tasks does that correspond to? Which feature was being worked on? How repeatable are these costs? How do they correlate with changes we have made to the SDLC? You cannot answer these and similar questions from the information the model vendors or any of today's tools provide.

We would not accept this from a database or a cloud vendor. We have spent decades building tooling specifically to avoid this scenario, yet we still express shock when the bills arrive with cost overruns. When tokens were cheap and we were just learning how to use these tools, this was marginally acceptable. With Axios and others reporting in April 2026 that some companies are now spending more on AI than on payroll [Axios], AI spend is becoming something we are going to have to address.

From "cost as a billing line" to "cost as a scheduling input"

The frame shift this whole article depends on: cost is not a thing you notice at the end of the month. It is a thing the scheduler reasons about before dispatching work.

A scheduler that can't see cost will optimize for whatever it can see — usually throughput or completion time. That produces decisions like "always use the most capable model for everything" because the most capable model has the highest success rate per attempt, and the scheduler can't see that the success-rate gain isn't worth the 30× cost difference for trivial work.

A cost-aware scheduler optimizes across all three trilemma axes — quality, speed, and cost — at decision time. That is the move.

Model routing by task complexity

The cheapest usable model is the right model for any given task. Not the cheapest model overall — the cheapest one that meets the quality bar for the work in front of it.

Concretely:

Simple lookups, summarizations, classification, well-bounded transforms: lowest tier.
Spec writing, test plan generation, well-defined refactors against a settled design: middle tier.
Architecture, security review, threat modeling, ambiguous spec work, root-cause investigation: top tier.

Article 5 made the link from SDLC granularity to this opportunity: when work is broken into small, well-defined steps, each step becomes a candidate for the right model rather than a default trip to the frontier. Article 7 made the link from agent specialization to the same opportunity: each agent category has a default model tier that fits its job.

The savings compound at organization scale. A team that runs a thousand small documentation tasks a month on a frontier model is paying for capability it doesn't need. The same workload routed correctly costs a fraction. The cost discipline doesn't constrain quality; it puts quality dollars where they actually matter.

Routing also has to be overridable. Sometimes the stakes justify the expensive model even for nominally simple work — a documentation update for a public API that everyone reads probably wants more capability than a routine internal one. The scheduler routes by default; the workflow lets specific gates upgrade the tier when necessary.
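A minimal sketch of tiered routing with an upgrade-only override, assuming three tiers and task kinds that mirror the list above. The `DEFAULT_TIER` table and task names are illustrative assumptions; sending unknown work to the top tier is one conservative choice, not the only one.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    LOW = 1
    MID = 2
    TOP = 3


# Default tiers per task kind, mirroring the three buckets above (names are illustrative).
DEFAULT_TIER = {
    "lookup": Tier.LOW, "summarize": Tier.LOW, "classify": Tier.LOW,
    "transform": Tier.LOW, "docs": Tier.LOW,
    "spec": Tier.MID, "test_plan": Tier.MID, "refactor": Tier.MID,
    "architecture": Tier.TOP, "security_review": Tier.TOP,
    "threat_model": Tier.TOP, "root_cause": Tier.TOP,
}


def route(task_kind: str, min_tier: Optional[Tier] = None) -> Tier:
    """Cheapest tier that fits the task; a gate may raise it, never lower it."""
    tier = DEFAULT_TIER.get(task_kind, Tier.TOP)  # unknown work goes to the top tier
    if min_tier is not None and min_tier > tier:
        tier = min_tier
    return tier
```

A gate for the public-API documentation case would call `route("docs", min_tier=Tier.MID)`; the override can raise the tier above the default but never push it below.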

Cost management doesn't end at picking the right model

There is a lot more to cost management, though, than picking the right model. We also need a framework that can break down actual costs and record them at a granular level. Are we saving money by choosing this model over a more expensive one, or are we iterating three times with it and then bumping up to the more expensive model to fix our bugs anyway? What savings did we actually achieve with this model, and are they worth the time and other resources we may have lost by using it?

The framework has to report costs per task, so that it can tell the story: this story cost $X; here is the cost of each task, the model it used, and the time it took. Those are the core data points the framework must provide so we can do real cost analysis and justify the models we use. For some teams the data will say use the better model, because the time savings are worth the additional cost; for others it will justify running a local LLM instance for certain tasks; for still others it will say keep using what you are using. Without the data, we are flying blind.
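The per-task data points listed above (cost, model, time) are enough to roll up a story-level report. A minimal sketch with made-up numbers: three attempts on a cheaper model followed by an escalation. `TaskRun`, `story_report`, and the model names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRun:
    story: str
    task: str
    model: str
    cost_usd: float
    seconds: float


def story_report(runs: list[TaskRun], story: str) -> dict:
    """Roll per-task runs up into a per-story cost report."""
    mine = [r for r in runs if r.story == story]
    by_model: dict[str, float] = defaultdict(float)
    for r in mine:
        by_model[r.model] += r.cost_usd
    return {
        "story": story,
        "attempts": len(mine),
        "total_cost_usd": round(sum(r.cost_usd for r in mine), 2),
        "total_seconds": sum(r.seconds for r in mine),
        "cost_by_model": {m: round(c, 2) for m, c in by_model.items()},
    }


# Illustrative numbers: three attempts on a cheap model, then an escalation.
runs = [
    TaskRun("STORY-7", "implement", "small-model", 0.40, 90),
    TaskRun("STORY-7", "implement", "small-model", 0.40, 85),
    TaskRun("STORY-7", "implement", "small-model", 0.40, 95),
    TaskRun("STORY-7", "implement", "frontier-model", 2.10, 60),
]
report = story_report(runs, "STORY-7")
```

In this illustrative data the "cheap" route cost $3.30 and 330 seconds in total. If a single frontier attempt would have succeeded, that route would have cost $2.10 and 60 seconds: exactly the comparison the text argues you cannot make without per-task records.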

Local vs. cloud inference

The 3,000× gap from Article 3 was not a typo. Brad DeLong's piece on data-center economics worked through Marco Arment's 50-Mac-Mini setup and pencilled out per-inference costs that diverge from cloud-API pricing by orders of magnitude for predictable workloads [DeLong, "Is the Day of the Data Center About to End?"]. The Latent Space piece on stacked hardware, quantization, and distillation reaches the same general direction by a different route [Latent Space]. The exact multiplier depends on the workload. The direction is unambiguous.

For predictable, high-volume workloads — doc generation, simple transforms, summarization passes, classification, embedding generation, the kind of work that runs in the same shape thousands of times a day — the cheapest inference is the inference that didn't go to a cloud API at all.

Cloud API calls should be reserved for the work that actually needs frontier capability, the work that's intermittent enough that local hardware sits idle, or the work that has variable shape enough that local optimization isn't worth it.

This is non-trivial to operate. Local hardware comes with its own ops burden — patches, updates, monitoring, hardware refresh. Most organizations won't run their own inference today. The point is that the orchestrator should be able to route to local capacity when it makes sense, and the architecture should make that pluggable. Locking the workflow to a single cloud provider closes off the option.

Since those articles were published, the Mac Studio configurations with 512 GB of RAM have stopped being sold, and it looks like the 256 GB units will follow. The base-model Mac mini has been discontinued. All of this is reportedly due to the memory shortages going on today. With the massive scale-up of data centers, the huge increase in energy demand, and constrained manufacturing of inference chips, memory, and other hardware, there are likely to be a few years in which inference costs keep rising. Until supply and demand settle into a predictable growth pattern, we should expect unpredictability in the marketplace for these services and plan accordingly.
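A minimal sketch of what "route to local capacity when it makes sense, and make it pluggable" could look like: both backends satisfy one small interface, and the router prefers local capacity for predictable, high-volume task shapes when it has free slots, falling back to the cloud otherwise. `InferenceBackend`, the two backend classes, and the `LOCAL_FRIENDLY` set are hypothetical; the `complete` methods are placeholders for real inference calls.

```python
from dataclasses import dataclass
from typing import Protocol


class InferenceBackend(Protocol):
    name: str

    def has_capacity(self) -> bool: ...

    def complete(self, prompt: str) -> str: ...


@dataclass
class LocalBackend:
    name: str = "local"
    free_slots: int = 2  # hypothetical: concurrent requests the local hardware can take

    def has_capacity(self) -> bool:
        return self.free_slots > 0

    def complete(self, prompt: str) -> str:
        raise NotImplementedError("call your self-hosted inference server here")


@dataclass
class CloudBackend:
    name: str = "cloud"

    def has_capacity(self) -> bool:
        return True  # treated as elastic for this sketch

    def complete(self, prompt: str) -> str:
        raise NotImplementedError("call your cloud provider's API here")


# Predictable, high-volume task shapes (illustrative).
LOCAL_FRIENDLY = {"doc_generation", "transform", "summarize", "classify", "embed"}


def choose_backend(task_kind: str, needs_frontier: bool,
                   local: InferenceBackend, cloud: InferenceBackend) -> InferenceBackend:
    if not needs_frontier and task_kind in LOCAL_FRIENDLY and local.has_capacity():
        return local
    return cloud
```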

Caching intermediate representations

Anywhere the workflow produces a durable, reviewable, validated artifact, the next workflow that needs the same artifact should fetch it rather than regenerate it. But how do you do that reliably when someone can make changes at any time? We already have source control: Git will tell you what has changed since the last time you generated that DDD model or created the data schema for that database. If you design your phases correctly, you can save a lot of usage by regenerating artifacts only when necessary and otherwise treating them as the source of truth.

This is one reason why I prefer to keep as much in the git repo as possible. It makes for a versioned set of documentation with changes tracked over time. That is much more informational than having six copies in a wiki and hoping the one updated the latest is most accurate.
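A minimal sketch of regenerating an artifact only when its inputs change. It keys the cache on the object ids git assigns to the input files; `git_blob_id` computes the same id `git hash-object` reports in a SHA-1 repository, though in practice you would read the ids from git itself (for example with `git ls-files -s`). `ArtifactCache` and the generator callable are hypothetical, and a real cache would persist its store.

```python
import hashlib
from typing import Callable


def git_blob_id(content: bytes) -> str:
    """The object id `git hash-object` reports for this content (SHA-1 repositories)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def cache_key(inputs: dict[str, bytes]) -> str:
    """Combine the blob ids of every input file, in a stable order."""
    h = hashlib.sha256()
    for path in sorted(inputs):
        h.update(path.encode() + b"\0" + git_blob_id(inputs[path]).encode() + b"\n")
    return h.hexdigest()


class ArtifactCache:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.generations = 0  # how many times we actually paid for generation

    def get_or_generate(self, inputs: dict[str, bytes], generate: Callable[[], str]) -> str:
        key = cache_key(inputs)
        if key not in self._store:  # only regenerate when some input changed
            self._store[key] = generate()
            self.generations += 1
        return self._store[key]
```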

Budgets at every scope

Budgets exist at the granularities the organization cares about: organization, team, project, work item.

Each scope has limits. Each limit has policies for what happens at thresholds:

Approaching the limit. Notify owners. Begin slowing down low-priority work in that scope. Surface the trend.
At the limit. Pause low-priority work. Require explicit approval for new high-cost dispatches in that scope.
Over the limit. Pause everything. Escalate to the budget owner.

The point is that budgets are control surfaces, not reporting layers. They are not "here's how much you spent" — they are "here's what the system will and won't dispatch." The reporting comes for free; it's the byproduct of the control.

This is exactly the FinOps move that the cloud-cost discipline made in the late 2010s. The lesson is well-trodden: hidden costs are always misallocated, attributable costs are negotiable, and budgets that the system enforces are the only kind that survive contact with end-of-quarter pressure.
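Treating budgets as a control surface means the dispatcher asks them for a decision before work runs. A minimal sketch of the three threshold policies above, assuming an 80% "approaching" threshold and a per-task cost estimate; `Budget`, `Action`, `decide`, and `decide_all` are hypothetical names. When several scopes apply (organization, team, project, work item), the most restrictive answer wins.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Values double as severity: higher is more restrictive.
    DISPATCH = 0
    DISPATCH_AND_NOTIFY = 1  # allowed, but owners are notified and the trend surfaced
    THROTTLE = 2             # approaching the limit: slow down low-priority work
    DEFER = 3                # at the limit: pause low-priority work
    NEEDS_APPROVAL = 4       # at the limit: high-cost dispatch needs explicit approval
    BLOCKED = 5              # over the limit: pause everything, escalate to the budget owner


@dataclass(frozen=True)
class Budget:
    scope: str  # e.g. "org", "team:payments", "project:billing", "work_item:WI-42"
    limit_usd: float
    spent_usd: float


APPROACHING_RATIO = 0.8  # assumed threshold; pick your own


def decide(b: Budget, estimate_usd: float, low_priority: bool, high_cost: bool) -> Action:
    if b.spent_usd > b.limit_usd:
        return Action.BLOCKED
    projected = b.spent_usd + estimate_usd
    if projected >= b.limit_usd:
        if low_priority:
            return Action.DEFER
        return Action.NEEDS_APPROVAL if high_cost else Action.DISPATCH_AND_NOTIFY
    if projected >= APPROACHING_RATIO * b.limit_usd:
        return Action.THROTTLE if low_priority else Action.DISPATCH_AND_NOTIFY
    return Action.DISPATCH


def decide_all(budgets: list[Budget], estimate_usd: float,
               low_priority: bool, high_cost: bool) -> Action:
    """Every enclosing scope gets a vote; the most restrictive answer wins."""
    return max((decide(b, estimate_usd, low_priority, high_cost) for b in budgets),
               key=lambda a: a.value)
```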

A cost-aware scheduling decision

When most people think of budgeting and cost containment, they think of reporting. Reporting is valuable: discovering that our QA runs cost 3x what it costs to implement the code tells us where to focus cost-reduction efforts. But the next step is to look at how we schedule work so that it fits better within our budgets.

For this, we need cost forecasting as well. Once we have a metric for the size of a feature relative to other features, and we know what those other features cost to build, we can forecast the cost of the new one. We can then decide whether we want the full scope of the feature at that cost, whether to descope it, or whether to work on a different feature altogether. This gives us an earlier data point in the feedback loop for development decisions. If this feature costs $0.50, do we want it? Sure. If it costs $500? Maybe not.
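A deliberately simple forecast, assuming the team has a relative size metric and the actual cost of completed features: scale the new feature's size by the historical cost per size point, and report the spread between the cheapest and most expensive past rates as a rough range. The data and names are illustrative; a real forecast would want more history and probably more than one size signal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletedFeature:
    size_points: float  # whatever relative size metric the team uses
    cost_usd: float


def forecast_cost(history: list[CompletedFeature], size_points: float) -> float:
    """size x (total historical cost / total historical size)."""
    total_size = sum(f.size_points for f in history)
    if total_size <= 0:
        raise ValueError("need completed features with a positive total size")
    return size_points * sum(f.cost_usd for f in history) / total_size


def forecast_range(history: list[CompletedFeature], size_points: float) -> tuple[float, float]:
    """Rough band using the cheapest and most expensive past cost-per-point rates."""
    rates = [f.cost_usd / f.size_points for f in history if f.size_points > 0]
    if not rates:
        raise ValueError("need completed features with a positive size")
    return size_points * min(rates), size_points * max(rates)


history = [CompletedFeature(3, 45.0), CompletedFeature(5, 80.0), CompletedFeature(8, 115.0)]
```

With past features of sizes 3, 5, and 8 costing $45, $80, and $115, a size-10 feature forecasts at $150, with a range of $143.75 to $160; that number is what gets weighed against the value of the feature before work starts.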

Closing

The scheduler that sees cost can make choices the scheduler that doesn't cannot. Every one of those choices compounds — not just in the invoice, but in the discipline of the team that runs the system. Organizations whose engineering teams build cost awareness in early are going to outperform organizations that learn it under pressure, the same way the FinOps-disciplined cloud teams outperformed the ad-hoc ones a decade ago.

Cost is a scheduling input, not a billing line. Budgets are a control surface, not a reporting layer. The cheapest inference is the inference you didn't have to run. A scheduler that can't see the bill is flying blind.

Coming next

Article 10: Context as Infrastructure. Cost-aware scheduling assumes the agents have what they need to do the work well. That "what they need" is context, and it's a bigger infrastructure problem than most teams realize. Without durable, scoped, version-controlled access to the documents and decisions humans rely on, AI agents cannot do reliable work no matter how well-routed or well-budgeted they are.

Sources

Pools and Reservations: Shared Infrastructure for AI Engineering

Brian Zimbelman — Fri, 29 May 2026 12:31:29 +0000

This is Article 8 of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com/. Previously: Article 7 — A Taxonomy of Agents. Coming next: Article 9 — Cost as a First-Class Constraint.

In most organizations the engineers oscillate between sharing resources and spinning up some ephemeral instances of resources. Sometimes, they will spin up a local version of an app if they need to test against it, or a local simulated message bus to develop against. Other times they will run against the instance in dev. Often this means changing some configuration settings when they start the app, and/or maybe doing some sort of network routing so that they can reach the instances they need. Sometimes this means coordinating the usage with others.

This worked reasonably well when engineers were working on one thing at a time, and they didn't mind spending the time to configure their system to do that work. But when they are starting up three or more agents to do work all at the same time, those agents are sharing with many more "engineers" than before. And they are all competing for these resources. The need for more ephemeral resources and/or better sharing of resources becomes key to keeping these resources from becoming a bottleneck.

This is why the concept of a 'project' becomes key to making it possible for a large number of agents to work on a large number of tickets at the same time. It is also key to defining what is required, and how to configure it, in a way that is repeatable and automatable, so the engineer isn't spending all their time on the toil of setting up ephemeral configurations for agents and tearing them down.

What is a Project then?

Simply put, a project is the definition of the resources you need to work on tasks that affect a given system. Some systems interact with a large number of other systems and may have many 'projects', or configurations of resources, for working on their various parts. In general, though, a system in your organization's ecosystem will have a relatively small set of configurations you need to spin up when working on it.

It starts with the git repos that are needed (the primary repo for the system and often secondary ones for Terraform and other ancillary components). The project then defines the runtime components needed to run the system: databases, message buses, shared memory systems, etc. It also includes everything else an engineer would have to configure to make the system work well enough for the development task to be performed; more specifically, well enough to test and reproduce the functionality needed to verify the development work. Finally, it includes the systems, tools, and frameworks needed to engineer, implement, and test the requested change: compilers, linters, QA frameworks, performance-testing tooling, and so on.

This definition is what allows the orchestrator to build an 'engine' where the AI assistants can do their work, knowing they can work on the system reproducibly. If the environment isn't reproducible, how can we expect the agent to know whether the change worked?
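A project definition like the one described above can be captured as data the orchestrator reads. A minimal sketch: the field names, the `shared` flag (reserve a shared instance versus spin one up per engine), and the example values are hypothetical, not a format the series prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeComponent:
    name: str     # e.g. "orders-db"
    kind: str     # e.g. "postgres", "kafka"
    shared: bool  # True: reserve the shared instance; False: spin up one per engine


@dataclass(frozen=True)
class ProjectDefinition:
    name: str
    repos: tuple[str, ...]                 # primary repo plus Terraform and other ancillary repos
    runtime: tuple[RuntimeComponent, ...]  # databases, message buses, shared memory, ...
    tools: tuple[str, ...]                 # compilers, linters, QA and performance tooling

    def components_for(self, needs: set[str]) -> list[RuntimeComponent]:
        """Only the runtime components a given SDLC step asks for."""
        return [c for c in self.runtime if c.name in needs]


# Hypothetical example project.
billing = ProjectDefinition(
    name="billing",
    repos=("git@example.com:acme/billing.git", "git@example.com:acme/billing-infra.git"),
    runtime=(
        RuntimeComponent("orders-db", "postgres", shared=True),
        RuntimeComponent("events", "kafka", shared=False),
    ),
    tools=("python3.12", "ruff", "pytest", "k6"),
)
```

`components_for` lets a step that needs nothing (a design task, say) skip the runtime entirely, while a step that needs everything gets the full list.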

Why pools of engines

Configuring resources is not cheap, and the cost is high in several ways. On many systems this work equals or exceeds the work of actually writing the code, especially when the project is rarely worked on and people don't remember how it needs to be configured.

The cost is also high because we often need resources that are hard to reproduce on a local machine: large databases, third-party APIs, etc. These are resources we cannot, or do not want to, duplicate for every task we work on, because duplicating them costs time, money, and compute. Engineers used to do this a few times a week and could often reuse the same configuration across several tasks. With AI agents and the pace we can work at today, it often happens more than once a day, and it becomes a big percentage of the engineer's time.

For this reason, it is ideal to be able to create a pool of 'engines' that either have the resources the project needs available or know how to spin them up quickly, and can therefore run several SDLC tasks for all the stories that require that project. This lets work be spun up and down quickly while saving the engineer the cost and effort of spinning up these resources. It also allows better cost containment and accounting for new-feature development on this particular project: we can track those costs and assign them to features, so we have a better idea of what our costs are.
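A minimal sketch of a per-project engine pool, assuming engines are interchangeable once configured: reuse a warm idle engine when one exists, build a new one only while under the cap, and otherwise return `None` so the caller can queue the task. `Engine` and `EnginePool` are hypothetical, and the `spin_ups` counter hints at where environment-build costs could be attributed to features.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Engine:
    engine_id: int
    project: str
    busy: bool = False


@dataclass
class EnginePool:
    project: str
    max_engines: int
    engines: list[Engine] = field(default_factory=list)
    spin_ups: int = 0  # expensive environment builds, useful for cost attribution

    def acquire(self) -> Optional[Engine]:
        # Prefer a warm, idle engine: its environment is already configured.
        for engine in self.engines:
            if not engine.busy:
                engine.busy = True
                return engine
        # Otherwise build a new one, but only while the pool is below its cap.
        if len(self.engines) < self.max_engines:
            engine = Engine(engine_id=len(self.engines), project=self.project, busy=True)
            self.engines.append(engine)
            self.spin_ups += 1
            return engine
        return None  # pool exhausted: the caller queues the task

    def release(self, engine: Engine) -> None:
        engine.busy = False  # kept warm for the next task on this project
```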

How do we share these pools of engines?

If we now have pools of engines where our agents can run, how do we use them? The best approach I have seen is to spin them up and down in a Kubernetes or Docker-type environment so they are available to all the engineers working on that project or set of projects. This also provides a repeatable environment, so we can reproduce results easily, and one where we can control the resources available to the agent, granting it the permissions to do the work we need done and no more. If the agent messes up the environment, that's fine: we created the environment as a test bed for the agent. No production damage, and no damage to the engineer's machine.

So with a pool of engines available as Docker containers or Kubernetes pods, how do we use them in a shared and fair manner? The mechanism I've found most effective is a job queue for the pool. In most cases we don't have so many engineers (and agents) scheduling work on a specific project that the pool can't keep up with demand. However, a job queue means we can schedule and perform work based on priority.

What goes into calculating the priority and picking the next task or step to work on? First there is the priority of the feature or story: is this a nice-to-have feature, or a hotfix that has to go out now? Then there is the question of which resources this particular task needs and whether they are available. A design task doesn't need the database spun up, but a QA task might need all the resources. Are those resources available, or can we spin them up? Or do we need to reserve them, and is one available for that reservation?

There are other considerations, and as always different orgs will want to weigh different things to determine the best algorithm for themselves, but you get the picture. And of course, as we all know from experience, there will be situations where we have to redirect the scheduler so the task we really need done NOW is first in the queue.
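A minimal sketch of the pool's job queue, assuming a numeric priority (lower is more urgent) and a set of resources each step needs. It returns the most urgent job whose resources are currently available, leaves the rest queued, and supports the manual "do this NOW" override with an `expedite` flag. `Job`, `JobQueue`, and the scoring are hypothetical; a real scheduler will weigh more factors, as the text notes.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Job:
    name: str
    priority: int                        # 0 = hotfix ... 3 = nice to have (lower runs first)
    needs: frozenset[str] = frozenset()  # runtime resources this step requires


class JobQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[int, int, Job]] = []
        self._seq = itertools.count()  # FIFO tie-break within the same priority

    def push(self, job: Job, expedite: bool = False) -> None:
        # Manual override: expedited jobs sort ahead of every normal priority.
        heapq.heappush(self._heap, (-1 if expedite else job.priority, next(self._seq), job))

    def pop_runnable(self, available: set[str]) -> Optional[Job]:
        """Most urgent job whose resources are available now; the rest stay queued."""
        skipped, chosen = [], None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[2].needs <= available:
                chosen = entry[2]
                break
            skipped.append(entry)
        for entry in skipped:
            heapq.heappush(self._heap, entry)
        return chosen
```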

Reservations as the coordination primitive

A pool with shared resources isn't enough on its own. The agents in the pool need a way to coordinate around those resources without stepping on each other. The pattern is well-established in databases and batch schedulers: agents don't just "use" a shared resource. They request a reservation — exclusive or shared, read or write, scoped appropriately. The reservation manager grants, queues, or refuses. Reservations have lifetimes. They can be renewed, released voluntarily, force-released if the agent hangs, or timed out automatically.

A useful reservation includes:

What is being reserved (which database, which schema, which topic).
How it's being used (read, write, exclusive lock).
Scope of the reservation (whole database, schema, table, row range — granularity matters; more on this below).
Owner (which agent, which work item).
Lifetime (a default with renewable extensions; not "until you remember to release it").
Priority (production-bug reservations preempt feature reservations the same way the priority queue describes).

The agent makes the request. The reservation manager grants or queues. The agent does the work. The agent releases when done. If the agent fails, the manager force-releases after the lifetime expires. Standard distributed-systems hygiene; nothing exotic.

What's new in AI orchestration isn't the mechanism — it's the volume and the role mix. A pool full of specialized agents will be making reservation requests at a rate human teams never approached, and the patterns of contention will be different. Agents tend to ask for the same kinds of things at the same times because their workflows are correlated.
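The request, grant, renew, release, and expire cycle described above can be sketched as a small lease manager. This is a minimal single-process sketch with shared and exclusive modes only: the caller supplies the current time, a refused request returns `None` (the caller queues or re-plans), and priority preemption and scopes finer than a resource name are left out. `Lease` and `ReservationManager` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity equality: two leases are never "the same" by value
class Lease:
    resource: str
    owner: str        # agent id / work item
    mode: str         # "shared" or "exclusive"
    expires_at: float


class ReservationManager:
    """Lease-based reservations; `now` is passed in so expiry is explicit and testable."""

    def __init__(self, default_ttl: float = 300.0) -> None:
        self.default_ttl = default_ttl
        self.leases: dict[str, list[Lease]] = {}

    def _expire(self, resource: str, now: float) -> list[Lease]:
        # Force-release leases whose holders did not renew in time.
        live = [held for held in self.leases.get(resource, []) if held.expires_at > now]
        self.leases[resource] = live
        return live

    def request(self, resource: str, owner: str, mode: str, now: float) -> Optional[Lease]:
        live = self._expire(resource, now)
        if live and not (mode == "shared" and all(h.mode == "shared" for h in live)):
            return None  # conflict: caller queues or re-plans
        lease = Lease(resource, owner, mode, now + self.default_ttl)
        live.append(lease)
        return lease

    def renew(self, lease: Lease, now: float) -> bool:
        if lease not in self._expire(lease.resource, now):
            return False  # already expired and force-released
        lease.expires_at = now + self.default_ttl
        return True

    def release(self, lease: Lease) -> None:
        held = self.leases.get(lease.resource, [])
        if lease in held:
            held.remove(lease)
```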

The distributed-systems problems that come along

Once you commit to reservations, a familiar set of problems comes with them. Naming them and pointing at the prior art is most of the work.

Waiting queues with fairness. Many requests arrive at once. Some have higher priority. The queue policy has to balance priority against starvation prevention.
Deadlock detection. Agent A holds a reservation on database X and asks for one on database Y. Agent B holds Y and asks for X. Classic deadlock; the wait-for graph cycles. Detection and breaking are well-studied.
Forced release. Agents hang. They time out. They get killed. The manager has to know how to take their reservations away cleanly. Lock leases with renewal — the same pattern Chubby and ZooKeeper made famous.
Priority inversion. A low-priority reservation blocks a high-priority one. Solutions range from priority inheritance (the holder temporarily inherits the waiter's priority) to preemption (force-release the lower-priority hold). The right answer is workload-specific.
Two-phase locking. When a single work item touches multiple resources, acquire every reservation it will need, in a consistent order, before doing the work, and release them only afterward (the conservative form of two-phase locking). The consistent ordering prevents the deadlocks that ad-hoc locking produces.
Capacity limits and backpressure. Shared resources have limits — connection pools, rate limits, disk I/O, memory. A reservation system that ignores capacity will happily grant reservations the resource can't honor. The fix is capacity-aware reservation plus backpressure.
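Two items from the list above are small enough to sketch directly: deadlock detection and consistent lock ordering. The sketch below assumes the reservation manager can export a wait-for graph as a hypothetical `waits_for` mapping. Each key is an agent, and its value is the set of agents it is blocked on. A depth-first search finds a cycle if one exists. Sorting resource names gives one possible global acquisition order.

```python
from __future__ import annotations


def find_deadlock(waits_for: dict[str, set[str]]) -> list[str] | None:
    """Return one cycle in the wait-for graph, or None if there is none.

    waits_for maps an agent to the agents holding reservations it is
    blocked on (an assumed view exported by the reservation manager).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in waits_for}
    stack: list[str] = []

    def visit(node: str) -> list[str] | None:
        color[node] = GRAY
        stack.append(node)
        for nxt in waits_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the path from nxt to here.
                return stack[stack.index(nxt):]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(waits_for):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def acquisition_order(resources: set[str]) -> list[str]:
    # If every work item acquires in one global (here: lexical) order,
    # no two items can each hold what the other is waiting for.
    return sorted(resources)
```

Breaking a detected cycle is a policy decision. The usual options are to force-release the lowest-priority holder in the cycle, or to re-plan it.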

None of this is new. All of it is well-studied in databases and operating systems. AI orchestration's job is to adopt this body of work, not invent a new one.

The AI-specific twist: agents can be re-planned

The classic batch scheduler can't ask its workload "would you like to do something else for a while?" The job is the job. It runs when the resource is available; otherwise it waits.

AI agents can be re-planned. An agent waiting on a database reservation can instead be asked to do the documentation task that doesn't need the database, or to refine an earlier pass on the same work item, or to explore an alternative design while the database is busy. The orchestrator has choices the batch scheduler didn't have.

This is the strongest distinguishing claim AI orchestration gets to make about its scheduler design. It allows the agent and its engine to still get work done while waiting for a queued resource.
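Re-planning can be sketched as a selection problem. A batch scheduler blocks on the head of the queue. An orchestrator can instead pick any pending step whose resources are all free. The `Step` type and resource names below are hypothetical, and ordering dependencies between steps are ignored for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    needs: set[str] = field(default_factory=set)  # resources the step must reserve


def next_step(pending: list[Step], busy: set[str]) -> Step | None:
    """Pick the first pending step whose resources are all free.

    A classic batch scheduler would wait on pending[0]; an agent can be
    re-planned onto any runnable step (e.g. docs while the DB is busy).
    """
    for step in pending:
        if not (step.needs & busy):
            return step
    return None  # nothing runnable: genuinely wait
```

For example, if `db:orders` is busy, a plan starting with a migration would move on to a documentation step that needs no reservation.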

Scope and granularity

Reservations don't have to be all-or-nothing.

Database-level. "Lock the whole database." Coarse but simple.
Schema-level. "Lock the public schema; agent B can have the analytics schema."
Table-level. "Lock the orders table; agent B can have the users table."
Row-level or range-level. "Lock orders where order_id between X and Y."

Finer granularity unlocks more parallelism at the cost of more locks to track and more complex conflict checks. Coarser granularity is simpler but serializes work that could have run in parallel. The trade-off is familiar from any database optimization story, and the answer is workload-specific. A useful default for AI orchestration: start coarse, and if a resource becomes the bottleneck, work to reduce contention at that point. As you scale up, you'll discover which resources need attention to keep the system running fast.
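The granularity levels above form a hierarchy, and conflict checks can follow it. This sketch makes a few assumptions. A scope is a path from database to schema to table, with an optional key range at table level. Two reservations conflict when one scope contains or overlaps the other and at least one of them writes. The `Scope` type and its fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    # Path from coarse to fine: (database,), (database, schema),
    # or (database, schema, table). Names are illustrative.
    path: tuple[str, ...]
    rows: tuple[int, int] | None = None  # inclusive key range, table level only

    def overlaps(self, other: Scope) -> bool:
        shorter = min(len(self.path), len(other.path))
        if self.path[:shorter] != other.path[:shorter]:
            return False  # disjoint branches, e.g. two different tables
        if self.rows is None or other.rows is None:
            return True  # one scope contains the other
        lo1, hi1 = self.rows
        lo2, hi2 = other.rows
        return lo1 <= hi2 and lo2 <= hi1  # key ranges intersect


def conflicts(a: Scope, a_write: bool, b: Scope, b_write: bool) -> bool:
    # Readers never block readers; a writer conflicts with any overlapping scope.
    return (a_write or b_write) and a.overlaps(b)
```

Starting coarse means using short paths. Moving a hot table to row ranges later only changes the scopes agents request, not the conflict rule.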

What pools plus reservations unlock

Put together, pools and reservations unlock a level of scalability you can't get by simply running a bunch of coding agents in separate shells on your local machine:

Cost visibility per pool, per project, per work item. Cost is attributable to the artifact that consumed it, not the seat that triggered it. We will discuss this in more detail in our next article.
Fair-share scheduling that reflects organizational priorities. Security teams get the cycles they need without negotiating with the feature team's calendar.
Elastic scale. Pools grow and shrink. A release weekend can spin up increased capacity and then spin it down when no longer needed.
Preemption. Urgent work can interrupt non-urgent work cleanly, with checkpoints that let the preempted work resume rather than restart.
Cross-agent coordination. Agents in the same pool can share context, share memory, share resource reservations.
Resilience to the failure modes above. Reservation visibility, lease renewal, deadlock detection, and capacity-aware backpressure turn the standard "multi-agent system goes weird in production" stories into ordinary engineering problems with known solutions.

The cloud-transition analog

The industry has lived through this transition twice already. Physical servers gave way to virtual machines, which gave way to shared cloud compute, which gave way to serverless and shared platform services. Each move was driven by the same observations: per-individual is wasteful, coordination is necessary, and the right unit of allocation is the workload, not the user.

AI orchestration is in the early-virtual-machines phase right now. The seat-licensed assistant is the developer-laptop equivalent. The next step is shared pools with proper coordination primitives. Given the cost pressure — token economics that don't subsidize per-seat overhead anymore — the relearning probably happens whether anyone plans for it or not. Catching up on purpose is cheaper.

What this means for the developer experience

When I proposed this approach to our team, it raised concerns about "rate limiting" and scarce resources. People didn't want to be blocked from having what they needed. So I approached it another way: we have pools where everything is already configured, so you don't have to set anything up to work your ticket. If you don't like them, or they don't meet your needs, configure away on your local machine.

Before long, the engineers who tried these pools were hooked: "Do we have one for project X?" or "How do I add this resource to project Y?" We quickly learned which configurations we needed and which resources we needed fewer of. And since we were able to scale to zero, keeping odd configurations around was not wasteful.

This changed the conversation from being forced to use something to getting to use something easier. That removed the fear of losing control over their work environment and made it easier for folks to migrate to these environments.

Closing

By defining what a project needs, we make it possible for agents to know reliably which resources they need and to schedule them, so they can get their work done without assistance. It also frees engineers from the toil of building these environments by hand, which is often frustrating, especially when the project is infrequently used.

Coming next

Article 9: Cost as a First-Class Constraint. With pools and reservations in place, the next question is how to run them deliberately. Cost has to be a scheduling input, not a billing line. Budgets have to be a control surface, not a reporting layer. Model routing, IR caching, deferral, and provider switching are the levers; the scheduler is where they live.

Sources

Gas Town: What Kubernetes for AI Coding Agents Actually Looks Like — Cloud Native Now

A Taxonomy of Agents: Why One Worker Type Is Not Enough

Brian Zimbelman — Wed, 27 May 2026 13:15:06 +0000

This is Article 7 of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com/. Previously: Article 6 — Structure That Fits the Work: Multi-Pass, Multi-Workflow. Coming next: Article 8 — Pools and Reservations: Shared Infrastructure for AI Engineering.

Ask a security reviewer to also write the feature. You will get mediocre security review.

That's not a slight on security reviewers. It's a structural observation about specialization. Security review and feature implementation involve different context, different tools, different mindsets, different accountability. The two roles exist as separate roles for the same reason most jobs in any complex organization exist as separate roles: specialization is how complex work gets done reliably. Asking one person to do two specialized jobs produces two mediocre results.

The same holds for AI agents. Asking one agent to play every role in the lifecycle is the AI version of asking the security reviewer to also write the feature. The current default — a single "worker agent" type that the orchestrator hands different prompts to — is a useful starting place that has the same ceiling, for the same reasons. The next step is a taxonomy: agents specialized to roles, with role-appropriate context, role-appropriate model tiers, and role-appropriate accountability.

Where the flat "worker agent" model breaks

Most current agentic frameworks assume a single worker type and differentiate by prompt. Send the worker an "implement this" prompt and you get implementation. Send the same worker an "audit this" prompt and you get audit. The framework treats the difference as a string in a system message.

That works for demos. It does not work when the work has distinct phases, distinct accountability, and distinct context requirements. The implementer needs the codebase. The security reviewer needs the threat model and the compliance requirements. The deployer needs the runbook and the rollback plan. The architect needs cross-system constraints.

Trying to load all of that into one agent's context window is a recipe for losing everything when it overflows. The roles may also need different access, permissions, and components. A QA agent may need to load mock data into a database to test the database access code, whereas an incident triage agent may need access to production logs. You can't prompt your way out of that.

Article 6 made this structural rather than incidental: different workflows call for different agent sets. A production-bug workflow needs a fast diagnostic agent and a cautious deployer; it does not need a full-spec writer. A new-feature workflow needs an architect and a specifier; it does not need an incident-response triage process. A flat worker pool can't service those workflows well. The agent set has to mirror the workflow set.

A taxonomy — starting with engineering roles

A useful starting taxonomy of agent categories. None of these are canonical; organizations will pick differently and add more.

Analysts. Interpret tickets, ambient context, recent activity. Produce structured input for downstream agents.
Architects. Reason about system shape, cross-service constraints, longer-term consequences. Frontier-model work.
Specifiers. Translate intent into specifications detailed enough that the next pass can be deterministic. Often define interfaces between components or domains so that lower-level specification can be done on the internals of those domains. The Spec Kit-style pass agent, generalized.
Implementers. Generate code, configuration, infrastructure-as-code, schema migrations. The agent everyone already knows.
Verifiers. Review against criteria. Security review, compliance review, code review, test coverage review — different agents, all in this family. Generators and reviewers are different jobs (more on this below).
Deployers. Run the rollout. Watch the metrics. Roll back if things look bad. This is orchestration plus judgment, not generation.
Monitors. Watch production over time, surface drift, surface emerging risk patterns.
Documenters. Produce and update human-facing artifacts: release notes, README updates, ADRs, customer-facing docs.
Diagnosticians. The on-call agent. Triages incidents, narrows root cause, hands off to fixers.

Each of these maps to a different part of the lifecycle, a different context requirement, and (importantly) a different model-tier cost profile. I think that last point is vital in defining agents, because a large part of cost containment going forward will be choosing the best model for each situation. Different tasks are performed by different agents, and each task has its own model and access requirements. Some will need the most expensive models we have access to, while others can get by with much cheaper ones.

Engineering is more than developers

The taxonomy can't stop at developer-facing roles. The whole-team frame from Articles 0 and 1 says software is shipped by teams, and the team includes product managers, designers, QA engineers, technical writers, program managers, security reviewers, compliance reviewers, and operations staff. Each of these is a candidate for agent support. A taxonomy that only maps developer roles is a taxonomy that perpetuates the bottleneck, because the non-developer work is often where the bottleneck already is.

Concretely:

Product writes PRDs, triages stakeholder input, owns scope decisions. An assistant agent here drafts initial PRDs from intake notes, aligns wording across stakeholders, flags scope creep before it reaches the engineering pipeline.
Design produces flows, mockups, design specs. An assistant agent here generates first-pass mockups against constraints, validates a proposed design against an established pattern library, drafts copy variants for A/B testing.
QA defines test plans and runs exploratory testing. An assistant agent here generates test plans against a spec, identifies gaps in coverage, writes regression scripts for found bugs.
Technical writing produces documentation that's load-bearing for adoption. An assistant agent here drafts release notes from PR titles, keeps API docs in sync with code, surfaces inconsistencies between docs and behavior.
Program management tracks dependencies across teams. An assistant agent here surfaces blockers, identifies cross-team coupling, prepares status updates from ticket activity.

These are not "AI replaces the role." They are agents that support the role, the same way the implementer agent supports the engineer. The accountability still lives with the human; the agent reduces toil.

This expansion is where the whole-team frame moves from rhetoric to architecture. If the framework only registers developer-facing agents, it perpetuates the original problem.

Model tiers follow role tiers

Specialization at the agent level enables specialization at the cost level. Different roles want different model capabilities, and matching them is a force multiplier — not a premature optimization.

Deep-reasoning work (architecture, security review, threat modeling, ambiguous spec work) wants the most capable available model. The cost is justified because the stakes are high.
Precise, bounded work (spec writing once the design is settled, test plan generation, well-defined refactors) wants a middle tier. Capable enough to be reliable; cheap enough to run often.
High-volume, bounded work (doc generation, release notes, simple code transforms, classification, summarization) can run on the cheapest model that meets the quality bar. The volume is the point.

In practice, I have found that these "levels" are not as static as this write-up makes them seem. If a feature is relatively small (say it touches a single bounded context in a single repo that is fairly clean and well documented, and the feature is easy to reason about), then you can step down a tier on the models. But if the feature crosses system boundaries, changes contracts between systems (database schemas, API structures, etc.), and spans multiple repos or components, then we need to step up a tier on the models we choose for that feature. So one of the things I do is keep a clear definition of which model to use for each task definition in the SDLC.
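One way to write down that "when to use what model" definition is a base tier per role plus a step up or down based on the feature's scope. In this sketch the tier labels, the role-to-tier table, and the `FeatureScope` fields are all hypothetical. Treating any single boundary-crossing signal as enough to step up is an assumption. The "clean and well documented" judgment is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass

TIERS = ["small", "mid", "frontier"]  # cheapest to most capable (labels assumed)

# Hypothetical base tier per role, following the deep / precise /
# high-volume split described above.
BASE_TIER = {
    "architect": "frontier",
    "security-reviewer": "frontier",
    "specifier": "mid",
    "implementer": "mid",
    "documenter": "small",
}


@dataclass
class FeatureScope:
    bounded_contexts: int    # how many bounded contexts the feature touches
    repos: int               # how many repositories it spans
    changes_contracts: bool  # schema, API shape, or other cross-system contract


def pick_tier(role: str, scope: FeatureScope) -> str:
    i = TIERS.index(BASE_TIER[role])
    crosses = scope.changes_contracts or scope.repos > 1 or scope.bounded_contexts > 1
    i += 1 if crosses else -1  # step up for cross-system work, down for contained work
    return TIERS[max(0, min(i, len(TIERS) - 1))]  # clamp to the available tiers
```

Under these assumptions, an implementer runs on the small tier for a contained single-repo change and on the frontier tier for a change that alters a shared schema.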

Agent specs as a definition format

If agents are specialized, how they're specialized becomes a thing teams can write down, version-control, and reason about. The pattern that's emerging — agent definitions as markdown files with YAML frontmatter — is a shape that works. Human-readable. Reviewable in pull requests. Composable across teams. Version-controllable.

A useful spec includes: name, version, category (analyst, architect, etc.), capabilities, context requirements (what scopes the agent reads from), resource access (what it can write to), default model tier, escalation policy. Variations on this idea are showing up across multiple ecosystems; the format is converging because the need is real.

The point isn't a specific syntax. The point is that specialization is something an organization should be able to write down, the same way it writes down code style or branching policy.
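To make the format concrete, here is a sample agent spec in markdown with YAML frontmatter, along with a check that the listed fields are present. The field names (`context`, `resources`, `model_tier`, `escalation`) are assumptions that map onto the contents described above. The parser is deliberately tiny. It only handles `key: value` and `key: [a, b]` lines, whereas a real loader would use a full YAML library.

```python
from __future__ import annotations

SPEC = """---
name: security-reviewer
version: 1.2.0
category: verifier
capabilities: [threat-model-review, dependency-audit]
context: [threat-models, compliance/soc2, prior-audits]
resources: [review-comments]
model_tier: frontier
escalation: human-security-lead
---
You review changes against the threat model and compliance scopes listed above.
"""

REQUIRED = {"name", "version", "category", "capabilities", "context",
            "resources", "model_tier", "escalation"}


def parse_frontmatter(text: str) -> tuple[dict[str, object], str]:
    """Split '---'-delimited frontmatter from the markdown body.

    Only `key: value` and `key: [a, b]` lines are supported.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    end = lines.index("---", 1)  # raises ValueError if the block is unterminated
    meta: dict[str, object] = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    return meta, "\n".join(lines[end + 1:])


def validate(meta: dict[str, object]) -> list[str]:
    # Report missing fields in a stable (sorted) order for review output.
    return [f"missing field: {m}" for m in sorted(REQUIRED - set(meta))]
```

Because the spec is a plain file, a pull request that changes an agent's context scopes or model tier gets reviewed the same way as any other code change.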

The orchestrator is a different kind of agent

The orchestrator that assigns work, chooses workflows, and manages handoffs is a different kind of agent from the workers that do the work. Worker agents optimize for their specialty. The orchestrator optimizes for coordination: queue management, dependency resolution, escalation, budget enforcement, resource reservation. Combining them is an anti-pattern.

Treating the orchestrator as a worker agent with extra responsibilities is the design error that produces a lot of the chaos in current "multi-agent" systems. The orchestrator's context, its accountability, and its decision-surface are different. It needs to know about all the work, not just one ticket. It needs to know about budgets and capacity, not just acceptance criteria. It needs to coordinate, not generate.

Calling the orchestrator out as a distinct category is one of the highest-value moves in this taxonomy. The next article picks up the consequence: an orchestrator coordinating specialized agents needs an infrastructure model that's different from per-developer sandboxes.

Generators and reviewers are different agents

A code generator and a code reviewer have different jobs. Their context is different (the reviewer needs the policy; the generator needs the spec). Their accountability is different (the reviewer's findings are durable artifacts; the generator's output is one of many candidates). Their model tier may be different (a reviewer for security needs frontier capability; a generator for boilerplate may not). Their bias profile is different: a generator that also reviews its own output learns to write code that passes its own self-review, which is a different thing from code that's correct. In the same vein, a generator that writes both the code and the unit tests will always pass its own unit tests. Make them two different agents and they will compete with each other, and you will get a better product as the outcome.

Treating "the AI" as one thing that both generates and reviews is the design error Kent Beck has called out in his own AI-assisted work — agents that delete tests to make them pass, agents that ship code that satisfies the agent's own criteria but not the human reviewer's. Separating the two roles is structural protection against that failure mode, not a procedural overhead.

What this buys you

Specialization makes review meaningful (the security reviewer actually has security policies in context). It makes cost traceable (cheap models for cheap work). It makes accountability traceable (the artifact was produced by this agent against this spec, and the chain is auditable). And it opens the door to agents supporting non-developer roles — the largest untapped opportunity in the AI-engineering landscape.

A flat worker pool cannot deliver any of those reliably.

Closing

AI agents will eventually look like engineering organizations. Not because organizations are the right model for everything, but because the division of labor is how complex systems get built reliably, and "everyone is a generalist" is a luxury only small systems can afford.

Your engineering org is a taxonomy of roles, conventions, and handoffs that the people in it understand intuitively. Your agents should be one too, and the framework that runs them should treat their specializations as first-class, not as accidents of prompt wording. That doesn't mean the agents you run, and the tasks they perform in your SDLC, should mirror exactly what your SDLC is or was when humans did all the work. The changing dynamics of the work give us different requirements for our workflows, so we need to be explicit about determining the best workflow for us within this new system.

Coming next

Article 8: Pools and Reservations. Once agents are specialized, the per-developer sandbox model breaks. Specialized agents need to share infrastructure, queue against organizational capacity, run under organizational priorities, and coordinate around shared resources without stepping on each other. Pools and reservations are the infrastructure consequence of every claim made so far.

Structure That Fits the Work: Multi-Pass, Multi-Workflow

Brian Zimbelman — Mon, 25 May 2026 20:37:49 +0000

This is Article 6 of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com/. Previously: Article 5 — SDLC, Not Task. Coming next: Article 7 — A Taxonomy of Agents.

The pattern is familiar by now. An engineer asks an AI agent to "implement this feature." The output looks plausible. It passes a smoke test. Then something subtle breaks in production: a timezone bug, a missed edge case, a security check silently bypassed.

Ask the same agent to perform a narrow, well-defined sub-task — "given this spec and these tests, implement this one function" — and the success rate is dramatically higher. The difference is not the agent. It's the scope of what was asked.

Now picture the opposite failure mode. It's 2 a.m. Production is on fire. The on-call engineer opens the ticket. The workflow engine dutifully walks them through requirements gathering, stakeholder interviews, architectural review, and a formal specification pass. By pass four, the outage is twenty minutes old. By pass five, somebody has called the engineer's manager.

Both failure modes have the same root cause: the workflow's structure didn't fit the work. In the first case the task was too large for a single pass. In the second case the workflow was too elaborate for the kind of work in front of it. Article 5 made the case that the unit of work for AI-assisted engineering is the work item, not the session. This article is about how that work item gets executed: with the right number of passes, in a workflow shape that matches the work.

What the short history of AI coding agents has already taught us

When researchers measure AI coding agents against benchmarks that require multi-step planning and reasoning, the numbers are sobering. One widely-cited study found GPT-4 achieving roughly a 14% success rate on complex multi-stage tasks where humans score above 92% [arXiv: Measuring AI Ability to Complete Long Tasks]. Work on dynamic task decomposition frameworks — notably the TDAG framework from researchers at the Chinese Academy of Sciences and collaborators — shows that breaking the same underlying problem into smaller, dynamically-generated subtasks produces substantially better reward scores and success rates [arXiv: TDAG]. Amazon's own applied-AI research group has published the same point: task decomposition combined with smaller LLMs is one of the main paths to making AI both more reliable and more affordable at enterprise scale [Amazon Science].

The MIT Sloan paper from Article 1 reaches the same conclusion from a different angle: AI's biggest impact comes from chaining tasks — clustering AI-friendly steps so AI executes them as a continuous sequence — rather than from single-task speedups [MIT Sloan: How AI is reshaping workflows and redefining jobs]. Same observation, multiple research traditions: small, well-bounded steps with structured handoffs are how AI delivers reliable work.

The pattern is consistent. AI coding agents succeed at small, well-scoped, well-specified work and fail at large, ambiguous, multi-step work. Failures cluster around context limits, cascading errors, and quality that varies wildly with prompt wording rather than with the structural correctness of the task.

Good news: this is not a new class of problem. For decades we have been developing ways to break complex problems into smaller, self-contained components. There are literally dozens of tools and techniques for doing this; the best known is probably Domain-Driven Design (DDD).

A short historical beat

Compilers ran into a related version of this problem in the 1960s. Single-pass compilers worked for simple, forward-only languages; as languages got richer, single-pass broke down. Scope resolution needed context the first pass didn't have. Errors cascaded. "Works on the example" didn't generalize.

The answer was multi-pass architectures: parse into an intermediate representation, then analyze semantically, then optimize, then generate target code. Each pass had a narrow job, produced a reviewable artifact, and was testable in isolation.

InfoWorld framed the parallel explicitly in a 2026 piece titled "The two-pass compiler is back — this time, it's fixing AI code generation" [InfoWorld]. The argument: today's LLM-based code generation tools are, architecturally, single-pass compilers. Split them into two passes and you get structural benefits prompt engineering can't provide — security issues eliminated at the IR boundary, hallucinated properties caught and stripped before they reach generated code, and because the generation pass is deterministic, output that's reproducible and auditable.

Spec-driven development is a good start

GitHub's open-source Spec Kit is the clearest current example of the move from one pass to two in AI coding [GitHub Blog] [Spec Kit]. It organizes projects around a .specify directory and a small set of slash commands — /speckit.specify to document requirements, /speckit.plan for the implementation plan, /speckit.tasks to break the plan down, /speckit.implement to execute, plus /speckit.clarify and /speckit.analyze to catch inconsistencies before implementation. The repository passed 28,000 GitHub stars in its first year — not a small signal.

Separating "specify what should be built" from "generate the code" is a real improvement, and many of the successes being reported today come from variations on this multi-pass structure. Credit where due: it's the move from single-pass to multi-pass, applied to AI code generation, and it helps.

But for most enterprise work, two passes is not enough. Real work involves design decisions that can't be settled by the spec, constraint reconciliation across services and teams and regulatory regimes, multiple verification concerns (correctness, security, compliance, performance, backward compatibility), integration with other changes, deployment considerations. Forcing all of that into two passes asks too much of either pass.

The framework this series argues for goes further, and not just by adding more passes. What I am proposing instead is that we define the process (or passes) to fit the work we are doing. It won't always be the same: a title change on a UI does not need the same process as a major refactor of a 10-year-old service that has grown too complicated to maintain.

More passes, deliberately chosen

For complex work, an illustrative pipeline — not the canonical one, but one that is in use and has proven effective:

Spec pass. Requirement → formal specification.
Design pass. Specification → DDD artifacts. Architecture decisions, API shapes, data models.
IR pass. Design → structured intermediate representation. AI-generated, not production code.
Validation pass. IR → verified IR. Specialized agents (security, compliance, testing) check alignment against the spec and against policy.
Generation pass. Validated IR → production code. Ideally deterministic.
Verification pass. Code → test results, review findings.
Integration pass. Code → merged, deployed, monitored.

Each pass produces a set of artifacts. Each artifact is reviewable. The end of each pass can have one or more gates. Again, you wouldn't want the coding assistant interrupting you at each of these steps for a title change or a minor layout issue. But for a change that impacts a key system and has performance and security implications, you might want a check at most, if not all, of these steps.
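The pass-and-gate structure can be expressed as data plus a small runner. The author's team keeps workflow definitions in markdown files. This sketch uses Python data instead for brevity. Everything here is a hypothetical stand-in: `Pass`, `run_workflow`, the `approve` callback, and the `produce` helper, which simply records a placeholder artifact where a real agent pass would do work. The point is that gates are declared per workflow, so a light workflow never interrupts anyone.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Pass:
    name: str
    run: Callable[[dict], dict]  # takes the artifact set, returns it extended
    human_gate: bool = False     # stop for human review after this pass


def run_workflow(passes: list[Pass], artifacts: dict,
                 approve: Callable[[str, dict], bool]) -> dict:
    for p in passes:
        artifacts = p.run(artifacts)
        if p.human_gate and not approve(p.name, artifacts):
            raise RuntimeError(f"gate rejected after the {p.name} pass")
    return artifacts


def produce(key: str) -> Callable[[dict], dict]:
    # Stand-in for an agent pass: it just records a placeholder artifact.
    return lambda arts: {**arts, key: f"<{key}>"}


# Complex multi-system feature: gates after design and validation.
FULL = [
    Pass("spec", produce("spec")),
    Pass("design", produce("design"), human_gate=True),
    Pass("ir", produce("ir")),
    Pass("validation", produce("validated_ir"), human_gate=True),
    Pass("generation", produce("code")),
    Pass("verification", produce("test_results")),
    Pass("integration", produce("deployment")),
]

# Title change or minor layout fix: two passes, no human interruptions.
LIGHT = [
    Pass("generation", produce("code")),
    Pass("verification", produce("test_results")),
]
```

Which passes carry `human_gate=True` is the team's call. Moving a gate is a one-line change to the workflow definition rather than a change to the agents.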

If that list looks like a prescription, it isn't. It's a starting point for a particular kind of work: a complex new feature in a multi-system codebase. A different kind of work wants a different shape. By defining this SDLC or workflow in markdown files, we have made it possible for our agents to repeat these processes on every ticket. That turns an agent into a workflow orchestrator, and the good news is you don't need an expensive frontier model to be the orchestrator agent!

One workflow doesn't fit all

While the workflow described above is the right answer for my team when we are doing large features, it isn't the only workflow or SDLC we use. We have several, and our team adjusts them as we use them and learn what works and what does not. Some questions we ask when making adjustments:

If we remove this step, how do we keep these artifacts in sync? Fundamentally, every story changes the system, and every change impacts code, but it also impacts documentation. Ugh, there is that nasty word again! But the documentation makes it easier to remember what you (or your coworker) did to the system three months ago that is related to this ticket, hmm.... So we have a defined set of documentation we want the system to keep up to date: the bounded contexts, the contracts between systems, API docs, etc. That is the minimum set of documentation we try to maintain in all of our processes. For a bug fix in production, the documentation can wait, but we create a story to update it and kick that off after we get the system back online.
What is the advantage of changing the workflow? Is it a cost savings? Does it make the work more reliable? Does it reduce the burden on someone? We try to make the system more reliable while still producing the new capabilities that will help us in our marketplace. So is this change likely to result in as good or better code generation from the agents? Is it going to make it so a cheaper model can generate the code, or the code can be generated faster? Is it going to make it easier for the agents to find any bugs before we have to look at the change and find them ourselves?
Does this new flow support all the gates we need? Is it going to create a backlog of tasks for the human to review which is going to be a false gate? A false gate is where we have a human doing some check, but because they have so much pressure to review so many deliverables, they just pass it and hope for the best. Those don't help us or make the system any better.

How many workflows is the right number?

Only you and your team can determine how many workflows you need and which combinations are right. My team has several:

New feature — complex, multi-system. Full multi-pass: ideation through deployment. Slow, deliberate, high-touch.
Production bug — urgent. Diagnose → patch → verify → deploy → post-incident. Speed and correctness dominate.
Security incident. Production-bug shape plus compliance capture and mandatory human gates.
Refactor / tech debt. Heavy on verification, light on requirements.
Documentation / config update. Lightweight. One or two passes.
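Once the workflows are named, an orchestrator has to route each work item to one of them. Here is a minimal sketch of that routing. The pass names for each workflow, the work-item fields (`type`, `security`), and the gate placement are illustrative assumptions, not the team's actual definitions. The security-incident workflow reuses the production-bug shape, adds compliance capture, and marks mandatory human gates.

```python
from __future__ import annotations

# Illustrative pass names for each workflow listed above (not canonical).
WORKFLOWS = {
    "feature": ["spec", "design", "ir", "validation", "generation",
                "verification", "integration"],
    "production-bug": ["diagnose", "patch", "verify", "deploy", "post-incident"],
    "security-incident": ["diagnose", "patch", "verify", "compliance-capture",
                          "deploy", "post-incident"],
    "refactor": ["design", "generation", "verification", "regression-check"],
    "docs-config": ["edit", "verify"],
}

# Mandatory human gates; here only the security-incident workflow has them.
HUMAN_GATES = {"security-incident": {"patch", "deploy"}}


def choose_workflow(item: dict) -> str:
    kind = item.get("type", "feature")
    if kind == "incident":
        return "security-incident" if item.get("security") else "production-bug"
    if kind in ("refactor", "tech-debt"):
        return "refactor"
    if kind in ("docs", "config"):
        return "docs-config"
    return "feature"  # unknown work falls back to the most deliberate workflow


def plan_for(item: dict) -> list[tuple[str, bool]]:
    # Each entry is (pass name, whether a human must approve after it).
    workflow = choose_workflow(item)
    gates = HUMAN_GATES.get(workflow, set())
    return [(name, name in gates) for name in WORKFLOWS[workflow]]
```

Defaulting unknown work to the full feature workflow is a conservative choice: it costs more passes, but it never skips a gate.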

Can we standardize?

When we first started down this road, engineers wanted to have their own workflows. But over time and after much discussion, we found that most of our disagreements were about the naming of phases and artifacts, not about what actually happens in them. We were able to compromise, find a middle ground that worked for all of us, and then iterate as we got used to that workflow.

Most teams can standardize their workflows within the team, but each team has a different set of requirements. Say you work for a healthcare company and your application impacts patient care. That is going to have a very different set of requirements (and workflow) than if you work for a realtor building a marketplace of homes for sale. Different organizations have different risk profiles and different levels of risk aversion to take into consideration.

This is why the framework doesn't dictate the workflow. Instead, it allows the team and the individual to define the workflow, and then it processes each task through that workflow. The good news is that at almost every company the taxonomy is already there. Making it explicit is what lets AI orchestration act on it.
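To make "making it explicit" a little more concrete, here is a minimal Python sketch of a workflow taxonomy expressed as data. It assumes a hypothetical `WORKFLOWS` table keyed by work-item classification and a hypothetical `select_workflow` helper; the pass names loosely follow the list of workflows above and are illustrations, not the framework's actual schema.

```python
from dataclasses import dataclass

# Hypothetical workflow table: each classification maps to an ordered list of passes.
WORKFLOWS: dict[str, list[str]] = {
    "new_feature": ["ideation", "requirements", "design", "ir", "implement", "verify", "deploy"],
    "production_bug": ["diagnose", "patch", "verify", "deploy", "post_incident"],
    # Production-bug shape plus compliance capture and a mandatory human gate.
    "security_incident": ["diagnose", "patch", "verify", "compliance_capture",
                          "human_approval", "deploy", "post_incident"],
    "refactor": ["plan", "implement", "verify", "regression_check"],
    "docs_config": ["edit", "review"],
}


@dataclass
class WorkItem:
    title: str
    classification: str


def select_workflow(item: WorkItem) -> list[str]:
    """Return the ordered passes for a work item based on its classification."""
    try:
        return WORKFLOWS[item.classification]
    except KeyError:
        # Fail loudly rather than guessing: a new kind of work needs a defined workflow.
        raise ValueError(f"no workflow defined for {item.classification!r}") from None
```

An unknown classification raises an error instead of silently falling back to the heaviest workflow, so a genuinely new kind of work forces the team to define its workflow explicitly.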

Why intermediate representations matter

I'm going to take a short detour here to discuss the intermediate representation (IR) step in our workflow, because it was a controversial one. You are free to skip it, but what we have found is that when the agent first generates pseudo-code, or describes the system in a language higher-level than the actual code of whatever framework you are implementing in, it can focus on the concepts and the logic needed to implement the solution without dealing with syntactic sugar and the other minutiae that often lead to bugs. Then, in the next pass, the agent can focus on just implementing each function, subroutine, module, component, or whatever your framework calls them. This multi-pass approach breaks the process down for the agent, so the agent has a much higher success rate.

In addition, breaking this into two passes allows us to:

Have another agent validate the logic without worrying about syntax.
Have an agent determine the unit tests, integration tests, and e2e tests that are needed.
Add a gate that validates that the logic will support the business needs in the requirements.
Have a human architect review it for architectural integrity and make corrections at this level, before code generation.

All of these allow the system to build a much more solid solution and validate it much better than if we went straight to implementing code.
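The benefit of validating logic before syntax can be sketched in Python. This assumes a hypothetical IR in which each step records only its intent and the named values it reads and writes; a checker then flags any step that reads a value nothing has defined yet. That is the kind of logic error another agent could catch at this stage, with no target-language syntax in the way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRStep:
    """One step of a hypothetical pseudo-code IR: intent plus the values it reads and writes."""
    description: str
    reads: frozenset[str]
    writes: frozenset[str]


def undefined_reads(inputs: set[str], steps: list[IRStep]) -> list[tuple[int, str]]:
    """Return (step index, name) for every value a step reads before anything defines it."""
    defined = set(inputs)
    problems: list[tuple[int, str]] = []
    for index, step in enumerate(steps):
        for name in sorted(step.reads - defined):
            problems.append((index, name))
        defined |= step.writes  # later steps may use what this step produces
    return problems


ir = [
    IRStep("load the order by id", frozenset({"order_id"}), frozenset({"order"})),
    IRStep("apply the discount code", frozenset({"order", "discount_code"}), frozenset({"priced_order"})),
    IRStep("charge the customer", frozenset({"priced_order", "payment_token"}), frozenset({"receipt"})),
]
```

With this example, `undefined_reads({"order_id", "payment_token"}, ir)` returns `[(1, "discount_code")]`: the discount step depends on a value that neither the inputs nor any earlier step supplies, and that gap is visible before a line of real code exists.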

Review gates play a vital part

The review gate — human, agent, or hybrid — plays the role semantic analysis plays in a compiler. It enforces criteria the generator can't self-check. "Does this match the spec?" "Does this comply with policy X?" "Does this preserve the invariants the existing tests cover?" These are exactly the kinds of questions that look obvious in retrospect and are the first things to slip through when you rely only on generation.

A review gate is a workflow primitive with written acceptance criteria, a defined reviewer (human, agent, or both), and an escalation policy when it doesn't pass. We will go into these in detail in a later article, but for now: every pass ends in a gate, and the gate is what makes "more passes" actually pay off.
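A review gate as described above has three parts: written acceptance criteria, a defined reviewer, and an escalation policy. The sketch below models that in Python under stated assumptions: the `check` callback stands in for whatever human or agent review evaluates a single criterion, and the retry-then-escalate policy and the `tech_lead` target are hypothetical choices, not the framework's.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Gate:
    name: str
    criteria: list[str]             # written acceptance criteria
    reviewer: str                   # "human", "agent", or "both"; routing is out of scope here
    max_attempts: int = 2           # escalation policy: attempts allowed before escalating
    escalate_to: str = "tech_lead"  # hypothetical escalation target


def run_gate(gate: Gate, check: Callable[[str], bool], attempt: int) -> str:
    """Check every criterion; return "pass", "retry", or "escalate:<target>"."""
    failed = [criterion for criterion in gate.criteria if not check(criterion)]
    if not failed:
        return "pass"
    if attempt < gate.max_attempts:
        return "retry"
    return f"escalate:{gate.escalate_to}"
```

For a gate with `max_attempts=2`, a first failure returns `"retry"` and a second returns `"escalate:tech_lead"`, so a failing pass can neither loop forever nor slip through silently.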

What this combination buys you

A workflow that has the right number of passes for the work in front of it, drawn from a workflow appropriate to the work item's classification, produces:

Reproducibility. Deterministic generation from a validated IR.
Auditability. Every pass produces an artifact that's reviewable and retainable.
Compliance. The criteria for each gate are written down. The audit trail is the workflow's natural output, not a separate ceremony.
Cost efficiency. Cache and reuse IRs instead of re-prompting; route cheaper models to the passes that don't need frontier capability.
Flexibility. Engineers don't route a trivial change through a 10-step process just because "we always do it this way"; they can pick the workflow that is appropriate for the task at hand.
Quality. The trilemma move from Article 4 — actually navigated, instead of pretended away.
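The "cache and reuse IRs" point under cost efficiency can be sketched as a small content-addressed cache. This assumes, hypothetically, that an IR is a string produced by a `generate` callback (standing in for a model call) and that the same pass name and input spec should always yield a reusable IR; only a cache miss pays for generation.

```python
import hashlib
from typing import Callable


class IRCache:
    """Content-addressed cache: identical (pass, spec) pairs reuse a stored IR."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.misses = 0  # number of times generation actually ran

    @staticmethod
    def key(pass_name: str, spec: str) -> str:
        # NUL separator keeps ("ab", "c") and ("a", "bc") from producing the same key.
        return hashlib.sha256(f"{pass_name}\x00{spec}".encode("utf-8")).hexdigest()

    def get_or_generate(self, pass_name: str, spec: str, generate: Callable[[str], str]) -> str:
        k = self.key(pass_name, spec)
        if k not in self._store:
            self.misses += 1
            self._store[k] = generate(spec)  # only a miss pays for a model call
        return self._store[k]
```

Keying on the pass name as well as the spec keeps, for example, a design pass and an IR pass over the same spec from colliding.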

Multi-pass alone doesn't deliver this. Multi-workflow alone doesn't deliver this. Multi-pass and multi-workflow together do.

Implications for agents

Different workflows need different agents.

The production-bug workflow needs a fast diagnostic agent and a cautious deployer. It does not need a full-spec writer. The new-feature workflow needs an architect and a specifier; it does not need an incident-response triage process. The security-incident workflow needs reviewers with compliance-aware context, an audit-capture agent, and stricter human gates. The deployer for security incidents has different policies than the deployer for routine work.

This is the natural setup for the next article. Specialization at the agent level is a direct consequence of specialization at the workflow level. A flat "worker agent" pool can't service the workflows this article describes; the agent set has to mirror the workflow set.
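One way to check that the agent set mirrors the workflow set is to assign an agent role to every pass and look for gaps. The roles and the per-workflow override below are hypothetical, chosen to echo the production-bug and security-incident examples earlier in this section, including a security-incident deployer with its own policies.

```python
from typing import Optional

# Hypothetical default agent role for each pass name.
DEFAULT_ROLES: dict[str, str] = {
    "diagnose": "fast_diagnostic_agent",
    "patch": "coder",
    "verify": "tester",
    "deploy": "cautious_deployer",
    "post_incident": "incident_writer",
    "compliance_capture": "audit_capture_agent",
    "architecture": "architect",
    "specify": "specifier",
}

# Per-workflow overrides: security incidents get a deployer with stricter policies.
OVERRIDES: dict[tuple[str, str], str] = {
    ("security_incident", "deploy"): "security_deployer",
}


def role_for(workflow: str, pass_name: str) -> Optional[str]:
    """Return the agent role for a pass, preferring a workflow-specific override."""
    return OVERRIDES.get((workflow, pass_name), DEFAULT_ROLES.get(pass_name))


def missing_roles(workflow: str, passes: list[str]) -> list[str]:
    """List passes with no agent assigned; a non-empty result is a gap in the roster."""
    return [p for p in passes if role_for(workflow, p) is None]
```

Running `missing_roles` over every workflow definition is a cheap consistency check: a flat "worker agent" pool would pass it trivially, which is exactly why the roles have to be specific.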

Closing

The short history of AI agents has taught us what works: small, well-scoped tasks. The right answer is not heroic prompting; it is structured, multi-pass workflows with the right granularity for the work — and more than one workflow, because real engineering organizations do real engineering work, and that work is not all the same shape.

The compiler community solved the multi-pass problem half a century ago. The incident-response, change-management, and audit communities have been solving the multiple-workflow problem for almost as long. We can borrow both moves without reinventing them. And then we can ask the more interesting question: what kind of agents, on what kind of compute, with what kind of context, do we need to actually run all of this?

Coming next

Article 7: A Taxonomy of Agents. Different workflows need different agents, and engineering organizations are already a taxonomy of specialized roles — most of which are not "developer." Mapping agents to that taxonomy is the move that makes the whole-team frame from Articles 0 and 1 actually executable.

Sources

SDLC, Not Task: Why the Unit of Work Has to Change

Brian Zimbelman — Fri, 22 May 2026 11:51:19 +0000

This is Article 5 of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com. Previously: Article 4 — The Quality-Speed-Cost Trilemma of AI Development. Coming next: Article 6 — Structure That Fits the Work: Multi-Pass, Multi-Workflow.

The coding agents do a great job of improving one part of the process. But as we have discussed earlier, there are a lot of other steps, and when we speed up one but not the others, we just move the bottleneck. The engineer feels like they are getting more done, but the team's throughput doesn't improve.

So if we are just moving the bottleneck, how do we actually increase throughput? Instead of focusing on just writing more code, I believe the next great increase in performance and throughput will come from improving the entire SDLC, or workflow: the process from start to end, from an idea, to a design, to implementation, finishing with a new feature being delivered.

By focusing on the entire lifecycle or process that it takes to develop the software we can improve the speed we deliver features. We can reduce the bottlenecks throughout the process and make adjustments to the process so as to take advantage of the realities of developing software in this new era.

What lifecycle?

Software engineering happens across a lifecycle. Whatever phases an organization uses — ideation, design, specification, implementation, configuration, deployment, refinement, monitoring, debugging, retirement, or some other split — the names don't matter for this argument. The coverage pattern does.

Organizations define their own phases either deliberately (a written SDLC) or by accretion (whatever processes happen to exist). Both are legitimate. Some organizations have six phases, some have twelve. Some have heavy gates between each, some have lighter touches. All of that is fine.

Most teams have agreed upon a lifecycle for their key work. Some of it is spoken and some unspoken; some of it is written down and other parts live only as scattered, informal knowledge. But few teams actually have it documented in a way that AI agents can use to move a story from conception to completion.

If we do that, if we define the SDLC, the workflow from beginning to end, then we make it possible for an AI agent to orchestrate the process and keep the story moving from one phase or process to the next. We can also then identify where we have slowdowns and friction in our process.

All of this leads to better visibility and awareness of the entire process and makes it easier for teams to identify solutions to that friction. In their paper, MIT Sloan argued that AI's biggest impact comes from reshaping entire workflows, not from speeding up any single task in isolation [MIT Sloan: How AI is reshaping workflows and redefining jobs].

Coding agent is dead, long live the coding agent

Let me be clear: I'm not advocating that the coding agent is dead. In fact, I'm embracing the coding agent as a central component of this system. I'm just recognizing that the coding step has been vastly sped up and improved. It has moved from being the largest, most time-consuming part of the SDLC to being one of the smallest steps. So we need to re-imagine our processes and our workflow, and make them work for the world we live in today instead of the world we came from two years ago.

That also doesn't mean we throw away things that have worked in the past. We know that we still need quality gates to verify that changes to something as complex as a software system aren't breaking what was working. We still need consistent deployment processes. We still need solid requirements of what to build. In fact, we will see that we need these things even more as we become able to build and deploy features faster and faster.

At the same time, as we scale up teams of agents all working on features concurrently, this speed creates more conflicts, more contention for resources, and more churn, and all of this needs to be taken into account. Most of the problems that occur because we are able to run so much faster are problems we already know how to solve. It is just a matter of applying those known solutions in a way that allows us to run faster and reduces the bottlenecks.

What the framework requires (and what it doesn't)

I have a way in which I suggest we define the SDLC. It works for my team and for others who are using it, but it is not the only approach you can take. The tooling that I am developing works with it, but other tooling will have its own way of defining it. The format of the information isn't as important as the actual definition and the AI system's ability to interpret that information. I believe the key takeaways are:

That an SDLC exists. Written down, deliberate, reviewed. Not "whatever the engineer decided to do today."
That the SDLC is broken down with enough granularity that an AI agent can reliably execute each step with consistent, trustworthy quality. That doesn't have to mean a fixed number of passes. It means the work is decomposed into units small and well-specified enough that the agents can do them reliably.

The good news is that we already have the building blocks and are good at doing this. The components of a good SDLC are:

Inputs. What needs to be in place before this task can be performed by whoever is going to perform it. "Before we can write a high level design we must have a PRD or other feature definition."
Task. The action that is going to be taken on the inputs to produce the outputs. We also define things about the task, like who performs it, what skills they must have, what resources they need, and so on.
Outputs. What the task should produce.
Gates. The validation requirements that we need to enforce to verify that the task was completed to our standard and that the outputs of this task will be of sufficient quality to be used as inputs to tasks that need them.
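The four components above can be sketched as a small Python data structure plus a loop that refuses to start a step until its inputs exist and refuses to hand its outputs on until its gates pass. Everything here (`Step`, `run`, the artifact names) is a hypothetical illustration, not the author's tooling; in practice `task` would dispatch to a human or an agent.

```python
from dataclasses import dataclass, field
from typing import Callable

Artifacts = dict[str, str]


@dataclass
class Step:
    name: str
    inputs: list[str]                       # artifacts that must exist before the task runs
    task: Callable[[Artifacts], Artifacts]  # the action: turns inputs into outputs
    outputs: list[str]                      # artifacts the task must produce
    gates: list[Callable[[Artifacts], bool]] = field(default_factory=list)  # checks on outputs


def run(steps: list[Step], artifacts: Artifacts) -> Artifacts:
    """Run steps in order, enforcing inputs, outputs, and gates at every handoff."""
    for step in steps:
        missing = [name for name in step.inputs if name not in artifacts]
        if missing:
            raise RuntimeError(f"{step.name}: missing inputs {missing}")
        produced = step.task(artifacts)
        absent = [name for name in step.outputs if name not in produced]
        if absent:
            raise RuntimeError(f"{step.name}: did not produce {absent}")
        if not all(gate(produced) for gate in step.gates):
            raise RuntimeError(f"{step.name}: gate failed")
        artifacts = {**artifacts, **produced}  # outputs become inputs for later steps
    return artifacts


# "Before we can write a high level design we must have a PRD."
design = Step(
    name="high_level_design",
    inputs=["prd"],
    task=lambda a: {"high_level_design": "HLD for " + a["prd"]},
    outputs=["high_level_design"],
    gates=[lambda out: out["high_level_design"].strip() != ""],
)
```

Calling `run([design], {"prd": "checkout v2"})` returns the PRD plus the new design artifact, while `run([design], {})` fails immediately because the input is missing, which is the same rule the quoted PRD example states in prose.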

We are very good at breaking big jobs down into individual tasks and assigning them to individuals, which is exactly what we are doing here. One other key lesson from coding agents is that whenever we assign tasks to AI agents and tools, it is best to make those tasks as granular and as well defined as possible.

Research on AI agent task decomposition consistently shows that breaking work into smaller, well-specified subtasks dramatically improves success rates [Amazon Science] [arXiv: TDAG]. At the extreme, GPT-4 hit a 14% success rate on multi-stage benchmark tasks where humans scored above 92% [arXiv: Measuring AI Ability to Complete Long Tasks]. AI tools are great at small, well-defined, bounded work. They struggle on large, ambiguous, multi-step work. The constraint is what shapes the requirement on the SDLC.

Now for what the framework doesn't require: it doesn't require you to use my workflow, or even my definition of what these inputs, outputs, tasks, and gates are. Different organizations will fill that structure in differently, and they should. Different kinds of work will also want different shapes: a new feature in a complex multi-system codebase wants a different breakdown than a production bug fix. In fact, a key component is that you and your team should be able to define the workflows that work best for you.

The handoff between team members provides friction

One of the biggest issues with most teams is not the work that has to be done, but the coordination between the different team members on each step of the process. On many teams there are refinement ceremonies and meetings between designers and product managers and then QA disagrees with engineering on the acceptance criteria, and on and on and on.

Because each step is large, and the people in each phase are so busy trying to deliver today's story, they don't have time to look at what is coming up. This leads to stories sitting idle after one team member finishes their part, waiting until the next team member is ready to work on them.

By defining the inputs and outputs and tasks and often by breaking down the steps into smaller components, we reduce these bottlenecks in the process and smooth out the flow. Questions get asked and answered earlier, ambiguity gets settled faster. Teams can hold pair sessions to review the outputs at an earlier stage in the process.

The compounding benefit

This is the payoff that argues hardest for getting the frame right.

Consistent work items, handled through a deliberate SDLC, produce consistent documentation, consistent tests, and consistent higher-quality work products across the whole system. Every new feature built under this model makes the repository and the surrounding systems stronger — better docs, better tests, less undocumented behavior — rather than more fragile.

Imagine, if you will, working on a system that has been in production for three years and finding that it has documentation you can read that is up to date, accurate, matches the code base, and covers all of the latest features. No need for tribal knowledge about the feature that was added a year ago by a lone-wolf engineer who has since left. No more finding out that there are three different implementations of the same algorithm, with ever so slight, undocumented differences, and that no one knows which one is used in which case. Would that be a project you would like to work on?

Most engineering teams today experience the opposite. Stripe's Developer Coefficient found that roughly 42% of a developer's week goes to dealing with technical debt and bad code [Stripe, 2018]. Each new feature tends to add tech debt faster than the team can pay it down. The system gets more brittle with age, and the AI generation tools available today are perfectly happy to accelerate that decline.

Lifecycle-aware orchestration flips that curve. Over time, the system becomes easier to work on, not harder. If this series has a single biggest argument, this is it: the compounding quality of the codebase over time is the thing most worth optimizing for. A tool that accelerates the coding step without flipping the curve just adds speed to the decline. And one thing I think we can all agree on is that teams are already fast enough at producing tech debt; what we don't need is AI helping us create more tech debt, faster!

Granularity and the cheap-model opportunity

There's a useful secondary effect of getting the granularity right. When work is broken into small, well-defined steps, each step becomes a task that cheaper, faster models can handle reliably. The frontier high-cost models stop being the default for everything; they get reserved for the steps that actually need them — architecture, security reasoning, ambiguous spec work.

Amazon's research has made this point directly: task decomposition combined with smaller LLMs is a key path to making AI more affordable at enterprise scale [Amazon Science]. Given the economics from Article 3, this is not a minor benefit. It's a direct line from SDLC discipline to a sustainable cost curve.
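Routing by step can be sketched in a few lines of Python. The tier names and per-token prices below are placeholders, not real vendor pricing, and the set of frontier-worthy pass kinds simply echoes the examples in the paragraphs above (architecture, security reasoning, ambiguous spec work).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_tokens: float  # placeholder price, not a real quote


CHEAP = ModelTier("small-model", 0.5)
FRONTIER = ModelTier("frontier-model", 15.0)

# Pass kinds that, per the text, justify frontier capability.
FRONTIER_PASSES = {"architecture", "security_review", "ambiguous_spec"}


def route(pass_kind: str) -> ModelTier:
    """Default to the cheap tier; reserve the frontier tier for passes that need it."""
    return FRONTIER if pass_kind in FRONTIER_PASSES else CHEAP


def estimated_cost(pass_kinds: list[str], tokens_per_pass: int) -> float:
    """Rough spend in USD, assuming every pass uses the same number of tokens."""
    return sum(route(kind).usd_per_million_tokens * tokens_per_pass / 1_000_000
               for kind in pass_kinds)
```

With these placeholder prices, `estimated_cost(["architecture", "implement", "unit_tests"], 100_000)` comes to about 1.6, versus 4.5 if all three passes went to the frontier tier. The point is the shape of the saving, not the specific numbers.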

We already have a project management system

Most teams already have Jira, Linear, or GitHub Issues to manage their projects, and these tools are very valuable. What they do and what I'm describing here are not in conflict with each other. In fact, the two work very well together.

What those tools do well is record the progress. They track when each step happened, who commented, which ticket blocks which, what the current status is. That's useful. It's also not enough.

What they don't do is execute. They don't drive the steps, they don't route artifacts between agents, they don't enforce review gates with written criteria, and they don't reason about cost or context. The next generation of AI-assisted engineering tooling needs to cover both sides — recording the work (Jira-class) and executing the work (the thing this series is sketching). They're complementary, not competitive.

Closing

If the lifecycle is the unit and the SDLC defines how it moves, the rest of the series is about the pieces that make it work:

The shape of each pass within a phase. Article 6 (Structure That Fits the Work) on multi-pass workflows and what task decomposition actually looks like.
The agents doing the passes. Article 7 on the agent taxonomy.
The compute they run on. Article 8 on compute pools.

Each one is another step away from the task as the focus and toward the lifecycle as the focus.

Coming next

Article 6: Too Much, Too Little Structure. Why single-pass AI code generation is brittle, why spec-driven development was a helpful start that wasn't enough, and how multi-pass workflows with the right granularity — chosen by the organization, not the framework — produce the kind of results enterprise engineering actually needs.

Sources

The Quality-Speed-Cost Trilemma of AI Development

Brian Zimbelman — Wed, 20 May 2026 10:26:45 +0000

This is Article 4 of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com. Previously: Article 3 — The End of Cheap AI. Coming next: Article 5 — SDLC, Not Task: Why the Unit of Work Has to Change.

Imagine starting your Monday morning and seeing 200 AI-generated pull requests ready for you to review. The generation step is no longer the bottleneck. The bottleneck is review, it's QA, it's CI. As an overworked engineer, you might miss something in one of those reviews that you wouldn't normally have missed. So now the bottleneck is the on-call engineer debugging a weird incident caused by one that slipped through, and it's the CFO looking at the invoice.

"Throughput" without the other two axes is not progress. It's measurable noise. It means more interruptions, which make you lose track of what is important and what is noise. It makes you anxious and leads you to rush through things that are important. It leads to more bugs per unit of work completed. And it is a negative flywheel for your team's measurable productivity.

Standing on shoulders

Gas Town — Steve Yegge's multi-agent workspace manager — is the clearest current example of massive-parallel AI coding. The Mayor decomposes a request into smaller tasks. Each project is a Rig, backed by a Git repo. Work is broken into Beads with explicit acceptance criteria. Crews of agents run in parallel, twelve to thirty at a time [Cloud Native Now] [Better Stack].

Gas Town proved several things that nobody had really demonstrated at scale before. You can run a dozen or more coding agents against the same codebase without them tripping over each other. You can integrate directly with GitHub's merge queue and get serial merges on top of parallel generation. You can express work as chained sequences of small tasks ("Beads" in Gas Town's vocabulary) and get convergent outcomes even when individual agents take different paths to get there. That is a genuine step forward. The industry learned a lot from it, and this series builds on those lessons, not against them.

What the Gas Town era assumed, and what is changing, is the background economics. The model was designed for a world of nearly-free tokens, where it was fine to generate five drafts, keep the best, and throw the rest away. One early adopter publicly reported burning $100/hour in API spend during heavy use [Cloud Native Now]. That was a reasonable trade when tokens were a rounding error in enterprise AI budgets. As Article 3 laid out, tokens are no longer a rounding error.

The next chapter has to keep the parallelism thesis and add discipline across the other two axes.

Pick three

The classic project-management trilemma says: good, fast, cheap — pick two. The twist for AI-assisted engineering is that you don't get to pick two. You have to make explicit trade-offs across all three, because all three are now expensive to externalize.

Quality can't be treated as "human review's problem" anymore. Review capacity does not scale with generation capacity. If the AI writes 200 PRs and you have three reviewers, the quality floor is set by the reviewers' attention, not by the generator's output.
Speed can't be the only metric. A team that ships 20 PRs a week with a 30% revert rate isn't shipping 20 PRs a week. It's shipping 14 PRs and creating 6 incidents. Throughput that only measures partway through the development lifecycle is a vanity metric.
Cost can't be externalized to a provider's margin. That phase of the industry is ending.
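The revert arithmetic in the speed point above is worth making explicit, since it is the difference between a vanity metric and a real one. A minimal sketch, assuming a hypothetical `net_throughput` helper that rounds reverted PRs to whole numbers:

```python
def net_throughput(shipped: int, revert_rate: float) -> tuple[int, int]:
    """Split shipped PRs into (kept, reverted), rounding reverts to whole PRs."""
    if not 0.0 <= revert_rate <= 1.0:
        raise ValueError("revert_rate must be between 0 and 1")
    reverted = round(shipped * revert_rate)
    return shipped - reverted, reverted


kept, reverted = net_throughput(20, 0.30)
print(f"{kept} PRs kept, {reverted} reverted")
```

A dashboard that reports `shipped` shows 20; one that reports `kept` shows 14, and the 6 reverts belong in the incident count rather than the throughput count.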

Any serious orchestration system from April 2026 onward has to optimize across all three at once.

Why the next chapter looks different

Three things broke simultaneously.

The pricing broke, we've already discussed that.
The assumption that human review capacity scales with generation capacity broke — the 2024 DORA data on declining delivery stability as AI adoption rises is the clearest evidence [DORA 2024].
And the assumption that every team would figure out the right practices on their own broke — the DORA anomaly and the METR finding that experienced developers ran 19% slower with AI tools in a controlled setting [Augment Code summary] both point to the same conclusion: most organizations need more structure around these tools than they currently have.

Quality debt

"Fast and cheap" generates debt downstream. At AI-generation scale, that debt compounds fast. Some examples of what it looks like in practice:

Generated code that looks right but is subtly wrong. Timezone bugs, off-by-one edge cases, silent bypass of validation checks. The kind of mistake that takes a senior engineer ten minutes to catch in review and a month to track down after it lands.
Tests that pass for the wrong reasons. An AI generating a test alongside the implementation tends to produce a test that verifies the implementation it happened to write, not the behavior it was supposed to verify.
Documentation that drifts from implementation. Especially painful when the AI also happens to read the docs to understand the system — the drift becomes self-reinforcing.
Small changes that undermine the architecture, building on top of each other until they become the norm. AI is great at figuring out the easiest way to close the ticket; it's not always great at figuring out the right way to implement the solution so that the system stays nimble and powerful.
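The "tests that pass for the wrong reasons" failure is easier to see in code. In this hypothetical Python sketch, `nights_booked` has a planted off-by-one bug of the kind listed above. A check that re-derives its expected value from the implementation's own formula passes anyway, while a check that states the required behavior directly catches the bug.

```python
from datetime import date


def nights_booked(check_in: date, check_out: date) -> int:
    # Planted off-by-one: counts both endpoints, so a one-night stay returns 2.
    return (check_out - check_in).days + 1


def mirror_check() -> bool:
    # Re-derives the "expected" value with the implementation's own formula,
    # so it passes whether or not the formula is right.
    check_in, check_out = date(2026, 6, 1), date(2026, 6, 2)
    return nights_booked(check_in, check_out) == (check_out - check_in).days + 1


def behavior_check() -> bool:
    # States the requirement directly: in on June 1, out on June 2 is one night.
    return nights_booked(date(2026, 6, 1), date(2026, 6, 2)) == 1
```

`mirror_check()` returns `True` and `behavior_check()` returns `False`. A generator writing tests alongside its own implementation drifts toward the first style, which is one reason to derive tests from the spec or IR in a separate pass.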

Each of these is a small debt by itself, and on undisciplined teams these add up even without AI assistance. However, at AI speeds, and when engineers can't pay sufficient attention to code review because they are stretched too thin, these debts pile up much faster.

Speed without quality = throughput without progress

A PR that gets reverted wasn't a PR. A feature that ships behind a flag that never rolls out wasn't a feature. Dashboards that measure raw throughput will show those as wins, and the team shipping them will feel productive until the downstream cost arrives as an incident, a customer escalation, or a security finding. And not only are these not wins, they carry a cost. Human energy went into them: humans had to ask the agent to do the work, and then had to determine that the work wasn't good enough. Sometimes this comes after humans have made repeated requests trying to get the agent to do what they wanted.

The math gets worse when AI agents are the reviewers of other AI agents' output. Without deliberate guardrails, it's easy to end up in a closed loop where nothing stops bad work from landing. Kent Beck has described exactly this failure mode when discussing his own AI-assisted work — he talks about AI agents deleting tests to make them "pass" as one of the surprises of the current generation of tooling [Pragmatic Engineer podcast with Kent Beck].

Cost-aware orchestration as the way forward

The only honest response is to treat all three axes as first-class. In practice that means:

Models routed to tasks by complexity. Cheap models for bounded work; frontier models only when the stakes justify them.
Track budget and cost per item/story. The system must provide cost tracking on a task/story basis and make it easy to report that cost back to the ticketing system (Jira, GitHub, etc.).
Intermediate representations that can be validated before code is generated. An IR gives you something to review that isn't a 2,000-line diff, and lets you catch problems before deterministic code generation turns them into production bugs (the next article picks this thread up in depth).
Switch providers without rewriting workflow. Organizations need to be able to keep costs down by using multiple providers and lower cost models. Some may even choose to run some models internally both for security and for improved performance and cost management.
Defer low-value work. Some work simply may not clear the cost-benefit line.
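Per-story cost tracking and the defer decision from the list above can be sketched together. This assumes, hypothetically, that each model call reports input and output token counts and that prices are quoted per million tokens; the story IDs are made up, and pushing totals to Jira or GitHub is deliberately left out rather than guessing at either API.

```python
from collections import defaultdict


class CostLedger:
    """Accumulates model spend per story so totals can be reported to the ticket."""

    def __init__(self) -> None:
        self._spend: defaultdict[str, float] = defaultdict(float)

    def record(self, story_id: str, input_tokens: int, output_tokens: int,
               usd_per_m_in: float, usd_per_m_out: float) -> None:
        """Add the cost of one model call, with prices quoted per million tokens."""
        cost = (input_tokens * usd_per_m_in + output_tokens * usd_per_m_out) / 1_000_000
        self._spend[story_id] += cost

    def total(self, story_id: str) -> float:
        # .get avoids creating an entry for stories with no recorded spend.
        return self._spend.get(story_id, 0.0)


def should_defer(estimated_cost_usd: float, estimated_value_usd: float) -> bool:
    """Defer work whose estimated value falls below its estimated cost."""
    return estimated_value_usd < estimated_cost_usd
```

Recording one call of 200,000 input and 50,000 output tokens at placeholder prices of 3 and 15 USD per million adds about 1.35 USD to that story's total; the same ledger totals are what a defer decision would compare against the work's estimated value.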

Each of these gets a dedicated treatment in the rest of this series. The point here is that none of them can be optional if the trilemma is real.

The SDLC is how you navigate the trilemma

So what is the solution? As we will discuss in the next few articles, orchestrating the entire development lifecycle, managing not just the implementation step but the entire SDLC, is how we can empower the agents to make the entire pipeline run faster. After all, a bigger pipe between two small sections of pipe doesn't do anyone any good. This means that we have to be able to structure the entire process (whole-team, beginning to end) so that quality, speed, and cost are reasoned about at the right points.

The way you do that is by breaking the work into smaller, well-defined steps with clear handoffs and clear acceptance criteria between them. GitHub's open-source Spec Kit is a good example of this idea in practice: a toolkit that organizes projects around a .specify directory and a set of slash commands (/speckit.specify to document requirements, /speckit.plan to generate an implementation plan, /speckit.tasks to break the plan into task chunks, /speckit.implement to execute, plus /speckit.clarify and /speckit.analyze to catch inconsistencies before implementation) [GitHub Blog] [Spec Kit]. That's essentially a two-pass (sometimes three-pass) approach, and it helped, but it often wasn't enough for enterprise work.

Real work involves design decisions, constraint reconciliation, multiple verification concerns, integration with other changes, and deployment considerations. Forcing all of that into two or three passes asks too much of either pass. You need the broader SDLC discipline the rest of this series unpacks: multi-pass workflows with the right granularity, different workflows for different kinds of work (feature vs. bug fix vs. security incident vs. migration), specialized agents for each pass, and review gates with written criteria so processes are followed in a repeatable fashion.

Getting the SDLC right is not a separate concern from the trilemma. It is how the trilemma gets navigated.

Implications for leadership

This is a strategic framing problem as much as a technical one.

If your AI strategy is "more PRs," you are optimizing one axis and ignoring the other two. The rollouts that look successful on a quarterly dashboard and disastrous on a 12-month retrospective are the ones that didn't name the trilemma.

The next generation of successful AI-assisted organizations will be the ones that can articulate their trade-offs honestly — in public, to their boards, and in internal planning. Naming what you're optimizing and naming what you're spending to optimize it is not a rhetorical move. It's the beginning of a functioning engineering organization in this new economic context.

Coming next

Part II begins. The next article, SDLC, Not Task, argues that the fundamental unit of work for AI-assisted engineering has to shift from writing the code for a feature to all of the work items in the lifecycle. Every article in Part II is a move on the trilemma; this is where the moves start.

Sources

The End of Cheap AI: What Consumption Pricing Means for Engineering Organizations

Brian Zimbelman — Mon, 18 May 2026 11:57:45 +0000

This is Article 3 of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com. Previously: Article 2 — Why AI Tools Make Some Teams Slower. Coming next: Article 4 — The Quality-Speed-Cost Trilemma of AI Development.

In November 2025, Anthropic began renewing enterprise customers under a new billing structure. The old bundled-token enterprise seats — where a monthly per-seat fee came with a generous pool of tokens and you paid overage rates only if you really pushed the limit — are being retired. In their place: seat fees that cover platform access only, and every token billed at standard API rates on top [IT Brief] [Let's Data Science]. The Register covered it under the headline "Anthropic ejects bundled tokens from enterprise seat deal" in April 2026 [The Register].

This is not the story of an AI company behaving badly. It is the normal technology-maturity curve, arriving on schedule. Every major tech category has gone through it: a cheap-access phase while the vendors build demand, followed by price corrections as demand catches up with capacity. Cloud did this. SaaS did this. It is AI's turn.

The implications for how engineering organizations run are substantial, and the time to adjust is now. Axios reported in April 2026 that some companies are now spending more on AI than on employees' salaries — IT budgets, in their phrase, "getting blown out" [Axios]. Whether that crossover is already happening at a given organization or still a few quarters out, the trend line is unambiguous: as AI usage costs rise to actually cover what providers spend on compute, every company is going to have to choose how it provisions these tools for its workforce. This doesn't have to be an all-or-nothing decision. Smart companies will figure out how to give their engineers enough resources to be meaningfully more productive without giving them unlimited resources to waste. That balance requires monitoring, attribution, and ways to track the cost-benefit of these tools — capabilities that most teams don't have yet.

The subsidy era, briefly

Flat-rate enterprise AI existed for the same reason heavily discounted cloud credits existed in 2014: the providers were buying adoption with margin while capacity was underutilized and the land grab was on. It was a rational strategy for that moment. Nobody involved was under the illusion it would last forever.

Four forces have now made it end faster than anyone expected.

Four forces, acting at once

Consumption pricing. Bundled-token arrangements are being unwound across the industry. Anthropic's transition is the most visible, but it's not unique. Seat fees now cover access; usage is billed per-token at standard rates. That means every token is a cost. There's no more hiding usage inside a fixed fee.

Capacity constraints. Inference infrastructure is not scaling as fast as demand. Price becomes a rationing mechanism when capacity is tight. This isn't about any single vendor's margin strategy — it's about physics and supply chains.

Real-world input costs. The cost stack under AI API pricing is moving the wrong way on multiple fronts:

Power. Residential electricity prices rose 11.5% in 2025 in the United States, outpacing inflation by more than three-to-one, and projections from the EIA and Goldman Sachs see rates up 40% by 2030 versus 2025 [Goldman Sachs / CNBC] [NPR]. Data centers account for around 40–50% of U.S. electricity demand growth according to Goldman Sachs and the IEA. Near some data centers, wholesale electricity costs up to 267% more than it did five years ago [Bloomberg]. This flows through to API pricing whether anyone likes it or not.
Memory. High-bandwidth memory (HBM), the memory AI accelerators need, is structurally short. SK Hynix's advanced packaging lines are booked through 2026; Micron's HBM production sold out before 2025 began [Next Platform] [TrendForce]. 1 GB of HBM consumes roughly 4× the wafer capacity of standard DRAM [Tom's Hardware]. DRAM prices were up roughly 90% in Q1 2026 versus Q4 2025 [Enki AI]. AI is bidding up memory for everything.
Hardware. Even the consumer side has felt it. Apple struggled to keep Mac Minis in stock in early 2026 after "OpenClaw" (a local-inference-oriented agent stack) went viral on them — a small but symbolic data point on how quickly "run it yourself" demand can overwhelm supply [marc0.dev].

The cloud-vs-edge inference gap. Running inference on local or edge hardware can be far cheaper than cloud API calls for predictable workloads. Brad DeLong's widely-read piece on data-center economics uses Marco Arment's 50-Mac-Mini server farm as the canonical example: ~$30,000 of up-front hardware, ~$6,000/year amortized, less than 2 kW total power, versus the OpenAI Whisper API bill the same workload would have generated — which DeLong pencils out at around $1,800 per day per Mac Mini equivalent [DeLong, "Is the Day of the Data Center About to End?"]. A separate analysis at Latent Space works through a ~3,000× efficiency-and-cost improvement from stacked hardware, quantization, and distillation techniques applied to predictable inference workloads [Latent Space]. The exact multiplier depends on the workload, but the direction is unambiguous: not every inference belongs in a frontier-model API call.
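To make that comparison concrete, here is a minimal break-even sketch in Python. The hardware figures ($30,000 up front, a five-year life that yields the ~$6,000/year amortization, 2 kW of draw) come from the example above. The electricity rate and the per-minute API price are assumptions chosen for illustration. The model also ignores staffing, failures, and idle capacity, so treat it as a way to frame the question, not as an answer.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # assumes the cluster runs around the clock


@dataclass
class LocalCluster:
    hardware_cost: float   # up-front purchase price, USD
    lifetime_years: float  # straight-line amortization period
    power_kw: float        # average draw of the whole cluster
    price_per_kwh: float   # assumed electricity rate, USD

    def annual_cost(self) -> float:
        amortized = self.hardware_cost / self.lifetime_years
        energy = self.power_kw * HOURS_PER_YEAR * self.price_per_kwh
        return amortized + energy


def api_annual_cost(units_per_year: float, price_per_unit: float) -> float:
    """Pure consumption pricing: every unit of work is billed."""
    return units_per_year * price_per_unit


def breakeven_units(cluster: LocalCluster, price_per_unit: float) -> float:
    """Annual volume above which owning the hardware beats paying per unit."""
    return cluster.annual_cost() / price_per_unit


cluster = LocalCluster(hardware_cost=30_000, lifetime_years=5, power_kw=2.0, price_per_kwh=0.15)
api_price = 0.006  # hypothetical USD per audio minute
print(f"local: ${cluster.annual_cost():,.0f}/yr")
print(f"break-even: {breakeven_units(cluster, api_price):,.0f} minutes/yr")
```

Under these assumptions the cluster costs about $8,600 a year, so it pays for itself somewhere past 1.4 million billed minutes a year. The crossover moves with every assumption, which is the point: for predictable workloads it is worth computing rather than guessing.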

Vendor lock-in risk. Deep dependence on one provider means their pricing changes become your pricing changes, with no negotiating leverage. This risk is more salient in April 2026 than it was three years ago.

A worked example

Take a "waves of AI PRs" style setup — a team running massive parallel AI code generation à la Gas Town. Twelve to thirty concurrent agents, each taking on beads of work, each generating and iterating through multiple drafts. Public writeups of early Gas Town adopters have cited API spend on the order of $100/hour in that mode [Cloud Native Now].

Under bundled-token pricing, that $100/hour often disappeared into the enterprise allowance, as long as the team wasn't the highest-usage team on the plan. Under per-token pricing, it doesn't. $100/hour over a 40-hour working week comes to roughly $208,000 per year, per team, and agents left running around the clock push that into the high six figures. That is for the generation step alone, not counting validation, review, testing, or deploy. An order-of-magnitude delta between old and new billing isn't a surprise; it's the expected case.
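The annualized figure depends heavily on how many hours the agents actually run. A quick sketch of the arithmetic, assuming 52 billable weeks a year and the ~$100/hour burn rate cited above:

```python
def annual_spend(rate_per_hour: float, hours_per_week: float, weeks_per_year: int = 52) -> float:
    """Annual cost of a workload billed at a flat hourly burn rate."""
    return rate_per_hour * hours_per_week * weeks_per_year


working_week = annual_spend(100, 40)         # agents run only during a 40-hour week
round_the_clock = annual_spend(100, 24 * 7)  # agents left running 168 hours a week
print(f"{working_week:,.0f} vs {round_the_clock:,.0f}")
```

A 40-hour week comes to $208,000 a year; agents left running around the clock reach $873,600, before any validation, review, or deploy costs are added.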

It would not be surprising for organizations to panic at such a drastic change in their budgets; many have already spent their entire year's allocation in the first quarter. I expect costs to keep rising for the foreseeable future. One of the biggest problems is the lack of useful reporting and cost-management tooling. The next thing organizations will need is a way to manage these costs so they can budget with confidence.

The cloud-cost analogy

The industry has seen this movie. The late-2010s cloud-cost crisis was created by the same pattern: cheap, abundant new capability → ad-hoc adoption by engineers → six-figure monthly invoices nobody had budgeted for → an entire FinOps discipline emerging to put guardrails on it.

The lessons apply directly to AI spend:

Attribute cost per workload. If you can't tell what a particular PR, work item, or workflow cost, you can't optimize it. Granular observability is what turns ROI from a feeling into a measurable thing.
Set budgets at the right granularity. Organization, team, project, and work-item budgets, with policies for what happens as a limit is approached.
Give teams visibility. Hidden costs are always misallocated. Teams can only make cost-aware decisions if the costs are visible to them in the moment, not three weeks later in a finance report.
Design for routing between expensive and cheap options. Not every task needs a frontier model; not every workload needs a cloud API call. The orchestration layer should be able to switch models — and over time, decide on its own which model fits the job at hand.
Don't get locked in. The orchestration layer needs to work across multiple vendors and against self-hosted models in the cloud or on prem. Locked-in customers don't have negotiating leverage when the next pricing change comes around.
Pool resources. Per-developer sandboxes don't generalize to team-level cost control. Shared compute pools with queueing and prioritization are how organizations actually contain budgets without strangling individual engineers.
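The first two lessons, attribution and budgets, can be sketched in a few dozen lines of Python. Everything here is hypothetical: the model names, the per-million-token prices, the 80% warning threshold, and the assumption that every usage event arrives already tagged with a work item and a team. Real vendor usage reports differ in shape, so read this as a sketch of the bookkeeping, not a drop-in tool.

```python
from dataclasses import dataclass, field

# Hypothetical USD prices per million (input, output) tokens; real rates vary by vendor.
PRICES = {
    "small-model": (0.25, 1.25),
    "frontier-model": (3.00, 15.00),
}


@dataclass(frozen=True)
class UsageEvent:
    work_item: str      # hypothetical ticket or story ID
    team: str
    model: str
    input_tokens: int
    output_tokens: int

    def cost(self) -> float:
        in_price, out_price = PRICES[self.model]
        return (self.input_tokens * in_price + self.output_tokens * out_price) / 1_000_000


@dataclass
class Ledger:
    budgets: dict[str, float]   # USD budget per work item
    warn_at: float = 0.8        # fraction of budget that triggers a warning
    events: list[UsageEvent] = field(default_factory=list)

    def spent(self, key: str, by: str = "work_item") -> float:
        """Total cost attributed to one work item (or one team, with by="team")."""
        return sum(e.cost() for e in self.events if getattr(e, by) == key)

    def record(self, event: UsageEvent) -> str:
        """Attribute an event, then report where its work item stands against budget."""
        self.events.append(event)
        budget = self.budgets.get(event.work_item)
        if budget is None:
            return "unbudgeted"
        ratio = self.spent(event.work_item) / budget
        if ratio >= 1.0:
            return "over"
        if ratio >= self.warn_at:
            return "warn"
        return "ok"


ledger = Ledger(budgets={"PROJ-101": 5.00})
first = ledger.record(UsageEvent("PROJ-101", "payments", "frontier-model", 200_000, 40_000))
second = ledger.record(UsageEvent("PROJ-101", "payments", "frontier-model", 800_000, 60_000))
third = ledger.record(UsageEvent("PROJ-101", "payments", "frontier-model", 200_000, 40_000))
print(first, second, third, round(ledger.spent("payments", by="team"), 2))
```

Because every event carries its work item and team, the same ledger answers both "what did this story cost?" and "what did this team spend?", and the warn/over statuses are the hook for whatever policy should kick in as a limit approaches.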

Engineering organizations that treated cloud cost as an afterthought in 2015 spent years paying for that choice. The ones that built cost awareness in early have durable advantages today. The same pattern is setting up around AI spend.

Let engineers play, but do it deliberately

I want to be clear: I'm not advocating for locking down resources so tightly that engineers and team members can't experiment. Experimentation is how we all learn and grow. Treating cost as a first-class constraint does not mean starving engineers of AI access. Every engineering organization needs to give its engineers room to explore what the tools can do, try ideas that don't pan out, and build the kind of taste for AI-assisted workflows that only comes from hands-on time. That's not waste; it's how teams learn to use a new capability well.

The right answer is allocation with awareness. A "learning budget" per engineer. A shared experimentation pool. Visibility into what people are using the tokens on. Retrospectives on what worked and what didn't. The mistake is not giving engineers access. The mistake is giving them unlimited access with no feedback loops, and then being surprised by the bill.
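One way to express "allocation with awareness" in code, sketched in Python under assumptions of our own: each engineer gets a small personal learning allowance, overflow draws on a shared team pool, a request that would exceed both is refused rather than silently billed, and every draw is logged so retrospectives have something to look at. The class name and dollar amounts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentationBudget:
    per_engineer: float   # hypothetical personal learning allowance, USD per month
    shared_pool: float    # hypothetical team-wide experimentation pool, USD
    used: dict[str, float] = field(default_factory=dict)
    log: list[tuple[str, str, float]] = field(default_factory=list)

    def spend(self, engineer: str, amount: float, purpose: str) -> bool:
        """Draw from the engineer's own allowance first, then from the shared pool."""
        own_left = max(self.per_engineer - self.used.get(engineer, 0.0), 0.0)
        from_own = min(amount, own_left)
        from_pool = amount - from_own
        if from_pool > self.shared_pool:
            return False  # refused and visible, rather than silently billed
        self.used[engineer] = self.used.get(engineer, 0.0) + from_own
        self.shared_pool -= from_pool
        self.log.append((engineer, purpose, amount))  # raw material for retrospectives
        return True


budget = ExperimentationBudget(per_engineer=50.0, shared_pool=100.0)
r1 = budget.spend("ana", 40.0, "agent-written test suite")
r2 = budget.spend("ana", 30.0, "refactoring spike")
r3 = budget.spend("ana", 90.0, "large migration trial")
print(r1, r2, r3, budget.used, budget.shared_pool)
```

The second request spills $20 into the shared pool; the third is refused because the pool only has $80 left. That refusal is the feedback loop: someone notices and decides, instead of the bill deciding later.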

Well-run SDLCs will absorb this; ad-hoc teams will not

Organizations with disciplined SDLCs — defined workflows, staged reviews, budgets at the work-item level, artifact reuse across passes — can adapt to consumption pricing without disruption. They already know what each piece of work is supposed to do and what it's worth spending on. Routing a simple refactor to a cheap model and a security-sensitive change to a frontier model is just another scheduling decision.
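A routing decision like that can be as simple as a function from task attributes to a model tier. The sketch below assumes three tiers with placeholder names and a task record carrying a kind, a security flag, and a size; the rules and thresholds are illustrative, and a production router would more likely tune them from cost and quality data.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOCAL = "local-small-model"     # hypothetical self-hosted model
    CHEAP = "hosted-small-model"    # hypothetical low-cost API model
    FRONTIER = "frontier-model"     # hypothetical top-tier API model


@dataclass
class Task:
    kind: str               # e.g. "refactor", "docs", "feature"
    touches_security: bool  # does the change affect auth, crypto, secrets, etc.?
    files_changed: int


def route(task: Task) -> Tier:
    """Pick the cheapest tier the task's stakes allow; thresholds are illustrative."""
    if task.touches_security:
        return Tier.FRONTIER               # stakes justify the expensive model
    if task.kind in {"docs", "rename", "format"}:
        return Tier.LOCAL                  # mechanical work stays on cheap hardware
    if task.kind == "refactor" and task.files_changed <= 5:
        return Tier.CHEAP                  # small, well-bounded refactors
    return Tier.FRONTIER                   # unknown or large work defaults to quality


print(route(Task("refactor", touches_security=False, files_changed=3)))
print(route(Task("refactor", touches_security=True, files_changed=3)))
```

The design choice worth noting is the default: anything the rules don't recognize goes to the frontier tier, so routing errs toward quality and the cheap paths are opted into explicitly.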

Organizations where every engineer uses the tools however they want have a harder landing. Costs spiral because nothing is attributable. Quality wobbles because nothing is routed deliberately. The bill arrives and there's no way to explain it except "everyone's using AI now." That isn't a sustainable answer.

Discipline becomes a competitive advantage. The organizations that build cost awareness and SDLC structure into the orchestration layer now will outperform those that don't — not because of the immediate savings, but because the discipline compounds. Every architectural decision made under cost pressure tends to be more defensible than one made under abundance.

What this favors architecturally

Systems that can do five things will have durable advantages:

Route between models based on task complexity — cheap models for cheap work, frontier models only when the stakes justify them.
Track budget and cost per work item or story.
Cache intermediate representations so yesterday's validated artifact doesn't have to be regenerated today.
Switch providers without rewriting workflows, including running lower-cost models locally if need be.
Defer low-priority work when budgets tighten.
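Two of those capabilities, caching intermediate artifacts and deferring low-priority work, are small enough to sketch. The Python below is a minimal illustration under stated assumptions: artifacts are keyed by a hash of the step name and its inputs, so unchanged inputs reuse yesterday's artifact, and the scheduler runs work in priority order against a fixed budget using per-item cost estimates we assume are available. Both classes and the stand-in generator are hypothetical, not taken from any particular orchestration system.

```python
import hashlib
import heapq
import json
from dataclasses import dataclass, field
from typing import Callable


class ArtifactCache:
    """Content-addressed store: identical step + inputs reuse the stored artifact."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    @staticmethod
    def key(step: str, inputs: dict) -> str:
        payload = json.dumps({"step": step, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_generate(self, step: str, inputs: dict,
                        generate: Callable[[dict], str]) -> tuple[str, bool]:
        """Return (artifact, cache_hit); only calls `generate` on a miss."""
        k = self.key(step, inputs)
        if k in self._store:
            return self._store[k], True
        artifact = generate(inputs)  # the expensive, token-consuming call
        self._store[k] = artifact
        return artifact, False


@dataclass(order=True)
class QueuedWork:
    priority: int                           # lower number = more important
    est_cost: float = field(compare=False)  # assumed per-item cost estimate, USD
    name: str = field(compare=False)


def schedule(queue: list[QueuedWork], budget: float) -> tuple[list[str], list[str]]:
    """Greedy: run items in priority order while they fit; defer the rest."""
    heap = list(queue)
    heapq.heapify(heap)
    run: list[str] = []
    deferred: list[str] = []
    while heap:
        item = heapq.heappop(heap)
        if item.est_cost <= budget:
            budget -= item.est_cost
            run.append(item.name)
        else:
            deferred.append(item.name)
    return run, deferred


calls = []


def fake_generate(inputs: dict) -> str:
    calls.append(inputs)  # stands in for a model call
    return f"design notes for {inputs['ticket']}"


cache = ArtifactCache()
first, hit1 = cache.get_or_generate("design", {"ticket": "PROJ-7"}, fake_generate)
again, hit2 = cache.get_or_generate("design", {"ticket": "PROJ-7"}, fake_generate)

run, deferred = schedule(
    [QueuedWork(1, 4.0, "security fix"), QueuedWork(3, 5.0, "cleanup"), QueuedWork(2, 3.0, "feature")],
    budget=8.0,
)
print(hit1, hit2, len(calls), run, deferred)
```

The scheduler is greedy: after a costly item is deferred, cheaper lower-priority items can still run if they fit. Whether that is desirable, or whether deferral should stop everything below the first item that doesn't fit, is a policy choice each organization would have to make.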

The pattern underneath all five is the same: tools and processes that give the organization real cost control while preserving each engineer's flexibility to do the job well. Organizations that get that balance right are going to be the winners of this transition. The ones that pick a side — either rationing engineers into uselessness or letting spend run unchecked — are going to lose ground to the ones that don't.

Coming next

In the next article, The Quality-Speed-Cost Trilemma of AI Development, we pull the threads together. Article 2 showed that some teams get slower. Article 3 (this one) showed that the tokens are no longer cheap. Article 4 is about the three-way trade-off that falls out when neither quality nor cost can be externalized anymore — and about how Gas Town and systems like it are a great step forward that this next generation of orchestration builds on.


Why AI Tools Make Some Teams Slower

Brian Zimbelman — Fri, 15 May 2026 12:02:43 +0000

This is Article 2 of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com. Previously: Article 1 — Your AI Coding Assistant Is Not Enough. Coming next: Article 3 — The End of Cheap AI.

Two teams roll out the same AI coding assistant in the same quarter. Same vendor, same model, same seat count. One team ships more, with fewer incidents and happier engineers. The other team ships less than they did before the tool arrived. Pull requests pile up awaiting review. Tests flake. Deploys stall. Engineers context-switch more, sleep worse, and start writing LinkedIn posts about "AI fatigue."

The difference between those two outcomes is not the AI. It's everything the AI is pushing faster against — everything the AI didn't change but that now has to keep up.

This article is about that split, what causes it, and why naming it is the setup for the rest of this series.

The data shows the dichotomy

The most credible single data point here is the 2024 DORA Accelerate State of DevOps Report. DORA is an industry-standard framework for measuring software delivery performance — throughput, stability, time-to-restore, change failure rate. The 2024 report dedicated a substantial section to AI adoption, with a genuinely uncomfortable finding.

On the individual level, AI looked like a clear win. 75.9% of respondents reported relying on AI for part of their job, and 75% reported personal productivity gains (DORA 2024). Flow improved. Job satisfaction improved. People liked the tools.

On the organizational level, the picture inverted. DORA's modeling found that "a 25% increase in AI adoption is associated with an estimated 1.5% decrease in delivery throughput and a 7.2% reduction in delivery stability" (InfoQ summary of DORA 2024). Individual developers felt faster. Teams shipped less reliably. Both statements came out of the same survey of the same teams.

Recent research from MIT Sloan and Microsoft offers a theoretical frame for exactly this gap. In Chaining Tasks, Redefining Work: A Theory of AI Automation, Shahidi and colleagues argue that AI's value emerges at the workflow level, not the task level, and that there is a threshold effect on the way to capturing it. "Up until reaching that threshold, the costs of adopting AI dominate the gains," lead author Peyman Shahidi told MIT Sloan's Ideas Made to Matter in April 2026 (MIT Sloan: How AI is reshaping workflows and redefining jobs). Only after teams have restructured around AI do measurable benefits appear. The DORA anomaly is what that pre-threshold zone looks like in production: individual tasks accelerate while the team-level workflow, still calibrated to old processes, falls behind.

Gene Kim and Steve Yegge's Vibe Coding (2025) calls this out as the "DORA anomaly" and treats it as one of the central motivating observations for why their framework — FAAFO (Fast, Ambitious, Autonomous, Fun, Optionality) and the surrounding practice set — is needed in the first place (IT Revolution / Vibe Coding) (The Register book review). They are not saying AI is bad. They are saying AI without the right practice scaffolding reliably produces the exact pattern DORA measured.

Why faster coding can make a team slower

Here's the mechanism. When the front-end coding step speeds up without corresponding upgrades downstream, bottlenecks don't disappear — they move.

More code means more review load. More review load means more latency on pull requests. More PRs waiting means more context-switching for reviewers. More context-switching means more mistakes slipping through, because human attention is a finite resource and Gloria Mark's 23-minute recovery-from-interruption figure is still true whether the interruption comes from a notification or from a new AI-generated PR that needs a second opinion (Mark, CHI 2008). More mistakes means more incidents. More incidents means more on-call interruptions, more post-mortems, more time rebuilding trust in the pipeline. Each amplifies the others.
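The first link in that chain is ordinary queueing arithmetic. The sketch below uses a deliberately simple model with hypothetical rates: PRs arrive at a steady daily rate, reviewers clear a fixed number per day, and whatever isn't reviewed carries over to the next day.

```python
def review_backlog(prs_per_day: float, reviews_per_day: float, days: int) -> list[float]:
    """Open PRs awaiting review at the end of each day, starting from an empty queue."""
    backlog = 0.0
    history = []
    for _ in range(days):
        # New PRs arrive, reviewers clear what they can; the queue can't go negative.
        backlog = max(0.0, backlog + prs_per_day - reviews_per_day)
        history.append(backlog)
    return history


keeping_up = review_backlog(prs_per_day=8, reviews_per_day=10, days=5)
ai_assisted = review_backlog(prs_per_day=14, reviews_per_day=10, days=5)
print(keeping_up)
print(ai_assisted)
```

With 8 PRs a day against capacity for 10, nothing accumulates. At 14 PRs a day the queue grows by 4 every day and never drains on its own, and each of those waiting PRs is a future context switch for a reviewer.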

The MIT Sloan paper sharpens this point. Each handoff between AI and human carries a coordination cost — review, validation, adjustment — and AI-task-then-human-task workflows accumulate those costs at every step. End-to-end workflows that keep adjacent AI-friendly steps clustered together avoid the handoff tax. The "more code, more review, more context-switching" cascade is the handoff tax made visible.
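Counting handoffs makes the point concrete. In this sketch each workflow step is labeled with who performs it, and we assume, as a simplification, that every AI-to-human or human-to-AI transition carries roughly the same coordination cost. The two hypothetical workflows contain the same steps; one interleaves them, the other chains the AI-friendly steps together, which is only possible when the steps' dependencies allow that ordering.

```python
def count_handoffs(steps: list[str]) -> int:
    """Number of transitions between AI-performed and human-performed steps."""
    return sum(1 for current, nxt in zip(steps, steps[1:]) if current != nxt)


# Same six hypothetical steps, labeled by who performs each one.
interleaved = ["ai", "human", "ai", "human", "ai", "human"]
chained = ["ai", "ai", "ai", "human", "human", "human"]

COST_PER_HANDOFF = 1.0  # assumed fixed coordination cost, arbitrary units
for name, flow in [("interleaved", interleaved), ("chained", chained)]:
    print(name, count_handoffs(flow), count_handoffs(flow) * COST_PER_HANDOFF)
```

Same work, five handoffs versus one.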

The "coding" part of the job was already a small fraction of total throughput before the AI arrived. Accelerating it in isolation barely moves the ceiling, and it can even lower the ceiling by starving the downstream steps of capacity.

This is not a new mechanism. It's Amdahl's Law applied to software delivery, and it's Theory of Constraints applied to coordination work. Speed up one non-dominant stage and the bottleneck shifts; if the new bottleneck is worse than the old one, total throughput drops.
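A minimal Theory of Constraints sketch: in steady state, a pipeline's throughput is capped by its slowest stage. The stage names and capacities (items per day) are hypothetical.

```python
def pipeline_throughput(capacity: dict[str, float]) -> tuple[float, str]:
    """Steady-state items/day and the stage that limits it (the constraint)."""
    constraint = min(capacity, key=lambda stage: capacity[stage])
    return capacity[constraint], constraint


before = {"coding": 10, "review": 12, "qa": 11, "deploy": 20}
faster_coding = {**before, "coding": 30}
# Assumption: reviewers swamped by extra PRs lose capacity to context switching.
strained_review = {**faster_coding, "review": 9}

for label, stages in [("before", before), ("faster coding", faster_coding),
                      ("strained review", strained_review)]:
    print(label, pipeline_throughput(stages))
```

Tripling coding capacity raises the ceiling only from 10 to 11, because QA becomes the constraint. A pure min() model cannot by itself show throughput falling; that happens when the faster stage eats into downstream capacity, modeled here by assuming reviewers lose time to context switching, which drops the ceiling to 9, below where the team started.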

The old bottleneck was coding; now it is review, or QA, or whichever stage comes next. This is where processes must be adjusted to take advantage of the new landscape. How do we relieve all of the bottlenecks so the entire process flows faster?

What separates the winners

Teams that gain from AI tools tend to have, or rapidly develop, a specific cluster of practices. Vibe Coding calls them foundational skills; similar ideas turn up in DORA's own recommendations and across an increasingly lively literature on AI-augmented development. The core set:

Architectural thinking. Somebody is responsible for the shape of the system, not just the current diff. AI-generated code does not default to good architecture; it defaults to "whatever the prompt implied." Teams that gain from AI have humans who keep the overall structure coherent as the rate of change accelerates.

Fast feedback loops. Tests, CI, and preview environments that run in minutes, not hours. A 45-minute CI cycle that was annoying at human coding speed becomes catastrophic at AI coding speed, because the rate of incoming changes overwhelms the validation step. Kent Beck — the original author of Extreme Programming — has been notably vocal since 2025 that TDD becomes a "superpower" specifically when paired with AI agents (Pragmatic Engineer podcast with Kent Beck). TDD and XP practices that felt like 2005 revivals are getting a second life, because they're the practices that make fast feedback loops real.

Clear communication protocols for agents. What context the agent gets. What it returns. What handoffs look like. Teams that gain treat agent interactions like API calls — versioned, documented, conventional. The teams that struggle treat them like free-form chat.
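Treating an agent interaction like an API call mostly means agreeing on a typed, versioned envelope for what goes in and what comes back, and rejecting anything that doesn't conform. The dataclasses, field names, status values, and version string below are hypothetical; the point is the shape of the contract, not a specific framework.

```python
import json
from dataclasses import dataclass, field

SCHEMA_VERSION = "1.0"  # hypothetical; bump whenever a field changes meaning
VALID_STATUSES = {"done", "needs_human", "failed"}


@dataclass
class AgentRequest:
    task_id: str
    instructions: str
    context_refs: list[str]       # documents the agent is given, by reference
    expected_outputs: list[str]   # artifact names the caller will look for
    schema_version: str = SCHEMA_VERSION


@dataclass
class AgentResult:
    task_id: str
    status: str                   # one of VALID_STATUSES
    artifacts: dict[str, str] = field(default_factory=dict)
    notes: str = ""
    schema_version: str = SCHEMA_VERSION


def parse_result(raw: str, request: AgentRequest) -> AgentResult:
    """Validate an agent's JSON reply against the request's contract."""
    result = AgentResult(**json.loads(raw))  # unknown fields raise TypeError
    if result.schema_version != request.schema_version:
        raise ValueError(f"schema mismatch: {result.schema_version}")
    if result.task_id != request.task_id:
        raise ValueError("result does not belong to this request")
    if result.status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {result.status}")
    missing = [name for name in request.expected_outputs if name not in result.artifacts]
    if result.status == "done" and missing:
        raise ValueError(f"missing artifacts: {missing}")
    return result


request = AgentRequest(
    task_id="PROJ-42",
    instructions="Add input validation to the signup endpoint",
    context_refs=["docs/threat-model.md"],
    expected_outputs=["diff", "test_report"],
)
good = json.dumps({"task_id": "PROJ-42", "status": "done",
                   "artifacts": {"diff": "patch text", "test_report": "12 passed"}})
incomplete = json.dumps({"task_id": "PROJ-42", "status": "done",
                         "artifacts": {"diff": "patch text"}})
print(parse_result(good, request).status)
try:
    parse_result(incomplete, request)
except ValueError as err:
    print("rejected:", err)
```

A result that claims to be done but omits an expected artifact is rejected at the boundary, instead of surfacing later as a reviewer's surprise.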

Dependency management and small, well-bounded work items. AI agents succeed at small, well-specified work and fail at large, ambiguous, multi-step work. The teams that win feed their AI tools the right size of task; the teams that struggle ask one prompt to do three people's jobs.

Independence of action balanced with coordination discipline. Engineers need freedom to explore with the tools. They also need enough shared structure that the explorations don't fragment the codebase. "Everyone uses AI how they want" feels like autonomy but produces a codebase no one can review.

Redefining your team's workflow. AI-DLC (AWS, 2025) and other modified development lifecycles that treat AI as a first-class participant bring the practices teams are discovering into play and help them adjust how they work, so the lifecycle as a whole improves.

What the losing teams are missing

The mirror image. No clear ownership of architecture. Slow or flaky feedback loops that the AI's increased throughput overwhelms. Ad-hoc prompting with no shared conventions. Work items too big, too vague, or too coupled for anything — AI or human — to make clean progress on. A culture that treats "the engineer uses AI how they want" as autonomy when it is really a refusal to coordinate.

The tool amplifies whatever the team's practices already were. If those practices were weak, the tool makes the weakness more visible, faster.

This is not the tool's fault, and it's not the engineer's

The failure mode is structural. Teams were running their SDLCs at a pace calibrated to pre-AI coding speed. When the front end speeds up and nothing else does, the system gets worse in a way that's hard to see from the inside. The engineer feels more productive (they wrote more code!). The team is less productive (less shipped, more bugs).

Now the engineer comes into work to a backlog of PRs and feels overwhelmed by all the changes in flight. They can't keep them all straight. What is this PR doing, anyway? Stress keeps climbing: all these stories are waiting on me, ugh. Work becomes less fun and more frustrating.

The right tools and the right process

If we combine tools that accelerate the whole lifecycle with a process that reduces context switching and shrinks backlogs, teams can actually capture the performance gains AI is promising.

More code is not more productive. More shipped work is, and shipped work is a whole-team accomplishment. The next generation of AI tooling has to know that.

Coming next

In the next article, The End of Cheap AI, the economic context tightens. Bundled enterprise tokens are being retired. Power prices, memory prices, and hardware supply are all moving the wrong direction for the "just burn tokens" approach. The teams that have been papering over practice gaps with abundant cheap tokens are about to lose that cushion.


Your AI Coding Assistant Is Not Enough

Brian Zimbelman — Wed, 13 May 2026 10:24:46 +0000

This is Article 1 of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com. Previously: Beyond the Coding Assistant — A New Series. Coming next: Article 2 — Why AI Tools Make Some Teams Slower.

Pick a developer. Pick a Tuesday. Break down their workday.

How many of those hours were actually in the editor? How many were in tickets, dashboards, Slack, meetings, and review? Published survey data on developer time allocation isn't subtle on this question. Stripe's Developer Coefficient report found that roughly 42% of a typical developer's week goes to addressing technical debt and fixing bad code, with only a minority of the week left for the kind of new-feature coding the marketing pictures show (Stripe, 2018). Other analyses of how engineers spend their time put hands-on coding in a similar range — a few hours a day at best, not the bulk of the week.

The AI coding tools helped with those few hours. They did very little for the other five or six. And once you take into account everyone else on the team — and all the non-development tasks in the development process that the coding tools simply don't touch — even the best-case scenario only improves a fraction of the team's time. To realize the enormous gains these tools could deliver, we need to rethink the fundamentals of how we build software in this new era. Some of what was helping us is now holding us back.

What coding assistants actually do well

Let's be clear about the starting point. The current generation of AI coding assistants is impressive, and the breakthroughs are genuine. Inside a single repository, with a human driving, with a well-scoped task, they produce usable code at speeds that would have seemed absurd three years ago. Smart engineers have built real workarounds for the tools' limitations — scripts, custom harnesses, carefully tuned prompts, agents, and multi-session workflows that extend the tools' reach further than the vendors originally imagined.

None of what follows is a takedown of those tools. The argument is that they alone are not enough. They were designed for a specific context — editor, session, one developer, one repo, one well-bounded change — and that context is not where most of the engineering work in real organizations actually happens.

The iceberg

Shipping software is mostly not coding. It is requirements gathering, stakeholder conversations, design docs, architecture reviews, feature flag wiring, secrets provisioning, CI pipeline updates, deploy runbooks, dashboard setup, alert tuning, incident response, post-mortems, migration plans, and deprecation notices. It is the scheduled meeting that becomes a Slack thread that becomes an RFC that becomes a backlog ticket that becomes, eventually, a short coding session and a pull request.

Here's another test. Ask an engineer whether they could take the next ticket off their queue and complete it end-to-end inside a single Docker container: code, build, run, validate, deploy. For 99 of every 100 real tickets in a real enterprise codebase, the answer is no. They need multiple repos. Several services running in dev or staging. Credentials to external systems. Documentation scattered across Confluence and a few other internal sources. And usually a conversation or two with product, QA, or a teammate who knows the corner of the system where the bug lives.

The current tooling treats the session as the unit of work and leaves everything outside the session — which is to say, most of the work — to the engineer.

The rest of the team

Engineering is a team sport. The primary focus of coding agents has been the engineer-writing-code role. That makes sense as a starting point; it's where the highest-volume, most-bounded, most-codifiable work lives. Plenty of clever people have built workarounds to extend that focus to other parts of the team (agents that draft tickets, agents that summarize Slack threads, agents that turn a paragraph of intent into a Figma flow), but these are workarounds, not the design center of the tools. Someone is still monitoring the agent at every step of the process, giving detailed instructions and often repeating those instructions multiple times.

The next generation of tooling has to make those other roles first-class citizens of the process, not afterthoughts. If the goal is team throughput, then concentrating all the AI investment on one role is ineffective. It's the local optimum of "make the developer faster" rather than the global optimum of "help the team ship more." A tool that only helps developers can only move the bottleneck to whichever role is next in the handoff.

This isn't a hypothetical claim, either. There's already industry data showing that some teams, after rolling out AI coding assistants without changing the rest of their development process, have actually seen overall delivery slow down — the front-end coding step gets faster while review, testing, and coordination start to choke. We will dig into the data behind that observation in more detail in the next article.

A walk through the lifecycle

Every team in every company has a development lifecycle. Some teams write theirs down explicitly. Others make it up as they go. Some are formal and governance-heavy. Others are loose and improvisational. The names vary — ideation, design, specification, implementation, configuration, deployment, refinement, monitoring, debugging, retirement, and others — and so do the boundaries between phases. None of that matters very much for this argument.

What does matter is the coverage pattern. Today's AI tooling concentrates almost entirely in the implementation phase, and even there it misses most of the coordination work between developers and the people they hand off to and from. Whatever phases an organization uses, the next generation of tooling has to support the entire lifecycle if AI is going to deliver on the team-level promise people keep making for it. Anything less is a tool playing in one corner of a much bigger problem.

The engineer as orchestrator

Current coding tools require the engineer to drive at a low level — prompt by prompt, session by session. Different tools have different mechanisms (chat, autocomplete, slash commands, terminal CLIs, IDE plugins) but the underlying interaction is still pretty manual. The engineer asks. The tool responds. The engineer reviews. The engineer decides what to do next. Repeat.

This was an excellent way to start. When the tools were new and went off the rails easily, tight engineer-in-the-loop control was exactly the right design. Since then, several things have improved. We've learned how to keep the tools in line — better prompts, better guardrails, better evaluation. The underlying models have improved at producing quality work. The interfaces have grown new affordances. And yet the fundamental shape of the interaction hasn't changed very much. The engineer is still the orchestrator, asking the tool to perform every step, granting every permission, and reminding the tool of context the tool ought to have remembered on its own.

A whole sub-industry has grown up around helping the tools do the right thing the first time — custom agents, hooks, prompt libraries, role definitions, project context files, MCP servers, on and on. These help. They don't address the fundamental shape of the problem, which is that the tools should be capable of running a process largely on their own, with clear, well-defined points where the engineer's judgment is needed — and only at those points does the engineer have to step in. Until that shape changes, the engineer remains the bottleneck for everything that happens around the coding step.

That's not a small cost. Research on interrupted work and context switching is consistent and long-standing: it takes around 23 minutes to fully regain deep focus after an interruption (Mark et al., CHI 2008), and the workflow most engineers have with AI tools today is essentially a context-switch generator. Recent measurement work from METR has shown experienced developers running roughly 19% slower at real work in some controlled conditions, in part because of the cognitive overhead of constant prompting and review (Augment Code summary). The "AI fatigue" conversation that emerged in 2025 and 2026 is the engineer's-eye view of the same phenomenon (Cerbos; ZEN Software).
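A back-of-the-envelope sketch of what the 23-minute figure implies, with loud assumptions: every interruption costs the full recovery time, recoveries never overlap, and the loss is capped at an eight-hour day. Those assumptions overstate the cost of clustered interruptions, so read the result as an upper bound. The two interruption counts are hypothetical: a tool that needs the engineer at a dozen points a day versus a workflow that asks only at three defined checkpoints.

```python
RECOVERY_MINUTES = 23     # Mark et al., CHI 2008, as cited above
WORKDAY_MINUTES = 8 * 60  # assumed eight-hour day


def focus_minutes_lost(interruptions: int) -> int:
    """Upper-bound estimate: each interruption costs a full, non-overlapping recovery."""
    return min(interruptions * RECOVERY_MINUTES, WORKDAY_MINUTES)


prompt_by_prompt = focus_minutes_lost(12)  # hypothetical: engineer pulled in at every step
checkpointed = focus_minutes_lost(3)       # hypothetical: only at defined judgment points
print(prompt_by_prompt, checkpointed)
```

Under these assumptions that is roughly 4.6 hours versus just over an hour of potential focus lost in the same day.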

Why single-session tooling hits a ceiling

The session is the wrong unit of work. If all we cared about was the code, a session would be fine: start a session, tell the agent to code something up, end the session. We have the code, and all is good. But we care about more than just the code. We care about designs, architectures, tests, QA processes, security and performance reviews, and on and on.

A work item — the thing that actually gets shipped — persists across many sessions, many agents, many repos, and many days. If the tool's unit is the session, the unwritten assumption is that humans will glue the sessions together into something coherent. They do, and that gluing is where the time goes.

This isn't just an industry observation; it has academic backing. Researchers at MIT Sloan and Microsoft argue in Chaining Tasks, Redefining Work: A Theory of AI Automation that AI's biggest impact comes from reshaping entire workflows — how tasks are sequenced, grouped, and handed off — rather than from speeding up any single task in isolation. Their concept of "task chaining" — clustering AI-friendly steps so AI executes them as a continuous sequence — is exactly the gap that session-bound tooling can't close on its own. They also point out that every handoff between AI and human carries coordination cost: review, validation, adjustment. End-to-end workflows minimize those handoffs; task-level workflows accumulate them. The session-bound coding assistant is structurally a handoff machine (MIT Sloan: How AI is reshaping workflows and redefining jobs).

Amdahl's Law is the right rhetorical anchor here. If the part of the job you're speeding up is 20% of the total, the ceiling on your overall speedup is low no matter how fast you make that part. Even a 10× improvement on the coding step lifts whole-job throughput by only about 1.2× when coding was 20% of the job to begin with. The published data on developer time allocation has been consistently in that range for years. The math is not friendly to "make the coding step faster and call it a day."
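Amdahl's Law in a few lines of Python, using the numbers from the paragraph above. If a fraction p of the work is made s times faster, the overall speedup is 1 / ((1 - p) + p/s), and no matter how large s gets it can never exceed 1 / (1 - p).

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work becomes s times faster."""
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_ceiling(p: float) -> float:
    """Limit of amdahl_speedup as s grows without bound: only the untouched work remains."""
    return 1.0 / (1.0 - p)


coding_share = 0.2  # coding as 20% of the job, per the paragraph above
for s in (2, 10, 100):
    print(f"{s:>3}x faster coding -> {amdahl_speedup(coding_share, s):.2f}x overall")
print(f"ceiling -> {amdahl_ceiling(coding_share):.2f}x")
```

Even a 100x faster coding step tops out just under 1.25x overall.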

Practices, structure, and the SDLC as the differentiator

There's one observation that keeps recurring across the team-level studies: the teams that genuinely benefit from AI tools tend to share a cluster of practices. Fast feedback loops. Clear testing standards. Documentation discipline. Shared conventions for how agents are prompted, what context they're given, and what they're expected to return. Architectural ownership. Small, well-bounded work items.

That cluster of practices is what an SDLC actually is, whether or not anyone wrote it down. The teams that have one, explicit or implicit, are the ones absorbing AI tools well. The teams that don't are the ones that struggle. And, once again: if we simply take our existing practices and try to shoehorn AI coding practices into them, they will not fit. It is the square-peg, round-hole problem.

What changes if we treat the whole lifecycle as the unit

If the work item, not the session, is the unit of work — and if the tooling supports the entire lifecycle, not just the implementation phase — several things shift at once. Coordination becomes a first-class concept rather than a human chore. Artifacts become durable across phases rather than ephemeral within a session. Review gates become part of the workflow rather than a separate meeting. Costs become attributable. Roles beyond the developer get genuine support.

That is the frame shift this series argues for. The productivity gains from better autocomplete are largely tapped. The next order of magnitude is in orchestration across phases, not generation within one — and not just generation for one role out of many.

Coming next

In Why AI Tools Make Some Teams Slower, the team-level data point this article kept hinting at gets the spotlight. DORA's 2024 State of DevOps report found a paradox: AI adoption increased individual productivity but was associated with declines in delivery throughput and stability. The teams losing on that trade are losing for structural reasons, not because the tools are bad — and naming those structural reasons is the setup for everything that comes after.

Sources

Beyond the Coding Assistant: A Series on AI-Assisted Software Engineering

Brian Zimbelman — Mon, 11 May 2026 10:58:30 +0000

This is the first article of Beyond the Coding Assistant, a multi-part series on AI-assisted software engineering at enterprise scale. The full series is available free of any paywall at https://articles.zimetic.com. Coming next: Article 1 — Your AI Coding Assistant Is Not Enough.

The last few years of AI-assisted development have been remarkable. Coding assistants have crossed real quality bars. Engineers can now produce working code, in unfamiliar languages, against unfamiliar systems, at speeds that would have looked like science fiction in 2022. There are real productivity gains, real new affordances, and a real shift in what an individual developer can do in an afternoon.

And yet — when the conversation turns to the team and the organization — the picture is more complicated. The dramatic gains many leaders were promised haven't shown up on every team. Some teams ship more. Some teams ship the same. Some teams have actually gotten slower, with the AI helping at the keystroke while the wider delivery metrics regress.

That gap, between what's possible at the keystroke and what's actually showing up in delivery, is what this series is about. The question I want to ask, and try to answer over the next several articles, is simple: what has changed, and what changes could take us much further than current AI coding assistants already have?

A state-of-affairs question, not a tooling complaint

It would be easy to frame this series as a critique of current tools. That would also be wrong. The current generation of coding assistants is genuinely excellent at what it does. The problem isn't that the tools are bad. The problem is that the tools are designed for a single role on a much larger team, doing a small part of a very large multi-step process.

Software is shipped by teams: developers, product managers, designers, QA engineers, technical writers, program managers, security reviewers, compliance reviewers, DevOps engineers, and more! Most of those people either don't write code at all or don't spend most of their time writing it. If the goal is team throughput rather than individual keystroke speed, optimizing one role's tooling is only going to get you so far. Instead, this series looks at how we build tools, and restructure the team and its workflows, for the dynamics that are here today and will be here in the near future.

There's also a structural part of the story that's only now becoming visible. Token economics are shifting. The "burn-through-it" approach that worked when tokens were essentially free is getting expensive. The teams that have built disciplined development practices around AI tools are pulling away from the teams that haven't. None of this is anyone's fault, exactly. It's the natural moment in a maturing technology when the next set of questions starts to bite.

Where the series goes

I'll work through this in four parts.

Part I — The Shifting Landscape. Why now. The crafting of code is the visible tip of the iceberg; I'd like to address both the tip and the rest of the iceberg, much of which isn't touched by current AI tools. Recent studies have shown that many teams are actually getting slower with AI, so we will talk about that and why it's happening. We'll also look at the economic changes that are forcing the question of how to make all of this affordable. Finally, we'll introduce the quality-speed-cost trilemma that frames everything that follows.

Part II — Reframing the Problem. If teams are getting slower with our current SDLC and processes, are those the right processes going forward? One of the principles behind the Agile Manifesto is that teams regularly reflect on how to become more effective and adjust accordingly. So we owe it to ourselves to evaluate whether the processes we built around a much longer coding step are still the right ones today. We'll also look at better ways to use our coding agents as we start having them take on more and more of the SDLC. One of these changes is genuinely implementing multi-pass workflows. We will also discuss why different kinds of work need different workflow shapes, and the benefits of specialized agents over generic workers. Finally, we will discuss how organizations may want to manage compute and AI resources as pools that engineers can draw on in a much more economical and controlled way.

Part III — Design Principles. The coming economic changes are going to make cost a first-class constraint. They will also require us to manage a project's context differently than we have in the past, so that we get the best performance and cost-effectiveness out of our agents. Managing shared resources gets more complicated once pools of agents are updating things in parallel. And of course, we need to do all of this without overburdening engineers, and in a way that brings the entire team along meaningfully.

Part IV — The Road Ahead. I will share some of what I'm building and why I think it will move us forward. But let me reassure you: this series is not a sales pitch for that tool. It is my way of sharing my thoughts on the state of the industry, and on what I think is happening, as we make this monumental transition. Feel free to skip Part IV if you are not interested in my tool, but please don't skip the discussion of how our industry is changing.

Each piece stands alone. Read in order, they build a cumulative argument: the next frontier of AI-assisted development is lifecycle orchestration, not better code generation — and it has to serve the whole team, not just the engineer.

A note on tone, and an invitation

These are my thoughts, based on what I'm seeing and hearing across the industry. They're claims, not conclusions. I have opinions and I'm going to defend them, but the whole point of publishing in public is to sharpen the ideas against readers who disagree. Your feedback is welcome, and desired.

I'm also building a tool that applies these ideas. I'll describe it in Part IV, and I'll keep references to it brief in the meantime. The series is about the ideas first; the tool is one way to test them. My goal is to get you thinking about these ideas and these changes, and to get a dialogue going so we can all learn and grow in a meaningful way.

How to follow along

I will be publishing three articles a week (Monday, Wednesday, and Friday), with the goal of publishing the entire series within four weeks. Feel free to drop by regularly, or sign up for notifications when articles are published. A hosted landing page at https://beyond-the-coding-assistant.ghost.io/ lists all of the articles in this series, including the expected publication dates of upcoming pieces. That's the home for the paywall-free reading copy of the series. Bookmark it if you want to follow along; subscribe if your platform of choice supports it.

Coming next: Your AI Coding Assistant Is Not Enough. The iceberg of non-coding work, why current tools concentrate almost all their value on a fraction of the engineering job, and why "make the developer faster" is a local optimum that's already running out of room.

If you've read this far, thank you. Please join us for the ride.