It is 2026, and many software engineers around the world are realizing that coding agents are capable of generating high-quality outputs. Yet adopting these tools involves trade-offs. Teams vary in how much effort they invest up front in specifying design/requirements versus afterwards in reviewing and testing the AI’s output.
Developing functional software with agents involves a three-stage process:
- Upfront specification. Developers initiate the task by defining the task’s goal, outlining a plan, and furnishing essential context. This might include providing agents with specific rules (e.g., via AGENT.MD, memory banks, and other markdown files that can fit the agent’s context) and detailed instructions, which the agents will use as the foundation for implementation.
- Code generation. Agents utilize the provided context, rules, knowledge banks, and plan to automatically generate the necessary code that fulfills the ultimate objective set by the developer.
- Output revision. In the final stage, the developer is responsible for reviewing the generated code, testing the software to ensure it runs correctly, and verifying that it functions as intended.
We can classify four distinct “modalities” of using coding agents, based on whether Upfront Specification Effort is low or high, and whether Output Revision Effort is low or high.
| | Output Revision Effort: Low | Output Revision Effort: High |
|---|---|---|
| Upfront Specification Effort: Low | Vibe coding. Loosely specified input prompt. The output is only validated functionally. | Guided Prototyping. Loosely specified input prompt. The output is validated functionally, and its implementation details are also reviewed. |
| Upfront Specification Effort: High | Autopilot Coding. Uses coding agents to produce product increments with a high level of trust in the agent’s outputs. The input is well specified in terms of functionality and technical design, but the output is only loosely reviewed because it is considered to be of good quality. | Agentic Engineering. Uses agents to generate code following a software engineering process. Every line, or the majority of lines, of generated code is reviewed. There is a testing strategy in place that ensures the software fulfills its intended use. |
Below, I describe each modality, its pros and cons, and recommendations for when to use it.
Vibe Coding
Vibe coding (term introduced by Andrej Karpathy) refers to using an AI coding assistant with minimal upfront specification and minimal code review. You “follow your vibes” by providing a loosely specified natural-language prompt for the feature or program you want, letting the AI generate the code, and then running it to see if it works. Crucially, you do not meticulously review the code; you validate it only by testing its functionality (does the app run and do what you asked?).
In Karpathy’s words, vibe coding means “fully give in to the vibes, embrace exponentials, and forget that the code even exists”.
The human acts more as a product manager or tester, focusing on describing goals and trying the software, rather than reading or structuring the code.
Pros
- Enables non-programmers to create software. Even people with little coding experience can produce working applications by describing what they want in plain English. For example, an accountant or designer could use tools like Replit’s Ghostwriter or Cursor’s natural-language interface to build simple apps, whereas before they might be limited to Excel macros. In short, vibe coding democratizes software creation by making English “the hottest new programming language”.
- Encourages rapid experimentation. Because you’re not spending time on detailed specs or boilerplate coding, you can quickly try out new feature ideas or product prototypes. Product managers and UX designers can spin up proofs-of-concept with AI to demonstrate an idea in code rather than writing a PRD or drawing a Figma mockup (see here). This fast, “just try it” approach aligns with the idea of optionality, which involves exploring multiple solutions in parallel since the cost to attempt each is low.
- Lowest upfront cost and development time. For a hobby project, hackathon, or pre-seed startup, vibe coding can deliver a minimal viable product extremely quickly and cheaply. Entire weekend projects or MVPs can be built in days or hours rather than weeks. This acceleration has already been observed in practice (e.g., Y Combinator’s CEO noted that in their Winter 2025 batch, 25% of startups had 95% of code generated by AI, heralding that “the age of vibe coding is here”).
Cons
- No control over code quality or architecture. Since you rely on the AI’s outputs without deep inspection, the codebase can be inconsistent or poorly structured, accumulating significant technical debt. One developer’s 27-day AI-coding experiment found that as the project grew, similar functions ended up implemented differently across the codebase, and components lacked awareness of each other. This kind of hidden complexity becomes costly later.
- Higher risk of bugs and security issues. Lack of code review means vulnerabilities can slip through. There have been incidents of AI-generated code doing dangerous things unexpectedly. For example, one user’s AI coding assistant deleted an entire database despite explicit instructions not to. Other vibe-coding founders have been victims of their own unsecured creations, leading to “maxed out usage on api keys, people bypassing the subscription, creating random shit on db”.
- “Works now, breaks later” maintenance challenges. Because the human doesn’t fully understand the generated code, future modifications or debugging become daunting. As one engineer quipped, many vibe-coded apps hit a “maintenance wall” – everything looks great in a demo, but when a bug arises or a new feature is needed, the AI’s fixes often introduce new problems, since the AI has no memory of why it made certain architectural decisions. Without an engineer who groks the code, each fix can become a frustrating game of whack-a-mole.
Recommendation
Vibe coding can be recommended for non-developers or novice coders who want to build small, non-critical applications. It allows people with domain knowledge (but not coding skills) to automate tasks or create tools they otherwise couldn’t, for instance, a finance analyst “vibe-coding” a custom report generator.
In industry, some product and design teams use this modality to prototype ideas and create working demos to hand off to engineering, instead of just static specs. It’s also useful for early-stage startups or solo developers trying to validate a product idea quickly on a shoestring budget.
In these scenarios, lower code quality is acceptable because if the idea proves valuable, the team can rewrite or heavily refactor the code later (and doing so is now cheaper with AI assistance anyway). However, teams should avoid shipping vibe-coded prototypes to production in any long-lived or critical system. As one GitLab principal engineer put it, “No vibe coding while I’m on call!”.
Use vibe coding to explore and validate ideas, but plan to invest in a proper engineering pass if the project needs to be maintained.
Guided Prototyping
Developers or even non-technical people run a PoC to see how the AI would implement a feature, and the output is reviewed to check whether the implementation makes sense and is suitable to build on. In their book on Vibe Coding, Gene Kim and Steve Yegge call this modality “tracer bullet testing”, which involves “... creating a thin, working slice of a system that touches all critical components—UI, backend, database, and APIs—to prove the architecture works, rather than just prototyping isolated parts”.
In Guided Prototyping, the developer still provides a relatively loose natural-language prompt or goal to the coding agent (similar to vibe coding), but with an important twist: the developer thoroughly reviews the implementation details of the AI’s output and verifies that the approach makes sense.
Essentially, the AI quickly produces a proof-of-concept implementation, and a human then inspects that code (and likely runs it) to evaluate its correctness, quality, and suitability. In software, a tracer bullet is a thin, working end-to-end slice of the system that touches all critical components (UI, backend, database, APIs) but only implements the bare basics: a skeletal but functional version of the feature that shows the architecture will work.
Guided prototyping uses AI to fire off one or more of these tracer bullets: the AI writes a minimal implementation across the stack, and the developer examines this tracer code to see if it’s on the right track.
By reviewing and possibly refining the AI’s code, the engineer ensures the prototype’s implementation (not just functionality) meets their expectations. This might involve checking that coding best practices were followed, that the approach aligns with the intended architecture, or that the code is extensible. Essentially, guided prototyping uses the AI as a rapid coder to generate prototypes, but keeps a human in the loop for technical validation (as opposed to vibe coding’s pure “just run it” approach).
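To make the tracer-bullet idea concrete, here is a minimal, hypothetical sketch (the names and structure are illustrative assumptions, not taken from Kim and Yegge’s book): a single “API” handler that calls a backend function and writes to a real database table, with just enough logic to prove that the layers connect end to end.

```python
import json
import sqlite3

# "Database" layer: a real table, but only the columns the slice needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")

def create_order(item: str, qty: int) -> int:
    """Backend layer: persists an order and returns its id."""
    cur = conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
    conn.commit()
    return cur.lastrowid

def handle_request(body: str) -> dict:
    """'API' layer: parses a request body, calls the backend, returns a response."""
    payload = json.loads(body)
    order_id = create_order(payload["item"], payload["qty"])
    return {"status": "created", "order_id": order_id}

if __name__ == "__main__":
    # End-to-end check: one request flows through API -> backend -> database.
    response = handle_request('{"item": "coffee", "qty": 2}')
    assert response["status"] == "created"
    print(response)
```

A reviewer reading such a slice can judge the layering, the data model, and the conventions long before the full feature exists.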
Pros
- Fast feedback on architecture & feasibility. Because tracer-bullet prototypes are working end-to-end, stakeholders get immediate visibility into a feature’s implementation. You can validate early whether the chosen tech stack, APIs, and integrations will actually support the requirements. This reduces risk by exposing architectural or interoperability problems earlier rather than after weeks of development.
- Mitigates risk through early testing. Instead of building parts in isolation, the thin vertical slice will reveal if any critical component (database, external API, etc.) is going to be a blocker. This “system smoke test” catches high-risk issues when they’re easier (and cheaper) to fix. In other words, you prove the path works before investing heavily.
- Team alignment and technical calibration. Having a working end-to-end prototype ensures that frontend and backend developers, QA, etc., share a common understanding of how the pieces connect. It’s a concrete reference implementation to discuss. Everyone sees the same thin slice, which promotes synchronization on design decisions early (avoiding big surprises later).
- Rapid course-correction and learning. If the tracer implementation is off-target, you find out quickly and can adjust the design or requirements with minimal wasted effort. The team can iterate on the prototype or try an alternate approach. In fact, with AI generation being so fast, you could have multiple prototype implementations of a complex feature created in parallel, exploring different approaches. This is an extension of the “optionality” benefit of AI – you might let the agent build two or three variant solutions and then pick the best one after review.
- Tracer code often becomes the foundation. Unlike throwaway prototypes, tracer bullet code is meant to evolve into the final product. So the effort isn’t wasted; the initial AI-guided slice provides a base to gradually flesh out with full functionality, with confidence that the foundations are solid.
Cons
- Can create a false sense of completeness. Since it touches all components, the “thin slice” might be mistaken for a production-ready feature.
- Requires high alignment on architecture first. If the underlying architecture is flawed, the tracer bullet will expose it, but potentially after significant effort has been invested.
- Overhead for non-trivial systems. Setting up a fully functional end-to-end slice, even a thin one, can be time-consuming for very complex systems with many dependencies.
- Focus on breadth over depth. The implementation might lack the robustness, error handling, and performance tuning required for production code.
Recommendation
I would recommend using Guided Prototyping when the team faces uncertainty in the implementation and you want to de-risk those unknowns early.
It’s essentially the AI-era equivalent of a technical spike or proof-of-concept, but more formalized and kept runnable. Teams can leverage coding agents to spin up multiple alternative tracer bullet implementations in parallel, then compare which approach is best.
This modality is valuable for projects where the architecture is not proven, or there’s debate about the best design: you can test the waters with minimal cost. Afterward, the knowledge gained should inform the real implementation.
In practice, organizations that already embrace Agile “spikes” or proofs-of-concept will find guided prototyping with AI a natural fit. It’s a way to harness AI’s speed to get concrete answers early in the development cycle, improving decision-making and reducing costly late-stage changes.
Autopilot Coding
Autopilot Coding is a modality where the team (or often a solo developer) provides a well-specified prompt and context to the coding agent: the desired functionality and even the high-level technical design are clearly described, and the agent is then left to generate a substantial chunk of the codebase with minimal human intervention or review.
In other words, the developers put the coding agent on “autopilot,” trusting it to produce production-quality code from a solid spec, and they do only a cursory review (if any) of the output. This approach has emerged in some cutting-edge small teams and individual projects, and it’s being pushed to extremes in research experiments.
In practice, autopilot coding might look like “outsourcing the development” to the agent by providing a detailed functional spec (and perhaps a software design document) and asking it to implement the whole thing (or a large portion of it) while you monitor progress occasionally.
It assumes high confidence in the AI’s capabilities and output quality. In fact, the team behind Cursor (an AI-augmented IDE) recently did an experiment very much like this: they ran hundreds of AI agents for a week building a browser from scratch, which resulted in over 1 million lines of code across 1,000 files. Simon Willison has also written about the Cursor experiment.
I believe this approach has a future as long as the software to be built is very well specified and can be verified in some way by the agent. If you want to build a web browser, your agents can implement multiple specifications for HTML, CSS, and JavaScript.
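One way to make the output verifiable is to encode the specification as an executable test suite before any code is generated. The sketch below is a hypothetical example (the module and function names, such as my_browser.text and escape_html, are assumptions): a few spec-derived pytest cases that the agent’s implementation would have to pass before it is accepted.

```python
# spec_tests.py -- hypothetical spec-derived tests the agent's output must satisfy.
# The agent is asked to implement escape_html(); these tests define "done".
from my_browser.text import escape_html  # assumed module/function name

def test_escapes_angle_brackets():
    assert escape_html("<b>hi</b>") == "&lt;b&gt;hi&lt;/b&gt;"

def test_escapes_ampersand():
    # '&' must be escaped, without double-escaping already-produced entities.
    assert escape_html("a & b") == "a &amp; b"

def test_plain_text_unchanged():
    assert escape_html("plain text") == "plain text"
```

The tests, rather than the prose prompt alone, become the acceptance criterion the agent can iterate against.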
This modality is unorthodox, but it’s attractive to some because it promises unprecedented development speed. If one agent can write code 10× faster than a human, what about 10 agents working in parallel? Autopilot coding is essentially the “move fast” approach: spec well, then let the AI rip, and only later worry about fixes.
Pros
- Potential order-of-magnitude productivity gains. Advocates claim that with the right prompts and context, AI agents can generate features or even entire products extremely quickly. There are anecdotal reports of individual developers becoming “100×” or even “1,000×” engineers by using AI coding assistants end-to-end.
- Scales beyond a single agent (or human). Autopilot coding can involve multiple AI agents working in parallel, further boosting throughput. Tools like Conductor might facilitate this. Linear has also started offering a modality for running agents as independent coworkers.
- Extreme velocity for well-defined problems. If the problem is very well-specified (e.g., implementing a known standard or algorithm), coding agents can churn out a correct solution quickly. For self-contained, verifiable (there is an existing test suite that can validate behavior), specification-driven projects, autopilot coding essentially lets you “skip ahead” in time.
Cons
- Unproven and high-risk. This methodology is very new and largely experimental. There are few real-world success stories of an AI-developed codebase at scale that didn’t require major human fixes. As the 2025 DORA research noted, AI tends to amplify whatever setup you have: it can boost high-performing teams but also magnify dysfunctions in struggling teams.
- Uncertainty, lack of control, and security issues. When you trust the agent output without thorough review, you inherently accept a lot of uncertainty. The code may run initially, but hidden bugs or suboptimal choices can lurk beneath the surface. Security vulnerabilities, inefficient algorithms, or simply incorrect edge-case handling might only surface later (possibly in production). Indeed, data from early adopters shows that teams using a lot of AI without adapting their process have seen bug rates increase (by ~9%) and longer review times, with no overall improvement in delivery speed. These teams were able to speed up coding, but just created a bottleneck elsewhere or quality issues that neutralized the gains.
- Accumulated technical and comprehension debt due to “black box” code. A codebase produced largely by AI, without human oversight, can become difficult for developers to understand or maintain. This is a new kind of tech debt: the AI’s design might be suboptimal, but changing it later is extremely costly when no one holds a mental model of the code. Some practitioners have started calling this effect “comprehension debt”.
- Loss of competitiveness. If an organization’s competitive advantage is its software, treating the code as a mysterious artifact created by AI (rather than something the team deeply understands) is dangerous. The company’s intellectual property isn’t just the final code output, but the knowledge of why it’s built that way. By letting AI write everything, you risk turning your own codebase into a foreign legacy system from day one.
- Bugs and incidents can be harder to resolve. In autopilot mode, you might encounter the nightmare scenario described earlier in vibe coding, but at a larger scale: something breaks in production and now no engineer is intimately familiar with that part of the system. Debugging is much harder when you have to ask an AI to explain code it wrote (and that AI might not even have the full context anymore).
- Requires exceptionally clear specifications. To even attempt autopilot coding, you must provide very detailed, precise requirements and technical guidelines to the agent(s). Some software components we use every day are heavily documented, and they must also comply with strict specifications. In those cases, this approach might work. For the rest of the software landscape, building such detailed upfront specifications has been proven problematic (remember the problems associated with waterfall processes?).
Recommendation
I think this modality is best suited for developers embarking on greenfield projects, where they are building something entirely new and unconstrained by legacy code. It should also be viable for developers tackling small-scale brownfield projects that they are intimately familiar with, where the existing codebase is manageable and well-understood.
However, a critical caveat must be applied: this approach is not recommended for mission-critical projects. Relying heavily on agents in such high-stakes environments significantly increases the risk of unforeseen bugs, complex technical debt, and system failures, the kind that might trigger your pager at the most inconvenient hour, such as 3 AM. The cost of a failure in a mission-critical system far outweighs the productivity gains.
For solo developers and small teams, this agent-centric coding methodology can deliver good initial outcomes and accelerate the early development phase of a project. The rapid generation of functional code provides strong starting momentum. Nevertheless, this advantage is often counterbalanced by substantial risks as the project inevitably grows in size and complexity. The primary danger lies in the unreviewed technical debt and structural inconsistencies that agents can inadvertently introduce. This unreviewed code creeps across the codebase, acting like a slow poison.
This process significantly increases the code entropy—the measure of disorder and degradation in a codebase. High code entropy has a dual negative effect:
- Human Comprehension and Maintainability: It drastically affects a human developer's ability to understand, navigate, and therefore maintain the code. Complex, poorly structured, or inconsistent code slows down debugging and feature development.
- Agent Performance: Ironically, the agents themselves begin to suffer. As the codebase becomes more chaotic and less coherent, the coding agent's own ability to accurately understand the existing context and generate correct, high-quality new code diminishes. The agent is effectively generating code based on a deteriorating foundation.
Furthermore, if you are engaged in building something novel and innovative, the resulting code constitutes your Intellectual Property (IP) and is a core business asset.
In this scenario, you must maintain a deep, granular understanding of how the code works, including all the architectural decisions and underlying trade-offs. This deep knowledge is essential for strategic reasons: it allows you to respond quickly and effectively to market feedback, competitive pressures, and unexpected technical challenges.
By outsourcing this foundational knowledge and control to coding agents, the business inherently places itself at significant risk. The loss of direct, human-expert IP ownership can compromise the ability to innovate and adapt, fundamentally putting the future of the product and the business itself in jeopardy.
Agentic Engineering
In this modality, teams have adopted not only coding agents but AI throughout the whole SDLC. They invest in context engineering, so when a coding agent is asked to create a plan, it has access to internal documents and specifications that guide the generated implementation toward the organization’s best practices: coding rules, documented architecture, documented design patterns, and other conventions are available for the agents to use.
When the developer generates a plan, it is usually very detailed in terms of functional and non-functional requirements. The chunk of work specified in the plan must be kept as small as possible so that the generated implementation remains reviewable by a human (as the 2025 DORA report shows). Teams also leverage the coding agent to review the code, or implement dedicated reviewer agents; once the PR is created, human reviewers give the final approval.
For this approach to work and be production-ready, teams implement several guardrails: rigorous test automation, high testing coverage, TDD, CI/CD, and linting rules, among other practices. Another recommendation from the DORA report is Value Stream Management (VSM), the practice of visualizing, analyzing, and improving the flow of work from idea to customer. VSM should help organizations track how AI affects lead time, rework, and deployment frequency (more here).
Essentially, it’s traditional software engineering supercharged with AI, rather than letting AI run wild.
Key characteristics of agentic engineering include:
- Extensive context and prompt engineering. Teams invest heavily in providing the AI with the right context: up-to-date internal documentation, architectural guidelines, coding standards, and even organization-specific knowledge bases. The agent is not coding in a vacuum; it’s informed by the company’s best practices and the project’s design specs.
- Small, incremental tasks. Rather than asking the agent to build an entire feature in one go, work is broken into small chunks (perhaps a few lines to a few dozen lines of code) that fit within the AI’s attention span and can be reviewed easily. The 2025 DORA report found that working in small batches is still crucial even with AI. Teams that maintained incremental change discipline reaped more benefits, whereas AI tended to increase PR sizes by 154% when unmanaged.
- AI-assisted code review and testing. In agentic engineering, AI doesn’t just write code but also helps review it. For instance, after an AI generates code, a separate AI agent (or the same agent with a different prompt) might statically analyze that code, suggest improvements, or point out potential bugs. The human developer then reviews both the code and the AI’s review comments (a minimal reviewer-agent sketch appears after this list). Lately, there has been a lot of innovation in this space.
- Human oversight and final approval. Unlike autopilot coding, here every line of code is ultimately reviewed by a human (or at least the vast majority of lines, with trivial changes possibly an exception). This ensures that knowledge of the code is internalized by the team and nothing unintelligible slips in. As Kim and Yegge’s book emphasizes, “delegation of implementation doesn’t mean delegation of responsibility”.
- Comprehensive guardrails in the SDLC. Agentic engineering often requires that the organization already has strong engineering practices in place: continuous integration/deployment, high test coverage (often >90%), linting and static analysis, security scans, etc. These guardrails catch mistakes, whether made by humans or AI (a sketch of such a gate appears after this list). In fact, DORA’s 2025 research found that AI acts as an amplifier: good practices yield even better results with AI, and poor practices just get amplified into bigger problems.
- Value Stream Management (VSM) and metrics. Because of AI’s propensity to shift bottlenecks, top teams use VSM to track the flow from idea to production. The DORA 2025 report specifically highlights VSM as critical to turning individual AI productivity into organizational performance.
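To illustrate the reviewer-agent idea mentioned above, here is a minimal, hypothetical sketch (not tied to any specific product): it collects the pending changes with git and hands them to a model behind a placeholder call_model() function, which you would wire to your own LLM provider or agent framework, and returns findings for the human reviewer to read alongside the code.

```python
import subprocess

REVIEW_PROMPT = """You are a code reviewer. For the diff below, list potential bugs,
security issues, and deviations from our coding standards. Answer as a bullet list."""

def get_diff(base: str = "main") -> str:
    # Collect the changes being proposed against the base branch.
    return subprocess.run(
        ["git", "diff", base], capture_output=True, text=True, check=True
    ).stdout

def call_model(prompt: str) -> str:
    # Placeholder: plug in whatever LLM provider or agent framework you use.
    raise NotImplementedError("wire this to your model client")

def review_changes(base: str = "main") -> str:
    diff = get_diff(base)
    if not diff.strip():
        return "No changes to review."
    return call_model(f"{REVIEW_PROMPT}\n\n{diff}")

if __name__ == "__main__":
    print(review_changes())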
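As a companion sketch for the guardrails point, the script below chains a test run, a coverage floor, and a linter into a single gate that any change, human- or AI-authored, must pass before review. The 90% threshold and the choice of coverage.py and Ruff are assumptions; substitute your own toolchain.

```python
import subprocess
import sys

def run(cmd: list[str]) -> int:
    # Echo and execute a guardrail step, returning its exit code.
    print("+", " ".join(cmd))
    return subprocess.run(cmd).returncode

def main() -> None:
    # Run the test suite under coverage measurement.
    if run(["coverage", "run", "-m", "pytest", "-q"]) != 0:
        sys.exit("Tests failed: rejecting the change.")
    # Enforce a coverage floor (90% assumed here).
    if run(["coverage", "report", "--fail-under=90"]) != 0:
        sys.exit("Coverage below threshold: rejecting the change.")
    # Run a linter as an additional guardrail.
    if run(["ruff", "check", "."]) != 0:
        sys.exit("Lint findings: rejecting the change.")
    print("All guardrails passed.")

if __name__ == "__main__":
    main()
```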
Pros
- Significant productivity boost without sacrificing quality. Organizations practicing this modality report substantial improvements in throughput while maintaining or even improving code quality. For instance, Booking.com found that after introducing specialized AI coding agents (within a robust dev process), they achieved a ~30% productivity gain, lighter code reviews, and faster deliveries. Unlike vibe coding, where speed comes at the cost of quality, agentic engineering strives for both speed and quality by catching issues early and often.
- Human oversight ensures maintainability and shared knowledge. By having developers review AI contributions, the team remains in control of the architecture and understands the codebase. This avoids the “black box code” problem and ensures knowledge isn’t lost to the AI.
- Faster iteration from plan to working code. When AI can handle the boilerplate and rote coding tasks, developers spend more time on higher-level design and polishing.
- Focus on higher-value work for humans. Since the AI is churning out the basic code, human developers can focus on what humans do best: making judgment calls on architecture, tackling particularly tricky algorithmic or edge-case problems, and handling creative design tasks.
- Robust, high-quality codebase. With strong guardrails (CI, tests, linting) and careful review in place, the final code that reaches production can actually be more robust than before, because the team can afford to enforce stricter quality standards.
Cons
- Requires process maturity and upfront investment. Not every organization is ready to implement this modality. If your CI/CD is flaky, tests are lacking, or your documentation is poor, trying to add AI into the mix can backfire. DORA’s research indicates that only the most mature teams currently see strong benefits from AI, whereas many teams see little to no improvement because their bottlenecks simply move elsewhere.
- Slower than pure AI in the short run. By insisting on human review and small batches, you naturally throttle the AI’s raw speed. For startups in very early stages, this overhead might feel stifling.
- Needs high-quality prompts and context engineering. The effectiveness of the coding agents depends heavily on the quality of the input they’re given. A poorly guided AI can waste time, requiring multiple redos, negating the efficiency gains.
- Maintaining developer skills and engagement. If the AI is handling a lot of the routine coding, developers might lose practice in those areas. Some have expressed concern that over-reliance on AI for low-level coding could, over time, erode engineers’ ability to code without AI or to dive deep into debugging complex issues. Additionally, there’s a cultural shift: developers have to embrace a more meta-role (like a conductor) which not everyone may enjoy or excel at. There can be initial resistance (“Am I just here to babysit the AI?”). Engineering leaders need to manage this change and ensure team members still feel ownership and pride in the work.
- New failure modes and complexity. Introducing AI into the workflow creates new ways things can go wrong, like introducing a security hole that all reviewers miss because it looks fine, or the AI’s suggested refactor might break something non-obvious. Debugging an issue in code that was partially AI-generated might be tricky if the code is written in an unfamiliar style. Moreover, coordinating multiple AI agents (for coding, reviewing, etc.) is itself a complexity; it’s like adding new team members who work blazingly fast but need training. The processes to manage AI output (like deciding when to trust it vs. override it) are still evolving.
Recommendation
Agentic engineering is emerging as a suitable modality for professional software teams that need to balance velocity with reliability. This is especially true for enterprise environments, organizations building long-term proprietary systems, and any product where understanding and maintaining the code is critical.
If your software is core IP for your business (your competitive advantage), you can’t afford to treat it as throwaway; you need to deeply understand it, which means humans in the loop.
This approach is well-suited for teams in regulated industries or mission-critical domains (finance, healthcare, etc.) where there’s zero tolerance for unchecked code. It’s also appropriate for large codebases and legacy systems: places where uncontrolled AI codegen could wreak havoc, but where guided AI use could significantly improve developer productivity (for example, using an agent to safely refactor a legacy module under close tests and review).
To implement this modality, organizations should lay the groundwork: invest in automated testing (high coverage), continuous delivery, and developer tooling. They should also train the team in how to work with AI (the “head chef mindset” of orchestrating AI assistants). The 2025 DORA report suggests adopting “clear AI policies” and providing training and playbooks for AI usage. Everyone on the team should know how and when to use the coding agent, what the review protocols are, and what the acceptance criteria for AI-generated code are. Essentially, treat the AI as an ultra-fast junior developer who needs mentorship and strict code review.
Conclusion
Coding agents are a reality in 2026, and teams are widely adopting them. This article defines four distinct modalities for adopting these agents, each with its own pros, cons, and recommended use case.
It's important to note that a single team can adopt multiple modalities across different phases of their Software Development Life Cycle (SDLC). For instance, teams using Agentic Engineering might also employ:
- Vibe coding for developing small utilities or dashboards.
- Guided Prototypes to evaluate the trade-offs of a feature before starting analysis.
- Autopilot Coding for running controlled migrations in specific cases.
The greatest value from this technology will be realized by teams that effectively integrate it to solve business problems, all while carefully managing the long-term benefits and potential challenges.