Vincent Burckhardt

Posted on • Originally published at bitsofrandomness.com

The shift toward agentic development

Some thoughts on where software engineering is heading

Over the past two years, software development has changed in ways that feel significant. These are patterns I'm noticing both in my own work and across the industry.

I've been using AI coding tools in personal projects for over two years. The evolution has been clear. It started with copying code from ChatGPT and pasting it into an IDE. Then came tab completion with Cursor and GitHub Copilot, which was helpful but not transformative. The real shift happened when Cursor introduced agentic capabilities, before Copilot had similar features. At work, I more recently got access to IBM Bob, which resembles Cursor 1.x and GitHub Copilot in approach. Most recently, Claude Code, with its predominantly agentic workflow, has reinforced what seems to be the direction things are heading. Cursor 2.0's release in late 2025 appears to confirm this trend, with the agentic approach as the default and traditional IDE features taking a secondary role.

We're moving beyond the IDE as the center of our work

The most striking change is where time actually gets spent during development. With Cursor 2.0's release in October 2025, it became undeniable: the vast majority of time is now spent in the agentic part of the tool rather than traditional IDE features. There's something ironic about this, given that Cursor's name presumably references the blinking cursor where we type code. That cursor, that act of typing line by line, increasingly feels like it's from a different era.

What I mean by "agentic" is end-to-end generation with supervision rather than autocomplete. This isn't about tab completion suggesting the next line (though Cursor does that too). It's about describing what needs to be built, providing context about the codebase, and then supervising as the AI generates entire features across multiple files. This is a fundamentally different interaction model from copying snippets from ChatGPT two years ago.
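To make that interaction model concrete, here's the kind of instruction an agentic session revolves around. The endpoint, file paths, and limits below are invented for illustration:

```
Add rate limiting to the POST /api/upload endpoint. Reuse the existing Redis
client in src/lib/redis.ts, allow 20 requests per minute per authenticated
user, and return 429 with a Retry-After header when the limit is exceeded.
Extend the integration tests in tests/api/upload.test.ts to cover the new
behaviour.
```

The agent plans the change, edits the relevant files, and runs the tests; the human's job shifts to checking the result against that intent.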

But here's what's important to clarify: this doesn't mean the IDE is dead or that understanding code is no longer necessary. Rather, the center of gravity has shifted. Where development used to mean spending the day writing code with occasional AI assistance, it now means orchestrating AI to write code, with occasional manual intervention. It's a subtle but fundamental difference in how the work feels.

The emergence of specification frameworks

This shift has created a need for better ways to communicate intent to AI systems and to go beyond vibe coding. Spec Kit, which GitHub released in September 2025, illuminates something important: what we're really dealing with is a structured way to create advanced prompts that prepare the AI for the coding phase.

The term "advanced prompts" isn't meant dismissively. Spec Kit implements a four-phase process: Specify, Plan, Tasks, and Implement. What this does, conceptually, is force thinking through the problem at a higher level of abstraction before any code gets written. The tool itself isn't magic; it's essentially a framework for breaking down requirements into pieces that an AI can reliably execute. But that structure matters enormously.
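For orientation, the flow looks roughly like this when driven from a coding agent. This is a sketch based on the commands shipped in Spec Kit's initial release; exact command names and options may differ in current versions:

```
# Bootstrap a project with the Spec Kit templates (Specify CLI)
uvx --from git+https://github.com/github/spec-kit.git specify init my-project

# Then, inside the coding agent, one slash command per phase:
/specify    describe what to build and why, in user-facing terms
/plan       pick the stack and architecture, surface constraints
/tasks      break the plan into small, independently verifiable tasks
/implement  execute the tasks and generate the code
```

Nothing in that sequence writes code until the last step; the first three phases exist purely to remove ambiguity before generation starts.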

What's particularly valuable is that this approach allows making corrections at the specification level, which is much cheaper than making them in the code. If there's ambiguity or a misunderstanding, catching it during the specification phase means you don't waste time generating and then debugging incorrect code. Interestingly, this is another manifestation of shifting left in the development process: catching issues earlier, when they're cheaper to fix.

This addresses something important that often gets glossed over in discussions about AI coding. There's a lot of talk about "vibe coding" or full-app generation, where you describe what you want at a high level and the AI just builds it. That sounds appealing, but in practice, it rarely works well for anything beyond trivial examples. What Spec Kit does is something more subtle and more valuable. It's not trying to enable vibe coding. It's trying to enable structured thinking that then guides the AI through a disciplined implementation process.

The pattern is consistent: when the specification phase gets skipped and you jump straight to asking the AI to build something, the results are almost always disappointing. The AI might generate code that looks reasonable at first glance, but it doesn't quite align with what was actually needed. That's not because the AI is bad at coding; it's because the specification wasn't clear enough, edge cases weren't thought through, or the requirements were ambiguous in ways that only became apparent when seeing the implementation.

Writing clear specifications first, using a framework like Spec Kit, catches ambiguities in thinking before they turn into code that has to be rewritten. This is fundamentally different from just prompting an AI to "build me a user authentication system." The specification phase forces articulation of details like: what should happen when a user tries to log in with an expired session? How should password reset tokens be generated and validated? What's the token lifecycle? These details matter, and thinking them through upfront leads to much better results from the AI.
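To make that concrete, here is the kind of decision such a specification forces, translated into code. This is a minimal illustrative sketch (in-memory storage, invented names and lifetimes), not a recommended implementation:

```python
# Minimal sketch of a reset-token lifecycle a specification would pin down.
# Storage, names, and TTL are illustrative assumptions, not a real API.
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

RESET_TOKEN_TTL = timedelta(minutes=30)        # spec decision: how long a token stays valid
_tokens: dict[str, tuple[str, datetime]] = {}  # token_hash -> (user_id, expires_at)

def issue_reset_token(user_id: str) -> str:
    """Generate a single-use reset token; only its hash is stored server-side."""
    token = secrets.token_urlsafe(32)
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    _tokens[token_hash] = (user_id, datetime.now(timezone.utc) + RESET_TOKEN_TTL)
    return token  # delivered to the user out of band, e.g. by email

def redeem_reset_token(token: str) -> str | None:
    """Return the user_id if the token is valid and unexpired; consume it either way."""
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    entry = _tokens.pop(token_hash, None)  # single use: removed on first redemption attempt
    if entry is None:
        return None  # unknown or already-used token
    user_id, expires_at = entry
    if datetime.now(timezone.utc) > expires_at:
        return None  # expired: the spec says this must fail, not silently renew
    return user_id
```

Every line of that sketch corresponds to a question the specification either answered or left to chance: the token lifetime, single use, hashing at rest, and what happens on expiry.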

GitHub's own explanation for why they built Spec Kit captures this well. They wrote that "we treat coding agents like search engines when we should be treating them more like literal-minded pair programmers." That's exactly right. These tools do what you ask, not what you meant. The specification step is where you figure out if what you're asking for is actually what you mean. It's the difference between treating AI as a magic wand that somehow divines your intent versus treating it as a very capable but very literal collaborator that needs clear instructions.

The workflow is inverting: from coding to planning and review

This brings us to what may be the most significant change happening right now. The workflow of software development is inverting. Instead of spending most time coding with some time spent planning and reviewing, development now means spending most time planning and reviewing with the actual coding handled by AI.

The time breakdown appears to follow a pattern: roughly 40% goes into setting up context and writing specifications, maybe 20% waiting while code gets generated, and then 40% reviewing what was produced and verifying it does what was intended. Other developers describe similar patterns, which suggests this isn't idiosyncratic to any individual workflow.

What strikes me about this distribution is how much it resembles the way senior technical people have always worked. An architect or tech lead typically doesn't write most of the code themselves. They do the planning, set the strategic direction, delegate the implementation to other engineers, and then review the output through pull requests to ensure alignment. The difference now is that the "other engineers" doing the implementation are increasingly AI systems rather than junior human developers.

It's worth clarifying what "review" means in this context. The pull request review process isn't primarily about checking code quality anymore. Tools like linters, formatters, and code coverage systems have handled quality checks for years. The review is about verifying alignment with specifications and customer needs. Does the implementation actually solve the problem we're trying to solve? Are edge cases handled correctly? Does this fit with the broader system architecture? These are the questions that require human judgment.

There's another valuable aspect to reviewing AI-generated code. Spending time examining the implementation often generates additional thoughts and perspectives on the next steps. It's a bit like bouncing ideas off a colleague. You see how the AI interpreted your specification, which might reveal gaps in your thinking or suggest alternative approaches you hadn't considered. This feedback loop, where reviewing code feeds into the next iteration of development, is valuable beyond just catching errors.

The shift to smaller, senior-heavy teams

Instead of a team structure with 3-4 senior engineers coordinating with a larger group of junior engineers (sometimes outsourced) who do the actual coding, it's now possible to have effective teams of just 3-4 senior engineers who do the planning, specification, and review while AI handles the implementation. The coordination overhead drops dramatically because there are fewer handoffs and less need for detailed task delegation to human implementers.

The implications here are sensitive but worth addressing directly. This shift seems likely to impact the people who currently do most of the actual coding: junior engineers and potentially outsourced engineering teams. The pattern in the job market seems to reflect this, although it is still early and these signals should be taken with some caveats. More than half of open roles are now at senior level or above, and companies are increasingly prioritising candidates with AI engineering skills. The emphasis is shifting toward engineers who can architect, specify, and review rather than those who primarily implement.

For those with experience, the change has been largely positive. Iteration happens much faster. The feedback loop is tighter, which means catching problems earlier. Instead of leaving a review comment on a pull request and waiting days for the developer to make changes and submit another version, that next iteration can happen almost immediately. This cuts down significantly on meetings and coordination overhead. But this advantage accrues to those who have the experience to know when generated code is subtly wrong or when a specification was ambiguous. The question of what this means for people trying to enter the field without that foundation is a real concern.

The longer-term implications are harder to reason about. If we're increasingly selecting for senior engineers and reducing opportunities for juniors, where do the senior engineers of 2030 come from? There will always be a need for humans in the loop, if only because someone needs to be accountable. Someone needs to interface with customers, make judgment calls when requirements conflict, and take responsibility for the outcomes. But the path to building that expertise may need to look different than it has historically.

Frontier models: past the big bang phase

The final observation relates to the AI models themselves. Over the past year or so, there haven't been the kind of dramatic breakthroughs in raw intelligence for text generation that characterized the 2022-2023 period.

What's noticeable, both in industry positioning and in practice, is that OpenAI and Anthropic seem to be focusing more on building ecosystems around their existing models rather than racing to release dramatically more intelligent versions. By ecosystem, I mean tools and integrations that make the models more useful: Claude Code as a coding interface, Claude's integration with third-party providers, the emergence of Model Context Protocol (MCP) for better context handling, OpenAI's Atlas browser for web interaction. The focus is on making existing model capabilities more accessible and practical rather than just making the models themselves smarter.

A separate but related trend is the rapid commoditization of model capabilities. Claude Haiku 4.5 came out in October 2025, and Anthropic explicitly stated that it delivers similar levels of coding performance to what Claude Sonnet 4 did five months earlier, but at one-third the cost and more than twice the speed. What was frontier performance half a year ago is now available in a much smaller, faster, cheaper model. This pattern of capabilities trickling down to smaller, more efficient models seems to be accelerating.

This pattern suggests we're in a different phase now. The big bang phase of "let's make the model smarter by scaling it up" seems to be giving way to a phase of optimization and finding better ways to use what we already have. Advancement is happening more in the tools around the models: agentic workflows, better prompt engineering frameworks like Spec Kit, and improvements in speed and cost efficiency.

The technical leaders in the field seem to be saying similar things, though they phrase it in different ways. Yann LeCun from Meta has been quite vocal that current LLMs represent something of a dead end, though he's talking more about AGI than about practical coding applications. The broader point (that we're not seeing the same rate of improvement from simply making models bigger) seems increasingly accepted.

For practitioners, what this means is that competitive advantage increasingly comes from how well you can orchestrate these tools rather than from having access to a slightly better model. The models themselves are commoditizing. Open-source models, such as the Llama models from Meta, are catching up to the proprietary ones. The difference between Claude Sonnet 4.5 and GPT-4 feels less significant than the difference between someone who knows how to write good specifications and someone who doesn't.

Looking back over a two-year journey with these tools, the pattern becomes clearer. The shift isn't that model intelligence matters less now. Rather, the LLM architecture itself seems to be reaching a point where making it significantly smarter becomes increasingly difficult. The dramatic jumps in capability we saw from GPT-3 to GPT-4 are harder to reproduce. This is why the industry focus has shifted toward building better ecosystems around existing models and optimizing what we already have. It's not that intelligence doesn't matter; it's that we're hitting diminishing returns on the "make it smarter" approach, so the innovation is happening elsewhere.

Where this leaves us

These are observations and patterns, not predictions. The software engineering profession is changing in real-time, and it's worth trying to make sense of it while working in the middle of that change.

The implications for the profession are significant and somewhat uncomfortable to think about. There's a real question about how people develop the expertise needed to be effective in this new model if the traditional path of starting as a junior engineer and learning by doing is being disrupted. There's also a question about what happens to the large population of developers whose primary value was implementation speed rather than system design or architectural judgment.

But there's also opportunity here. For experienced engineers, these tools are genuine force multipliers. Building more, iterating faster, and maintaining higher quality than before is now possible. For organizations, smaller teams can accomplish more with better tooling and clearer specifications. And for the field as a whole, if we can figure out how to preserve the knowledge-building pathway while taking advantage of AI assistance, we might end up with a profession that's more focused on problem-solving and less on the mechanical aspects of translating solutions into code.

Are others seeing similar patterns? Do the days feel different than they did a year or two ago? The conversation about AI and software development often feels polarized between "nothing will change" and "everyone will be replaced." The reality seems to be somewhere in the messy middle. Things are definitely changing and the changes are meaningful, but the future isn't predetermined. How we adapt, what we choose to value, and how we structure the profession going forward will shape what software engineering looks like in five or ten years.
