
Vincent Burckhardt

Originally published at bitsofrandomness.com

Breaking the doom-prompting loop with spec-driven development

Bringing software engineering discipline to AI-assisted coding

Every developer using AI coding tools has experienced the loop. You prompt, the AI generates code, something isn't quite right, you prompt again, the AI breaks something else while fixing the first issue, you prompt again. An hour later you're deeper in the hole than when you started, caught in what's now called doom-prompting: you keep going because you've already invested so much time.

This is the dark side of vibe coding, Andrej Karpathy's term for fully surrendering to AI-generated code without really understanding it. Karpathy himself noted it's "not too bad for throwaway weekend projects." For anything more substantial, the approach tends to collapse.

I've been using spec-kit, GitHub's toolkit for spec-driven development, and it's changed how I think about AI-assisted coding. The core insight is simple: catching problems in specifications costs far less than catching them in code.

The shift-left principle applied to AI coding

Shift-left testing is the idea that catching defects earlier in development is cheaper than catching them later. Everyone who's debugged a production issue knows this intuitively: finding a problem in requirements costs almost nothing, finding it in code review costs some rework, finding it in production costs a lot more.

Spec-kit applies this principle to AI-assisted development, but shifts even further left. Instead of catching issues through testing code, you catch them through reviewing specifications. The four-phase workflow makes this explicit: Specify, Plan, Tasks, then Implement. Each phase has a gate where you review before proceeding.

This feels familiar to anyone who studied software engineering formally. I remember university projects where we spent weeks on specifications and architecture before writing a line of code. The discipline felt excessive at the time, but the coding phase was remarkably smooth when we finally got there. Spec-kit brings that same rigor to AI-assisted development.

What spec-kit actually provides

The toolkit is agent-agnostic and works with Claude Code, GitHub Copilot, Cursor, and other AI coding tools. At its core, it's a set of slash commands that guide you through structured phases:

The /specify command forces you to articulate what you're building. The /plan command generates research and technical direction. The /tasks command breaks the plan into discrete implementation steps. Finally, /implement executes those tasks.
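
To make that concrete, a session for a small feature might look something like this. The feature and the wording of the prompts are my own illustration, not spec-kit defaults:

```
/specify Add rate limiting to the public API: per-API-key limits,
configurable thresholds, and a clear 429 response with a Retry-After header.

/plan Use the existing Redis instance for counters. No new managed services.

/tasks

/implement
```

Each command runs inside the AI assistant's chat; the gate is simply that you review and edit the resulting artifact before running the next command.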

Each phase produces markdown files that serve as both documentation and AI context. The specifications, plans, and task lists persist across sessions, acting as memory that keeps the AI aligned with your intent.
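
The exact layout varies between spec-kit versions, but in my projects the artifacts end up roughly like this (the feature name is the illustrative one from above):

```
specs/
  001-rate-limiting/
    spec.md     # what and why: scenarios, acceptance criteria
    plan.md     # technical direction and research notes
    tasks.md    # ordered, discrete implementation steps
.specify/
  memory/
    constitution.md   # cross-cutting project principles (see below)
```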

Spec-kit also introduces what it calls a "constitution" (I prefer "principles," but the concept matters more than the name). This file establishes cross-cutting rules for your project: testing approach, coding standards, architectural constraints. These non-functional requirements apply to everything the AI generates.
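
To give a feel for it, here are a few entries of the kind I keep in mine for infrastructure work; the specifics are my own choices, not spec-kit defaults:

```markdown
## Principles

- Compose approved registry modules; raw provider resources need an
  explicit exception recorded in the plan.
- Every resource carries owner, environment, and cost-center tags.
- No secrets in code or state; use the cloud provider's secrets manager.
- Terraform must pass fmt, validate, and a clean plan in CI before merge.
```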

How the flow changes day-to-day work

My workflow with spec-kit looks different from the typical AI coding loop. I spend time reviewing and editing the specifications and task list, then let the AI implement the full feature. I treat the AI less like a pair programmer and more like a developer I'm delegating work to. I review the resulting code the way I'd review a pull request from a human team member.

This mental model matters. With pair programming, you're watching every keystroke. With delegation, you're reviewing outcomes against specifications. The latter scales better with AI tools that can implement substantial features autonomously.

The plan phase has become the most valuable part of the workflow. The AI researches the technical direction, and I've learned things from reading its output. More importantly, I catch misunderstandings early. During one project, the plan revealed that the AI assumed an IBM Cloud serverless service was deployed inside a VPC, which it isn't. Catching that during plan review was far cheaper than discovering it through broken infrastructure code.

I don't review every single code change anymore. Instead, I review the specifications carefully, let implementation run with auto-accept enabled, do smoke testing, then review the full changeset. If issues emerge, I iterate through the full flow (plan to tasks to implementation) rather than jumping straight to code fixes. This keeps the specifications accurate and aligned with what actually got built.

The overhead question

Spec-kit adds overhead. For simple tasks, that overhead isn't worth it.

But for larger features, I've found the investment pays back. The specifications force me to think through requirements properly. Architectural problems surface during plan review rather than after I've invested in code. And I avoid the doom-prompting loop because ambiguities in my thinking get resolved during specification, not through trial-and-error prompting.

This parallels traditional development. Some developers code first and spend months fixing bugs and refactoring. Others invest in architecture and specifications upfront. Both approaches can work, but they have different risk profiles. For complex work, the methodical approach tends to win. The same applies to AI-assisted development.

Token usage goes up with spec-kit: you're generating specifications, plans, and task lists before writing any code. But those tokens typically pay for themselves by avoiding the doom-prompting loop, where you can burn through tokens endlessly without making progress.

Prompt-based flows versus coded pipelines

One aspect of spec-kit's design surprised me. My initial instinct would have been to implement most of the workflow in a traditional programming language with explicit control flow. Instead, spec-kit encapsulates the flow in detailed prompts with minimal supporting scripts.

This approach works well with frontier models. The prompts describe phases in natural language, and the AI follows them reliably. The templating approach with gates provides deterministic outcomes without requiring coded orchestration nodes like you'd find in LangGraph.

I suspect this approach would be less reliable with non-frontier models. The ability to follow complex, multi-phase instructions consistently requires the kind of instruction-following that frontier models do well.

Beyond the underlying model, I've noticed the tools available in each AI assistant matter. The plan phase benefits from web search, codebase search, and other research capabilities. Claude Code ships with these out of the box, including deep search for thorough research. Other assistants may lack some of them, and I've seen the most variance in plan quality when research tools are limited.

Configuring MCP tools before running through the flow also improves results. For instance, I configure tools for Terraform module registry search and cloud provider documentation lookup. These help the AI generate better-informed plans.
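
With Claude Code, this amounts to a small piece of project-level MCP configuration. The sketch below registers HashiCorp's Terraform MCP server via Docker; treat the exact image name and invocation as an assumption and check the server's documentation for the current form:

```json
{
  "mcpServers": {
    "terraform": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "hashicorp/terraform-mcp-server"]
    }
  }
}
```

With something like this in place, the plan phase can look up real module names and versions instead of guessing them.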

Adapting for infrastructure as code

When I started using spec-kit, I thought it would apply directly to infrastructure as code. As I progressed, I realized IaC has specific characteristics that need different handling: the declarative nature of tools like Terraform, the need to separate cloud-agnostic requirements from provider-specific implementations, governance concerns around security and cost that differ from application code, and validation against actual cloud provider APIs and module registries.

I ended up creating iac-spec-kit and open-sourcing it to invite collaboration on the approach. It started as a fork but ended up as a complete reimplementation of the commands, instructions, and templates; the only shared layer is the installer and the overall approach. The templates and prompts needed to be tuned specifically for infrastructure concerns.

The goal is to fill a gap where users can start with a high-level requirement like "deploy WordPress" or "set up a three-tier web app" and have the AI guide them through specification, planning, and code generation with review gates at each phase. The toolkit is cloud-agnostic and works with AWS, Azure, GCP, IBM Cloud, and others. Early tests look promising. I documented one end-to-end example at vburckhardt/wordpress-ibm-cloud, which shows the full workflow from initial requirements through generated Terraform code.

A specific focus has been getting AI to compose higher-level Terraform modules rather than using lower-level providers directly. AI-generated code that glues together curated, supported modules is more maintainable and supportable than code that reinvents infrastructure patterns using primitives. It's similar to teaching AI to use a well-designed library instead of writing everything from scratch.
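
In Terraform terms, the difference looks something like this. The module and its inputs are a generic, illustrative example, not output from the toolkit:

```hcl
# Compose a curated module: opinionated defaults, maintained upstream.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name            = "app-network"
  cidr            = "10.0.0.0/16"
  azs             = ["eu-west-1a", "eu-west-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]
}

# The alternative the AI will happily generate: dozens of aws_vpc, aws_subnet,
# aws_route_table and aws_nat_gateway resources that reimplement the same
# pattern and that you then have to maintain yourself.
```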

What this enables

Spec-kit enables going beyond vibe coding. The structured flow feels right because it aligns with how sound engineering should work. You're not just prompting and hoping. You're defining intent, reviewing plans, and delegating implementation.

The specifications also work well for collaboration. They're markdown files that can be checked into source control and versioned. I can see workflows where teams have validation gates on specifications and plans before implementation begins. The artifacts serve as shared understanding, not just AI context.

For resuming sessions, the specification and task files act as memory. Instead of re-explaining context to the AI, the toolkit instructs it to load the existing artifacts. This makes long-running projects more manageable.

The structured flow also enables working on multiple features in parallel. While the AI implements one feature autonomously, I can work on specifications for the next one. This pattern is emerging more broadly with tools like OpenAI Codex that explicitly support parallel task execution, and I expect it to become more common. It cuts both ways: it lets independent developers and small startups move faster with limited headcount, but it also raises questions about the expectations placed on developers in corporate settings.

The flow does require discipline. It's tempting to skip straight to implementation when you think you know what you want. But ambiguity in your thinking becomes apparent when you try to write it down as a specification. That's the point. The specification phase forces clarity before you've invested in code.

When it's worth it

Spec-kit won't eliminate all the friction from AI-assisted development. The overhead is real, and it's not worth it for every task. But for substantial features where you'd otherwise end up in a doom-prompting loop, the structured approach catches problems when they're cheap to fix.

The shift-left principle applies: review specifications, not just code. Treat AI implementation as delegation, not pair programming. Invest in the plan phase when research can improve technical direction.

If you're frustrated with vibe coding results on anything beyond weekend projects, spec-driven development is worth trying. The discipline feels familiar to anyone who's done rigorous software engineering, and the payoff is similar: smoother implementation because the thinking happened upfront.
