Part 1: Spec-Driven Development - Building Predictable AI-Assisted Software

You know that feeling when you're using AI to generate code, and it seems to understand exactly what you want? The AI suggests a complete implementation. You accept it, run your tests, and... something's off. So you refine your prompt with more context. The AI generates a different approach. Tests still fail. You add more details to your conversation, but now the context is getting long and the AI starts hallucinating—confidently suggesting functions that don't exist, mixing up variable names, or contradicting its earlier recommendations. Before you know it, you're caught in a trial-and-error loop, spending more time debugging and re-prompting than you would have writing it yourself.

I've been there. And I've found a better way.

This is the first part of a series where I'll share how I built Station Station—a personal project for tracking Melbourne train commuters' office attendance—using Spec-Driven Development (SDD). This isn't a promotional piece. I'm going to show you the actual challenges, the times when AI failed spectacularly, and the structured approach that made AI assistance genuinely productive instead of frustrating. By the end of this series, you'll have a clear framework for deciding when SDD makes sense for your projects and when it's overkill.

Vibe Coding vs Spec-Driven Development

That trial-and-error loop I described? It's often called "vibe coding"—chatting with AI, trying what it suggests, debugging when it breaks, and iterating until something works. No upfront planning, no structure, just vibing with the AI and seeing where it takes you. For quick experiments and throwaway scripts, vibe coding is perfectly fine. But for real projects you want to finish and maintain? The lack of structure becomes a problem.

Spec-Driven Development takes a different approach: structure first, then code. Instead of chatting your way to a solution, you invest time upfront to document what you're building, why you're building it, and what success looks like. Then you let AI implement those documented requirements. The AI still does the heavy lifting, but within guardrails you've defined.

The contrast is striking. Traditional AI chat often becomes a trial-and-error loop—you provide a vague prompt, get generated code, test it, realize it doesn't quite work, and start over. Spec-Driven Development follows a predictable path: gather requirements, write detailed specifications, break into tasks, implement systematically, and produce reviewable output. You spend more time upfront on specs, but save time on debugging and rework.

*Side-by-side comparison: traditional AI chat's trial-and-error loop, where vague prompts lead to repeated attempts until code works, versus Spec-Driven Development's linear progression from requirements to specification to implementation to reviewable output.*

What Is Spec-Driven Development?

Spec-Driven Development is exactly what it sounds like: you write detailed specifications before any code gets generated. Instead of throwing vague prompts at an AI and hoping for the best, you invest time upfront to document exactly what you want to build, why you're building it, who it's for, and what success looks like.

Here's the core insight: AI is incredibly good at implementing well-defined specifications, but it's terrible at mind-reading. When you provide a structured spec with clear requirements, user stories, and acceptance criteria, AI can generate code that actually works. When you give it fuzzy requirements through conversational chat, you get fuzzy results.
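
To make "structured spec" concrete, here's the shape of one. This is a hypothetical excerpt written for illustration, not any particular tool's actual template:

```markdown
## Feature: Monthly Attendance Summary

### Goal
Show whether the user met the 50% office-attendance threshold
for a selected month.

### User Story
As a hybrid commuter, I want a monthly attendance percentage so that
I can check compliance without counting Myki transactions by hand.

### Acceptance Criteria
- A month with 20 working days and 11 office days shows 55%
  and a "threshold met" indicator.
- A day counts as an office day only if there is a touch-on
  at the designated work station.
- Weekends are excluded from the working-day count.
```

Notice there's nothing exotic here. The value is that every sentence is testable, which gives the AI something unambiguous to implement against.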

Tools for Implementing SDD

There are several tools that help implement Spec-Driven Development workflows. I've experimented with a few, including OpenSpec, which takes a different angle on the problem.

OpenSpec focuses on change proposals for existing systems. Its workflow centers on proposing, reviewing, and implementing changes to established codebases, with support for multiple AI tools through the AGENTS.md convention. It's particularly strong when you're working with a team using different AI assistants or making incremental updates to existing projects.
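
If you haven't seen the convention: AGENTS.md is a plain markdown file at the repo root that any compatible assistant reads for project instructions. A hypothetical excerpt (the specific conventions shown are invented for illustration):

```markdown
# AGENTS.md

## Build & Test
- Run `npm install` once, then `npm test` before every commit.

## Project Conventions
- TypeScript strict mode; no default exports.
- All date handling goes through a single shared dates utility module.
```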

For Station Station, I chose agent-os, which is optimized for building complete products from scratch. It handles the full product lifecycle, from initial idea through deployment, with deep Claude integration and sophisticated multi-agent orchestration. The spec format is requirements-based, with explicit goals, user stories, and technical implementation details. They recently released version 2.

Since I was building a greenfield personal project solo with Claude, agent-os was the natural fit.

The important thing isn't which SDD tool you use—it's that you use some structure to align human intent with AI implementation.

The Agent-OS Workflow

For Station Station, the agent-os workflow followed five phases:

  1. Create Product: Define your product mission, target users, and core value proposition
  2. Shape Spec: Gather requirements through structured AI-human dialogue where the AI asks clarifying questions
  3. Write Specs: Convert those requirements into detailed technical specifications with explicit scope boundaries
  4. Write Tasks: Break the spec into granular, actionable implementation tasks
  5. Implement Tasks: AI-assisted coding with human review at key checkpoints

The magic isn't in any of these individual steps—product managers have been writing specs for decades. The magic is in how this structure channels AI's strengths while keeping humans in control of architecture and quality.
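
To make phase 4 tangible, here's what a granular task breakdown might look like. This is an illustrative sketch, not agent-os's actual output format:

```markdown
## Tasks: Monthly Attendance Summary

- [ ] 1. Add a working-days utility for a given month, with tests
- [ ] 2. Implement office-day detection from Myki transactions
  - [ ] 2.1 Write failing tests covering timezone edge cases
  - [ ] 2.2 Implement detection and make the tests pass
- [ ] 3. Build the summary component (percentage + threshold indicator)
- [ ] 4. Wire the component into the dashboard and verify against fixtures
```

Each task is small enough to review in one sitting, which is exactly where the human checkpoints in phase 5 come in.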

The Station Station Case Study

Over this series, I'll use Station Station as a concrete example of SDD in action. Here's the quick version: I'm a Melbourne train commuter on a hybrid schedule, and my company has a 50% office attendance policy. But there's no automated way to track whether I'm meeting that threshold. I have to manually review my Myki (Melbourne's public transport smart card) transaction history and count which days I tapped on at my work station.

Tedious? Absolutely. Perfect problem for automation? You bet.

Station Station automatically determines office attendance by analyzing Myki transaction data. If you tapped on/off at your designated work station, it counts as an office day. The app presents monthly statistics, attendance calendars, and lets you export the data for compliance tracking. It's live at https://koustubh25.github.io/station-station/. You can use or deploy this tool yourself for free—I'll show you how in the final part of this series.
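
The core detection rule is easy to sketch. Here's a minimal TypeScript illustration of the idea; the types and field names are assumptions for this post, not Station Station's actual code:

```typescript
// Minimal sketch: count office days from Myki-style transactions.
interface MykiTransaction {
  timestamp: string; // e.g. "2025-03-09T08:02:00+11:00"
  station: string;   // e.g. "Flinders Street"
  type: "touch-on" | "touch-off";
}

function countOfficeDays(
  transactions: MykiTransaction[],
  workStation: string
): number {
  const officeDays = new Set<string>();
  for (const tx of transactions) {
    if (tx.type === "touch-on" && tx.station === workStation) {
      // Resolve the calendar day in Melbourne time, not UTC.
      // The "en-CA" locale formats dates as YYYY-MM-DD.
      const day = new Date(tx.timestamp).toLocaleDateString("en-CA", {
        timeZone: "Australia/Melbourne",
      });
      officeDays.add(day); // a Set deduplicates multiple taps per day
    }
  }
  return officeDays.size;
}
```

From there, the monthly percentage is just office days divided by working days.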

Here's what makes it a good SDD case study:

  • Real complexity: Bypassing Cloudflare bot detection, reverse-engineering undocumented APIs, handling timezone edge cases
  • 8 completed features: Built incrementally across 3 phases using the agent-os workflow
  • Honest challenges: There were bugs AI couldn't fix. I'll show you exactly where human intervention was critical.
  • Real-world application: ~6,300 lines of code, a fully responsive React dashboard, and a Lighthouse score of 95+

This wasn't a toy project. It's a real application I use daily, built with real constraints, encountering real problems. In Part 2, I'll walk through the project in detail—the problem it solves, the technical challenges, and the 8-feature roadmap. In Part 3, we'll dive deep into the agent-os workflow. In Part 4, I'll share the honest truth about where AI failed and why.

But before we get there, let's set realistic expectations.

SDD Is Not Full Automation

Here's the critical thing to understand: Spec-Driven Development is not about handing requirements to an AI and walking away while it builds your app. It's a structured human-AI partnership where you maintain control of architecture and quality while the AI handles implementation and boilerplate.

Think of it this way: you're the architect and reviewer, the AI is your implementation assistant. You design the system, write the specs, approve the task breakdown, and review the code. The AI generates the boilerplate, implements standard patterns, writes tests, and handles the repetitive work you'd rather not do manually.

There's a collaboration spectrum, which I'll detail in Part 4:

  • Tier 1 - AI Can Handle Alone: Boilerplate generation, standard CRUD operations, test scaffolding
  • Tier 2 - AI + Human Review Required: Complex business logic, external API integration, cross-file refactoring
  • Tier 3 - Human Must Lead: Debugging multi-layered issues, architectural decisions, security implementations

Most real development work spans all three tiers. SDD helps you identify upfront which tier each task falls into, so you know where to invest your review energy and where AI can run autonomously.

I learned this the hard way. When I added manual attendance date tracking to Station Station, the AI couldn't fix a date-handling bug even after several debugging rounds. I had to review the code myself, identify the specific function where the problem lived, and guide the AI to the solution. The AI was perfectly capable of implementing the fix—once I identified what needed fixing.
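
To show the class of bug involved (a simplified stand-in, not the actual Station Station bug): JavaScript date conversions that round-trip through UTC can silently shift the calendar day in Melbourne's UTC+10/+11 timezone.

```typescript
// A tap-on at 8:00 AM on 9 March, Melbourne time (AEDT, UTC+11):
const tapOn = new Date("2025-03-09T08:00:00+11:00");

// Buggy: toISOString() converts to UTC first, shifting the calendar day.
const buggyDay = tapOn.toISOString().slice(0, 10);
console.log(buggyDay); // "2025-03-08" -- attendance lands on the wrong day

// Correct: derive the date in the transaction's own timezone.
const correctDay = tapOn.toLocaleDateString("en-CA", {
  timeZone: "Australia/Melbourne",
});
console.log(correctDay); // "2025-03-09"
```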

That's the partnership. AI accelerates implementation. Humans guide architecture and troubleshoot the hard problems.

What's Next

In Part 1, we've established what Spec-Driven Development is, why it's more predictable than ad-hoc AI chat, and how tools like OpenSpec and agent-os support the workflow. We've introduced Station Station as a real-world case study, honest challenges included.

In Part 2, we'll explore the Station Station project in detail: the problem it solves, the 8 features I shipped, the technical challenges like Cloudflare bypass and API reverse engineering, and the metrics that prove this approach works in production.

If you're tired of the AI chat trial-and-error loop, if you want to leverage AI assistance without sacrificing code quality, or if you're just curious how a structured spec can turn AI from unpredictable to reliable—stick around. This series will show you exactly how it works, warts and all.
