The developers I know who are shipping the most in 2026 aren't the ones with the fastest typing speed. They're the ones who've rewired their workflow around spec-driven development with tools like Claude Code.
I've been using this pattern for nine months on everything from Savyour to Vettio. My output has roughly doubled. My bug count is down. My reviews are shorter.
Here's what the workflow actually looks like — not the marketing version, the messy version.
## The shift: from chat to spec
The first generation of AI coding assistants (2023 to early 2024) was chat-based: you'd have a long conversation with the model, paste code back and forth, and iterate. It was faster than working solo, but the context was ephemeral, the quality was uneven, and it didn't play nicely with git.
Claude Code, Cursor's agent mode, and similar tools inverted this. The new loop:
- Write a spec — a markdown document describing what to build.
- Hand the spec to the agent — it reads it, explores the repo, writes the code.
- Review the diff — like reviewing a junior engineer's PR.
- Iterate via spec amendments, not chat.
The spec becomes the source of truth. The agent is the implementer. You stay in the architect / reviewer role.
## What a good spec looks like
Specs that produce clean PRs share a few traits:
### 1. Intent and constraint, not instructions
Bad spec:
> Open `app/routes/users.ts`, add a new function called `getUserByEmail`, call the prisma client...
Good spec:
> Add an endpoint `GET /users/by-email?email=...` that returns the user profile. Must hit the existing Prisma-backed `users` table. Must respect the existing auth middleware on the `/users` router. 404 when not found. Covered by a unit test in the same style as the existing `/users/:id` test.
The good version tells the agent what to build and what rules apply, not how to build it. The agent figures out the how from reading the codebase.
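To make the contrast concrete, here's roughly the shape of diff the good spec produces. This is a sketch only — it assumes an Express-style router and a Prisma client exported from `../db`; the names and file layout are illustrative, not from a real codebase:

```typescript
// Sketch of the diff this spec tends to produce. Assumptions (not from the
// article): Express router, Prisma client exported from ../db, and a database
// that supports Prisma's `mode: "insensitive"` filter (e.g. Postgres).
import { Router } from "express";
import { prisma } from "../db";

// Added to the existing /users router, so it inherits its auth middleware.
export const usersRouter = Router();

usersRouter.get("/by-email", async (req, res) => {
  const email = typeof req.query.email === "string" ? req.query.email : "";
  if (!email) {
    return res.status(400).json({ error: "missing_email" });
  }
  // Case-insensitive lookup against the existing users table.
  const user = await prisma.user.findFirst({
    where: { email: { equals: email, mode: "insensitive" } },
  });
  if (!user) {
    return res.status(404).json({ error: "not_found" });
  }
  return res.json(user);
});
```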
### 2. Acceptance criteria
End every spec with a bulleted list of what "done" means:
```markdown
## Acceptance criteria

- [ ] New route passes all existing auth middleware
- [ ] Returns 200 + user JSON when the email matches
- [ ] Returns 404 with a `{"error": "not_found"}` body otherwise
- [ ] Email lookup is case-insensitive
- [ ] Test added alongside `users.spec.ts`
- [ ] No changes to the DB schema
```
The agent uses these to self-check. You use them to review.
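Here's what those criteria translate to on the test side. Again a sketch — it assumes a Vitest-style runner plus supertest, and the `../app` export is hypothetical:

```typescript
// users.by-email.spec.ts — hypothetical test mirroring the acceptance
// criteria. Assumes Vitest + supertest; the ../app export is an assumption.
import { describe, it, expect } from "vitest";
import request from "supertest";
import { app } from "../app";

describe("GET /users/by-email", () => {
  it("returns 200 + user JSON when the email matches", async () => {
    const res = await request(app).get("/users/by-email?email=ada@example.com");
    expect(res.status).toBe(200);
    expect(res.body.email).toBe("ada@example.com");
  });

  it("is case-insensitive", async () => {
    const res = await request(app).get("/users/by-email?email=ADA@EXAMPLE.COM");
    expect(res.status).toBe(200);
  });

  it("returns 404 with a not_found body otherwise", async () => {
    const res = await request(app).get("/users/by-email?email=nobody@example.com");
    expect(res.status).toBe(404);
    expect(res.body).toEqual({ error: "not_found" });
  });
});
```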
### 3. Out-of-scope callouts
This is the one most devs skip, and it's the difference between a focused PR and a sprawl:
```markdown
## Out of scope

- Do NOT refactor the existing `/users/:id` route
- Do NOT add rate limiting (we'll do that in a follow-up)
- Do NOT touch the signup flow
```
Agents, like junior engineers, will happily "improve" adjacent code unless told not to. Make the boundary explicit.
## The iteration loop
Real workflow, from spec to merged PR, on a typical 200-line feature:
- 10 min: Write the spec (`specs/2026-03-02-user-by-email.md`)
- 30 sec: `claude "implement the spec at specs/2026-03-02-user-by-email.md"`
- 3-8 min: Claude reads the codebase, writes the code, runs the tests
- 5-10 min: I review the diff. I ask for a change. The agent makes it.
- 2 min: CI runs. Green.
- Merge.
Total: ~30 minutes of my time for work that used to take 2 hours. Most of the savings aren't in typing — they're in not context-switching, because the agent does the file-hunting.
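The "ask for a change" in step 4 happens as a spec amendment, not a chat message. A hypothetical example, appended to the same spec file (the whitespace bug here is invented for illustration):

```markdown
## Amendment 1

- Trim surrounding whitespace from the `email` query param before matching
- Acceptance criteria: `"  ada@example.com  "` (padded) also returns 200
```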
## What the agent is bad at — and how to compensate
Three failure modes I've seen repeatedly:
### Over-abstracting
Agents love to introduce helper classes, utility modules, and "future-proofing" abstractions you didn't ask for. An explicit "keep it simple, match the surrounding code style" line in the spec mitigates this 80% of the way.
### Silent test deletion
Sometimes an agent will disable a failing test rather than fix the underlying bug. I've caught this half a dozen times. Mitigation: always grep the diff for `.skip`, `xit(`, or `@pytest.mark.skip` before approving.
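If you'd rather not eyeball it every time, that check automates cleanly. A minimal sketch, assuming your base branch is `origin/main` and you run it with something like `tsx` — the script name and branch are mine, the patterns are the three above:

```typescript
// check-disabled-tests.ts — hypothetical pre-review guard; the base branch
// is an assumption, adjust for your repo.
import { execSync } from "node:child_process";

const diff = execSync("git diff origin/main...HEAD", { encoding: "utf8" });

// Only flag lines the diff *adds* that look like a disabled test.
const offenders = diff
  .split("\n")
  .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
  .filter((line) => /\.skip\b|\bxit\(|@pytest\.mark\.skip/.test(line));

if (offenders.length > 0) {
  console.error("Possible disabled tests in this diff:");
  for (const line of offenders) console.error(`  ${line}`);
  process.exit(1);
}
```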
### Confident wrong answers on versioning
If your codebase uses an unusual library version, agents will default to the current version's API. Mitigation: pin the spec to "read `package.json` first and match versions" or include a short "stack notes" section.
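A "stack notes" section is cheap to generate straight from `package.json`. A throwaway sketch — the script and output format are mine, not a standard tool:

```typescript
// stack-notes.ts — hypothetical helper: prints a "stack notes" block to paste
// into a spec, using the versions actually pinned in package.json.
import { readFileSync } from "node:fs";

const pkg = JSON.parse(readFileSync("package.json", "utf8"));
const deps: Record<string, string> = {
  ...pkg.dependencies,
  ...pkg.devDependencies,
};

console.log("## Stack notes");
for (const [name, version] of Object.entries(deps)) {
  console.log(`- ${name}: ${version}`);
}
```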
## The CI piece: trust but verify
I treat AI-written code with slightly more suspicion than my own. My CI for agent-produced PRs:
- Standard test suite
- A `grep -n 'skip\|FIXME\|TODO'` check over the diff
- Secret scanner (agents occasionally echo test credentials back)
- Bundle-size budget check
- Type-coverage threshold
If any of those fail, the PR goes back for revision via a spec amendment, not a code fix on my side.
## Where spec-driven development fails
Not every task is a fit:
- Highly exploratory work ("figure out why this is slow") still works better as an interactive session than as a spec
- Very small changes (a one-line fix) have too much spec overhead
- Deep refactors spanning >10 files often do better broken into multiple specs handed off sequentially
For the 200-line-feature sweet spot — the majority of backend and glue work — spec-driven is my default.
## The meta-skill
The thing that's changed most about my job in 2026 isn't the model. It's that writing precise English has become my single most leveraged engineering skill. A good spec is:
- Unambiguous about intent
- Explicit about constraints
- Clear about what "done" looks like
- Honest about what's out of scope
Which, now that I think about it, is also what a good pre-2023 design doc looked like. Maybe we've come full circle.