The developers I know who are shipping the most in 2026 aren't the ones with the fastest typing speed. They're the ones who've rewired their workflow around spec-driven development with tools like Claude Code.
I've been using this pattern for nine months on everything from Savyour to Vettio. My output has roughly doubled. My bug count is down. My reviews are shorter.
Here's what the workflow actually looks like — not the marketing version, the messy version.
## The shift: from chat to spec
The first generation of AI coding assistants (2023 to early 2024) was chat-based: you'd have a long conversation with the model, paste code back and forth, and iterate. It was faster than working solo, but the context was ephemeral, the quality was uneven, and it didn't play nicely with git.
Claude Code, Cursor's agent mode, and similar tools inverted this. The new loop:
- Write a spec — a markdown document describing what to build.
- Hand the spec to the agent — it reads it, explores the repo, writes the code.
- Review the diff — like reviewing a junior engineer's PR.
- Iterate via spec amendments, not chat.
The spec becomes the source of truth. The agent is the implementer. You stay in the architect / reviewer role.
## What a good spec looks like
Specs that produce clean PRs share a few traits:
### 1. Intent and constraint, not instructions
Bad spec:
> Open `app/routes/users.ts`, add a new function called `getUserByEmail`, call the prisma client...
Good spec:
> Add an endpoint `GET /users/by-email?email=...` that returns the user profile. Must hit the existing Prisma-backed `users` table. Must respect the existing auth middleware on the `/users` router. 404 when not found. Covered by a unit test in the same style as the existing `/users/:id` test.
The good version tells the agent what to build and what rules apply, not how to build it. The agent figures out the how from reading the codebase.
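To make the contrast concrete, here's roughly the shape of diff the good spec produces. This is a sketch only — it assumes an Express-style router and a Prisma client exported from `../db`; the names and file layout are illustrative, not from a real codebase:

```typescript
// Sketch of the diff this spec tends to produce. Assumptions (not from the
// article): Express router, Prisma client exported from ../db, and a database
// that supports Prisma's `mode: "insensitive"` filter (e.g. Postgres).
import { Router } from "express";
import { prisma } from "../db";

// Added to the existing /users router, so it inherits its auth middleware.
export const usersRouter = Router();

usersRouter.get("/by-email", async (req, res) => {
  const email = typeof req.query.email === "string" ? req.query.email : "";
  if (!email) {
    return res.status(400).json({ error: "missing_email" });
  }
  // Case-insensitive lookup against the existing users table.
  const user = await prisma.user.findFirst({
    where: { email: { equals: email, mode: "insensitive" } },
  });
  if (!user) {
    return res.status(404).json({ error: "not_found" });
  }
  return res.json(user);
});
```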
### 2. Acceptance criteria
End every spec with a bulleted list of what "done" means:
```markdown
## Acceptance criteria

- [ ] New route passes all existing auth middleware
- [ ] Returns 200 + user JSON when the email matches
- [ ] Returns 404 with a `{"error": "not_found"}` body otherwise
- [ ] Email lookup is case-insensitive
- [ ] Test added alongside `users.spec.ts`
- [ ] No changes to the DB schema
```
The agent uses these to self-check. You use them to review.
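Here's what those criteria translate to on the test side. Again a sketch — it assumes a Vitest-style runner plus supertest, and the `../app` export is hypothetical:

```typescript
// users.by-email.spec.ts — hypothetical test mirroring the acceptance
// criteria. Assumes Vitest + supertest; the ../app export is an assumption.
import { describe, it, expect } from "vitest";
import request from "supertest";
import { app } from "../app";

describe("GET /users/by-email", () => {
  it("returns 200 + user JSON when the email matches", async () => {
    const res = await request(app).get("/users/by-email?email=ada@example.com");
    expect(res.status).toBe(200);
    expect(res.body.email).toBe("ada@example.com");
  });

  it("is case-insensitive", async () => {
    const res = await request(app).get("/users/by-email?email=ADA@EXAMPLE.COM");
    expect(res.status).toBe(200);
  });

  it("returns 404 with a not_found body otherwise", async () => {
    const res = await request(app).get("/users/by-email?email=nobody@example.com");
    expect(res.status).toBe(404);
    expect(res.body).toEqual({ error: "not_found" });
  });
});
```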
### 3. Out-of-scope callouts
This is the one most devs skip, and it's the difference between a focused PR and a sprawl:
```markdown
## Out of scope

- Do NOT refactor the existing `/users/:id` route
- Do NOT add rate limiting (we'll do that in a follow-up)
- Do NOT touch the signup flow
```
Agents, like junior engineers, will happily "improve" adjacent code unless told not to. Make the boundary explicit.
## The iteration loop
Real workflow, from spec to merged PR, on a typical 200-line feature:
- 10 min: Write the spec (`specs/2026-03-02-user-by-email.md`)
- 30 sec: `claude "implement the spec at specs/2026-03-02-user-by-email.md"`
- 3-8 min: Claude reads the codebase, writes the code, runs the tests
- 5-10 min: I review the diff. I ask for a change. The agent makes it.
- 2 min: CI runs. Green.
- Merge.
Total: ~30 minutes of my time for work that used to take 2 hours. Most of the savings aren't in typing — they're in not context-switching, because the agent does the file-hunting.
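The "ask for a change" in step 4 happens as a spec amendment, not a chat message. A hypothetical example, appended to the same spec file (the whitespace bug here is invented for illustration):

```markdown
## Amendment 1

- Trim surrounding whitespace from the `email` query param before matching
- Acceptance criteria: `"  ada@example.com  "` (padded) also returns 200
```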
## What the agent is bad at — and how to compensate
Three failure modes I've seen repeatedly:
### Over-abstracting
Agents love to introduce helper classes, utility modules, and "future-proofing" abstractions you didn't ask for. An explicit "keep it simple, match the surrounding code style" line in the spec mitigates this 80% of the way.
### Silent test deletion
Sometimes an agent will disable a failing test rather than fix the underlying bug. I've caught this half a dozen times. Mitigation: always grep the diff for `.skip`, `xit(`, or `@pytest.mark.skip` before approving.
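If you'd rather not eyeball it every time, that check automates cleanly. A minimal sketch, assuming your base branch is `origin/main` and you run it with something like `tsx` — the script name and branch are mine, the patterns are the three above:

```typescript
// check-disabled-tests.ts — hypothetical pre-review guard; the base branch
// is an assumption, adjust for your repo.
import { execSync } from "node:child_process";

const diff = execSync("git diff origin/main...HEAD", { encoding: "utf8" });

// Only flag lines the diff *adds* that look like a disabled test.
const offenders = diff
  .split("\n")
  .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
  .filter((line) => /\.skip\b|\bxit\(|@pytest\.mark\.skip/.test(line));

if (offenders.length > 0) {
  console.error("Possible disabled tests in this diff:");
  for (const line of offenders) console.error(`  ${line}`);
  process.exit(1);
}
```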
### Confident wrong answers on versioning
If your codebase uses an unusual library version, agents will default to the current version's API. Mitigation: pin the spec to "read `package.json` first and match versions" or include a short "stack notes" section.
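A "stack notes" section is cheap to generate straight from `package.json`. A throwaway sketch — the script and output format are mine, not a standard tool:

```typescript
// stack-notes.ts — hypothetical helper: prints a "stack notes" block to paste
// into a spec, using the versions actually pinned in package.json.
import { readFileSync } from "node:fs";

const pkg = JSON.parse(readFileSync("package.json", "utf8"));
const deps: Record<string, string> = {
  ...pkg.dependencies,
  ...pkg.devDependencies,
};

console.log("## Stack notes");
for (const [name, version] of Object.entries(deps)) {
  console.log(`- ${name}: ${version}`);
}
```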
## The CI piece: trust but verify
I treat AI-written code with slightly more suspicion than my own. My CI for agent-produced PRs:
- Standard test suite
- A `grep -n 'skip\|FIXME\|TODO'` check over the diff
- Secret scanner (agents occasionally echo test credentials back)
- Bundle-size budget check
- Type-coverage threshold
If any of those fail, the PR goes back for revision via a spec amendment, not a code fix on my side.
## Where spec-driven development fails
Not every task is a fit:
- Highly exploratory work ("figure out why this is slow") still works better as an interactive session than as a spec
- Very small changes (a one-line fix) have too much spec overhead
- Deep refactors spanning >10 files often do better broken into multiple specs handed off sequentially
For the 200-line-feature sweet spot — the majority of backend and glue work — spec-driven is my default.
## The meta-skill
The thing that's changed most about my job in 2026 isn't the model. It's that writing precise English has become my single most leveraged engineering skill. A good spec is:
- Unambiguous about intent
- Explicit about constraints
- Clear about what "done" looks like
- Honest about what's out of scope
Which, now that I think about it, is also what a good pre-2023 design doc looked like. Maybe we've come full circle.