Part 2: Spec Is Not Enough

#ai #vibecoding #productivity #learning

This is the second post in a series about spec-driven development from a practitioner who has been maintaining a spec across nine SDKs for three years. The first post covered what a spec actually is and how to keep it useful over time. This post is about what happens when you start feeding that spec to an LLM.

There is a bold idea gaining traction in the spec-driven development (SDD) community right now: the spec is the new high-level programming language, and implementation details no longer matter. Write a precise enough specification, hand it to a frontier model, and good software comes out the other side.

It is an appealing idea. And it is not entirely wrong. But after a year of seriously using our spec to drive LLM-assisted development at Ably, I think it is only half the story. The missing half matters quite a lot in practice.

The Problem With "Spec Is the New Code"

Anthropic's engineering team recently built a working C compiler using Claude, driven largely by documentation and specification. It compiled real programs. It passed the tests. You can read about it on the Anthropic engineering blog.

By the "spec is the new code" definition, job done.

But a compiler built this way is unlikely to match a mature, hand-optimized one like GCC, either in how fast it compiles or in how efficient the code it emits is. Performance is not a spec concern. Idiomatic code is not a spec concern. Maintainability, memory efficiency, behavior under production edge cases - none of these live in the spec. They live in the implementation, and right now they still require engineering judgment to get right.

Correct and good are not the same thing. A spec defines expected behavior. It says almost nothing about how well that behavior should be implemented. Treating a spec as a complete instruction set for generating production software is asking it to do something it was never designed to do.

Maybe the gap between correct and good narrows as models improve. But right now, if you hand a complex spec to an LLM and walk away, do not be surprised when the output passes every test and still needs significant work before it is ready to ship.

When Spec-Driven Generation Works Brilliantly

I want to be clear: I use LLMs to generate code from the spec regularly. The point is not that it does not work. The point is that it works exceptionally well in some situations and needs more care in others.

The situations where it works brilliantly share a few properties: the behavior is deterministic and fully specified, there is plenty of test data to verify the output, and performance is not a meaningful constraint.

A good example is writing SQL queries. When you give an LLM the schema and ask it to retrieve specific data from a database, the output is consistently good. The LLM has everything it needs, the result is easy to check against known rows, and there is very little room for it to go wrong in ways that matter.
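To make that concrete, here is a minimal sketch of what "easy to verify" looks like for a query. The `orders` table, its rows, and the request ("total spend per customer, highest first") are all invented for illustration; the point is only that known input rows give you an exact expected result to check the generated SQL against.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)"
    )
    # Known test rows: these are what make the output checkable.
    conn.executemany(
        "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
        [("alice", 1200), ("bob", 500), ("alice", 300)],
    )
    return conn

# The kind of query an LLM might generate for the request
# "total spend per customer, highest first" (assumed request).
QUERY = """
SELECT customer, SUM(total_cents) AS spend
FROM orders
GROUP BY customer
ORDER BY spend DESC
"""

def spend_per_customer(conn: sqlite3.Connection) -> list:
    """Run the generated query and return all rows as tuples."""
    return conn.execute(QUERY).fetchall()
```

With rows this small you can work out the expected answer by hand and compare it to what the query returns, which is exactly the property that makes this kind of task safe to hand over.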

This pattern holds across a wide range of tasks: serialization logic, retry policies with well-defined backoff rules, state machine transitions with explicit conditions. Anywhere the spec can fully describe the problem and the problem has no hidden depth, spec-to-code works well.
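A retry policy is a good illustration of a problem with no hidden depth. The sketch below assumes a hypothetical spec entry ("double the delay on each attempt starting from 100 ms, cap it at 2000 ms, give up after 6 attempts"); the constants and function name are mine, not taken from any real spec. The rule and its test data fit in a few lines, so there is little for an LLM to get subtly wrong.

```python
from typing import Optional

# Values from a hypothetical spec entry, chosen for illustration.
BASE_DELAY_MS = 100
MAX_DELAY_MS = 2000
MAX_ATTEMPTS = 6

def retry_delay_ms(attempt: int) -> Optional[int]:
    """Delay before retry number `attempt` (1-based), or None to give up."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if attempt > MAX_ATTEMPTS:
        return None  # the spec says to stop retrying
    # Exponential backoff, capped at the maximum delay.
    return min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS)

# Test data of the kind a spec can ship alongside the rule:
# (attempt, expected delay in ms, or None when retries are exhausted)
SPEC_TEST_DATA = [
    (1, 100),
    (2, 200),
    (3, 400),
    (4, 800),
    (5, 1600),
    (6, 2000),  # 3200 would exceed the cap
    (7, None),
]

def check_against_spec() -> bool:
    """True if the implementation matches every row of the test data."""
    return all(retry_delay_ms(a) == expected for a, expected in SPEC_TEST_DATA)
```

Every row of the table is either right or wrong, and nothing about the implementation matters beyond matching it.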

The key question before going direct from spec to code: is there anything important about this implementation that the spec does not capture? If the honest answer is no, hand it to the LLM. If the answer is yes, or even maybe, you need something in between.

When You Need More Than a Spec

The further you move from well-bounded deterministic problems, the more the direct spec-to-code path shows its limits. Not because the spec is wrong or the LLM is bad, but because the gap between expected behavior and a good implementation becomes too wide to cross in a single jump.

Think about how this plays out in UI development. Most developers would not take a requirements document and immediately generate code from it. They sketch wireframes first, then mockups, then component structure. Each step narrows the design space before any code is written.

The same principle applies at a lower level. A spec tells you what a feature should do. It does not tell you what the public interface should look like, what abstractions make the internal structure clean, or how data should flow between components. These are design decisions, and they matter enormously for the quality of what comes out.

When a task has non-obvious design decisions, the right approach is to iterate on the design layer before generating any code. Not a full architecture document, just enough to answer the key questions: what are the public interfaces, what are the main abstractions, and how does data flow through the feature? Once those questions are answered, the spec and the design together give the LLM everything it needs to produce something genuinely good rather than just technically correct.
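As a rough idea of how small that design layer can be, here is a sketch for an imaginary "publish a message" feature. Every name in it (`Message`, `Transport`, `Publisher`, `RecordingTransport`) is hypothetical and not taken from any real SDK. It answers the three questions in code: the public interface is `publish()`, the main abstraction is a `Transport` that can be swapped out, and data flows from the caller through the publisher to the transport.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class Message:
    """The unit of data that flows through the feature."""
    channel: str
    payload: str

class Transport(Protocol):
    """Main abstraction: anything that can deliver a message."""
    def send(self, message: Message) -> None: ...

class Publisher:
    """Public interface: callers only ever see publish()."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def publish(self, channel: str, payload: str) -> Message:
        # Data flow: caller -> Publisher -> Transport.
        message = Message(channel, payload)
        self._transport.send(message)
        return message

class RecordingTransport:
    """In-memory stand-in that lets the design be exercised early."""

    def __init__(self) -> None:
        self.sent: list[Message] = []

    def send(self, message: Message) -> None:
        self.sent.append(message)
```

A handful of definitions like these, reviewed and adjusted by a person, can then be handed to the LLM together with the spec so the design decisions are fixed before any implementation is generated.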

The spec is still essential. It anchors the intent and defines expected behavior. But it is the backbone, not the whole skeleton.

Two Paths From Spec to Code

After working this way for some time, I have settled into a simple mental model.

Spec → Code when: behavior is deterministic and fully specified, test data is available, performance is not a constraint. Go direct, include the test data, let the LLM work.

Spec → Design → Code when: interfaces are non-obvious, multiple components interact, platform constraints matter, or performance demands specific implementation choices. Spend time on the design layer first: a short markdown file, a few interface definitions, a rough diagram. Make the implicit design decisions explicit before any code is generated.

The spec feeds into both paths equally. What changes is whether you go direct or through a design step first.

A practical rule of thumb: if you can fully describe the task by pointing to a spec entry and nothing else, take path one. If you find yourself wanting to add context or explain how components fit together, that is the signal you are on path two.
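If it helps to see the two paths as a checklist, the criteria above can be written down as a small function. This is only a sketch of the mental model, not a tool I actually run; the `Task` fields and the `choose_path` name are made up for illustration, and answering the questions honestly is still the hard part.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Honest answers to the questions from the two-path model."""
    fully_specified: bool          # behavior is deterministic and in the spec
    has_test_data: bool            # there is data to verify the output
    performance_sensitive: bool    # performance demands specific choices
    non_obvious_interfaces: bool   # public interface is a real design decision
    multiple_components: bool      # several components have to interact
    platform_constraints: bool     # the target platform shapes the solution

def choose_path(task: Task) -> str:
    """Return which path the task should take from spec to code."""
    needs_design = (
        task.non_obvious_interfaces
        or task.multiple_components
        or task.platform_constraints
        or task.performance_sensitive
    )
    direct_ok = task.fully_specified and task.has_test_data
    if direct_ok and not needs_design:
        return "spec -> code"
    # Any doubt, or any design signal, means going through a design step.
    return "spec -> design -> code"
```

Note that the function falls back to the design path whenever any answer is uncertain, which matches the "yes, or even maybe" rule from earlier.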

What Comes Next

The two-path model sounds straightforward in theory. In practice, the design layer in path two is where most of the interesting and difficult work happens. Getting it wrong means the generated code inherits the design mistakes and you end up refactoring anyway.

In the next post I want to get into what happens after the code is generated. Having a spec gives you something most vibe-coding workflows lack: a way to verify that what the LLM produced is actually correct.

As always, if any of this resonates or if you think I am drawing the line in the wrong place, I would love to hear it in the comments.

Follow or subscribe so you do not miss Part 3.