AI coding assistants are generally quite good at producing code.
However, they are less reliable when they have to decide what that code should do.
In other words, they struggle less with syntax than with intent.
Given a clear description of behaviour, an assistant can often produce a reasonable implementation. Given an ambiguous one, it will still produce something — but that “something” may not align with what was actually intended.
That’s a consequence of how we describe systems, not a failure of the model.
The Real Problem Is Ambiguity
In most codebases, behaviour is only partially explicit.
Some of it lives in:
- method names
- comments
- documentation
- tests
- conversations between developers
The rest is assumed.
Those assumptions work reasonably well when the same people are working on the system and context is shared. They break down as soon as:
- new developers join
- features are modified across teams
- systems grow in scope
- or code is generated rather than written
AI coding assistants make the breakdown more visible.
They don’t share your assumptions.
They operate on what is written, not what is implied.
Try this experiment: pick an open-source project and scan its code, PRs, and tickets for the kinds of artifacts mentioned above.
Are you able to write a functional specification that perfectly describes the system’s intended behaviour? Do you think a coding agent would perform any better?
Where BDD Fits
Behaviour-driven development is often discussed as a testing technique.
More accurately, it is a way of making behaviour explicit.
A scenario like:
```gherkin
Given a document is submitted
When it is reviewed
Then it should receive a score between 0 and 100
```
doesn’t describe implementation.
It describes intent, and that distinction matters.
When behaviour is expressed this way:
- there is less room for interpretation
- there are fewer implicit assumptions
- and there exists a clearer boundary between what the system does and how it does it
That clarity certainly benefits humans, but it also benefits systems that generate code.
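To make the distinction concrete, here is a minimal sketch of the scenario above mirrored as a plain test. The `review_document` function and its scoring heuristic are hypothetical stand-ins; only the Given/When/Then structure and the 0–100 contract come from the scenario itself.

```python
def review_document(document: str) -> int:
    """Toy reviewer (hypothetical): scores a submitted document on a 0-100 scale."""
    # Placeholder heuristic: longer documents score higher, capped at 100.
    return min(len(document), 100)


def test_reviewed_document_receives_bounded_score():
    # Given a document is submitted
    document = "A short submitted document."
    # When it is reviewed
    score = review_document(document)
    # Then it should receive a score between 0 and 100
    assert 0 <= score <= 100


test_reviewed_document_receives_bounded_score()
```

Note that nothing in the test says *how* the score is computed. The implementation could change entirely and the test would still express the same intent.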
Why This Matters for AI-Assisted Development
When behaviour is implicit, AI has to infer intent.
That inference is where things start to go wrong.
An assistant may:
- implement the “most likely” interpretation
- generalize beyond what was intended
- introduce edge cases that were never discussed
- or omit constraints that were never stated explicitly
The result often looks reasonable in isolation, but it may not match the actual expectations of the system.
When behaviour is expressed explicitly — for example, through Gherkin-style scenarios — that ambiguity is reduced.
The assistant no longer has to guess what the system should do.
It can focus on how to implement what has already been defined.
This shifts the problem from interpretation to execution: the scenario declares *what* the system should do, leaving the assistant only the imperative *how*.
BDD as a Constraint System
In previous discussions, I’ve described architecture as a constraint system.
Patterns like:
- bounded contexts
- aggregates
- dependency direction
- ubiquitous language
all restrict how a system is allowed to evolve.
Behaviour-driven development introduces another form of constraint:
It constrains behaviour.
A well-defined set of scenarios limits:
- what the system is expected to do
- how it should respond under specific conditions
- and what outcomes are considered valid
These constraints operate at a different level than architectural boundaries, but they serve the same purpose.
They reduce the space in which incorrect changes can occur.
For humans, this improves communication.
For AI-assisted workflows, it reduces guesswork.
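One way to picture a scenario acting as a constraint is to treat its "Then" clause as an invariant and check it across many inputs, property-test style. In this sketch, `review_document` is again a hypothetical reviewer; the point is that the 0–100 bound carves out the space of valid outcomes.

```python
import random
import string


def review_document(document: str) -> int:
    """Toy reviewer (hypothetical): scores a document on a 0-100 scale."""
    return min(len(document), 100)


def outcome_is_valid(score: int) -> bool:
    # The scenario's "Then" clause: a score between 0 and 100 is the
    # only outcome the system is allowed to produce.
    return 0 <= score <= 100


# Check the constraint over many randomly generated documents.
random.seed(0)
for _ in range(1000):
    doc = "".join(random.choices(string.ascii_letters, k=random.randint(0, 300)))
    assert outcome_is_valid(review_document(doc))
```

Any change to the reviewer that violates the bound now fails immediately, which is exactly the "reduced space for incorrect changes" described above.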
Tooling, Not the Point
Frameworks such as Cucumber and other Gherkin-based tools are often used to execute these scenarios.
That’s useful, but it’s not the most important part.
The primary value of BDD in this context is not test execution.
It’s the act of making behaviour explicit.
You can get much of the benefit even without a full BDD toolchain, as long as:
- behaviour is clearly described
- expectations are shared
- and scenarios are treated as part of the system’s definition
The tooling helps enforce that clarity, much as we might use ArchUnit to enforce architectural constraints. But the clarity itself is what makes it work.
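As a sketch of how little machinery "scenarios as part of the system's definition" actually requires, here is a tiny hand-rolled runner that binds Gherkin-style lines to step functions, with no BDD framework involved. All names and the reviewer logic are illustrative, not a real Cucumber API.

```python
STEPS = {}


def step(pattern):
    """Register a step function for an exact Given/When/Then line body."""
    def register(fn):
        STEPS[pattern] = fn
        return fn
    return register


@step("a document is submitted")
def given_document(ctx):
    ctx["document"] = "A submitted document."


@step("it is reviewed")
def when_reviewed(ctx):
    # Hypothetical reviewer: scores on a 0-100 scale.
    ctx["score"] = min(len(ctx["document"]), 100)


@step("it should receive a score between 0 and 100")
def then_bounded(ctx):
    assert 0 <= ctx["score"] <= 100


def run(scenario: str):
    """Strip each line's keyword and dispatch the rest to its step function."""
    ctx = {}
    for line in scenario.strip().splitlines():
        keyword, _, rest = line.strip().partition(" ")
        assert keyword in ("Given", "When", "Then", "And"), keyword
        STEPS[rest](ctx)
    return ctx


run("""
    Given a document is submitted
    When it is reviewed
    Then it should receive a score between 0 and 100
""")
```

A real toolchain adds reporting, parameterised steps, and IDE support, but the essential move is the same: the scenario text is the source of truth, and the code is bound to it.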
Where This Helps — and Where It Doesn’t
Making behaviour explicit improves outcomes, but it does not eliminate the need for discipline.
BDD does not:
- define architectural boundaries
- prevent poor domain modelling
- or replace the need for governance
It complements those things.
It works best when combined with:
- clear architectural constraints
- a well-defined ubiquitous language
- and enforcement mechanisms that keep the system aligned over time
Without those, even well-written scenarios can (and do) drift.
Closing Thoughts
AI coding assistants are not inherently unreliable.
As anyone who's used one knows, they are sensitive to ambiguity.
When intent is implicit, they infer.
When behaviour is explicit, they implement.
Behaviour-driven development is an excellent way to make that intent visible.
Not as a testing technique alone, but as a constraint on what the system is supposed to do.
In systems that evolve quickly — whether through teams, automation, or AI-assisted development — that constraint becomes increasingly valuable.