Behaviour Is Not a Methodology
How BDD actually works once you stop treating it like one
When I first tried to implement Behaviour-Driven Development, it didn’t go particularly well.
Not because the team resisted it, but because I treated it too literally.
This article is about what actually made it work - especially in low-level and technical systems.
I focused on Gherkin, on syntax, on “proper” scenarios, and on reproducing what the books and talks described.
What I got was a lot of feature files, a lot of glue code, and very little shared understanding.
In practice, I was spending a lot of time translating low-level concepts - things like firmware updates, communication protocols, or system states - into high-level “natural language” scenarios that didn’t really help anyone. They were too abstract to guide implementation, and at the same time not concrete enough to give confidence when discussing behaviour with the client or product owner.
The second time I tried BDD, I stopped trying to implement BDD as described, and started trying to implement communication that actually worked.
Since then, I’ve seen BDD become genuinely useful, but only when it adapts to the reality of the project and the people involved. I’ve used classic Gherkin, modified Gherkin, Excel sheets with formulas as behavioural models, diagrams, and UI mocks.
All of them worked. None of them were “pure”.
And that’s the core lesson behind everything in this article:
BDD fails as a framework and succeeds as a forcing function for communication - and it only works once you stop following it and start bending it.
Behaviour Is Not a Language Problem
Most BDD material assumes that behaviour should be expressed in “natural language”.
That sounds reasonable until you remember one uncomfortable fact: natural language is subjective.
Humans struggle to express what they want even in everyday life. Ask two people to agree on dinner and you’ll get friction. Yet we expect the same people to write “clear, universal, executable specifications” in English.
This gets even worse when not everyone comes from the same country or culture. Even the same word can carry different meanings depending on context (British and American English are full of subtle differences). And once you add people working in a second language, the idea of a single, shared “natural” language becomes even more fragile.
Natural language is ambiguous, culturally biased, emotionally loaded, and interpreted differently by different people.
So the real rule is not “use natural language”.
The real rule is: Use whatever medium creates shared understanding fastest.
Sometimes that’s text.
Sometimes it’s tables.
Sometimes it’s diagrams.
Sometimes it’s pictures.
Sometimes it’s Excel with formulas.
All of these are valid expressions of behaviour.
The Myth of Universal Natural Language
BDD literature often implies that scenarios should be understandable by everyone - we even have a name for it, ubiquitous language. That is impossible.
There is no universal natural language. Language is always cultural and contextual - and, in our working lives, business-driven.
“Natural” for a lawyer is not natural for a developer. “Natural” for a control engineer is already technical. And that’s fine.
If all stakeholders understand protocols, standards, schemas, or state machines, then those are the natural language of the business.
A perfectly valid BDD scenario can be:
Given the device is in state CONNECTED
When a READ request with OBIS code X is sent
Then the system must respond with frame Y within 200ms
That is technical.
That is business-relevant.
That is natural for that domain.
Simplifying it would make it less true, not more accessible.
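For teams that do automate such scenarios, the glue can stay just as technical. Below is a minimal sketch, assuming a behave-style setup: only the step decorators come from behave, while context.device and its force_state / send_read calls are hypothetical stand-ins for whatever test harness the project already has.

```python
# Minimal sketch, assuming behave. Only the step decorators and the parse-style
# placeholders are behave's; context.device and its methods are hypothetical.

import time

from behave import given, when, then

@given("the device is in state {state}")
def step_device_in_state(context, state):
    context.device.force_state(state)  # assumed harness call to set up the device

@when("a READ request with OBIS code {obis} is sent")
def step_send_read(context, obis):
    started = time.monotonic()
    context.response = context.device.send_read(obis)  # assumed blocking call
    context.elapsed_ms = (time.monotonic() - started) * 1000

@then("the system must respond with frame {frame} within {limit:d}ms")
def step_check_response(context, frame, limit):
    assert context.response.frame_id == frame, "Unexpected response frame"
    assert context.elapsed_ms <= limit, (
        f"Response took {context.elapsed_ms:.0f} ms, limit is {limit} ms"
    )
```

The exact glue matters less than the fact that the step text and the assertions talk about the same protocol-level concepts.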
Behaviour Can Be Expressed in Many Forms
In one project or business area, behaviour may be best expressed in Excel:
Given these inputs → apply these formulas → this is the output.
- Rows are scenarios.
- Columns are inputs and expected results.
- Clients understand it. Developers understand it. Tests can be generated from it, as sketched below.
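To make that last point concrete, here is a rough sketch of generating tests from such a sheet, assuming it is exported to CSV. The file name, the input_*/expected_* column convention, and the calculate_tariff function are invented for illustration, not a standard.

```python
# Minimal sketch: parametrised tests generated from a behaviour spreadsheet.
# CSV layout and the system-under-test call are assumptions for illustration.

import csv
from pathlib import Path

import pytest

def load_scenarios(path="tariff_behaviour.csv"):
    sheet = Path(path)
    if not sheet.exists():  # keep test collection alive if the sheet is not exported yet
        return
    with sheet.open(newline="") as handle:
        for row in csv.DictReader(handle):
            inputs = {key[len("input_"):]: float(value)
                      for key, value in row.items() if key.startswith("input_")}
            expected = {key[len("expected_"):]: float(value)
                        for key, value in row.items() if key.startswith("expected_")}
            yield pytest.param(inputs, expected, id=row["scenario"])

@pytest.mark.parametrize("inputs, expected", load_scenarios())
def test_behaviour_matches_sheet(inputs, expected):
    result = calculate_tariff(**inputs)  # hypothetical system under test
    for name, value in expected.items():
        assert result[name] == pytest.approx(value)
```

The sheet stays readable to the client in Excel; the export and the generated tests are just the evidence that the sheet and the system still agree.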
In another project, behaviour may be best expressed with UI mocks:
Given you are here (picture), with specific options selected, configuration values set, feature flags enabled, and the system already in a particular state,
when you click this (picture),
then this appears (picture).
All of that context was captured naturally in a single visual. Trying to express the same thing purely in text would have resulted in long scenarios full of “and, and, and” connectors, and still with less precision.
BDD does not require prose.
It requires a shared behavioural model.
Language is just one possible encoding. Ultimately you might even reach the conclusion that you need multiple types of media to best convey your system.
Automating BDD Without Cucumber
BDD is not about making scenarios executable.
It is about making behaviour traceable and verifiable.
The only real automation requirements are:
- you can trace tests to agreed behaviour,
- you can produce (good) reports showing scenario coverage and status.
If tests are a perfect 1:1 translation of scenarios, great.
If not, but they still give traceability and evidence, also great.
BDD automation is an evidence system, not a syntax engine.
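A minimal, framework-agnostic sketch of that idea: an explicit mapping from agreed scenarios to the tests that provide evidence for them, plus a report that shows the gaps. All scenario IDs, descriptions, and test names below are invented for illustration.

```python
# Minimal sketch of "automation as evidence", independent of any BDD framework.

from dataclasses import dataclass

@dataclass
class TestResult:
    name: str
    scenario_ids: tuple  # which agreed scenarios this test gives evidence for
    passed: bool

def scenario_coverage(scenarios, results):
    """Map each agreed scenario to PASS, FAIL, or NO EVIDENCE."""
    status = {sid: "NO EVIDENCE" for sid in scenarios}
    for result in results:
        for sid in result.scenario_ids:
            if sid not in status:
                continue  # evidence for a scenario nobody agreed on - worth flagging too
            if not result.passed:
                status[sid] = "FAIL"
            elif status[sid] == "NO EVIDENCE":
                status[sid] = "PASS"
    return status

if __name__ == "__main__":
    scenarios = {
        "READ-01": "READ request in CONNECTED state answered within 200 ms",
        "FW-02": "Firmware update resumes after power loss",
    }
    results = [
        TestResult("test_read_latency", ("READ-01",), passed=True),
        # FW-02 has no linked test yet: the report makes that gap visible, which is the point.
    ]
    for sid, state in scenario_coverage(scenarios, results).items():
        print(f"{sid:8} {state:12} {scenarios[sid]}")
```

Whether this lives in a script, a CI job, or a test-management tool is secondary. What matters is that the report exists, is generated automatically, and is trusted.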
Language Decay Is Avoidable
BDD assumes a shared language exists. Reality: shared language drifts over time.
- People join.
- People leave.
- Terms get overloaded.
- Meanings become implicit.
Six months later, the same word means different things to different people and nobody notices. So BDD requires something most teams never build: a maintained glossary.
Not documentation for its own sake, but a semantic source of truth. Every time someone asks “what does this mean?”, there must be a place to consult. And if that place does not exist, that’s not a discussion - that’s a missing artefact.
Language is infrastructure. If you don’t maintain it, behaviour becomes ambiguous again.
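One way to treat it as infrastructure is to keep the glossary in the repository and check scenarios against it, so that a missing definition fails loudly instead of drifting silently. A rough sketch, assuming a glossary.yaml file, a features/ directory, and a naive ALL-CAPS convention for domain terms - all three are assumptions, not a standard:

```python
# Minimal sketch: the glossary as a version-controlled artefact that scenarios
# are checked against. File names and the ALL-CAPS term convention are assumed.

import re
import sys
from pathlib import Path

import yaml  # pip install pyyaml

def load_glossary(path="glossary.yaml"):
    """Assumed format: one TERM per key, with a one-line agreed meaning as the value."""
    return yaml.safe_load(Path(path).read_text()) or {}

def undefined_terms(scenario_text, glossary):
    """Flag ALL-CAPS domain terms that the glossary does not define."""
    candidates = set(re.findall(r"\b[A-Z][A-Z_]{2,}\b", scenario_text))
    return candidates - set(glossary)

if __name__ == "__main__":
    glossary = load_glossary()
    missing = set()
    for feature in Path("features").glob("**/*.feature"):
        missing |= undefined_terms(feature.read_text(), glossary)
    if missing:
        print("Used in scenarios but not in the glossary:", ", ".join(sorted(missing)))
        sys.exit(1)  # fail the build: the missing artefact becomes visible
```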
Development and Testing Cannot Be Isolated
BDD assumes that behaviour is a shared concern: that design decisions, implementation, and validation all revolve around the same understanding of how the system should behave.
Most organisations, however, are structured around strong role separation: design happens in one place, implementation in another, and validation in yet another, often across different teams and phases.
These two ideas are fundamentally incompatible.
The emulator problem is a perfect illustration of this.
In complex systems, especially those that integrate with external platforms or hardware, meaningful testing often requires emulators or other types of test doubles. Testers depend on them to validate behaviour, but only developers usually have the technical knowledge to build them. Product owners don’t own them, budget rarely plans for them, and testers cannot realistically implement them.
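To make that concrete: a useful emulator does not have to be a full protocol stack. A minimal in-process sketch like the one below - with simplified states, register keys, and API that are assumptions, nothing close to a real DLMS or Zigbee implementation - is often enough for both developers and testers to drive behaviour, provided someone actually owns it.

```python
# Minimal sketch of such a test double: an in-process device emulator.
# States, register keys, and API are simplified assumptions for illustration.

from dataclasses import dataclass, field

@dataclass
class EmulatedDevice:
    state: str = "DISCONNECTED"
    registers: dict = field(default_factory=dict)

    def connect(self):
        self.state = "CONNECTED"

    def read(self, obis_code):
        if self.state != "CONNECTED":
            raise RuntimeError("READ rejected: device is not in CONNECTED state")
        return self.registers.get(obis_code, b"\x00")

# The same object can back developer unit tests and tester-facing scenarios.
device = EmulatedDevice(registers={"1-0:1.8.0": b"\x2a"})
device.connect()
assert device.read("1-0:1.8.0") == b"\x2a"
```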
So emulators become a kind of no-man’s land: critical for system validation, but owned by no one. They are built late, are often incomplete, and quickly become fragile and outdated.
At that point, system behaviour is something that is discovered afterwards. You cannot have shared ownership of behaviour and at the same time isolate responsibility for validation. The moment behaviour is validated only after “development is done”, it stops driving design and becomes post-mortem verification.
Which defeats the purpose.
The Knowledge Gap Problem
There is another pattern that often appears in organisations with strong role separation: dedicated validation teams frequently have little to no deep technical knowledge of the systems they are validating. In some cases, these teams are not technical at all, by design.
This is often justified using the classic test pyramid: developers are responsible for unit and integration tests, while testers/validators focus on system and acceptance testing at the top.
The pyramid itself is not the problem. Having multiple layers of testing is generally a good idea. The problem arises when the pyramid is used to justify a separation of people, rather than a separation of concerns.
On paper, this model looks clean. In practice, it creates a structural knowledge gap.
Validation teams are expected to validate end-to-end behaviour, but are rarely given the technical context required to do so. They often cannot realistically automate tests themselves. At best, they can adjust pre-made scripts or operate tools prepared by others, but they are structurally unable to design test infrastructure, build emulators, or reason about system-level behaviour.
This can work reasonably well for user-facing applications, where behaviour is mostly observable through interfaces and workflows. But it breaks down completely in more technical domains.
If you are testing an embedded system, for example, you cannot meaningfully validate behaviour without understanding:
- the underlying protocols and communication patterns,
- the difference between technologies (e.g. DLMS, Zigbee, or similar standards),
- the configuration and state of the device,
- the constraints and failure modes of the hardware itself.
In these contexts, black-box testing is often not just insufficient — it is actively misleading. The most important behaviours are not visible at the UI level and cannot be reasoned about without technical context.
Which again contradicts the idea that behaviour can be validated without understanding the system.
Behaviour Is Necessary, Not Sufficient
Scenarios define what the system should do.
They rarely define performance, resilience, security, or regulatory constraints.
Yet those are often the real reasons systems fail.
For example, contracts sometimes explicitly or implicitly require the system to comply with ISO 27001 or some other framework.
Very rarely do behavioural scenarios capture what that actually means in practice: audit trails, access control policies, incident response procedures, data retention rules, and similar constraints.
This is a structural blind spot in many BDD examples: non-functional and regulatory aspects are often omitted, even though they are frequently the most critical requirements in real systems.
In practice, scenarios alone are not enough to drive engineering decisions.
Ultimately, these aspects can also be modelled using scenarios, but doing so often becomes an exercise in translating existing requirements into a different format, rather than improving understanding.
Architecture notes, protocol references, regulatory standards, and technical constraints must live alongside behavioural scenarios.
Not in a separate universe.
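Where such constraints are checkable at all, one option is to let them live next to the behavioural tests rather than only in a compliance document. The sketch below is illustrative only: the 90-day retention figure and the audit_log fixture are assumptions, and most ISO 27001 obligations cannot be reduced to an assertion.

```python
# Illustrative only: retention period and audit_log interface are assumptions.

import datetime as dt

RETENTION_DAYS = 90  # e.g. derived from a contractual or ISO 27001-driven policy

def test_audit_records_not_kept_beyond_retention(audit_log):  # hypothetical fixture
    cutoff = dt.datetime.now(dt.timezone.utc) - dt.timedelta(days=RETENTION_DAYS)
    assert not audit_log.entries_older_than(cutoff), (
        "Audit entries exist beyond the agreed retention period"
    )
```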
Behaviour Beyond the System
One last point that is often overlooked: behaviour should not only condition development, but everything around it.
If BDD is really about shared understanding of behaviour, then that understanding should apply not just to the system, but also to the way people work together.
That includes the team. But it also includes the client/product-owner.
We all know about scope creep. We all know about fixed-price projects that claim to be agile in scope, but not in budget. A lot of conflict in software projects does not come from technical failure, but from mismatched expectations about how collaboration itself should work.
In fixed-price or highly constrained projects, one way to address this is to explicitly define behavioural expectations as part of the contract or engagement model.
Not in abstract terms, but very concretely:
- scenarios are defined collaboratively in workshops,
- client input is required within specific timeframes,
- test evidence and traceability are provided as part of each delivery, in the form of X and Y reports,
- demos happen at an agreed frequency and in a specific format,
- and anything outside of that flow is explicitly considered out of scope.
Does the client require a specific documentation template? If so, it should be captured in this document. And if the client does not require any specific template, that should be captured as well.
Will the client do any testing themselves? If yes, write down which level is their responsibility. And write down which levels of testing are covered by your team.
The same idea can apply internally within a team: documenting code practices, review standards, how demos are conducted, how decisions are made, and what “done” actually means.
Not as bureaucracy, but as shared behavioural expectations.
Because in the end, most project failures are not caused by missing features. They are caused by people having different mental models of what “collaboration” was supposed to look like.
Making that behaviour explicit is, in many cases, more important than any technical scenario.
This is just one possible format and other approaches can work just as well, but the underlying principle is the same: behaviour should be explicit, shared, and continuously validated.
What to keep from all this rambling
If there is anything worth keeping from all of this:
- Behaviour first - but there are many valid ways to express it.
- Natural language - only within a specific context.
- Tests - traceable to behaviour. Tests, tests, tests.
- Tests as living documentation - the best ramp-up material.
- No strict phase separation - a feature is only done when behaviour works and is validated.
If, after reading all this, you realise that your team already does most of it, just without calling it BDD, then congratulations.
You were never missing a framework. You were already doing Behaviour-Driven Development.
And if someone tells you that what you are doing is “not really BDD”, that’s fine too. Labels matter much less than outcomes.
And if this approach doesn’t work in your context, that’s also fine. The point is not to copy a process, but to find your own way of following the same principles.
That, in the end, is what BDD was always meant to be about.