As developers, we’re trained to spot leaky abstractions.
When a model simplifies reality too much, edge cases pile up, behaviour diverges, and the system starts lying to you.
After years of building analytics pipelines, experimentation engines, and real-time behavioural systems (most recently Zyro, an adaptive growth platform), I’ve come to a slightly uncomfortable conclusion:
Traditional A/B testing is a leaky abstraction for modern user behaviour.
It worked.
It was necessary.
But it no longer maps cleanly to how traffic behaves today.
The Original Assumption Behind A/B Testing
Classic A/B testing assumes:
- A stable audience
- Homogeneous intent
- Static behaviour
- A single global optimum
You create variants, split traffic evenly, wait for statistical significance, then deploy a “winner”.
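That workflow can be sketched in a few lines. This is a minimal illustration using a two-proportion z-test; the traffic numbers and the 1.96 cutoff (roughly 95% confidence) are assumptions for the example, not taken from any particular tool:

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Classic flow: split traffic evenly, wait for significance,
    then declare a single global winner."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    if abs(z) < z_crit:
        return "keep waiting"              # not significant yet
    return "B wins" if z > 0 else "A wins"

print(ab_test(120, 2400, 160, 2400))  # -> "B wins"
```

Note what the model bakes in: one pooled population, one decision, made once.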
That model made sense when:
- traffic sources were limited
- users arrived with similar context
- sessions were isolated
None of that is true anymore.
Modern Traffic Is Heterogeneous by Default
Today, two visitors landing on the same URL can be fundamentally different:
- One comes from Google with a research mindset
- One comes from TikTok with zero patience
- One arrives from ChatGPT already primed with context
- One is a returning user carrying session history
Yet most experiments treat them identically.
From a systems perspective, that’s already a red flag.
You’re averaging over fundamentally different distributions.
Why “Global Winners” Are Usually Local Losers
Here’s the part most dashboards won’t show you.
When you declare a global winner, you often do so by averaging away cohort-level differences:
- Variant A performs slightly better overall
- But performs worse for specific cohorts
- Those losses get hidden inside aggregates
So you ship something that’s “better on average” while quietly degrading performance for high-intent traffic segments.
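The aggregation effect is easy to reproduce. Here is a toy example with made-up cohort numbers, where variant A wins the blended average while losing the high-intent cohort outright:

```python
# Hypothetical cohort data: (visitors, conversions) per variant.
cohorts = {
    "low_intent":  {"A": (9000, 288), "B": (9000, 225)},  # A 3.2%  vs B 2.5%
    "high_intent": {"A": (1000, 100), "B": (1000, 150)},  # A 10.0% vs B 15.0%
}

def overall_rate(variant):
    visitors = sum(cohorts[c][variant][0] for c in cohorts)
    conversions = sum(cohorts[c][variant][1] for c in cohorts)
    return conversions / visitors

def cohort_rate(cohort, variant):
    visitors, conversions = cohorts[cohort][variant]
    return conversions / visitors

print(f"overall: A={overall_rate('A'):.2%} B={overall_rate('B'):.2%}")
# -> overall: A=3.88% B=3.75%  (A "wins")
print(f"high intent: A={cohort_rate('high_intent', 'A'):.2%} "
      f"B={cohort_rate('high_intent', 'B'):.2%}")
# -> high intent: A=10.00% B=15.00%  (A loses where it matters most)
```

A dashboard showing only the first line ships A, and the high-intent regression never surfaces.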
This isn’t a statistics problem.
It’s a modelling problem.
Behaviour Happens Before Conversions
Most analytics systems anchor on terminal events:
- purchase
- signup
- submit
But behaviour doesn’t start there.
The meaningful signals appear earlier:
- hesitation
- comparison
- repeated scrolling
- policy checking
- copying product details
- bouncing between tabs
These are not noise.
They are state transitions.
And state transitions are what systems should respond to.
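One way to make that concrete is a transition table keyed on (state, event) pairs. The state names and events below are illustrative, not a fixed taxonomy:

```python
# Behavioural events drive state transitions, long before any
# terminal conversion event fires.
TRANSITIONS = {
    ("browsing", "repeated_scroll"):    "hesitating",
    ("browsing", "copy_details"):       "comparing",
    ("hesitating", "policy_check"):     "evaluating_risk",
    ("comparing", "tab_return"):        "high_intent",
    ("evaluating_risk", "tab_return"):  "high_intent",
}

def advance(state, event):
    # Unknown (state, event) pairs leave the state unchanged.
    return TRANSITIONS.get((state, event), state)

state = "browsing"
for event in ["repeated_scroll", "policy_check", "tab_return"]:
    state = advance(state, event)
print(state)  # -> "high_intent"
```

The point is that the session reaches "high_intent" while the analytics system anchored on terminal events still sees nothing.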
This insight didn’t come from dashboards.
It came from building systems that had to react to behaviour as it was happening, not after the fact.
The Shift From Experiments to Control Systems
While building Zyro, we stopped thinking in terms of “experiments” and started thinking in terms of control systems.
A control system:
- observes signals continuously
- reacts in real time
- adapts based on feedback
- never “finishes” optimisation
In this model:
- A/B testing becomes one input, not the governing mechanism
Multi-armed bandits, intent scoring, and source-aware routing aren’t replacements for testing.
They’re what naturally emerges when optimisation is treated as a live system instead of a batch process.
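As a sketch of the bandit side of this, here is a minimal epsilon-greedy implementation. The arm names and the epsilon value are placeholders; real systems typically use something more sample-efficient like Thompson sampling:

```python
import random

class EpsilonGreedyBandit:
    """Mostly exploit the best-known variant, occasionally explore.
    Optimisation never 'finishes' -- every reward updates the policy."""

    def __init__(self, arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}   # running mean reward per arm

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))   # explore
        return max(self.values, key=self.values.get)  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n

bandit = EpsilonGreedyBandit(["A", "B"], epsilon=0.1)
arm = bandit.choose()
bandit.update(arm, reward=1.0)  # e.g. reward from an observed conversion
```

Contrast this with the batch model: there is no "wait for significance, then deploy" step, just a continuous feedback loop.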
Why This Is a Developer Problem (Not a Marketing One)
Most growth tooling is designed for dashboards, not systems.
Developers care about:
- feedback loops
- latency
- reliability
- failure modes
- state awareness
Once you view optimisation through that lens, it becomes obvious why manual experimentation struggles:
- it’s slow
- it’s coarse-grained
- it reacts after the fact
- it assumes stationarity where none exists
Modern systems don’t wait for significance.
They adapt.
The Direction This Is Heading
The next generation of websites won’t run “tests” in the traditional sense.
They’ll behave more like adaptive systems:
- source-aware rendering
- intent-weighted decisions
- continuous learning
- server-enriched signals
- automated feedback into acquisition channels
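A rough sketch of what source-aware rendering can look like at the routing layer. Every source name, variant name, and threshold here is hypothetical, chosen only to mirror the visitor examples earlier in the post:

```python
# Illustrative routing: traffic source plus session state picks the
# rendering strategy, instead of one global winner for all visitors.
def pick_variant(source, returning=False, intent_score=0.0):
    if returning:
        return "resume_context"       # carry session history forward
    if source == "tiktok":
        return "instant_hook"         # low-patience traffic
    if source == "google" and intent_score < 0.5:
        return "comparison_layout"    # research mindset
    if source == "chatgpt":
        return "skip_intro"           # already primed with context
    return "default"

print(pick_variant("tiktok"))  # -> "instant_hook"
```

In a live system the branches would be learned rather than hand-written, but the shape is the same: the decision is conditioned on who arrived, not averaged over everyone.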
This isn’t theory.
It’s the architecture you end up with when you try to make optimisation actually match how users behave.
Final Thought
A/B testing didn’t fail.
It succeeded so well that we stopped questioning its assumptions.
But as traffic becomes more fragmented and behaviour more stateful, optimisation needs to evolve from experiments into systems.
When an abstraction starts leaking, the fix isn’t more tooling.
It’s a better model.
And that’s the kind of problem developers are very good at solving.