The 200-Conversion Mirage
You ship a new checkout button. After 200 conversions, the p-value hits 0.03. Your manager celebrates. You push to production.
Two weeks later, the "winning" variant is underperforming the control by 8%. What happened?
The culprit isn't bad luck — it's a fundamental mismatch between frequentist hypothesis testing and small-sample reality. Most A/B test calculators assume you're flipping a coin thousands of times. But early-stage products, B2B funnels, and niche features rarely see that volume. And that's where the math breaks down in ways most data analysts never learned in school.
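The breakdown is easy to reproduce. Below is a minimal simulation sketch (all numbers hypothetical: 5% conversion on both arms, a peek at the p-value every 200 visitors) of the workflow from the intro, running a two-proportion z-test repeatedly and shipping at the first p < 0.05, even though there is no real difference between the variants:

```python
import math
import random

random.seed(0)

def z_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a standard two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) is the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

TRUE_RATE = 0.05    # both variants convert at 5%: no real difference
VISITORS = 2000     # visitors per arm, per experiment
CHECK_EVERY = 200   # "peek" at the test after every 200 visitors
RUNS = 2000         # number of simulated experiments

false_positives = 0
for _ in range(RUNS):
    ca = cb = 0
    for n in range(1, VISITORS + 1):
        ca += random.random() < TRUE_RATE
        cb += random.random() < TRUE_RATE
        # Stop and "ship" the moment the test looks significant
        if n % CHECK_EVERY == 0 and z_pvalue(ca, n, cb, n) < 0.05:
            false_positives += 1
            break

print(f"False positive rate with peeking: {false_positives / RUNS:.1%}")
```

The printed rate lands far above the nominal 5%, and it grows with the number of peeks: each look is another chance for noise to cross the threshold. The fixed-sample p-value is only valid if you pick the sample size in advance and look exactly once.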
Why p < 0.05 Doesn't Mean What You Think
The classical frequentist test answers: "If there were truly no difference, how often would I see a result this extreme?"
But that's not the question you actually care about. What you want to know is the reverse: given the result I just observed, how likely is it that the variant is really better? A p-value of 0.03 does not mean there's a 97% chance your button wins.
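A hedged simulation makes the gap concrete. Assuming, hypothetically, that only 10% of your experiments test a genuinely better variant (5% baseline conversion, a 1.5-point true lift for real winners, 2,000 visitors per arm), we can count how many p < 0.05 "winners" come from experiments where nothing actually changed:

```python
import math
import random

random.seed(1)

def z_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a standard two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

PRIOR_REAL = 0.10   # hypothetical: 10% of experiments test a real effect
BASE, LIFT = 0.05, 0.065   # control converts at 5%; real winners at 6.5%
N = 2000            # visitors per arm (fixed in advance, one look)
RUNS = 2000         # number of simulated experiments

sig_null = sig_real = 0
for _ in range(RUNS):
    real = random.random() < PRIOR_REAL
    rate_b = LIFT if real else BASE
    ca = sum(random.random() < BASE for _ in range(N))
    cb = sum(random.random() < rate_b for _ in range(N))
    if z_pvalue(ca, N, cb, N) < 0.05:
        if real:
            sig_real += 1
        else:
            sig_null += 1

share_null = sig_null / (sig_null + sig_real)
print(f"Significant 'winners' that were actually null: {share_null:.0%}")
```

Even though the test itself is perfectly calibrated here, a large share of the shipped "winners" are pure noise whenever true improvements are rare. That is exactly the two-weeks-later regression from the intro: the p-value tells you how surprising the data is under the null, not how likely the null is.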