DEV Community

TildAlice

Posted on • Originally published at tildalice.io

A/B Test False Positives: p=0.03 with 50 Users Explained

The 200-Conversion Mirage

You ship a new checkout button. After 200 conversions, the p-value hits 0.03. Your manager celebrates. You push to production.

Two weeks later, the "winning" variant is underperforming the control by 8%. What happened?

The culprit isn't bad luck — it's a fundamental mismatch between frequentist hypothesis testing and small-sample reality. Most A/B test calculators assume you're flipping a coin thousands of times. But early-stage products, B2B funnels, and niche features rarely see that volume. And that's where the math breaks down in ways most data analysts never learned in school.
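One concrete way this breaks is optional stopping: checking the p-value as data comes in and shipping the moment it dips under 0.05. A minimal stdlib-only simulation can show the effect. This is a sketch, not a production test harness; the conversion rate, batch size, and peeking cadence below are made-up illustration values, and the two-proportion z-test is implemented by hand so nothing outside the standard library is needed.

```python
import math
import random

def two_prop_p(x1, n1, x2, n2):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no conversions yet in either arm
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def peeking_trial(rate=0.05, batch=100, max_n=2000, alpha=0.05):
    """One A/A test: both arms share the same true rate, so any
    'significant' result is a false positive. We peek after every
    batch and stop at the first p < alpha, like the checkout story."""
    xa = xb = na = nb = 0
    while na < max_n:
        xa += sum(random.random() < rate for _ in range(batch))
        xb += sum(random.random() < rate for _ in range(batch))
        na += batch
        nb += batch
        if two_prop_p(xa, na, xb, nb) < alpha:
            return True  # would have shipped a phantom winner
    return False

random.seed(0)
trials = 1000
fp = sum(peeking_trial() for _ in range(trials)) / trials
print(f"false-positive rate with peeking: {fp:.0%}")
```

Run enough trials and the false-positive rate climbs well above the nominal 5%, because each peek is another chance for noise to cross the threshold. A single pre-registered look at a fixed sample size would keep it at roughly 5%.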

Photo by Tima Miroshnichenko on Pexels

Why p < 0.05 Doesn't Mean What You Think

The classical frequentist test answers: "If there were truly no difference, how often would I see a result this extreme?"

But that's not the question you actually care about. What you want to know is the probability that your variant is genuinely better, given the data you observed.
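A back-of-the-envelope Bayes calculation shows how far apart the two questions can be. The base rate, power, and alpha below are hypothetical numbers chosen for illustration, not measurements:

```python
# Hypothetical assumptions: 10% of tested variants have a real effect,
# the test has 50% power to detect that effect, and alpha = 0.05.
prior_real = 0.10
power = 0.50
alpha = 0.05

# P(significant) = P(real) * P(sig | real) + P(null) * P(sig | null)
p_sig = prior_real * power + (1 - prior_real) * alpha

# Bayes' rule: what fraction of significant results are real effects?
p_real_given_sig = prior_real * power / p_sig
print(f"P(real effect | p < 0.05) = {p_real_given_sig:.0%}")  # → roughly 53%
```

Under these assumptions, nearly half of your "significant" winners are noise, even though every one of them cleared p < 0.05. The p-value never claimed otherwise; it just answers a different question.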


Continue reading the full article on TildAlice
