Dhruv Khatri

Why Most A/B Tests Fail (And How to Run Ones That Don't)

Here's a number that should give you pause: industry research suggests that up to 80% of A/B tests produce inconclusive results. That means the majority of testing effort generates no actionable insight.

This isn't because A/B testing doesn't work. It works extraordinarily well — when done correctly. The problem is that most teams unknowingly break the rules that make testing effective.

Let's walk through the most common failure modes and how to avoid every one of them.

Failure Mode 1: Stopping the Test Too Early

The mistake: You launch a test, check it after 3 days, see that Variant B is 30% ahead, and call it a winner.

Why it fails: You've likely hit a false positive. Early data in A/B tests is noisy. Day-of-week effects, campaign spikes, or random fluctuations create temporary leads that often reverse over time.

The fix:

  • Run every test for a minimum of two full calendar weeks (14 days), so day-of-week effects average out
  • Don't look at results more than once per week
  • Wait until you've reached 95% statistical confidence before declaring a winner
  • Lemora (https://lemora.cloud) automatically flags when a test has reached significance
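To make the 95% threshold concrete, here is a minimal two-proportion z-test using only the Python standard library. The visitor and conversion counts are invented for illustration; for very small samples, the normal approximation used here is not reliable:

```python
import math

def significance(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test; returns (z statistic, two-sided p-value).

    Uses the pooled-proportion normal approximation, which assumes
    reasonably large samples in both variants.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical test: 5.0% vs 6.25% conversion over 4,000 visitors each.
z, p = significance(200, 4000, 250, 4000)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 95%: {p < 0.05}")
```

A p-value below 0.05 corresponds to the 95% confidence bar above; note that checking this repeatedly on accumulating data reintroduces the early-stopping problem, which is why the once-per-week rule matters.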

Failure Mode 2: Testing Too Many Variables at Once

The mistake: You redesign your entire homepage and test it against the original.

Why it fails: Even if the new version wins, you have no idea which change caused the improvement.

The fix: Change one variable per test. Run multiple changes sequentially.

Failure Mode 3: Insufficient Traffic

The mistake: Running a test on a page that gets 200 visitors per month.

The fix: Calculate the required sample size before you launch. The lower your baseline conversion rate and the smaller the lift you want to detect, the more visitors you need — a 10% relative lift at 95% confidence can require tens of thousands of visitors per variant. Focus on your highest-traffic pages first.
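You can compute the required traffic yourself with the standard two-proportion sample-size formula. This sketch assumes a two-sided test at 95% confidence and 80% power (both conventional defaults, not figures from this article) and uses only the Python standard library:

```python
import math
from statistics import NormalDist

def required_sample_size(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant to detect a relative lift.

    Standard two-proportion formula: n = (z_a + z_b)^2 * var / delta^2,
    for a two-sided test at significance `alpha` and the given power.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 for power=0.80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# A 10% relative lift on a 5% baseline needs tens of thousands per variant;
# only much larger lifts come in under ~1,000.
print(required_sample_size(0.05, 0.10))
print(required_sample_size(0.05, 1.00))
```

Running the numbers like this before launch tells you immediately whether a 200-visitor-per-month page can ever reach significance.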

Failure Mode 4: Testing Without a Hypothesis

The mistake: "Let's try a green button and see if it performs better."

The fix: Every test should start with: "Because [we observed X], we believe changing [element] to [variant] will [improve metric] because [reason]."
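The template is easy to enforce in code. This tiny sketch renders it as a string — every name and value below is hypothetical:

```python
def hypothesis(observation, element, variant, metric, reason):
    """Render the hypothesis template from Failure Mode 4."""
    return (f"Because we observed {observation}, we believe changing "
            f"{element} to {variant} will {metric} because {reason}.")

# Hypothetical example:
print(hypothesis(
    "60% of mobile users abandon the signup form",
    "the signup form",
    "a single-field variant",
    "improve completion rate",
    "fewer fields reduce friction",
))
```

Requiring every test to pass through a function like this makes it impossible to launch a "let's see what happens" test by accident.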

Failure Mode 5: Testing Insignificant Elements

The fix: Prioritize using the PIE framework:

  • Potential: How much room for improvement exists?
  • Importance: How much traffic touches this element?
  • Ease: How easy is this test to implement?
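A PIE prioritization pass can be as simple as scoring each candidate test 1–10 on the three criteria and ranking by the average. The candidates and scores below are invented for illustration:

```python
# Candidate test -> (Potential, Importance, Ease), each scored 1-10.
candidates = {
    "homepage hero headline": (8, 9, 7),
    "footer link color":      (2, 3, 9),
    "pricing page CTA copy":  (7, 8, 8),
}

def pie_score(potential, importance, ease):
    """PIE priority: the mean of the three criteria scores."""
    return round((potential + importance + ease) / 3, 1)

# Rank candidates from highest to lowest PIE score.
ranked = sorted(candidates.items(),
                key=lambda kv: pie_score(*kv[1]), reverse=True)
for name, scores in ranked:
    print(f"{pie_score(*scores):4}  {name}")
```

Low-Importance tweaks like the footer link sink to the bottom, which is exactly the point of the framework.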

Failure Mode 6: Ignoring Segmentation

The fix:

  • Always segment results by device type (mobile vs. desktop)
  • Check: new vs. returning visitors, traffic source, geographic region
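To see why segmentation matters, here is a tiny example with made-up visit data, where the aggregate conversion rate hides a mobile/desktop gap:

```python
from collections import defaultdict

# Hypothetical per-visitor outcomes: (device, converted) pairs.
visits = [
    ("mobile", True), ("mobile", False), ("mobile", False), ("mobile", False),
    ("desktop", True), ("desktop", True), ("desktop", False), ("desktop", False),
]

totals = defaultdict(lambda: [0, 0])  # device -> [conversions, visitors]
for device, converted in visits:
    totals[device][0] += converted  # bool counts as 0 or 1
    totals[device][1] += 1

rates = {device: conv / n for device, (conv, n) in totals.items()}
print(rates)
```

Here the blended rate is 37.5%, but desktop converts at double the mobile rate — a difference a single top-line number would never surface.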

Failure Mode 7: Not Documenting Results

The fix: Keep a testing log with hypothesis, variants, results, and learnings.
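One lightweight way to keep such a log is a small record type per test. Everything in the entry below — the test name, result figures, and learnings — is a hypothetical example, not real data:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TestLogEntry:
    """One row of the testing log: hypothesis, variants, outcome, lessons."""
    name: str
    hypothesis: str
    variants: list
    result: str
    learnings: str
    ended: date = field(default_factory=date.today)

log = [
    TestLogEntry(  # hypothetical example entry
        name="Homepage CTA copy",
        hypothesis=("Because exit surveys cite unclear pricing, we believe "
                    "changing the CTA to 'See pricing' will improve clicks "
                    "because it sets a clearer expectation."),
        variants=["Get started", "See pricing"],
        result="Variant B won on clicks at 95% confidence",
        learnings="Concrete CTAs outperform generic ones on this page.",
    )
]
print(log[0].name, "-", log[0].result)
```

Even a list of entries like this, serialized to a file, prevents the classic failure of re-running a test someone already ran a year ago.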

The Simple Checklist

  • Is my hypothesis grounded in data?
  • Am I only changing one variable?
  • Do I have enough traffic to reach significance?
  • Am I committing to running for at least 2 weeks?
  • Am I segmenting results by device and traffic source?
  • Have I documented this test?

If you can check every box, your test is set up to succeed.

Run your next test correctly with Lemora — built-in significance tracking, segmentation, and testing logs included.
