Why most ecommerce A/B tests generate nothing
A/B testing is one of the highest-leverage activities in ecommerce. Run the right test, get a statistically significant result, and you can generate revenue from the same traffic you already have - permanently, with no ongoing cost.
At Ovoko, 4 A/B tests generated over 1.6M euros in incremental GMV. Not from a massive test program - from 4 well-chosen, well-run tests.
But most ecommerce brands run A/B tests and see nothing. Zero lift. Inconclusive results. Wasted time. This is almost always a prioritization and process problem, not a lack of ideas.
Start with the ICE prioritization framework
Before running any test, score your hypotheses using ICE:
Impact - If this test wins, how much revenue does it generate? (1-10)
Confidence - How confident are you this will win, based on data and research? (1-10)
Ease - How easy is this to implement and run? (1-10)
Multiply the three scores to get a priority number. Run the highest-scoring hypotheses first. This sounds obvious but most teams test what's easy to build rather than what's likely to move revenue.
A quick example:
Hypothesis: Add payment FAQ block to checkout page
Impact: 8 (checkout abandonment is high, payment concerns are common)
Confidence: 7 (heatmap data shows users hover near payment section, exit surveys mention payment concerns)
Ease: 6 (requires dev work but manageable)
ICE score: 336
This is exactly the type of test Ovoko ran - and it generated a 15% uplift and 185,000 euros in annual incremental GMV.
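ICE scoring is simple enough to live in a spreadsheet, but here's a sketch of the same ranking logic in Python (the backlog entries and scores are illustrative, not Ovoko's actual numbers):

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    name: str
    impact: int      # 1-10: revenue potential if the test wins
    confidence: int  # 1-10: strength of supporting data and research
    ease: int        # 1-10: how easy it is to implement and run

    @property
    def ice(self) -> int:
        # Multiply the three scores to get the priority number
        return self.impact * self.confidence * self.ease

backlog = [
    Hypothesis("Payment FAQ block on checkout", impact=8, confidence=7, ease=6),
    Hypothesis("Sticky Add to Cart bar on mobile PDP", impact=7, confidence=5, ease=8),
    Hypothesis("Homepage hero redesign", impact=4, confidence=3, ease=5),
]

# Run the highest-scoring hypotheses first
for h in sorted(backlog, key=lambda h: h.ice, reverse=True):
    print(f"{h.ice:4d}  {h.name}")
```

The point isn't the tooling - it's that the ranking is explicit and written down, so "what's easy to build" stops silently winning over "what's likely to move revenue."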
What to test first: the revenue hierarchy
Not all pages are created equal for A/B testing. Here's where to focus, in order of likely revenue impact:
1. Checkout flow (highest impact)
The checkout is where buyers become customers. Every friction point here has a direct, measurable revenue cost. Start here.
High-value hypotheses:
Adding trust signals (testimonials, security badges) to checkout pages
Reducing the number of form fields
Adding a payment FAQ accordion
Testing express checkout button placement and prominence
Testing order summary layout and what information is shown
2. Product detail pages (high impact)
PDPs are where purchasing decisions are made. Small changes here - image order, CTA copy, social proof placement - can have outsized effects because every visitor who reaches this page is already considering the product.
High-value hypotheses:
Moving reviews above the fold on mobile
Testing "Add to Cart" vs "Buy Now" as primary CTA
Adding a benefit-focused bullet list vs. paragraph descriptions
Testing shipping/return policy visibility near the CTA
Lifestyle vs. product-only primary hero image
3. Collection pages (medium impact)
Collection pages affect which products get visibility and whether visitors find what they're looking for. Tests here tend to have lower per-visitor impact but affect more of your traffic.
4. Homepage (lower impact than you'd think)
Most ecommerce traffic lands on PDPs or collection pages, not the homepage. Homepage tests often show impressive relative lifts but affect a small percentage of your actual purchasing traffic. Run homepage tests after you've extracted value from checkout and PDP.
Sample hypotheses worth testing
Here are 10 proven hypotheses to add to your testing backlog. Adapt them to your store - don't copy verbatim.
Add customer testimonials to the checkout page. Shoppers experience peak doubt at checkout. Social proof at this exact moment reduces abandonment. At Ovoko, this test alone drove a 4.27% uplift.
Show "X people bought this in the last 24 hours" on PDPs. Social proof reduces perceived risk for hesitant buyers.
Replace generic "Add to Cart" with outcome-focused copy. "Start my free trial," "Get the kit," "Reserve mine" - outcome-oriented CTAs often outperform generic ones.
Add a "Why buy from us?" section above the fold on the homepage. Differentiation reduces comparison shopping to competitors.
Show the discount savings in dollar terms, not percentage. "$40 off" is more motivating than "20% off" for higher-priced items. Test which works for your price point.
Move star rating to the very top of the PDP, before the product title. Reviews are the first thing many buyers look for.
Add a sticky "Add to Cart" bar on mobile PDPs. Keeps the CTA accessible as visitors scroll through reviews and description.
Test free shipping threshold messaging. "You're $12 away from free shipping" in the cart is a proven AOV lifter.
Add a product comparison table for key features. Reduces decision paralysis for stores with multiple variants.
Test a "Most popular" badge on your best-selling product. Reduces choice paralysis on collection pages.
How to read your results correctly
This is where most teams go wrong. A/B testing is statistics, and statistics can lie to you if you don't understand what you're looking at.
Statistical significance is not enough
95% confidence is the standard. But if you stop a test the moment you hit 95% confidence, you're falling for "peeking." Decide your sample size before you start and run the full test. Stopping early inflates false positives significantly.
Calculate your required sample size upfront
Use a calculator like Evan Miller's A/B test calculator. Input your baseline conversion rate, the minimum lift you care about, your desired confidence level, and statistical power (80% is the usual default). This gives you the number of visitors per variant you need before the test can give a reliable result.
For a Shopify store converting at 2%, detecting a 0.3 percentage point improvement (2% to 2.3%) at 95% confidence and 80% power requires roughly 37,000 visitors per variant - about 74,000 in total. If you're getting 500 sessions a day, that's nearly five months for a single test. This is why prioritization matters so much - you can only run a few tests per year at low-traffic stores, and you have to make them count.
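Under the hood, such calculators implement the standard two-proportion sample-size formula. A sketch of the normal-approximation version (assuming a two-sided test; defaults of 95% confidence and 80% power):

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p_base, mde_abs, alpha=0.05, power=0.80):
    """Visitors per variant needed to detect an absolute lift of `mde_abs`
    over baseline conversion rate `p_base` with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    p_alt = p_base + mde_abs
    variance = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)

# 2% baseline conversion, 0.3 percentage point minimum detectable effect
print(sample_size_per_variant(0.02, 0.003))
```

Notice how the required sample scales with the inverse square of the effect size: halving the minimum detectable lift roughly quadruples the traffic you need, which is exactly why small stores should chase big swings, not micro-optimizations.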
Look beyond overall conversion rate
When you analyze results, don't just look at the overall conversion rate. Also check:
Revenue per visitor (accounts for AOV differences between variants)
Conversion rate by device (a change might win on desktop but lose on mobile)
Conversion rate by new vs. returning visitors (some changes work better for first-time buyers)
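A minimal sketch of this kind of slicing, using made-up visitor-level records of the form (variant, device, converted, revenue):

```python
from collections import defaultdict

# Illustrative visitor-level results - replace with your own export
visits = [
    ("control", "mobile", True, 42.0),
    ("control", "desktop", False, 0.0),
    ("control", "desktop", True, 65.0),
    ("variant", "mobile", True, 38.0),
    ("variant", "mobile", False, 0.0),
    ("variant", "desktop", True, 90.0),
]

def summarize(rows, key=lambda r: (r[0],)):
    """Revenue per visitor and conversion rate for each group."""
    stats = defaultdict(lambda: [0, 0, 0.0])  # visitors, conversions, revenue
    for row in rows:
        s = stats[key(row)]
        s[0] += 1
        s[1] += row[2]
        s[2] += row[3]
    return {k: {"rpv": rev / n, "cr": conv / n}
            for k, (n, conv, rev) in stats.items()}

# Overall revenue per visitor and conversion rate by variant
print(summarize(visits))
# Sliced by device - a change can win on desktop and lose on mobile
print(summarize(visits, key=lambda r: (r[0], r[1])))
```

Revenue per visitor is the headline metric here: a variant can "win" on conversion rate while quietly shifting buyers toward cheaper items, and only an RPV view catches that.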
Document everything, ship only clear winners
Keep a log of every test: hypothesis, date range, sample size, result, and what you learned. Inconclusive tests are not failures - they're information. And only ship variants that show a clear win. Shipping a test that "almost" reached significance is how stores make their sites worse over time.
Tools to run A/B tests on Shopify
VWO - Most flexible for complex tests, good analytics integration
Convert - GDPR-friendly, good for European stores
Intelligems - Built specifically for Shopify, handles checkout testing well
Neat A/B Testing - Budget option for smaller stores
Avoid Google Optimize - it was shut down in 2023. And be cautious of app-based testing tools that slow your page down: the added script weight hurts conversion for every visitor in the test, and rendering flicker can penalize the variant specifically, skewing your results.
Get your first test hypotheses ready
I've put together a pack of 50 battle-tested A/B test hypotheses specifically for ecommerce, prioritized by impact area and including implementation notes. It's available on Gumroad if you want a ready-made backlog to start from.
Get the A/B test hypothesis pack on Gumroad
And if you want help building a proper testing program for your store - choosing tools, setting up your backlog, interpreting results - that's exactly what I do at Inspate. The first audit is free.