The ABC's of A/B Testing

TL;DR

You should probably be A/B testing features.
Third party services may open you up to some engineering liabilities.
These services may be incompatible with your shiny new JS framework.
You can write A/B testing components in React.
Third party services may also be slowing down your site and causing flickering page loads for users.
Allocating and tracking cohorts isn’t that hard either and allows you to use server-side rendering.
DIY A/B testing FTW.

Why would you even do this?

Working at a young start-up, we’re strongly user-centric. Users keep our business afloat, which means we keep our jobs. Unfortunately though, we are not our users. Despite our decades of collective industry experience, we can’t always foresee how they will receive changes and new features. Our proximity to our product, and even to the web at large, can bias our perspective and prevent us from predicting their behavior.

It seems like the industry as a whole is coming to this realization and adopting A/B testing as a solution. We’ve found it to be more effective than ad-hoc, in-person user testing with sample groups that don’t accurately reflect our overall user base, and it's clearly superior to guessing how our users will respond.

The first terrible idea

Our initial approach was to use a third party service, which seemed pretty ideal at first. It had a low integration cost and allowed experiments to be created without any engineering effort, but we quickly discovered that it came with some significant drawbacks. The most apparent was the terrifying menu in the experiment editor that included items like `Insert HTML` and `Move and Resize`. Giving someone outside of engineering control over our DOM felt like a significant liability. This editor also didn't seem very aware of the responsive web, and it required that whoever managed experiments have a good knowledge of engineering, an awareness of best practices, and an understanding of how our layouts worked, which was far outside of their normal expertise. Without that experience, they could easily break a layout or hurt performance by testing with a 4MB image.

Shiny new toys

When we introduced React, our old testing framework stopped working entirely. Both manipulated the DOM, but React’s lifecycle would end up overwriting the changes made by our tests. We found that we could still allocate cohorts for experiments and monitor results, so we wrote a lightweight component to handle displaying variations.

render() {
  return (
   <p>
     <ExperimentVariation name={this.experiment} variation='Default'>
        Fancy text
     </ExperimentVariation>

     <ExperimentVariation name={this.experiment} variation='Variation 1'>
        Other fancy text
     </ExperimentVariation>
   </p>
  );
}

Note that Default and Variation 1 were the naming conventions used in the third party service, not our choice. While having each ExperimentVariation as its own component may seem redundant, it means ExperimentVariations aren't forced to be siblings. Consider the following contrived example:

render() {
  return (
   <article>
     <ExperimentVariation name={this.experiment} variation='Default'>
        <h1>Pay us money</h1>
     </ExperimentVariation>

     <img src='money.png' />

     <div>
       <ExperimentVariation name={this.experiment} variation='Variation 1'>
          <h1>Give us money</h1>
       </ExperimentVariation>

       <p>Not lorem ipsum</p>
     </div>
   </article>
  );
}

The actual ExperimentVariation is a pretty lightweight component.

import React, { Component, PropTypes } from 'react';
import ExperimentStore from 'stores/ExperimentStore';

export default class ExperimentVariation extends Component {
  static propTypes = {
    name: PropTypes.string,
    variation: PropTypes.string,
  };

  state = {
    isActive: false,
  };

  componentDidMount() {
    this.setState({
      isActive: ExperimentStore.isActiveVariation(this.props.name, this.props.variation),
    });
  }

  render() {
    return (this.state.isActive)
      ? <span>{this.props.children}</span>
      : null;
  }
}

I'll spare you the jungle that is the ExperimentStore. It mainly just parses the giant global object the third party service sticks in our page and does a key-value comparison to determine if an experiment is active.
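For a rough idea of what that comparison looks like, here is a minimal sketch. It is not our actual store: the shape of the service's global object isn't shown here, so this assumes it can be reduced to a plain map of experiment names to the variation the visitor was assigned. `createExperimentStore` and the `headline-copy` experiment are hypothetical names made up for the example.

```javascript
// Assumption: the third party global has already been boiled down to a map like
// { 'experiment name': 'assigned variation' }. The real object is far messier.
function createExperimentStore(activeVariations = {}) {
  return {
    // True only when the experiment is present and the visitor is in this variation.
    isActiveVariation(name, variation) {
      return Object.prototype.hasOwnProperty.call(activeVariations, name)
        && activeVariations[name] === variation;
    },
  };
}

// Hypothetical data: this visitor was placed in 'Variation 1' of 'headline-copy'.
const ExperimentStore = createExperimentStore({ 'headline-copy': 'Variation 1' });
```

With a store like this, `ExperimentStore.isActiveVariation(this.props.name, this.props.variation)` in the component above comes down to a single lookup and a string comparison.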

The break-up

This solution was pretty quick to stand up, and it meant we didn't have to rebuild cohort allocation or measure results, but we quickly noticed more drawbacks with the service we were using.

The first was that it limited all experiments to the client, meaning that anything we tested wouldn't show up until DOMContentLoaded. That forced us to either test only below the fold or leave our users with a flickering page. Not ideal.

We also discovered in a site speed audit that this third party service was contributing a significant amount of page weight and causing long rendering frames, since its normal job is to manipulate the DOM. A second major strike.

Overly-optimistic engineering

So now we're in the process of building our own internal testing framework. Conceptually it's rather simple. You create DB-backed experiments that store the percentages allocated to each cohort. You cookie new users when they visit and check against that value to determine which ExperimentVariation they see. We're able to eliminate flickers since we server-side render React. We also bootstrap our Stores so data on the server and client is in parity. Even if you're not doing that, the server and client will still have access to the cohort cookie and can stay in sync. We're also doing a significant amount of event tracking, so gathering results is fairly trivial. Most sites are already using some form of tracking, even if it's just Google Analytics. While it may not give you the nice dashboard a third party A/B testing framework will, it still allows you to quantify results.
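To make the allocation step a little more concrete, here is a rough sketch of how it could work. This is an illustration under assumptions, not our framework: the experiment record shape, the `exp_` cookie prefix, and the `pickCohort` and `getOrAssignCohort` helpers are all hypothetical. Cookies are passed in as a plain object so the same function can run on the server and the client.

```javascript
// Hypothetical experiment record, as it might come back from the database.
const headlineExperiment = {
  name: 'headline-copy',
  cohorts: [
    { name: 'Default', percentage: 50 },
    { name: 'Variation 1', percentage: 50 },
  ],
};

// Map a roll in [0, 100) onto the cohort whose cumulative percentage range contains it.
function pickCohort(experiment, roll) {
  let upperBound = 0;
  for (const cohort of experiment.cohorts) {
    upperBound += cohort.percentage;
    if (roll < upperBound) return cohort.name;
  }
  // Percentages added up to less than 100 and the roll landed outside them:
  // treat the visitor as not enrolled in the experiment.
  return null;
}

// Reuse the cohort already stored in the visitor's cookies, or allocate a new one.
function getOrAssignCohort(cookies, experiment, random = Math.random) {
  const cookieName = `exp_${experiment.name}`;
  const validNames = experiment.cohorts.map((cohort) => cohort.name);
  if (validNames.includes(cookies[cookieName])) {
    return { cohort: cookies[cookieName], cookieToSet: null };
  }
  const cohort = pickCohort(experiment, random() * 100);
  return {
    cohort,
    // The caller is responsible for actually writing this cookie on the response.
    cookieToSet: cohort === null ? null : { name: cookieName, value: cohort },
  };
}
```

Returning the cookie to set, rather than writing it directly, keeps the function free of any particular server framework. Injecting `random` makes the allocation easy to test with a fixed roll.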

This may be a gross over-simplification. We'll find out soon enough, but the major takeaway is: if you've bought into A/B testing, you may want to think twice before committing to a third party service. They can introduce significant liability and adversely affect your site performance, and given the limited amount of effort required to build your own, it may be worth the trade-off.
