How do you approach split testing? Any words of wisdom?

I taught a bunch of things at Twitter, among them experimentation. My slides, sadly, are not public.

Fortunately, this is a pretty great presentation on the topic.

My TL;DR for experimentation is:

  1. It's really not applicable for low scale unless you're looking for MASSIVE effects. Figure that you need around 10K DAU to be able to run an experiment every week, looking for normal-sized effects.

  2. At scale, it's crazy powerful for changes in the small, but can't really tell you what large changes to make. This means, of course, that it should be used as one valuable component in a portfolio of techniques.

  3. It is extremely easy to do wrong, and can be of negative value in that case. Talk to someone who has done it before.
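
To get a rough feel for why point 1 holds, here is a minimal sketch of the standard two-proportion sample size approximation. The 10% baseline conversion rate, the lift sizes, and the function name `users_per_arm` are all assumptions I picked for illustration, not numbers from my slides; plug in your own.

```python
from math import ceil, sqrt
from statistics import NormalDist


def users_per_arm(baseline, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in EACH arm of a two-sided A/B test.

    baseline      -- control conversion rate (assumed, e.g. 0.10)
    relative_lift -- smallest relative change worth detecting (e.g. 0.10 = +10%)
    alpha, power  -- conventional defaults, not requirements
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # Normal-approximation formula for comparing two proportions.
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical lifts: small, moderate, massive.
    for lift in (0.02, 0.10, 0.50):
        print(f"+{lift:.0%} relative lift: {users_per_arm(0.10, lift)} users per arm")
```

The required sample grows roughly with the inverse square of the effect size, so halving the effect you want to detect roughly quadruples the traffic you need. That is why small sites can only see massive effects. It is only an approximation, and it says nothing about the ways an experiment can go wrong in point 3.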