This is the first in a series of posts which aim to make clear a few commonly used phrases in Conversion Optimisation statistics and debunk a few myths around what you can and can’t infer from your test statistics. This jargon buster series uses examples from A/B testing tool Optimizely but the explanations can apply to any statistics in testing.
Part One: Confidence Intervals & Confidence Limits in Testing
Ever seen the ± values on an A/B split test and wondered what they mean? The values to the right of the conversion rate are what we call ‘confidence intervals’ (strictly speaking, each ± figure is the margin of error, and the interval is the range it produces). These are the values that, when added to and subtracted from the test conversion rate, give us confidence limits. Confidence limits are the two ends of a range within which we can say, with relative safety, that the ‘true’ value of the conversion rate lies.
Okay, let me put it another way. Let’s assume we’re testing at a 95% confidence level. The confidence limits on ‘Variation #1’ above then mean: “I’m 95% confident that the true conversion rate of Variation #1 lies between 2.87% (which is 3.26% - 0.39 percentage points) and 3.65% (which is 3.26% + 0.39 percentage points).”
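To make the arithmetic concrete, here is a minimal Python sketch that turns a conversion rate and its ± value into confidence limits. The function name confidence_limits is just an illustrative choice, not something taken from Optimizely; the figures are the ones quoted for Variation #1.

```python
def confidence_limits(rate_pct, margin_pct_points):
    """Return (lower, upper) confidence limits, in percent.

    rate_pct: observed conversion rate, e.g. 3.26 for 3.26%
    margin_pct_points: the +/- value shown next to it, in percentage points
    """
    lower = rate_pct - margin_pct_points
    upper = rate_pct + margin_pct_points
    # Round to two decimals to match how the rates are displayed.
    return round(lower, 2), round(upper, 2)


lower, upper = confidence_limits(3.26, 0.39)
print(f"Variation #1: {lower:.2f}% to {upper:.2f}%")
# prints "Variation #1: 2.87% to 3.65%"
```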
These confidence limits are a bit easier to digest when you see them visualised, like below.
That’s all well and good but why do we need these?
The reason we need confidence intervals and limits is that it would be impossible to run a conversion optimisation test on the whole population. When we test a sample of the population, we can’t assume that it will behave exactly as the whole population would. What we can assume is that the sample provides an estimate of how the whole population would behave.
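For the curious, here is one common way such a ± value can be estimated from a sample: the normal-approximation (Wald) interval for a proportion. This is a sketch under assumptions, not a description of how Optimizely or any other tool actually calculates its figures, and the visitor and conversion counts are made up for illustration.

```python
import math


def wald_interval(conversions, visitors, z=1.96):
    """Estimate a conversion rate and its margin of error from a sample.

    Uses the normal approximation for a proportion. z=1.96 corresponds
    to a 95% confidence level. Returns (rate, margin) as fractions.
    """
    rate = conversions / visitors
    standard_error = math.sqrt(rate * (1 - rate) / visitors)
    return rate, z * standard_error


# Hypothetical sample: 326 conversions from 10,000 visitors.
rate, margin = wald_interval(326, 10_000)
print(f"{rate * 100:.2f}% +/- {margin * 100:.2f} percentage points")

# The same rate from a sample four times larger gives a tighter interval.
_, bigger_sample_margin = wald_interval(1_304, 40_000)
print(f"margin with 40,000 visitors: {bigger_sample_margin * 100:.2f}")
```

The larger the sample, the smaller the margin, which is why a suitable sample size matters before reading much into the limits.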
So we can estimate how all our users will act, what next?
Now we can start comparing the confidence intervals of the ‘original version’ and the ‘variation version’. If there is no overlap of the confidence intervals between the original and the variation (as with our first example above), it is very safe to assume that ‘Variation #1’ will increase the conversion rate over the original, assuming we have tested with a suitable sample size and let the test run for (in our opinion) at least two ‘business cycles’.
How can you say that?
Take a look at the illustration below. This is a (very crude) picture of the results. As you can see, the most extreme low (or lower confidence limit, if you’re being fancy) of the variation is still higher than the most extreme high of the original. So even in the unlikely event that the true values of the original and the variation were the high and the low respectively, the variation would still win. This is a fairly extreme example, but it gives an idea of how you can use confidence intervals to interpret your test data.
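The same comparison can be written as a quick check in Python. This is only a sketch: the helper names are illustrative, the variation’s figures are the 3.26% ± 0.39 quoted earlier, and the original’s figures (2.30% ± 0.33) are hypothetical, chosen purely to show the no-overlap case.

```python
def limits(rate_pct, margin_pct_points):
    """Return (lower, upper) confidence limits in percent."""
    return rate_pct - margin_pct_points, rate_pct + margin_pct_points


def intervals_overlap(a, b):
    """True if the two (lower, upper) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


original = limits(2.30, 0.33)   # hypothetical figures for the original
variation = limits(3.26, 0.39)  # Variation #1 from the example

if not intervals_overlap(original, variation) and variation[0] > original[1]:
    print("No overlap: the variation's lowest limit beats the original's highest.")
else:
    print("The intervals overlap, so this check alone does not separate them.")
```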
*absolutely, in no way, to scale
The great thing about confidence intervals is that they provide an alternative way of visualising your test results and also tell you the largest and smallest effects you can expect.
Do you use confidence intervals and limits? If so, leave a comment and let us know how you use them. We’d love to hear from you.
Next in the ‘Statistics in Testing Jargon Buster’ series, I’ll be diving into the much-uttered but maybe not wholly understood terms statistical significance and statistical power.