
3rd Apr 2014


It’s great to get positive test results, not only for the impact they will have on business KPIs but also for the support your testing programme will gain from internal stakeholders. But not every A/B test will get the results you expect.

Don’t be disheartened; tests where the variation didn’t win can often be the most insightful and valuable tests to learn from. You’ve tested an assumption, letting the visitors to your site decide which option works best for them. By testing the change you’ve measured the impact that it would have had, as opposed to releasing it site-wide and monitoring the change. Furthermore, a negative test forces you to reassess and test again.

It is incredibly important to take learnings from unsuccessful tests, but there are some key best practice techniques to help increase the likelihood of success for your next test.

5 Steps to Perfect Your Next A/B Test

1. Sound hypothesis

A well-thought-out hypothesis and rationale is key. We base our tests on insights from user research and analytics. This way we have a comprehensive view of the ‘what’ and the ‘why’. If you base tests on internal bias and opinions, it will take a long time to build up any solid learnings, and big wins are unlikely because there is little insight into user behaviour backing the rationale of the test.

2. Don’t be too trigger happy

If the results look negative when a test first goes live, there can be a panicked temptation to pull it. Don’t! There may be an initial ‘shock’ effect as the users’ journey is disrupted, but this will even out over time. If your site has a high percentage of returning visitors, you may need to run the test for slightly longer. Make sure you run your test for long enough, e.g. don’t stop the test until it has run through at least two typical business cycles and has reached statistical significance.
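As a rough illustration of the stopping rule described above, the sketch below runs a pooled two-proportion z-test and only reports a test as ready to call once both conditions hold. The function names, the 7-day business cycle and the 0.05 significance threshold are assumptions made for this example; they are not taken from any particular testing tool.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test.

    conv_a / n_a: conversions and visitors for the control.
    conv_b / n_b: conversions and visitors for the variation.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


def ready_to_call(days_run, p_value, cycle_days=7, min_cycles=2, alpha=0.05):
    """True only if the test has covered enough cycles AND is significant.

    cycle_days, min_cycles and alpha are illustrative defaults (assumptions).
    """
    return days_run >= min_cycles * cycle_days and p_value < alpha


# Hypothetical figures: 200/10,000 conversions on control, 260/10,000 on variation.
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
print("Day 5:", ready_to_call(5, p))    # significant, but too early to call
print("Day 14:", ready_to_call(14, p))  # two full cycles have passed
```

Ideally the minimum duration is fixed before the test starts, so that an early significant reading is not treated as a reason to stop.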

3. Don’t make assumptions 

When you have seen a positive test result, don’t assume that because the change was a success on one page of the site you can apply it site-wide; test it! Your customers will have different behaviours depending on what kind of browsing state they are in, where they have come from and other elements on the page.

4. Take time to properly interpret the results

When focussing on a test on one page of the site, other metrics across the site and the effect on the end-to-end conversion funnel are frequently overlooked. The variation may be winning in terms of the goal for the specific test, but are you just moving drop-out to the next step of the funnel? Ensure you are able to analyse the overall impact of a change and focus on the macro conversion rate.
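To make the drop-out question concrete, here is a minimal sketch that compares step-by-step rates with the end-to-end (macro) conversion rate. The funnel figures and the `funnel_rates` helper are hypothetical and only serve to show the pattern: the variation wins the first step but loses slightly on orders overall.

```python
def funnel_rates(step_counts):
    """Return (step-to-step rates, macro rate) for a list of funnel counts.

    step_counts holds visitors at each funnel step, first step first,
    e.g. [product page views, basket views, orders].
    """
    step_rates = [nxt / cur for cur, nxt in zip(step_counts, step_counts[1:])]
    macro_rate = step_counts[-1] / step_counts[0]
    return step_rates, macro_rate


# Hypothetical counts for each version of the page under test.
funnels = {
    "control": [10_000, 3_000, 600],
    "variation": [10_000, 3_600, 590],
}

for name, counts in funnels.items():
    steps, macro = funnel_rates(counts)
    formatted = ", ".join(f"{rate:.1%}" for rate in steps)
    print(f"{name}: step rates [{formatted}], end-to-end {macro:.2%}")
```

Judged only on the first step, the variation looks like a clear winner; judged on orders per visitor, it is not.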

5. If you get a negative test, learn why

If a test fails in one area of a site when it had previously improved metrics in another, you should be asking why. Visitors are at different stages of their journey, which should help you to understand why their behaviour isn’t the same. More importantly, find out what you can about the mix of visitors in the test. A good start to interpreting the results is drilling down and analysing the data; segment your visitors by new/returning users, device, and referral source.
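A simple way to start that drill-down is to compute the conversion rate per segment and variant. The sketch below assumes each visit is available as a record with `variant`, `visitor_type`, `device` and `converted` fields; these field names and the sample data are made up for illustration, and a real export from your testing or analytics tool will look different.

```python
from collections import defaultdict


def conversion_by_segment(visits, segment_key):
    """Return {(segment value, variant): conversion rate} for the given key."""
    totals = defaultdict(lambda: [0, 0])  # [conversions, visitors]
    for visit in visits:
        key = (visit[segment_key], visit["variant"])
        totals[key][0] += 1 if visit["converted"] else 0
        totals[key][1] += 1
    return {key: conv / count for key, (conv, count) in totals.items()}


# Tiny hypothetical sample; real tests need far more visitors per segment.
visits = [
    {"variant": "control", "visitor_type": "new", "device": "mobile", "converted": True},
    {"variant": "control", "visitor_type": "new", "device": "desktop", "converted": False},
    {"variant": "variation", "visitor_type": "new", "device": "mobile", "converted": True},
    {"variant": "variation", "visitor_type": "returning", "device": "desktop", "converted": False},
]

for (segment, variant), rate in sorted(conversion_by_segment(visits, "visitor_type").items()):
    print(f"{segment:>9} / {variant:<9}: {rate:.0%}")
```

Keep in mind that every extra segment is a smaller sample, so differences within a segment need the same significance check as the headline result.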

The Win/Win Mentality

Following our tips will help improve your success rate in getting meaningful results and insights. This makes testing a win/win opportunity: you either have a positive impact on your KPIs, or you gain insight into your users that you didn’t previously have and can feed into further tests and improvements.

Key Takeaways:

  • Ensure you start with a considered hypothesis and rationale for your test, preferably based on user insight
  • Let the test run for long enough and always check the statistical significance
  • Fully analyse, understand, report and learn from every test, whether it had a positive impact or not!

2 responses to “5 Necessary Steps for Every A/B Test”

  1. Brian says:

    Thanks Sophia for a great article! A few thoughts:

    “The variation may be winning in terms of the goal for the specific test, but are you just moving drop-out to the next step of the funnel?” This is incredibly under appreciated by many testers, and super important to understand to help move away from using micro-conversions (such as clicks from one step to another in the funnel, etc) as success metrics, and over to true macro-conversions which are directly tied to the bottom line (such as orders submitted, etc).

    Good point on, “There may be an initial ‘shock’ effect as the users’ journey is disrupted but this will even out over time.” – I would also raise the point that this “shock” could artificially positively inflate a success metric (ie, the “novelty effect” of “let’s try this out because it’s new..”) Separately, early on simple chance can skew the results one way or another (ie, “Law of Large Numbers”).

    You’ve briefly touched on addressing the above with, “Let the test run for long enough (2x business cycles) and always check the statistical significance”. This is generally good advice, but one should be aware of the implications for Type I errors (the “false positive” rate, ie statistical significance) and Type II errors (“false negatives”, ie statistical power). From what I’ve seen, most businesses simply don’t decide on a stopping point (when to run their statistical significance test – is two business cycles enough?) in advance of the experiment, which often reduces statistical power (the ability to detect an effect when it truly exists). Instead, they constantly check their tool’s statistical significance calculator (repeated significance testing) and stop the test the moment “significance” is met, which greatly increases the probability that the “winning” test is a “false positive”. This problem of “How long do I run my test for?” and its implications for Type I and Type II errors is greatly exacerbated as you increase the number of variations in an experiment, OR start slicing and dicing the analysis by segmenting on user type, device, medium, etc.

    Thanks again for the great article!

    – Brian

    • Sophia says:

      Hi Brian,

      Thanks for your comments, very much appreciated!

      You’re right, obviously the introduction of new functionality could just as much positively influence a test. This is why we run tests until the initial ‘shock’ effect, whether negative or positive, has settled and we are confident in the pattern of results we are seeing.

      You make some interesting points to consider with regards to statistical significance and power. We often see cases where clients are keen to call a test in its early stages, as soon as statistical significance has first been reached. Another factor which is frequently overlooked is the margin of error (or conversion range) of a test; when running tests we highlight the need for there to be no overlap between the variants’ ranges for a truly significant result.

      Thanks again for sharing your thoughts.

