A/B Split Testing and Multivariate Testing
What is A/B Testing?
A/B testing (sometimes referred to as split testing) allows you to determine the impact of content, design and functionality changes to your website. By creating multiple versions of a web page and collecting data & statistics for a control group (usually A) and at least one variation (B) we can learn about how your ideas impact user behaviour and ultimately your key goals and objectives.
Why should you perform A/B and multivariate tests?
In too many businesses, changes to the website are made on a whim or without solid evidence to support them. Marketing decisions are often made by the person with the most experience, the highest salary or the loudest voice: the HiPPO (Highest Paid Person’s Opinion). This kind of subjectivity removes a key part of business success: the view of the customer. Conversion Optimisation allows your business to be truly customer-centric.
HiPPO by Tom Fishburne. Marketoonist
Even when you have great insights from data and user research, it is still difficult to predict how users will behave when you roll out changes. Testing takes the guesswork out of the process and allows you to be confident that you are making the right decisions.
Your business models, target audiences and value propositions are unique, which means your marketing methods have to be too. By gathering insights from your customer-base and testing based on these, you are being proactive rather than reactive and will be one step ahead of your competition.
Furthermore, many businesses believe the answer to improving online sales revenue is increasing customer acquisition. Because of this, acquisition is an extremely competitive marketplace, making it expensive. Eventually, the cost of acquisition will outweigh the ROI.
The trick is to stop thinking that acquiring new customers is the solution. Instead, you should capitalise on the customers you already have by improving their onsite journey. Having a testing strategy at the heart of a Conversion Optimisation programme will increase the number of people converting and improve your primary KPIs. Basically, you can make your marketing spend work better for you.
Beyond that, we have seen across all of our optimisation clients that A/B testing is consistently one of their greatest levers for growth. Our growth methodology™ consistently delivers a 77% test success rate (much better than the in-house industry average of 33%). By working with an AB testing agency like PRWD, you will become more data-driven and more likely to meet your growth forecasts.
6 Steps for a Bullet-Proof A/B Testing Process
Developing a robust A/B testing process is important for a number of reasons. In our experience:
- It leads to the best results
- It helps you to avoid common pitfalls
- It’s more credible within your organisation
- You gain learnings that last longer than any single UI
This is the outline of our tried and tested process for running a single A/B test as part of our ongoing optimisation programmes. Typically we will run between 2 and 8 tests each month although this can vary widely based on a range of factors.
We use a wide range of tools and techniques to gather insight from users, from analytics and lab-based moderated user testing to surveys, session recording and much more.
Develop Sound Hypotheses
This phase will include identifying usability issues and finding opportunities to make a design more persuasive. This may also be the stage at which you identify fundamental business questions or hypotheses which can be answered through testing.
Defining an aim and hypotheses for each test before you start will allow you to evaluate the performance and draw conclusions later on.
A well-thought-out hypothesis and rationale are key. We base our tests on insights from user research and analytics. This way we have a comprehensive view of the ‘what’ and the ‘why’. If you base tests on internal bias and opinions, it will take a long time to build up any solid learnings, and the tests are unlikely to lead to big wins because there is little insight into user behaviour to back the rationale of each test.
Deciding how to prioritise your hypotheses for testing is really important. Fortunately, we’ve written about it in detail here:
This stage will vary a great deal depending on the scope of your test. If you are looking to reword some key messages then this will be a fairly quick stage.
On the other hand, if you are looking to do something more dramatic, such as redesigning a page template, then this is where we start sketching, prototyping, user testing, building and signing off our variations.
The next step is to get your experiment set up within your chosen testing tool (we love Optimizely!).
As well as configuring your variation(s) and key settings such as segmentation, targeting, goals and analytics integration, this is also the stage at which you should consider what additional tracking might be useful to gain further insight from your experiment.
For example you may have decided to create new goals within your testing or analytics tool (provided they integrate) to track key user behaviour such as:
- Tracking a new video
- Tracking clicks on a new page element
- Tracking scroll depth on a new long-form landing page
Once it’s live, monitor your test initially and then try to avoid constantly peeking while the test runs. We find it can lead to panic or set unreasonable expectations.
Don’t be too trigger happy. If the results look negative when a test is first released, there can be a panicked temptation to pull the test. Don’t! There may be an initial ‘shock’ effect as the users’ journey is disrupted, but this will even out over time. If your site has a high percentage of returning visitors, you may need to run the test for slightly longer. Make sure you run your test for long enough, e.g. don’t stop the test until it has run through at least two typical business cycles and has reached statistical significance.
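To make "long enough" more concrete, here is a rough Python sketch of how a planned test duration could be estimated before launch. It is an illustration rather than our exact method: it assumes a standard two-proportion sample size formula, a two-sided 5% significance level, 80% power, a seven-day business cycle and an even traffic split. The function names and default values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variation to detect a given relative lift.

    Uses the usual normal-approximation formula for comparing two proportions.
    alpha and power are assumed defaults, not values taken from the guide.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def planned_duration_days(n_per_variation, variations, daily_visitors,
                          cycle_days=7, min_cycles=2):
    """Round the run time up to whole business cycles, with a two-cycle floor."""
    days_for_sample = math.ceil(n_per_variation * variations / daily_visitors)
    cycles = max(min_cycles, math.ceil(days_for_sample / cycle_days))
    return cycles * cycle_days


# Hypothetical example: 3% baseline conversion rate, hoping to detect a 20% relative lift.
n = sample_size_per_variation(baseline_rate=0.03, relative_lift=0.20)
days = planned_duration_days(n, variations=2, daily_visitors=1500)
print(f"{n} visitors per variation, plan for roughly {days} days")
```

Rounding up to whole cycles means the test always covers complete weeks (or whatever your cycle is), so weekday and weekend behaviour are represented equally in both variations.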
There is more on result analysis below, so we’ll keep this section brief, but you will need to judge carefully when you can call your test. Then you can start drawing insights from the results. Of course it’s always great to get a big headline conversion lift but there’s a lot more that can be learnt by looking at secondary conversion metrics and even failed tests.
We take great care in carefully documenting the key learnings from testing. This partly acts as a record of testing results and it allows us to avoid repeating tests and often fuels hypotheses for further testing.
Communicating results and learnings is also really important. It’s a great way to build support for your optimisation programme.
Analysing the results of a website test
Once an A/B or multivariate experiment is completed, we can carry out a post-test analysis to evaluate the results, draw conclusions and decide what action to take next. It is crucial that we only conclude experiments when we have sufficient data to draw valid conclusions. For a website test, this includes a minimum number of conversions, a minimum number of business cycles and, of course, statistical significance for our test goals.
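As a rough illustration, the three conditions above could be checked with something like the following Python sketch. It assumes a pooled two-proportion z-test for significance, and the thresholds (100 conversions per variation, two seven-day cycles, a 5% significance level) are placeholder assumptions rather than fixed rules. Your testing tool may use a different statistical method, in which case its own significance figure should take precedence.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def can_conclude(conv_a, n_a, conv_b, n_b, days_running,
                 cycle_days=7, min_cycles=2, min_conversions=100, alpha=0.05):
    """Return (ready, checks) for the three stopping conditions.

    All thresholds are hypothetical defaults for illustration only.
    """
    checks = {
        "enough_conversions": min(conv_a, conv_b) >= min_conversions,
        "enough_cycles": days_running >= cycle_days * min_cycles,
        "significant": two_proportion_p_value(conv_a, n_a, conv_b, n_b) < alpha,
    }
    return all(checks.values()), checks


# Hypothetical numbers: control converts 300 of 10,000, variation 360 of 10,000.
ready, checks = can_conclude(300, 10_000, 360, 10_000, days_running=14)
print(ready, checks)
```

Returning the individual checks alongside the overall answer makes it easier to explain to stakeholders why a test that "looks like a winner" is not ready to be called yet.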
Results and analysis
An experiment will usually have a single primary goal, so that is the first set of data that we must present. Typically we will then drill down into secondary metrics and specific interaction data. To illustrate, if our primary metric on an ecommerce site is ‘Transaction Complete’, secondary goals might be ‘Add to Basket’ and ‘AOV’ (average order value). A specific interaction goal might be measuring the usage of a new element added in the variation.
On a lead generation website this might look like:
- Primary goal: New lead generated
- Secondary goal: Newsletter sign-ups
- Interaction goals: Video plays, clicks on different tabs
Beyond the goals specified in our test plan, we will dedicate some time to looking for other trends or patterns in the data and exploring the behaviour of different audiences and segments. In these instances we have to be careful to ensure that our segments still include a minimum level of conversions, and we test them for statistical significance.
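A minimal Python sketch of this kind of segment check is shown below. It assumes a pooled two-proportion z-test and a hypothetical minimum of 100 conversions per variation. It also divides the significance level by the number of segments tested (a Bonferroni correction), which is our own addition here to reduce the chance of a false positive when looking at many segments; the segment names and figures are invented for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def analyse_segments(segments, min_conversions=100, alpha=0.05):
    """Label each segment, skipping those with too few conversions.

    alpha is split across the eligible segments (Bonferroni), an assumption
    added for this sketch rather than something the guide prescribes.
    """
    eligible = [
        name for name, s in segments.items()
        if min(s["conv_a"], s["conv_b"]) >= min_conversions
    ]
    adjusted_alpha = alpha / max(len(eligible), 1)
    report = {}
    for name, s in segments.items():
        if name not in eligible:
            report[name] = "too few conversions"
            continue
        p = two_proportion_p_value(s["conv_a"], s["n_a"], s["conv_b"], s["n_b"])
        report[name] = "significant" if p < adjusted_alpha else "not significant"
    return report


# Invented data: a = control, b = variation.
segments = {
    "new visitors": {"conv_a": 300, "n_a": 10_000, "conv_b": 400, "n_b": 10_000},
    "returning visitors": {"conv_a": 200, "n_a": 5_000, "conv_b": 205, "n_b": 5_000},
    "tablet": {"conv_a": 12, "n_a": 400, "conv_b": 20, "n_b": 400},
}
for name, verdict in analyse_segments(segments).items():
    print(f"{name}: {verdict}")
```

Segments that fall below the conversion threshold are reported rather than silently dropped, so it stays clear which audiences you could not draw conclusions about.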
It’s important to take the time to properly interpret the results across the site too. When focussing on a test on one page of the site, other metrics across the site and the effect on the end-to-end conversion funnel are frequently overlooked. The variation may be winning in terms of the goal for the specific test, but are you just moving drop-out to the next step of the funnel? Ensure you are able to analyse the overall impact of a change and focus on the macro conversion rate.
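To show what "moving drop-out to the next step" can look like in the numbers, here is a small Python sketch that compares step-by-step conversion with the end-to-end (macro) conversion rate. The funnel steps and visitor counts are invented for illustration.

```python
def funnel_rates(step_counts):
    """Return (step-to-step conversion rates, end-to-end macro conversion rate).

    step_counts lists the visitors reaching each funnel step, in order.
    """
    step_rates = [
        later / earlier if earlier else 0.0
        for earlier, later in zip(step_counts, step_counts[1:])
    ]
    macro = step_counts[-1] / step_counts[0] if step_counts[0] else 0.0
    return step_rates, macro


# Invented counts: product page -> basket -> checkout -> transaction complete.
control = [10_000, 2_000, 1_000, 500]
variation = [10_000, 2_600, 1_040, 505]

for label, counts in (("control", control), ("variation", variation)):
    steps, macro = funnel_rates(counts)
    formatted = ", ".join(f"{rate:.1%}" for rate in steps)
    print(f"{label}: steps [{formatted}] macro {macro:.2%}")
```

In this made-up example the variation lifts the first step from 20% to 26%, which would look like a clear win if add-to-basket were the only goal. However, the next step falls from 50% to 40%, and the macro conversion rate barely moves (5.00% against 5.05%), which is exactly the pattern to watch for.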
If you get a negative test, learn why. If a test fails in one area of a site where it had previously improved metrics in another, you should be asking why.
There is one caveat to the above. The only time that you can’t take any learning is when the data is corrupted in some way or there is a known problem with the way the experiment was designed. This is something you’ll need to resolve quickly and run the experiment again.
At this stage we want to evaluate how the results collected impact our initial experiment hypothesis. There are three main outcomes that we may arrive at:
- Success: If the variation outperforms the control and validates our hypothesis then we have a successful test outcome. This is great news and we can move onto our next steps (below).
- Failure: If the variation performs significantly worse than the control then we have a failed experiment, potentially saving us from launching something that would have had a detrimental impact on the bottom line. When this happens (and it will, eventually), you need to make sure that you analyse beyond headline metrics. It may be that the variation performed poorly overall but the results were not evenly distributed across segments. For example, did the variation perform well for new users, but frustrate returning visitors? Did performance vary by screen resolution? Carrying out some more detailed analysis to answer these types of questions can provide a range of learnings for further testing (if you have sufficient sample sizes to allow this).
- Inconclusive: The other outcome is that the results show no discernible difference in the performance of our variations, leading to an inconclusive test result. However, sometimes this can also provide really valuable insight. For example, on one site we ran a simple test to hide/show a block of editorial content on the homepage. After running the experiment we saw there was no great change in our key testing metrics. Even so, we realised a lot of time was invested in creating content for the homepage editorial slots. In this instance we didn’t change the primary conversion metrics, but we were able to reassign the time spent preparing content to more valuable tasks for the content team.
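The statistical side of these three outcomes can be sketched as a simple decision rule. The Python below is only an illustration: it assumes you already have a p-value for the primary goal from your testing tool, uses a hypothetical 5% significance level, and cannot capture the judgement of whether a win genuinely validates the hypothesis.

```python
def classify_outcome(rate_control, rate_variation, p_value, alpha=0.05):
    """Map primary-goal results onto success, failure or inconclusive.

    alpha is an assumed threshold; the p-value is taken as given.
    """
    if p_value >= alpha:
        return "inconclusive"
    return "success" if rate_variation > rate_control else "failure"


def relative_lift(rate_control, rate_variation):
    """Relative change of the variation against the control."""
    return (rate_variation - rate_control) / rate_control


# Invented examples.
print(classify_outcome(0.030, 0.036, p_value=0.018))  # significant uplift
print(classify_outcome(0.030, 0.025, p_value=0.004))  # significant drop
print(classify_outcome(0.030, 0.031, p_value=0.640))  # no discernible difference
print(f"{relative_lift(0.030, 0.036):.0%}")
```

Whatever the label, the follow-up analysis described above still applies: a "failure" or "inconclusive" result is a prompt to dig into segments and secondary metrics, not a reason to discard the test.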
Once we have drawn conclusions from our experiment we need to outline a set of follow-up actions. Key actions include:
- Implement the winning variation
- Re-run the experiment
- Run a new experiment that optimises further
- Test a similar hypothesis in another area of the site
- Test a new hypothesis that arises from insights from the analysis phase
- Share test results and learnings with relevant stakeholders
- Carry out user testing to investigate new insights further
Quality Assurance in AB Testing
We have written a more detailed post on QA for AB Testing here, but to summarise the main points:
- Choose high traffic browsers and devices from Google Analytics, carefully noting down versions
- Run your code through a linter and through W3C’s validator
- Check the test doesn’t affect page loading time
- Outline all the elements that your test could affect and check across each outlined browser or device
- Create the user flow: actions the user should be able to carry out during this test and take this journey in each outlined browser or device
- Check all goals are firing to Google Analytics and to your testing tool (should you have set these up)
- Share with your office and get them to carry out the user journey
- Once the client has provided approval, re-check audiences, targeting and, finally, go live
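For the first point in this checklist, one possible way to shortlist browsers and devices is to take the highest-traffic combinations until they cover most of your sessions. The Python sketch below assumes you have exported session counts from your analytics tool by hand; the 90% coverage target, the function name and the figures are all hypothetical.

```python
def browsers_to_test(sessions_by_browser, coverage=0.9):
    """Pick the highest-traffic browser/device combinations until the
    chosen set accounts for at least `coverage` of all sessions."""
    total = sum(sessions_by_browser.values())
    chosen, covered = [], 0
    ranked = sorted(sessions_by_browser.items(), key=lambda kv: kv[1], reverse=True)
    for name, sessions in ranked:
        if total and covered / total >= coverage:
            break
        chosen.append(name)
        covered += sessions
    return chosen


# Invented export: sessions per browser version and platform.
sessions = {
    "Chrome 120 / Android": 5_000,
    "Safari 17 / iOS": 3_000,
    "Chrome 120 / Windows": 1_500,
    "Firefox 121 / Windows": 300,
    "Samsung Internet 23 / Android": 200,
}
for name in browsers_to_test(sessions):
    print(name)
```

Keeping the version number in each key means the QA list records exactly which versions were checked, as the checklist recommends.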
Key A/B testing tools
There are plenty of tools available on the market to help with your AB Tests. To round off this guide, here are some of the most popular: