Five Common A/B Test Mistakes to Avoid
Split testing. The infallible process that adds an element of science to the art of optimizing your page’s copy and presentation.
I say infallible, but can you really place all of your trust in split testing?
I know I’m preaching to the choir here and don’t need to outline the usefulness of a decent A/B campaign. It’s still one of the best ways to properly optimize any digital marketing effort.
But the problem isn’t in understanding the importance of implementing split tests. It’s in their execution.
There’s been a huge rise in the number of split testing services allowing anyone with an email address or a PayPal account to try their hand at running a campaign. These services make split testing seem easy enough for any old Tom, Dick or Harry to see huge gains with minor changes.
Make a change, map results, analyze and repeat for success!
If only it were that simple. The truth is far different. A good split test campaign takes a good deal of thought and preparation; even a minor oversight can spell disaster and leave you with a set of misleading results.
Question of the day: Are you making one of these five split testing schoolboy errors?
When it comes to optimization, don’t take any chances. Make sure you’re not making one of the test-breaking mistakes outlined below.
What Are You Trying to Achieve?
Like a kid in a candy store, you don’t know where to start. There’s so much to test and you want to do it all.
Button color, page layout, web copy. You now have the power to optimize them all, and that’s exactly what you’re going to do.
But wait. Before you jump into your testing, you’re going to need a well-devised plan. Without one, your tests are as likely to lower your conversion rates as raise them.
What you need is a solid hypothesis.
A good hypothesis is the roadmap that’s going to get you to where you want to go. You wouldn’t jump in a new car and drive off into the unknown without a map, and you shouldn’t start testing without first mapping out what you want to achieve.
Creating a good hypothesis is key to conversion rate optimization. If you want to run a successful campaign, you’re going to need to get into the minds of your audience and identify the sticking points that prevent them from converting.
There are two steps to creating a successful hypothesis.
1. Find the problem or sticking point for your customers
2. Figure out a way in which you can solve that problem to increase conversions
The tricky part is identifying the sticking point and problem areas for your visitors in terms of copy, usability, interaction and aesthetics. You’re going to have to delve into some good old fashioned audience research and heat map results, and may potentially need to survey segments of your audience to find the key optimization points.
Once you’ve identified the issues, you need to work out how to combat them. Here, there’s a little bit of guesswork involved—if you had the answer straight away, you wouldn’t need to run a test.
When you’ve come up with a potential solution to the issue, put it in this hypothesis template:
Changing [the problem you identified] to [proposed solution] will [desired effect]
Let’s fill it in with a realistic example:
Changing button text from ‘order now’ to ‘get yours now!’ will increase sales conversions.
The next step is to run your tests and keep a close eye on how they affect your conversions.
Despite what you (or the client who’s hired you) may think, falling short of your desired effect isn’t a complete loss.
Part of the reason you’re split testing is to understand your audience so you can better optimize in the future.
To achieve that goal, sometimes knowing what doesn’t work is just as important as knowing what does.
Confidence Comes with Maturity
Can you say — without a shadow of a doubt — that your test results represent the majority of your audience?
If you can’t, then it’s too early to be making any permanent changes to your site.
Let me put it to you like this: Would you gamble the future of your business on the purchasing preferences of a handful of customers?
I know that equating a single test’s confidence levels to your company’s future seems rather hyperbolic, but come on, you’re optimizing to find the truth, right? You want to know what’s working and what isn’t, and you can’t do that with incomplete data from a minority percentage of your audience.
Remember those tools I mentioned before, the ones that let anyone have a go at split testing? Well, they’ve got a history of declaring a variation a resounding winner after a very small sample size (they are getting better though).
I’ve seen certain variations declared as the best option after only a few hundred visits, which is absolutely ludicrous. There’s no way that a few hundred visits is enough to measure the impact of a variation change.
Peep Laja gives one of the best examples I’ve seen supporting the argument for letting your tests mature.
Peep ran a test for a client, a test which looked like a horrible failure after two days of testing.
After seeing a drop of 89.5%, the client was ready to pull the plug and revert back to the control. But have a quick look at the visitor numbers: 110.
Do you really want to call a test after only 110 visits? Do those 110 users best represent the opinions of your visitors as a whole?
Thankfully, Peep let the test run for a further ten days and saw the results below.
A 25.18% lift in conversions at 95% statistical confidence (which, by the way, is the minimum acceptable confidence level).
Give your tests time to mature and you can be confident that the results you’re seeing represent your audience. Call your tests early and you’re wasting everyone’s time.
If you’re looking for any guidelines on test maturity, I’d recommend running them until they hit the minimums outlined below:
- At 95% statistical confidence
- Based on at least 1000 visitors
- Run for at least one whole week
Follow these three basic guidelines and you’ll be far more confident in the results you’re seeing.
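If you’d rather not eyeball significance, a two-proportion z-test gives you the confidence figure directly. Here’s a minimal sketch in plain Python (the visit and conversion counts are hypothetical, and a real campaign would lean on a proper stats library) that reports how confident you can be that the variation’s conversion rate genuinely differs from the control’s:

```python
import math

def ab_test_confidence(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-tailed two-proportion z-test: confidence that the rates differ."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that both pages convert equally
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Convert the z-score to a two-tailed confidence via the normal CDF
    return math.erf(abs(z) / math.sqrt(2))

# Hypothetical week of traffic: control vs variation
confidence = ab_test_confidence(1200, 96, 1180, 128)
print(f"confidence: {confidence:.1%}")  # declare a winner only above 95%
```

Feed in a tiny sample like the 110 visits above and the confidence figure craters, which is exactly why calling a test early is so dangerous.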
When to Test?
If I asked when you should test, what would you say? If your voice joined the chorus of CROs answering “all the time,” give yourself a gold star.
However, I’m not talking about when in the literal sense. I’m referring to simultaneous versus before-and-after split tests.
Which would you recommend for best results and why?
Anyone who said before-and-after tests, step forward and return your gold star. The only accurate way to run a split test is simultaneously. Why? Because the quality and volume of your traffic change daily.
Seriously, go and check your conversions now. Are they at a steady 15% every day? No.
You’re going to have good days or weeks as well as bad. A lot of the time, this doesn’t have anything to do with your conversion efforts. It’s a result of external factors over which you have zero control.
Let’s say you get a mention and backlink on an authority site that’s closely related to your business. You’re going to attract a lot of visitors who are interested in your business, which is likely to yield a higher conversion rate for that week.
Now let’s imagine that the backlink you received was from a site in a totally unrelated niche. The traffic heading your way is going to have little to no interest in what you offer. High traffic and low conversions will read as a poor conversion rate.
These changes are completely out of your control and have nothing to do with your variations.
A good CRO knows not all traffic is good.
Seeing your traffic rise on your analytics account feels great, but it doesn’t mean anything if visitors aren’t converting.
If you’re not running your tests side by side, you risk letting a surge of bad traffic negatively affect the test, giving you a false negative on what could have been the winning variation.
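A deterministic toy model makes the point. All the numbers below are hypothetical: the variation genuinely converts better (6% vs 5% on engaged traffic), but a before-and-after test happens to measure it during a surge of off-topic visitors and condemns it, while a simultaneous split sees through the swing:

```python
def observed_rate(true_rate, traffic_mix):
    """Expected conversion rate over (share, quality) traffic segments,
    where quality scales how likely a segment is to convert at all."""
    return sum(share * quality * true_rate for share, quality in traffic_mix)

good_week = [(1.0, 1.0)]             # all traffic is on-topic
bad_week = [(0.4, 1.0), (0.6, 0.1)]  # off-topic backlink floods the funnel

# Before/after: control measured in the good week, variation in the bad one
control_seq = observed_rate(0.05, good_week)
variant_seq = observed_rate(0.06, bad_week)
print(f"sequential:   control {control_seq:.1%} vs variation {variant_seq:.1%}")

# Simultaneous: both arms face the identical mix, so the swing cancels out
control_sim = observed_rate(0.05, bad_week)
variant_sim = observed_rate(0.06, bad_week)
print(f"simultaneous: control {control_sim:.1%} vs variation {variant_sim:.1%}")
```

Run sequentially, the better page looks like a 5.0% vs 2.8% disaster; run simultaneously, it beats the control 2.8% to 2.3% despite the bad traffic.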
Testing Too Much
Let’s take a quick look at the Crazy Egg landing page and choose three elements that we might want to optimize.
A good simple page, right? But there’s always room for improvement! Let’s test the:
- Button Copy
- Button Color
- The USP
But wait. If you make all of the changes at the same time, one of two things will happen: your conversions will either go up or down.
Doesn’t sound too bad, right? That’s what split testing is all about, seeing what works.
You’re right, but split testing is also about data. If you test multiple elements at once, you’ve deprived yourself of any meaningful data.
You tested all three at the same time, so was it the button copy, button color or USP that affected your conversions? If you’ve seen a decrease in your conversions, what do you change back? You could of course just change them all back to their control, which makes your wonderful test a huge waste of time.
If your conversions change, whether positively or negatively, you need to know why. If you don’t know why, then what’s the point of testing at all?
The only time you should test multiple variables at once is when there are no overlapping interests or goals.
Choose a variable and test thoroughly before moving on to the next.
Macro > Micro
It’s all about the macro conversion. Focus on the only conversion that really matters, the end goal action that you want your audience to take.
This will, of course, differ depending on your business model, but generally speaking, you’re looking at purchases, sign ups, subscriptions, downloads. That sort of thing.
Now I understand that you want to focus on the micro conversions, to measure the CTR and conversion of every single page and button. You want to grease the wheels and have as many folk as possible shooting through every stage in your funnel.
More people progressing to each stage of your funnel is going to result in a higher end goal conversion, right?
I don’t care about the CTR from your home page to your product page or any other funnel progression, and neither should you. Why? Because people viewing pages doesn’t generate revenue! People who buy products do.
I can hear the grumbling. You’re saying that no one’s going to convert at your end goal if your funnel is broken. And you’re right.
I’m not saying you shouldn’t optimize the stages in your funnel. I’m saying that…
You should only measure the overall, end goal conversion. The one that generates cash.
It seems counterintuitive, I know, and it can be a little difficult to wrap your head around. But I like to think of it as that one friend everyone has — the serial dater.
You know the type, the friend who’s out for dinner or drinks with a new paramour every week. They progress to “dating” regularly and sometimes take the step towards the boyfriend/girlfriend stage. But no further.
You as the marketer are looking for a husband or wife (a customer). Obviously that makes the serial dater (someone who isn’t actually interested in buying) the wrong one to optimize for — a fact that marketers focused on the micro conversion rates are going to miss. They’ll have all those dates and might even get to the boyfriend/girlfriend stage, but that’s where it ends.
In the table below, each micro conversion has been measured and recorded, with the highest conversion rate for each stage bolded for ease of reference.
You’ll notice that there’s no single variation which performs better at every stage of the funnel. Variation C gets top marks for the initial conversion, with variation A coming out on top of the second phase.
But it’s all pointless, because whilst variation B falls short in the micro conversions, it actually ends up pulling in the highest overall conversion, with a 46% lift in revenue.
You should still be testing your micro conversions, but test them whilst measuring the end goal.
Track your overall revenue. All that matters is how your changes affect the dollars in your pocket.
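To put rough numbers on it, here’s a sketch with a made-up three-stage funnel (home to product, product to checkout, checkout to purchase) and a hypothetical $50 order value; none of these figures come from a real test:

```python
# Per-stage conversion rates for three hypothetical variations
funnels = {
    "A": [0.40, 0.55, 0.10],
    "B": [0.30, 0.45, 0.35],
    "C": [0.45, 0.40, 0.12],
}
visitors, order_value = 10_000, 50.0

for name, stages in funnels.items():
    buyers = visitors
    for rate in stages:
        buyers *= rate  # each stage whittles the cohort down
    print(f"{name}: {buyers:.0f} buyers -> ${buyers * order_value:,.0f} revenue")
```

Variation C wins the first micro conversion and A the second, yet B, the apparent loser stage by stage, generates the most revenue, which is the only number that matters.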
Keep Your Own Counsel
Now I can wax lyrical about how orange buttons are the best or how we’ve achieved a huge increase by changing x, y, and z. Indeed it’s how a lot of sites draw in thousands of visitors every week. Everyone wants to replicate the button color change that brought a 14% lift, right?
But the truth is these results cannot be reproduced.
Because your audience is different. They might hate orange, which would mean flipping your button color could actually lower your conversions.
Whenever you see something online that you want to try, replicate the test, not the results.
If someone used orange buttons to increase conversions, test your button color. Don’t just switch to orange and hope to see a lift.
Run your own split tests and act only on the results that you’re seeing.