I’m writing this blog post right after Conversion Hotel 2018 conference to transfer all the undeniable wisdom I’ve gained there. It’s still perfectly vivid in my mind. That weekend on the picturesque Dutch island, Texel (yes, we’ve got islands here) made me think a lot about how we can improve our tests.
Having listened to all 16 incredibly inspiring speakers, I’ve worked out the recipe for an [almost] flawless experimentation process (because no test is perfect). Or rather, I’ve figured out which common mistakes leave an experiment doomed to fail. So I made a few strong promises to myself to stop making them RIGHT AWAY (and I suggest you do too!). Here they are:
I like to think of myself as data-driven. I know, not a surprise for a growth hacker. However, as Michael Aagaard nicely put it, we all have a habit of torturing the data until it confesses.
“Tell me I’m right!”
Come on, I know it sounds familiar. I’m not without guilt myself.
Sometimes when doing a heuristic analysis of a website, I use psychological principles and… well, I rip it apart (that sounds more sadistic than it actually is). It might happen that I spot something and I dig into the data to find the numbers that back up my assumption. It’s a strangely satisfying feeling.
“See! See! I knew it! Those twenty calls-to-action are preventing visitors from moving on to the next page. They don’t know where to click or focus on. Let’s set up an A/B test!”
Do you see a crucial mistake here?
The idea for an optimisation comes first, and I am basically searching for quantitative data to back it up. Usually, I do end up looking at heat maps and recordings to double-check, but that whole approach is flawed. No matter how fun it is.
Let me repeat one more time in case you’ve missed the point: don’t look for data to confirm what you want to believe in. Listen to what the data has to tell you first.
Source: Online Dialogue
Thank you, Michael!
Thank you for reminding me, and the rest of the audience, to always challenge my beliefs (and to always get others to do it too). To remain a student that is still learning. To be critical in my approach to ideation. And also thanks for giving a kickass presentation even whilst you’d lost your voice.
Lizzie Eardley (Senior Data Scientist at Skyscanner) touched on being cruel to data too. I could write a whole separate article about the insights she provided. I’ve never had such an interesting lesson in statistics! I suggest you check out her articles for more input.
Anyway, the point she made is that we cruelly misinterpret data by allowing statistical ghosts in our experimentation process. Let me tell you how to scare them away.
Source: The first Ghost of Experimentation: It’s either significant or noise - By Tom Oliver, with Colin McFarland and Lizzie Eardley
If you’ve set up a few A/B tests before, you probably know that moment when the p-value is somewhere between 0.05 and 0.10. So close yet so far. What to do…
“Oh, come on, test! Hit this darn 95% significance level!”
To assume the results are significant, or not to assume?
Could you run the test longer?
The painful truth is there is no such thing as nearly significant or almost trending to significance. It is or it isn’t and that is the way you should interpret it. Lizzie admitted that at Skyscanner their custom tool even hides results that are not significant because otherwise people just want to believe it.
Hello confirmation bias, my old friend...
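To make the “it is or it isn’t” rule concrete, here’s a minimal sketch of a two-proportion z-test in plain Python (no libraries needed). The conversion numbers are made up on purpose to land in that frustrating 0.05–0.10 zone:

```python
from math import sqrt, erf

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test p-value for the difference between two
    conversion rates, using a pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = abs(p_b - p_a) / se
    # Standard normal CDF expressed via the error function
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# Made-up numbers: 4.0% vs 4.5% conversion on 10,000 visitors each
p = two_proportion_p_value(400, 10_000, 450, 10_000)
# Significance is binary: either p < 0.05 or it's noise
significant = p < 0.05  # here p is roughly 0.08, so: not significant
```

No matter how close `p` gets to 0.05, the decision stays binary. Hiding the number behind a yes/no flag, like Skyscanner’s tool does, removes the temptation to squint at it.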
Lizzie said that if you look at 20 metrics, there is a 65% chance of at least one false positive. You’d think:
“Yaaaayy! More metrics! Let’s learn more!”
Sorry to disappoint you but that is not how it works. Sh*tloads of metrics equals more chances for false positives. So don’t get too excited and focus on the primary metric and perhaps a few secondary metrics. These should be the metrics you believe will be impacted.
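Lizzie’s 65% figure is easy to verify yourself. Assuming 20 independent metrics, each tested at a 5% significance level, the chance of at least one false positive is:

```python
alpha = 0.05       # significance level per metric
n_metrics = 20     # number of metrics you peek at

# Probability that at least one metric shows a false positive,
# assuming the metrics are independent
fwer = 1 - (1 - alpha) ** n_metrics
print(round(fwer, 2))  # prints 0.64
```

Real metrics are correlated, so the true number will differ a bit, but the direction is clear: every extra metric is another lottery ticket for a false positive.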
We are spoiled for choice when it comes to metrics, yet too often we focus on the wrong ones. Whilst there are arguments for both user-based and session-based metrics, we choose the latter too often. Doing that brings one huge risk: sessions are not independent. Users can have multiple sessions during which they see both versions of your experiment, which can lead you to draw terribly wrong conclusions.
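One way to see the danger is to aggregate by user instead of by session and flag users who were exposed to both variants. A hypothetical sketch (the event log and field names are made up for illustration):

```python
from collections import defaultdict

# Hypothetical event log: (user_id, session_id, variant, converted)
events = [
    ("u1", "s1", "A", False),
    ("u1", "s2", "B", True),   # same user saw both variants!
    ("u2", "s3", "A", True),
    ("u3", "s4", "B", False),
]

# User-based aggregation: each user is counted once, under the
# variant they saw first; mixed-exposure users get flagged.
first_variant = {}
converted = defaultdict(bool)
mixed = set()
for user, _session, variant, conv in events:
    if user not in first_variant:
        first_variant[user] = variant
    elif first_variant[user] != variant:
        mixed.add(user)          # exposed to both A and B
    converted[user] |= conv

# Drop (or at least inspect) the contaminated users before analysis
clean_users = {u: v for u, v in first_variant.items() if u not in mixed}
```

A session-based analysis would happily credit u1’s conversion to variant B, even though that user also saw A. A user-based view surfaces the contamination instead of averaging it away.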
You have set your test live. It’s a few days later and, whilst working on your next test, you can’t help but wonder how your little genius idea is doing.
Click… Click… Just a quick peek.
“Oh God! Oh no! Help! Ring the alarms! Stop the test!”
Calm down. Don’t stop the test quite yet. Whilst you may need to peek for practical reasons (to check for bugs, truly disastrous experiences, etc.), be careful not to stop your test too soon. The results vary a lot in the beginning, and you don’t learn anything if you stop your test halfway.
Make sure you:
Never stop experiments unless you have to
Decide in advance how long you will run the test
Don’t run it longer just in the hopes of hitting significance
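Deciding the runtime in advance usually means doing a sample-size calculation before launch. Here’s a rough sketch using the standard two-proportion formula at alpha = 0.05 with 80% power; the z-values and example numbers are my own assumptions, not from the talk:

```python
from math import ceil

def sample_size_per_variant(base_rate, relative_lift,
                            z_alpha=1.96, z_beta=0.84):
    """Rough per-variant sample size for a two-sided test at
    alpha = 0.05 with 80% power (the z-values encode those choices)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)  # minimum detectable effect
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Example: 4% baseline conversion, hoping to detect a 10% relative lift
n = sample_size_per_variant(0.04, 0.10)
# Divide n by your daily visitors per variant to get the runtime in days
```

Commit to that runtime before launch. When the date arrives, the test is done, significant or not, and “just one more week” stops being an option.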
Thank you, Lizzie, for all your statistical wisdom. I learned far more than I ever did in my boring lessons at university. I promise I will stay away from those ghosts!
We do a pretty extensive quality assurance (QA). Or so I thought.
We test five browsers and six devices
We test on different IP addresses
We require customers to conduct their own QA
Yet, an awesome workshop with Abi Hough (Director of Optimisation at Endless Gain) and Craig Sullivan (Optimise Or Die) proved there was definitely still room for improvement.
The most important mistake I realised we were making was that we were testing in the preview. When you do that you only test that exact change but not the effect that change has on the rest of your website.
How does it interact with other A/B tests?
Does it break other parts of the customer journey?
So how should you ideally QA your test? Live, but behind an IP restriction, so only your team sees the variant. This can be done either through your A/B testing tool (some tools have this option built-in) or by adding some extra code to your test.
This will ensure you can fully test the journey and I promise you, you will find bugs.
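What that “extra code” looks like depends entirely on your tooling, but conceptually it’s a simple allow-list gate. A hypothetical sketch (the IPs and function name are made up):

```python
# Hypothetical office/QA IP addresses (from the reserved example range)
QA_ALLOWED_IPS = {"203.0.113.10", "203.0.113.11"}

def should_show_variant(visitor_ip, in_qa_phase):
    """During the QA phase, only expose the live variant to
    allow-listed IPs; afterwards, fall back to normal bucketing."""
    if in_qa_phase:
        return visitor_ip in QA_ALLOWED_IPS
    return True  # normal experiment bucketing takes over from here
```

Because the variant runs on the real site rather than in a preview, you can click through the entire journey, checkout included, and catch the interactions a preview would never show you.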
Here are some other things you should watch out for:
Double-check the client’s development and release roadmap for the upcoming period
Double-check the marketing actions planned for the upcoming period so they don’t skew the data
Create a list of edge cases and walk through them
I think one of the most important lessons I learnt (but also one of the most abstract) was the one by Erin Weigel (Principal Designer at booking.com). Erin refreshingly didn’t talk only about wins but about losses too. And there were many. On a website as heavily optimised as booking.com, only about 10% of tests win.
Growing from 20-25 designers when Erin started to 250 designers now meant that lots of new ideas came in. Enthusiastic young designers bursting with energy and creativity came up with a gazillion brilliant ideas, but so many of them had already been tested. Erin realised you can’t just shoot down all those ideas, and the designers’ enthusiasm, with the reaction:
“We’ve tested that before. It didn’t work.”
Concepts don’t fail, executions do. Even a test that failed in the past can win with a different execution; usually it’s the way the concept was implemented that was flawed. It made Erin (and me at the same time) realise that you should ask far more critical questions about a failed test and why it didn’t work:
How was the technical implementation?
Are there edge cases that could have impacted it?
How was the design setup?
Erin admitted that one of her test ideas, changing the images on a website, showed absolutely no significant results. Not until she realised the new images came with a huge drop in site speed. Oops… The next time they tested it, they slightly lowered the resolution of the images, and the variant won!
So I can’t help but repeat it: concepts don’t fail, executions do.
I have these neon pink sticky notes that you can’t ignore. I wrote my promises down on them and put them right in front of my nose so I never forget what I learnt that weekend at Conversion Hotel 2018.
If you consider yourself a true conversion optimiser you should note these:
Be critical of your test.
Be critical of the results.
Don’t go looking for data just to support your point.
Don’t get distracted by statistical ghosts.
Don’t slip up with the QA.
Which of these mistakes do you make repeatedly? Be honest with yourself. Time to change that! Start running better tests today.
Are you well aware of it all, but your colleagues or employees still don’t get it? Share this article with them and save them from failing their next experiment.