5 most common A/B testing misconceptions
A/B testing is fundamentally a very simple concept. You have a webpage (landing page, homepage or product page) which you call version A. You then make certain changes to it (changing headlines, buttons, colors, style or layout) and call the changed page version B. Once version A and version B are ready, any visitor who arrives on your webpage sees either version A or version B. In other words, your website traffic is split between the two versions. In the A/B testing reports, you can then see which version (A or B) produced more sales or conversions (or leads, downloads, etc.).
So far, so simple.
However, a few misconceptions can arise while doing A/B testing. In this post, I will discuss 5 of the most common ones. Note that the following misconceptions are not specific to VWO, but apply to most A/B testing tools.
1. Split of traffic not occurring as exactly 50/50
Whenever a visitor arrives on a page, how do we decide whether to show version A or version B? The answer is simple: we flip a coin (not literally, of course). We generate a random number between 0 and 1. If the random number is less than 0.5, we show version A; otherwise we show version B.
But just as flipping a coin never guarantees exactly the same number of heads and tails, an A/B test never guarantees an exact 50/50 split between version A and version B. As you test more traffic, the split ratio should get closer to 50/50, but it will rarely be exactly 50/50.
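To make the coin flip concrete, here is a minimal Python sketch of the assignment rule described above. The function names and the fixed seed are illustrative assumptions, not the code any testing tool actually runs.

```python
import random


def assign_variation(rng):
    """Return 'A' if the random draw is below 0.5, otherwise 'B'."""
    return "A" if rng.random() < 0.5 else "B"


def split_counts(n_visitors, seed=0):
    """Assign n_visitors at random and count how many saw each version."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    counts = {"A": 0, "B": 0}
    for _ in range(n_visitors):
        counts[assign_variation(rng)] += 1
    return counts


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        counts = split_counts(n)
        share_a = counts["A"] / n
        print(f"{n:>7} visitors: A={counts['A']}, B={counts['B']}, share of A={share_a:.4f}")
```

With more visitors the share of A should drift toward 0.5, but the two counts will usually not be identical.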
2. In A/A test, you may get a winning or losing variation
First, what is an A/A test? It is a type of test where the variation is exactly the same as the control. People generally set up this kind of test to check the validity of a tool. Ideally (and this is what most users expect), both variations should perform similarly, because they are in fact the same. So, if one of the variations starts out-performing or under-performing the control, users conclude that the tool's tracking is wrong.
It may indeed be the case that the tool's tracking is inaccurate and that it is not recording visitors and conversions correctly. But there is another reason why an A/A test can produce a winning or losing variation. Because the split of traffic in A/B testing is random, the traffic mix that goes to version A may, purely by chance, convert better (or worse) than the traffic mix that goes to version B. So, although it is unlikely (but not impossible), you may get a winning or losing variation in an A/A test just due to random chance.
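A small simulation can show how often chance alone produces a "winner". The sketch below runs many A/A tests in which both versions have the same true conversion rate and checks each one with a two-sided two-proportion z-test at a 5% threshold. The test, the threshold, the conversion rate and the sample sizes are all assumptions chosen for illustration; your tool may calculate significance differently.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions) in both arms
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(n_tests=1000, visitors_per_arm=500,
                      true_rate=0.10, alpha=0.05, seed=1):
    """Return the fraction of A/A tests that show a 'significant' difference."""
    rng = random.Random(seed)
    false_winners = 0
    for _ in range(n_tests):
        # Both arms share the same true conversion rate: this is an A/A test.
        conv_a = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors_per_arm))
        p = two_proportion_p_value(conv_a, visitors_per_arm, conv_b, visitors_per_arm)
        if p < alpha:
            false_winners += 1
    return false_winners / n_tests


if __name__ == "__main__":
    rate = simulate_aa_tests()
    print(f"A/A tests with a 'significant' result: {rate:.1%}")
```

Under these assumptions, a small but non-zero share of A/A tests (in the neighborhood of the 5% threshold) should end with a "significant" result even though nothing differs between the two versions.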
3. Getting a winning variation even with a small number of conversions/visitors
We declare a variation the winner when the difference in its conversion rate (as compared to the control) is statistically significant. You can read about the mathematics of it in previous blog posts: how we calculate statistical significance and how to estimate number of visitors needed for a test. A common misconception is that a winning variation cannot arise early in the test, when only a few visitors have been tested. Actually, it can. Broadly speaking, two components determine statistical significance: a) the number of visitors that have been tested; b) the difference in conversion rate between variation A and variation B.
It is true that a minimum number of visitors must be tested before you can deduce anything. But once that minimum has been reached (it is 25 visitors per variation by default in VWO, but can be changed), statistical significance can still arise if the conversion rate difference between the two variations is huge (e.g. 24 conversions from 25 visits in variation A and 2 conversions from 25 visits in variation B). So, the number of visitors tested isn't the only criterion. The difference in conversion rate is also a criterion.
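To see both components at work, the sketch below applies a pooled two-proportion z-test to the 24/25 versus 2/25 example, and to a much smaller difference at the same sample size. The z-test is an assumption used for illustration (the normal approximation is crude at 25 visitors per variation) and is not necessarily the exact calculation VWO performs.

```python
import math


def z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Example from the text: a huge difference with only 25 visitors each.
    z_big, p_big = z_test(24, 25, 2, 25)
    print(f"24/25 vs 2/25:   z={z_big:.2f}, p={p_big:.2g}")

    # Hypothetical contrast: same sample size, tiny difference.
    z_small, p_small = z_test(13, 25, 12, 25)
    print(f"13/25 vs 12/25:  z={z_small:.2f}, p={p_small:.2g}")
```

The first comparison comes out far beyond the usual 5% threshold despite the tiny sample, while the second does not come close.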
If you are concerned, we always recommend testing longer just to be sure of the results. Ideally, you may want to run a test for a full week to accommodate the effects of daily variation in traffic. But don't blame the tool if you get results a bit too early.
4. Impact on test if a certain variation or control is disabled while the test is running
Sometimes, our users run a test with 5-6 variations. After one week, they may decide to disable a few existing variations and maybe add a couple of new ones. Then, after a few more days of testing, they complain that the performance (percentage improvement or decrease in conversion rate) of the variations they disabled has changed. Users wonder how the performance of a disabled variation can possibly change.
Actually, it is not the conversion rate of the disabled variations that changes (it stays the same once you disable them) but rather the conversion rate of the variations that are still enabled. Because performance is reported relative to the variations/control that are still enabled, the performance of the disabled variations changes too. So, even if you get a winning variation and you disable it, you may find after a few days that it is no longer the winner, because the conversion rate of the control has changed (and hence the percentage change in conversion rate is now different). The conversion rate keeps changing because your traffic mix changes from day to day. Estimating the conversion rate of the control and the variations is an ongoing task, and the estimates may change constantly.
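The arithmetic behind this can be sketched in a few lines of Python. All numbers below are hypothetical and chosen only to show that a frozen variation can move from "winner" to "loser" when the control's rate drifts.

```python
def conversion_rate(conversions, visitors):
    """Conversion rate as a fraction between 0 and 1."""
    return conversions / visitors


def relative_improvement(variation_rate, control_rate):
    """Percentage change of the variation's rate relative to the control's."""
    return (variation_rate - control_rate) / control_rate * 100


# Hypothetical numbers: the variation is disabled after week 1, so its
# counts are frozen, while the control keeps collecting traffic.
variation_rate = conversion_rate(60, 1000)         # frozen at 6.0%
control_rate_week1 = conversion_rate(50, 1000)     # 5.0% when variation was disabled
control_rate_later = conversion_rate(130, 2000)    # 6.5% a few days later

if __name__ == "__main__":
    before = relative_improvement(variation_rate, control_rate_week1)
    after = relative_improvement(variation_rate, control_rate_later)
    print(f"Improvement when disabled:    {before:+.1f}%")
    print(f"Improvement a few days later: {after:+.1f}%")
```

The variation's own rate never moves in this sketch; only the baseline it is compared against does.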
5. Google Analytics visits v/s VWO visitors
We have a Google Analytics plugin for VWO. Numbers that are shown in Google Analytics may sometimes differ from VWO reports. Why is that?
Actually, different analytics/testing tools may define visitors differently. In particular, Google Analytics counts visits by setting a 30-minute cookie. If a visitor returns after 30 minutes (or after the session expires), s/he is counted as another visit. VWO, on the other hand, sets a long-term cookie and counts a visitor only once, even on repeat visits.
So, when comparing two different tools, always make sure you have the correct definitions of visits/visitors figured out and that you are actually comparing apples to apples.
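The difference between the two definitions can be illustrated on a tiny, made-up log of page hits. The sketch below is a simplified model, assuming that a "visit" ends after 30 minutes without a hit and that a "visitor" is a unique ID; real tools apply further rules that are not modeled here, and the data and function names are hypothetical.

```python
from collections import defaultdict

SESSION_TIMEOUT_MINUTES = 30  # assumed session length, per the text above


def count_unique_visitors(hits):
    """Count each visitor ID once, no matter how often it returns."""
    return len({visitor_id for visitor_id, _ in hits})


def count_visits(hits, timeout=SESSION_TIMEOUT_MINUTES):
    """Count visits: a new visit starts after more than `timeout` idle minutes."""
    minutes_by_visitor = defaultdict(list)
    for visitor_id, minute in hits:
        minutes_by_visitor[visitor_id].append(minute)

    visits = 0
    for minutes in minutes_by_visitor.values():
        minutes.sort()
        visits += 1  # the first hit always opens a visit
        for previous, current in zip(minutes, minutes[1:]):
            if current - previous > timeout:
                visits += 1
    return visits


# Hypothetical hits as (visitor ID, minutes since midnight).
hits = [
    ("alice", 0), ("alice", 10), ("alice", 95),
    ("bob", 5), ("bob", 400), ("bob", 410),
    ("carol", 20),
]

if __name__ == "__main__":
    print("Visits (session-based):", count_visits(hits))
    print("Visitors (unique IDs): ", count_unique_visitors(hits))
```

On the same traffic, the session-based count is higher than the unique-visitor count, which is the kind of gap you would expect to see between the two reports.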
I hope this post clears up some of the issues or questions you may have about your A/B or split tests! If you have more questions, please leave a comment.