Part of keeping our customers happy is continually making the user experience simpler. We want our customers to work with tools that help them make informed decisions, and the Multivariate Testing (MVT) updates we have just rolled out aim to do exactly that.
VWO’s multivariate testing has helped set up more than 300,000 testing and personalization campaigns for our 4,000+ customers. We’re releasing a few updates with the goal of adding a more logical flow to how you create an MVT campaign and to how the winner is decided at the end of each test.
The updates include:
The Duration Calculator
We have now added a test duration calculator to the MVT test creation process. It gives you a rough estimate of the required duration so that you can schedule tests better and avoid starting tests that would take too long to run. The Duration Calculator screen helps you plan the test using the following fields:
Estimated Duration of the Campaign
To compute the duration, VWO first determines the minimum number of visitors who must enter the test; no winner is declared until that many visitors have entered it. This number is then divided by the number of visitors per week recorded on the website during the campaign to calculate the number of weeks required.
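The division step can be sketched in a few lines. This is a minimal illustration, not VWO's implementation: the function names are hypothetical, rounding up to whole weeks is an assumption, and converting the calculator's monthly visitor figure to a weekly one via 12 months ≈ 52 weeks is also an assumption.

```python
import math


def monthly_to_weekly(monthly_visitors: float) -> float:
    """Convert an average monthly figure to weekly (assumes 12 months ≈ 52 weeks)."""
    return monthly_visitors * 12 / 52


def estimate_weeks(min_visitors: int, weekly_visitors: float) -> int:
    """Weeks needed for min_visitors to enter the test, rounded up (assumption)."""
    if weekly_visitors <= 0:
        raise ValueError("weekly_visitors must be positive")
    return math.ceil(min_visitors / weekly_visitors)


# Hypothetical inputs: 45,000 visitors required, 52,000 visitors per month.
weekly = monthly_to_weekly(52_000)      # 12,000 visitors per week
print(estimate_weeks(45_000, weekly))   # 4 (3.75 weeks, rounded up)
```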
Estimated Existing Conversion Rate
Enter the existing conversion rate of your website here. A tool like Google Analytics or a VWO Conversion Tracking campaign can tell you what it is.
Minimum Improvement in Conversion Rate You Want to Detect
This is the minimum lift over the existing conversion rate that you want the test to detect. If a variation actually achieves this lift, there is a 90% chance that it will be reported as the winner. You can change this 90% figure in the “statistical power” field.
Number of Combinations (Including Control)
The number of combinations in your test, counting the Control as one of them.
Average Number of Monthly Visitors
Make sure the value you enter here is accurate; otherwise, the duration estimate you see will be wrong. You can get this number from Google Analytics or by running a VWO conversion campaign.
p-Value Cutoff
The probability that a winner is falsely declared even though all variations have identical conversion rates.
Statistical Power
The probability that a lift at least as large as the desired lift is detected. For example, if the desired lift is 10% and the statistical power is 80%, then a variation with a 10% lift has an 80% chance of being detected by your test.
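To show how the calculator's inputs interact, here is a sketch built on the textbook normal-approximation sample size for a one-tailed two-proportion test. VWO's exact formula is not described here, so treat every choice as an assumption: the lift is read as relative (10% lift on a 10% rate means 11%), the p-value cutoff is split across the comparisons against Control with the Sidak correction, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def visitors_per_combination(baseline_rate: float, min_relative_lift: float,
                             alpha: float, power: float) -> int:
    """Normal-approximation sample size for a one-tailed two-proportion test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_relative_lift)  # assumption: relative lift
    if not (0 < p1 < p2 < 1):
        raise ValueError("need 0 < baseline rate < improved rate < 1")
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the cutoff
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(spread ** 2 / (p2 - p1) ** 2)


def estimate_duration_weeks(baseline_rate: float, min_relative_lift: float,
                            combinations: int, monthly_visitors: float,
                            p_value_cutoff: float = 0.05,
                            power: float = 0.9) -> int:
    """Weeks until every combination (Control included) has enough visitors."""
    # Assumption: spread the p-value cutoff over the (combinations - 1)
    # comparisons against Control using the Sidak correction.
    comparisons = max(combinations - 1, 1)
    alpha = 1 - (1 - p_value_cutoff) ** (1 / comparisons)
    n = visitors_per_combination(baseline_rate, min_relative_lift, alpha, power)
    weekly_visitors = monthly_visitors * 12 / 52  # assumes 12 months ≈ 52 weeks
    return math.ceil(n * combinations / weekly_visitors)


# Hypothetical inputs: 10% conversion rate, 10% relative lift,
# 4 combinations including Control, 100,000 visitors per month.
print(estimate_duration_weeks(0.10, 0.10, 4, 100_000))
```

Raising the power, shrinking the lift, or adding combinations all lengthen the estimate, which is why an accurate monthly visitor figure matters so much.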
Attention: The actual campaign duration may differ from the duration calculator’s initial estimate if the actual number of visitors differs significantly from the number you entered.
In addition, the Summary screen keeps recommending how long you should continue running an active campaign.
Corrections to Terminology
VWO’s existing multivariate testing system is based on classical frequentist statistics. We previously adopted the industry-standard term “Chance to Beat Control” to represent 100% minus the p-value.
But while “Chance to Beat Control” is standard terminology, it is also significantly inaccurate. Frequentist statistics cannot actually provide a “Chance to Beat Control”; only Bayesian methods can. Because MVT testing at VWO is frequentist, we have updated the terminology to reflect more accurately what the statistics measure: “Chance to Beat Control” is now called “Significance Level”.
Significance Level represents the probability that an MVT campaign with the same sample size as this one, but in which all combinations are identical, would yield a difference smaller in magnitude than the difference that was just observed. The closer the Significance Level is to 100%, the more plausible you should consider the hypothesis that “Variation is better than the Control”. However, it’s very important to note that a 97% Significance Level does NOT imply that there is a 97% chance that a combination will beat the Control. Several factors influence the Significance Level of a variation, including the duration of the campaign and the number of visitors involved.
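For a single combination compared against the Control, a Significance Level of this kind can be sketched as 1 minus the p-value of a pooled two-proportion z-test. The z-test itself and the function name are assumptions for illustration; VWO's engine may compute it differently. The `one_tailed` flag reflects the switch to one-tailed tests described later, while `one_tailed=False` follows the "smaller in magnitude" wording above.

```python
import math
from statistics import NormalDist


def significance_level(control_visitors: int, control_conversions: int,
                       variant_visitors: int, variant_conversions: int,
                       one_tailed: bool = True) -> float:
    """Return 1 - p-value from a pooled two-proportion z-test (a sketch)."""
    p_control = control_conversions / control_visitors
    p_variant = variant_conversions / variant_visitors
    pooled = ((control_conversions + variant_conversions)
              / (control_visitors + variant_visitors))
    se = math.sqrt(pooled * (1 - pooled)
                   * (1 / control_visitors + 1 / variant_visitors))
    if se == 0:
        raise ValueError("no variation in conversions; z-test undefined")
    z = (p_variant - p_control) / se
    if one_tailed:
        # Only "variation better than Control" counts as evidence.
        p_value = 1 - NormalDist().cdf(z)
    else:
        # Differences of either sign count, matching "smaller in magnitude".
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value


# Hypothetical data: 10,000 visitors per arm, 10.0% vs. 11.5% conversion.
print(significance_level(10_000, 1_000, 10_000, 1_150))
```

Even a value close to 100% here says how surprising the observed difference would be if the combinations were identical; it is not the probability that the variation beats the Control.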
You can access the results by clicking the Detailed Report tab in the Campaigns section.
Attention: The correct way to run an MVT campaign is to choose a sample size upfront and then run the test for its entire duration. If you peek at the results and act on them before the sample size is reached, the reported statistical significance no longer reflects its true value. This leads to incorrect results: your dashboard may state that a result is statistically significant when there is a good chance that it is not.
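The effect of peeking can be demonstrated with a small simulation of A/A tests, in which both arms share the same true conversion rate, so every declared winner is a false positive. This sketch assumes a one-tailed pooled z-test, a 10% conversion rate, and a peek every 100 visitors; all names and numbers are hypothetical. The "fixed" tester looks only once at the planned sample size, while the "peeker" declares a winner at the first look that crosses the threshold.

```python
import math
import random
from statistics import NormalDist


def one_tailed_p_value(n: int, conv_a: int, conv_b: int) -> float:
    """Pooled z-test for two arms of n visitors each; P(Z >= z) under H0."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0, 1):
        return 1.0  # no variation yet, so no evidence either way
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (conv_b - conv_a) / n / se
    return 1 - NormalDist().cdf(z)


def false_positive_rates(runs: int = 400, visitors: int = 2000,
                         peek_every: int = 100, rate: float = 0.1,
                         alpha: float = 0.05, seed: int = 7):
    """Return (fixed-horizon, peeking) false-positive rates for A/A tests."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        peeked_winner = False
        for n in range(1, visitors + 1):
            # Both arms have the same true conversion rate.
            conv_a += int(rng.random() < rate)
            conv_b += int(rng.random() < rate)
            if n % peek_every == 0 and not peeked_winner:
                if one_tailed_p_value(n, conv_a, conv_b) < alpha:
                    peeked_winner = True  # the peeker stops and ships it
        # The disciplined tester evaluates only at the planned sample size.
        if one_tailed_p_value(visitors, conv_a, conv_b) < alpha:
            fixed_hits += 1
        if peeked_winner:
            peeking_hits += 1
    return fixed_hits / runs, peeking_hits / runs


fixed, peeking = false_positive_rates()
print(f"fixed horizon: {fixed:.1%}, peeking: {peeking:.1%}")
```

Because the final look is also one of the peeks, the peeking rate can never fall below the fixed-horizon rate, and in practice it comes out several times higher than the 5% the threshold promises.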
Corrections to Our Statistics Engine
Let’s say you create 3 versions of a headline and 3 versions of a button. An MVT test requires you to test all possible combinations (in this case, 3×3 = 9). This is where the Multiple Comparisons Problem crops up: as the number of combinations being compared increases, so does the probability that at least one of them appears to differ from the Control purely by chance, which means a greater probability of false positives.
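The full-factorial set of combinations from this example can be enumerated with `itertools.product`. The labels are hypothetical, and treating the first combination as the Control is an assumption made only to count the comparisons against it.

```python
from itertools import product

headlines = ["Headline A", "Headline B", "Headline C"]
buttons = ["Button 1", "Button 2", "Button 3"]

# Every headline paired with every button: 3 x 3 = 9 combinations.
combinations = list(product(headlines, buttons))
print(len(combinations))        # 9

# Assumption: the first combination is the Control, so 8 are compared to it.
comparisons = len(combinations) - 1
print(comparisons)              # 8
```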
Because the default VWO winning threshold is 95%, each individual combination has a 5% chance of producing a false positive, and the Multiple Comparisons Problem compounds these chances. Let’s say we have 20 combinations in the test. The probability of a false positive in at least one combination is 64% [= 100 × (1 − (95/100)^20)]. This problem is nicely illustrated in the XKCD comic “Significant”.
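The 64% figure is the family-wise error rate, 1 − (1 − α)^m, for m comparisons each run at level α. Like the bracketed formula, this sketch assumes the comparisons are independent; the function name is hypothetical.

```python
def family_wise_error_rate(alpha: float, comparisons: int) -> float:
    """P(at least one false positive) across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


# 20 combinations, each tested at the default 95% threshold (alpha = 0.05).
print(round(100 * family_wise_error_rate(0.05, 20), 1))  # 64.2
```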
Our previous testing methodology did not adjust for this problem. To resolve it, we now apply the Sidak correction. You can now be confident that, if all combinations are in fact equal, the probability of even one false positive across all the combinations is at most 5%.
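The Sidak correction inverts the formula above: to keep the family-wise rate at α across m comparisons, each comparison is tested at 1 − (1 − α)^(1/m). The function name is hypothetical, and the exact guarantee holds for independent comparisons.

```python
def sidak_alpha(family_alpha: float, comparisons: int) -> float:
    """Per-comparison significance cutoff under the Sidak correction."""
    return 1 - (1 - family_alpha) ** (1 / comparisons)


per_test = sidak_alpha(0.05, 20)
print(f"{per_test:.5f}")               # 0.00256
print(f"{1 - per_test:.2%}")           # 99.74% threshold per combination

# Plugging the corrected cutoff back in restores a 5% family-wise rate.
print(round(1 - (1 - per_test) ** 20, 6))  # 0.05
```

So with 20 combinations, each one has to clear a threshold of about 99.74% rather than 95% before it is declared a winner.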
There Are No “Losers”
We have also switched to one-tailed tests, which only check whether a variation is better than the Control and therefore never report a “loser” variation. If a variation is better than the Control, the one-tailed test gives you evidence supporting it. If the test does not indicate a winner, you should conclude that the variation will not give you a lift in conversion rate and may even prove counterproductive. This is why we no longer explicitly show any “losers”. The procedure is as follows:
- An MVT test is run.
- If a winner is discovered, it should be deployed.
- If no winner is discovered, the control should be deployed.
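The three steps above can be sketched as a small decision function. Everything here is an assumption for illustration: the `CombinationResult` type is hypothetical, each `significance` value is taken to be 1 minus a one-tailed p-value against the Control, the threshold is Sidak-corrected over the non-Control combinations, and picking the highest conversion rate when several combinations qualify is one reasonable tie-break rather than VWO's documented rule.

```python
from dataclasses import dataclass


@dataclass
class CombinationResult:
    name: str
    conversion_rate: float
    significance: float  # 1 - one-tailed p-value vs. Control (assumption)


def choose_deployment(results: list, family_alpha: float = 0.05) -> str:
    """Return the winning combination's name, or "Control" if none wins."""
    if not results:
        return "Control"
    # Sidak-corrected threshold across all non-Control combinations.
    threshold = (1 - family_alpha) ** (1 / len(results))
    winners = [r for r in results if r.significance >= threshold]
    if not winners:
        return "Control"  # no winner discovered: keep the Control
    # Several winners: deploy the one with the best observed rate (assumption).
    return max(winners, key=lambda r: r.conversion_rate).name


results = [
    CombinationResult("Headline B + Button 1", 0.110, 0.90),
    CombinationResult("Headline C + Button 2", 0.120, 0.9999),
]
print(choose_deployment(results))  # Headline C + Button 2
```

With two combinations the corrected threshold is about 97.5%, so the first result (90%) does not qualify, even though it would clear an uncorrected 95% bar only if its significance were higher.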
The application will also provide further guidance on this change.
That about sums it up. We hope these changes help you convert better, and we would love to hear your feedback and first impressions.