Although A/B tests look simple to execute, it requires great discipline to get them right. These tests often follow the Frequentist model that requires you to run the test for a set period of time to get correct data from it.
However, most testers really fail to understand the importance of time and are instead obsessed with reaching the significance level of the test. It ends up being the be-all and end-all for most testers and is often called significance testing.
If you don’t run a test for the recommended period of time, the significance level won’t be correct and your results will be inaccurate.
Even if you run your tests for the recommended period of time, Frequentist-model tests can only tell you whether A will beat B. They cannot tell you how close or far apart A and B actually are, and they never tell you the probability of A beating B or the uncertainty involved.
These mistakes happen because A/B testing did not evolve with conversion optimization in mind. It’s a statistical method that has been adopted but never customized to follow the workflow of conversion experts.
SmartStats: The Bayesian Way to Finding Your Winning Variation
SmartStats is our new stats engine based on Bayesian statistics that provides you more control over your testing. You can now plan better, have a more accurate reason to end tests, and really understand how close or far apart A and B are. SmartStats understands what improvements you care about, how certain you want to be, and helps you at every step to make your testing smarter.
Improvement as a Probability
Traditional Frequentist statistics approximates the mean (along with the standard deviation) of the samples where A beat B. This type of statistics completely ignores the instances when B beat A. Bayesian statistics takes into account even this possibility to calculate the probability of A beating B and the range of the improvement you can expect.
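To make “improvement as a probability” concrete, here is a minimal sketch of how the probability of B beating A can be estimated. It is not the SmartStats implementation: it assumes a uniform Beta(1, 1) prior on each conversion rate and uses simple Monte Carlo sampling, and the function name and visitor counts are made up for illustration.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A).

    Assumption (not from SmartStats): each conversion rate has a
    Beta(1, 1) prior, so its posterior is
    Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws

# Hypothetical data: A converts 100 of 1,000 visitors, B converts 130 of 1,000.
p = prob_b_beats_a(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"P(B beats A) is roughly {p:.3f}")
```

Because every draw compares one plausible rate for A with one plausible rate for B, the cases where A comes out ahead are counted too, which is what lets the result be read as a probability.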
Every Conversion Has a Range
Instead of just expressing the number of conversions as a percentage, SmartStats calculates a conversion rate range where the true conversion rate lies with 99% probability. The more data it collects, the narrower this range of highly likely values gets.
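The shrinking range can be sketched in a few lines. This is an illustration only, assuming a Beta(1, 1) prior and reading the 99% range off sorted posterior samples; the function name and the visitor counts are hypothetical, and SmartStats may compute its range differently.

```python
import random

def credible_interval(conversions, visitors, mass=0.99, draws=20000, seed=0):
    """Approximate central range that holds `mass` of the posterior.

    Assumption: Beta(1, 1) prior, so the posterior is
    Beta(1 + conversions, 1 + visitors - conversions).
    """
    rng = random.Random(seed)
    samples = sorted(
        rng.betavariate(1 + conversions, 1 + visitors - conversions)
        for _ in range(draws)
    )
    tail = (1 - mass) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1 - tail) * draws) - 1]
    return lower, upper

# Same observed rate (10%), but 100 times more data in the second case.
small = credible_interval(conversions=10, visitors=100)
large = credible_interval(conversions=1000, visitors=10000)
print(f"100 visitors:    {small[0]:.3f} to {small[1]:.3f}")
print(f"10,000 visitors: {large[0]:.3f} to {large[1]:.3f}")
```

Both ranges are centered near 10%, but the second is much narrower because it is backed by far more visitors.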
Chance to Beat All
SmartStats compares variations not just with Control but also with each other. If you test multiple variations, you will know the probability of a variation beating every other variation, as well as Control, in your test.
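One way to picture “chance to beat all” is to sample a plausible conversion rate for every variation at once and count how often each one comes out on top. The sketch below does that under an assumed Beta(1, 1) prior; the function name and the data are hypothetical and this is not the SmartStats code.

```python
import random

def chance_to_beat_all(results, draws=20000, seed=0):
    """Estimate, per variation, the probability that it has the highest rate.

    `results` maps a variation name to (conversions, visitors).
    Assumption: independent Beta(1, 1) priors on every conversion rate.
    """
    rng = random.Random(seed)
    wins = dict.fromkeys(results, 0)
    for _ in range(draws):
        sampled = {
            name: rng.betavariate(1 + conv, 1 + visitors - conv)
            for name, (conv, visitors) in results.items()
        }
        best = max(sampled, key=sampled.get)
        wins[best] += 1
    return {name: count / draws for name, count in wins.items()}

# Hypothetical test with Control and two variations.
chances = chance_to_beat_all({
    "Control": (100, 1000),
    "Variation 1": (115, 1000),
    "Variation 2": (135, 1000),
})
for name, chance in chances.items():
    print(f"{name}: {chance:.1%}")
```

The chances always add up to 100%, since exactly one variation wins each simulated draw.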
SmartStats Calculates the Potential Loss to Reduce the Risk of Choosing a False Winner
SmartStats takes into account the probability that B may beat A. The potential loss is the lift you can lose out on if you deploy A as the winner when B is actually better. With traditional Frequentist statistics, you rely on reaching significance. If you haven’t committed to a sample size, you are at a high risk of getting a false positive or a false winner. SmartStats uses potential loss to decide when to end a test. Based on the test results, you’re shown one of the two statuses:
- Winning Variation: When the lower limit of the leading variation’s conversion range is at least 1% higher than the upper limit of the losing variation’s range, SmartStats declares it the winner.
- Smart Decision: A smart decision is recommended when the absolute potential loss for the leading variation is less than the threshold of caring. The “threshold of caring” depends on the prior values entered while creating the primary goal. Essentially, when the potential loss of deploying a variation is small enough to fall within your comfort threshold, SmartStats recommends deploying it as a smart decision.
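The idea of potential loss can be sketched as an expected shortfall: across many plausible pairs of conversion rates, how much do you give up, on average, in the cases where the variation you did not deploy is actually better? The code below is an assumption-laden illustration, not the SmartStats formula. It uses Beta(1, 1) priors, measures loss in absolute conversion rate, and the threshold value is invented for the example.

```python
import random

def potential_loss(chosen, other, draws=20000, seed=0):
    """Expected absolute conversion-rate shortfall from deploying `chosen`.

    `chosen` and `other` are (conversions, visitors) tuples.
    Assumptions: Beta(1, 1) priors; loss counts only the draws where
    `other` has the higher rate.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        rate_chosen = rng.betavariate(1 + chosen[0], 1 + chosen[1] - chosen[0])
        rate_other = rng.betavariate(1 + other[0], 1 + other[1] - other[0])
        total += max(rate_other - rate_chosen, 0.0)
    return total / draws

# Hypothetical threshold of caring: 0.1 percentage points of conversion rate.
THRESHOLD_OF_CARING = 0.001

control = (100, 1000)
variation = (130, 1000)
loss_variation = potential_loss(variation, control)
loss_control = potential_loss(control, variation)

print(f"Potential loss if you deploy the variation: {loss_variation:.5f}")
print(f"Potential loss if you deploy Control:       {loss_control:.5f}")
print("Smart decision:", loss_variation < THRESHOLD_OF_CARING)
```

With this data, deploying the variation risks very little, while sticking with Control risks giving up most of the observed lift.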
More Control over Testing with Three New Modes
We give you three modes to let you decide how much potential loss you are comfortable with and when to end your test: quick learning, balanced, and maximum certainty. The longer you run your test, the more evidence SmartStats gathers, to be certain of the range of the highly likely values where your true conversion rate lies. The more certain SmartStats gets, the more accurately it can calculate the potential loss of choosing A or B as the winner. Depending on how much of a direct impact a test can have on your revenue, you can select the correct mode to help you find answers as quickly as possible. SmartStats ends the test when the potential loss falls in your comfort zone.
How to choose the correct mode
- Quick Learning: For finding quick trends where tests don’t affect your revenue directly. You can choose this mode when testing non-revenue goals such as the bounce rate and time spent on a page or for quick headline tests. With this mode, you can reduce your testing time for non-critical tests when there isn’t a risk of hurting your revenue directly by deploying a false winner.
- Balanced: Ideal for most tests. This is the default mode and can be used for almost all tests. As the name suggests, it is the best balance between the testing time and minimizing the potential loss.
- Maximum Certainty: Best for revenue-critical tests when you want to absolutely minimize the potential loss. Usually takes the longest to conclude a test. Suppose you have an eCommerce website and you want to test changes to your checkout flow. You want to be as certain as possible to minimize the potential loss from deploying a false winner even if it takes a lot of time. This is the best mode for such critical tests which affect your revenue directly.
The new SmartStats engine lets you customize each test, based on the kind of risk you want to take and the improvement you’re aiming for. It also lets you calculate the sample size and time required to achieve these results.
All these advantages result in a reliable, easy-to-use, and accurate testing tool to unearth that next winning variation for your app.
Summary of the New Goals Creation Panel
All these changes are located on the Goals Creation panel when you’re setting up a new test. Here’s a summary of all the information that you need to enter, to use our new SmartStats engine.
- Current revenue per user (in USD): This field is required only for the Revenue Tracking goal. Enter the current revenue per user for the pages that are part of this test. You can analyze the revenue data collected from other VWO tests, Google Analytics (GA), or another analytics tool to get an estimate of the revenue per user.
- Average number of monthly visitors: Enter the average number of monthly visitors on the website pages that you want to test. We recommend analyzing your website visitors over the last few months to understand the data trend. To get the most accurate results, ensure that the values you enter are accurate. You can analyze the data collected from other VWO tests, GA, or another analytics tool. ATTENTION The average number is the count of visits to the pages you want to test, not to your whole website.
- Conversion rate for control: Enter the current conversion rate for your app. The conversion rate you enter here is specific to the goal you are testing for. For example, if the primary goal of your test is to track “visits on a page,” enter the existing conversion rate for that particular goal page only. To get accurate results, ensure that you add the conversion rate for the specific goals you are testing and not for the overall app. ATTENTION Leaving out the conversion rate, or entering a wildly incorrect one, will give you inaccurate results. It is important to enter the actual conversion rate (or at least a good ballpark figure) for the goal you want to test.
- Minimum lift in conversion rate you care about: Enter the minimum lift in conversion rate you want to achieve during the test. PRO-TIP You need to choose this number wisely. The smaller the gain you are looking for (for example, a 2% improvement), the more users you’ll need to detect it.
- Certainty vs. Speed: You have three modes which let you decide the accuracy you want to achieve. Accordingly, we’ll tell you how long you need to run the test to achieve the accuracy you are aiming for.
- Maximum Certainty: You want to be certain about your results. Accuracy is important to you, without worrying too much about how long it’ll take to achieve it.
- Balanced Mode: You want fairly accurate results, but time is also important to you. You’ll get fairly accurate results within a reasonable amount of time. Recommended.
- Quick Learning: You are short on time and want quick results. You want to see early trends, without worrying too much about how accurate the trend is. Not recommended.
How Does SmartStats Declare a Smart Decision Instead of a Winner?
SmartStats declares a variation as a winner only when there is at least a 1% difference between the lower limit of the variation’s conversion range and the upper limit of Control’s conversion range. Requiring this minimum gap keeps your risk of deploying a false positive low. The difference is calculated using the formula:
(Lower limit of a variation – Upper limit for control)/(Upper limit for control)*100
For example, if the conversion rate for Control lies in the range 10–20% and the conversion rate for variation 1 lies in the range 21.1–27.0%, variation 1 is declared a “winner,” because even in the worst-case scenario, it improves the conversion rate by more than 1%: (21.1 – 20)/20*100 = 5.5%.
In contrast, when the potential loss of a variation drops into your comfort threshold, SmartStats will suggest that it’s a smart decision to deploy the variation. By deploying a smart decision, you will most likely not lose anything and in the best case scenario, you might even gain something out of it.
For example, if the conversion rate for Control lies in the range 10–20% and the conversion rate for variation 1 lies in the range 20.19–27.0%, variation 1 is declared a smart decision because the difference between the upper limit of Control and the lower limit of the variation is less than 1%: (20.19 – 20)/20*100 = 0.95%.
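The two examples above can be checked with a short script that applies the formula directly. The function names are made up for this sketch, and it only covers the 1% winner rule; it does not model the potential-loss check that drives a smart decision.

```python
def worst_case_improvement(variation_lower, control_upper):
    """Relative gap, in percent, between the variation's lower limit
    and Control's upper limit (the formula given in the text)."""
    return (variation_lower - control_upper) / control_upper * 100

def is_winner(variation_lower, control_upper, min_gap_pct=1.0):
    """True when the worst-case improvement meets the 1% winner rule."""
    return worst_case_improvement(variation_lower, control_upper) >= min_gap_pct

# Control range 10-20%, variation 1 range 21.1-27.0%
print(f"{worst_case_improvement(21.1, 20.0):.2f}")   # 5.50
print(is_winner(21.1, 20.0))                          # True

# Control range 10-20%, variation 1 range 20.19-27.0%
print(f"{worst_case_improvement(20.19, 20.0):.2f}")  # 0.95
print(is_winner(20.19, 20.0))                         # False
```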