Sequential Testing Correction

Sequential testing is a statistical method that analyzes data as it is collected, so decisions can be made at interim points rather than only after all the data is in.

This can help to reduce the time and resources needed for experimentation, particularly in situations where the outcome becomes clear before all data is collected.

Let’s say you’re a product manager for an eCommerce website, and you’re planning to roll out a new feature aimed at increasing conversion rates. However, your website has limited traffic, and acquiring additional traffic through advertising campaigns is costly. In such a case, sequential testing would be ideal.

You could implement the new feature and use sequential testing to monitor its performance. If the feature shows significant positive results early on, you can conclude the test sooner and roll out the feature to all visitors, saving time and resources. On the other hand, if the feature doesn’t perform as expected, you can stop the test early, preventing further investment of resources in an ineffective feature.

Sequential testing correction encompasses methods that prevent the issues arising from sequential testing, such as false conclusions drawn from interim results. Checking results repeatedly without any correction raises the risk of erroneously concluding that a variation is better when it isn’t (a false positive). Sequential testing correction mitigates this risk by raising the level of evidence required before a result is declared significant.

Fixed horizon tests vs Sequential tests

In contrast to sequential tests, fixed horizon tests fix both the sample size and the experiment goals in advance. Conclusions can only be drawn once the planned test period is complete. This approach generally provides a higher level of statistical trustworthiness, but at the cost of more traffic per experiment.

Why are Sequential tests more suited to modern A/B testing?

In recent years, sequential tests have become increasingly popular because they allow results to be monitored continuously as data is collected. Here are some reasons why they are more suited to modern A/B testing:

Efficiency

By implementing sequential testing, organizations can quickly identify potentially disadvantageous ideas or content at an early stage, before they are fully implemented or exposed to a large audience. This lets them allocate resources effectively and minimize the overall cost of pursuing such ideas. It also helps businesses make informed decisions in fast-paced digital environments, such as whether to release a big feature before a major event.

Flexibility

Modern businesses need experimentation to be visitor-efficient so that A/B testing can also be done on pages with low traffic. With sequential testing, sample sizes are not fixed: you can stop the experiment early if significant results are observed, or continue until a predetermined endpoint is reached. This accommodates varying traffic levels and experiment durations.

What are the problems caused by sequential testing?

Despite having benefits, sequential testing may also pose problems for businesses.

It may seem counterintuitive, but whenever statistical results are calculated multiple times, there is a risk of increasing the false positive rate.

This is the main concern with continuously monitoring A/B test statistics. Several solutions have therefore been proposed to correct test statistics for repeated looks and reduce the occurrence of false positives in sequential testing.
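To make the problem concrete, here is a minimal simulation sketch. It assumes a simplified A/A test in which the difference between the two arms is a stream of standard normal observations with no real effect, checked with an uncorrected two-sided z-test at each peek. The function name and its parameters are illustrative, not part of any VWO API.

```python
import random
from statistics import NormalDist


def peeking_false_positive_rate(n_peeks, n_per_peek=100, n_sims=2000,
                                alpha=0.05, seed=0):
    """Estimate how often a test with NO real effect is declared significant
    at any of n_peeks interim looks, using an uncorrected two-sided z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(n_peeks):
            # The sum of n_per_peek independent N(0, 1) draws is N(0, n_per_peek),
            # so one draw stands in for a whole batch of new visitors.
            total += rng.gauss(0.0, n_per_peek ** 0.5)
            n += n_per_peek
            z = total / n ** 0.5  # z-statistic on all data collected so far
            if abs(z) > z_crit:
                false_positives += 1  # stopped early on a spurious "winner"
                break
    return false_positives / n_sims


if __name__ == "__main__":
    for peeks in (1, 5, 10):
        rate = peeking_false_positive_rate(peeks)
        print(f"{peeks:>2} peek(s): estimated false positive rate = {rate:.3f}")
```

With a single look, the estimate should sit near the nominal 5%. With more looks, it climbs well above that, even though nothing about the underlying data has changed.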

How do you correct errors from sequential testing?

There are several ways to correct for errors in sequential testing, including the following:

Bonferroni corrections

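The overall false positive rate grows with the number of interim checks you make and, in the worst case, can be as large as the per-check rate multiplied by the number of checks. The simplest solution is to divide your target false positive rate by the number of interim checks you plan to make.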

So, if you need a 5% overall false positive rate and you are making 10 interim analyses, set the false positive rate of each analysis to 5% / 10 = 0.5%. This is the Bonferroni correction.
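The arithmetic above can be sketched in a few lines. This is an illustrative example only: the function name is hypothetical, and the two-sided z-test threshold is an assumption added to show how much stricter each individual check becomes.

```python
from statistics import NormalDist


def bonferroni_alpha(overall_alpha, n_checks):
    """Per-check false positive rate under the Bonferroni correction."""
    return overall_alpha / n_checks


overall_alpha = 0.05
n_checks = 10

per_check_alpha = bonferroni_alpha(overall_alpha, n_checks)

# Critical z-values for a two-sided test, before and after the correction.
z_uncorrected = NormalDist().inv_cdf(1 - overall_alpha / 2)
z_corrected = NormalDist().inv_cdf(1 - per_check_alpha / 2)

# Union bound: the chance of at least one false positive across all checks
# can never exceed the sum of the per-check rates.
worst_case_overall = per_check_alpha * n_checks

print(f"per-check alpha: {per_check_alpha:.4f}")
print(f"critical |z| without correction: {z_uncorrected:.3f}")
print(f"critical |z| with correction:    {z_corrected:.3f}")
print(f"worst-case overall alpha:        {worst_case_overall:.4f}")
```

The correction is simple and holds however the checks are correlated, but it is conservative: every interim look must clear a much higher bar, which costs statistical power.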

Always valid inference

This method allows for continuous testing during data collection without determining in advance when to stop or how many interim analyses to conduct. This approach offers flexibility, as it doesn’t require prior knowledge of sample size and supports both streaming and batch data processing.
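One well-known construction of this kind is the mixture sequential probability ratio test (mSPRT). The sketch below is a simplified illustration under stated assumptions: observations are normally distributed with known standard deviation `sigma`, the null hypothesis is that their mean equals `theta0`, and the alternative is averaged over a normal prior with standard deviation `tau`. All names and default values are hypothetical choices for illustration, not a description of any particular product's implementation.

```python
import math


def always_valid_p_values(observations, sigma=1.0, tau=1.0, theta0=0.0):
    """Return a running always-valid p-value after each observation,
    based on a normal-mixture sequential probability ratio test."""
    p_values = []
    p = 1.0
    total = 0.0  # running sum of (x - theta0)
    for n, x in enumerate(observations, start=1):
        total += x - theta0
        var_term = sigma ** 2 + n * tau ** 2
        # Log of the mixture likelihood ratio after n observations.
        log_lr = (0.5 * math.log(sigma ** 2 / var_term)
                  + (tau ** 2 * total ** 2) / (2 * sigma ** 2 * var_term))
        # The p-value is the smallest value of 1 / likelihood ratio seen
        # so far, capped at 1, so it can only decrease over time.
        p = min(p, math.exp(-log_lr))
        p_values.append(p)
    return p_values


if __name__ == "__main__":
    no_effect = [0.0] * 20
    strong_effect = [1.0] * 50
    print("final p-value, no effect:    ", always_valid_p_values(no_effect)[-1])
    print("final p-value, strong effect:", always_valid_p_values(strong_effect)[-1])
```

Because the running p-value remains valid at every sample size, you can stop the moment it drops below your chosen threshold, without having fixed the number of looks in advance.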

Always valid inference is less widely adopted because it is complex to grasp and can substantially reduce statistical power. In practice, this means that detecting a winner takes noticeably longer when one actually exists.

To simplify the testing process and let you focus on running tests and obtaining early results without worrying about skewed outcomes, VWO corrects for sequential testing using a derivative of the Alpha-Spending approach proposed by Lan and DeMets.

The alpha-spending approach distributes the type I error (alpha) across the duration of a sequential A/B test. Alpha can be allocated flexibly across the selected peek times, and it is only spent when a peek actually occurs. If a peek is skipped, the unused alpha is retained for later peeks. Additionally, there is no need to decide in advance how many peeks to make or when to make them during data collection.
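As a rough illustration of the idea, the sketch below implements two spending functions commonly associated with Lan and DeMets: an O'Brien-Fleming-type function, which spends very little alpha early, and a Pocock-type function, which spends it more evenly. Here `t` is the fraction of the planned data collected so far, with 0 < t <= 1. This is not VWO's implementation, which the text describes only as a derivative of this approach. The function names are hypothetical, and turning the spent alpha into actual decision boundaries needs further numerical work that is not shown.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obrien_fleming_spending(t, alpha=0.05):
    """Cumulative alpha spent by information fraction t (0 < t <= 1),
    O'Brien-Fleming-type: conservative early, most alpha saved for the end."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z / math.sqrt(t)))


def pocock_spending(t, alpha=0.05):
    """Cumulative alpha spent by information fraction t (0 < t <= 1),
    Pocock-type: alpha is spent more evenly across the test."""
    return alpha * math.log(1 + (math.e - 1) * t)


def incremental_alpha(peek_fractions, spending_fn, alpha=0.05):
    """Alpha available at each peek: the cumulative amount allowed at that
    point minus what earlier peeks have already used."""
    spent = 0.0
    increments = []
    for t in peek_fractions:
        cumulative = spending_fn(t, alpha)
        increments.append(cumulative - spent)
        spent = cumulative
    return increments


if __name__ == "__main__":
    peeks = [0.25, 0.5, 0.75, 1.0]
    for fn in (obrien_fleming_spending, pocock_spending):
        parts = incremental_alpha(peeks, fn)
        print(fn.__name__, [round(a, 5) for a in parts], "total:", round(sum(parts), 5))
```

Because each increment is computed from the cumulative function, skipping a peek (for example, dropping 0.5 from the list) rolls its share into the next peek, and the total spent by the end of the test stays at the overall alpha.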

When you select Sequential Testing Correction in the SmartStats Configuration, decision probabilities in the new test reports are adjusted to minimize the errors that come from monitoring test results during data collection.

If you prioritize obtaining reliable test results and desire greater control over test statistics, consider using VWO, where our testing system is designed to meet your advanced needs.
