What is peeking?
Peeking means checking A/B test results while the experiment is still running and acting on them early. Modern testing dashboards make this tempting: significant-looking differences can appear after very little data. However, continuously monitoring results and halting experiments prematurely can bias your conclusions, because it favors differences that arise from random fluctuations rather than genuine effects.
Peeking inflates the risk of a Type 1 error, or false positive: mistakenly concluding that a variation has a real effect when it actually doesn't.
Example of peeking
Let’s say you’re testing two different versions of your website’s homepage to see which one leads to more purchases. After just a few days of running the test, you notice that variation 1 of the homepage is performing significantly better than variation 2. Excited by these early results, you decide to stop the test and implement variation 1 across the entire website.
However, what you fail to realize is that variation 1's early lead could be a temporary fluctuation or pure chance. When you peek repeatedly, you are likely to detect a difference even when none exists (a false positive). By stopping the test too early and acting on incomplete data, you risk implementing a change that does not benefit the business in the long term. Worse, uplift estimates from small samples tend to be exaggerated, so early results are not just unreliable but biased upward, and decisions based on them can lead to incorrect conclusions.
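The inflation of false positives from peeking is easy to demonstrate with a simulation. The sketch below (illustrative only; the conversion rate, sample size, and peeking interval are arbitrary assumptions) runs A/A tests, where both variations are identical, and "peeks" with a two-proportion z-test every 100 visitors per arm. Even though there is no real difference, the chance of declaring a winner at some peek far exceeds the nominal 5% significance level:

```python
import random
import math

def z_test_p(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for a two-proportion z-test."""
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (successes_a / n_a - successes_b / n_b) / se
    # Two-sided p-value via the normal CDF (expressed with erf)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def run_experiment(n_per_arm, peek_every, rng):
    """Simulate an A/A test (no real difference) with periodic peeking.

    Returns True if any interim look shows p < 0.05 -- i.e., the
    experimenter would have stopped early and declared a winner.
    """
    conv_rate = 0.10  # identical in both arms: any 'win' is a false positive
    sa = sb = 0
    for i in range(1, n_per_arm + 1):
        sa += rng.random() < conv_rate
        sb += rng.random() < conv_rate
        if i % peek_every == 0 and z_test_p(sa, i, sb, i) < 0.05:
            return True
    return False

rng = random.Random(42)
trials = 2000
false_positives = sum(run_experiment(2000, 100, rng) for _ in range(trials))
print(f"False-positive rate with peeking: {false_positives / trials:.1%}")
```

With 20 peeks per experiment, the observed false-positive rate lands well above the 5% an experimenter believes they are running at, which is exactly the peeking problem.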
When is peeking allowed?
Peeking can result in false positives as well as inflated uplift estimates at small sample sizes (the latter effect is often called the winner's curse).
As a result, experimenters must choose between fixed horizon testing and sequential testing based on their approach to handling peeking.
In traditional fixed horizon testing, where an experiment runs for a predetermined duration or sample size, peeking at results before the test concludes is generally discouraged. Running the test to completion ensures enough data for reliable conclusions, preserves statistical validity, reduces bias, and keeps the Type 1 error rate at its intended level.
However, in modern businesses, sequential testing methodologies have emerged, allowing for more adaptive experimentation and timely feature launches. In sequential testing, data is continuously monitored, and decisions about stopping the test or making adjustments can be made along the way. This approach accommodates peeking to some extent, as long as it is done cautiously to avoid premature conclusions based on incomplete data.
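One common way sequential testing accommodates peeking is by tightening the significance threshold at each interim look so the overall Type 1 error stays near 5%. The sketch below illustrates this with a Pocock-style constant boundary (this is a generic illustration of the idea, not VWO's actual method; the tabulated per-look level of 0.0158 corresponds to 5 equally spaced looks at overall alpha 0.05, and all other parameters are assumptions):

```python
import random
import math

def z_test_p(sa, na, sb, nb):
    """Two-sided p-value for a two-proportion z-test."""
    p_pool = (sa + sb) / (na + nb)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / na + 1 / nb))
    if se == 0:
        return 1.0
    z = (sa / na - sb / nb) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Pocock-style boundary: with 5 planned looks, test each look at
# p < 0.0158 so the *overall* Type 1 error stays close to 5%.
LOOKS, PER_LOOK_ALPHA = 5, 0.0158

def sequential_test(n_per_arm, rng, rate_a=0.10, rate_b=0.10):
    """Run up to LOOKS equally spaced interim analyses; stop early
    only when the corrected boundary is crossed."""
    sa = sb = 0
    step = n_per_arm // LOOKS
    for look in range(1, LOOKS + 1):
        for _ in range(step):
            sa += rng.random() < rate_a
            sb += rng.random() < rate_b
        n = look * step
        if z_test_p(sa, n, sb, n) < PER_LOOK_ALPHA:
            return True, look  # significant at this interim look
    return False, LOOKS

rng = random.Random(7)
trials = 2000
fp = sum(sequential_test(2000, rng)[0] for _ in range(trials))
print(f"A/A false-positive rate with corrected looks: {fp / trials:.1%}")
```

Because each look uses a stricter threshold, the experimenter can peek five times and still keep the overall false-positive rate near the nominal 5%, while retaining the ability to stop early when a genuine effect is large.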
VWO has implemented Peeking Correction to ensure that Sequential Testing is accurate and reliable in its revamped reporting system. By using Peeking Correction, VWO adjusts statistical calculations to maintain validity, even when tests are monitored multiple times. This feature helps maintain the integrity of your results, allowing you to make informed decisions without the risk of skewed data.
If you’re seeking both flexibility in reviewing test results and high accuracy, give VWO a try—it comes with robust and dependable reporting features.