The answer depends entirely on how much certainty you want from your tests. Statistically, the more certainty you want, the more time you have to spend testing and analyzing; even so, we can never say with 100% certainty that the true results are exactly what the test reports show. Uncertainty can be reduced over time, but it can never be eliminated completely. To improve certainty, we advise you to set up the test as follows:
1) Set your “chance to beat original” threshold to 99% or more, as shown in the following screenshot:
2) Before starting the test, use our test duration calculator to estimate how many visitors you need to test. Check for significant results ONLY after that many visitors have been tested; this prevents repeatedly checking for significance and drawing erroneous conclusions from it. You can access the tool at http://visualwebsiteoptimizer.com/ab-split-test-duration/
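To make step 2 concrete, here is a rough sketch of how a required visitor count can be estimated before a test starts. It uses the standard textbook approximation for comparing two conversion rates, which is not necessarily the formula behind our calculator. The function name, the 80% power default, and the two-sided 1% significance level (chosen here as a loose stand-in for a 99% threshold) are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def visitors_per_variation(baseline_rate, relative_lift, alpha=0.01, power=0.8):
    """Approximate visitors needed per variation (hypothetical helper).

    baseline_rate: current conversion rate, e.g. 0.05 for 5%
    relative_lift: smallest relative improvement worth detecting, e.g. 0.2 for +20%
    alpha:         two-sided significance level (assumed 0.01 here)
    power:         probability of detecting the lift if it is real (assumed 0.8)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the binomial variances of the two conversion rates.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Example: 5% baseline conversion rate, hoping to detect a 20% relative lift.
    print(visitors_per_variation(0.05, 0.20))
```

The smaller the lift you want to detect, or the stricter the threshold, the more visitors the estimate asks for. That is why the number should be fixed before the test starts rather than adjusted along the way.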
Following the steps above may reduce the uncertainty in the results, but it will never give you 100% certainty because statistics simply doesn’t guarantee that. Now suppose we have created two different tests with the same setup and are comparing their results over a period of time. There are certain factors to consider before comparing them:
- It is never a good idea to compare tests run over different time periods, as the traffic mix is likely to have changed between them.
- Secondly, your visitors may like the variation initially but stop liking it later, because the early preference was due to the newness effect of that variation. We really cannot argue with the results without knowing the full context of what the variation is and how it affects visitors’ psychology over a period of time.
- And finally, conversions from a new variation may even drop in the beginning due to the learning effect: regular visitors need some time to get used to the new UI and, depending on its complexity, to “learn” to use it effectively.
Checking statistical significance again and again may in some cases make you see a significance that isn’t real. For more detail, see the following article: http://www.evanmiller.org/how-not-to-run-an-ab-test.html
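The effect of repeated checking can be illustrated with a small simulation. The sketch below runs many A/A tests in which both versions have the same true conversion rate, so any "significant" result is a false alarm. It compares how often a result is flagged when significance is checked at regular intervals with how often it is flagged when checked only once at the end. The function names, the 10% conversion rate, the 5% significance level, and the pooled two-proportion z-test are assumptions chosen for illustration; this is not how our reports compute significance.

```python
import math
import random
from statistics import NormalDist


def p_value(conv_a, conv_b, n):
    """Two-sided p-value of a pooled two-proportion z-test, n visitors per arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_tests=400, visitors=1000, check_every=50,
             rate=0.10, alpha=0.05, seed=1):
    """Return (false-alarm rate with peeking, false-alarm rate at the end only)."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        flagged = False
        p = 1.0
        for n in range(1, visitors + 1):
            # Both arms share the same true rate, so there is no real difference.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % check_every == 0:
                p = p_value(conv_a, conv_b, n)
                flagged = flagged or p < alpha
        peeking_hits += flagged   # significant at ANY intermediate check
        final_hits += p < alpha   # significant at the final check only
    return peeking_hits / n_tests, final_hits / n_tests


if __name__ == "__main__":
    peeking, final = simulate()
    print(f"false alarms with repeated checks: {peeking:.1%}")
    print(f"false alarms with a single final check: {final:.1%}")
```

Because the final check is also one of the repeated checks, the first rate can never be lower than the second. In practice it is usually noticeably higher, which is the reason for waiting until the planned number of visitors has been tested.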