What is a Null Hypothesis?
Null Hypothesis: Control and Variation perform equally on the goal, i.e., μC = μV
Alternate Hypothesis: Control and Variation perform unequally on the goal, i.e., μC ≠ μV
In simple words, the null hypothesis is the hypothesis that the control and variation are equivalent. In hypothesis testing methodology, the null hypothesis and the alternate hypothesis cover the entire set of possible outcomes.
For instance, say you change the colour of your “Subscribe” button from red (Control) to green (Variation). The null hypothesis says that both red and green buttons have the same underlying conversion rate. In other words, the null hypothesis says that changing the colour from red to green has no impact on the probability that the user will convert (subscribe). The alternate hypothesis says that the underlying conversion rate for the red colour button and the green colour button is different.
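To make the button-colour example concrete, here is a minimal sketch with made-up visitor and conversion counts (all numbers are illustrative, not from any real experiment):

```python
# Hypothetical counts for the button-colour example
# (all numbers are made up for illustration)
control_visitors, control_conversions = 1000, 50      # red button
variation_visitors, variation_conversions = 1000, 62  # green button

rate_c = control_conversions / control_visitors
rate_v = variation_conversions / variation_visitors

print(f"Observed conversion rate (red):   {rate_c:.3f}")
print(f"Observed conversion rate (green): {rate_v:.3f}")
# The null hypothesis is about the *underlying* rates being equal;
# the observed rates will still differ somewhat due to sampling noise.
```

Note that even if the null hypothesis is true, the observed rates will rarely be exactly equal; the question hypothesis testing answers is whether the gap is larger than chance alone would explain.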
The null hypothesis can never be anything other than equality, and the null and alternate hypotheses are not interchangeable by choice. The reason is that classical probability gives us a way to calculate the probability of the data assuming the null hypothesis, but no way to calculate the same assuming the alternate hypothesis. This is because the alternate hypothesis is a combination of infinitely many hypotheses (μC = 2μV, μC = 4μV, μC = 5μV, and so on), whereas the null hypothesis is one pure hypothesis, μC = μV, and nothing else.
The null hypothesis is a fundamental component of the Frequentist hypothesis-testing system, and the probability of observing data at least as extreme as what was actually observed, assuming the null hypothesis, is what we call the p-value. If the p-value is too low (conventionally below 0.05), we reject the null hypothesis in favour of the alternate hypothesis. Note that the p-value equivalent of assuming the alternate hypothesis is never calculated (because it cannot be).
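One common way to compute such a p-value for two conversion rates is a two-proportion z-test. The sketch below uses the normal approximation and the made-up counts from the button example; it is one standard choice, not the only valid test:

```python
import math

def two_proportion_p_value(conv_c, n_c, conv_v, n_v):
    """Two-sided p-value for H0: the two underlying conversion rates
    are equal (two-proportion z-test, normal approximation)."""
    p_c, p_v = conv_c / n_c, conv_v / n_v
    p_pool = (conv_c + conv_v) / (n_c + n_v)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    z = (p_v - p_c) / se
    # Probability of a |z| at least this large if H0 were true
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 50/1000 conversions (red) vs 62/1000 (green)
p = two_proportion_p_value(50, 1000, 62, 1000)
if p < 0.05:
    print(f"p = {p:.3f}: reject the null hypothesis")
else:
    print(f"p = {p:.3f}: fail to reject the null hypothesis")
```

With these particular made-up counts the p-value stays above 0.05, so a Frequentist would not reject the null hypothesis despite the higher observed rate for green.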
Frequentist and Bayesian implications
Frequentists had no way to calculate the probability of the data under the alternate hypothesis, and hence always had to frame a hypothesis under which they could calculate the probability of the data. Bayesians solved this problem with a unique insight: they assume a prior probability distribution over the different hypotheses, μC = μV, μC = 2μV, μC = 4μV, μC = 5μV, and so on. This distribution theoretically defines the probability of all possible values of μC and μV. By knowing the prior probability over all hypotheses, they can calculate the probability of the data assuming the alternate hypothesis, and they can also update their prior probabilities by learning from the data.
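A minimal sketch of this Bayesian approach for conversion rates puts a Beta(1, 1) (uniform) prior over each underlying rate, updates it with the observed counts, and then asks how often the variation's rate exceeds the control's across posterior draws. The prior choice and counts here are assumptions made purely for illustration:

```python
import random

random.seed(0)

# Made-up counts: control 50/1000, variation 62/1000
conv_c, n_c = 50, 1000
conv_v, n_v = 62, 1000

# With a Beta(1, 1) (uniform) prior over each underlying rate,
# the posterior after the data is Beta(1 + conversions, 1 + non-conversions).
samples = 100_000
wins = 0
for _ in range(samples):
    # Draw one plausible value of each rate from its posterior
    mu_c = random.betavariate(1 + conv_c, 1 + n_c - conv_c)
    mu_v = random.betavariate(1 + conv_v, 1 + n_v - conv_v)
    wins += mu_v > mu_c

prob_v_wins = wins / samples
print(f"P(variation beats control | data) ≈ {prob_v_wins:.3f}")
```

The output is a direct probability statement about the hypotheses given the data, which is exactly the quantity the Frequentist framework declines to compute.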
This nuance sets Frequentists and Bayesians apart because Frequentist architectures give strong precedence to the null hypothesis: p-values often have to fall below 0.05 or 0.01 before a statistician will reject it. Bayesians do not give any such precedence to the null hypothesis, because to them it is just one of the many hypotheses under consideration. Hence, with Bayesian hypothesis testing, you are often able to declare winners much faster.
Frequentist precedence to the null hypothesis and Occam’s Razor
When someone first comprehends this difference, they might find the Frequentist method grossly inefficient compared with the Bayesian method. However, there is reasoning behind the Frequentist method’s precedence to the null hypothesis. This reasoning is reflected in many laws in different forms, but one interpretation is Occam’s razor, which essentially says that one should always prefer the simpler explanation of a phenomenon. The core of this wisdom comes from the fact that randomness often creates patterns purely by chance. Hence, it is always wise to choose the simpler explanation over the complex one unless the data is wildly off from expectation.
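The claim that randomness creates patterns by chance can be checked directly with a simulation. The sketch below runs many A/A tests, where both arms share the same underlying rate so the null hypothesis is true in every run, and counts how often a two-proportion z-test still rejects it (the rates and sample sizes are assumed for illustration):

```python
import math
import random

random.seed(1)

def p_value(conv_c, conv_v, n):
    """Two-sided two-proportion z-test p-value (normal approximation),
    for two arms with equal sample size n."""
    p_c, p_v = conv_c / n, conv_v / n
    p_pool = (conv_c + conv_v) / (2 * n)
    se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
    if se == 0:
        return 1.0
    z = (p_v - p_c) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Simulate many A/A tests: both arms share the SAME underlying rate,
# so the null hypothesis is true in every run.
true_rate, n, runs = 0.05, 1000, 2000
false_positives = 0
for _ in range(runs):
    conv_c = sum(random.random() < true_rate for _ in range(n))
    conv_v = sum(random.random() < true_rate for _ in range(n))
    if p_value(conv_c, conv_v, n) < 0.05:
        false_positives += 1

# Roughly 5% of runs reject the null purely by chance
print(f"False-positive rate under the null: {false_positives / runs:.3f}")
```

The false-positive rate hovers around the 0.05 threshold itself: even when nothing has changed, about one test in twenty looks like a "winner", which is why the Frequentist framework demands strong evidence before abandoning the simpler explanation.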