Time: 10 mins.
An A/B test is an example of classic statistical hypothesis testing: using statistics to decide whether the evidence supports a given hypothesis. You define a problem, propose a solution, and predict the outcome.
Setting up a Hypothesis
The null hypothesis is the assumption that there is no real difference between the Treatment and the Control, and that any observed difference is just noise.
The alternative hypothesis is the claim you test against the null hypothesis: the outcome you believe to be true and hope your A/B test will support.
With a hypothesis, you match an identified problem with a proposed solution and stipulate the desired outcome.
- Identified problem: The Add to Cart rate on your Product page is 11%. Through prior research (e.g., surveys, heuristic evaluation), you identified that the page lacks product reviews.
- Proposed solution: By adding reviews to the product pages, you will increase social proof, trust, and confidence in the product, thus increasing the number of users adding items to their Carts. You will measure this through clicks on the Add to Cart CTA.
The null hypothesis would be: adding reviews leaves the Add to Cart rate at 11%.
The alternative hypothesis would be: adding reviews raises the Add to Cart rate above 11%.
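To make the two hypotheses concrete, here is a minimal Python sketch. The visitor and Add to Cart counts are made up for illustration (only the 11% baseline comes from the example above). Note that the observed rates alone cannot tell you whether the difference is real; that is what the test statistic in the next section is for.

```python
# H0 (null):        adding reviews leaves the Add to Cart rate at 11%
# H1 (alternative): adding reviews raises the Add to Cart rate above 11%

# Hypothetical counts for each group (illustrative only).
control_visitors, control_adds = 5_000, 550      # 11.0% baseline
treatment_visitors, treatment_adds = 5_000, 640  # 12.8% with reviews

control_rate = control_adds / control_visitors
treatment_rate = treatment_adds / treatment_visitors
print(f"Control: {control_rate:.1%}  Treatment: {treatment_rate:.1%}")
```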
After setting up your test hypothesis, it's time to pick the test statistic that will help you decide whether to reject the null hypothesis.
Picking a Test Statistic
This is the method and value used to decide whether to reject the null hypothesis. Two-sample t-tests are the most common statistical significance tests for determining whether there is a real difference between your Treatment and Control.
At its most basic, the two-sample t-test looks at the size of the difference between the means of your two samples (your users in the Control vs the Treatment) relative to the variance. The variance tells you the degree of spread in your data set.
The significance of the difference is represented by the p-value: the probability of observing a difference at least this extreme if there were really no difference between Treatment and Control. The lower the p-value, the stronger the evidence of a real difference between Treatment and Control. By convention, any difference with a p-value lower than 0.05 is deemed statistically significant.
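As an illustration, here is a minimal Python sketch of a two-sample t-test using SciPy, with the same made-up counts as above, run on per-user Add to Cart outcomes (1 = the user added an item, 0 = they did not). The `alternative="greater"` argument matches the one-sided alternative hypothesis above and requires SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

# Per-user outcomes reconstructed from the illustrative counts above.
control = np.array([1] * 550 + [0] * 4_450)    # 11.0% Add to Cart rate
treatment = np.array([1] * 640 + [0] * 4_360)  # 12.8% Add to Cart rate

# ttest_ind compares the two sample means relative to their variance.
# equal_var=False (Welch's t-test) avoids assuming equal variances.
t_stat, p_value = stats.ttest_ind(
    treatment, control, equal_var=False, alternative="greater"
)

alpha = 0.05  # conventional significance threshold
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Statistically significant: reject the null hypothesis.")
else:
    print("Not significant: the observed difference could just be noise.")
```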
Go further
Learn how to set up and run your analysis in Contentsquare.