{"id":3483,"date":"2012-08-22T18:50:18","date_gmt":"2012-08-22T13:20:18","guid":{"rendered":"https:\/\/vwo.com\/blog\/?p=3483"},"modified":"2025-05-01T15:46:12","modified_gmt":"2025-05-01T10:16:12","slug":"how-to-calculate-ab-test-sample-size","status":"publish","type":"post","link":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/","title":{"rendered":"How to Calculate A\/B Testing Sample Sizes?"},"content":{"rendered":"\n<p><em>(This post gives a statistical explanation of how large the samples in your tests need to be for the results to hold up statistically. <a href=\"https:\/\/vwo.com\/why-us\/technology\/bayesian-statistics\/\">VWO&#8217;s test reporting is engineered<\/a> so that you do not waste time looking up p-values or <a href=\"https:\/\/vwo.com\/blog\/ab-testing-significance-calculator-spreadsheet-in-excel\/\">determining statistical significance<\/a> &#8211; the platform reports &#8216;probability to win&#8217; and makes test results easy to interpret. Sign up for <a href=\"#free-trial\">a free trial here<\/a>.)<\/em><\/p>\n\n\n<h2 class=\"js-cro-guide-subheading gtm_heading \" data-level=\"level1\" data-menu=\"How large does the sample size need to be?\" id=\"how-large-does-the-sample-size-need-to-be\" data-menu-id=\"how-large-does-the-sample-size-need-to-be\" style=\"text-align:left\"><strong>How large does the sample size need to be?<\/strong><\/h2>\n\n\n<p>In the online world, the possibilities for <a href=\"https:\/\/vwo.com\/ab-testing\/\" target=\"_blank\" rel=\"noreferrer noopener\">A\/B testing<\/a> just about anything are immense. 
And <a href=\"https:\/\/vwo.com\/testing\/?utm_source=page&amp;utm_medium=website&amp;utm_campaign=interlinking\">many experiments are indeed run<\/a>, and their results are interpreted following the rules of null-hypothesis testing: \u201c<a href=\"https:\/\/vwo.com\/tools\/ab-test-significance-calculator\/\">are the results statistically significant<\/a>?\u201d<\/p>\n\n\n\n<p>An important part of the database analyst&#8217;s work, then, is to determine appropriate <a href=\"https:\/\/vwo.com\/glossary\/sample-size\/\">sample sizes<\/a> for these tests.<\/p>\n\n\n\n<h2 class=\"wp-block-heading has-text-align-center\"><em><a href=\"https:\/\/vwo.com\/ab-testing\/#guide-download\"><span style=\"text-decoration: underline\">Download Free: A\/B Testing Guide<\/span><\/a><\/em><\/h2>\n\n\n\n<p>Using an everyday case, several common approaches for <a href=\"https:\/\/vwo.com\/tools\/ab-test-sample-size-calculator\/\">calculating the required sample size<\/a> are discussed below.<\/p>\n\n\n<h3 class=\"js-cro-guide-subheading gtm_heading \" data-level=\"level1\" data-menu=\"Case for calculating sample size\" id=\"case-for-calculating-sample-size\" data-menu-id=\"case-for-calculating-sample-size\" style=\"text-align:left\"><strong>Case for calculating sample size<\/strong><\/h3>\n\n\n<p>The marketer has designed an <a href=\"https:\/\/vwo.com\/blog\/landing-page-testing\/\" target=\"_blank\" rel=\"noreferrer noopener\">alternative for a landing page<\/a> and wants to test it against the original. The original landing page has a known conversion rate of 4%. The expected conversion rate of the alternative page is 5%. 
So the marketer asks the analyst: &#8220;How large should the sample be to demonstrate with <a href=\"https:\/\/vwo.com\/glossary\/statistical-significance\/\">statistical significance<\/a> that the alternative is better than the original?&#8221;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Solution: &#8220;default sample size&#8221;<\/strong><\/h4>\n\n\n\n<p>The analyst says, out of habit: split run (A\/B test) with 5,000 observations per page and a one-sided test with a reliability of .95.<\/p>\n\n\n\n<p><em>What happens here?<\/em><\/p>\n\n\n\n<p>What happens when two samples are drawn to estimate the difference between them, using a one-sided test and a reliability of .95? This can be demonstrated by repeatedly (in theory, infinitely often) drawing two samples of 5,000 observations each from a population with a conversion rate of 4%, and plotting the difference in conversion between the two samples of each pair (each &#8216;test&#8217;) in a chart.<\/p>\n\n\n\n<p><em>Figure 1: sampling distribution for the difference between two proportions with p1=p2=.04 and n1=n2=5,000; the significance area is indicated for alpha=.05 (reliability=.95) using a one-sided test.<\/em><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"508\" height=\"202\" src=\"https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image001.png\" alt=\"Sampling distribution for the difference between two proportions with p1=p2=.04 and n1=n2=5,000; a significance area is indicated for 
alpha=.05 (reliability=.95) using a one-sided test\" class=\"wp-image-3486\" srcset=\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/image001.png 508w, https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image001.png?tr=w-375 375w\" sizes=\"(max-width: 508px) 100vw, 508px\" \/><\/figure>\n<\/div>\n\n\n<p>This chart shows what is formally called the \u2018sampling distribution for the difference between two proportions\u2019: the probability distribution of all possible sample results for the difference between two samples with p1=p2=.04 and n1=n2=5,000. <strong>This distribution is the basis (the reference distribution) for <a href=\"https:\/\/vwo.com\/blog\/ab-testing-hypothesis\/\">null hypothesis testing<\/a>. <\/strong>The null hypothesis is that there is no difference between the two landing pages. This is the distribution actually used to decide on significance or non-significance.<\/p>\n\n\n\n<p>p=.04 means 4% conversion. Statisticians usually work with proportions, which lie between 0 and 1, whereas everyday language mostly uses percentages. 
To match the chart, proportion notation is used here.<\/p>\n\n\n\n<p>This probability distribution can be roughly replicated using this SPSS syntax (<a href=\"https:\/\/vwo.com\/downloads\/Kees-Schippers-how-to-calculate-sample-size.zip\" target=\"_blank\" rel=\"noreferrer noopener\">thirty paired samples from a population.sps<\/a>). Instead of infinitely many, 30 pairs of samples are drawn with p1=p2=.04 and n1=n2=5,000. The differences within each pair are then plotted in a histogram with a normal curve superimposed (the last chart in the output). This normal curve will look quite similar to the curve in figure 1. The purpose of this experiment is to demonstrate the essence of a sampling distribution.<\/p>\n\n\n\n<p>The modal value of the difference in conversion between the two groups is zero. That makes sense: both groups come from the same population with a conversion rate of 4%. Deviations from zero, both to the left (original does better) and to the right (alternative does better), can and will occur just by chance. However, the further a difference lies from zero, the smaller its probability. 
The pink area marked with alpha is the significance area: alpha = unreliability = 1 - reliability = 1 - .95 = .05.<\/p>\n\n\n\n<p>If in a test the difference in conversion between the alternative page and the original page falls in the pink area, the null hypothesis that there is no difference between the pages is rejected in favor of the hypothesis that the alternative page has a higher conversion rate than the original. The logic behind this is that if the null hypothesis were really true, such a result would be a rather \u2018rare\u2019 outcome.<\/p>\n\n\n\n<p>The x axis in figure 1 does not display the value of the test statistic (Z in this case), as would usually be the case. For clarity&#8217;s sake, the actual difference in conversion between the two landing pages is displayed instead.<\/p>\n\n\n\n<p>So when, in a split run test, the alternative landing page returns a conversion rate at least 0.645 percentage points higher than the original landing page (and hence falls in the significance area), the null hypothesis stating that there is no difference in conversion between the landing pages is rejected in favor of the <a href=\"https:\/\/vwo.com\/glossary\/hypothesis\/\">hypothesis<\/a> that the alternative does better. The 0.645 percentage points correspond to a Z value of 1.645 for the test statistic: 1.645 \u00d7 \u221a(2 \u00d7 .04 \u00d7 .96 \/ 5,000) \u2248 .00645.<\/p>\n\n\n\n<p>The advantage of the &#8220;default sample size&#8221; approach is that choosing a fixed sample size brings in a certain standardization. 
Different tests are comparable in that respect: they \u2018stand an equal chance\u2019.<\/p>\n\n\n\n<p>The disadvantage of this approach is that, whereas the chance of rejecting the null hypothesis (H<sub>0<\/sub>) when H<sub>0<\/sub> is true is well known, namely the self-selected alpha of .05, the chance of <em>not<\/em> rejecting H<sub>0<\/sub> when H<sub>0<\/sub> is <em>not<\/em> true remains <em>unknown<\/em>. These are two <em>false decisions<\/em>, known as <a href=\"https:\/\/vwo.com\/blog\/errors-in-ab-testing\/\">type 1 error and type 2 error<\/a> respectively.<\/p>\n\n\n\n<p>A type 1 error, or <strong>alpha<\/strong>, is made when H<sub>0<\/sub> is rejected when in fact H<sub>0<\/sub> is true. Alpha is the probability of concluding from a test that the manipulation has an effect, while at population level there actually is none. 1-alpha is the chance of not rejecting the <a href=\"https:\/\/vwo.com\/glossary\/null-hypothesis\/\">null hypothesis<\/a> when it is true (a <em>correct decision<\/em>). 
This is called <strong>reliability<\/strong>.<\/p>\n\n\n\n<p>A <a href=\"https:\/\/vwo.com\/glossary\/type-2-error\/\">type 2 error<\/a>, or <strong>beta<\/strong>, is made when H<sub>0<\/sub> is <em>not<\/em> rejected when in fact H<sub>0<\/sub> is <em>not<\/em> true. Beta is the probability of concluding from a test that the manipulation has no effect, while at population level there actually is one. 1-beta is the chance of rejecting the null hypothesis when it is not true (a <em>correct decision<\/em>). This is called <strong>power<\/strong>.<\/p>\n\n\n\n<p>Power is a function of alpha, sample size and effect (here, the effect is the difference in conversion between the two landing pages, i.e. the added value of the alternative page compared to the original page at population level). The smaller the alpha, the sample size or the effect, the smaller the power.<\/p>\n\n\n\n<p>In this example, the analyst sets alpha at .05. The sample sizes are also set by the analyst: 5,000 for the original and 5,000 for the alternative. That leaves the effect, and the actual effect is by definition unknown. However, it is not unrealistic to use commercial targets or experience-based numbers as an anchor value, as the marketer did in the current case: an expected improvement from 4% to 5%. 
If that effect were real, the marketer would of course want the test to produce statistically significant results.<\/p>\n\n\n\n<p>An example helps make this concept concrete and clarifies the importance of power: suppose the actual (population) conversion rate of the alternative page is indeed 5%. The sampling distribution for the difference between two proportions with conversion1=4%, conversion2=5% and n1=n2=5,000 is plotted together with the previously shown sampling distribution for the difference between two proportions with conversion1=conversion2=4% and n1=n2=5,000 (figure 1).<\/p>\n\n\n\n<p><em>Figure 2: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=5,000 (red line) and p1=.04, p2=.05, n1=n2=5,000 (dotted blue line), with a one-sided test and a reliability of .95.<\/em><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"530\" height=\"195\" 
src=\"https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image011.png\" alt=\"Figure 2: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=5,000 (red line) and p1=.04, p2=.05, n1=n2=5,000 (dotted blue line), with a one-sided test and a reliability of .95.\" class=\"wp-image-3496\" srcset=\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/image011.png 530w, https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image011.png?tr=w-375 375w\" sizes=\"(max-width: 530px) 100vw, 530px\" \/><\/figure>\n<\/div>\n\n\n<p>The dotted blue line shows the sampling distribution of the difference in conversion rates between original and alternative when in reality (at population level) the original page converts at 4% and the alternative page at 5%, with samples of 5,000 each. It is essentially the red line (the sampling distribution when H<sub>0<\/sub> is true) shifted to the right. The modal value of this new distribution, with the supposed effect of 1 percentage point, is of course 1%, with random deviations both to the left and to the right.<\/p>\n\n\n\n<p>Now, all outcomes, i.e. test results, to the right of the green line (which marks the significance area) are regarded as significant. All outcomes to the left of the green line are regarded as <em>not<\/em> significant. 
The area under the \u2018blue\u2019 distribution to the left of the significance line is beta, the chance of not rejecting H<sub>0<\/sub> when H<sub>0<\/sub> is in fact not true (a false decision); it covers 22% of that distribution.<\/p>\n\n\n\n<p>That makes the area under the blue distribution to the right of the significance line the power area, which covers 78% of the sampling distribution: the probability of rejecting H<sub>0<\/sub> when H<sub>0<\/sub> is not true, a correct decision.<\/p>\n\n\n\n<p>So the power of this specific test, with its specific parameters, is .78.<\/p>\n\n\n\n<p>In other words, if the alternative really converts at 5%, this test will yield a significant effect, and hence a rejection of H<sub>0<\/sub>, in 78% of the cases in which it is run. That may or may not be acceptable; it is a question for the marketer and the analyst to agree upon.<\/p>\n\n\n\n<p>This is no simple matter, but it is an important one. Suppose, for example, that an expected 10% relative increase in conversion would be realistic as well as commercially interesting: 4.0% for the original versus 4.4% for the alternative. 
The situation then changes as follows.<\/p>\n\n\n\n<p><em>Figure 3: sampling distributions for the difference between two proportions with p1=p2=.040, n1=n2=5,000 (red line) and p1=.040, p2=.044, n1=n2=5,000 (dotted blue line), with a one-sided test and a reliability of .95.<\/em><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"535\" height=\"199\" src=\"https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image012.png\" alt=\"Figure 3: sampling distributions for the difference between two proportions with p1=p2=.040, n1=n2=5,000 (red line) and p1=.040, p2=.044, n1=n2=5,000 (dotted blue line), with a one-sided test and a reliability of .95. \" class=\"wp-image-3499\" srcset=\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/image012.png 535w, https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image012.png?tr=w-375 375w\" sizes=\"(max-width: 535px) 100vw, 535px\" \/><\/figure>\n<\/div>\n\n\n<p>Now the power is .26. 
Under these circumstances the test would not make much sense and would in fact be counterproductive, since the chance that it will lead to a significant result is as low as .26.<\/p>\n\n\n\n<p>The figures above were calculated and drawn with the application \u2018<strong>G*Power<\/strong>\u2019.<\/p>\n\n\n\n<p>This program calculates <strong>achieved power<\/strong> for many types of tests, based on the chosen sample size, alpha and supposed effect.<\/p>\n\n\n\n<p>Likewise, the <strong>required sample size<\/strong> can be calculated from the desired power, alpha and expected effect; the <strong>required alpha<\/strong> from the desired power, sample size and expected effect; and the <strong>required effect<\/strong> from the desired power, alpha and sample size.<\/p>\n\n\n\n<p>If a power of .95 is desired for a supposed p1=.040 and p2=.044, the required sample size is 54,428 per page.<\/p>\n\n\n\n<p><em>Figure 4: sampling distributions for the difference between two proportions with p1=p2=.040 (red line) and p1=.040, p2=.044 (dotted blue line), using a one-sided test, with a reliability of .95 and a power of .95.<\/em><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"472\" height=\"443\" src=\"https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image013.jpg\" alt=\"Figure 4: sampling 
distributions for the difference between two proportions with p1=p2=.040 (red line) and p1=.040, p2=.044 (dotted blue line), using a one-sided test, with a reliability of .95 and a power of .95. \" class=\"wp-image-3500\" srcset=\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/image013.jpg 472w, https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image013.jpg?tr=w-375 375w\" sizes=\"(max-width: 472px) 100vw, 472px\" \/><\/figure>\n<\/div>\n\n\n<p>This figure also shows information omitted from the previous charts, and gives an impression of the program&#8217;s interface.<\/p>\n\n\n\n<p><strong>Important aspects of power analysis<\/strong> are a careful evaluation of the consequences of rejecting the null hypothesis when it is in fact true (e.g. based on test results, a costly campaign is implemented on the assumption that it will be a success, and that success does not materialize) and of the consequences of not rejecting the null hypothesis when it is not true (e.g. 
based on test results, a campaign is not implemented, whereas it would have been a success).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Solution: &#8220;default number of conversions&#8221;<\/strong><\/h4>\n\n\n\n<p>The analyst says: split run with a minimum of 100 conversions per competing page and a one-sided test with a reliability of .95.<\/p>\n\n\n\n<p>In the current case, with an expected conversion rate of 4% for the original page and 5% for the alternative page, a minimum of 2,500 observations per page would be advised, since 100 conversions at 4% require 100 \/ .04 = 2,500 observations.<\/p>\n\n\n\n<p>When put to the power test, though, this scenario shows a power of just a little over .5.<\/p>\n\n\n\n<p><em>Figure 5: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=2,500 (red line) and p1=.04, p2=.05, n1=n2=2,500 (dotted blue line), using a one-sided test, with a reliability of .95.<\/em><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"472\" height=\"447\" src=\"https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image014.jpg\" alt=\"Figure 5: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=2,500 (red line) and p1=.04, p2=.05, n1=n2=2,500 (dotted blue line), using a one-sided test, with a reliability of .95.\" class=\"wp-image-3503\" srcset=\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/image014.jpg 472w, https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image014.jpg?tr=w-375 375w\" sizes=\"(max-width: 472px) 100vw, 472px\" \/><\/figure>\n<\/div>\n\n\n<p>For better power, the true effect must be larger, a larger sample size must be chosen, or alpha must be increased, e.g. 
to .2:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"470\" height=\"441\" src=\"https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image015.jpg\" alt=\"Figure 6: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=2,500 (red line) and p1=.04, p2=.05, n1=n2=2,500 (dotted blue line), using a one-sided test, with a reliability of .80. \" class=\"wp-image-3504\" srcset=\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/image015.jpg 470w, https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image015.jpg?tr=w-375 375w\" sizes=\"(max-width: 470px) 100vw, 470px\" \/><\/figure>\n<\/div>\n\n\n<p>An alpha of .2 yields a power of .8. That power is more acceptable; the &#8216;cost&#8217; of this higher power is an increased chance of rejecting H<sub>0<\/sub> when H<sub>0<\/sub> is actually true.<\/p>\n\n\n\n<p>Again, business considerations about the impact of alpha and beta play a key role in such decisions.<\/p>\n\n\n\n<p>With its rule of thumb on the number of conversions, the &#8220;default number of conversions&#8221; approach effectively limits the effect sizes that can sensibly be put to a test (i.e. with reasonable power). 
In that regard it also amounts to a kind of standardization, which in itself is not a problem, as long as its consequences are understood and recognized.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Solution: \u201csignificant sample result\u201d<\/strong><\/h4>\n\n\n\n<p>The analyst says: split run with just enough observations to get a statistically significant result if the supposed effect actually occurs in the test, tested one-sided with a reliability of .95.<\/p>\n\n\n\n<p>That sounds a little odd, and it is. Unfortunately, this logic is often applied in practice: the required sample size is calculated on the assumption that the supposed effect will occur exactly in the sample.<\/p>\n\n\n\n<p>In the example used here: if in a test the original converts at 4% and the alternative at 5%, then 2,800 cases per group would be needed to reach statistical significance. This can be demonstrated with the accompanying SPSS syntax (limit at significant test result.sps).<\/p>\n\n\n\n<p>This sort of calculation is used by various online tools that offer to calculate sample size. The approach ignores random sampling error, and with it the essence of inferential statistics and null hypothesis testing. 
In practice, this will always yield a power of just over .5: the sample size is chosen so that the supposed effect lands right at the edge of the significance area, so roughly half of the sampling distribution under that effect falls on the non-significant side.<\/p>\n\n\n\n<p><em>Figure 7: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=2,800 (red line) and p1=.04, p2=.05, n1=n2=2,800 (dotted blue line), using a one-sided test, with a reliability of .95.<\/em><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"475\" height=\"450\" src=\"https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image016.jpg\" alt=\"Figure 7: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=2,800 (red line) and p1=.04, p2=.05, n1=n2=2,800 (dotted blue line), using a one-sided test, with a reliability of .95.\" class=\"wp-image-3507\" srcset=\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/image016.jpg 475w, https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image016.jpg?tr=w-375 375w\" sizes=\"(max-width: 475px) 100vw, 475px\" \/><\/figure>\n<\/div>\n\n\n<p>This method actually also applies a kind of standardization, namely of power, but that is apparently not the goal it was devised for.<\/p>\n\n\n<h4 class=\"js-cro-guide-subheading gtm_heading \" data-level=\"level2\" data-menu=\"\" id=\"\" data-menu-id=\"\" style=\"text-align:left\"><strong>Solution: \u201cdefault reliability and power\u201d<\/strong><\/h4>\n\n\n<p>The analyst says: split run with a power of .8 and a reliability of .95 
with a one-sided test.<\/p>\n\n\n\n<p>In the current case with 4% conversion for original page versus 5% expected conversion for the alternate page, alpha=<span id=\"GRmark_faba12a716ec02a852c4593131290dcf1721a933_.:0\" class=\"GRcorrect\">.<\/span>05 and power=<span id=\"GRmark_faba12a716ec02a852c4593131290dcf1721a933_.:1\" class=\"GRcorrect\">.<\/span>80, Gpower advises two samples of 5313.<\/p>\n\n\n\n<p><em>Figure 8: sampling distributions for the difference between two proportions with p1=p2=<span id=\"GRmark_3a174c6d31bd268013d82214e4c51e6f599fad76_.:0\" class=\"GRcorrect\">.<\/span>04<span id=\"GRmark_3a174c6d31bd268013d82214e4c51e6f599fad76_(:1\" class=\"GRcorrect\">(<\/span>red line) and p1=<span id=\"GRmark_3a174c6d31bd268013d82214e4c51e6f599fad76_.:2\" class=\"GRcorrect\">.<\/span>04, p2=<span id=\"GRmark_3a174c6d31bd268013d82214e4c51e6f599fad76_.:3\" class=\"GRcorrect\">.<\/span>05 (dotted blue line), using a one-sided test with&nbsp;reliability&nbsp;.95 and power .80.<\/em><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"463\" height=\"437\" src=\"https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image017.jpg\" alt=\"Figure 8: sampling distributions for the difference between two proportions with p1=p2=.04(red line) and p1=.04, p2=.05(dotted blue line), using a one-sided test with reliablity .95 and power .80. 
\" class=\"wp-image-3508\" srcset=\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/image017.jpg 463w, https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image017.jpg?tr=w-375 375w\" sizes=\"(max-width: 463px) 100vw, 463px\" \/><\/figure>\n<\/div>\n\n\n<p>This approach uses the desired reliability, the expected effect <em>and<\/em> the desired power to calculate the required sample size.<\/p>\n\n\n\n<p>The analyst now controls the probability that an expected, desired or necessary effect will lead to a statistically significant result in the test, namely .8.<\/p>\n\n\n\n<p>Some online tools, for example VWO&#8217;s <a href=\"https:\/\/vwo.com\/tools\/ab-test-duration-calculator\/\">Split Test Duration Calculator<\/a>, use the concept of power in their sample size calculation.<\/p>\n\n\n\n<p>In the VWO presentation &#8220;<a href=\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/\">Visitors needed for A\/B testing<\/a>&#8221;, a power of .8 is cited as the usual standard.<\/p>\n\n\n\n<p>One may ask why that should be an acceptable rule. Why not set the power, as well as the reliability, more flexibly for each test?<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Solution: &#8220;desired reliability and power&#8221;<\/strong><\/h4>\n\n\n\n<p>The analyst says: split run with the desired power and reliability, using a one-sided test.<\/p>\n\n\n\n<p>A discussion follows on what power and reliability are acceptable in this case, concluding with, say, 90% for both.
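As a rough cross-check on the Gpower figures, the per-group sample size for a one-sided comparison of two proportions can be approximated with the textbook normal-approximation formula. The Python sketch below is an assumption on our part, not necessarily the procedure Gpower uses: it takes the pooled variance under the null hypothesis, the unpooled variance under the alternative, and reliability as 1 - alpha, as in this article. The function name is illustrative.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p1, p2, reliability, power):
    # Approximate n per group for a one-sided two-proportion z-test.
    # Assumption: normal approximation, no continuity correction.
    # reliability = 1 - alpha (the article's term), power = 1 - beta.
    z = NormalDist().inv_cdf                       # standard normal quantile function
    p_bar = (p1 + p2) / 2                          # pooled rate under H0 (equal group sizes)
    s0 = math.sqrt(2 * p_bar * (1 - p_bar))        # sqrt(n) * SD of the difference under H0
    s1 = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # sqrt(n) * SD of the difference under H1
    n = ((z(reliability) * s0 + z(power) * s1) / (p2 - p1)) ** 2
    return math.ceil(n)                            # round up to whole observations


print(sample_size_per_group(0.04, 0.05, 0.95, 0.80))  # 5313
print(sample_size_per_group(0.04, 0.05, 0.90, 0.90))  # 5645
```

With the default settings (reliability .95, power .80) this reproduces the 5,313 per group quoted for the default approach; with both set to .90 it gives 5,645. Tools that apply a continuity correction or an exact test may return somewhat different numbers.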
Result according to Gpower: two samples of 5,645 observations each.<\/p>\n\n\n\n<p><em>Figure 9: sampling distributions for the difference between two proportions with p1=p2=.04 (red line) and p1=.04, p2=.05 (dotted blue line), using a one-sided test with reliability=.90 and power=.90.<\/em><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"472\" height=\"446\" src=\"https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image018.jpg\" alt=\"Figure 9: sampling distributions for the difference between two proportions with p1=p2=.04 (red line) and p1=.04, p2=.05 (dotted blue line), using a one-sided test with reliability=.90 and power=.90.\" class=\"wp-image-3509\" srcset=\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/image018.jpg 472w, https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image018.jpg?tr=w-375 375w\" sizes=\"(max-width: 472px) 100vw, 472px\" \/><\/figure>\n<\/div>\n\n\n<p>What if the marketer says, &#8220;It takes too long to gather that many observations; by then the landing page will no longer matter. There is room for a total of 3,000 test observations. Reliability is as important as power. The test should preferably go ahead, and a decision should follow&#8221;?<\/p>\n\n\n\n<p>Result under this constraint: reliability and power of .75 each.
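The same normal approximation can also be run in the other direction. For a fixed number of observations per group it gives the power at a chosen reliability, and when reliability and power must be equal, the common critical value z solves z(s0 + s1) = (p2 - p1)sqrt(n), where s0 = sqrt(2p(1 - p)) with p the average of p1 and p2, and s1 = sqrt(p1(1 - p1) + p2(1 - p2)). A minimal Python sketch under these assumptions; the function names are hypothetical, and Gpower may compute slightly different values.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def spreads(p1, p2):
    # sqrt(n) times the SD of the difference in conversion, under H0 and under H1.
    p_bar = (p1 + p2) / 2
    s0 = math.sqrt(2 * p_bar * (1 - p_bar))
    s1 = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return s0, s1


def power(p1, p2, n, reliability):
    # Power of a one-sided two-proportion z-test with n observations per group.
    s0, s1 = spreads(p1, p2)
    z_crit = STD_NORMAL.inv_cdf(reliability)
    return STD_NORMAL.cdf(((p2 - p1) * math.sqrt(n) - z_crit * s0) / s1)


def balanced_level(p1, p2, n):
    # Common value of reliability and power when both must be equal.
    s0, s1 = spreads(p1, p2)
    return STD_NORMAL.cdf((p2 - p1) * math.sqrt(n) / (s0 + s1))


level = balanced_level(0.04, 0.05, 1500)
print(round(level, 2))                           # 0.75
print(round(power(0.04, 0.05, 1500, level), 2))  # 0.75, power equals reliability
print(round(power(0.04, 0.05, 2800, 0.95), 2))   # 0.56, the Figure 7 setting
```

For 1,500 observations per group (3,000 in total) the balanced level is about .75, consistent with the result above. For the n1=n2=2800 setting of Figure 7, the power at reliability .95 comes out near .56, that is, just over .5.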
If this is acceptable to those concerned, the test can proceed with alpha=.25 and power=.75.<\/p>\n\n\n\n<p><em>Figure 10: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=1500 (red line), and p1=.04, p2=.05, n1=n2=1500 (dotted blue line), using a one-sided test with equal reliability and power.<\/em><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"470\" height=\"446\" src=\"https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image019.jpg\" alt=\"Figure 10: sampling distributions for the difference between two proportions with p1=p2=.04, n1=n2=1500 (red line), and p1=.04, p2=.05, n1=n2=1500 (dotted blue line), using a one-sided test with equal reliability and power.\" class=\"wp-image-3510\" srcset=\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/image019.jpg 470w, https:\/\/static.wingify.com\/vwo\/uploads\/sites\/3\/2012\/08\/image019.jpg?tr=w-375 375w\" sizes=\"(max-width: 470px) 100vw, 470px\" \/><\/figure>\n<\/div>\n\n\n<p>This approach allows reliability and power to be chosen flexibly.
The resulting lack of standardization is a disadvantage.<\/p>\n\n\n<h2 class=\"js-cro-guide-subheading gtm_heading \" data-level=\"level1\" data-menu=\"Conclusion\" id=\"conclusion\" data-menu-id=\"conclusion\" style=\"text-align:left\"><strong>Conclusion<\/strong><\/h2>\n\n\n<p>There are multiple approaches to calculating the required sample size, ranging from questionable logic to well-substantiated methods.<\/p>\n\n\n\n<p>For strategically important \u2018crucial experiments\u2019, the most comprehensive method is preferred, in which both &#8220;desired reliability and power&#8221; enter the calculation. If no prior effects are available to check against, the effect can be estimated with a pilot using &#8220;default sample size&#8221; or &#8220;default number of conversions&#8221;.<\/p>\n\n\n\n<p>For most decisions throughout the year, \u201cdefault reliability and power\u201d is recommended, because it keeps tests comparable with one another.<\/p>\n\n\n\n<p>Working with these recommended approaches, based on calculated risk, supports valuable optimization and sound decision making.<\/p>\n\n\n\n<p><em>Note: Screenshots used in the blog belong to the author.<\/em><\/p>\n\n\n<h2 class=\"js-cro-guide-subheading gtm_heading \" data-level=\"level1\" data-menu=\"FAQs on A\/B Testing Sample Size\" id=\"faqs-on-a-b-testing-sample-size\" data-menu-id=\"faqs-on-a-b-testing-sample-size\" style=\"text-align:left\">FAQs on A\/B testing sample size<\/h2>\n\n\n<div class=\"schema-faq wp-block-yoast-faq-block\"><div class=\"schema-faq-section\" id=\"faq-question-1580295142333\"><strong class=\"schema-faq-question\">What is the formula for determining sample size? <\/strong> <p class=\"schema-faq-answer\">There are multiple approaches to determining the required sample size for A\/B testing.
For strategically important \u2018crucial experiments\u2019, the most comprehensive method is preferred: the baseline conversion rate, the expected effect, and the \u201cdesired reliability and power\u201d all enter the calculation.<\/p> <\/div> <div class=\"schema-faq-section\" id=\"faq-question-1580295173233\"><strong class=\"schema-faq-question\">What should be the required sample size for an A\/B test? <\/strong> <p class=\"schema-faq-answer\">In the online world, the possibilities for A\/B testing just about anything are immense. The sample size should be large enough that, if the alternative version performs as well as expected, the test has the desired probability (power) of showing a statistically significant improvement over the original. <\/p> <\/div> <\/div>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>(This post is a scientific explanation of the optimal sample size for your tests to hold true statistically. VWO&#8217;s test reporting is engineered in a way that you would not waste your time looking up p-values or determining statistical significance &#8211; the platform reports &#8216;probability to win&#8217; and makes test results easy to interpret.
Sign&#8230;<\/p>\n","protected":false},"author":17,"featured_media":56979,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"post_read_time":14,"footnotes":""},"categories":[10558],"tags":[],"feature":[1852,10526],"industry-type":[],"product":[10626],"role":[10636],"region":[],"class_list":["post-3483","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-calculator","feature-ab-testing","feature-experimentation-platform"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Learn How to Calculate A\/B Testing Sample Sizes<\/title>\n<meta name=\"description\" content=\"In this detailed post, learn about the different approaches for calculating the desired sample size for creating &amp; starting A\/B tests.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Learn How to Calculate A\/B Testing Sample Sizes\" \/>\n<meta property=\"og:description\" content=\"In this detailed post, learn about the different approaches for calculating the desired sample size for creating &amp; starting A\/B tests.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/\" \/>\n<meta property=\"og:site_name\" content=\"Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/vwoofficial\/\" \/>\n<meta property=\"article:published_time\" content=\"2012-08-22T13:20:18+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-05-01T10:16:12+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/OG-image_How-to-Calculate-AB-Testing-Sample-Sizes.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Kees Schippers\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@VWO\" \/>\n<meta name=\"twitter:site\" content=\"@VWO\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kees Schippers\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"16 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/\"},\"author\":{\"name\":\"Kees Schippers\",\"@id\":\"https:\/\/vwo.com\/blog\/#\/schema\/person\/c430ae4e61c3fb5b63536e40ac4e6a52\"},\"headline\":\"How to Calculate A\/B Testing Sample 
Sizes?\",\"datePublished\":\"2012-08-22T13:20:18+00:00\",\"dateModified\":\"2025-05-01T10:16:12+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/\"},\"wordCount\":2898,\"publisher\":{\"@id\":\"https:\/\/vwo.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/Feature-image_How-to-Calculate-AB-Testing-Sample-Sizes.png\",\"articleSection\":[\"Calculator\"],\"inLanguage\":\"en-US\"},{\"@type\":[\"WebPage\",\"FAQPage\"],\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/\",\"url\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/\",\"name\":\"Learn How to Calculate A\/B Testing Sample Sizes\",\"isPartOf\":{\"@id\":\"https:\/\/vwo.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/Feature-image_How-to-Calculate-AB-Testing-Sample-Sizes.png\",\"datePublished\":\"2012-08-22T13:20:18+00:00\",\"dateModified\":\"2025-05-01T10:16:12+00:00\",\"description\":\"In this detailed post, learn about the different approaches for calculating the desired sample size for creating & starting A\/B 
tests.\",\"breadcrumb\":{\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#breadcrumb\"},\"mainEntity\":[{\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295142333\"},{\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295173233\"}],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#primaryimage\",\"url\":\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/Feature-image_How-to-Calculate-AB-Testing-Sample-Sizes.png\",\"contentUrl\":\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/Feature-image_How-to-Calculate-AB-Testing-Sample-Sizes.png\",\"width\":1200,\"height\":700,\"caption\":\"How To Calculate Ab Testing Sample Sizes?\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/vwo.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Calculator\",\"item\":\"https:\/\/vwo.com\/blog\/calculator\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"How to Calculate A\/B Testing Sample 
Sizes?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/vwo.com\/blog\/#website\",\"url\":\"https:\/\/vwo.com\/blog\/\",\"name\":\"Blog\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/vwo.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/vwo.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/vwo.com\/blog\/#organization\",\"name\":\"VWO\",\"url\":\"https:\/\/vwo.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/vwo.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2018\/09\/VWOLogo.png\",\"contentUrl\":\"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2018\/09\/VWOLogo.png\",\"width\":780,\"height\":492,\"caption\":\"VWO\"},\"image\":{\"@id\":\"https:\/\/vwo.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/vwoofficial\/\",\"https:\/\/x.com\/VWO\",\"https:\/\/www.instagram.com\/vwoofficial\/\",\"https:\/\/www.linkedin.com\/company\/vwo\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/vwo.com\/blog\/#\/schema\/person\/c430ae4e61c3fb5b63536e40ac4e6a52\",\"name\":\"Kees Schippers\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/vwo.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9e9a4e152dff5ac17292f9a76fcfa21ef884b2261be21c19d830af52824d438b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/9e9a4e152dff5ac17292f9a76fcfa21ef884b2261be21c19d830af52824d438b?s=96&d=mm&r=g\",\"caption\":\"Kees Schippers\"},\"description\":\"I am a passionate Marketing Data Analyst. 
I offer hands on training in Statistics & Data Mining with IBM SPSS, KNIME and R.\",\"url\":\"https:\/\/vwo.com\/blog\/author\/keesschippers\/\"},{\"@type\":\"Question\",\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295142333\",\"position\":1,\"url\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295142333\",\"name\":\"What is the formula for determining sample size?\",\"answerCount\":1,\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"There are multiple approaches to determine the required sample size for A\/B testing. For strategically important \u2018crucial experiments\u2019, preference goes out to the most comprehensive method in which both \u201cdesired reliability and power\u201d are involved in the calculation.\",\"inLanguage\":\"en-US\"},\"inLanguage\":\"en-US\"},{\"@type\":\"Question\",\"@id\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295173233\",\"position\":2,\"url\":\"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295173233\",\"name\":\"What should be the required sample size for an ab test?\",\"answerCount\":1,\"acceptedAnswer\":{\"@type\":\"Answer\",\"text\":\"In the online world the possibilities for a\/b testing just about anything are immense. The sample size should be large enough to demonstrate with statistical significance that the alternative version is better than the original. \",\"inLanguage\":\"en-US\"},\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Learn How to Calculate A\/B Testing Sample Sizes","description":"In this detailed post, learn about the different approaches for calculating the desired sample size for creating & starting A\/B tests.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/","og_locale":"en_US","og_type":"article","og_title":"Learn How to Calculate A\/B Testing Sample Sizes","og_description":"In this detailed post, learn about the different approaches for calculating the desired sample size for creating & starting A\/B tests.","og_url":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/","og_site_name":"Blog","article_publisher":"https:\/\/www.facebook.com\/vwoofficial\/","article_published_time":"2012-08-22T13:20:18+00:00","article_modified_time":"2025-05-01T10:16:12+00:00","og_image":[{"width":1200,"height":630,"url":"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/OG-image_How-to-Calculate-AB-Testing-Sample-Sizes.png","type":"image\/png"}],"author":"Kees Schippers","twitter_card":"summary_large_image","twitter_creator":"@VWO","twitter_site":"@VWO","twitter_misc":{"Written by":"Kees Schippers","Est. 
reading time":"16 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#article","isPartOf":{"@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/"},"author":{"name":"Kees Schippers","@id":"https:\/\/vwo.com\/blog\/#\/schema\/person\/c430ae4e61c3fb5b63536e40ac4e6a52"},"headline":"How to Calculate A\/B Testing Sample Sizes?","datePublished":"2012-08-22T13:20:18+00:00","dateModified":"2025-05-01T10:16:12+00:00","mainEntityOfPage":{"@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/"},"wordCount":2898,"publisher":{"@id":"https:\/\/vwo.com\/blog\/#organization"},"image":{"@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#primaryimage"},"thumbnailUrl":"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/Feature-image_How-to-Calculate-AB-Testing-Sample-Sizes.png","articleSection":["Calculator"],"inLanguage":"en-US"},{"@type":["WebPage","FAQPage"],"@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/","url":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/","name":"Learn How to Calculate A\/B Testing Sample Sizes","isPartOf":{"@id":"https:\/\/vwo.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#primaryimage"},"image":{"@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#primaryimage"},"thumbnailUrl":"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/Feature-image_How-to-Calculate-AB-Testing-Sample-Sizes.png","datePublished":"2012-08-22T13:20:18+00:00","dateModified":"2025-05-01T10:16:12+00:00","description":"In this detailed post, learn about the different approaches for calculating the desired sample size for creating & starting A\/B 
tests.","breadcrumb":{"@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#breadcrumb"},"mainEntity":[{"@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295142333"},{"@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295173233"}],"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#primaryimage","url":"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/Feature-image_How-to-Calculate-AB-Testing-Sample-Sizes.png","contentUrl":"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2012\/08\/Feature-image_How-to-Calculate-AB-Testing-Sample-Sizes.png","width":1200,"height":700,"caption":"How To Calculate Ab Testing Sample Sizes?"},{"@type":"BreadcrumbList","@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/vwo.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Calculator","item":"https:\/\/vwo.com\/blog\/calculator\/"},{"@type":"ListItem","position":3,"name":"How to Calculate A\/B Testing Sample 
Sizes?"}]},{"@type":"WebSite","@id":"https:\/\/vwo.com\/blog\/#website","url":"https:\/\/vwo.com\/blog\/","name":"Blog","description":"","publisher":{"@id":"https:\/\/vwo.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/vwo.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/vwo.com\/blog\/#organization","name":"VWO","url":"https:\/\/vwo.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/vwo.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2018\/09\/VWOLogo.png","contentUrl":"https:\/\/static.wingify.com\/gcp\/uploads\/sites\/3\/2018\/09\/VWOLogo.png","width":780,"height":492,"caption":"VWO"},"image":{"@id":"https:\/\/vwo.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/vwoofficial\/","https:\/\/x.com\/VWO","https:\/\/www.instagram.com\/vwoofficial\/","https:\/\/www.linkedin.com\/company\/vwo"]},{"@type":"Person","@id":"https:\/\/vwo.com\/blog\/#\/schema\/person\/c430ae4e61c3fb5b63536e40ac4e6a52","name":"Kees Schippers","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/vwo.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/9e9a4e152dff5ac17292f9a76fcfa21ef884b2261be21c19d830af52824d438b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9e9a4e152dff5ac17292f9a76fcfa21ef884b2261be21c19d830af52824d438b?s=96&d=mm&r=g","caption":"Kees Schippers"},"description":"I am a passionate Marketing Data Analyst. 
I offer hands on training in Statistics & Data Mining with IBM SPSS, KNIME and R.","url":"https:\/\/vwo.com\/blog\/author\/keesschippers\/"},{"@type":"Question","@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295142333","position":1,"url":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295142333","name":"What is the formula for determining sample size?","answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"There are multiple approaches to determine the required sample size for A\/B testing. For strategically important \u2018crucial experiments\u2019, preference goes out to the most comprehensive method in which both \u201cdesired reliability and power\u201d are involved in the calculation.","inLanguage":"en-US"},"inLanguage":"en-US"},{"@type":"Question","@id":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295173233","position":2,"url":"https:\/\/vwo.com\/blog\/how-to-calculate-ab-test-sample-size\/#faq-question-1580295173233","name":"What should be the required sample size for an ab test?","answerCount":1,"acceptedAnswer":{"@type":"Answer","text":"In the online world the possibilities for a\/b testing just about anything are immense. The sample size should be large enough to demonstrate with statistical significance that the alternative version is better than the original. 
","inLanguage":"en-US"},"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/posts\/3483","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/comments?post=3483"}],"version-history":[{"count":67,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/posts\/3483\/revisions"}],"predecessor-version":[{"id":95620,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/posts\/3483\/revisions\/95620"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/media\/56979"}],"wp:attachment":[{"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/media?parent=3483"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/categories?post=3483"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/tags?post=3483"},{"taxonomy":"feature","embeddable":true,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/feature?post=3483"},{"taxonomy":"industry-type","embeddable":true,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/industry-type?post=3483"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/product?post=3483"},{"taxonomy":"role","embeddable":true,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/role?post=3483"},{"taxonomy":"region","embeddable":true,"href":"https:\/\/vwo.com\/blog\/wp-json\/wp\/v2\/region?post=3483"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}