Appsumo Reveals its A/B Testing Secret: Only 1 Out of 8 Tests Produce Results


This is the 2nd article in the series of interviews and guest posts we are doing on this blog regarding A/B testing and conversion rate optimization. In the first article, we interviewed Oli from Unbounce on Landing Pages Best Practices.

Editor’s note: This guest post is written by Noah Kagan, founder of web app deals website Appsumo. I have known Noah for quite some time and he is the go-to person for any kind of marketing or product management challenges. You can follow him on Twitter @noahkagan. In the article below Noah shares some of the A/B testing secrets and realities that he discovered after doing hundreds of tests on Appsumo.


Only 1 out of 8 A/B tests has driven significant change

AppSumo.com reaches around 5,000 visitors a day. A/B testing has given us some dramatic gains, such as increasing our email conversion more than 5x and doubling our purchase conversion rate.

However, I wanted to share some harsh realities about our testing experiences. I hope sharing this encourages you not to give up on testing and helps you get the most out of it. Here’s a data point that will most likely surprise you:

Only 1 out of 8 A/B tests has driven significant change.

That’s preposterous. Not just a great vocab word but a harsh reality. Here are a few tests from us that I was SURE would produce amazing results only to disappoint us later.

A/B test #FAIL 1

Hypothesis: Title testing. We get a lot of traffic to our landing page, and a clearer message will significantly increase conversions.

[Images: three of the headline variations tested on the AppSumo landing page]

Result: Inconclusive. We’ve tried over 8 versions and so far not one has produced any significant improvement.

Why it failed: People don’t read. (Note: the real answer here is “I don’t know why it didn’t work out; that’s why I’m doing A/B testing.”)

Suggestion: We need more drastic changes to our page, like showing more info about our deals or adding pictures, to encourage a better conversion rate.

A/B test #FAIL 2

Hypothesis: Showing the tweet-for-a-discount prompt in a light-box pop-up vs. making someone click a button to tweet. We assumed that removing a click and putting it (annoyingly) in front of someone’s face would encourage more tweets.

[Image: the tweet-for-a-discount light-box pop-up on AppSumo’s website]

Result: 10% decrease with light-box version.

Why it failed: ANNOYING. Totally agree. Also, it was premature, as people had no idea what it was, nor were they interested in tweeting at that moment.

Suggestion: Better integrate people’s desire to share into our site design.

A/B test #FAIL 3

Hypothesis: A discount would encourage more people to give us their email on our landing page.

[Image: the A/B test variation showing a discount in the landing-page messaging]

Result: Fail. It decreased email conversion on our landing page.

Why it failed: An email address is a precious resource and we are dealing with sophisticated users. Unless you are already familiar with our brand (which is a small audience), you aren’t super excited to trade your email for a % off.

Suggestion: Give away $ instead of % off. Also, offer the % off with examples of deals so they can see what they could use it for.


Thoughts on failed A/B tests

All of these were a huge surprise and a disappointment for me.

How many times have you said, “This experience is 100x better, I can’t wait to see how much it beats the original version?”

A few days later you check your testing dashboard to see it actually LOSING.

A word of caution: beware of premature e-finalization. Don’t end tests before the data is finalized (aka statistically significant).

I learned the majority of my testing philosophy at SpeedDate, where literally every change is tested and measured. SO MANY times my tests initially blew the original version away, only to find out a few days later that a) the improvement wasn’t as amazing after all, or b) it actually lost.
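To make “statistically significant” concrete, here is a minimal sketch of the kind of check you can run before calling a test: a two-proportion z-test on conversion counts. This is our illustration with made-up numbers, not AppSumo’s or any particular tool’s actual methodology.

```python
# Minimal significance check for an A/B test: a two-proportion z-test.
# Illustrative sketch only; the visitor and conversion numbers are made up.
from math import sqrt
from statistics import NormalDist


def ab_test_p_value(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that A and B convert equally
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Example: control converts 90/4500 (2.0%), variation converts 110/4500 (2.4%)
p = ab_test_p_value(90, 4500, 110, 4500)
print(f"p-value: {p:.3f}")  # well above 0.05 here, so keep the test running
```

In practice your testing tool reports this for you; the point is simply not to declare a winner while this number is still bouncing around.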

How can you get the most out of your tests?

Some A/B testing tips based on my experience:

  • Weekly iterations. This is the most effective way I’ve found to do A/B testing.
    • Pick only 1 thing you want to improve. Let’s say it’s the purchase conversion rate for first-time visitors
    • Get a benchmark of what that conversion rate is
    • Do 1-3 tests per week to increase that
    • Do it every week until you hit some internal goal you’ve set for yourself
  • Most people test 80 different things instead of iterating on 1 priority over and over. Focusing on one thing simplifies your life.
  • Patience. Realize that getting results may take a few thousand visits or 2 weeks. Pick bigger changes to test so you aren’t waiting around for small improvements.
  • Persistence. Knowing that 7 out of 8 of your tests will produce insignificant improvements should comfort you that you aren’t doing it wrong. That’s just how it is. How badly do you want those improvements? Stick with it.
  • Focus on the big. I say this way too much but you still won’t listen. Some will, and they’ll see big results from this. If you have to wait 3-14 days for your A/B tests to finish, then you’d rather have dramatic changes like -50% or 200% than a 1-2% change. This may depend on where you are in your business, but you likely aren’t Amazon, so 1% improvements won’t make you a few million dollars more. (See the sample-size sketch after this list.)
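As a rough illustration of why focusing on the big matters, here is a back-of-the-envelope sample-size sketch using the standard normal approximation for two proportions at roughly 95% confidence and 80% power. It is our illustration, not a formula from the article, and not a substitute for a proper power calculation.

```python
# Back-of-the-envelope sample size per variation for an A/B test,
# using the normal approximation for two proportions (illustrative only).
from math import ceil
from statistics import NormalDist


def visitors_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per variation to detect `relative_lift` over `baseline_rate`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% confidence
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    effect = abs(p2 - p1)
    return ceil(variance * ((z_alpha + z_power) / effect) ** 2)


# With a 2% baseline conversion rate:
print(visitors_per_variation(0.02, 0.50))  # roughly 4,000 visitors to spot a 50% lift
print(visitors_per_variation(0.02, 0.05))  # roughly 300,000+ visitors to spot a 5% lift
```

At roughly 5,000 visitors a day split between two variations, the first test resolves in a couple of days, while the second would take months, which is the whole argument for chasing dramatic changes.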

If you like this article, follow @appsumo for more details and check out Appsumo.com for fun deals.

Editor’s note: Hope you liked the guest post. It is true that many A/B tests produce insignificant results and that’s precisely the reason that you should be doing A/B testing all the time. For the next articles in this series, if you know someone whom I can interview or want to contribute a guest post yourself, please get in touch with me (paras@wingify.com).

Paras Chopra
I started Wingify in early 2009 to enable businesses to design and deploy great customer experiences for their websites and apps. I have a background in machine learning and am a gold medalist from Delhi College of Engineering. I have been featured twice in the Forbes 30 under 30 list - India and Asia. I'm an entrepreneur by profession and my curiosity is wide-ranging. Follow me at @paraschopra on Twitter. You can email me at paras@wingify.com.

Comments (21)

  1. 1) You’re not getting enough traffic to drive results. 5000 visitors a day is tiny.

    2) A 10% change is meaningful, not a “Fail”.

    1. @Matt: there is no rule of thumb for traffic. You can even get significant results at 100 visitors a day. It depends on a lot of factors.

      The 10% change was in the negative direction. It was a fail for sure.

  2. I think it’s cool to see (A) that you tried so many things and (B) you really thought about why they did or didn’t work. I just ran some advertisements, and it was a nigh-total fail; but I learned a lot, and thought through some things that I wouldn’t have *had* to think through otherwise.

    We’re taught about Edison and his 10,000 (or whatever) light-bulb failures until he found the one solution that worked, but most folks don’t realize the same idea can apply to their work.

  3. Great post – thank you for being so open. Can you elaborate on “focus on the big”? It sounds like an important point – did you mean getting statistically significant experiments, or choosing major A/B changes rather than the ornamental?

    1. @Tal: I will let Noah comment on this, but according to my understanding, he meant making big, bold changes in A/B tests rather than small changes like changing the color of a headline and stuff.

  4. Nice. I don’t think these are “failures” though – knowing that you were doing the right thing in the first place is certainly valuable information.

  5. @Tal

    @Matt, who left the first comment, said our 5,000 visitors a day was small, and he’s right.

    Point being, you want to go for the biggest wins, especially if you have a small amount of traffic, since it’ll take longer to get definitive results.

    Too many people are testing minor changes like button color to increase conversion when they only have 100 visitors a day. For example, if each buyer on your site is worth $10,000 and you have 10 visits a day, it’s way more ROI to focus on growth than on conversion or retention.

    I tend to aim for the most drastic changes and then scale back from there. Here’s a great article from Seth Godin about how people are testing too much: http://sethgodin.typepad.com/seths_blog/2011/01/a-culture-of-testing.html

    Good luck.

  6. Nice post (thank you, Paras and Noah) — love the sharing of what works and what doesn’t.

    I wouldn’t be so quick to give up on some of your testing ideas, Noah. In my opinion, it was primarily the execution on validating the hypotheses that ‘failed’ here (sorry… I am a direct person).

    People absolutely do read headlines. They’re a great opportunity for conversion optimization… but you must first have a strong grasp on the factors that influence conversion — like motivation, value proposition, anxiety, etc.

    The headlines you tested above are merely tag lines — and tag lines have a low probability to increase conversion. You should go back to the drawing board and come up with some headlines that tap into the motivation of your site visitors. First learn about what motivates people to look for your (or similar) solutions… and then amplify that learning through great, clear copy in your headline test!

    Lance

    1. @Lance: thanks for your inputs! As we were discussing, I will be following up with you for an interview in this series. Will be excited to hear your point of view.

  7. I agree with Lance’s last comment. The purpose of testing is not to find out what works, but rather to find out what does NOT work. The tests by Noah reveal a rather large amount of information and insight toward future testing. In fact, when a test “works” — and I use quotes on that to mean “does what we wanted it to do by supporting the hypothesis in some way” — we often learn *less* because we over-interpret the success. As Lance also pointed out, it isn’t the headlines (or the pop-up) that are the problem, it’s the contextual basis under which they were presented. To paraphrase Bill S.: “The fault lies not in our tests, but in ourselves.” That is where you go to find actual insight that ends up leading to better tests (“What assumptions did I build into that test, and are they all valid?”, “If I were sitting across the table from this prospect, they would need X, Y, and Z at this point to continue — so is my test creating a roadblock to that (crappy headlines, premature pop-ups, etc.)?”).

    @Paras: There ARE rules of thumb for traffic, namely that the more homogeneous the traffic, the smaller the variance you can expect in the sample of visitors versus the population of visitors. If you had a site that was geared towards something specific — say, late-stage lung cancer patients — you don’t need nearly as large a set of traffic to get meaningful results as with a broader spectrum of, say, eBay shoppers. That is not a trivial meme to keep in mind, as not only will the size of your test samples be driven by that concern, but also the frequency of the tests and the overall testing schedule you keep.

    So while 5000 visitors per day is small when all one is doing is comparing how big your set of visitors is versus mine, the real measure is how segmented the audience is and whether large discrepancies exist in what is needed for each segment in order to proceed.

    1. @John: thanks for your detailed comment. By rules of thumb I meant that you can’t throw around a figure like “one needs at least 1,000 visitors a day to get statistically significant results” (irrespective of knowing what conversion goal is being measured and what kind of traffic is being sent to the test page).

  8. I love that others share my frustration; it restores my confidence that I’m not just missing the point.

    My last test revealed a very unexpected win. I took a very text-heavy page that was supposed to be a pricing breakdown. I ran some de-cluttered versions and thought I’d better just have a completely sparse version with literally just the prices and no info.

    Then I created a version with a clear call to action and a version with supporting information about next steps when you’ve chosen the right pricing model. I was really excited about the last one which seemed to be the solution to a lot of negative feedback my user testing had produced…

    I’m sure you can imagine what happened. I’m still scratching my head as to why the completely sparse version won with a 30% uplift – back to the drawing board!

  9. @Chris, great example of a ‘head scratcher’. 🙂 It’s worth spending the time trying to figure it out — otherwise you’re taking a major leap of faith if you simply try to apply the same visual design and copy elements to other pages. If you have the traffic to support it, I recommend running a multivariate test to attempt to deconstruct the results… so that you can learn from them.

    Lance

  10. @Chris: The first thing you need to do is to repeat the test. You have to convince yourself — and you can do this numerically — that the sample set of visitors in your test is representative of the population of your visitors as a whole. Or, more simply, did you just get a goofy mix of folks in the first test? One can’t really know this from just one test, though there are ways to sniff out some confidence levels.

    I’d suggest repeating with less traffic since, at the end of the day, when you subject any visitors to a less optimized experience you’re costing yourself some money… What you’re looking to do is see if the results are different, while costing yourself as little as possible and still getting meaningful results. It’s definitely a balancing act!

    Further, back to the issue of “are there rules for the total amount of traffic for a test?” which was touched on earlier, I’d also comment that if someone had, say, 5000 visitors taking a test, I’d much rather see the results of 10 of the same tests of 500 visitors each than one big test of 5000. The challenge with low conversion rates is that you have to expose a larger number of people to the test to tease out insight into what are typically 1-2-3% conversion rates. This means the signal-to-noise ratio is rather low 🙁, but the same techniques as in polling (“Obama 51%, Generic Republican 49%”) are useful.

    So the big challenge is first to “test your tests” by repeating them — because if you get a randomly skewed sample of visitors, it will completely throw off your interpretation of the test results. You’re aiming for Directionally Correct, not Metaphysical Certitude.

  11. Hi,

    This article really kinda makes testing look dumb and randomly directed. I’m testing in quite a few countries (we operate in 35) and trying 8-100,000 odd variables per test. I do directed tests with inputs from clicktale, usability research, web analytics, previous tests, copywriters etc. etc.

    In our case, only one experiment in the last 2 years has failed to give a positive result, and that was for one week only. A lot of this is down to test design but also because we use multi-variate testing.

    The problem with running multiple A/B tests at different *time* or *traffic* mixtures is that it might completely change the outcome. You could, in theory, test all those headlines at different times and find completely baffling results.

    At least if you are doing multi-variate, you can then play with the variables and *how they interact*, at the same *time* with the *same traffic mix*.

    I note the other comments about sample size and you’ve hit this with your early ‘predictions’ that were premature. You need to get very high confidence levels, especially if the results in the A/B test are close. They’re going to be close because you’ve done simple variables without huge changes. Ergo, you’ve made it harder to ‘see’ what will ‘push’ the conversion rate.

    If you’re not quoting confidence levels and intervals, you’re not seeing how reliable the result might be – you need these figures as much as any lift figures. Also, if your business has a weekly or seasonal pattern, you need to test across one of these. [Editor’s note: see the sketch after the comments for a quick way to compute an interval.]

    Last but not least, watch the traffic mix. If this changes, so will the results.

    And remember to check the A/B results post the online funnel. What is the long term value of the customer across the lifetime?

  12. “People don’t read?” The fact that copy changes didn’t make a measurable difference in conversion doesn’t prove that. Maybe you just didn’t come up with motivating copy.

  13. If you sell products that are easy to price shop by a brand or a model number, your conversion results may get skewed by any promotions that competitors may launch or stop during your test.

  14. Following on from the point by Craig Sullivan:
    “In our case, only one experiment in the last 2 years has failed to give a positive result, and that was for one week only.”

    The point about testing is not so much about just getting a positive or negative result. The end goal is testing your business hypothesis, and at times it’s not just about “tweaking elements here and there” but about what it indicates regarding other areas of your business which might need attention (e.g., is there an issue with your product structure, customer service areas, etc.).
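Editor’s note: to make the point in comment 11 about quoting confidence levels and intervals concrete, here is a minimal sketch of a 95% confidence interval for the difference between two conversion rates, using the unpooled normal approximation. The numbers are made up and this is an illustration, not the method of any particular testing tool.

```python
# 95% confidence interval for the difference between two conversion rates
# (unpooled normal approximation). Illustrative numbers only.
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Confidence interval for (rate_b - rate_a)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    std_err = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95%
    diff = rate_b - rate_a
    return diff - z * std_err, diff + z * std_err


low, high = diff_confidence_interval(90, 4500, 110, 4500)
print(f"lift on the base rate: {low:+.4f} to {high:+.4f}")
# If the interval straddles zero, the apparent "winner" may just be noise.
```

The width of the interval is as informative as the lift figure itself: a wide interval that straddles zero is the numeric version of “keep the test running.”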
