Key Takeaways
- Identify and prioritize the biggest challenges in your business. This could be increasing occupancy on a new route, launching a new product, or improving a key conversion funnel. Focus your efforts on these areas for the most significant impact.
- Conduct a competitive analysis or hire a professional to do so. Understanding what your competitors are doing can provide valuable insights and help you identify opportunities for improvement.
- When launching a new product, focus on perfecting its presentation. Because the product is less known and the content describing it is untested, that content deserves the most attention.
- Always align your data projects or conversion optimization projects with clear business goals. This ensures that your work supports what the business is trying to achieve and adds value.
- Validate and prioritize your hypotheses based on their potential impact on your business. For example, if you're trying to improve a simple funnel, decide whether it's more important to improve the conversion from the landing page to the call to action, or from clicking on the call to action to completing the sign-up form.
Summary of the session
The webinar, hosted by Jan Marks from VWO, featured Stuart Scott from Mammoth Growth, discussing the importance of data-driven decision-making in startups. Scott shared practical strategies for improving user engagement and conversion rates, such as refining button design, adding social proof, and clarifying value propositions. He emphasized the need to understand business goals and challenges, and to prioritize testing based on these.
Using the example of a new product launch, Scott highlighted the importance of perfecting the product presentation quickly. He also discussed the importance of validating and prioritizing hypotheses for conversion rate optimization. The webinar concluded with an open invitation for attendees to reach out for further discussion on these strategies.
Webinar Video
Webinar Deck
Transcription
Disclaimer: Please be aware that the content below is computer-generated, so kindly disregard any potential errors or shortcomings.
Jan:
My name is Jan Marks. I'm heading the Europe and Latin America business at VWO. I'm glad that you made it today, and thanks for signing up to join us. We have a very interesting subject on the table today.
We are meeting people from Mammoth Growth. When I first met them, I asked, why Mammoth? It's not just because of the hairy animal on the logo. It's because it's actually a great company that enables you to grow to a size that is really hard to imagine. That is probably the origin of the name.
They are customer data experts, so they really tackle experimentation and conversion optimization from the data side. They have created a lot of success stories at companies like Deliveroo, Dropbox, Tet, and Calendly, really huge brands, really successful. And this company with a hairy animal as its logo is being represented by somebody with less hair but a lot of brain inside. We're going to welcome Stuart Scott. Stuart, are you there?
Can you hear me?
Stuart Scott:
Yes, Jan. I’m here.
Jan:
Thank you. Thanks so much. Apologies for the remark I just made, but as a matter of fact, the mammoth has a lot of hair. Let me just introduce you briefly to the audience.
Stuart is addicted to marketing technology, data, and analytics, and that's why he's heading the team. At Mammoth Growth you have built successful businesses, or you have enabled successful growth models at existing companies. That's just amazing. Yesterday, I flipped again through your profile, and the thing that caught my attention is the transformation you've gone through. I'm not talking about digital transformation. You transformed from somebody who did outdoor sports, hiking, and running, and stated that you were migrating to becoming a dad. And now you're reading kids' books and building train sets and so on.
Congratulations on this challenging transformation. How is it going?
Stuart:
Oh, very good. Yeah. My kid is a very happy wee boy, but I have a lot less time for anything else nowadays.
Jan:
Okay. So thanks for making it possible and joining us here. I think we're going to see a lot of interesting stuff, and stuff that makes me angry, like wasting time, wasting money, running bad tests, and so on. You'll get to say more about that, and I hope that in the future we can avoid it in many common projects. So over to you, Stuart, the stage is yours.
Stuart:
Great. Thank you, Jan. So, where do we start? I’m gonna start with this slightly kind of provocative slide. Right?
So 85% of A/B tests fail, and I think it's widely accepted that most A/B tests will fail. If they all win, then we're not learning anything, are we? And if we already knew the answer, there would be no point running a test, right, you would just ship the change. But it also means there's a huge opportunity. Right?
So if we can increase the number of successful tests from 15% to 20% or even 30%, we can start to move much faster and improve our businesses and revenue much more quickly. So why does it matter? The reason it matters is that A/B tests are expensive. It takes a lot of energy from your team, as an organization, to build the variants, set up the experiment, debug it, and do some quality assurance.
And then whilst the tests are live, they create additional work for the organization. You've got to maintain multiple versions of the website. You've got to keep your customer support team up to date and make sure that when someone reaches out, you know which version they're running.
Now all of this is more than worth it because you're learning something important that moves your business forward. But we need to make sure that we are only running the tests that are delivering and not wasting time. So how do we improve our success rate and reduce the amount of wasted time? Well, the answer is data.
We as modern businesses collect huge amounts of data and lots of that data can tell us something about what might or might not work when we make a change. What I’m advocating here today is using the data that in most cases you already have to make sure that you are maximizing your chances of success every time.
If you can only launch a certain number of tests each month, you want to be using those slots as effectively as possible. So I'm going to start with an example. We were working with a company in the eCommerce business with a signup form, where they ask you to sign up with an email address or with a Facebook login. The other thing that's important to know about this business is that they are reliant on email as a marketing channel. They have a very high repeat rate, and most of that repeat rate is driven by emailing the customer with promotions or updates about the business, like new product launches. Thus, the email channel is one of the most important ways in which they drive repeat business. Also, what they were seeing is that the Facebook sign-in option they offered was really popular. So they had a hypothesis for an A/B test.
If we add more sign-up options like Google sign-in, Twitter sign-in, or other social sign-ins, then we'll be able to increase conversion rates because we'll make it easier to register for the product. So they had a plan to A/B test adding more sign-in options, which is a great idea. But one thing we always advocate for before you start a test is that you look at the data and understand what the potential of that test is.
So we did some pre-test analysis with them, and we found a really surprising result. Users who sign up with Facebook place fewer orders over their lifetime and are less valuable than the users who sign up with an email address. That was a really surprising result for everyone, because the business thought they'd made life easier for people by letting them sign up with Facebook and that those users would be more engaged. We did some follow-up analysis, and we found the big difference was that email open rates were much lower for Facebook users than for users who signed up with an email address.
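To make this kind of pre-test analysis concrete, here is a minimal sketch in Python. It is only an illustration: the file name, column names, and data model are assumptions, not the client's actual schema.

```python
# Minimal pre-test analysis sketch (hypothetical data model).
# Compares lifetime orders and email open rate by signup method
# before deciding whether "more social sign-ins" is worth testing.
import pandas as pd

# Assumed columns: user_id, signup_method ("facebook" / "email"),
# lifetime_orders, emails_received, emails_opened
users = pd.read_csv("users.csv")

summary = users.groupby("signup_method").agg(
    n_users=("user_id", "count"),
    avg_lifetime_orders=("lifetime_orders", "mean"),
    emails_opened=("emails_opened", "sum"),
    emails_received=("emails_received", "sum"),
)
summary["email_open_rate"] = summary["emails_opened"] / summary["emails_received"]

print(summary.sort_values("avg_lifetime_orders", ascending=False))
```

If one cohort shows fewer lifetime orders and a lower open rate, that changes which experiment is worth running at all.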
At this point, think about my personal Facebook account. It's still associated with the email address I used when I set it up, probably 15 or 20 years ago, in the early days of Facebook, and I don't even look at that email account anymore. I think what we were seeing is that lots of other users are in the same place: they still sign in to Facebook regularly, but they're not using the email addresses associated with their Facebook account. And so, actually, this led us to a completely different plan.
Instead of A/B testing more social sign-in options, what we did was A/B test hiding the Facebook sign-in option. Now, I like two things about this test. One is that, given the data we had, it was much more likely to be successful. The second is that it was a much easier test to run, because it's much easier to hide a button than to build integrations with different third parties and maintain those integrations. If someone signed up with Google, I couldn't delete the Google integration the next day; I'd have to wait and allow them to continue signing in that way until I could somehow migrate them to a different option. So we ended up with what was both a much easier test to run and one that had a much higher chance of success. And the results were positive.
So while we did see a very small drop in conversion rate, we actually saw a big uplift in repeat orders, and that was kind of more than worth it and outweighed the impact on conversion rate. And so I think what we learned here is that our intuition isn’t always right, right? We all have opinions about the products or the websites that we work on. And opinions about what doesn’t work, what we should change, what we shouldn’t change, and that intuition isn’t always right.
And that’s why, and I guess going back to the first slide, that’s why so many A/B tests are unsuccessful, right? Because often we’re guessing and making assumptions about what will and won’t work, and that’s the reason we’re A/B testing because we need to test those assumptions. But often there’s a much faster way to test those assumptions than running an A/B test. A/B tests are great when we need to learn something new, develop a new data point, or prove something with a high degree of certainty. But, actually, often we already have some data that can provide us with great insights and allow us to ensure that we’re testing things that are going to move us forward. And I think that’s what this is an example of.
The A/B test was still the right answer; we just had the wrong one to start with. By using data, we can get to the right test faster. So that brings me to the next slide: How can we learn faster? If I think about the simplest A/B testing process that you, as an organization, can have, you essentially have two steps. There's a hypothesis step, where you sit down, do some user research, interview some users, and watch some session recordings.
You look at some heat maps, do some other analysis, and you come up with some ideas for how you can improve your experience and your product and, ultimately, boost your revenue. Then you take those hypotheses and run some experiments. Most often, that's probably an A/B test; sometimes it might be some other sort of test. And you prove or disprove the hypothesis.
Now, as a starting point, this is great. You’re learning. You’re not just shipping on gut instinct. You are making informed decisions, but we can be much more efficient, right?
If we start to integrate other methods of learning into this process, what we advocate for is adding two additional steps. After you've got your hypothesis, there's what we've called the validation and design step. It's what we did in that first case I just talked about: looking at the data we have and at how that data can give us more information about the hypothesis and the likely outcome of that test.
And then I think the other thing, or another thing that’s important, is it can also help us prioritize these tests. There’s an infinite number of different tests we can run and not all of them have the same potential impact. Some of them are potentially more impactful than others. So by doing some sort of sizing and estimation, if this is successful, what can I achieve?
We can, again, make sure that we’re spending our time on the most valuable tests. And then I think the third thing that you can do here that’s useful is prototyping. Have one of your designers build a wireframe and an interactive mockup, and then actually take it out to other people in the company or users and see how they interact with it. Use that to iterate on the design and experiment. And then I think the other really important thing is that once you’ve run a test, all too often, we see companies stopping there.
Like, they take the results of the test, and that’s then done. So, again, it’s really important to take a pause after you run a test and ask whether the test was successful or not, what can we learn from it? Can we look at how different segments behaved?
Can we look at other metrics beyond the one we measured for statistical significance, or can we see other interesting behaviors in the data? Now, to go a bit deeper into a few of these elements, first of all, let's talk about the hypothesis. It's really important, in my opinion, that you're clear about the hypothesis you are testing. It's very easy to write down a loosely worded hypothesis, but the risk when you do that is that it becomes too easy to have a discussion, or a prioritization discussion, based on opinion and not on facts and data.
So, force people to use a fixed format for documenting hypotheses. This is one that I quite like; it's not the only one, but it's a good starting point: if we make a specific, named change, then we expect a specific outcome, because of a mechanism we believe that change works through. This forces people to document their assumptions and levels the playing field a bit.
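One lightweight way to enforce such a format is to capture each hypothesis as structured data rather than free text. The sketch below is only an illustration of that if/then/because structure; the field names and example values are hypothetical, not a template from the webinar.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """Fixed 'if / then / because' format for documenting test ideas."""
    change: str            # "If we ..." - the specific change we will make
    expected_outcome: str  # "then ..." - the measurable result we expect
    mechanism: str         # "because ..." - why we believe the change will work
    primary_metric: str    # the metric we will judge success on

example = Hypothesis(
    change="Hide the Facebook sign-in option on the signup page",
    expected_outcome="More email signups and a higher repeat-order rate",
    mechanism="Facebook signups use stale email addresses, so they open fewer emails",
    primary_metric="repeat orders per new user",
)
```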
A fixed format means it's not just the loudest voice in the room that wins the discussion. It means you can go and look at data, qualify some of these hypotheses and assumptions, test the assumptions in other ways, and then confirm that it is actually the test you want to run or the hypothesis you want to test. Now, if we talk about the second step in that flow, the validation and design phase, the question here is really, as much as anything, what's the most valuable test that we can run? We know we're going to run a test.
It's almost always, or at least most of the time, going to be the outcome, but we want to make sure that the test is the most valuable one, that we're not wasting time, and that we're not testing something we can already answer from our existing data. So, really, we're asking a whole range of questions here. Some of the most common ones are: how many users actually drop off at this stage in the funnel?
Is there enough drop-off that, if we improve the conversion rate at this step, we'll see a big enough impact down the funnel to be statistically significant and to improve our business overall? Will enough people interact with the change for it to be impactful? I've seen plenty of tests where someone changes an element on a page and the impact isn't significant because no one sees the element, or the element isn't very prominent on the page, so no one experiences the difference. No one has a different experience, and so there's no change in the business as a whole. We do need to change someone's experience of the product or the site to improve our conversion rate significantly.
And then also, what other data do we have? Have we got similar features elsewhere in the product? Have we done something similar in the past? Do we have qualitative data from user interviews or user testing around this stage in the funnel?
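These sizing questions can usually be answered with a few lines of arithmetic before anyone builds a variant. The sketch below is purely illustrative; every number in it is an assumption, not a figure from the webinar.

```python
# Pre-test sizing sketch: is the drop-off at this step big enough to matter?
# All figures below are illustrative assumptions.
monthly_visitors = 60_000       # users reaching this funnel step per month
step_cr = 0.30                  # current conversion at the step we want to improve
downstream_cr = 0.70            # conversion from the next step through to sign-up
expected_relative_lift = 0.10   # optimistic lift if the test wins
revenue_per_signup = 40.0       # average value of a sign-up

users_lost = monthly_visitors * (1 - step_cr)
extra_signups = monthly_visitors * step_cr * expected_relative_lift * downstream_cr
extra_revenue = extra_signups * revenue_per_signup

print(f"{users_lost:,.0f} users drop off at this step each month")
print(f"A winning test would add roughly {extra_signups:,.0f} sign-ups "
      f"(~${extra_revenue:,.0f}) per month")
```

Comparing that last number across candidate tests is a quick, rough way to decide which hypothesis deserves the next testing slot.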
Jan:
Sorry to jump in. We deal a lot with the challenge of prioritization and validation, because we have many projects going on with a long list of potential tests and hypotheses. Often, it's either the trendiest idea or the last idea that came in that gets served first, because it's fresh in the mind, and so on. Besides all the points that you mentioned, we've noticed that determining how big the problem is, is vital. It's crucial to have a benchmark.
We're always being asked: my drop-off rate between step 1 and step 2 of the reservation process is 30%, but is that good or bad? Maybe it's fantastic. Maybe all the others have a 50% drop-off rate; you just don't know.
So that's part of the problem. In order to really put your own data into context, we try to bring in benchmarks from other projects we run, and so on. Another thing is that the gravity of the problem very often depends, from what I've seen, on factors that have nothing to do with the website, for instance, the business logic. We were working with an airline, and we noticed that on one route, between London and Dubai, they had a huge problem.
They had a huge problem there, and they wanted to solve the problem of occupancy. It was really a yield problem, not an online problem. They needed to focus on bringing up the occupancy and having a better ratio in selling this particular route. And so a whole bunch of things we tried became prioritized because it was a new route that they had built, and so on.
So I think it's very often worth taking a step back and looking at the competitors. The second thing, really, is to think about the business: where is the big challenge in the business? Another example that comes to mind is when you have a new product that you want to launch. A new product always suffers.
It's less known. It's hard to get it in front of people. The content that you use to describe this new product is new. It's untested. Right?
The new product, being new, is not getting a lot of attention. So there is a certain temptation not to care about it and to say, let's test something on the home page instead.
No. The point is that the context is a launch. In a launch, you need to get to the most perfect presentation of your product you can, quickly. And I think that taking these two steps back, looking at the competition, looking at your business goals, and so on, is always highly recommendable.
Stuart:
Yeah, I think that's true. And I think every data project or conversion rate optimization project should really start with some clear business goals. It's often the case that we're stuck in the weeds, focusing on a problem we've been thinking about for years or our own pet problem that we're passionate about, without taking a step back and asking, 'What is the business trying to achieve here?' and 'How can our work support that?' This is something we try to instill in all our analysts at the beginning of every project because, if you don't do it, it's easy to do good work that's not valuable.
Jan:
That’s right.
Stuart:
Keeping that connection to the value. So, moving on from there, I wanted to use a really simple example here, probably an overly simplistic one, to talk a bit about how we can validate our hypotheses and how we might select and prioritize which are the best hypotheses to work on.
Think about the most simple funnel you can have, really. Someone visits your website. They land on a landing page with a big call to action button. They click on that call to action button. They land on a second page with a sign-up form.
They fill out the sign-up form and sign up. There are two steps in this funnel we can improve, right? We can improve the conversion from the landing page to clicking on the call to action, or we can improve the conversion rate from clicking on the call to action to completing the sign-up form. Which of these we pick, or think is most important to improve, will really affect the type of experiment we might run.
At the top of the funnel, we’ve got a few different options, right? We can make it easier to click the CTA and make it more prominent.
We can make the button bigger and make it a brighter color. We can increase trust, so we can add social proof, for example, logos of other customers or some of our partners. Or we can do more to convince the user and clarify the value proposition.

At the bottom of the funnel, there's a similar set of things we can do, just in different places. We can make it easier by making the sign-up form shorter, maybe relaxing restrictive validation rules on the name or email address. We can increase trust by adding padlocks, creating the notion of a very secure place to buy from. And, something I've seen work successfully quite a few times, we can delay the friction. Based on form analytics, the part of a sign-up form where you see the highest drop-off is often where customers are asked to input their email address. This is frequently the first step in the sign-up funnel, and users aren't that invested at that stage. Particularly in longer sign-up processes, asking for the email on the last page instead of the first can make users feel more invested by the time they're asked for their email address, increasing the likelihood of completion.

But these are two very different sets of ideas. The first question is, which category of ideas is going to be most valuable? This is where analytics and data start to become really impactful.
So, again, a really simple example. This is a screenshot from VWO Insights, I think, showing the same funnel we just talked about. 28,100 users within the time period have visited the landing page. Of those, 30% have gone on to click the call to action, and of those, 70% have completed the sign-up form. Now, without this data, I might have assumed that we should be investing all our time in the sign-up form, because that's probably where we're seeing a significant drop-off and likely a lot of friction. But actually, looking at this example, I don't know about you, but I'd be pretty happy if three-quarters of the people who see my sign-up form, and are asked for lots of personal information, complete it.
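The same funnel maths is easy to reproduce outside the tool. Here is a quick sketch using the approximate figures just mentioned:

```python
# Funnel sketch using the approximate figures from the example above.
funnel = [
    ("Visited landing page", 28_100),
    ("Clicked call to action", round(28_100 * 0.30)),          # ~30% click through
    ("Completed sign-up form", round(28_100 * 0.30 * 0.70)),   # ~70% of clickers
]

for (prev_name, prev_n), (name, n) in zip(funnel, funnel[1:]):
    step_cr = n / prev_n
    lost = prev_n - n
    print(f"{prev_name} -> {name}: {step_cr:.0%} convert, {lost:,} users lost")
```

Laid out like this, the landing page step loses roughly eight times as many users as the sign-up form, which is why the top-of-funnel ideas come first here.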
So, in this case, we probably want to start at the top of the funnel and get more of those users who see the website to put their trust in us, be excited about the value proposition, and progress through to the sign-up form in the first place. Those top-of-funnel ideas are where we should start. The fastest way to do this analysis is typically using some sort of behavioral analytics tool. VWO Insights is one great example. There are lots of others out there.
Jan:
Oh, there are not so many. I mean, why would you look at anything else, right?
Stuart:
So, anyway, there are, yeah. These tools are designed to give you rapid user-centric analysis. They are really focused on the user journey. All the data they collect is user-centric and event-based.
They have out-of-the-box reports that are easy to configure for things like funnel drop-offs, conversion paths, heat maps (showing where users are interacting with the page), and some of these tools bundle session recordings (like VWO does now). Tying all these data points together and making the data accessible, even to non-technical users, allows for faster decision-making.
The nature of this reporting is closely tied to the types of questions you’ll be answering with an A/B test when trying to improve conversion rates. While it’s possible to do these things in a business intelligence environment using SQL, you can write many SQL queries to answer the same questions.
But typically, it’s much quicker to use a behavioral analysis tool. We would strongly advocate that if you’re investing in experimentation, you should also invest in rapid, non-technical analytics alongside that. Alright.
Jan:
Just a quick note, I forgot something at the very beginning, Stuart. Sorry for that. Understanding user behavior just reminds me that we always love to understand all our… So, I forgot to invite everybody in the audience: Sadie, Rachel, Laura, York, Dennis, and many others. I can't go through all of them, but please, ask questions. This is a great moment.
The only thing you can't do with the recording is ask questions, but you've got Stuart and myself here, and we're happy to answer them. You can ask in your interface, and I will monitor the questions coming in. I'll invite the person with the best question to a pizza. Let's see if that works to engage our audience. So whatever your questions are, please send them in, and we will address them. I'll interrupt Stuart when appropriate.
Stuart:
Yeah. Sounds great. So the next thing I'm going to talk about, using another case study, is what we can learn once we've run an A/B test. I'm not going to talk today about the process of testing itself because, if you've joined this webinar, you're probably already thinking about A/B testing, and hopefully you're doing some already.
But one of the things that’s often missed is what else you can learn from the test beyond what you see in the A/B testing tool or the primary KPI that you use for that test. In most cases, it’s going to be conversion rate or some sort of revenue metric. This is an example, and the screenshot is not from the same company, but it’s an example of an e-commerce company that had a hypothesis.
Their hypothesis was that if they added an infinite scroll to their product listing page (instead of having 30 products on a page, allowing users to scroll infinitely through the product catalog), they would be able to increase their conversion rates because people would see more products and have more opportunities to find something inspiring and decide to buy. So, the A/B test that they conducted was pretty obvious from the hypothesis: they added an infinite scroll to the product listing page, and they ran the A/B test.
The result that came back surprised us all. The people who were given infinite scroll were spending less, and the difference was statistically significant. So how could showing people more products lead them to spend less? Well, to answer that, I'll go back to this great graphic, because we used it to do some post-test analysis. We dug into the data.
We looked at what users were doing, and the thing we quickly saw was that while they were seeing more products on the product listing page, they were viewing fewer product pages, clicking through to the next stage less often. And that seemed to be the mechanism through which they were spending less money. Then we watched some session recordings. When we started watching session recordings, we saw that people who were viewing the product listing page with infinite scrolls were just scrolling much further before viewing a product page. Often, when we watched a session recording for someone who didn’t have infinite scroll, they would scroll down to the bottom and see their 25 or 30 products
Then they'd bounce back up, go back to something in the first couple of rows that had interested them, click on it, and read the product page in detail. When we gave people infinite scroll, they were just scrolling for pages. It was almost like they were scrolling for hours, and we were shocked by how far they were scrolling and how much they were viewing.
What it seemed to mean was that once they had decided what to buy, or once they got bored of scrolling, they couldn't then find the thing they wanted. In these session recordings, we saw people flying back up the page, trying to find the thing they liked earlier on and not being able to find it because they had scrolled too far. So, while the A/B test itself wasn't successful, we learned a huge amount from it.
We learned a lot about how users interact with the product listing page, and coming out of that, we were able to come up with some new hypotheses based on the data we gathered in that post-test analysis, laying the foundations for more successful testing later. The first alternative hypothesis we came up with was that too much selection can lead to indecision. If we show people 5 things, they feel like they have to pick from those 5 things, and they'll be quite decisive. But if we show people 5,000 things, it's very hard to make a decision because they have too many things to compare and evaluate.
The second alternative hypothesis is that users who view more product detail pages spend more. On the original version, with a fixed number of products on each page, users were clicking through to quite a few product pages and looking at the products in detail. In the infinite scroll version, they saw a lot of product cards but scrolled past many of them without interacting or looking at the details of the product.
So both of these things take us to potential follow-up experiments. If we think that too much selection is the problem, we might test a smaller page size, only having 10 products on each page and forcing users to click a button to move to the next ten, making it much easier to compare and contrast those products, and adding additional comparative information on the listing page.
If we want to test the other hypothesis, we might do more to encourage users to click on the product cards, encourage them to view the full product page, or even take information away from the product page so that if they want to know more about the product, they’re forced to click through. Now we have concrete things we can test, backed by data to a much larger extent. Even though the test wasn’t successful, we’ve learned a lot from it. The key to post-test analysis is to really ask why.
Why did we see the result that we saw? Did enough users see and interact with the experiment? Did lots of users see it but not interact with it? Was that what we expected? Did behavior change at one particular stage of the funnel or did it change across the entire funnel? Did different segments of users behave differently? Did our best users love it while brand new users who don’t really know our product yet hate it? Was there a different experience between enterprise customers and small businesses? Asking all these questions can help us extract the maximum value and the maximum amount of learning from the experiment we’ve run.
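In practice, most of these questions reduce to grouping the experiment data in different ways. A minimal sketch, assuming a hypothetical per-user export with made-up column names:

```python
# Post-test analysis sketch over a hypothetical per-user export.
# Assumed columns: user_id, variant ("control"/"treatment"),
# segment ("new"/"returning"), saw_change (0/1), converted (0/1), revenue
import pandas as pd

df = pd.read_csv("experiment_results.csv")

# Did enough users actually see (and interact with) the change?
print(df.groupby("variant")["saw_change"].mean())

# Did different segments behave differently?
by_segment = df.groupby(["segment", "variant"]).agg(
    users=("user_id", "count"),
    conversion_rate=("converted", "mean"),
    revenue_per_user=("revenue", "mean"),
)
print(by_segment)
```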
The thing you hope you don’t find here is bugs in the tests. Far too many times, I’ve seen us get to this stage and look at the data and see that actually, no one saw the change. How did no one see the change? Well, there was some bug. So that’s why I think it is incredibly important to have a robust quality assurance process before you launch the test. It’s very easy to build a test and launch it without going through it in detail to make sure it’s all working exactly as it should. But if you don’t do that, you can waste a lot of time waiting for results to come in or need to relaunch the experiment. So please, make sure you’re doing that quality assurance upfront.
The final lesson or thing I’d like you to take away from today is that it’s really worth documenting what you find. You’ve invested a lot of time and energy into learning something or learning multiple things, hopefully. If you don’t document those things and those learnings and share them throughout the organization, you’ve wasted a lot of energy, and they may not persist over time, or other people in other parts of your organization might expend the same energy to learn the same thing. So make sure you have a process within your organization to document what you did, why you did it, and what you learned. We embed this in all our projects to make sure we document everything. Sometimes it may seem like it adds work at the beginning, but it’s more than worth it for the long-term benefit of maximizing the value of every experiment you’re running.
I suppose there are three real conclusions from all of this. The first one is to make sure the tests that you're running count. If you're going to run a test, make sure you've looked at the data you already have, and make sure you're confident that you're going to learn something from it.
The test doesn't always have to win, but you do always have to be learning. Second, and I've alluded to this already, make sure you're using the data you already have to validate and improve the hypothesis and the design of the test before you actually launch an A/B test. Lastly, even if an A/B test isn't successful, you still need to look at the data and make sure you're learning from it. Often, the most value you get from those tests is what you learn when you look at the data, watch the session recordings, and speak to some users who saw the different variants, so that you understand why you saw the result you did. So that's the last of my slides.
Jan:
Thank you so much, Stuart. That was really interesting. I think we can take away a number of things here. And, thanks to Laura, we do have a question from the audience, and it's the question that I least hoped to see because I'm not a statistics expert, but I have learned a couple of things.
Let's see if we can answer it. Laura is asking: do you have any insights on when to use Bayesian versus frequentist statistics? So it's about the statistical model. Do you want to go ahead with that, or do you want me to take it? Whatever you prefer.
Stuart:
I did a lot of statistics in my degree, but I haven’t done a huge amount since, at least not at that level of depth. So my general take is that frequentist statistics work really well when you hold certain assumptions true. One of the most important assumptions is that you only ever look at the test results once. There’s no peeking in a frequentist model. You have to wait until you have a fixed sample size upfront, and then you can look at the results. If you peek early and make decisions before reaching that sample size, the assumptions in the frequentist model break down, and there’s a risk of making poor decisions even if the frequentist calculation suggests statistical significance.
So, in many situations, people have moved toward Bayesian models because they allow for more flexibility in experimentation and can relax some of the strict rules associated with frequentist statistics.
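The peeking problem is easy to demonstrate with a small simulation. This is just an illustrative A/A simulation, not anything from the webinar or from VWO's engine: both variants have the same true conversion rate, yet stopping at the first "significant" peek triggers far more false positives than the nominal 5%.

```python
# A/A simulation of the "peeking" problem: there is no real difference between
# the variants, but checking after every batch and stopping at the first
# p < 0.05 inflates the false-positive rate well above the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_peeks, batch_size, true_cr = 2_000, 20, 500, 0.10

def p_value(conv_a, conv_b, n):
    """Two-sided two-proportion z-test with n users per arm."""
    p_a, p_b = conv_a / n, conv_b / n
    pooled = (conv_a + conv_b) / (2 * n)
    se = np.sqrt(2 * pooled * (1 - pooled) / n)
    return 1.0 if se == 0 else 2 * stats.norm.sf(abs(p_a - p_b) / se)

early_stops = 0
for _ in range(n_sims):
    conv_a = conv_b = n = 0
    for _ in range(n_peeks):
        conv_a += rng.binomial(batch_size, true_cr)
        conv_b += rng.binomial(batch_size, true_cr)
        n += batch_size
        if p_value(conv_a, conv_b, n) < 0.05:
            early_stops += 1
            break

print(f"False-positive rate with peeking: {early_stops / n_sims:.1%} (nominal 5%)")
```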
Jan:
I don't know. Stuart, as a matter of fact, I can tell that you have read much more about statistics than I have. What I did when I saw Laura's question was immediately go to Slack, and I Slacked my CTO, Ankit Jain.
And I said, Ankit, I've got someone in the audience here asking me this question. What would be a good answer? That was my way of dealing with it, and I think it was a perfect way, because he immediately came back to me and said, well, first, both approaches are correct. They work.
So it's not that one is outdated and the other one is just better. That's not the case. The second thing I took away from it was that it depends very much on which one you know better. As long as you know your methodology and your statistical model very well, you know how to apply the data it delivers to you.
And that's good, because you understand it and you can work with it responsibly. Right? There's a whole bunch of material out there around these models, and I think any extreme view that says one of them is totally wrong is misguided. Right?
I would say, okay, thank you very much. I've found a nice page that I've just shared with the audience here, with a couple of videos that explain the difference between the one and the other, and so on and so forth. But I like the bottom line that you also pointed out: it depends very much on knowing the model well and knowing how to use the data it provides. Thank you, Laura, very much for this really smart, challenging question.
I hope we've answered it there. We have a second question that comes, if I'm not totally wrong, from Dublin.
Nicola is asking us: how would you advise early-stage startups that might not have enough data to reach statistical significance in an A/B test? Oh, I love that one. That's a great one. Thank you, Nicola. You go first, Stuart. What's your answer?
Stuart:
Yeah. So I think there are numerous alternative methods for gaining insights from data beyond A/B testing. While A/B testing is a powerful tool, it’s not the only one. So, if you genuinely don’t have any data, perhaps during a pre-launch or a soft launch with a small user base, you can still learn a lot by tracking user behavior on your website. This data also serves as a source when you start A/B testing, allowing you to generate ideas for future A/B tests and gain an understanding of real user journeys, not just the ones you hope users will follow.
Additionally, it’s possible to be clever in how you design A/B tests to work with smaller sample sizes than initially anticipated. For example, through careful experimental design, you can reduce the necessary sample size.
You need a much smaller sample if your conversion rate is 50% than if your conversion rate is 2%. So pick something high up the funnel where the conversion rate is as close to 50% as possible; maybe you're testing how many people click on that call to action rather than how many people eventually sign up. Another thing to think about is that it will usually be easier to test for conversion rate than for revenue or some other numerical metric. And the final thing I'll say, and then I'll let Jan speak, is that the impact of the tests you're doing should often be bigger in an early-stage startup. Right?
If you haven’t spent much time developing your product yet and you’re a new business, you should be able to achieve a larger impact on the conversion rate with each test you run. This means you can work with a smaller sample size. You can also consider reducing the statistical power of the experiment. When aiming for substantial improvements, you may not require the same level of statistical power as you would for smaller, incremental changes.
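Stuart's point about baseline conversion rates is easy to see with a standard sample-size approximation. A rough sketch (normal approximation, 5% two-sided significance, 80% power, 10% relative lift; the numbers are illustrative):

```python
# Required users per variant to detect a 10% relative lift at different
# baseline conversion rates (normal approximation, alpha=0.05 two-sided,
# power=0.80). Illustrative only.
import math

def users_per_variant(baseline_cr, relative_lift=0.10, z_alpha=1.96, z_beta=0.84):
    target = baseline_cr * (1 + relative_lift)
    p_bar = (baseline_cr + target) / 2
    return math.ceil((z_alpha + z_beta) ** 2 * 2 * p_bar * (1 - p_bar)
                     / (target - baseline_cr) ** 2)

for cr in (0.02, 0.10, 0.30, 0.50):
    print(f"baseline {cr:>4.0%}: ~{users_per_variant(cr):,} users per variant")
```

Under these assumptions, a 2% baseline needs tens of thousands of users per variant, while a metric near 50%, such as clicks on a prominent call to action, needs only a couple of thousand, which is exactly why testing higher up the funnel helps smaller sites.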
Jan:
Yeah. Totally aligned with that, Stuart. I love the question because there are people out there, often without a statistics background, who say: it's not worth it, I can't do it because I don't have statistical significance.
That's bullshit. Imagine you're running a startup website that receives 100 visitors a month. Right? Last month, you converted 2 people, and you have an idea to simplify your packages, and so on.
You change a couple of things, and suddenly, a month later, you have 5 conversions. Is that statistically significant? No, it isn't.
But it's information. And as an entrepreneur, you're used to handling data, all the data you can get your hands on, to make a smart decision. Right? You're not statistically backed up at this moment, but you have information. For instance, if you do user research, you pick a group of 5 randomly chosen people matching your buyer personas, put them in a meeting room, and ask them questions.
Is that representative of all the users out there? No, it isn't. It's not statistically representative.
But still, you have a good conversation going on with the people in the room. They tell you, no, I think the other one is too aggressive, and everybody nods and says, yeah, I would also go with the blue one. So you have information there. Is it representative? No. But it helps you make a decision. And this is a journey. You start with this kind of thing. You start by asking your wife or your team: do you think we should do it this way or that way?
Right? Then you move on to user research, and then you pick some smart tests that you can actually run, and you even take decisions before you have reached statistical significance. I mean, it's tempting to keep doing it this way, but you should try to find the fastest way towards statistical significance; in a startup business, though, it's really hard to do. Except if you have a new round of financing and you can spend some money on a massive campaign, target it at the particular page where you want to run the experiment, and then, for a week or so, you have a big bunch of people on that page.
That’s also a good one. Yeah. Stuart. Thank you very much. It was really fun.
Like always, and I hope that we soon have the opportunity to work on something together. Oh, there is... yeah. Thanks. Thanks. Never mind, Nicola.
So next time in Dublin, I’m gonna invite you for a pizza as promised. Right? So, when you come to Spain, then we can have a chat about that. Thank you, everybody, in the audience. Thanks for staying with us.
I see that hardly anybody has dropped off, which is always very good, and we're happy about that. If you have any questions and would like to talk to either Stuart or myself, or both of us, we're really happy to look at your conversion rate optimization ideas or your conversion rate optimization program. We are happy to help you scale it up or build it up, and so on. I am a strong believer in one-on-one meetings, and we're ready to spend half an hour or longer with you to have a look at it and have a chat about it.
My email address is here, and it's very easy; it's the shortest one that you can ever get: jan@vwo.com. Send me your questions, I'll share them with Stuart, and we'll get in touch with you to discuss the matters that matter most to you. Stuart, thank you very much. Looking forward to seeing you again. And thanks to everybody, and see you soon.
Stuart:
Great to see you. Bye, everyone.
Jan:
Bye bye.