What to do About Inconclusive Experiments
Learn how to analyze tests and handle situations with inconclusive results.
Hello. Hi, Tien.
Welcome to Spotlight.
Hello, everyone. Thank you.
Awesome. Let me just quickly pull up the presentation, and I’ll jump off the stage and you can get started.
All right. So, hello, everyone, and thanks for having me. I’m excited to share my experience in CRO. My name is Tien, and I’m the head of digital marketing services at Mitego.
We are a global end-to-end digital agency that takes care of web and e-commerce solutions, and CRO as well. I have led many experimentation programs for our clients myself, and I’m here today to share one of the most frustrating experiences for me and for my clients, which is what to do about inconclusive experiments, right?
So you spend a lot of resources digging into the data, trying to find the problems and the hypotheses that can solve them, and doing the design mockups and the experiments. And then after a few weeks of running, you find that the experiments are inconclusive. What do you do about that? I’ve talked to a lot of people within the industry, and everyone shares that at least 10 to 20 percent of their experiments actually end up being inconclusive, sometimes even nearly half.
So that’s why I thought to share this topic, because it is a problem commonly faced by a lot of people. So what defines an inconclusive experiment? That would be when the performance of all variations is flat and shows no statistically significant results for the primary metrics, as well as the secondary metrics.
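As a rough illustration of that definition, here is a minimal sketch of how "no statistically significant result" could be checked for a conversion-rate metric. It assumes a pooled two-proportion z-test and a 0.05 significance threshold; neither is stated in the talk, and the function names and session counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def is_inconclusive(p_values, alpha=0.05):
    """Inconclusive in the talk's sense: no metric reaches significance."""
    return all(p >= alpha for p in p_values)


# Hypothetical counts: control A vs. variant B, 6,000 sessions each.
_, p_primary = two_proportion_z_test(300, 6000, 315, 6000)    # primary metric
_, p_secondary = two_proportion_z_test(900, 6000, 930, 6000)  # secondary metric
print(is_inconclusive([p_primary, p_secondary]))
```

In practice the testing tool usually reports significance itself; the sketch only makes the "flat on primary and secondary metrics" condition explicit.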
And I will assume that you’ve done the sample size calculation and allowed enough traffic, so I won’t be talking about the problem of an audience that is not big enough in this presentation. Here are three things I would recommend doing next. The first is to ensure that your hypothesis is strong and data-driven. That sounds obvious enough, but here I put the format of a strong hypothesis, in my experience.
So it would go something like: Because we observed A and/or feedback B, we believe that changing C for visitors D will make E happen. We’ll know this when we see F and obtain G. This contains all three key elements of a strong hypothesis, which are problems, solutions, and results, all of which are measurable as well.
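The template above can be captured as a small structure so that every experiment brief fills in all seven slots. This is only a sketch: the class name, field names, and example values are hypothetical, loosely based on the filter example used later in the talk.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    observed_data: str     # A: what the data showed (the problem)
    feedback: str          # B: what users told us (the problem)
    change: str            # C: the proposed solution
    audience: str          # D: which visitors see the change
    expected_outcome: str  # E: what we expect to happen
    signal: str            # F: the metric we will watch (the result)
    target: str            # G: the measurable threshold for success

    def render(self) -> str:
        return (
            f"Because we observed {self.observed_data} and/or feedback {self.feedback}, "
            f"we believe that changing {self.change} for visitors {self.audience} "
            f"will make {self.expected_outcome} happen. "
            f"We'll know this when we see {self.signal} and obtain {self.target}."
        )

    def missing_parts(self) -> list[str]:
        """Names of slots left blank, i.e. parts of the hypothesis not yet grounded."""
        return [name for name, value in vars(self).items() if not value.strip()]


# Hypothetical values for illustration only.
hypothesis = Hypothesis(
    observed_data="filter usage below the industry benchmark",
    feedback="that products are hard to narrow down",
    change="the position and layout of the filter",
    audience="on mobile",
    expected_outcome="more filter usage",
    signal="a higher filter usage rate",
    target="a statistically significant uplift",
)
print(hypothesis.render())
```

A non-empty `missing_parts()` result is a quick signal that the problem, solution, or result element still rests on an assumption.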
So for the first element, which is the problem, you should make sure that the problem exists, that it matters to most users, and that it’s backed by data. You could use qualitative data such as user surveys, user recordings, or any form of user feedback in which customers are telling you that these are the problems they are facing on your site.
Or you could use quantitative data such as Google Analytics statistics, Hotjar, or any similar data tracking system that you use for your site to validate the problems. If you haven’t done so, the problem may not exist, because it’s only your assumption, or it may only occur for a small percentage of your user base. And that causes the proposed solution to not have any impact; therefore, the experiment tends to be inconclusive.
So that’s the very first step that you should do. The next one would be to make sure that the solution is the right one to solve the identified problem. So say that your problem is that the filter usage on the products listing page is lower than the industry benchmark.
Then your solution should be around things like the location of the filter (where it is on the page), the display of the filter (so the UX and the UI of the filter), and the actual filter criteria that you’re displaying. But if your solution is to remove the hero banner or change the sorting order of products on the page, for example, those have a very weak connection to the identified problem.
Therefore, they may not have any impact. You should also check whether the low filter usage is only happening on mobile or on desktop as well, in order to propose the right solution. The third element would be the results. Have you used the right primary metric to measure the results? So in that filter experiment example that I just mentioned, if you use add to cart as the primary metric, you will likely get an inconclusive result, even though every retailer wants to improve e-commerce metrics like add to cart or purchase rate.
In this particular case, your primary metric should be around the filter usage rate. And oftentimes you might have to prove the importance of improving a micro conversion, such as filter usage rate. So this ties back to the first element, which is the problem, right? Like, how do you make sure that this is the right problem you set out to solve?
Then you can do things like checking the stats to see if people who use the filter on the products listing page are more likely to click on products or more likely to convert, for example. That way you can see the connection between the micro conversion of filter usage rate and the actual e-commerce conversion rate that you set out to improve.
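That check on the stats could look something like the sketch below, which compares the conversion rate of sessions that used the filter against sessions that did not. The record layout (`used_filter`, `converted`) and the sample data are hypothetical; a real analytics export will be shaped differently.

```python
def conversion_rate_by_filter_usage(sessions):
    """Return {True: rate, False: rate}, keyed by whether the session used the filter."""
    totals = {True: 0, False: 0}
    conversions = {True: 0, False: 0}
    for session in sessions:
        used = bool(session["used_filter"])
        totals[used] += 1
        conversions[used] += int(session["converted"])
    # Guard against an empty group to avoid dividing by zero.
    return {
        used: (conversions[used] / totals[used] if totals[used] else 0.0)
        for used in totals
    }


# Hypothetical sessions: 4 used the filter (2 converted), 6 did not (1 converted).
sessions = (
    [{"used_filter": True, "converted": True}] * 2
    + [{"used_filter": True, "converted": False}] * 2
    + [{"used_filter": False, "converted": True}] * 1
    + [{"used_filter": False, "converted": False}] * 5
)
rates = conversion_rate_by_filter_usage(sessions)
print(rates)
```

A higher rate among filter users shows a correlation, not proof that the filter causes conversions, but it is the kind of evidence that supports using a micro conversion as the primary metric.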
So the second thing I would recommend doing is to examine the test results. A lot of people would just look at the test results to see which variant wins, and then at all the metrics and such. But I would recommend going a little bit deeper into the data by segmenting the audience, to see if the test result differs when you segment the audience by criteria like device, location, traffic source, time of day, new versus returning user, etc.
In some cases you may find that the results are different depending on the segment that you’re looking at. That itself is an insight that you could apply to your site. In other cases, you may find that you want to run a follow-up experiment, maybe a personalized experiment, based on a certain device or on certain search queries that people use to get to your site, for example.
The next thing you could do is to check all of the secondary metrics as well. So for example, your test set out to improve product clicks. However, when you look into the other monitoring metrics, you find that the test didn’t make an impact on product clicks, but it was able to increase user engagement: it makes people stay on your page a little bit longer, or it makes people bounce less, or it makes people click through to other pages, so that it increases the number of pages per session. That itself is an insight that you could use as well. So be sure to check the other monitoring metrics beyond just your primary metrics.
You can also assess whether there was any potential bias factor during the time your experiment was running: whether there were any seasonal campaigns or promotions running, or any development changes that may have impacted the site performance or speed, which could have skewed the test results.
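The segmentation step described above can be sketched as a per-segment breakdown of the same A/B result. The row layout, the variant labels "A" and "B", and the numbers are all hypothetical; the example is built so that the overall result looks flat while the segments move in opposite directions.

```python
from collections import defaultdict


def lift_by_segment(rows, segment_key):
    """Aggregate sessions and conversions per segment and variant, then compare B to A."""
    # counts[segment][variant] = [sessions, conversions]
    counts = defaultdict(lambda: {"A": [0, 0], "B": [0, 0]})
    for row in rows:
        cell = counts[row[segment_key]][row["variant"]]
        cell[0] += row["sessions"]
        cell[1] += row["conversions"]

    report = {}
    for segment, by_variant in counts.items():
        sessions_a, conv_a = by_variant["A"]
        sessions_b, conv_b = by_variant["B"]
        if sessions_a == 0 or sessions_b == 0 or conv_a == 0:
            continue  # not enough data to compute a relative lift
        rate_a = conv_a / sessions_a
        rate_b = conv_b / sessions_b
        report[segment] = {
            "rate_a": rate_a,
            "rate_b": rate_b,
            "relative_lift": (rate_b - rate_a) / rate_a,
        }
    return report


# Hypothetical export, already aggregated by device and variant.
rows = [
    {"device": "mobile", "variant": "A", "sessions": 4000, "conversions": 120},
    {"device": "mobile", "variant": "B", "sessions": 4000, "conversions": 156},
    {"device": "desktop", "variant": "A", "sessions": 2000, "conversions": 100},
    {"device": "desktop", "variant": "B", "sessions": 2000, "conversions": 64},
]
report = lift_by_segment(rows, "device")
for segment, stats in report.items():
    print(segment, round(stats["relative_lift"], 3))
```

Each segment has less traffic than the whole test, and slicing many ways increases the chance of a spurious difference, so a segment-level finding is best treated as input for a follow-up or personalized experiment rather than as a final result.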
Now that you’ve got all the data, it’s time to decide what the next step is. If you find that the hypothesis was not strong enough or there were bias factors, obviously you should rerun the test. One tip I have is that you could consider adding more variants to the test or making the differences between the existing variants more recognizable.
So one example I had was when we ran an experiment for a client. We tested whether adding more product information and CTAs above the fold on the mobile layout of a product detail page would help increase the conversion rate. We reached about 12,000 sessions, but the test was still inconclusive. So we decided to rerun the test.
We tweaked variant B to add more margin between the elements so that it was readable, because we had added a lot of product description information to the small space of the hero layout on the product detail page. And then we added another variant, variant C, with shorter product descriptions visualized with icons, and we still kept the CTA above the fold. And variant C then won.
Scenario two that you could run into is that examining the test results may give you sufficient insights already. So like I mentioned earlier, sometimes segmenting the audience could give you sufficient insight. Or, in an example that we ran for one of our clients, we did an A/B experiment on the air conditioner products listing page.
We added highlighted features and technologies of the products above the products listing section on this page to see if that would help increase product clicks. But the test result was also inconclusive, because product clicks, as the primary metric, did not show any statistically significant difference.
However, we drilled down into the data and looked at other metrics, such as the number of transactions. We hadn’t thought that adding something on the products listing page, at the very top of the funnel, would impact the number of transactions, but when we looked deeper, the result was a significant uplift. People were not clicking on the products on this page because there was all this other content before the product listing section.
So they were clicking on these new content pages, but they ended up buying more of those products after checking out the product features and things like that. So if you dig deeper into the data itself, you may find something interesting already. And in the case that you can’t figure out why the test was inconclusive, then it’s time to move on to other experiment ideas or try personalized experiments.
This is something that, in our experience with our clients, seems to be a lot more productive compared to just general experiments. And I put here the popular LIFT model, a framework to help you figure out what effects the changes in your experiment should have.
It helps you come up with new ideas. It can also help with making sure that you have a variety of ideas and that you touch on different aspects of e-commerce conversion drivers equally. So not always just using urgency as a driver for e-commerce conversion uplift, but also touching on things like value proposition, making sure that the content and the CTAs that you have on the page are really boosting the value proposition of your products and services, or things like distraction, removing distractions on the page for the users, or any barriers or anxiety that could cause the user to not purchase.
So those are some of the ideas that I would like to share. Obviously, each case would be different. So here’s my contact info, and I’m happy to chat further. If you have a particular case of an inconclusive experiment and you don’t know what to do about it, I’d be happy to help. Thank you very much.
Awesome. Thank you so much for an insightful presentation. Let me switch off this tab. Inconclusive tests can definitely be a bit of a headache for experimentation teams. I really loved the steps and the examples, Tien. So we have five minutes, guys, if you have any questions.
I think there’s a comment here around the hypothesis template from Amanda.
So, Amanda, is there any particular question on this slide that you requested, or is there an observation that you want to share? We have a few minutes. We can bring you on stage if you request to join. I see that you already have a raised request. Let me add you to the stage. Hi, Amanda.
Welcome to Spotlight. Hi. Thank you. Sorry, my camera is working. But yeah, I sent it. I guess we do a lot of testing on things that we kind of just know are best practices. And I guess the question would be around forming a hypothesis when, obviously, we have an idea of how something or the metrics might change based on best practices that we think might work.
But, yeah, I guess just how would you format something around that? Something has worked in the past and you want to say, okay, we know it might work, but it also might not.
Yeah. So, I mean, a lot of times we could still use assumptions, right? But we try our best to remove that assumption bit from our work.
So the problem itself has got to be backed up by the data: either something from customer feedback, or something you see in Google Analytics. You know, the drop-off here is weird. It’s different. It’s lower than last year. It’s lower than the industry benchmark, or something like that.
But when it comes to the solution, it could be based on best practices as well, and sometimes you do have to make a guess of, okay, this would make a 10 percent impact on the conversion rate, for example, and you know it’s guesswork. With practice, you get better at accurately forecasting how much it would impact, or at least at setting a number that is reasonable enough, because it makes an impact that makes the ROI work compared to all the effort going into experimentation and conversion rate optimization. But I would say the problem should always be a data-backed problem, while the solution can take multiple forms.
Okay, sweet. Yeah. My colleague Gabriella and I, we’re loving this format. So yeah, thank you. I appreciate that. Okay. And thank you so much for creating this lovely presentation, insightful and fun as well. Thank you so much.
Experimentation Consultant, Albert Heijn