Webinar

Unlocking Information from Inconclusive A/B Test Results

Duration - 55 minutes
Speakers
Kenya Davis

Sr. Manager, Decision Science

Vipul Bansal

Group Marketing Manager

Key Takeaways

  • Re-evaluate the pre-analysis phase of your testing and roadmap to ensure you're basing your results on real experiences, not phantom ones.
  • Utilize tools like A/B testing calculators to optimize your business and understand your baseline.
  • Review past test results and filter them through different processes to uncover new insights.
  • Implement a structured approach to testing, which can reduce the probability of running into inconclusive tests.
  • Prioritize quality over quantity in testing. Running a lot of tests isn't necessarily the best approach; it's more important to test the right things.

Summary of the session

The webinar, hosted by Vipul from VWO, featured Kenya Davis, the Senior Manager of Decision Science at Evolytics, who shared her expertise on interpreting inconclusive results in A/B testing. Davis emphasized the importance of a structured approach to testing, including a thorough pre-analysis phase, and the balance between quality and quantity in testing.

She also introduced a 3-step unpacking process for inconclusive results and encouraged participants to revisit past test results using these methods. Vipul praised Davis's clarity of thought and structured approach to testing, sparking a discussion on the importance of testing the right things over testing more. Davis also offered her assistance to anyone needing help with testing and optimization, providing links to useful resources such as A/B testing calculators.

Webinar Video

Webinar Deck

Top questions asked by the audience

  • What are your thoughts on structured testing?

    - by Vipul
    In regard to structured testing, I kind of chalk it up more to structured experimentation programs and less to the test itself, because if you have those right checks and balances along the way, it honestly sets up every test that deploys for success in terms of being conclusive. And you can even go into it knowing we may or may not get a result based on how we have our team set up. And to give you an example, I've worked with past clients where, you know, there's a pre-analysis done, and the pre-analysis includes checking tagging, checking the flow of the customers, and checking outside data that's, you know, not just your internal company's opinions and data. That alone has helped to kind of shave off tests that work and don't work, or tests that are biased, or tests that don't really answer that question. And although that seems like a tedious process, and that is part of that structuredness that you're speaking to, it's something that really allows everyone to feel more empowered. And although it does kind of cut that ability to run a lot of tests, a lot of tests aren't necessarily the best thing to do. Quality versus quantity is always gonna be a winner, I think, for anyone that's in testing.
  • If we have a test, and in this test variant B was winning but not statistically significant for 2 weeks, for example at 80% confidence, we usually consider this as inconclusive. So in such a case, what would you suggest we do?

    - by CUDA
    In that case, really there's a lot to look at. If normally conclusive results are around 90 to 95 for that page and that same KPI, then I wouldn't necessarily use those results if it's just stopping at 80%. I would look back at what variants I'm looking at and whether there was distinguishability between them. And I'd honestly jump straight to step 3. I know I said don't skip steps, but if you know the setup, and you know that the KPI is always the same and the page and location are the same, there's clearly something else happening there. Now, if it's like you have 2 tests prior that reached 95 or 99, and this one has 80, you may altogether wanna look at what your level of confidence should be.
  • I'm currently running a multivariate test and my main KPI is conversion rate. Let's say that I've followed the three steps, everything looks okay, and the full test has run through the COVID season from March to May. My results still lack statistical confidence, sitting between 83 and 85%, but they haven't changed much during the last weeks, and we haven't identified patterns or commonalities in the winning variations. Should I stop testing or declare a winner now? Or should I, you know, keep running the test?

    - by Gerardo
    So there are a few ways to look at this one. I'm pretty sure I've got it: you're asking, once you get through the three steps, what should you do at that point? I would say that, as I mentioned before on the last one, if your confidence is usually peaking at the higher end, around 90-95, break down that test. So if it's not giving you the answers, then maybe somewhere within that funnel there's some type of variation happening along the way that's causing some type of uncertainty, with the calculation of it staying around 83 to 85. So I wouldn't necessarily say turn it off. If it's possible to run it concurrently with a much more specific test that sits slightly under the KPI of conversion rate, then that will give you a little insight between the groups as to at what point it is starting to really vary.
  • What is the most common reason for an inconclusive test, given the examples that you've listed?

    - by Carrie Wilkins
    I would say the most common that I have seen, based on the ones that I've listed, has been that the easiest and the worst thing is setting it up at the wrong point. So, firing it at the wrong time. As we went through that first step, the person set up the test to fire at login versus sign up versus the site level. What that means is that all the customers were in a client-side version of testing, that's the word I'm looking for. Sorry. And if we look at the way that this test should have been set up, because they're getting 2 completely different sites, it should have been server-side. And I know that's the argument of client versus server side, but really it's a logical thing. Like, you wanna have both. There isn't an either-or type of statement with it. It's more of what gives you the ability to measure even site conversion at the end of those. If you have one group whose whole website is consistent, their experience is consistent, there are sprinkled promotions in and out, there's sprinkled this and that, but they're all tied to one unique program, then maybe those people need a completely different website experience than the ones that are not seeing any of that. And I've seen that it basically boils down to how the test should be set up, and I've seen that run for a week of arguments upon arguments, almost being settled by just setting it up in different ways, that being an experiment in itself.

Transcription

Disclaimer- Please be aware that the content below is computer-generated, so kindly disregard any potential errors or shortcomings.

Vipul from VWO: Hi everyone. Thank you so much for joining in for this webinar. I hope you and your family are safe inside your respective homes, and I wish you all good health. My name is Vipul, and I am the marketing manager at VWO. I'll be your moderator for today.

For those who are hearing about VWO for the first time, VWO helps you identify leaks in your conversion funnel and provides tools to fix those leaks and keep your revenue growing. So before I introduce our guest for tonight's session, let me inform you all that today is our 10th anniversary. Wingify, the parent company of VWO, has turned 10 today. And that means VWO too has been serving online businesses for 10 years now. So it's a happy birthday for us all, and we are all excited today.

We all had fun today, although remotely, but we had great fun. This all has been possible because of your consistent support. And I request you all to keep supporting us as we move further in the journey of helping businesses build better online experiences. With that said, it’s my pleasure to introduce our guest speaker for tonight Kenya Davis.

She is the senior manager of decision science at Evolytics. She was previously leading conversion optimization programs at Lowe's, a Fortune 50 company. Hi, Kenya. Thank you so much for joining us today.

 

Kenya Davis:

I’m cool. Thank you so much. And happy 10th birthday.

 

Vipul:

Thank you so much. So, before I hand over the mic to Kenya and start with the presentation, I just request every one of you to ask your questions during the course of this presentation using the questions panel in GoToWebinar. And we'll try to take all your questions towards the end of the webinar. I have already seen Kenya's presentation deck and there are a lot of insights, so I know you'll be curious about a lot of things.

So do post questions whenever you have one. I think with that Kenya, let’s begin.

 

Kenya:

Alrighty. Thank you so much. And to everyone, thank you for attending, and welcome to my webinar on unpacking information from inconclusive results. I would like to first start with a little bit about myself, so we will dive into that first. I have a background in Applied Physics, with a concentration in astrophysics.

So I come from Discovery Place, Merced Line, Lowe's, and I'm now at Evolytics. As a senior manager of decision science, that basically means that I develop creative strategies for testing and I dig into insights for optimization. So, it's a breadth of knowledge in that field, but it's very rewarding.

So I’m super excited to dive into how I’ve developed this step-by-step on unlocking inconclusive results. So, let’s dig into what you can expect to get from this presentation. First, I would say that I want you to be able to identify and label the types of inconclusive results, along with that, I want you to take away three steps and be able to apply those to some sort of checklist for yourself and your business and learn how to pivot the narrative of your roadmap. And lastly, how to relay the next steps to your leaders and your peers. So, why does any business test?

Sometimes we, the business make big investments under the impression that we really know our customers. With that knowledge, we tend to veer towards finding cost-effective ways of delivering high-performance, experiences with optimized business processes, product enhancements, and whatnot. So I’ll call that reduced cost. Next, I would say taking a risk is one of the higher rewards among businesses. So if you’re often that company that takes a risk, you’re expecting a lot from it as well, but you’re investing a lot as well.

For revenue, I’d say that’s probably arguably the top reason that we test. However, shifting revenue-driving factors to different channels can serve as a problematic task, so that can be resolved through testing. Customer satisfaction is probably the staple among all of these. And it can be what makes and breaks the company. And lastly, gaining insights is definitely something that’s a pillar within your post analyses, regardless of your testing reasons.

So now that we know, you know, why we test, let's get into popular ways to test. As many of you know, you probably have seen user research, observation tests, canary tests, and A/B and multivariate tests. Just to level set, though, let's briefly run through their meanings. Sometimes we need face-to-face or direct feedback from our customers.

So this can be done in many ways with the end goal being that you’ve walked away with one of those golden eggs or a few golden eggs versus a variety basket of, hard-boiled, colorful, or whatnot, however you wanna describe your features. Next, we can sometimes face deadlines or simply not enough foot traffic, to gain confident conclusions. So, the best way to look at that type of information is to run an observation test. Tracking and documenting these types of trends can be a very strong reward in the end. And it’s better to do that rather than run a weak stand-alone test.

Probably the favorite amongst engineers, I would argue, is the canary test. It's never a good idea to just dump code into production without extensive testing, so this type of performance testing can be one of the best forms in conjunction with many of the other types of testing. And then lastly, one of my favorites is A/B testing and multivariate testing. This is what we will focus on today. And this form of testing has skyrocketed to the top of digital efforts. As many of you know, it has boosted these experimentation teams.

I'm sure you've heard the term CRO and whatnot. These types of testing methods are still something that's being discovered. So it's quite amazing to see how it's impacting the marketing and product management worlds. With each of these, basically, in a nutshell, you can always expect to get inconclusive results regardless of which form of testing you're choosing. So in order for us to unpack these results, we'll need to look at a case study for today.

So this case study, I will let you know, is called the Snuggly Bear Company. It is not a real company, but this is a real case study with real facts. So we'll start with identifying our company, and then we'll define the 2020 goal. Next, we'll call out the observations we know about our customers, and we'll run through our roadmap and narrow it down to about 60 days. From that 60-day scope, we'll identify our first A/B test series, and then we'll run a test. Lastly, we'll actually unpack it in multiple ways, and we'll jump into the results and have a guess at why it could have come back inconclusive. So, let's run through some company specs. The Snuggly Bear Company has about 50k products, and that includes stuffed animals, customized stuffed animals, key chains, and some clothing materials for your bears. The top 5 selling categories are bears for tots, bears for bereavement, bears for love, big bears for big needs, and birthday bears.

The average order value is around $86.2 and some of the highest add-on items that we’ve noticed are special notes, gift wrapping, and clothing accessories. And lastly, the average purchase per year is around 4 orders per user. So now that we’ve identified our company, let’s go ahead and do some observations. So, what do we know about our customers? Clearly, we know quite a bit.

So, in conclusion from all of these insights, our customers return to us on multiple special occasions each year and typically purchase 1 to a few bears for a child or for a few children at a given time. However, there doesn't seem to be value in signing in, since guest checkout is exercised more than signed-in users. So, now that we've made this conclusion, let's think of a practical goal. Our company would like to focus on the customer loyalty program, and they wanna launch this across the course of the 2020 year. We are basically wanting to go from this basic experience.

To this much stronger experience. But to do so, it will also take a lot of changes and, as you may have guessed, testing, and lots of different forms of testing, to get to this look and feel. In addition to that, there may be new KPIs that we'll discover along the way. So our current customer account has a pretty basic look and feel. Our customer account for users is merely a place to, like, add basic information that lives in a very, very large table.

So let me show you the experience. It might be a little easier. So a customer comes to the website, they sign in, and it just says hi. There's nothing else, and then the customer can shop. So, once they log in, say they drop down the carets, then they can click my account, my bears, my list, and help.

The only thing that they're able to do within these different fields is make alterations to the basic profile, look at the bears that they've purchased in the past, create and edit the lists, and indicate whether they want to bump or delete the lists. It doesn't show them whether any of these bears are in stock anymore or not, and it's kind of a bad look and feel in terms of navigation. So, the last thing that they can do is file a claim or talk to an agent. And that's about it. So, looking at where we are now, where do we want to be? The new perks program is something that involves quite a few additions, and with all of these additions, obviously, it's not a roll-it-all-out-at-once type of feature, but we wanna offer the customers, you know, a step-by-step introduction into all of this.

So it’s important for us to outline the overall goal and address the areas that we know, need to be tested. So each of these areas could cause cataclysmic road map meltdowns. But still, I mean, we must dream. So for our parts program, we wanna offer a personalized look and feel to the customers, for the customers.

And eventually, with that, we’ll have them basically shopping more with us. And the more that they shop, then the more they’ll save. So, to get there, the roadmap as briefly mentioned before is the key. So, we’ll look at tiers 1, 2, 3, eventually. So, now that we have our 2020 goal defined, and you’ve been able to see what the overall goal is, let’s go ahead and look at our roadmap. 

Beneath these goals, it will help to assign the types of testing that you believe can answer the appropriate questions. So, the current customer account may help to eliminate a few defects before diving into testing, and creating a baseline performance dashboard, or using something that can allow you to compare as you move along your roadmap, helps too. And to do so, it's recommended that you do this as observation and/or canary tests, take those baselines, and use them to compare as we go along. So, if we look at our current customer account, it may help to eliminate those defects as we mentioned, but also we'll want to parse out how that experience can be done.

So we listed out quite a few things that we want in our end goal, but to get there, as I mentioned before, A/B testing in a series may be the key. So, we'll add each step and then we'll add what type of testing. To keep our focus on just the 60 days as mentioned, we're just gonna use this piece of the roadmap. And this branch may be 60 days or more, but of course, like I mentioned, you're going step by step and you're taking into account the points at which you can see some type of major shift. So, now that the roadmap is actually created, let's look at the 1st test series that we wanna focus on.

The roadmap is basically created by the grouping of these questions, but when we're looking at the test series, it's important to parse those test series questions into hypotheses as well. So, for the first test series, we'll focus on revealing the new perks to a group of customers. Here's what we'll include in the first test. We have the current, which is our control, and we'll call that A, and then for B, this is our grouping of perks that we are gonna move forward with for our B variant.

So that will be free shipping, 5% off special promotions, shipment tracking, and recommendation carousels. So let's see how it will look for the customer when they're entering into these tests. The A group will get the normal experience on the website, and our B group, as you can tell, will get this new banner that calls out the new perks program with a call to action for them to sign up. But you won't just see this call to action at this point. If someone does not see this banner, whenever they click the login and/or sign up, it will also trigger what we'll call our pop-up experience.

So, whether you click on the banner or you click log in/sign up in the top left corner, you'll get this pop-up action that's calling out these points that you'll get whenever you sign up for the new perks program. And then our buttons below (sign up, log in, upgrade, or no thanks) will give us some type of insight into the intention once the customer does get here and has seen what the upgrades entail. So the current account users will be measured based on upgrade rate, with a few secondary KPIs like site conversion, average order value, revenue, and time spent on-site, and equally, the new users will be tracked on sign-up click-through rate, site conversion, AOV, revenue, time spent, etcetera. So, now that we have the first test series altogether, let's look at our first test. This is exciting.

I’m sure all of you have been here before. You’re running your first test and you believe everything is set up right. You have your KPI. You have your goals. It’s clear-cut.

The experience has been QAed and we are just ready to go. Alright, so the test is in flight. We have our variant, the new perks program, running. We have our control group running, and we are excited and ready to go. We have some data filtering through.

However, we’re nearing completion and we noticed that the conversions are drastically different from the control experience. In a negative way. So they’re even worse than they were before. So then we do a little bit of digging around and we look at our control and we realize that it’s dropped more than 25% on average month over month. So, what on earth could be happening?

So, for context, this test was launched on February 25th and we’re gonna run this all the way through to April. So, although it may seem like the obvious thing is to chalk it up to COVID, probably the best thing to do is to be patient. Let’s think about this. Let’s unpack it in stages. So now we are at the part of unpacking our test results.

I will say that it’s better to make sure you’re ruling out everything before generally, saying why something is inconclusive because there are unforeseen things that you can run into, code changes, global changes, new customers, all of that in combination could be the reason as well. So, this 3-step process will be something that you can turn into an ongoing checklist even if it’s not for your inconclusive results. So as I mentioned, before we chalk up the inconclusive results to COVID, even with the impact dates lining up, it’s still important to rule out everything. So, Let’s quickly look at what we mean by inconclusive results. There could have been testing load errors, disproportionate samples, site outages, or just simply confidence, just wasn’t met.

So, we’ll want to really make sure that we’re attributing the issue appropriately. So as you all have been waiting for, let’s get into the step-by-step. As you all can see, there are 3 portions that we’ll want to look at. You’ll want to check yourself, start unpacking the data, and then pull out your tools. So, if I start with the check yourself, what does this really mean?

Check yourself. You can do this in any order as you please. This can be done by either checking your setup, validating your original parameters, or checking for other test collisions if you're not using swim lanes. So I don't wanna simply put these points here and then have you try to interpret them. So I'd like to align these to some examples from our bear company.

So, what exactly would this look like? Let's say that when we set up the test, it was set up here. Meaning, the customer was exposed to the test only upon clicking log in or sign up. This would be done basically with the test being set up at this layer of entry. So the customers have already entered the site, and then they are assigned to a test, or not to a test, via clicking log in or sign up. What we would have wanted was for this test to fire before the customers entered the site or at the point of entry.

And we’ll call that our A site versus our B site. So, the A site would have no banner, and no pop-up, and the B would have exposure of either clicking the sign-up and getting the pop-up or the login and sign-up and getting the pop-up. But no matter where they entered into the company site, this global banner would always show and it’ll always give them the opportunity to click, either the banner or the login or sign-up. So now that we’ve rolled that out, let’s move on to step 2, and let’s start packing. Now, I will say that a lot of times you can get hung up on the step 1.

If there are setup issues, depending on what platform you're using to run your test, there may need to be more investigation into how you should be setting those up. So if you're running a test for the first time, I would recommend starting with just an A/A test so you can eliminate that part of the process, or running into that type of problem. So, we'll start unpacking. We can look at a few points here, and as I said, it doesn't have to be in any particular order, but these actually can run concurrently in your investigation.
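
To make that A/A recommendation concrete, here is a minimal sketch (not from the webinar; it assumes Python with SciPy and made-up per-variant counts exported from your testing platform) of the kind of sanity check an A/A run lets you do: flag a sample ratio mismatch and confirm the two identical arms convert alike.

    # Minimal A/A sanity check: is traffic split as configured, and do the
    # two identical arms convert alike? Counts below are illustrative only.
    from scipy import stats

    visitors_a, visitors_b = 10240, 9870        # visitors assigned to each arm
    conversions_a, conversions_b = 512, 498     # primary-KPI conversions

    # 1) Sample ratio mismatch: chi-square goodness-of-fit against a 50/50 split.
    total = visitors_a + visitors_b
    srm_stat, srm_p = stats.chisquare([visitors_a, visitors_b], f_exp=[total / 2, total / 2])
    print(f"SRM p-value: {srm_p:.4f} (a very small value points to a setup or targeting problem)")

    # 2) Conversion parity: identical arms should not differ significantly.
    table = [[conversions_a, visitors_a - conversions_a],
             [conversions_b, visitors_b - conversions_b]]
    chi2, p, _, _ = stats.chi2_contingency(table)
    print(f"A/A conversion-difference p-value: {p:.4f} (should usually sit well above 0.05)")

If either check fails on an A/A run, a later A/B result on the same setup is suspect before you ever look at the variant.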

So data first would be: did the analytics fire? Identify is: how are the high- and low-converting segments behaving? Next, we can look at the areas of the highest and lowest converting traffic, or just the highest and lowest areas that have traffic flowing through the site. That could be channels, paid channels, or whatnot.

Site health. That is imperative to a healthy testing environment. If you're having site outages, that can greatly impact the loading capability of your test, or the experience itself can look a little jumbled. And then lastly, the goal: was your goal too large or was it too narrow? Did it match the hypothesis? Was the KPI matching?

That type of analysis isn’t necessarily changing what the KPI is or changing what the goal is. It’s assessing whether the test really answered that question. So like I did before, let’s run through some quick examples. So if we’re looking at data first, a good example of that with the bears would be the unique identifier for the tests that were not implemented by the developer. So users were qualifying from both the control and the variant sides of the test, which is not a good thing.

If we look at the identify section, then there were biases in the segments or users who already had an active account. Let's say that they automatically went into the new perks roster and did not actually have to click the upgrade button. If we're looking at location, Northern US states convert higher, and we know that they also were hit harder by COVID-19. If we look at site health, let's say that the site was down from 3 AM to 8 AM on one of the peak traffic days, or maybe for an entire week. It'd be very relevant to know this and understand why you may not have met your sample.

And then lastly, as we mentioned, the goal: the metric for the test was set to fire for site conversion instead of customer sign-up and upgrade rate. So now that we're moving on to the 3rd step, before I reveal these, I will say that this level of unpacking is very valuable, but make sure that if you're spending this much time unpacking inconclusive results, there's truly something that will drive your roadmap into a huge fork in the road. And by unpacking this information, it means that you're not going in blind, but you're going in at least with trends or new information that allows you to tell that story and that narrative of your customer's behavior better. So going into our step 3, this is pulling out the tools. We all have a tool belt for A/B testing, and that's, you know, the platform that you're using, your team, the expertise of your team, the post and pre analyses, the monitoring during the test; all of these things are encompassing your tools, but there are also these other tools that help you to unpack things that are not visually apparent.

So that could be like a Chi-Square tool, a Z score, and really digging into that. Checking your confidence calculator, looking at multiple confidence calculators, validating the logic of the KPI chosen, so does it even define the goal? And then lastly, I would take this, very cautiously. So, external and third party data correlation analysis will have to really be from credible sources. There’s a lot of media that’s churning through multiple channels.

So if you’re going to pull that information to compare to your own data, your own test, be sure that you’re pulling in something that is very very credible. So, I would like to go into examples of these as well. For statistical tools, if using Chi-Square, you can reveal categorical relationships. That’s pretty much the strongest thing about it. There’s also flexibility in adjusting confidence and power levels along with examining your Z score.

If you’re looking at statistics and wanting to compare your models, I wouldn’t necessarily say questioning the confidence of the models is the best thing to do, but as a much larger scope, maybe having that conversation is good to have, at a scale for the entire testing program. And I would like to close that by saying trust your data teams and trust that they can empower you with that knowledge. And the tools needed to tell that story of, you know, which confidence models should be used. If your KPI is wrong in terms of how it’s defined when it fires you as the owner of analytical power should really think of redefining it.

You’ll be setting up the business and everybody else will fall in line. For success. So be sure that you’re taking that responsibility and ownership of what your KPI is and how you are telling the story of your customers. And lastly, our most cautious area when looking at external data from world events, much like COVID, you should be very careful of how it is integrated into the story. This is why it’s important to run out all of the reasons for your tests being inconclusive.

Also, your test could be inconclusive for multiple reasons, as we mentioned before, so it could just be a combination. It could be a setup error and external data. It could be that your customers are not able to access a certain portion of your site, which would have greatly affected your ability to measure the sign-up rate, especially if it was a high-traffic page, along with external data and whatnot. So now that we understand how to unpack the data, and have done so, should we pivot the narrative? Is it too soon?

Should it involve more people? There's a lot to really understand here. So before we pivot, let's look into why we would even pivot. Here are some examples of why the Snuggly Bear Company would or should pivot. One test is not enough to pivot, but if the behavior and needs of your customers have changed so dramatically that the business is now making company-wide changes, then it's a fair chance to alter your roadmap.

So I’ll go directly into this example. The Bear’s new account test was the 1st of its series and a great way to start identifying what’s valuable for our customers. Real situations though we’re hitting this company, like the bears arriving well after the patient has passed away. Or, they weren’t reaching the hospitals in time, or the hospital received them and they couldn’t actually give them to the patients because of social distancing. And although this is a very sad example, it is a very true one and very relevant.

And no matter how big or small your business is, we’ve noticed that they have all been impacted. So if I look at it, another reason to pivot in the form of the data or the KPI is not sophisticated enough. This could be found honestly during the step 2 process which reveals the lack of proper tagging or validations. It could have been caught there that the action that’s firing this KPI.

It’s not the actual action that you’re observing the original roadmap, it’s going off of the insights of what we have found for our customers. So sometimes using something like the Chi-Square tool can reveal those new groups or those not necessarily even new, but groups that you have not really identified by that deep dive analysis that’s needed. And then lastly, high-performing products, which we’ve all probably seen before. They could drastically change as the customer’s demands and needs change throughout their journey through the year or whatnot.

So something like high sales at Christmas and Valentine's Day could peak to have the same value as that of COVID, where we're seeing an increase in the bereavement bears. So now that we understand why we would pivot, let's look at one of the pieces of our roadmap. Before, it was 'are people more willing to sign up based on the new program', and now it's connected to deeper reasons for becoming a new customer, and then observing the customer behavior for the next 60 days, because we have a lot fluctuating.

It’s not just traffic, but there are feelings and emotions and clicks and anger and frustration and people are calling your call centers and they’re trying to figure out what to do here. So it’s okay to input something like emotions or behavior that’s very hard to articulate with KPIs, into your roadmap as well. And don’t be afraid to use a cluster of KPIs to tell that story, partner with your data team. As I mentioned before, they can really give you those insights into what pairings go well together to tell that story in that narrative. So information to leaders. We all know that they are very patient individuals.

But let’s face it. Time is money. And with that, we really want to make sure that we’re not wasting anyone’s time. So, depending on how much time you have to unpack your data upon learning about the test results being inconclusive, here are a few things that you should already start doing.

Define the time needed for the analysis. So depending on how difficult your test is to unpack, communicate whether that's 1 day, 2 days, 7 days, whatever's needed, and I would definitely partner with the person who's doing the deep analysis to make sure that they are giving you the proper timeline as well. Draft and send that write-up to your stakeholders on the current situation, partner together, do stand-ups, and send consistent communications along this process. So that's like your day 1, your 0%.

Halfway through, you should either know that it's something in step 1, or know that it's something in step 2 and you're getting to that conclusion, or you're considering step 3. You're considering really doing that deep dive. And by now, since you know this, you start looking at 'Do I have a data problem?' 'Do I have a business problem?' Send that to leaders and update them on the next steps and potential impact on your roadmap.

Sooner rather than later. And then upon completing this 3-step process, now that you've identified your problem, whether you have an answer or don't have an answer, you still have to come out of it with some sort of action. So how will you do things differently, and how will you get the support of your stakeholders? It's imperative that you document this and make sure that it's something that is adopted into your digital community and that everyone is able to reference, or you can be that speaking voice of, you know, 'I thought this was a simple test and it turned out to be something that's quite complicated or complex.' Then you can just pivot your roadmap to more attainable goals and communicate that.

So, the point is to stay ahead of the questions. Now, lastly, as promised, the example of communication to leaders; I'm sure many of you on the line are not new to that. What I've found to be very successful is to not only tell them what you found or the things going wrong, but tell them the things going right. Tell them the learnings in a way that shifts the narrative into 'Here's how we were gonna do it and how we thought we would make a revenue impact, but here's how we can do it differently' and still make that same revenue impact.

So if I think of the callouts, I really wanna go through these points with you guys, if we’ve eliminated the setup and data issues using the step-by-step process, then, make sure those are also documented within the communication to say, hey, you know, we checked the setup and it passed. We checked the tagging and it passed. We checked the segments and it looks like you know, where the test was located, everything was passed. And then we dove into the deeper analysis tools, and they revealed this really new exciting thing. So if we pivot the roadmap this way, then we’ll still be able to get our answers, but we now are becoming more sensitive to the customer’s behaviors.

For the solution piece, as I mentioned, they don’t just wanna hear the problems. They wanna hear the solution as well. So leave out the unnecessary overly detailed A/B testing jargon. If you’re not sure what it means, don’t put it in there.

I've found that can just stir up confusion, and sometimes people get hung up on the little tiny numbers. The tiny numbers may not be as relevant to the story. So really focus on what your goal is and keep that as the jargon piece. So digging into our takeaways now: prepare your roadmap to align to attainable goals that allow time for proper testing. I know that we're moving a million miles a minute, but there is value in allowing that cushion time when you do run into these inconclusive results.

Will you have time to dive into a deep analysis? Maybe line up your tests in ways of: if A wins versus B, then we'll go on to C, and if C wins versus D, then we'll move on to E. In the gaps in between those, make sure you're giving yourself time to really digest what you've learned. And even more so than that, give the time for your data to really level out. Then, if you do run into inconclusive results, follow the 3-step unpacking process in order.

Document and share the learnings with your digital community. Make it a stand-up and call out what you found. Then that culture of inconclusive results is better received. And respect the knowledge of your data teams. They can help you target those big wins with the deep dive analysis.

Lastly, communicate with your leaders along the way. Don't have them reaching out to you with questions. It is never a good thing to be in the principal's office for something that you could have addressed or something that you could have found out within some certain time frame. If you know three-quarters of the way through your test that you're running into some inconclusive roadblocks, go ahead and share that narrative and say, hey, you know, I don't think things are going right, so we're gonna start unpacking our tests to figure out what's going well or what's going wrong. And if something can be run concurrently that still answers another question that may not be related to the one that you're trying to address now, then save the time. Go ahead and run the other one.

Make sure that you’re checking your setup and use some of those unpacking tools to make sure that the test that you’re running moving forward won’t run into the same issues. All in all, it’s a lot. I know, but don’t worry. Inconclusive results are normal and can be caused by literally anything. If every test came back conclusive whether accepting hypotheses or not, then scientists wouldn’t need so many testing cycles, so many testing animals, or hopefully not testing animals, but, the testing samples.

It will sometimes honestly take multiple tests and perspectives to really find an answer. Questions may not always just be that one-for-one approach. It could be, like, 5 tests in combination that give you that story. And then lastly, always remember: what is proven applies to what was observed during that time. Applying the same test to different users at a different time, or even the same users at a different time, can yield completely different results. So, every test has an expiration date.

If you’re not aware of your expiration date, simply grabbing the experience that you’ve run the test in adding that into your post-analysis will give a really good perspective. When I ran this test during this time, this action was relevant or this conversion, was predictable in this way because of these different parameters in the environment. But now that we’ve maybe shifted everything, these buttons don’t exist anymore. These colors aren’t there anymore. This flow of the process doesn’t exist anymore.

That’s very relevant when you’re looking at your results. And if you’re basing them off of an experience that technically is a phantom experience, then you may want to relook at the pre-analysis phase of your testing and your roadmap to really think about what’s needed for you to understand your baseline. Alright, so thank you all so much. I hope this is very very helpful. If you need any help at all with new testing and optimizing your business, tools, or help in any way, please reach out to me and the analytics team.

I’ve put a few links here so you can contact us. Here’s a website. The calculators I mentioned, Z score, Chi-Square, all of that. You can find it on the link of A/B testing calculators. It’s extremely interesting.

I would also recommend that you look back at some of your past test results and filter them through some of these different processes and see if you uncover anything. It could be really insightful. So I hope that was everything you were hoping for, and I'll hand it back. Well, I would hope that you have some questions for me.

 

Vipul:

Yes, yes, I’m back, and, I was very closely listening to what you were mentioning in your presentation, and I really, really like, and I’m being very honest. I really like the clarity of thought therein Kenya because, I haven’t seen many presentations wherein, you know, sort of the structured approach to testing is discussed. And you included all those points in your presentation. So that was really, really, you know, a delight to listen. I would say thank you so much for that.

 

Kenya:

Thank you.

 

Vipul:

Great. Just building on this point, the question I had was, you know, a structured approach to testing: not a lot of businesses follow it, right, but many businesses want to follow it. The reason that we at VWO recommend it is because it reduces the probability of, you know, running into inconclusive tests. So do you agree with this, or what thoughts do you have on this?

 

Kenya:

In regard to structured testing, I kind of chalk it up more to structured experimentation programs and less to the test itself, because if you have those right checks and balances along the way, it honestly sets up every test that deploys for success in terms of being conclusive. And you can even go into it knowing we may or may not get a result based on how we have our team set up. And to give you an example, I've worked with past clients where, you know, there's a pre-analysis done, and the pre-analysis includes checking tagging, checking the flow of the customers, and checking outside data that's, you know, not just your internal company's opinions and data.

That alone has helped to kind of shave off tests that work and don't work, or tests that are biased, or tests that don't really answer that question. And although that seems like a tedious process, and that is part of that structuredness that you're speaking to, it's something that really allows everyone to feel more empowered. And although it does kind of cut that ability to run a lot of tests, a lot of tests aren't necessarily the best thing to do.

Quality versus quantity is always gonna be a winner, I think for anyone that’s in testing.

 

Vipul:

Right. So that’s a really important point. The evergreen debate of quality versus quantity has been an audience to a lot of webinars, and I am a participant in many discussions internally as well, wherein, some people are of the opinion that you should test more. Some people are of the opinion that, no, you should test less, but you should test the right things. So there’s always a debate around this.

That keeps the team’s morale high because everyone’s contributing and feeling powered, of course. And, that’s a great thing. Of course, you have to, I would suggest, as a business, one has to figure out for oneself what works best and then take a decision based on, you know, what the team agrees on. That would be a very diplomatic answer, I would say. But, yeah, you have to find for yourself what works for your good.

So there are a couple of questions and actually a lot. And I’ll start taking them 1 by 1. The first one I’ll take is from CUDA. Okay. So CUDA is referencing a test that they are running in their own organization.

So they’re asking if we have a test, and in this test variant B was winning but not statistically significant for 2 weeks, for example, 80%. Usually, we consider this as inconclusive. So in such a case, what would you suggest they do?

 

Kenya:

In that case, really there's a lot to look at. If normally conclusive results are around 90 to 95 for that page and that same KPI, then I wouldn't necessarily use those results if it's just stopping at 80%. I would look back at what variants I'm looking at and whether there was distinguishability between them. And I'd honestly jump straight to step 3. I know I said don't skip steps, but if you know the setup, and you know that the KPI is always the same and the page and location are the same, there's clearly something else happening there. Now, if it's like you have 2 tests prior that reached 95 or 99, and this one has 80, you may altogether wanna look at what your level of confidence should be.
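
One way to sanity-check a result that stalls around 80%, as described in this answer, is a quick Bayesian-style read of the same counts. The sketch below (illustrative numbers, Python with NumPy, not a method named in the webinar) estimates the chance that variant B truly beats control, which you can then weigh against your usual 90-95% bar.

    # "Chance B beats A" from Beta posteriors; counts are illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    visitors_a, conv_a = 6000, 276      # control
    visitors_b, conv_b = 6000, 305      # variant B: ahead, but not decisively

    draws_a = rng.beta(1 + conv_a, 1 + visitors_a - conv_a, size=200_000)
    draws_b = rng.beta(1 + conv_b, 1 + visitors_b - conv_b, size=200_000)
    prob_b_wins = float((draws_b > draws_a).mean())

    print(f"P(B beats A) = {prob_b_wins:.1%}")  # weigh against your usual 90-95% bar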

 

Vipul:

Right. Hope that answers your question, CUDA. I was actually reading another question, which is, again, a real example of a test that they must be running on their own website. It's a bit of a long one.

Oh, it’s. I’m sorry if I’m pronouncing the name wrong Gerardo. He’s asking, I’m currently running a multivariate test and my main KPI is conversion rate. So let’s say that I’ve identified that I followed the three steps, and everything looks okay, and that the full test has run through the COVID season from March to May.

My results still lack statistical confidence, sitting between 83 and 85%, but they haven't changed much during the last weeks, and we haven't identified patterns or commonalities in the winning variations. Should I stop testing or declare a winner now? Or should I, you know, keep running the test?

 

Kenya:

So there are a few ways to look at this one. And if I’m understanding it right, it’s running through COVID and it has consistently been between 83 and 85%. And the KPI’s conversion rate. Correct?

 

Vipul:

Sorry. Could you please repeat? I was actually reading another question.

 

Kenya:

Oh, no. You’re fine. I’m pretty sure I’ve got any saying once you get through the three steps. What should you do at that point? I would say that, As I mentioned before, the last one, if you’re always peeking at the the higher end of your confidence being around 90-95, break down that test.

So if it’s not giving you the answers, then maybe somewhere within that funnel there’s some type of variation happening along the way that’s causing some type of uncertainty. For the calculation of it, staying around 83 to 85. So I wouldn’t necessarily say turn it off if it’s possible to run it concurrently with, a much more specific test that’s slightly under the KPI of conversion rate, then that will give you a little insight, between the groups of at what point is it starting to really vary?

 

Vipul:

Right. That makes sense. Hope that answers your question, Gerardo. And if it doesn't, of course, feel free to reach out to Kenya directly after this webinar, or email Evolytics.

 

Kenya:

Right? There’s a lot more time spent on a lot of these questions. So if, like you said, more questions.

 

Vipul:

I can understand the concern behind this question, and of course, it cannot be answered within a minute or 2. This needs a lot of explanation from both ends. So, yeah, do reach out to Kenya or the Evolytics team directly and feel free to, you know, share your problems with them. Cool. Okay, I'll pick this one.

The next question is from Carrie Wilkins. She’s asking, what is the most common reason for an inconclusive test, given the examples that you’ve listed?

 

Kenya:

I would say the most common that I have seen, based on the ones that I've listed, has been that the easiest and the worst thing is setting it up at the wrong point. So, firing it at the wrong time. As we went through that first step, the person set up the test to fire at login versus sign up versus the site level. What that means is that all the customers were in a client-side version of testing, that's the word I'm looking for. Sorry.

And if we look at the way that this test should have been set up because they’re getting 2 completely different sites, it should have been server-side. And I know that’s the argument of client versus server side, but really it’s a logical thing. Like, you wanna have both. There isn’t an either-or type of statement with it. It’s more of what gives you the ability to measure even site conversion at the end of those.

If you have one that’s their whole website is consistent. Their experience is consistent. There are sprinkled promotions in and out. There’s sprinkled this and that but they’re all tied to one unique program, then maybe those people need a completely different website experience than the ones that are not seeing any of that. And I’ve seen that it basically boils down to how should the test be set up, and I’ve seen that run for a week in arguments upon arguments, and it almost being settled by just setting it up in different ways that being an experiment of itself.

 

Vipul:

Right. I’m just looking at the time, and I have space for just one more question. So I actually wanted to know, since you’ve heard of, Lowe, and it’s a very big company, of course, needs no introduction. So I just wanted that if you could share a few examples, maybe without sharing any specifics of any inconclusive tests that you ran into while you were at Lowe.

 

Kenya:

There were many in the beginning. Right, the beginning as in the beginning stages of the experimentation program. And I'll keep it very broad because I don't really wanna, you know, dish out the specifics. But I would say it came to areas where, you know, Lowe's has a lot of foot traffic.

They have a lot of customers and customer types, and they have the luxury of running many tests at one time. Without the use of swim lanes, almost every test seemed to be inconclusive unless it was on the homepage, because you're able to get in and out of Lowe's experiences so easily, and there are access points all over the website, so it's extremely important to isolate those experiences. So that was, I would say, the highlight of inconclusive results: when there were other impacting campaigns or impacting tests.

 

Vipul:

Okay. Yep. Thanks. I completely understand. It would be because it’s a really big company.

And we both wouldn’t want to run into issues, to really understand that. Perfect. So, yep, thank you so much Kenya for, 1st of all, you know, making this amazing insightful presentation, and sharing expertise with our audience today. I’m really, yeah, I’m really glad to have you, and I’m sure the audience must have loved every bit of it, whatever you shared. And of course, guys feel free to reach out to Kenya, if you have any questions.

Right? And, yeah, Kenya, would you want to direct them to an email address, or can we connect with you on LinkedIn maybe?

 

Kenya:

Yes. So they can contact me on, let's see, either LinkedIn or... Yep. That's fine. Just connect through LinkedIn, and what I'll do is I'll pivot you through to the appropriate person within Evolytics depending on the question.

 

Vipul:

Sure. Just look out for Kenya Davis; you'll definitely find her. Perfect. So thanks again, Kenya, and thank you everyone as well for attending this session today. I hope you enjoyed it.

I hope you learned something new and interesting. Do fill in the survey that will turn up once this webinar is closed; your feedback will help us a lot going forward. And of course, do look out for our future webinars. We have another one coming up in the next 2 weeks, and it will be based on copywriting and using copywriting as a conversion influencer. The webinar will be hosted by Rishi Rawat.

He is the founder of Frictionless Commerce. So do check out vwo.com/webinars, and you'll get all the information there on all our past and upcoming webinars. And, yep, you will definitely receive email notifications whenever we have new webinars lined up, just like we did for Kenya's webinar. Great. Have a great day, everyone, and Kenya, you too. Have a good day.

 

Kenya:

Awesome. Thank you. You too. Bye.
