
[Workshop] Psychology and Controversy - How to do A/B Testing the Right Way

Meet Oliver, Katja, and Ivan for a fun, deep dive into A/B testing's quirks, from bias hacks to guessing game winners and hot debates in CRO.

Transcript

[NOTE: This is a raw transcript and may contain errors. Our team is working on editing it; the final version of the transcript will be available soon.]

Welcome to ConvEx 2022, an annual conference on experimentation by VWO, a full-funnel experimentation platform. Today we have a team from LEAP, a digital agency based out of, sorry, I forgot where exactly they are located. But of course, I'll let Oliver and the team describe where they are joining in from.

So I'll give them the space to introduce themselves. We have Vanessa, we have Ivan, we have Katja on the call, and they'll be walking you through a workshop titled "Psychology and Controversy: How to Do A/B Testing the Right Way." Quite an interesting title, by the way. And of course, this is going to be interactive.

So do send in your responses whenever you are asked to; it will really help in getting a pulse of where we are in the workshop. With that, I'll jump off stage now and hand over the mic to Oliver and the team. Take it away. Perfect, thank you so much. Good day, good evening, everyone, wherever you are.

As you just heard, we're going to have a very interactive session for you with lots of Q&As and lots of things where you can dive really deep into the psychology and techniques behind A/B testing. Before we start with that, there's a very quick three-minute introduction into why we use psychology when it comes to conversion optimization.

That's going to be my part, and then we're going to hand over to the team. Let's see, I'm trying to get to the next slide here. Perfect. As I said, a very quick introduction; afterwards Katja is going to take over, and you're going to get to guess which heuristics go with which hypotheses, which is going to be quite fun.

Afterwards, Vanessa is going to walk you through a bunch of A/B tests, and you're going to guess which is the winning variant, and you're also going to get to know why. And to cap things off, Ivan is going to talk about a bunch of controversies that surround A/B testing. That's going to be quite a lively discussion, I assume, to figure out where everyone stands on these controversies.

So let's get going and get the non-interactive part out of the way pretty quickly. That's us: we are from LEAP, an agency based in Berlin, Germany, and the team is very skilled when it comes to A/B testing and psychology. So, why do we use psychology? It all started a couple of years ago, when many people in our agency read the bestseller by Nobel laureate Daniel Kahneman, Thinking, Fast and Slow.

For those of you who haven't read it, Thinking, Fast and Slow details how humans think in two systems: system one and system two. (And there is the next slide I was desperately waiting for; someone else seems to be using the slides as well.) System one is the system we spend 95 percent of our thinking time in.

It's subconscious, intuitive, fast, instinctive. That tells us that the majority of our decisions are not made rationally and logically in system number two, but very intuitively, and not very rationally at all. That is quite shocking, because many people would call themselves very rational beings, but that's something we tell ourselves more than something we actually are.

This happens mainly because our brain is inherently lazy and tries to avoid effort at all costs. So it has come up with many fast ways to reach conclusions and decisions without having to think too hard or too thoroughly. An example from everyday life: if you walk down the street

or drive down the street and you see a red traffic light, you are not going to spend 20 minutes thinking about what to do next; you're simply going to pause and stop. But on the other hand, if you want to buy a car, and you're not one of the very few people who are way too rich to ever consider anything that has to do with money,

you're going to think very hard about this process. You're going to compare cars, types of cars, types of fuel to put in your car, different financing options, different banks, different credit models. So naturally you have to think very hard and very thoroughly.

And for that, you need system two. System number two is very exhausting for your brain, so in many cases you might not come to a decision at all; you might just give up beforehand. In order to help people come to good decisions, and come to them fast, we can use many different psychological systems.

One of them is nudging. Nudging comes from Richard Thaler and Cass Sunstein, and Richard Thaler received a Nobel Prize in 2017 for this concept as well. In a nutshell, the concept tells us that every decision is made within a decision frame that surrounds it, and we can use psychological guardrails to help users make the right decisions within those decision frames.

And amongst many other things, that's something we can use to create conversion uplifts. As an example from everyday life: back in the day, when you went to the bank to withdraw some money from the ATM, you would enter your card, enter your PIN, and receive your money. And because your brain was in system one and said, hey, I'm here to get some money,

I have my money, I'm going to go now, you would forget your card in the machine. These days, banks have changed the decision architecture: you can only receive your money after you've removed your card. That is a very simple change, but it's a change that helps us, the users in this case, avoid mistakes.

And that's something that's very important to us. We want to use the insights of Kahneman, Thaler, and many, many others to reduce barriers, reduce mistakes, and increase users' motivation to buy. So we want to make life easier for them, in our case mostly on websites. In order to do that, we need smart hypotheses, and Katja has brought some of these hypotheses with her today. She's going to show you some of them, and you're going to have to guess which psychological biases are behind them. So yeah, we're going to have a little fun, depending on your experience.

Maybe this will not be new to you, maybe you will learn something, but all in all it's good practice, since if you want to learn sustainably from your data and from experimentation, it's good to have hypotheses that you either disprove or find evidence for. That way you can learn about your target group, and really learn, not just test.

And if you have negative results, you can be happy about those too and learn from them. So what we're going to do is start with the first example. We basically always have a hypothesis on the left, and we have heuristics and psychological effects on the right.

When I read a hypothesis, you put in the number of the psychological effect you think matches it. So what we have as a hypothesis is: if the info "products are not reserved" is displayed in the shopping cart, users will more likely start the checkout process because... And now you can input the number you think is right.

Are you receiving numbers, Ollie? Yes, it looks like we are getting responses. We've gotten only one response so far, from Ali, and she says six, but I encourage other people in the audience to send in a response for the question in front of you right now. Yes. Okay.

I'm not seeing the responses, so it might make sense if you just take over and read them out every time, because I can't see them. Well, you can also just write it down for yourself and then, on the honor system, see if you were right or not. This is just a nice little practice, and we also want to show that it really is helpful to build these exact hypotheses.

So, like we heard before, not just testing. Are you not able to view the questions box now? I can view the questions box, but I don't see any results coming in there. Oh, okay, that's strange. So we have gotten four responses so far: Ali says six,

Nick says five, Alan says six, and Zahra also says six. So the majority wins. Yes, it's loss aversion, which is a nice effect that we like to try. And of course, it also shows why it's important to test these effects, because it's not one size fits all: depending on your target group, an effect can work or not work.
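
[NOTE: For readers who want to formalize the if/then/because template used in this exercise, here is a minimal illustrative sketch in Python, using the loss-aversion hypothesis just discussed. The structure and field names are the editor's own, not part of any tool used in the talk.]

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """The if/then/because template used throughout this workshop."""
    change: str     # IF this is changed on the page ...
    outcome: str    # THEN users will more likely do this ...
    heuristic: str  # BECAUSE of this psychological effect.

example = Hypothesis(
    change="the info 'products are not reserved' is displayed in the shopping cart",
    outcome="users will more likely start the checkout process",
    heuristic="loss aversion",  # the effect revealed above
)
print(f"If {example.change}, then {example.outcome}, because of {example.heuristic}.")
```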

I think we will have an example from Vanessa on that topic later as well. So yes, this is one of those hypotheses on loss aversion. If we go to the next example, the second one is: if users are given the positive feedback "great choice" after selecting a product, then users will more likely start the checkout process because...

And now you can input which heuristic you think is the matching one. I've got two people answering three, Alexander and Ali. And almost everyone has answered three: Alan, Nick, Kirsten, Zahra. Perfect, that is correct. People are more motivated when receiving positive feedback; positive reinforcement.

If we go to the next one, we have an interesting effect here: if a call-to-action button is positioned along the gaze of a person in a picture, so we see a picture of a person looking towards the button, then users will more likely click the call to action because... Now you may vote again. We have seven, seven, seven, seven.

Yes. Okay, I think we have a group of experienced people, but still, I hope you'll see it as a nice exercise to keep fit in building hypotheses and keeping those effects in mind. So yes: because users' attention was guided towards the desired action, through the effect of gaze cueing.

We have a comment from Ali, by the way: Ali is saying that it's a very helpful exercise. Great, I'm glad to hear it.

Good, let's continue. Just one little input: the questions box is finally working for me now, so I can also take over. Please go ahead. Okay, great. So we're at the next one: if feedback from other users is displayed on a product page, then users will more likely add the product to their cart because... We got lots of nines.

Very good, everybody has been doing their homework; I'm happy to hear it. They would like to have a 1,000 euro prize, too. Sorry, no prizes today, just the fun of knowing that you all have the experience. On the next question, before you reveal the actual answer,

let's try to get someone on the audio to share their reason for choosing that answer. Sure, let's do that for the next one; I don't remember which one is next, but we'll see. Okay, so yes, this one is social proof: to evaluate their own behavior, it's very important for people to

look at the behavior of other people. Oh, this next one is an interesting one: if the initial price of a product is positioned before a reduced price of the same product, then users are more likely to add the product to their cart because... And I'm happy to take an answer by voice.

Thank you. We've gotten ten fours. Anna, Alexander, Ali, Hamad: who would like to come up on the audio and share your reasoning for this answer? Just drop your interest in the questions panel itself and I'll unmute you. Okay, Hamad is ready to speak up with his reasons.

Please go ahead. Hi, everyone. Yes, I think it's a decoy. I chose the decoy effect because they think they're getting an offer, because they feel they're getting value for their money. So it's more like a decoy tactic for converting customers.

That is actually a really interesting discussion, because it is actually not decoy. The decoy effect, and we'll get to it later, works the other way around: if you add another option to two existing options, it makes one of those options more likely to be selected. It would be interesting to look at this more closely, whether these are similar effects working together here, but the one we used is the anchoring effect, which basically says that the first value you perceive is the one you set as an anchor.

So the original price, before being on sale, allows you to perceive the product on sale as more advantageous. That is the reasoning we used here, but yes, that would be a second hypothesis to put on the topic of the decoy effect.

Let's go to the next one. Yeah, please proceed. Gotcha. Okay. So: if the mission of a charity is conveyed with emotional messages and images, so for a charity, not just having text but actually using emotional language and pictures, then users are more likely to donate because...

That seems to be the hardest one yet. We have someone saying 1, we have someone saying 8, and that's pretty much it so far. Would someone like to come up on the audio and share their reasoning?

Let me unmute Ali here; I would love to hear your reason, Ali. She said no. That's fine, it doesn't matter. Oh, okay, sorry, I'll unmute you again. That's okay; she just wanted to tell her answer. We have a bunch of people saying eight now, but no one wanting to speak up yet. Let me see if Alexander would want to speak.

No, thanks; he says Katja should just go ahead. Okay, let's go to the solution. I get the reasoning behind that answer, because basically you're saying: okay, we're giving them reasons why they should donate. And I guess, depending on what you actually change in your test, that could apply as well.

What we were saying here is basically that we're appealing to emotional decision-making, as Oli said before on the topic of addressing system one: the so-called affect heuristic, basically appealing to emotions.

Okay, let's go to the next one; four left. With this one, maybe everybody will be a little more on the same side. The hypothesis is: if, in a list of benefits, the most important ones are positioned first and last, then the most important benefits will more likely be remembered by users because...

Yes, lots of tens. Yeah, I was expecting that. Okay, good. I'm just going to jump to the answer, because I don't want to take all the speaking time from Vanessa and Ivan. Yes: because people tend to recall the first and last items of a series best. This is the so-called primacy-recency effect, which is actually pretty old.

It was already found by Ebbinghaus back in the 1880s. Okay, next one. I'm not going to name the effect; I'm just going to read it: if users are told why they need a case for their cell phone, then users will more likely buy a phone case because...

Lots of eights. Yes. This is the one hypothesis we put under that heuristic. But of course, as with all tests, it's not always easy to pin it to just one heuristic, so it is important to look at what kinds of effects can act on the user and change how they act.

So yes: people are more willing to act if they're given a reason to do so. Okay, two left. Here is the example: if a third, more costly option for an insurance is offered, then users will more likely select the middle option because...

Yeah, you already gave it away a bit early and everyone knows, so everyone says number four. Okay, so, oops, let's go back. Yes, I already said it: it's the asymmetric dominance effect. Sorry, it's getting late. Asymmetrically dominated options allow the middle option to appear more advantageous.

So, let's get to the last one. Of course, we've used all the other ones, so it's just endowed progress that is left, but let's go through the hypothesis anyway: if a progress bar is added to a configuration funnel, then users will more likely complete the configuration, because people provided with artificial advancement toward a goal exhibit greater persistence towards reaching that goal.

So yes, it increases their motivation, and they will more likely go through the whole funnel. Okay, I hope this was a little fun, and that you either learned something or just got a little more practice on things you knew already. I will now give the microphone to Vanessa. Thanks. I hope I will have a good time here with all of you.

Thank you for being here. I see that the numbers are dropping; we have 58 people now. Stay a little bit longer. You can do it, we can all do it, we will have fun now. I plan to speak for 15 minutes, so if I'm taking too long, feel free to cut me off; I can skip some of the tests that I brought for you.

I will give you very limited time to vote; I will count down from five to zero. But if you want to ask something or contribute anything, feel free to let Ollie know in the chat, and then we can unmute you and you can talk to us. What I did today is bring along some of the A/B tests we ran recently.

My mouse is not working. Who has the mouse? Nobody? I should be the person that has it. Wait, I clicked already three times. Oh no, something happened. Okay. The first test that we are looking at is a loss aversion test. I still don't see it; I tried to move to the next page. I can also just do it if you want.

Can you do it? Do you have control? Thank you, that's perfect. So, what is really interesting here: Katja already told us about loss aversion, and you all know what it is. This is actually something a lot of shops do, and what we as an agency can do is run the same test for several target audiences and several pages.

So we used one hypothesis for two shops, both German shops that sell clothing. One sells the kind of underwear and sleepwear that your parents might buy, and the other sells streetwear and sneakers and has a lot of fans in the graffiti scene, so a little bit of a younger crowd. And we did the same thing in both shops.

We provided the message "Articles in your cart are not reserved." On the left, you have the slightly more conservative shop; on the right, shop number two with the streetwear. And our hypothesis was: if users are signaled that their selected products are limited, then they are more likely to complete the purchase, because they want to avoid the loss of the potential purchase, which is loss aversion. So now you have 5 seconds to vote whether shop 1 won, shop 2, none of them, or both.

Did you get some answers? 5, 4, 3, 2, 1. Did we have some answers, Ollie? We had some answers; the majority is saying number two. And one quick info for Denise: exactly where you just entered the question, you can cast your vote. Okay, great. So you voted; let's look at the results.

Yeah, let's go one slide further; I had this planned out. Yes, you were right. It's shop number two, with the somewhat insecure younger audience, where we managed to get more of them to buy when we told them that their products are only available in limited quantity. For the conservative audience, this hypothesis did not work out.

So, this is something that proves it's really important to test things, even if they are sometimes viewed as best practices and are done by a lot of shops and websites without ever being tested. With every test we do, we are somehow convinced it might work, but there are always negative results as well, and this makes it super important to test.

Okay, let's continue to the next test, which is also great fun. There we looked at a login page. We had the situation that the registration button was not very visible on this page. You see the login form at the top of the page, and below that there is the guest CTA.

"Ich bin neu hier" translates to "I'm new here," and there the CTA says "Order as a guest." The Amazon Express checkout as well as the sign-up option are not visible in the first view. We saw that a lot of people did not continue here, so we decided: okay, let's show them all of their options in a way that gives them a good overview, so they can more easily decide which way they want to continue.

Our hypothesis was: if the different options are easier to find, then users will more likely continue and make the purchase, because their preferred options are perceived faster and the cognitive effort is reduced. So let's view our variations. In variation one, the "I'm new here"

option is moved to the top. The highest CTA is "Order as a guest"; below that is "Sign up now" as a secondary CTA; below that we have "I'm already a customer" with the login form and the login CTA; and at the bottom you have the Amazon Pay express checkout. In variation two, we have two cards, and you can choose between "I'm new here" and "Log in."

If "I'm new here" is open, you see "Order as a guest" or "Sign up now" together with the benefits of creating an account; below it there is always the express checkout. If you switch to "Log in," you will have the open login form visible. So you can vote now on who won: the original, variation one, or variation two. The original is zero.

So I give you five seconds again, five, four, three, two, and one.

Okay, this time we have a divided audience, pretty much even between one and two. And Denise is asking whether success is defined as conversion for all user types, existing and new customers. Yes, our primary goal was sales here, but we also looked at the other conversions.

The primary goal was to increase sales, because we do not want to impact sales in a negative way. But we also tracked user behavior, because we wanted to increase registrations as well if we could; this was something we also tried to do. And on the next slide, we see that variation one, even though sales were not significantly better, led to 58 percent more registrations in the test variation, which is good.

For variation two, which was actually our winner (so everybody who voted two, you were right), we have even more registrations, and we even have an uplift on sales. So it shows there are several ways to solve a problem: some might work better, some might also work well, but there might still be room for improvement.

So let's continue to the next test. This is actually one of my first tests. It's a bit more complicated; well, actually not that complicated. I had a shop, which is still a client of mine, with very messy category pages: all of the product listing pages looked very untidy, messy, and chaotic.

And a lot of users landed on these product listing pages. So I thought: okay, it is not really clear which element belongs to which product, because the CTAs are so far down below, and it also does not look like a trustworthy shop. We will try to give the product listings a proper structure across the whole website, so that they are easier to perceive and appear more trustworthy, like a proper online shop.

And this is what we did. On the next page I can show you the design; you can also just skip to the next one, where I have a close-up so we can see it better. Great. In the original on the left, we have a dropdown that says "Auswahl" ("selection"), where you can select between multiple options.

If you choose a product variation here, for example blue or green, then the buy CTA appears. We changed this and pre-selected an option, so that there is always a CTA visible. We tried to give it a proper structure, and we put a border around each listing so that it is clear what belongs to

which product, and we tried to make the whole shop look a bit more serious and the information easier to perceive. So who do you think won, the original or variation one?

Do we have votes, Oli? Pretty much everyone is saying variation one. Okay, let's move to the next page. And this is why this is my favorite test, and why I'm showing it to you even though it is two years old. We had a positive effect on add to cart: we had 2.2

percent more add to carts, which is actually not as much as I had expected. But we also had a downward tendency on sales; not formally significant, but quite pronounced. We also ran this test on mobile, and there it did not really work out either. I saw a very negative tendency through the whole funnel, and I thought: okay, why can that be?

On the next slide I can show you some data. I thought: okay, there are several places where you can do an add to cart. In general, we have this 2.2% uplift; you see that in the "improvement rate" column, and it was significantly positive. There are product listings on the search result page and on the category page, and the changed product listings are also used for the recommendations on the product detail page. And there,

And the, the changed, uh, pro product listings are also, um, used for the records on the product, product page. And there. Everywhere, we have very, very strong uplifts from like almost 50 percent on the search result page to the 31 percent more add to cards for the records. But we also see that we have minus 10 percent on the regular add to card on the product detail page.

So that means we have a shift here. And I thought: okay, probably we just have a lot fewer add to carts on the product detail page. So let's look at the absolute numbers, which I also brought for you. If we do a little calculation on the lowest row, add to cart on the product detail page,

we see that we have a bit under a thousand conversions fewer for variation one than for the original. But if we look at the three lines above, already for the search result page, the top line, there are about 1,500 conversions more in variation one. So it's not about absolute numbers here.

So I thought: okay, why could that be? And then I thought: the product loyalty, the decision to buy, is probably formed on the product detail page, where users have more pictures and more information, and where they can read user reviews of the product. So they need to go to this product detail page in order to make the purchase decision.

So I said: okay, let's be very radical here and remove the add to cart on the category page as a follow-up test. And we did that. On the next slide I can show you the variations again; let's go to the close-up. We removed the ugly dropdown and replaced it with the information "more variations available,"

so the customer does not have to switch anything on the category page or the product listing page, but we removed the add-to-cart CTAs entirely. What do you think won?

I think I probably gave it away a little bit already with my enthusiastic reaction to this test. Yes, people say variation one. Yeah, so let's look at it. I brought you several goals from the funnel; the primary goal was again sales, of course, but we also looked at the whole funnel, and the whole funnel was positive

for the whole thing. We also saw that this time the product detail page was not affected negatively. It was an overall good result; the variation worked better than the original. That was very interesting for me, and it's one of the reasons why I like to track a lot of additional goals: to see how many people are interacting and how the users are behaving differently.

Let's continue to our next test. This is another shop, a typical clothing shop, and here we have the product detail page of a shoe. In some online shops, the primary CTA on the product detail page is only active once you choose an option. In this case it was the same, but we thought: users might not understand that the product is still available, and that you only have to choose your size in order to be able to add the product to your cart.

For one thing, the inactive button does not have any action affordance: it does not seem clickable and does not motivate you to click on it. It can also be misunderstood by users. So we said: if we make the primary CTA active, then users will be more likely to buy the product, because they understand better

that the product is available, and the CTA also has a higher action affordance. I can show you the two variations I did. The first variation is very simple; it's actually exactly the same as what happens when you click on the CTA in the original: you get an error message if you do not have your size selected.

So in variation one, the first state of the button is active. If you click on it without selecting your size, you get the validation error message "Please select your size." That is variation one, a very simple change. With the second variation, I wanted to make it a little more organic.

I didn't want the user to have this error moment; I wanted it to be more of a natural flow. So I said: okay, if you click on the button without choosing your size, we will just show you a nice green checkmark and then ask you, in a very friendly way without an error message, to select your size to complete the add to cart.

So these are our two options. On the next slide we have the voting again. What do you think is the best option: 0, 1, or 2? Original, variation 1, or variation 2? A majority? No, not a majority anymore; divided between 1 and 2. And Mark has also asked if we can provide sample sizes and the timescale of testing.

So how long did the tests run? We don't have it in the presentation here, but maybe you remember. I cannot tell you exactly, but what I always wait for is to have around 1,000 sales per variation. That is one thing I look at, and I let the test run for at least two weeks. This client has a lot of traffic on mobile, so I think this was probably around two to three weeks.

The earlier test with the crafting supplies shop, the one with the ugly category page, I think ran for at least four weeks. We always try to run whole weeks, at least two; usually the tests run between two and four weeks, and we try to get the thousand sales per variation.
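
[NOTE: A minimal sketch of the rule of thumb Vanessa describes, for illustration only; the thresholds simply restate her heuristic and are not a substitute for a proper power calculation.]

```python
def ready_to_evaluate(sales_per_variant: int, full_weeks: int,
                      min_sales: int = 1_000, min_weeks: int = 2) -> bool:
    """Rule of thumb: wait for ~1,000 sales per variant and at
    least two full calendar weeks before reading test results."""
    return sales_per_variant >= min_sales and full_weeks >= min_weeks

print(ready_to_evaluate(sales_per_variant=850, full_weeks=2))    # False: keep running
print(ready_to_evaluate(sales_per_variant=1_200, full_weeks=3))  # True: evaluate
```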

Perfect. Okay, for this one we now have a slight majority for number two. Okay. This was actually my favorite too, but spoiler: it did not win. Show us the results, Katja. Thank you. We see here that we have a slight positive effect for variation two, but the significance is only 85%.

With small effects like this, they might become significant if we leave the test running longer, which we didn't do, because our variation one had a stronger effect that was significant on sales. And this was actually surprising, because my client was also a very big fan of the second option, since it seems more flowy and less intrusive.

But yeah, it could be our German culture; it would probably be different in, for example, the U.S. or other countries. That's why we test. Yeah, that's the example: even though we often have favorites in our testing. That's true. So always test everything. We can always support you if you have any questions about testing.

But yes, this is the reason why we test and why we do not just implement everything we learned from other clients. It's great fun, and it's always super interesting to see what happens. Okay, next test.

Sticky add to cart. Okay, here we had a cart page, and down there we have the CTA, which means "go to checkout." It had very bad usability, because it was honestly not very clickable: it didn't really appear as a button, and there was no context provided that would make it appear like a proper buy box.

We also saw that a lot of users were leaving the cart before entering the checkout. So we said: okay, we need to give them a little more security, a little more information. Maybe we need to frame the button a bit differently, so it is recognized as a proper CTA and gives the user enough security to enter the checkout.

What we tried is on the next page. We had one variation where we said: okay, we just give them the total sum of their purchase, a little summary of their order in the buy box, just for additional security. And we had a second variation where we also added

the possible payment options below, because we know that the logos of the payment options can work as trust signals and help the user feel safer: they know and trust these payment options, and this works like a halo effect that can impact the whole buy box positively.

So we have two variations, and on the next page we have the voting again: the original, variation one, variation two. I would like to ask you to vote again on who won.

A majority for number two? Yeah, you are correct. Both variations worked, and it's really nice to see that the effect seems to build on itself: we have an effect for giving the total sum, and then a bigger effect for giving the sum plus the payment options. So that was a nice test and a fun result.

Okay, next one. Ah yes, this is one of Ivan's tests, I think, and one that I really liked. This is a perfume shop; they have really a lot of traffic, so we can run a lot of variations here, which is nice for us. And we have the situation that we know that users who do an add to cart on average do at least 2.5

add to carts, so they have 2.5 products in their cart, and a lot of them add many more products. We also see that the add-to-cart overlay is very intrusive: it's very big, the rest of the page goes dark, you cannot do much else, and you also have recommendations on it. And I don't know if you have ever purchased a perfume, but there are several variants that are very similar: you can get an eau de toilette or an eau de parfum, and you can also get several sizes. And it is

You can also get several sizes. Um, and there it is Quite expensive. So, um, in this case, we had recommendations that were kind of, um, very, very similar to the chosen product. And we thought, okay, maybe that can lead to, to confusion. Maybe that can distract the user and cause insecurity in their choice. So we had several things that we tried.

In the first variation we kept everything very simple: we just removed the recommendations. I'm going to skip the hypothesis; I think you can do without it. So we removed the recommendations, and in the second variation we tried another look for the buy box, one that we see in a lot of online shops.

It's a bit less intrusive: the page stays clear, and you just have this smaller, sticky add-to-cart overlay. The next variation we did was even less intrusive; we said, okay, we remove the overlay entirely and just go with direct feedback on the button.

If you do an add to cart, the black button turns green for five seconds, I think, and the text changes to something like "The product was successfully added to the cart." So we have these three options and the original. And, oh yeah, there's a mix-up with the layout,

but we again have zero for the original, one for variation one without the recommendations, two for variation two with the sticky overlay, and three for the button feedback. What do you think?

Mostly three. Interesting. Actually, when we look at the results (I think I have everything on one page), variation three was inconclusive, while variation two and variation one were both significant. And in our case, variation one won: we had 4 percent more sales in this test variation.

This was actually very interesting to me, because of course we sometimes also test recommendations in add-to-cart overlays, and for some clients this really works. But in this case, the distracting recommendations really seemed to be the big problem, and the overlay without them really seemed to solve that very well.

It could also be that the two very prominent buttons in the overlay, "go to checkout" and "go to cart," are very important for the user journey here. So I think that was my last test. One second, one question from Mark: was this also a mobile test, or only on desktop?

I think we did a similar mobile test, but not the exact same thing, because we had a different situation there: the add-to-cart overlay on mobile is by its nature different from desktop because of the space issue. I do not remember the mobile test in its entirety; I think we had slightly different variants there.

Ivan, do you remember? Yeah, we didn't have the recommendation part in the add-to-cart overlay, so we had a different result there. Okay, yeah, also interesting. It was quite a different hypothesis, and we had different results.

Because of the spacing issue; but it's also interesting because the recommendation part was exclusive to desktop. So I'm done with my part. I hope you had... I have to interrupt you once more, because Brian is asking if there was any confusion about what to do next in variation three, because it doesn't seem like anything happened. I don't think so, though of course we don't know exactly what the users think. In variation 3, the button is always visible when you click it.

The button is originally black and the wording says "add to cart"; if you click on it, it turns green and tells you the product has been added to the cart. Where users could be confused is how to get to the cart. Of course, in the upper right corner where the cart icon sits, the change was also shown, with a small "1" when one product was in the cart, or a "2" when a second product was added.

That might not be a strong enough stimulus to prompt the user to click on it, so it's possible that that's a problem.

Perfect, I'm done interrupting you. Okay, I'm done with talking, so I hope you had a good time. Enjoy Ivan. Okay, thank you, Vanessa. Thanks, guys, for staying with us. My topic is maybe a bit less entertaining, but I think it's very important: we will leave the field of heuristics and psychology and enter the field of statistics and test validity, which is of course just as important.

Once you have understood that experimentation is a great thing to do, and you have enough resources, you will want to run as many A/B tests as possible. And once you are already testing with every single user, you will think of ways to conduct even more experiments, and you have probably already asked yourself some of these questions:

for example, should I optimize for micro conversions? Can I run multiple A/B tests at the same time? How about reducing the confidence level from 95 to 90 percent? I want to discuss these three options with you and, of course, tell you our point of view on each topic.

So let's start with the first question: should you or should you not optimize for micro conversions, such as add-to-cart rate, or should you always aim for a macro uplift such as sales? You can just write yes, you should optimize for micro conversions, or no, you should always try to increase sales.

Yes, that's the answer: yes. Okay, perfect. We even had one example from Vanessa where we had an uplift on add to cart and a downlift on sales. It was kind of messy, and it was great that she dug down and understood the reason why. And I have another example where it makes sense to

also check the micro conversion and possibly ignore sales if it is inconclusive. For example, say you want to highlight your filters and have the hypothesis: if filters are displayed more prominently, then more users will be likely to buy, because it is easier for them to find the desired product when they use the filter.

And say you have this result, case A: your micro conversion, filter usage, doesn't increase; it's inconclusive. Product detail views also don't increase. But you have an uplift in checkout and sales. That could well be a false positive, and I would prefer not to call it a winner.

But if you have a case B, where you see that filter usage did rise, so you have an uplift there and an uplift in product detail views, that tells you: okay, it did work, people use the filters more often. And especially if the add-to-cart rate then rises as well, I would feel safe to say it worked, even if the further metrics like checkout and sales are inconclusive; it could just be that you didn't have enough traffic and enough people to detect the effect on those particular metrics. So if you see the uplift in the other metrics,

But if you see the uplift in the other ones. Um, I would, uh, yeah, prefer, uh, to, to, uh, take, call it a winner. Um, even, um, yeah, in comparison to the case, any

Any other questions or opinions on this topic?

Let's leave it there for the moment. Everyone pretty much said: do both micro and macro if possible. Okay. Of course, if you have enough traffic, you should, like Vanessa, check everything, or at least those metrics that are important for your hypothesis. But if your traffic is limited and you have to decide,

then, again based on your hypothesis, it is most important to check the first reaction, the metric where the change actually happened. Okay, let's move to the next question: should you run more than one A/B test at the same time on your website? Please again say yes if you should do it, or no if you shouldn't. There are of course opinions that there could be interference, that one test could influence the other, and I'm interested in your opinion.

And yeah, I will, I’m interested, uh, on your opinion. Okay. Firstly, Gabriela, uh, said about the last one, how much she relates to it and your recommendation because they have a very stubborn vendor. Um, and she really likes what you just said, which is great. Um, and otherwise everyone says, um, yes, you should test several things as long as they don’t overlap.

and as long as it works with GDPR and your cookie policy.

Okay, of course. And I think, except for the case where you have pretty much the same hypothesis (then you should think about whether you can combine them and test them within one test), we would strongly recommend it. Of the three questions, this is the one where we have the strongest opinion in recommending

doing it, because when you conduct an experiment, there are already many, many factors that influence your test: external factors like seasonality, weather, day of the week, and so on; then the activities of your own marketing department,

if you have a sales week or something similar; then your competitors, whose activities can also influence your running tests; and most of all your audience, your target group, which can also change. And they are not all the same: they have different preferences, devices, attitudes, and that's why you get recommendations like "please segment your data to gain insights."

So you never have a perfect situation. But if you are testing a strong hypothesis and looking for a strong effect, you want a winner that will stand the test of time and won't be a loser the next week.

So if you add another test on another page, it will just be one more factor that could influence the other test. But you shouldn't worry too much about it, because there are already hundreds of factors. Just make sure that the hypotheses are different; it would be good to have the tests on different pages, and then you should be safe.

The last one is probably the most difficult, the one where it's not that easy to give a single recommendation: should you go with a 95 percent confidence level, or can you go to 90 percent? I would be interested in your opinions. So: yes, you should stick with 95, or no, you can go lower, for example to 90.

No answer to that; everyone seems to be unsure. Okay, now we have one: at 95 percent there's already too much pollution; 95 percent if it's an important change, but lower if it's not too crucial to the bottom line. I think that's a great summary. I watched a video yesterday where someone said they would always go with 99 percent if they could.

I think this is too extreme. I have a few thoughts there. The first thing you think is: okay, if I reduce the confidence level, then my false positive rate will double. Which is of course true, but the other side, the downside, is that if you want a higher confidence level, you will need many more participants, and your test will run

longer. So if you go from 95 to 90 percent, then, depending on some factors like your conversion rate, you will need roughly 25 percent fewer participants. So you can run about 25 percent more experiments when going from 95 to 90. By the way, the picture I showed a few slides before was not quite correct.
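
[NOTE: A short sketch of the sample-size arithmetic behind this claim. The 3 percent baseline conversion rate and 10 percent relative uplift are invented inputs; with these, the saving comes out near the roughly 25 percent figure quoted (the exact number depends on power and on one- versus two-sided testing).]

```python
from scipy.stats import norm

def n_per_variant(p_base, rel_uplift, alpha, power=0.80):
    """Approximate per-variant sample size for a two-sided
    two-proportion z-test (pooled normal approximation)."""
    p_var = p_base * (1 + rel_uplift)
    p_bar = (p_base + p_var) / 2
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * z**2 * p_bar * (1 - p_bar) / (p_var - p_base) ** 2

n95 = n_per_variant(0.03, 0.10, alpha=0.05)  # 95% confidence
n90 = n_per_variant(0.03, 0.10, alpha=0.10)  # 90% confidence
print(f"95%: {n95:,.0f} users per variant")
print(f"90%: {n90:,.0f} users per variant ({1 - n90 / n95:.0%} fewer)")
```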

The correct picture is this one: if you test at 95 percent, you can run, say, 100 tests, and assuming your real success rate is 33 percent, you will find around 26 real winners. Some real winners you won't find, because of the beta error, and you will have around three false positives.

If you switch to 90 percent, you will be able to run more experiments with the same traffic, so 125 tests. Assuming your success rate stays at 33 percent, you will be able to find more real winners, around 33, which is over 20 percent more real winners in comparison, even if you have roughly twice the number of false positives.
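
[NOTE: The arithmetic behind these numbers, reconstructed as a short Python sketch under the stated assumptions: 80 percent statistical power and a 33 percent true-success rate, both implied by the figures above.]

```python
def expected_outcomes(n_tests, true_rate, alpha, power=0.80):
    """Expected detected real winners and false positives across a
    portfolio of tests, given a fixed share of true effects."""
    true_effects = n_tests * true_rate
    real_winners = true_effects * power                 # misses are the beta error
    false_positives = (n_tests - true_effects) * alpha
    return round(real_winners), round(false_positives)

# 95% confidence: the traffic budget funds 100 tests.
print(expected_outcomes(100, 0.33, alpha=0.05))  # ~26 real winners, ~3 false positives
# 90% confidence: the same traffic funds ~25% more tests.
print(expected_outcomes(125, 0.33, alpha=0.10))  # ~33 real winners, ~8 false positives
```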

So, as a summary: if it is a change such as changing a headline or highlighting your filters, a change that doesn't come with additional costs, like a new feature you would have to maintain, then I think 90 percent is fine. And if it is a risky, very important decision you have to make, then you should probably stick with the standard of 95 percent.

Are there questions or opinions?

Okay. Thank you for staying with us till the end. If you want to see some psychology in action on your site, you can write us an email with the subject "psychology" to oe@leap.de. Please remember the .de; we're a German company. Thank you very much.

Speakers

Oliver Engelbrecht

Head of Marketing & International Growth, LEAP

Katja Kaiser

Senior UX Researcher, LEAP

Ivan Gluschko

Teamlead CRO, LEAP
