Fireside chat with Ton Wesseling and Florentien Winckers

Transcription

Disclaimer- Please be aware that the content below is computer-generated, so kindly disregard any potential errors or shortcomings.

Jan from VWO: Good morning everybody, and welcome to another interesting webinar with VWO. And if I say another, I think that’s the wrong way to say it, because this one is kind of unique. First of all, it’s not only with one guest, it’s with two guests, ...

but more importantly, it’s about who is with me today. And I’m really excited to welcome two friends and very special experts in the field of experimentation and testing. Ladies first: Florentien. Hello. Good morning. How are you today?

Florentien Winckers:

Good. Seems fine.

Jan:

I’m really happy to see you again after we saw each other on Texel, at Ton’s event. Florentien is with Albert Heijn, and she is in charge of experimentation in a very broad sense. So you will tell us more about how you managed to create this wonderful experimentation culture at Albert Heijn. Welcome. And, well, guess who else is with us today?

Ton Wesseling. I can’t believe that you’re here with us. It’s incredible to have you on stage with us. I mean, it’s really hard to summarize who Ton is. For the few people in the audience who do not know Ton Wesseling, he is the creator of the Conversion Hotel conference, clearly one of the most unique and best conversion rate optimization events in Europe, and I say this without Ton having paid me to say it. So it’s really great to have you here. Hi, Ton.

Ton Wesseling:

Hi, Jan. Thank you for having me. Excited about this webinar and looking forward to answering the questions from the participants, some interesting questions in there.

Jan:

Yeah. Actually, it’s the first time we’ve done it in this format. Normally, we have a lot of webinars where smart people like you guys share some insights from their daily work with us, or deliver a keynote on a particular subject and so on. But recently, we have seen an increasing number of questions coming up during these webinars. So we thought it would be a good idea to have true experts in this field of experimentation and testing with us and put our attendees’ questions to them.

And we asked for these questions well before the event itself. Nevertheless, there is a box for anybody in the audience where you can share your questions. If you haven’t shared them with us already, or if there’s anything coming up in the conversation today, please do not hesitate to ask, and we will do our best to give you a good answer. So, yeah, I would say let’s get the ball rolling and start with the questions that not only one person has asked, but several people have asked. It’s a bit generic, but it’s still a burning question for a lot of people.

What are the top challenges faced by any organization when they are building an experimentation program? I’ve seen that when people start working on an experimentation or optimization program, there’s a lot of enthusiasm until the organization faces its first big challenges. So what were the top challenges, Florentien, when you started working on this? What are the main roadblocks that can get in the way, especially in the initial phase?

Florentien:

Oh, I think first of all, it’s important that everyone’s enthusiastic, but you need to keep them enthusiastic. In the beginning, they all want to learn more about experimentation, but sometimes it gets busier and they are under time constraints. And often people then give as a reason that experimentation takes more time for them to develop or deliver. And then you can get conflicts. I think that’s the point where it’s very important that experimentation really is a part of product development, and is not seen as an add-on; it really needs to be part of it. So I think that’s an important challenge.

Jan:

And Ton, I mean, that’s very often when people call you and your agency, when these main challenges come up. Sometimes they call you earlier, and sometimes they call you later. What is your view on that? If you had to pick 3 or 5 challenges, which would be your top ones?

Ton:

Yeah. An experimentation program has 3 main pillars that will always need to be there. The first one is, of course, buy-in from upper management. Like Florentien is saying, you wanna make experimentation as easy as possible, but in the beginning, it will be an add-on. It will be an extra. So it takes effort. And if top management is not supporting this, then it’s kinda hard to put the effort in.

So buy-in from upper management is really, really important from the start of your program. If you start without buy-in, it’s gonna be really hard to pivot once you grow bigger. The second thing, of course, is the platform. Without a technological platform that’s easy to use and is really embedded in your organization, with the right metrics being measured, without proper data, it’s really hard to build up a program. And the third thing is, and maybe this is not seen as important enough, but if you put garbage in the experimentation system, it will spit out garbage. You will run experiments on garbage, and the results will be garbage.

So knowledge of consumer psychology is, to me, the third pillar. To be able to understand what you are optimizing, to understand the brain of your user, to do proper research and to optimize customer experience, you need to have knowledge of consumer psychology. So those 3 pillars, buy-in from upper management, the right technological platform, and consumer psychology knowledge, are to me the 3 pillars to build a successful experimentation program.

Jan:

Yeah. Go ahead.

Florentien:

If I can add something to that, I also think that the quality of your data is very important. People really need to trust the data for them to trust the experimentation program as a whole. If they don’t trust the data, then you’re nowhere with your experimentation culture.

Ton:

I fully agree. You have this famous saying that trust comes on foot and leaves on a horse. Once trust is broken in experimentation, it’s really hard to get the trust in experimentation back again in the organization. Yeah.

Jan:

And are we sometimes, and when I say we, I mean people who are working in this field every day, are we sometimes overpromising? I mean, I’ve seen disappointment when the first tests were finished and the variation did not win, where the hypothesis could not be confirmed and so on. And I have also talked to people who were disappointed that their conversion rates didn’t double in the first 3 months. So is there a lot of overpromising happening in the experimentation industry?

Florentien:

Well, it depends, I think, on what you think is overpromising. Maybe you’re not overpromising, but you’re not focusing on the right things to make the right impact. Or maybe your impact is on a whole other level, by decreasing risk rather than by increasing conversions. So I think overpromising is kind of a fake term for an experimentation platform.

Ton:

I think the biggest problem is when people only see experimentation as something to lift their conversions. Then this overpromising topic becomes an issue. Because if you believe that the sum of your A/B test winners is the money you will be adding to the bottom line, that’s not true, for all sorts of statistical reasons. But once you see experimentation as something to test everything you put live, it changes. If you ship some code for your product, it kinda makes sense to do a risk analysis: security-wise, will it be stable?

Is it fast enough? But then you also wanna learn if it’s not hurting conversions. And if you were not experimenting before, then you were just shipping code without knowing what the effects on the bottom line were. You were shipping lots of stuff that was hurting the business. And now you can see if it’s not hurting the business. So you’re validating what you’re shipping, and that’s also bringing a lot of money and a lot of insights to the company.

So if you take that perspective, then overpromising is not an issue anymore, because in the end you wanna validate everything, and it doesn’t make sense to ship something that’s not being validated.

Jan:

Yeah. I think you said something very important there, if it’s just reduced to creating a lift in your conversion rates, and that’s it, and so on. That is missing the point of creating an experimentation culture. It’s, let’s say, an underlying principle. And, when we first met for tea, you told me a bit about your role and, how you addressed that. It was really interesting because Albert Heijn is clearly a place where you can see and feel an experimentation culture. So what is it? How did you build this experimentation culture where everybody embraces testing and, accepts it, and considers this to be something really positive and not annoying?

Florentien:

Yeah. Well, it starts with making them realize what the added value of experimentation is, and that you can’t really ship any new product launch without testing it, because you’re facing risk when doing so. So it’s really a tool to offer people in the company to measure the impact they’re making, and to see if they’re making the right changes for the company. So it started, I think, with a mindset change, but a very important part of building our experimentation culture is also the people.

So you make people want to do experimentation. I was thinking about this, and I think you can apply all of Cialdini’s theories when trying to create an experimentation culture. One thing I started doing was showcasing the people that were doing experiments and sharing what kind of experiments they did and what the impact was. In the beginning, when I was only trying to get the content team to do experiments, I created something called the conversion kings. For everyone in the content team that did experiments, I photoshopped, like, a crown, and I emailed everyone with, okay, we have a new conversion queen. And that made people want to be a part of it.

So that was really a motivator for them to start doing an experiment themselves. And when people had done their first experiment, it became way easier to do the second, then the third, and the 4th. So I think the biggest struggle is to get people to do the first one. And this was just something that I did in the beginning. While we’re growing bigger, we’re now also working with product owners, the CRO team, and other analysts. We now have an experimentation community that people want to be part of.

Ton:

It’s a great way to get people into your group, because in the end, experimentation is really fun. It’s rewarding. It’s high-speed. It’s solving puzzles. It’s having fun together.

So if you make it look to the outside like a fun group that you wanna be part of, that’s really important. I would also like to add transparency. I believe you are also doing this within Albert Heijn. You wanna be transparent about everything you’re doing with experimentation.

Everyone who wants to know about some experiment should be able to dive in and get access without asking upfront. It should be an open system that you really want to be part of, and it should also be easy to become part of that system. But I like the crown-Photoshopping system. I think that’s a nice internal marketing touch to get people more enthusiastic about experimentation.

Florentien:

We host different events to share what kind of experiments have been run. So we have a monthly meeting, the experimentally, where experiments run in various departments are shared within the company, so that everyone can hear more about them and learnings can be shared. And we also organized the experimentation awards, so that we could really shine a light on everyone that did an experiment, and they could win a prize.

Jan:

Really interesting. So that’s a kind of gamification of experimentation to a certain extent. So it’s really with a purpose, but it’s a fun thing. And, not just dry data-driven analysis. So it’s actually a playground that makes sense for the company and, and fun to do.

Yeah. Interesting. Let me have a look at a couple of questions that came in recently from smaller companies, and their related challenges. So one company asks: how do small and medium-sized businesses try to optimize if we’re looking at a future where Google AdWords, Google Analytics, and Facebook are likely to be banned? I think she’s referring to a cookie-less world or something like that.

And they don’t have the budget to implement a big data solution that will be required in this particular environment. So let me translate it into two questions. Is there really a cookie-less world we are looking at, and what will be the future for these medium-sized companies? Will this be the end of experimentation there?

Ton:

I believe there are 2 separate issues here. The cookie-less world is one thing. But this question is also looking at the European legal issues that are going on with storing data at US-controlled companies that fall under the FISA Act in the US. Within the next months, every European company will not be allowed to store data, even in Europe, on a server owned by Google or Facebook or whatsoever, because they are subject to the Surveillance Act of the US. And US authorities can get that data, which is not allowed from a European perspective.

So that’s gonna be a challenge for every company, not only small and medium businesses but also large companies like Albert Heijn. There’s no European cloud or whatsoever that can handle all that data. So we have a problem there. Of course, if you are still selecting your software, it kinda makes sense if you’re in Europe to work with European software vendors, but it’s hard. There’s not enough available to help with this challenge, but we’ll have to see what the future brings.

The other question, the cookie-less world, is maybe even more interesting, because if you’re not allowed to do anything with cookies, how can you recognize users in your experiments? That’s a problem for bigger companies as well as small and medium businesses. Of course, it’s third-party cookies that are being banned now, or that can’t be pushed to the browser anymore. First-party cookies are doable. So with server-side experimentation you can set a cookie from your own web server. I think every vendor nowadays has a solution for that.

But if that also is gonna be banned, in the end, you want people to log in. You want people to identify themselves to you. And if you are a brand that can be trusted and you have incentives, like a loyalty program, so that those people log in to your website, then they will identify themselves. And then you even solve the issue of running experiments on mobile and laptop at the same time for the same user, because that’s the challenge we are already facing nowadays, that you have cross-browser issues in experimentation. So I think everyone will be pushing for logins more and more.

Florentien:

Yeah. I think that logged-in users are becoming more and more valuable. So I think the focus needs to be on that, and maybe your experimentation initiatives are going to be more focused on getting more people to log in, to become more loyal, or to become part of your loyalty program. So, of course, those things are also of high value.

Jan:

Yeah. That’s affirmative. We see a lot of experiments on our platform that are more focused than ever before on registrations, logins, and so on. So there’s definitely something going on in that direction. But I think the consolation here for the small companies is that it’s not only a challenge for small companies; it’s a bigger challenge that everybody is facing. And let’s see what the future holds. Maybe we’re gonna have a European cloud in the future.

Ton:

For SMBs, if you don’t have a large group of loyal users who are logging in to your website, and you are only working with new leads or prospects who are not identifying themselves, then you have to step down to session-based optimization. You’re gonna use session metrics. And, of course, the trade-off is no experimentation or session-based experimentation. User-based experimentation is of higher value, but session-based experimentation is of way bigger value than no experimentation. So there will be more noise in your data.

There will be some more risk, but it still makes sense to run experiments, just on a lower level of data.

Jan:

Yeah. That’s right. Another question from the same group: many companies think that they do not have the volume to test, and they would like to know how many tests can or should run simultaneously, and, if you run more tests simultaneously, what the impact on the customer or user could be. So how many tests can or should you run simultaneously? Is there a rule for that?

Florentien:

I think it depends on the amount of traffic you have and the number of experiments you want to run simultaneously. Running tests simultaneously shouldn’t be a problem if you have enough traffic, but it also depends on the metric you are testing on. For some kinds of metrics, you need to have more data available to measure any effect. For other metrics, where you expect to see a bigger change, you might need less traffic. So there is not really one answer to this question, in my opinion. I think that you can run simultaneous tests, but you really need to think carefully about how much traffic you have and how many tests you have.

Yeah. How much traffic do you have available for the different tests?
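To make Florentien's point about traffic and effect size concrete, here is a rough sketch of the standard two-proportion sample size approximation. It is an illustration only: the function name, the 3% baseline rate, and the default significance and power levels are assumptions, not figures from the conversation.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-sided two-proportion z-test.

    baseline_rate: conversion rate of the control, e.g. 0.03 (assumed example value)
    relative_lift: smallest relative change you want to detect, e.g. 0.10 for +10%
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# A small expected change needs far more traffic than a big one.
small_change = visitors_per_variant(0.03, 0.10)
big_change = visitors_per_variant(0.03, 0.30)
print(small_change, big_change)
```

Under these assumptions, detecting a 10% relative lift takes several times the traffic needed to detect a 30% lift, which is why the answer depends on both the metric and the available visitors.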

Ton:

I don’t really agree on the simultaneous tests. But I will park that answer for a couple of seconds, because I believe there are two questions in this question. The first question is: when I don’t have the number of users to run experiments, what should I do? I’m a small or medium business. I’m below 1000 transactions per month.

Should I be running experiments? And then, of course, the answer would be no, because you don’t have the data to come to significant outcomes from a transaction perspective. Maybe you have enough clicks, and you can still run experiments. You can do fake door experiments. You can buy traffic on Facebook or Google, as long as it’s allowed to do so, and use it as research to understand the behavior of your users and your visitors. Because in the end, optimization is a task; experimentation, a testing culture, is a mindset.

So you continuously want to test and learn, and maybe you cannot use experiments that often to test and learn, but you can still use all sorts of other research: user research, feedback, surveys, screen recordings, heat mapping, your data, cohorts. You can use those to come up with better alternatives, but then you will probably not be able to test them against transactions. So you need to take more risk, but you can still do the whole process of optimizing your website, but you will not be able to exactly. Yeah. If you have the numbers to run experiments, then the number of simultaneous experiments is unlimited. Because what happens, of course, there are 2 sorts of experiments.

Like, you have a checkout flow of three steps. You can run an experiment on step 1 and on step 2. Or you can even run, like, 4 experiments at the same time on step 1. And this is the part where most people will say, well, you should not do that, or you should do, like, a multivariate experiment. But you can have, like, 4 A/B experiments at the same time on one specific page. What will happen in reality is that there will be more noise in the data, because maybe it is the combination of experiments, variations 1 and 2 together, that adds value to the conversion, and not each separately. They have to be there together. So you will get more noise, your winning percentage will go down, so the share of significant results will go down a little bit. But then, instead of one experiment with, like, a 30% win percentage, you can do 4 experiments with a 25 percent win percentage, which in the end will lead to more significant outcomes. So we always take that noise for granted. Of course, your percentage will go down a little, but the absolute number of significant outcomes will increase.

So you will learn faster in that case, and you will outgrow your competitors who are only running one experiment and are not doing simultaneous experiments. And of course, there’s a technical challenge here. If you run, like, 25 A/B experiments, on the same page in all different elements, at some point, the page will be broken. So our road system should be really good.

And there will be development limitations to the number of experiments you can run at the same time. I think that’s the limit, but it’s not from a statistical perspective.
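Ton's arithmetic can be written out in a few lines. This is only a sketch using his illustrative 30% and 25% win percentages; the function name is made up, and real win rates will differ per organization.

```python
def expected_winners(experiments, win_rate):
    """Expected number of significant winners for a batch of experiments."""
    return experiments * win_rate


# One experiment at a time versus four at the same time on the same step,
# using the illustrative percentages from the conversation.
one_at_a_time = expected_winners(1, 0.30)
four_at_once = expected_winners(4, 0.25)

print(one_at_a_time, four_at_once)
```

The win percentage per experiment drops, but the expected number of winners in the same period goes up, which is the trade-off being described.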

Jan:

We have in our enterprise plan a feature called mutually exclusive groups, which helps a lot in this matter, and it’s probably the most demanded feature, the reason why people upgrade to the enterprise version of VWO. With it, one single user on one page always sees only one experiment. So it is just that, but you need a lot of traffic, of course.

Ton:

But that doesn’t make sense, because then you’re limiting the number of users in your experiment. I’d rather have 2 experiments with everyone in there than two mutually exclusive experiments with only 50% of the population in each, because that will limit the chances of finding significant outcomes, unless I’m Facebook or Amazon.

Jan:

Yeah. But, I mean, it adds some value in some strategies there, and, it helps companies to, let’s say, reduce the noise. It does, definitely. But there are other ways to implement it as well. I agree.
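The difference between the two approaches can be sketched with simple hash-based bucketing. This is a hypothetical illustration, not how VWO implements mutually exclusive groups; the function names, the salts, and the 50/50 split are all assumptions.

```python
import hashlib


def bucket(user_id, salt, buckets=100):
    """Map a user deterministically to a bucket number in [0, buckets)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign_overlapping(user_id, experiments):
    """Every user enters every experiment; each experiment uses its own salt."""
    return {name: "B" if bucket(user_id, name) >= 50 else "A" for name in experiments}


def assign_exclusive(user_id, experiments):
    """Each user enters exactly one experiment, picked by a shared (hypothetical) layer salt."""
    chosen = experiments[bucket(user_id, "layer", buckets=len(experiments))]
    variant = "B" if bucket(user_id, chosen) >= 50 else "A"
    return {chosen: variant}


experiments = ["checkout_button", "checkout_copy"]
print(assign_overlapping("user-123", experiments))  # user is in both experiments
print(assign_exclusive("user-123", experiments))    # user is in only one experiment
```

With overlapping assignment each experiment sees the full population; with exclusive assignment each experiment sees roughly half of it, which is the loss of sample size Ton objects to and the noise reduction Jan points to.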

There’s number 3 in this block, from someone who is running an online rental business for baby clothes, and they are CRO beginners. So, interesting business concept, renting out baby clothes. What metrics are necessary to track along the entire customer journey on a website or online shop to effectively do optimization? I mean, of course, there is an unlimited number of metrics, but in, let’s say, a classical online shopping user journey, if you had to set up your roadmap for experimentation, what would be the top metrics that you would have in mind?

Ton:

Reach, interest, sale, loyalty. The typical e-commerce flow, attention, action, loyalty.

Florentien:

Order value. Or anything that shows interest in your products. So any micro conversion, from add to basket to the checkout steps. But I think the metrics that you need to track, or want to track, also depend on the strategy of your company, since, of course, you want to experiment to increase the impact of your company or make your company better. So tracking the metrics that are aligned with your strategy is also very important. For instance, if your USP is that you want to have the most loyal customers, then I think that loyalty is one of the most important things that you would like to track.

Jan:

Yeah. Without having a look at it, what is the actual status quo? I mean, if your question is that you wanna start with optimization: so what are my priorities?

What should I start measuring? I think the first thing that you should actually do is look at your user journey and, ask yourself or ask Google Analytics where you lose your customers. Is it a bounce rate challenge, that people arrive and they just drop off again immediately, then you

Ton:

If you have a data analyst available, that will be, of course, the best thing to do. If you don’t have a data analyst, it makes sense to optimize backward. So start at the final step. If you see users drop out after having added stuff to the basket, or a subscription, I’m not sure what kind of business this is. But if you lose people in the final step, then optimize that step and then work backward. Because if you start optimizing your advertising campaigns and your homepage, it could be that you’re building up the wrong story. And then in the end, you promise something that you can’t live up to, and they will drop off in the end. So work backward in optimization. But if you have a data analyst available, then it kinda makes sense to look at where the biggest dropout is. Is it from attracting to browsing around? Is it from browsing around to Add to Basket? Is it from Add to Basket to the sale? Is it from sale to loyal buyers? Where the biggest dropout is, there’s the highest opportunity to optimize.
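Finding the biggest dropout between funnel steps is a small calculation. The sketch below uses invented step names and visitor counts purely for illustration; in practice the counts would come from your own analytics tool.

```python
def step_dropouts(funnel):
    """Return (from_step, to_step, dropout_rate) for each pair of consecutive steps."""
    rows = []
    for (name_a, count_a), (name_b, count_b) in zip(funnel, funnel[1:]):
        rows.append((name_a, name_b, 1 - count_b / count_a))
    return rows


def biggest_dropout(funnel):
    """Return the transition where the largest share of users is lost."""
    return max(step_dropouts(funnel), key=lambda row: row[2])


# Hypothetical counts per funnel step, not real data.
example_funnel = [
    ("visit", 10000),
    ("browse", 6000),
    ("add_to_basket", 1500),
    ("sale", 600),
    ("repeat_purchase", 180),
]

for from_step, to_step, rate in step_dropouts(example_funnel):
    print(f"{from_step} -> {to_step}: {rate:.0%} drop out")
print("Biggest dropout:", biggest_dropout(example_funnel)[:2])
```

Without such data, the backward approach described above still applies: start at the last transition in the list and work your way up.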

Jan:

Yeah. Absolutely. And it’s interesting to say that when we have our first contact with clients, very often there is no clear picture of where the problem is. So setting up a funnel is the way to go, trying to generate these insights. And as I said, you either review your Google Analytics or you use something like VWO funnel analysis; that’s, of course, our preferred option.

But I think there are ways to do that. And then you can look at the metrics where you really need to engage. So, again, I would like to come back, because it has been asked so many times, and talk about the cookie-less world, which is kind of becoming a buzzword, I feel.

I read about it everywhere. It just gets dropped here and there and so on. Let’s clarify this a little bit. Question number 1: is there gonna be a cookie-less world?

Florentien:

Yeah. Interesting question.

Ton:

Will all browsers ban the possibility of storing information locally on the user’s machine? I think that’s the main question. Yeah. It could be, but they will probably give the control keys to the user. So then it depends on whether the user is allowing you to store local data.

Jan:

Yeah. Exactly. I don’t get this discussion. I mean, as long as it’s a transparent way of informing the user and getting the consent in a defined way, let’s say, without tricking them into clicking yes to everything. I mean, we’re living in a free world, no? I can also register or not. That’s also a free decision. They can’t prohibit registering.

Ton:

The default is gonna be not being allowed to store information. The default used to be that you can store information unless it gets rejected. And now the default becomes that you cannot store information unless you get consent to store information.

Florentien:

Oh, I think that the way companies will cope with these kinds of problems will also change. You already see things like giving more advantages to people who are logging in, or a better loyalty program. I think those are gonna be more and more important to get people to let you store their data.

Ton:

Yeah, and data may not be stored on a local machine anymore. Look at Tim Berners-Lee’s initiative, Solid, which is creating pods of data hosted in the cloud and controlled by the users themselves. That user can allow websites to use that specific data, and even data from other websites, which a specific website can use to optimize the experience for that specific user. But then you are in full control of the data, not locally, but hosted in the cloud, and you are controlling who can share, use, and store that information. We will probably go in that specific direction, but as a website owner, you will be able to use user behavior when the user allows you to use it. So if you ask for consent and they give it to you, then you can use it.

Jan:

Yeah. I get it. But it gets us kind of to the consent management thing. I mean, nowadays, everybody is already asking for consent because they have to comply with the current regulations and so on, but there is such a wide range of different ways of asking for consent. The boring text blocks really kill everything, and the only thing you wanna do is leave them aside.

And then there is the more, let’s say, natural language approach, in combination with legal language, of course, but in a more human way. Are people already testing in this particular field? Is that something you came across as well? Were people testing different ways of asking for consent?

Florentien:

Oh, I think that’s a very smart thing to do, to create a little bit more trust and transparency and to see how you can get more people to agree to the terms.

Ton:

Yeah. Well, when GDPR hit Europe, everyone started testing with consent questions. And I think the biggest problem was that back then we were still allowed to have a cookie wall, and people just had to click yes to be able to visit the website. That worked really well, because they just came there to read this news article. And if they click yes, nothing bad happens, and they can see the website.

And then you had to ask for more consent, like the opt-in consent, and then the publishers would tell you, okay, if you give consent, you will have the more targeted banners. If you don’t give consent, you still have banners, but they will not be targeted. So useless banners. Which option do you prefer?

Oh, I like the targeted one. So as long as it stays in this loop, everyone will try to optimize the number of consents, but that’s not the proper consent you wanna have. The direction we’re going is that you wanna be a trustworthy brand that really deserves consent from the user. Everything coming from the European Union is focusing on really getting proper consent and not tricking someone into that specific consent. So you have to be open. You have to care about the environment. You have to tell everyone what you are doing on sustainability with your brand, because, yeah, you need to get that trust.

Jan:

I totally agree. We have a couple of users that joined us today and came up with mixed questions, which are a little bit more specific, but a lot of people will come across these questions. So, one is: what is your perspective on creating a holdout across simultaneous site and page tests, so you understand and maintain a baseline experience?

Ton:

Maybe I should explain the holdout to the attendees. This is the process where you have, let’s say, a fixed 10% of your users that will not be part of any experiments. So they will have the same experience throughout your whole experimentation process. And then the question is, is this a good thing to do?

Jan:

Yeah. And what do you think? Is it a good thing to do? Is there a reason for companies to do it? Are your clients applying this?

Ton:

The only companies that are asking for this are the companies that do not trust experimentation yet. So they don’t trust us exactly. And maybe they were overpromised, coming back to the question in the beginning: they see all these winners, and once they get implemented, they don’t see the uplift that they were expecting, because they were just adding up the numbers from all the winning experiments and they don’t look at false discovery rates or type M errors and so on. So this is why they wanna create a holdout, because they wanna see what’s going on.

So I don’t think that’s the best thing to do, because the problem is that they don’t trust the statistics. So you have a different problem to solve, and creating a holdout group is not how you wanna solve this specific problem. You wanna have trust in experimentation and data.

So you have to teach them, explain to them, or hire someone external who elaborates on how statistics is done in experimentation. And once trust is there, then you can continue with the program, and then you don’t have to worry about the holdout.
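A back-of-the-envelope calculation shows why simply adding up winners overstates the uplift. The sketch below estimates the false discovery rate, the share of declared winners that have no real effect. All inputs are assumptions chosen for illustration: the share of ideas that truly work, the chance that an ineffective variant is declared a winner (alpha), and the power to detect a real effect.

```python
def false_discovery_rate(share_true_effects, alpha=0.05, power=0.8):
    """Share of declared winners that are actually false positives.

    share_true_effects: assumed fraction of tested ideas that really work
    alpha: assumed chance that an ineffective variant is declared a winner
    power: assumed chance that a truly effective variant is detected
    """
    false_winners = alpha * (1 - share_true_effects)
    true_winners = power * share_true_effects
    return false_winners / (false_winners + true_winners)


# If 1 in 5 ideas truly works, about 1 in 5 declared winners is still a false positive.
print(round(false_discovery_rate(0.2), 3))
```

On top of this, the measured uplift of the true winners tends to be exaggerated (the type M error Ton mentions), so the implemented result will usually be smaller than the sum of the test results.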

Florentien:

Exactly. I agree. So, I think having a holdout group, if your experimentation platform really works, can cost you a lot of money, since this group isn’t exposed to all your winning variants. So actually it’s quite expensive to do so.

And it also makes the code way more complex for the developers, because all the old variants need to be kept intact as well. And if you really want to know whether something you did or added in the past still adds to your metrics, then you can also do some reverse testing. So, removing it for a small group to see if the thing you’ve added is still a winning variant, or if it’s still adding value. So in that way, maybe.

Jan:

Yeah. I think we all agree on that. Yeah. It’s altogether, a can-do thing, but it’s not really recommended. Second question, this block of questions is really interesting. What would be your advice on the trade-off between small A/B testing on single elements versus testing a redesign?

Florentien:

Well, in my opinion, I think it’s always better to do small experiments. When testing your redesign, of course, sometimes you really want to change the look and feel of your website. But the problem with the redesign is that you’re making so many changes that you don’t have any clue which change caused which effect. So, I would always recommend making the change as small as possible. And if you want to change a whole page (I know from my experience that sometimes there are just bigger changes that need to be made due to programs that need to be switched or something), well, maybe then try to group together changes that are likely to have the same effect, so that you can learn as much as possible from your experiments.

Jan:

I hear you.

Ton:

In the end, of course, you wanna do both. You have experimentation for user research. You have experimentation for conversion optimization. The work we do is all based on W. Edwards Deming and the quality circle, plan-do-check-act: small incremental steps to understand what’s going on. And once you understand what’s going on, you can create a new level of quality.

So it kind of makes sense. If you look at a specific product page, for instance, run all these small experiments to really understand what’s going on. Once you understand user behavior, you apply those learnings to a new design. That’s your new baseline. And of course, you will test that specific design.

And if it’s good, you can continue and optimize from there. So in the end, you have to do both. Of course, if you really wanna understand what’s causing the difference, you have to do a small test. Sometimes you have to take a bigger step. So it’s both. I’ve seen small changes that cause really big effects. I’ve seen big changes that cause small effects and vice versa.

Jan:

Yeah. True. However, I remember one case where a client asked us to start with A/B testing on a website that was totally broken. That was absolutely crap. There was no way that you could deal with it.

And so my answer then was: in this case, it’s not either-or. You start with a redesign and then immediately start optimizing on the new elements, but don’t fix something that’s absolutely totally outdated and separated.

Ton:

Yeah. You don’t wanna work on something that’s really dead, because you would only optimize for a local maximum, and then it will be dead plus 2%.

Jan:

Yeah. Better start with a redesign then. And then quickly switch to A/B testing and improving that. And do not spend too much time on the initial redesign; rather, go live with something healthy.

And then start optimizing it, immediately. There was another interesting question coming in, well, that’s a very easy one. Can every company independent of its status quo and stage kick off experimentation?

Florentien:

Yes. Of course.

Jan:

Exactly. It’s probably the simplest question. It’s cool. Yeah.

Ton:

It’s a mindset. So if you wanna become an experimentation-driven company, of course, you can start by doing this, and experimentation-driven does not mean that everything has to be an A/B experiment. If you don’t have the data in the beginning, then you can still have the experimentation mindset and test, learn, and optimize. And at some point, you can also run experiments. But if you’re really low on data and you’re still beginning, you’re a startup, then you have to take more risk, but you can still do user research, screen recordings, heatmaps, data, and so on.

Jan:

Exactly.

Ton:

And then, at some point, scale up to experimentation.

Florentien:

And I think the first experiments can be very easy, you know, like the usual button changes, color changes, and when you get more mature in your CRO, your experimentation mindset, then your experiments also become more complex, and eventually you are testing anything in the backend. To improve the smartphone.

Ton:

Please don’t tell them to do button color testing.

Jan:

No. It’s a start.

Ton:

It’s not about button color. It’s about, like, visual fluency, visual hierarchy, and whether the button stands out enough for people to be able to see it. And then it could be that you have to use a different color than your normal logo color, for instance, or your website color, but it’s not about the color. It’s about the perception of the button to the users.

Florentien:

Yeah. But it was just a start for people.

Jan:

Absolutely. Perfect. We’ve got some questions coming in from the audience. So let’s switch to, those ones, because I have a couple of them that are quite interesting. There are a lot of people in the audience that are involved in conversion rate optimization. Royal is asking, do you think it’s possible to be a CRO specialist without including A/B testing in your operations?

So, yes. Thanks. Yes. But, I mean, that question is implying another question, for instance: Ton, are you running experiments yourself? I mean, you are CRO specialists.

That’s your business and so on. Your agency has a lot of experts in this field. How much testing do you do in your own operations?

Ton:

For our own company, we are a small 1,000,000 business company, of course. So well, we don’t have the data to run experiments on our website. We run experiments on our email links and our advertising because then we have the data to make proper decisions. But to me, conversion optimization is a task of experimentation cultures and mindsets. So you can perform conversion optimization without A/B testing.

You’re just gonna have different ways of telling that you’re doing something good. You will be more biased. You will make lower-quality decisions because you’re not A/B testing, but you can still do conversion optimization. That does matter. We still optimize our websites. But, yeah, we cannot run A/B experiments on our website.

Florentien:

No. I think you can do all sorts of research to optimize, and A/B tests or controlled experiments are just the ones that are the highest in evidence, the best in evidence. So, it depends on the risk you would like to take as well. Or the bias that you are willing to accept.

Jan:

This is kind of the answer to the next question where Molly asks us would you run an experiment if you knew you wouldn’t have the time to reach statistical significance? Could you read the results based on the visual shift in user behavior and take action, knowing that the bottom line conversion rate was not 100% accurate?

Ton:

If you don’t have the power to run experiments, and power is defined as how big the chance is that you will find a significant outcome if there is a difference to be detected, if that’s too low and you know upfront that your outcome will not be significant, then you should not run the experiment. You need to take more risks. Just implement it. And besides, there’s nothing like “almost significant”.

There’s also nothing like “it’s looking like it’s probably getting significant”. That’s all statistical nonsense. It’s false positives that are ruining your decision-making. If you don’t have the data, then you cannot run experiments and just have to make a decision. And that will be faster, and more risky, but you’re in that specific stage where you have to take more risk.
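To make Ton’s definition of power concrete, here is a minimal Python sketch that approximates the power of an A/B test on a conversion rate before it runs. It assumes a two-sided two-proportion z-test with equal group sizes; the function name, the 3% baseline, and the 10% relative uplift are hypothetical values for illustration, not numbers from the session.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, relative_uplift, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    p_control: baseline conversion rate, e.g. 0.03
    relative_uplift: the uplift you hope to detect, e.g. 0.10 for +10%
    n_per_group: visitors you expect in each arm during the test
    """
    p_variant = p_control * (1 + relative_uplift)
    # Standard error of the difference between the two observed rates.
    se = sqrt(p_control * (1 - p_control) / n_per_group
              + p_variant * (1 - p_variant) / n_per_group)
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    z_effect = abs(p_variant - p_control) / se
    # Chance that the test statistic lands beyond either critical value.
    return normal.cdf(z_effect - z_crit) + normal.cdf(-z_effect - z_crit)


if __name__ == "__main__":
    for n in (2_000, 20_000, 200_000):
        print(n, round(power_two_proportions(0.03, 0.10, n), 3))
```

If the power for the traffic you can realistically collect is far below the level you planned for (80% is a common convention), that matches the situation Ton describes: the outcome is unlikely to be significant, so running the experiment adds little.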

Florentien:

Definitely. Yeah. I agree with that.

Jan:

There is one question that came up, and I think that is related to something that we’ve heard from you, Ton, as well. What are your thoughts on the impact of a false discovery rate while calculating the impact of a conversion rate optimization program? I see Ton smiling.

Ton:

Yeah. You know, it’s so easy to present one A/B experiment. Like, we have a significant effect. It’s like a 4% uplift, and based on that, if we implement this, within 1 year we will generate 1,000,000 extra revenue. You cannot say this from one experiment, because that experiment can be a false positive.

It can be that the outcome is significant, but the measured outcome is not the same as the reality. The only way you can put a money value on your CRO program is to look at, like, 100 experiments. If you have a whole group of experiments, then you can calculate the false discovery rate, because you know upfront, based on your significance levels, how many of those outcomes will probably be a false positive. And then also, once you implement them all, you have this type M error adding to the equation. All your significant results are right-skewed, as they say in statistics. So the outcome looks more positive than it is in reality. So if you consider those two, then you can calculate the added value of your optimization program.

But, of course, also make sure you are adding the negatively significant outcomes, the stuff you were not implementing because the experiment was telling you not to implement this. If you add those 2 together, or those 3 together, then you have a value for your program, but you will not be able to tell if it’s that one experiment that brought the money. You just don’t know. Only if you, like, rerun one experiment 6 or 7 times, like they do in science if you have to publish a paper, then you probably can tell with a certain assurance that this is reality and this one experiment made a difference. But we are in business. We’re not in science. We don’t wanna rerun the same experiments seven times just to be 100% sure.
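The program-level reasoning Ton describes can be sketched in a few lines of Python. This is a simplified illustration, not his method: it assumes you can estimate which share of tested ideas has a real effect, and it treats `alpha` as the chance of declaring a winner when there is no real effect. The function name and all numbers are hypothetical.

```python
def expected_false_discovery_rate(n_experiments, share_with_real_effect,
                                  alpha=0.05, power=0.8):
    """Expected false winners, true winners, and false discovery rate.

    share_with_real_effect is an assumption you have to supply; it cannot
    be read off a single experiment.
    """
    n_real = n_experiments * share_with_real_effect
    n_null = n_experiments - n_real
    false_winners = n_null * alpha   # no real effect, but significant anyway
    true_winners = n_real * power    # real effect that the test detects
    fdr = false_winners / (false_winners + true_winners)
    return false_winners, true_winners, fdr


if __name__ == "__main__":
    false_w, true_w, fdr = expected_false_discovery_rate(100, 0.20)
    print(f"false winners: {false_w:.1f}, true winners: {true_w:.1f}, FDR: {fdr:.0%}")
```

Under these made-up inputs (100 experiments, 20% of ideas truly effective), roughly one in five reported winners would be a false positive, which is why summing the uplift of every winner overstates the value of the program. The type M error Ton mentions comes on top of this and is not modeled here.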

Florentien:

Oh, and in addition to that, you also need to take into account which customers were included in your experiments and which customers were not. So what percentage of your customers does it even apply to? And the seasonality effect. So if, for instance, Albert Heijn is making a lot of money in December, and you are running your experiments in December, then, well, I would recommend rerunning your experiments in an off-season. But you need to take into account that if there’s something going on, or maybe in summer, when baskets are smaller, the effect might also be smaller than it would be in another month.

Ton:

Yeah. There’s a big discussion on how much time are you allowed to take for a winning implementation to add value to the bottom line. Is it like 6 months, 1 year, 2 years? And what we’ve learned, is that it differs per experiment. If it’s like a usability optimization, if you’re taking away hurdles from your website, then it can really help for a long time and will add value for a long time.

If it’s a specific consumer psychology-facing optimization that’s putting you in a different perspective than the competitor, then maybe it only works for 3 months, because the competitor will copy this, and it will not be an advantage anymore. So it really depends per experiment how long it will last. But going back to the question of false discovery rates: yes, among all the significant results that you’re implementing, there will be some results that will not add any value to the bottom line because they are false positives.

Is that a problem? No, it’s not, because they’re probably not hurting your business. And the biggest problem is in having false negatives, where you had a good idea but you were not able to prove this in an experiment. I think that’s a bigger problem.

Of course, false discovery rates need to be taken into consideration when calculating the amount of money in your program. But if you’re in a company that really believes in test and learn experimentation, this is not something you do anymore because everything is being tested. Doesn’t make sense to ship anything without validating it.

Jan:

I actually just saw a question coming in from Dimitra, which is really interesting. Does it make sense to lower the traffic used in a particular experiment if we are seeing a negative uplift in the first days of the experiment? So we see that our hypothesis seems to be wrong. It’s going in the wrong direction. Panic kicks in. The boss shouts out.

Ton:

Either you continue or you stop your experiment. The issue is Simpson’s paradox, as it’s called in statistics. You can look this up on Wikipedia. With Simpson’s paradox, if you shift traffic during the experiment, then you will have all sorts of statistical issues with that experiment. So if you really see that the experiment has a really low chance of becoming a significant positive outcome anymore, then just stop the experiment, because it’s hurting your business. And maybe something’s wrong, maybe something is broken, or there’s a bug; that can also be an issue. But you can calculate the chance of that experiment still becoming a positive outcome along the way. It’s like sequential testing. So if it’s really hurting, then don’t lower the traffic. Just stop the experiment. Go back to the drawing table.
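A small worked example shows why shifting traffic mid-experiment can mislead, as Ton warns. The numbers below are made up for illustration: the variant converts better than control in both weeks, yet looks worse once the weeks are pooled, because its traffic was lowered just before a week with a higher overall conversion rate.

```python
# Made-up numbers: (visitors, conversions) per arm and per week.
WEEKS = {
    "week 1, 50/50 split": {"control": (1000, 50), "variant": (1000, 60)},
    "week 2, variant traffic lowered": {"control": (1800, 180), "variant": (200, 22)},
}


def rate(visitors, conversions):
    """Conversion rate for one arm."""
    return conversions / visitors


def pooled_rate(arm):
    """Conversion rate of one arm with both weeks added together."""
    visitors = sum(week[arm][0] for week in WEEKS.values())
    conversions = sum(week[arm][1] for week in WEEKS.values())
    return rate(visitors, conversions)


if __name__ == "__main__":
    for name, arms in WEEKS.items():
        print(name, {arm: f"{rate(*counts):.1%}" for arm, counts in arms.items()})
    print("pooled", {arm: f"{pooled_rate(arm):.1%}" for arm in ("control", "variant")})
```

Per week, the variant leads (6.0% vs 5.0%, then 11.0% vs 10.0%), but the pooled rates come out at about 6.8% for the variant against about 8.2% for control. Keeping the split fixed, or stopping the experiment outright, avoids this reversal.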

Florentien:

Well, if the amount of traffic that you’ve collected is way below the sample size you’ve calculated, then the effect might also change over time, because effects can fluctuate when your sample hasn’t reached the size you calculated. So I think it depends on how much traffic you already have in your test and on what day you are running your experiments. I think if you see a major change, then you should really be alerted and stop your experiment. But if it’s just, like, a small negative effect, then just wait for at least 1 week. So you have, like, at least 1 business week of data.

And then see if you already reached your sample size. But don’t look at your results too often, because you will have, like, the peeking effect. So I think if you decide to change your traffic, then I would recommend stopping, and otherwise just wait until you reach the sample size.

Ton:

And this is why you wanna do sequential analysis. So if you know upfront that your experiment will run for 4 weeks, you can already take into consideration that you wanna peek after 1 week, 2 weeks, and 3 weeks. Of course, for bug tracking and sample ratio mismatch you will be checking continuously, but if you take specific weeks for the sequential approach, then after 1 week, if it’s really significant, like 99%, then you can stop, even for a winning experiment, in the sequential approach. But I fully agree with Florentien: don’t be worried, like, in the first 1 or 2 days, unless, like, there are zero transactions coming in; then you probably have a technological issue. And, of course, then you should.
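The idea of planning the peeks upfront can be sketched as follows. This is a deliberately simple stand-in for a real sequential design: it splits the overall significance level evenly across the planned looks (a Bonferroni correction), which is more conservative than the alpha-spending boundaries used in proper group-sequential testing. The function names and the weekly numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_two_proportions(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic; positive means variant B is ahead."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def sequential_decision(looks, planned_looks=4, alpha=0.05):
    """Check cumulative results at each planned look.

    looks: list of (conv_control, n_control, conv_variant, n_variant),
           cumulative totals at the end of each week so far.
    Returns (week, decision).
    """
    # Each look only gets a slice of alpha, so peeking does not inflate
    # the overall false positive rate.
    z_crit = NormalDist().inv_cdf(1 - (alpha / planned_looks) / 2)
    for week, (conv_a, n_a, conv_b, n_b) in enumerate(looks, start=1):
        z = z_two_proportions(conv_a, n_a, conv_b, n_b)
        if abs(z) >= z_crit:
            return week, ("stop: variant wins" if z > 0 else "stop: variant loses")
    return len(looks), "continue"


if __name__ == "__main__":
    print(sequential_decision([(100, 2000, 160, 2000)]))
    print(sequential_decision([(100, 2000, 102, 2000), (205, 4000, 210, 4000)]))
```

With four planned looks, each look here needs a much stricter threshold than a single 95% test, which is in the spirit of Ton’s remark that only a really significant result after 1 week justifies stopping early.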

Jan:

Yeah. It probably depends on the importance of the test, if it’s running on your own page and every user is affected and so on. And if it’s a test that is quite revolutionary, where you weren’t so confident in the first place, then eventually you might consider that. But normally, whenever we see that in a running program, typically we encourage you to keep going for a week or something like that, rather than look at 24 conversions and then start thinking that your hypothesis must be totally wrong.

Okay. There’s one more, well, a couple of more that have been coming in. How would you suggest predicting the contribution of experimentation to revenue when multiple tests are being run on the same site? So what is the attribution of different experiments? Are we doing this in such a way? So how would you suggest predicting the contribution of experimentation to revenue when multiple tests are being run on the same property?

Florentien:

You can still tell from this. Yeah. The effect of each experiment. Right?

Ton:

Some experiments will have a positive significant outcome and some will be inconclusive. Yeah. And the winning ones will add value unless they are false positives, but you cannot calculate the value for one experiment. You have to look at the whole group of experiments. It really sounds like a company that’s still a bit immature in experimentation and wants to understand what’s adding value. In the end, you will test everything.

And it’s like if you create a new medicine, you wanna test this first before you ship it to potential users. Otherwise, you may ask yourself if a lot of people will buy.

You know, if you’re giving products without testing, maybe your clients, your customers will die. So don’t do that.

Jan:

Oh, Ton, you brought up the keyword “die”. You just mentioned it. So there’s a question for Ton, somebody is asking: you once said conversion rate optimization jobs will die. I’m really shocked, Ton. What makes you say that, and when exactly will this happen? Because hopefully not soon.

Ton:

I believe it’s already happening. Florentien, are there still CRO specialists within Albert Heijn? Are you all experimentation consultants?

Florentien:

No. We have CRO specialists, but we are growing into an organization where we do CRO consulting. So where the product teams are testing, and other parts of the company as well, but I really think that it should be consulted by a CRO consultant. So that’s something we’re kind of shifting and moving on to.

Ton:

This “CRO jobs will die” was part of a keynote presentation I once did in Austin at CXL Live. If you look at the growth of experimentation or conversion optimization in companies, it starts with one person running some experiments, doing some CRO. It becomes a whole team of people being the CRO specialists, but then if you wanna scale up, you wanna have every marketing team and product team run experiments and base their decisions on validated data. So then you need to build a center of excellence that’s making it possible for people to run experiments, and the product teams will run experiments. But in this product team, there will be a copywriter, a designer, and a developer, but not a specific person with the task of CRO.

This is what they do. They optimize stuff. They wanna have better outcomes. So then that specific specialist role will disappear because you have this center of excellence for experimentation that’s helping all the teams. And, of course, if you need help with consumer agnostic, it will not be your CRO specialist. It will be a consumer behavior or a consumer experience specialist who is helping a specific team to get better results. So then it becomes obsolete that there’s no reason for a CRO specialist anymore because the whole company is having this approach.

Jan:

Great. We’ve still one minute to go. There’s one last question coming in. What is the best experimentation platform in the world? No.

I’m just kidding. I mean, we could ask that, but the answer is so obvious, right? So we’re not gonna discuss that now. I just wanted to say thank you very much, Florentien. And thank you, Ton, for these insights. I think it really inspired people to either kick off or go deeper and scale up experimentation. There are so many things that are to be done. And, getting back to what you said initially, Florentien, as long as it’s a fun thing and there’s a little bit of gamification and there’s a reward and so on, it can really bring a lot of positivity into an organization.

That is also what I take away from the spirit that I saw when we met at Conversion Hotel in November last year: there is a lot of fun in it. There is a lot of success in it. There is a lot of measurable success in it, and that is what makes us all enjoy our not-so-soon-to-be-dying jobs as experimentation consultants. Thanks again for being with us. I hope to have you back, and for all the people in the audience, please, if you have further questions that you really need help with, drop us an email. You’re going to get the recording of the session anyhow. So just answer with an email to us, and we will be happy to share this with Florentien and Ton to find you the right answers. Thank you guys. Have a wonderful Tuesday.

And hope to see you soon again.

Florentien:

Yes. Thanks for having me.
