Shop, but Don’t Drop: How ZALORA Turned Churn Around With A/B Testing
Zalora's data-driven anti-churn strategy: May Chin reveals how micro-segmented flash sales transformed user engagement and boosted conversion rates by 16%.
Summary
May Chin, Head of Product at Zalora, shared insights on tackling customer churn using hyper-personalization during ConvEx 2024. She outlined Zalora's churn definition—users inactive for 12 months—and revealed that 39% of their user base met this criterion annually. Research identified key pain points: an overwhelming number of generic campaigns, complex voucher mechanics, and lack of personalization. To address these issues, Zalora adopted a multi-bucket churn segmentation, targeting users at different risk levels with tailored approaches.
A notable solution was the micro-segmented flash sale, a time-sensitive, personalized discount mechanism improving customer retention while maintaining profitability. A/B testing confirmed a 16% conversion rate uplift. May emphasized the importance of addressing real user pain points, leveraging A/B testing, and fostering stakeholder collaboration.
Key Takeaways
- Hyper-personalization drives retention.
- A/B testing is essential for data-driven decisions.
- Segmented targeting maximizes ROI.
Transcript
NOTE: This is a raw transcript and contains grammatical errors. The curated transcript will be uploaded soon.
Oliver Engelbrecht: All right. Welcome everyone to another session here at ConvEx 2024. Uh, my name is Ollie, and I'm responsible for the DACH region here at VWO.
And it's my pleasure to introduce to you our guest for this session: May from Zalora. She is the head of product, and she has brought us a fantastic case about hyper-personalization. May, it's such a pleasure to have you here.
How are you?
May Chin: I’m doing good. Thank you so much for having me.
Oliver Engelbrecht: Wonderful. So I’ve already teased a tiny bit about what your presentation is going to be all about. Do you want to give us just a very quick idea, um, about the case that you’re about to share with us?
May Chin: Sure. So what I'll be sharing with you guys today is really more of a story than anything else, but it is a real-life story, and it's a story about how we at Zalora solved a fundamental churn problem that was significantly eating away at our top line. But more on that later. I'll be walking you guys through the exact chapters we went through in this story and how we ultimately arrived at a solution to it through A/B testing.
Oliver Engelbrecht: That’s wonderful. And that’s a really, really exciting case. I was already able to have a little peek at it. So without further ado, the stage is yours, May.
Um, please share your insights.
May Chin: Awesome. So thank you once again for having me, and I really hope you will find what I have to share today genuinely insightful. As I briefly alluded to earlier, the topic I'll be speaking on today, as you can tell from the rather click-baity title, is how we at Zalora identified and solved a fundamental churn problem that was really depleting our top line.
And essentially what this meant was that we were losing more and more active shoppers every single day, and it was only through in-depth user research and thoughtful A/B testing that we were able to chip away at this problem. But more on that later. First, I would like to quickly introduce myself.
So as Ollie already mentioned, my name is May. I am the head of product growth and analytics at Zalora, and throughout my entire career I have strategized growth for multi-million-user products across a variety of different industries, such as EdTech, HealthTech, and of course, right now, e-commerce.
I then joined my current company around four years ago to help launch a new experimentation and analytics business unit from scratch. A little bit about Zalora as well, in case some of you are not familiar: Zalora is Southeast Asia's largest fashion e-commerce platform, with over 12 million monthly active users, and we house many different products across many different brands and categories, such as fashion, shoes, accessories, sports, luxury, et cetera.
We are also a proud member of the Germany-based Global Fashion Group, which accrues around 1 billion euros of revenue and 42 million orders in a year, to give you an idea of the level of scale we operate at. So now that we've gotten our introductions out of the way, we can move on to the more interesting part, which is the story of churn that I promised to share with you earlier. And this will be a story I tell in four parts. First, I will be explaining the churn problem scenario that we were grappling with.
Second, I will be explaining the research methodologies we applied in order to uncover the root causes of this problem. Third, I will be sharing with you how these identified pain points were then mapped to viable solutions that were administered via A/B tests. And finally, every story needs a moral, so I will be closing off with that too. So first of all, the problem scenario.
But before that, I'd first like to concretize this problem a little bit more by clearly articulating what exactly I mean when I say churn. And this is, of course, a definition that is highly specific to Zalora: for us, churn is any user who has not made a purchase for more than 12 consecutive months.
They would then be considered as lost to us. And only then would stringent recovery treatments be given to them in the hopes of changing their minds. Now, you might already be thinking that this is a somewhat problematic approach, which is too much of a lagging indicator. And I actually do agree.
But we will get back to that later. So now that we know how Zalora specifically defines churn, we can size the problem a little bit more. And to quickly visualize for you the sheer enormity of this problem: this is what our typical user base looks like at the end of every year.
We would have around 39 percent of our user base meeting the churn criteria I described earlier, and only the remaining 61 percent still active as shoppers. Now, isn't that crazy? And this to me is really emblematic of the heightened competition in the Southeast Asian e-commerce space, as well as our failure to meet our users' needs, wants, and desires, and therefore our failure to retain them effectively. And somehow it gets worse.
Looking further into our quantitative data, we can see that a majority of users who churned did so after only two orders, and it usually takes around 30 days or less for the majority of users to make those two orders and then never purchase with us again. So this means that the decision to churn actually occurs extremely quickly. Okay, so now that we know that most users generally perform two orders before churning, and that the decision to stop purchasing occurs in 30 days or less, we then asked ourselves: do these users still come back to browse on Zalora after those 30 days, even if they are not actually purchasing anything?
And what we found was, sorry to say, somewhat disheartening. What we found was that after those 30 days, most of them stopped visiting Zalora forever, in just a week and a half. This means that most of our users become unreachable extremely quickly after their last order with us, which greatly limits the opportunities we have to change their minds.
Not all is lost, though. Although the dismal situation I described applies to the majority of users, there is still a significant long tail: namely, a long tail of users who stick around for quite a bit to browse on Zalora, even though they are not actively purchasing. To be exact, we found that 57 percent of churned users still have active sessions with us for up to six months, and 34 percent for up to a year. And so this presents an opportunity to target these long-tail users with creative recovery features.
Now, what does all of this mean in aggregate? First of all, it is screamingly obvious that our current business definition of churn is too much of a lagging indicator. We should not be waiting for 12 months of no purchases before taking serious action, and we should definitely be raising the alarm about the need for user recovery much sooner than 12 months.
Second of all, this also tells us that the decision to churn occurs extremely quickly, much more quickly than we had initially anticipated. In fact, the numbers show that this can occur in just 7.5 days post the user's last order.
This means that it will be especially challenging for us to build predictive measures of churn, as there is not a significant divergence in behavior between soon-to-be-churned users and still-active users up until the point of the user actually churning, which of course makes predictive modeling more challenging. Finally, and importantly, as it is somewhat of a light at the end of the tunnel: there is still a significant long tail of users who have stopped purchasing but still have relatively frequent browsing sessions on Zalora. So we can still reach these users somehow, in the hopes of changing their minds.
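To make the kind of analysis behind these numbers concrete, here is a minimal pandas sketch of how churn timing could be derived from a raw orders table. The table, column names, cutoff date, and toy data are illustrative assumptions, not Zalora's actual schema or pipeline:

```python
import pandas as pd

# Hypothetical orders table; schema and data are invented for illustration.
orders = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 2, 3],
    "order_ts": pd.to_datetime([
        "2023-01-02", "2023-01-20",                 # user 1: 2 orders, 18 days apart
        "2023-02-01", "2023-02-10", "2023-03-01",   # user 2: 3 orders
        "2023-04-15",                               # user 3: 1 order
    ]),
})
as_of = pd.Timestamp("2024-02-01")  # date the analysis is run

per_user = (
    orders.groupby("user_id")["order_ts"]
          .agg(n_orders="count", first_order="min", last_order="max")
)
per_user["days_active"] = (per_user["last_order"] - per_user["first_order"]).dt.days
per_user["days_since_last"] = (as_of - per_user["last_order"]).dt.days

# "Churned" under the 12-month definition described in the talk.
churned = per_user[per_user["days_since_last"] > 365]

print("Churn rate:", round(len(churned) / len(per_user), 2))
print("Median orders before churn:", churned["n_orders"].median())
print("Median days between first and last order:", churned["days_active"].median())
```

The same per-user frame, joined against session logs, would support the browsing long-tail analysis as well.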
So this brings us to the research chapter of my story. Now, better equipped with a deeper understanding of the problem scenario, we were ready to do some root cause research. The approach we took was a fairly simple one, starting with the identification of millions of eligible users who were previously active shoppers but had now gone without an order for several months.
However, they were still browsing somewhat actively. This is essentially a representation of the long-tail users I mentioned earlier. We also took care to have an adequate representation of these users in direct proportion to our geographic presence in our six key markets in Asia. We were then able to get thousands of responses from these users to a series of in-depth perception-evolution questions, such as: when you made your last order on Zalora, why did you do so?
And does that pull factor still exist in the present day? Why or why not? Or: what was your favorite thing about Zalora when you were still shopping actively? Does this favorite thing still exist today?
Why or why not? Et cetera. And finally, from the tons and tons of qualitative pain points we were able to extract, we identified commonalities, grouped them accordingly, and prioritized them according to the perceived intensity of each pain point. And in summary, what we heard from these data points was a resounding cry for help.
Feedback from our users such as: there are way too many campaigns happening, and it's overwhelming. These campaigns feel way too generic, and they don't feel exclusive. Even when they do find a campaign that they like, the voucher code application at cart is just too confusing, to the point where they get overwhelmed again.
And they also don’t know how to apply all of their multiple desired voucher codes to arrive at the best deal combination, et cetera. And already from this feedback, we can see three clear points shining through. First was that we had way too many campaigns and promotions running at any given time. Who would have ever thought that having too many campaigns could be a bad thing?
But the research doesn't lie, as our users seemed to find the sheer volume of them very overwhelming, requiring too much mental effort to comprehend. Next was that even for those users who were savvy enough to sift through our many campaigns and find something they might like, there was still a perception of genericness, a feeling of "every user is seeing the same campaign anyway, so how is this special?" Or rather, a feeling of a lack of individual exclusivity. And finally, our users felt that our voucher codes themselves were too complex to understand. With a large volume of campaigns running at any given time, users would need to combine multiple voucher codes in certain permutations in order to arrive at the best possible discount. And the reality is very few actually have the mental space or savviness to do this, so they usually give up, drop off, chalk it up as just another missed deal, and call it a day. And now, with a clear articulation of the pain points our users were feeling, we were able to more confidently map this to a tangible solution.
First off, even before thinking of any user-facing features, we had to fix the fundamentals, which in this case was the very foundational approach we use to define churn at Zalora. As I mentioned before, we define churn as those who have gone for 12 consecutive months without placing an order, and this is clearly too much of a lagging definition, considering how quickly users end up lapsing once their last order has been made. So what we did was take a more segmented and actionable approach.
Even in the short term, we first identified a segment of users who had previously made two orders, as this indicates some prior level of activeness on our platform. We then divided this segment of users into three different buckets of varying churn risk. The first bucket would be users who have not purchased for 3–5 months since their last order. The second bucket would be users who have not purchased for 6–8 months since their last order.
And the final bucket would be those who have not purchased for 9–11 months. The idea is that as the buckets progress, users come closer and closer to being lost completely. And this approach was very suitable as an MVP. First of all, it is simple, easily explained, and could be built very quickly, whilst we worked on more complex predictive measures of churn in parallel. Second, this approach also gave us the benefit of fine-tuned control over the corresponding user experiences that made use of the segmentation logic. For example, bucket 1 users are showing initial signs of lapsing, but we would still be able to hold back with less aggressive discounts for them, as these users might not yet need so strong a push to be recovered.
So we'd be able to more closely fine-tune for guardrails such as profitability per user, to ensure our unit economics were not unnecessarily harmed. Third, this approach also had the benefit of being tightly coupled with our existing company-wide reporting of churn, which considers users to be churned after 12 months. For example, we know that the impact of any treatments given to bucket 3 users would only be observable 30 to 90 days after the fact, because that is when these users would have otherwise churned in our existing reporting. And so, with these foundational improvements in how we think of churn ready to go, we were now ready to think about features for the end-user experience.
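Before moving on to the features, here is what the bucket logic just described might look like as a minimal sketch. The function, field names, and month arithmetic are illustrative assumptions, not Zalora's actual code:

```python
from datetime import date
from typing import Optional

def churn_bucket(last_order: date, today: date, n_orders: int) -> Optional[str]:
    """Assign a churn-risk bucket per the segmentation described above.

    Only users with at least two prior orders are eligible; buckets are
    keyed off whole months since the last order.
    """
    if n_orders < 2:
        return None  # not enough prior activeness to qualify for recovery
    months_inactive = (today.year - last_order.year) * 12 + (today.month - last_order.month)
    if 3 <= months_inactive <= 5:
        return "bucket_1"  # early lapsing: mildest treatment
    if 6 <= months_inactive <= 8:
        return "bucket_2"  # medium risk: stronger discount
    if 9 <= months_inactive <= 11:
        return "bucket_3"  # near-churn: strongest discount
    return None  # still active (<3 months) or already churned (12+ months)

print(churn_bucket(date(2024, 1, 10), date(2024, 8, 1), n_orders=3))  # -> bucket_2
```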
We came up with around 17 new ideas overall, all of which are currently in varying stages of launch as I speak, but that's a story for another day. The one I'll be sharing with you today is our micro-segmented flash sale, and I think it's a really interesting idea to share, as it was designed to directly address the churn pain points I described earlier. So what is a micro-segmented flash sale?
It is essentially a feature which shows users a time-sensitive special discount for only 72 hours, and it would be triggered at a unique time of month, time of week, and time of day for every single user, so it would be impossible to predict when it would next occur for you.
With this flash sale, users get to enjoy a simple, no-frills, X-percent-off stackable voucher code, which really helps to reduce the mental effort required to understand how exactly to make use of the voucher. The intensity of the X percent discount given by this voucher also gradually increases across the buckets of churn risk that I shared with you earlier. Importantly, this voucher code would only be applicable to items already in the user's cart and wishlist, as well as to algorithmically recommended products that pair with these cart and wishlist items, to encourage outfit building.
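As a rough sketch of how such a per-user trigger and bucket-based discount ladder might be implemented: hashing the user id and month gives each user a stable but effectively unpredictable 72-hour window, matching the behavior described here. The hashing scheme, discount percentages, and names are assumptions for illustration; the talk does not specify the actual mechanism:

```python
import hashlib
from datetime import datetime, timedelta

# Illustrative discount ladder: deeper discounts for higher churn risk.
DISCOUNT_BY_BUCKET = {"bucket_1": 0.10, "bucket_2": 0.15, "bucket_3": 0.20}

def flash_sale_window(user_id: int, month_start: datetime) -> tuple[datetime, datetime]:
    """Pick a 72-hour window at a per-user pseudo-random offset within the month."""
    digest = hashlib.sha256(f"{user_id}:{month_start:%Y-%m}".encode()).digest()
    offset_hours = int.from_bytes(digest[:4], "big") % (27 * 24)  # leave room for 72h
    start = month_start + timedelta(hours=offset_hours)
    return start, start + timedelta(hours=72)

def flash_sale_discount(user_id: int, bucket: str, now: datetime) -> float:
    """Return the user's discount if their flash sale window is live, else 0."""
    month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    start, end = flash_sale_window(user_id, month_start)
    return DISCOUNT_BY_BUCKET.get(bucket, 0.0) if start <= now < end else 0.0

# e.g. a hypothetical bucket_3 user checking at some moment in August 2024:
print(flash_sale_discount(42, "bucket_3", datetime(2024, 8, 14, 20, 0)))
```

In the real feature, the voucher would additionally be scoped to the user's cart, wishlist, and recommended pairings, which this sketch omits.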
And hopefully you can already see how powerful and beneficial this feature is, as it was really designed specifically to shut down the pain points observed in our research. Namely, this is an extremely time-sensitive feature, so users feel a strong desire to act now, in that moment. It's also, as I mentioned before, a very simple campaign mechanism, entailing just a single voucher code which stacks, so that users don't have to worry about complex voucher permutations.
There was also no unnecessary harm to profitability per user, as higher discounts were only given to those who truly needed them, the users showing a higher risk of churn, so we'd be able to control for profits as a guardrail more effectively. And you can already see how this initiative was highly personalized at the individual level, to what users already have in their carts and wishlists, as well as to algorithmically recommended items that users can use to build outfits. So already you can see how this feature really aims to directly address the observed pain points. And here you can also see a quick visual of how this flash sale feature was embedded and communicated consistently throughout the entire shopping funnel. The goal was really to make it feel more like an app takeover and less like an isolated campaign artificially latched on to the user experience. So this, in my opinion, made for a much more cohesive and complete journey.
As any disciplined decision-maker would do, we didn't take our research and ideas at face value, and took the effort to launch this flash sale as a controlled A/B test across our three churn risk buckets. How we structured it was as follows: obviously, we had a control group which saw no inkling of this flash sale feature at all.
They would simply see our normal app and the usual ongoing campaigns. We then, of course, had a variation group which saw the flash sale as the very first thing on the homepage, alongside the other BAU ongoing campaigns, and the churn-bucket user base was split equally, 50/50, across these control and variation experiences. So we launched this A/B test, and already in the first week we identified a single data point which caused quite a bit of concern internally.
Namely, that only 2 percent of users were choosing to apply the special flash sale voucher code at checkout. This behavior was interpreted by some to mean that the feature was ultimately a failure and that we should shut it down immediately, because it was clear that users' purchase intent was not being shifted in a meaningful way by this flash sale. However, those of us in the A/B testing world know that we should never pause an experiment prematurely. We should let it run and let the final results speak for themselves.
So let it run we did. And after a couple of weeks, we were able to arrive at a statistically significant conclusion. What we saw was that although the flash sale voucher code usage remained low, this ultimately didn't matter, because we still saw a net positive outcome in terms of the final conversion rate: a predicted 16 percent higher conversion rate, to be exact.
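For readers curious what reaching a statistically significant conclusion looks like mechanically, here is a minimal two-proportion z-test sketch. The traffic and conversion numbers are invented to mimic roughly a 16 percent relative lift on a 4 percent baseline; this is not Zalora's actual data or tooling:

```python
from math import sqrt, erfc

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))                  # two-sided p-value
    return z, p_value

# Invented numbers: control 4.00% vs variation 4.64% (a ~16% relative lift).
z, p = two_proportion_z_test(conv_a=2000, n_a=50_000, conv_b=2320, n_b=50_000)
print(f"z = {z:.2f}, p = {p:.6f}")  # p well below 0.05 at this traffic level
```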
And this enabled us to infer that the low voucher usage was a reflection of our own risk tolerance: the discount percentage we had set for the flash sale voucher code was simply too low, and therefore not attractive enough, so users would still end up discovering more attractive vouchers at checkout.
However, ultimately, as I mentioned, this didn't really matter in the grand scheme of things, because the flash sale was still highly effective in catching users' initial attention and garnering that initial purchase intent in the first place. And without an A/B test, we would never have been able to draw this conclusion, and would have instead deemed the idea a failure. Instead, thanks to having A/B testing in our arsenal, we were able to draw the correct conclusion that this feature was actually a success, with a net positive outcome for us. So we are finally at the end of my story. And as I mentioned, every story has a moral to it, so that's what I'll be closing off with today.
So moral number one is that personalization at the individual level is now table stakes. It is not enough to launch a series of generic, same-for-all campaigns and call it a day. Users are expecting curated experiences now more than ever, and not providing them will harm your retention curves at some point, as users move to competitors who do provide tailored experiences. Second is that every single feature you launch needs to address a pain point which actually exists, instead of just being a feature for the sake of being a feature. You will need to meticulously build upon user feedback, or as I like to call it, cries for help,
as a key input. And remember, it's often the long-tail users who have trouble being heard, as most companies tend to average out numbers, behaviors, and feedback, meaning that key outlier behaviors, which often have the highest potential of moving the needle, get lost. Delivering based on averages leads to average experiences. Finally, remember that it is only through A/B testing that you are able to assess the validity of any given idea with clear eyes. Even in the story that I shared, had we not leveraged A/B testing, we would have been distracted by extraneous variables, such as the low voucher code usage, and would not have been able to identify the power of this feature as a strong purchase intent driver.
So that's it for me today, guys. Once again, thank you so much for having me here, and I hope you found some value in this story as you embark on your own A/B testing journeys. Thank you so much.
Oliver Engelbrecht: Wonderful, May. Thank you so much for these insights. I think it's a really great case of a team like yours looking at the data and drawing really smart conclusions from it. Of course, with everything you've shared, there are a couple of questions that came up that I would love to hear from you about.
Number one is: you mentioned that you already have quite a backlog of things that you are planning to do on this topic in the next couple of months. Um, do you have a favorite approach that you can already talk about, of what's going to come next?
May Chin: Sure. So what is coming next is always a balancing act between a few deciding factors, the first of which is, of course, the projected impact, which is very obvious, um, or the projected minimum detectable effect for each feature. And what we are really controlling for here is to ensure that the projected uplift is more than enough to account for the effort and technical cost of building the feature. So that's one variable we always look at. The other is, of course, I think, more of a subjective one: we also tend to prioritize ideas more highly if the pain point corresponding to that idea was overrepresented in our user research.
So: are users feeling a particular friction point or pain point disproportionately? And do we have any ideas in our roadmap that we are confident could alleviate this pain? This is more of a subjective variable, but I think one that's important to consider as well.
So we have around, I think, 10 to 15 more ideas set to be delivered by end of year. Um, so I'm really excited to see how those perform.
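As context for the minimum-detectable-effect criterion May mentions, here is a textbook sample-size sketch showing why a feature with a small projected effect demands far more traffic, and therefore more cost, to validate. The baseline rate and lifts are invented, and this is the standard normal-approximation formula, not Zalora's internal calculator:

```python
from math import ceil, sqrt

def sample_size_per_arm(base_rate: float, relative_mde: float,
                        z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Approximate users per arm to detect a relative lift `relative_mde`
    on a baseline conversion rate, at ~5% significance and ~80% power."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2
    return ceil(n)

# Detecting a 5% relative lift on a 4% baseline takes roughly 10x the traffic
# of detecting a 16% lift, which is why projected MDE drives prioritization.
print(sample_size_per_arm(0.04, 0.05))  # ~154,000 users per arm
print(sample_size_per_arm(0.04, 0.16))  # ~16,000 users per arm
```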
Oliver Engelbrecht: Okay. If you say until end of year, I can imagine there are some Christmas-heavy ideas in there as well, probably. Um, so when you were implementing all these ideas and changes, I feel like you would have had to involve quite a bunch of stakeholders and convince some of them of certain things. Because, for one, you were kind of changing the definition of what churn looks like, at least for your team and your purposes.
Um, how did you do that? And who did you have to involve on this to be able to go ahead?
May Chin: So in very simple terms, we had to pick our battles. This churn definition was definitely a huge point of contention, as we apply the same definition not just in our company, but in all of our sister companies as well as our parent company. So the way that we positioned it was to frame it as: we're not really changing the definition per se; what we are giving you is a parallel leading indicator to the official definition.
What we really took care of was to make sure that we communicated in a way where it didn't seem like we were trying to change everything just for the sake of it, but instead framed it as more of a supporting, um, enhancement to the existing reporting. In terms of stakeholder collaboration, I'm actually really proud to share that this flash sale idea and the other, you know, tens of ideas that we have upcoming were all co-created with our business stakeholders, such as those from our revenue teams, marketing teams, and CRM teams.
All of these ideas were actually born from collaborative workshops. So to even attribute this idea to a specific individual, I wouldn't be able to tell you who came up with it, because we literally had 10 or 15 people, um, sharing their ideas collaboratively on a call. And this, I think, made the justification and buy-in a lot easier, because everyone already felt like a co-creator from day one. So this really helped things move along much faster as well.
Oliver Engelbrecht: That's a fantastic approach. And I would imagine that also helps because you would have to get approval for doing flash sales in the first place, because of course, at some stages at least, it does impact the bottom line, right? So if all those teams have been involved, that would be an easy sign-off. Perfect. Um, what I would also be really interested in: you are testing very deep in the funnel at the moment, uh, with people who have already purchased. Do you think that these kinds of hyper-personalization and testing ideas could also work much earlier in the funnel? Like, as early as someone who keeps coming to your page, you can see them coming back, but they simply haven't purchased for the first time yet.
And you want to nudge them towards that first purchase.
May Chin: Yeah. So the short answer to that is absolutely. And I think you're kind of calling me out a little bit, because the reason why we predominantly focus on bottom-of-funnel personalization is that it's simply easier. When it comes to the top of the funnel, especially with new user acquisition, you inherently deal with a cold-start problem, where personalization is almost impossible to do because you know very little about the user coming onto your platform. So you would have to leverage more complex mechanisms, such as lookalike audiences based on, um, similar behaviors within your existing user base, and also multi-armed bandits, where you might not know what a new user wants to see, but the bandit will help you find that out over time. It's definitely more complex, though, and it's very much, um, very much an area we want to devote a lot of energy to up till end of year.
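For the multi-armed bandit idea May mentions for cold-start users, here is a minimal Beta-Bernoulli Thompson sampling sketch. The homepage variants and reward signal are hypothetical; this illustrates the general technique, not Zalora's implementation:

```python
import random

class ThompsonBandit:
    """Minimal Beta-Bernoulli Thompson sampling over content variants.

    For a cold-start user there is no history, so every variant starts at a
    uniform Beta(1, 1) prior; observed conversions shift the posteriors.
    """
    def __init__(self, arms):
        self.alpha = {arm: 1.0 for arm in arms}  # successes + 1
        self.beta = {arm: 1.0 for arm in arms}   # failures + 1

    def choose(self) -> str:
        # Sample a plausible conversion rate per arm and pick the best draw;
        # uncertain arms keep getting explored, good arms get exploited.
        return max(self.alpha,
                   key=lambda arm: random.betavariate(self.alpha[arm], self.beta[arm]))

    def update(self, arm: str, converted: bool) -> None:
        if converted:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0

# Hypothetical homepage modules competing for a brand-new user's attention.
bandit = ThompsonBandit(["new_arrivals", "best_sellers", "flash_sale"])
arm = bandit.choose()                # which module to show this session
bandit.update(arm, converted=False)  # feed the observed outcome back in
```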
Oliver Engelbrecht: Sounds great. I mean, looking at your experimentation program as a whole, um, what you’re doing here is something that’s very, very advanced in my opinion, looking at all the online shops out there. Um, how did you get to this point? Um, what was the journey really of building an experimentation program that can do things like that, especially involving all these different teams without creating additional friction?
May Chin: Yep, so we pretty much started our A/B testing program from scratch around four years ago. We were running some level of A/B testing back then, but it was usually on a fragmented, ad hoc basis. And so when I first joined the company, I was the first hire in this product growth unit and had to build up the entire program from scratch. The approach I took back then was, once again, to really pick my battles and to focus on the lowest-hanging fruit: ideas that could be validated and tested extremely quickly, so that our insight generation throughput was maximized as much as possible.
And this achieved a few things. First of all, it helped to justify further investment in my team very quickly. And second of all, it also helped to very quickly answer burning company questions that had been thrown around for years with no one providing answers to them. So that was, I think, a really good, high-leverage way to get started.
So from there, of course, as we gradually built the legitimacy of this team, we were able to take on slightly more complex, bigger problems. Um, so the complexity of what we were working on grew in direct proportion to our team size. And if we fast forward to the present day, we now launch around 100 to 200 experiments every single year. A quick fun fact I want to share is that our velocity has slowed down by quite a bit compared to last year.
It has actually halved compared to a year ago, and this was a very conscious decision on our part. Back then we were optimizing purely for velocity, and now we're instead optimizing for our A/B testing success rate, which means a lot more upfront work needs to be put into user research and quantitative data gathering. This can slow down your pace, but it should improve the quality of your A/B testing inputs, which I think is a natural progression for any company as their experimentation program matures.
Oliver Engelbrecht: That makes perfect sense. Absolutely. Just one more question on that, because it's something you've mentioned repeatedly, and I really love this approach of picking your battles. What is your personal, I don't know, matrix or approach to figuring out which battles are worth picking?
May Chin: So first off, um, one very simple guiding principle that I always have in mind is to not attempt to change the equivalent of the laws of physics in your company. There are certain immovable variables that, no matter how much justification you bring, simply will not change. So it's more a matter of trying to work your way around them instead of trying to take them head on, which I think can often be very tempting to do. Even now, I still need to constantly remind myself of this. The other deciding variable that I keep in mind when it comes to these blockers and internal friction points is to see whether the other departments or teams would genuinely benefit from my proposed suggestion.
If my proposal would only benefit my own team, it makes it a little bit harder to justify, and it also comes across as a little bit selfish in the grand scheme of things. At the end of the day, all teams have the same goal in mind, which is to help the company succeed. So if maybe I lose this time round and you win, that's completely fine, because we are still working towards the same bigger picture.
So it’s not to say I’m able to approach it perfectly every single time, but this is the general thought process I try to go through.
Oliver Engelbrecht: Sounds great. Thank you. That was really quotable; that's one we should put on a little LinkedIn image, uh, after this.
Um, perfect. So there's just one final question that we ask in every single session that we do here, and that is a recommendation of a book that's really inspired you lately. Um, it doesn't have to be work-related, it can be work-related, of course, but is there something that you've read lately that you would love to share?
May Chin: Yeah. So two come to mind, and I'm going to start with the more boring one. The first one is a very short ebook called The Mom Test. I've been reading it since the start of my career.
It essentially tells you how to position and frame your user interviews in a way that gets actual, honest feedback from them, and I found it very helpful. Um, I first read this book close to 10 years ago, actually. The second book, which I think is slightly more interesting, is, um, by an author named Ted Chiang.
He's not that well known, but I highly recommend him. He has an amazing anthology called Exhalation. Um, it's an anthology of short sci-fi stories, and I promise you reading this book is completely life-changing. So I highly recommend it.
Oliver Engelbrecht: That's great. I mean, I have read The Mom Test and was really impressed by it as well. Um, I think I've read it twice by now, but it sounds like you've read it even more often.
Um, and sci-fi is always good, so I'm going to check that one out. Thank you so much for all your insights today, May. It was a real pleasure listening to you and talking to you. And to everyone who's listening, thank you so much for tuning in, and enjoy the rest of the sessions.
May Chin: Thank you so much.