
The Science Of Testing At Trainline

Learn how Europe's leading train app delivers the best ticket booking experience to its users through online experimentation.

Summary

Iqbal Ali, Head of Optimization at Trainline, shares insights into their scientific approach to experimentation in web optimization. Trainline, a global train ticket seller with over 80 million monthly visits, aims to foster a test-and-learn culture across its large team. Ali emphasizes the importance of expecting a low win rate (20-30%) in experiments, focusing on learning from both successful and unsuccessful tests. The process involves creating robust yet flexible methods to accommodate various teams, maximizing resource efficiency, and minimizing subjectivity.

A key aspect is formulating clear hypotheses to guide experiments, focusing on understanding user behavior and improving design and development. These hypotheses are structured to establish a cause-effect relationship between variables, aiming to prove causality and derive actionable insights. Justification for each hypothesis is tied to optimization themes like relevance and clarity. Ali also discusses the importance of prioritization in experimentation, aiming to maximize the velocity of launching and concluding experiments.

Key Takeaways

  • The company prioritizes creating hypotheses that clearly define the expected impact and justification, ensuring focus and relevance in their experiments.
  • The hypothesis formulation at Trainline involves isolating variables to establish clear cause-effect relationships, enhancing the understanding of user behavior and conversion levers.
  • Justifications for hypotheses are linked to proven optimization themes, such as relevance and clarity, ensuring that each test is grounded in strategies known to impact conversion rates.

Transcript

0:00

Pallav: Hi everyone, welcome to ConvEx, where we are celebrating experiment-driven marketing. I am Pallav and I'm an Optimization Consultant here at VWO. Do you want to fix leaks in your conversion funnel? Then do check out VWO. I am really excited to have Iqbal Ali here with us today. He is Head of Optimization at Trainline. Trainline is a leading platform to book train tickets in Europe, and they serve 45 different countries in Europe and across the world. I'm really excited to have you here with us, Iqbal. Before you begin your presentation, I just want to inform the audience that they can go to the official ConvEx LinkedIn group and post their questions about the presentation there. So, with that, Iqbal, I think we are good to go, and the stage is all yours.

1:00

Iqbal: Hi everybody! So I've titled my talk the Science of Testing at Trainline; I think you'll become aware why. My name is Iqbal, and I'm the Head of Optimization at Trainline, as Pallav has explained. A little bit about me: I've been experimenting for about five years now, I've got a background in UX and front-end development, and I'm also a bit of a process nut, which will become apparent. About Trainline: we sell train tickets worldwide, across more than 45 countries. Just to give you a sense of the traffic that we have, we've got 80 million visits per month. And to give you a sense of the conversion rate, we sell more than 204 tickets a minute. So that's a lot of tickets being sold. And our team is big – 600+ people, with 300+ of them in what we call travel tech: specialists and developers. So there's a big team. Okay, a little bit about this talk: hopefully I'll be giving you an insight into our experimentation process, and I'll be explaining why and how we try to maintain a scientific approach to experimentation, and why that's important.

2:35

So what’s our goal? Well, our goal is to develop a process that can be rolled out across the entire organization and we want to develop a test and learn culture.

2:46

So, like I said before, there are a lot of people at Trainline and a lot of teams, each with different goals and responsibilities. Our goal is to roll out a process that fits everybody and also, obviously, to increase conversion. I think it's important to note that our expectation going into any experiment is a win rate of about 20 to 30 percent. Conversely, that means 70 to 80% of the time we expect our experiments not to win. So it's important that when those experiments don't win, we're still learning something from them. And looking at that 20 to 30% win rate, it's also important to note that test velocity matters: we need to be running as many experiments as we can in order to gain as much conversion uplift as we can. So what are our challenges?

4:04

Well, we've got a lot of people, so we want to develop a process that is robust yet flexible enough to deal with all of the different teams and their responsibilities. Even though we've got a lot of people, it's still important to maximize the efficiency with which we use them – and not just the efficiency of personnel, but also of traffic. We also want to remove as much subjectivity as possible from the process: we want to be objective in the way that we prioritize and go about experimentation. And, as mentioned before, we want to maximize the use of traffic: we've got a lot of it, and we want to utilize as much of it as possible in experiments. And I think this is a really key point: we want to maximize learnings for effective prioritization.

5:10

So this is what keeps us […] a process that helps us learn and develop new hypotheses. I guess the keyword here is scientific – it's objective, it's repeatable, it's reusable, and it helps us achieve validated learnings. And this is an overview of our process. We start with the hypothesis, we prioritize, we deliver – and by deliver I mean design, develop, and launch the experiments. Then we have the analysis phase, where we analyze the experiment, and hopefully from that analysis we get some learnings to feed into creating new hypotheses and better prioritizing hypotheses. That loop back is important: each time we go around it, we're hopefully learning something new and getting better. I've highlighted two of these steps in red, because those are the areas I'll be talking about today.

6:31

So, hypotheses: a little bit about them, and specifically how we write them; we have a very specific way of writing a hypothesis. But why are hypotheses so important? Well, a good hypothesis keeps the focus on why we're running an experiment. It's important to know, with every experiment that we run, exactly what we're expecting to learn from it, and why we're running it in exactly the shape and form that we are. A good hypothesis will also help us determine the metrics we need to track. I'm a firm believer that an experiment is only as good as the metrics you're tracking.

7:23

So then your experiment helps you get some deeper insight into user behavior, even if it isn't a win. It also helps to focus our design and development. Having a well-structured hypothesis helps focus designers on the reasons why we're running an experiment, so they can develop the design in that specific direction. And from a development perspective, it helps everybody know what we're building, and it means everybody can have input into the metrics we want to track.

8:13

And a good hypothesis determines what we'll be learning, ultimately, because the experiment design is based on the hypothesis, and it all comes back to 'what have we learned from this experiment?' Whether a hypothesis is validated or not validated, we'll have learned something from the process. So here's a template of how we write a hypothesis. It's very simple: we say [changing or adding X] will [impact], where we describe the impact, and then we have what we call the [justification]. So there are two parts here:

8:54

There's the first part, which is 'changing or adding X will', where we describe the impact, and then there's the second part, which is the justification. I'll be going into both of these parts in a bit more detail, starting with the first part, which is where we talk about variables. Our site is full to the brim with variables – I think every site is. What I mean by variables is stuff like clicks on a button, use of a particular ticket type in our case, and anything to do with page views and so on. But variables are not necessarily metrics. They could be, but they could also be things like the text on our site, button color, or iconography. And then we've got the other important variable on the other side, which is that dollar symbol: our conversion rate.

9:57

So what we're trying to do with a hypothesis is isolate two variables: a predictor and a dependent. When we isolate a predictor variable, we're picking something like a piece of text on our site, and we're saying that if we make a change to this variable, then we predict it will impact conversion, which is our dependent variable, and we're predicting that the impact will be positive. That, in a nutshell, encapsulates the first part of the hypothesis. What we're seeking to prove is that changing one variable affects another – that's ultimately what a hypothesis is. And by proving causality, we've learned something: we've learned that something is a conversion lever, or that something works better than something else.

11:06

It's also important to note that explicitly naming our variables like this means we know which metrics are important; it helps us know what to track for a specific experiment. Here's another example: we made a change to two predictor variables – say, a text change and something else on the page. We still have an impact on a dependent variable, but this time we can't isolate what actually caused the impact. What we know is that both changes combined caused it. Now, there's nothing really wrong with this experiment design, and we might purposely design an experiment this way, but it's good to know what our expectations are for what we'll learn at the end.

12:02

So here is the template again: [changing or adding X] will [describe the impact]. We've broken it down into a predictor and a dependent variable. And so, on to the second part – the justification. The justification is what we call the how and the why. It's looking at the hypothesis and asking: what makes us think that this relationship between these two variables exists, and why do we think it works in the direction we expect?

12:45

So, why do we think that making such a change will impact our conversion rate in a specific way? With justification, it all comes down to optimization themes. Over lots of testing, and looking at research from other places that test, there are some common themes – we call them optimization themes – that are known to move the lever in terms of conversion. That's stuff like relevance, clarity, confidence, reducing distraction and friction, and adding some urgency and scarcity to pages. All of those things have been seen to move conversion; they're kind of like conversion levers. So we try to relate the justification to one of these themes, and what the justification ultimately does is help us understand why this hypothesis exists in the first place, and help us weed out weak hypotheses.

13:57

So writing a justification really helps validate whether we truly think these two variables are related. You can easily weed out weak hypotheses, and you can easily identify some of the stronger ones.

14:19

So, let me look at an example. This is our search results page – the old search results page, so it no longer looks like this. In this instance, we have some qualitative insights indicating that the wording 'direct' is a little bit confusing for users. That's feedback we've received. The variation we want to test is changing that wording from 'direct' to '0 changes', because we think this is clearer to users. So when we write the hypothesis, we write it like this: on the search results page, showing '0 changes' instead of 'direct' – this is us declaring the predictor variable – will increase conversions – that's the dependent variable – and we say that this will improve clarity and confidence – that's our justification, tying it back to those optimization themes. We could also add here that this is based on qualitative insights, but what we tend to do is keep the hypothesis statement as simple and compact as possible, and move any additional information into a separate section called background.
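
To make that template concrete, here is a minimal sketch of how such a hypothesis record might be captured in code; the `Hypothesis` class and its field names are illustrative assumptions for this write-up, not anything Trainline has published.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One hypothesis: [changing or adding X] will [impact], because [justification]."""
    predictor: str        # the variable we change (copy, button colour, iconography, ...)
    dependent: str        # the variable we expect to move (here, conversion rate)
    expected_impact: str  # predicted direction of the effect, e.g. "increase"
    justification: str    # the "how and why", tied to an optimization theme
    theme: str            # which known conversion lever the justification leans on
    background: str = ""  # supporting research kept out of the hypothesis statement

    def statement(self) -> str:
        return f"Changing {self.predictor} will {self.expected_impact} {self.dependent}."

# The search-results example from the talk, written against this sketch.
direct_wording = Hypothesis(
    predictor="the wording from 'direct' to '0 changes' on the search results page",
    dependent="conversions",
    expected_impact="increase",
    justification="'0 changes' is clearer wording, improving clarity and confidence",
    theme="clarity",
    background="Qualitative feedback suggested the word 'direct' confuses users.",
)
print(direct_wording.statement())
```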

15:58

So we add in there, for instance, the specific feedback that led us to this hypothesis. […] What this helps us do, once we've developed a hypothesis, is exactly that: it helps with our prioritization. And that leads quite nicely to the next section, which is prioritization. It's important to know what our goals are with prioritization. Our goal is to maximize velocity, and that means we want to maximize experiment velocity in terms of launches and conclusions. We want to have as many experiments live as possible at any given point, we want to launch as many experiments as possible, and we want to conclude those experiments as quickly as we can.

17:11

Another thing we want is efficient use of resources: we want our prioritization to help us be more efficient with the resources we use. And we're hoping that prioritization will keep us from being too subjective in the way we prioritize certain experiments over others.

17:42

And that leads us to prioritization matrices. This is a system for ranking ideas using a set of criteria that you score against. What the criteria are can vary – there are lots of different types of prioritization matrices. It basically depends on what is important to your specific goal and your team; based on that, you come up with a set of criteria and then score experiments against them.

18:25

So, what we use is something called PIE, which was developed by WiderFunnel, I believe. We use a slightly modified version of PIE, and that version is itself modified between different teams. I'll go into more detail about what PIE actually stands for and what it means, but for now it's important to note that we've got the P, the I, and the E as criteria; we score our experiments against them to get a total score, and then we rank the experiments from the highest score to the lowest. That helps us prioritize the tests that have the biggest impact in terms of ROI, in terms of the criteria we wanted to score against.

So what is PIE? Well, P stands for Potential, I for Importance, and E for Ease. Going backward from Ease: ease is the ease of development and implementation – the effort it takes to develop an experiment and put it out into the world. 10 is classed as the easiest thing you can do, maybe some kind of copy change, and 1 is maybe the hardest – changing an entire flow or conversion funnel.

20:01

And what ease helps us with is efficiency of resources – efficiency in terms of development resources, and also in terms of legal resources. If something needs a lot of legal input, that also counts toward the ease of getting a test out. Next up is Importance. Importance is the volume of traffic exposed to your experiment. That's an important distinction – the volume of traffic exposed to the experiment – because even though we might be running an experiment on a specific page, not all traffic to that page is necessarily exposed to that experiment; it might be only 50% of that traffic. What we've done is gone away and ranked the most trafficked areas of our site: the page with the highest amount of traffic is scored 10, and everything else is scored relative to that.

21:20

So if there's a page that has 50% less traffic than our 10, it gets a score of 5, and so on. What importance helps us do is prioritize experiments that will not only conclude fast, but will also give us a bigger impact because we've got a lot of traffic. By focusing our efforts on the places where we've got the most traffic, we should hopefully reach a conclusion to our experiments sooner, and we should also be able to see some bigger impacts.
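
As a rough sketch of that relative scoring, assuming monthly visits as the traffic measure (the page names and numbers below are invented purely for illustration):

```python
def importance_score(page_traffic: float, max_traffic: float) -> float:
    """Score a page 0-10 relative to the most trafficked page, as described in the talk."""
    return 10 * page_traffic / max_traffic

# Invented monthly-visit figures, purely illustrative.
traffic = {"search_results": 40_000_000, "payment": 20_000_000, "my_tickets": 8_000_000}
top = max(traffic.values())
scores = {page: round(importance_score(visits, top), 1) for page, visits in traffic.items()}
print(scores)  # {'search_results': 10.0, 'payment': 5.0, 'my_tickets': 2.0}
```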

So, on to Potential. So far this has been very similar to WiderFunnel's PIE scoring methodology, but here we start to diverge a little and do things in a slightly different way. We break potential into two parts. The first part is very similar to WiderFunnel's process: we've ranked all our pages in terms of drop-off. For example, if you've got a funnel and the drop-off from step one to step two is about 80%, then maybe we want to prioritize that page ahead of a page with a drop-off of only 5%. So, as I said, we've gone and ranked all areas of our site by the biggest drop-off, and everything else is scored relative to that.

This is similar to importance, and what it does is help us focus on the worst-performing areas of our sites and app. In conjunction with importance, it helps us focus in the right places – areas with a lot of traffic that are performing badly.

So, part two of potential: we also score the potential for this specific experiment to succeed. The reason we have this second part is that we could be doing a lot of tests on one single page, and we need something to drill down and differentiate between them. Having gone through the hypothesis process, we've hopefully got some strong hypotheses in our backlog, and we want to be able to look at those hypotheses and score more favorably the tests that we think have a higher potential to succeed. Let me start by saying this has a tendency to become very subjective. We've optimized this part of our process quite a bit: previously there was a lot of discussion around it, but since then we've arrived at this way of working out the second part of potential, which cut down the discussion and also made us a lot more objective. What we do is basically say that every experiment that has made it this far – past the hypothesis validation phase, where we've written the hypothesis and weeded out the weak ones using the justification – has a fifty-fifty chance of winning, and then we add or subtract from that based on evidence.

25:38

So, if we've got some qualitative evidence – insights from user feedback, for example – or some past experience from previous experiments, or something in the analysis that […] suggests this has a high potential of winning, we adjust the score. What this does is help us prioritize experiments objectively, and more importantly it helps us learn – that feedback loop I was talking about before. Every time we go through the loop, we've got a test that's been analyzed, maybe a win, and now, when we're scoring and prioritizing the new batch of hypotheses, we've got more evidence to draw on, and we've tuned our ability to predict whether an experiment is going to be a win or not.

27:00

So, here's an example experiment: 40% of traffic to a specific page is eligible for it. Imagine we've got this page, and 40% of the traffic that lands on it is eligible for the experiment – maybe it targets specific segments, say single searches versus return searches or something else, which means only 40% of traffic to that page is exposed to the test conditions. There's no previous history or evidence – this is a brand new hypothesis, and we don't have any qualitative or quantitative insight to back it up or predict where it could go. And the experiment is the easiest to implement – let's say it's some kind of copy change. So, this is an example of the process we go through with PIE. Let's start with the P. This specific experiment is focused on the page with the worst drop-off, the highest drop-off, so that page scores 10 for the first part of potential. Since we've got no previous history or evidence for this test, it has a fifty-fifty chance of succeeding, so we give it a score of 5.

28:44

In terms of importance, as I said, 40 percent of the traffic is eligible – going to be exposed to test conditions. For this example page, let's say the importance score is 6.2. What we do is take 40% of that 6.2, and that gives us the score for I. And then for ease – 10, since it's just some copy changes. So this is what we do: we sum them all up and get a PIE score at the end. It's important to note that, like I said at the beginning, we're a big team, there are lots of us, and not every team uses the same criteria in the same way; some teams also have different criteria that they're marking against. The PIE scoring I showed is just the classic example – it's the starting point. From that point onwards it's quite flexible. The point is that each team has a set of criteria that they know is important to their goals and what it is they want to test.
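
Pulling that worked example together, here is a rough sketch of the arithmetic; summing the components follows "we sum them all up", while treating the two parts of potential as separate addends is an assumption, since the talk doesn't spell out how they're combined.

```python
def pie_score(dropoff_score: float, win_potential: float,
              page_importance: float, eligible_share: float,
              ease: float) -> float:
    """Toy PIE total for the worked example in the talk.

    dropoff_score   : part one of Potential (10 = page with the worst drop-off)
    win_potential   : part two of Potential (5 = the 50/50 default, +/- evidence)
    page_importance : traffic score of the page (10 = most trafficked page)
    eligible_share  : fraction of that page's traffic exposed to the test
    ease            : 10 = trivial copy change, 1 = rebuilding an entire funnel
    """
    importance = page_importance * eligible_share
    return dropoff_score + win_potential + importance + ease

# The example from the talk: worst drop-off page, no prior evidence,
# page importance 6.2 with 40% of its traffic eligible, and a simple copy change.
print(round(pie_score(dropoff_score=10, win_potential=5,
                      page_importance=6.2, eligible_share=0.4, ease=10), 2))  # 27.48
```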

30:03

So here's our overview again: hypothesize, prioritize, deliver, analyze, and then we've got that loop. Like I mentioned before, hopefully each time we go around the loop we're learning something new, it feeds into more hypotheses, and ultimately it helps us with that win rate of 20 to 30% – hopefully keeping us at the high end of that range. Hopefully it also means that from each experiment we run, the whole team learns something new, something interesting, something different, and all of that feeds into new hypotheses, so we have a really healthy backlog of experiment ideas.

30:57

And what we hope this gives us – and what we find it gives us – is a robust yet flexible process. It helps us maximize the efficiency of resources. It has helped us be as objective as possible in the scoring, by saying that by default everything gets 50/50 unless there's actually some objective data to suggest a test has a high or low potential to win. It helps us maximize the use of traffic as well. And ultimately, like I said, the main goal is to maximize learnings for effective prioritization. So if the process is working, then each time we go around that loop we're learning something new, and with that the process keeps getting better.

32:03

Another important point – and I touched on it a little earlier with the PIE scoring – is that we optimize our process. By knowing what it is we want to achieve with our process, we're able to put some success metrics against it, and that ultimately helps us optimize the process and get better at it, which hopefully leads to more wins and more interesting experiments. Thank you.

32:38

Pallav: Thanks a lot, Iqbal, for a wonderful presentation. I absolutely loved your minimalistic slides, and it was really simple to understand everything you covered.

Iqbal: Thank you very much. 

Pallav: So we'll be starting with questions; I do have a few to ask you. Throughout the presentation you mentioned the process and optimizing that process. I was wondering if there are any other ways in which you have optimized this process, beyond what you have already shared with us?

Iqbal: Yeah. So there are some other steps in the delivery and the analysis. In the analysis section, we used to use frequentist statistics to determine test wins; we've now moved to Bayesian analysis. And we've implemented a way of working with Bayes called "expected loss", which your data scientist over at VWO, Chris Stucchio, wrote a paper about – it was quite interesting. That's been really helpful in giving people an easier, more business-centric way of understanding and determining test wins, rather than the frequentist kind of p-values. And in terms of delivery, it's also about optimizing and tweaking the development processes to get better in that area as well.
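
For readers unfamiliar with the idea, here is a minimal Monte Carlo sketch of expected loss for a two-variant conversion test; the Beta(1, 1) priors, the visitor counts, and the helper name are illustrative assumptions, not Trainline's or VWO's actual implementation.

```python
import numpy as np

def expected_loss_of_b(conversions_a, visitors_a, conversions_b, visitors_b,
                       samples=100_000, seed=0):
    """Monte Carlo estimate of the expected conversion-rate loss of shipping B over A.

    Each variant's conversion rate gets a Beta(1, 1) prior updated with its data.
    """
    rng = np.random.default_rng(seed)
    a = rng.beta(1 + conversions_a, 1 + visitors_a - conversions_a, samples)
    b = rng.beta(1 + conversions_b, 1 + visitors_b - conversions_b, samples)
    # How much conversion rate we give up, on average, if we ship B but A is better.
    return np.mean(np.maximum(a - b, 0.0))

# Illustrative counts: one might ship B once its expected loss falls below a small threshold.
loss = expected_loss_of_b(conversions_a=520, visitors_a=10_000,
                          conversions_b=565, visitors_b=10_000)
print(f"Expected loss of shipping B: {loss:.5f}")
```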

Pallav: Yep. So there was a slide which said that this process would not be the same for every team, right? I was wondering: the process you are using at Trainline – how has it changed people's understanding of user behavior, or of experimentation in general?

34:52

Iqbal: I think it's made people more aware that not everything is going to be a win. Everybody goes in with the expectation that, oh, we've got a win rate of 20 to 30%, but we can do better than that – looking at specific experiments and saying this is definitely going to be a win, or there's not a shadow of a doubt that this is going to be a loss. But in actual fact, once you launch an experiment, there are so many variables and conditions out there that you really can't predict the outcome. So I guess that's another thing that we, and others who've used the process, have learned. It's also helped us fine-tune our understanding of how users use the site, what their behaviors are, and how that might differ from what they say in the user feedback they give.

Pallav: So you talked about the complexity of concluding a campaign, right? I've had a lot of occasions where I created multiple goals or conversion points – what you could call metrics. Say I created 3 metrics, and 2 of them give positive results in favor of our hypothesis while 1 gives a negative result against it. Do you have a process for that – how do you conclude a campaign in an example like this?

36:39

Iqbal: So in that example, we've got a number of metrics, and some are positive, some are negative. Ultimately there'll be a success or critical metric that we're interested in, and based on whether that critical metric is met, we decide whether or not to roll the experiment into production. For example, if it's ultimately a win, then yes, we could roll it out – but maybe, like you're mentioning, the hypothesis isn't quite correct: it's not because we increased clicks on something that conversion went up, but maybe because we decreased clicks on something else. Ultimately, the next step would be: if it's a win, we roll it out, and we treat what happened as a valuable learning that generates a brand new set of hypotheses to test, iterating on that experiment. If the experiment is ultimately a loss, then again we iterate on that feature, just to see what happened there that caused such an impact or such a behavioral change.

Pallav: Hmm. So, talking about hypotheses in general, do you create device-specific hypotheses? For example, take the experiment you ran where you changed the copy. Let's say that campaign actually wins on desktop and loses on mobile devices, right? How do you conclude it then? Do you roll it out across all devices, or do you take a decision on the basis of the data itself?

Iqbal: Good question! So, it depends on the outcome and the size of the impact on both desktop and mobile. In some cases we have separate applications for mobile and desktop, so we don't have to deal with things in that kind of scope. […] If tablet is a loss and desktop is a win, it ultimately depends on what the overall impact is. There'd be some deeper analysis of the overall impact, and maybe we'd roll it out to desktop only, or to tablet only, and we'd look at the cost of, you know, starting to diverge the two platforms. But ultimately, what we'd probably also do is feed it into more hypotheses and see if we can find a solution that actually works for both. So there's the happy path, where every segment is positive and we can roll out the test; and there's a slightly less happy path where certain segments are lower and, based on the importance of those segments, we might decide to go back and do another experiment before we roll something out as a feature. So there's a loop there, basically, and that is all-important: analyzing, creating new hypotheses based on that, and seeing if we can optimize what we've already optimized.

40:41

Pallav: Okay, you mentioned PIE in your slides. So what do you do when the PIE score is exactly the same for two different hypotheses? Then, how do you prioritize them? 

Iqbal: I guess it comes down to whichever one has slightly less friction to get out the door: maybe one that doesn't need so much design work, maybe something that can be done by developers alone. Even if you've got two that score exactly the same, there might be some differences between them, or there might be some kind of business incentive to go with one or the other. So with the prioritization or PIE scoring, we don't stick rigidly to the order in which things have been ranked.

41:38

It's just indicative, so we can see which tests have the biggest ROI, going from highest to lowest. Then there'll be stuff that the business wants to roll out – things that might be strategic, that maybe we want to roll out in order to enable other things to happen. So there are reasons why we don't obey the ranking, but at least we have some visibility into why we're doing certain things, and visibility that a certain test actually has a higher ROI. And there are specific reasons why we might decide to go with one or the other if two score the same.

Pallav: Hmm. So I believe it would boil down to how easy it is for your team to develop a campaign. Do you actually have a dedicated team for testing?

42:39

Iqbal: We used to have a dedicated team for testing, but these days the goal is just to have all the teams testing. The testing process is baked into every single team now; we want a test-and-learn culture across the company rather than having teams doing their experimentation in a silo.

Pallav: Great. So I think this is one of my favorite questions, which I wanted to ask you. We have been facing a lot of challenges while running experimentation, so I would want to understand from you: what is your biggest challenge while running an experimentation program at a company like Trainline?

43:30

Iqbal: So I think, apart from the operational processes – getting people on board with testing, rolling out a testing culture across the company – the biggest challenge has been the data integrity of experiments: making sure that every experiment we run has high data integrity. We have a kind of score for how many of the tests we've launched need to be relaunched, or have simply been invalid, and we want to keep that rate as low as possible. It's a constant, real challenge: with so many teams testing, how do we maintain data integrity so that when you're analyzing an experiment, it's valid data you're looking at.

Pallav: Okay, just a fun question. Are you a reader? How many books are you reading right now, and what are you currently reading?

Iqbal: So currently – I mean, I'm listening to an audiobook, if that counts – To Sell Is Human by Daniel Pink. It's a really good book; it makes the point that every one of us is a salesperson, and goes through the techniques and how selling has changed over the years. And also Naked Statistics by Charles Wheelan, which, I'll be honest, I've been reading for a long time. But it's a really good book and actually an enjoyable read – you wouldn't think so based on the title, since it's about statistics, but he actually makes it really interesting. So that's the other book I'm reading.

Pallav: Those are great books for sure. I have also tried reading Daniel Pink. Thanks for answering all of our questions – that was a wonderful presentation. Just one last question: how can the audience connect with you and reach out to you?

Iqbal: So, LinkedIn is probably best. You can search for me – Iqbal Hussain Ali on LinkedIn. Twitter also @IqbalHAli. So yeah, just those two places.

46:15

Pallav: Great! Thanks a lot, Iqbal. I think that was a wonderful presentation, and that’s about it for our questions.

Speaker

Iqbal Ali

Head of Optimization, Trainline
