
Multi-Armed Bandit (MAB) – A/B Testing Sans Regret

Shubhankar Gupta
Experimentation and Growth at VWO

Most readers of this blog will be familiar with A/B testing. As a quick reminder, an A/B test is an experiment in which a random visitor to your digital property is shown a version different from the original (also called the ‘control’), in an optimizer’s quest to discover the version that maximizes conversions. For example, maybe it’s the red button that maximizes clicks, or maybe it’s the blue button. Who knows? Well, your A/B test would know. However, this quest comes at a cost: a sizable portion of your traffic is routed to a losing variant, which directly reduces your business metrics (like sales or conversions).

It is said that in an A/B test, the cost of increasing conversions is the conversions themselves. We say, touché.

Take Jim’s case as an example. Jim, a UX Analyst, works with a mobile brand that is launching its latest and greatest handset next week. To hype demand and trigger wildfire sales, Jim decides to run flash sales on its mobile app for 3 days.

Here is a rider though – Jim is aware that the brand’s in-app navigation is poor (he ran a survey with active users to arrive at this conclusion) and that visitors face friction while locating the product. To improve the navigation, he decides to run an experiment: he creates a variation with more intuitive navigation that leads users directly into the flash sales funnel, to test whether this version solves the discovery issues for the new handset. In short, Jim is trying to improve a crucial KPI – the percentage of sessions in which users were able to discover the new handset.

He looks at incoming data from the experiment and observes that the tweaks to the in-app navigation are demonstrating strong uplift. Jim is too trigger-happy though – he wants to share early results with the senior leadership and get them equally excited. Just as he is about to barge into the CMO’s cabin with a copy of early trends to convince her to direct more traffic towards the new navigation, he is stopped in his tracks by a quip from a Data Scientist.

“Jim, these trends are great but are they statistically robust? Where is the significance?”

“But we don’t have time to wait for it! The sale ends in 3 days!” Jim grunts.

Who is right? Jim, who is tasked with doing the best possible within 3 days, or the data scientist who is questioning statistical significance? Well, both are, and here’s why. Remember the earlier quip about the cost of increasing conversions being the conversions themselves? Jim’s situation warrants an approach that minimizes the cost of running an A/B test. The loss of conversions due to traffic sent to a lower-performing variation is called Bayesian Regret. Minimizing regret is especially important in time-sensitive situations, or in cases where the cost of poor variations is high enough that businesses hesitate to run A/B tests. Reaching statistical significance can take weeks (or months, for low-traffic websites), and Jim is relying on a three-day window to maximize sales. If he waits for significance, he loses conversions in the meantime and still can’t use the results, because the 3-day window will be over.
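
To make the idea of regret concrete, here is a minimal sketch with made-up numbers. The conversion rates, visitor counts and the function name `expected_regret` are assumptions for illustration, not data from Jim’s test. It computes the expected conversions given up by sending visitors to anything other than the best variation.

```python
def expected_regret(true_rates, visitors_per_variation):
    """Expected conversions lost versus sending every visitor to the best variation."""
    best_rate = max(true_rates)
    return sum(
        (best_rate - rate) * visitors
        for rate, visitors in zip(true_rates, visitors_per_variation)
    )


# Hypothetical true rates: control converts at 4%, the new navigation at 6%.
true_rates = [0.04, 0.06]

# A fixed 50/50 A/B split over 10,000 visitors.
ab_regret = expected_regret(true_rates, [5000, 5000])

# An allocation skewed 10/90 toward the better variation.
skewed_regret = expected_regret(true_rates, [1000, 9000])

print(round(ab_regret), round(skewed_regret))
```

With these assumed numbers, the even split gives up about 100 expected conversions and the skewed allocation about 20. In a real test the true rates are unknown, which is exactly why the allocation has to be learned as the data comes in.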

If Jim had Multi-Armed Bandit algorithms to use, this issue wouldn’t have happened. Here’s why.

an illustration of slot machines signifying multi-armed bandit algorithm

What are Multi-Armed Bandits?

MAB is a type of A/B testing that uses machine learning to learn from the data gathered during the test and dynamically increase the visitor allocation in favor of better-performing variations. This means that variations that aren’t performing well get less and less traffic over time. Central to MAB is the concept of ‘dynamic traffic allocation’: a statistically robust method that continuously estimates the degree to which a version is outperforming the others and routes the majority of traffic, dynamically and in real time, to the winning variant.

What MAB aims for is that, unlike in an A/B test, the total number of conversions during the course of the test is maximized. The trade-off against an A/B test is that statistical certainty takes a backseat in MAB, because the focus is on conversions rather than on finding out the exact conversion rates of all variations (including the worse-performing ones).

The Multi-Armed Bandit Problem

MAB is named after a thought experiment in which a gambler has to choose among multiple slot machines with different payouts, and the gambler’s task is to maximize the amount of money he takes back home. Imagine for a moment that you’re the gambler. How would you maximize your winnings?

As you have multiple slot machines to choose from, you have two options. You can determine the payouts by taking a chance with all machines, collecting data until you know for sure which machine is the best. Doing this will reveal the exact payoff ratio of every slot machine, but in the process you will have wasted a lot of money on low-payoff machines. This is what happens in an A/B test. Or you can narrow your focus to a few promising machines sooner, keep evaluating your winnings as before, and concentrate your money on those machines for higher returns. This is what happens in MAB.
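
The gambler’s two options can be sketched in a few lines of Python. This is an illustration under stated assumptions: the payout probabilities are invented, and the bandit-style strategy shown is epsilon-greedy, a deliberately simple stand-in chosen for readability rather than the algorithm VWO uses.

```python
import random


def play(payout_probs, pulls, choose_machine, rng):
    """Play `pulls` rounds; return total wins and the number of pulls per machine."""
    wins = [0] * len(payout_probs)
    counts = [0] * len(payout_probs)
    for _ in range(pulls):
        machine = choose_machine(wins, counts, rng)
        counts[machine] += 1
        if rng.random() < payout_probs[machine]:
            wins[machine] += 1
    return sum(wins), counts


def uniform(wins, counts, rng):
    """A/B-test style: every machine is equally likely on every pull."""
    return rng.randrange(len(counts))


def epsilon_greedy(wins, counts, rng, epsilon=0.1):
    """Bandit style: explore at random 10% of the time (or while any machine
    is still untried), otherwise play the machine with the best observed rate."""
    if rng.random() < epsilon or 0 in counts:
        return rng.randrange(len(counts))
    rates = [w / c for w, c in zip(wins, counts)]
    return rates.index(max(rates))


payout_probs = [0.10, 0.30, 0.60]  # hypothetical machines; unknown to the gambler
uniform_wins, uniform_counts = play(payout_probs, 3000, uniform, random.Random(7))
greedy_wins, greedy_counts = play(payout_probs, 3000, epsilon_greedy, random.Random(7))

print("uniform:       ", uniform_wins, uniform_counts)
print("epsilon-greedy:", greedy_wins, greedy_counts)
```

The uniform gambler ends up with roughly equal pull counts for every machine, which is good for estimating each payout but costly. The epsilon-greedy gambler concentrates most pulls on the machine that looks best and typically walks away with noticeably more wins.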

Figure: Multi-Armed bandit at work (Image Source[1])

Exploration and Exploitation

To understand MAB better, consider the two pillars that power the algorithm: ‘exploration’ and ‘exploitation’. Most classic A/B tests are, by design, forever in ‘exploration’ mode. After all, determining statistically significant results is their reason for existence, hence the perpetual exploration. In an A/B test, the focus is on discovering the exact conversion rate of each variation. MAB adds a twist to A/B testing: exploitation. Owing to the ‘maximize conversions and profit’ nature of MAB, exploration and exploitation run in parallel, like the two rails of a train track. Think of the algorithm exploring at a rate of many visitors per second, arriving at constantly shifting estimates of the winner, and continuously allocating the majority of your traffic to the variant that has the highest chance of winning at that instant (exploitation).

It may sound like MAB uses heuristics to allocate more traffic to the better-performing variation. However, under the hood, VWO’s implementation of MAB is statistically robust. VWO uses a mathematical model to continuously update the estimated conversion rates of variations and allocates the traffic split in direct proportion to those estimates. As the estimate for the best-performing variation gets better, that variation gets a higher percentage of traffic. If you’re interested in the mathematics of VWO’s MAB algorithm, you may want to read more about a concept called Thompson Sampling[1].

Over the test cycle, the algorithm balances exploration and exploitation. As high performers garner more conversions, the traffic split continues to widen until a vast majority of users are served the better-performing variation. MAB thus allows Jim, from our example above, to progressively roll out the best version of his mobile app without having to wait for his test to reach statistical significance.
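
For readers who want a feel for Thompson Sampling, below is a minimal sketch of its textbook Beta-Bernoulli form. It is not VWO’s implementation: the uniform Beta(1, 1) prior, the function names and the true conversion rates are all assumptions made for illustration. For each visitor, the algorithm draws one plausible conversion rate per variation from its current belief and serves the variation with the highest draw.

```python
import random


def thompson_choose(successes, failures, rng):
    """Draw one plausible conversion rate per variation and serve the highest draw."""
    draws = [
        rng.betavariate(1 + s, 1 + f)  # Beta(1, 1) uniform prior (an assumption)
        for s, f in zip(successes, failures)
    ]
    return draws.index(max(draws))


def simulate(true_rates, visitors, seed=0):
    """Simulate a test and return how many visitors each variation received."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(visitors):
        arm = thompson_choose(successes, failures, rng)
        # The visitor converts with the (hidden) true rate of the served variation.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


# Hypothetical true conversion rates, unknown to the algorithm.
traffic = simulate(true_rates=[0.05, 0.10], visitors=5000)
print(traffic)
```

Early on, both beliefs are wide, so the draws overlap and both variations get traffic (exploration). As evidence accumulates, the better variation wins the draw more and more often and receives most of the traffic (exploitation).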

visual of ab testing vs bandit for different types of program

If MAB sounds so awesome, why would anyone do A/B testing?

It’s important to understand that A/B testing and MAB serve different use cases as their focus is different. An A/B test is done to collect data with its associated statistical confidence. A business then uses the collected data, interprets it in a larger context and then makes a decision. 

In contrast, MAB is an optimization algorithm that maximizes a given metric (which is conversions of a particular type in VWO’s context). There’s no intermediate stage of interpretation and analysis as the MAB algorithm is adjusting traffic automatically.

What this means is that A/B testing is perfect for cases where:

  • The objective is to collect data in order to make a critical business decision. Example: if you’re deciding the positioning of a product, engagement data on different positionings in an A/B test is an important data point (but not the only one)
  • The objective is to learn the impact of all variations with statistical confidence. Example: if you’ve put effort into developing a new product, you don’t just want to optimize for sales but also gather information on its performance so that the next time you can incorporate learnings into developing a better product.

In contrast, MAB is perfect for cases where:

  • There is no need to interpret the results or performance of variations, and all you care about is maximizing conversions. Example: if you’re testing a color scheme, you just want to serve the one that maximizes conversions.
  • The window of opportunity for optimization is short-lived and there’s not enough time to gather statistically significant results. Example: optimizing pricing for a time-limited offer.
comparison between multi arm bandit and ab testing

In conclusion, it is fair to state that both A/B testing and MAB have their strengths and shortcomings. The dynamic between the two is complementary, not competitive.

Here are a few common real-world scenarios, where MAB has shown that it’s clearly superior to A/B Testing:

1. Opportunity cost of lost conversions is too high

Imagine you’re selling diamonds (or a car) online. Each lost conversion is probably worth thousands of dollars in lost opportunity for you. In that case, MAB’s focus on maximizing conversions is a perfect fit for your optimization needs.

2. Optimizing Click-Through Rates For News Outlets That Cover Time-Sensitive Events

Conjuring catchy headlines was initially an editor’s job, but that is clearly passé – ask our friends at The Washington Post[2]. The short shelf life of news pieces means that quick optimization is essential. They optimize and test headlines, photo thumbnails, video thumbnails, recommended news articles, and popular articles to drive maximum clicks inside a short window.

3. Continuous Optimization 

With MAB, optimizers can add or remove elements from variations and keep testing across all of them while the experiment is running. In a traditional A/B test, there is little freedom to make changes once the experiment goes live, because the integrity of the data is sacrosanct.

4. Optimizing Revenue with Low Traffic

If there’s not enough traffic, A/B tests can take really long to produce statistical significance. In such cases, a business may find it better to run an MAB as it is able to detect the potentially best version much earlier and direct an increasing amount of traffic to it.

Even though MAB has its merits, there are many scenarios where A/B Testing is clearly the better choice:

1. When you are aiming for Statistical Significance 

For all their strengths, MAB experiments are not the best choice when you want to get a statistically robust winner. A/B tests are still the fastest way to statistical significance even though you might lose some conversions in the process.

2. Optimizing for multiple metrics

Mature experimentation teams track 4+ goals per experiment, as experiences are a composite of primary and secondary goals. While MAB experiments work great when optimizing for one key metric, they don’t work well for multiple goals, as they only factor in the Primary Goal while allocating incoming traffic.

3. Post experiment analysis

Most experimenters like to slice and dice the data gathered during an experiment to check how different segments reacted to modifications on their web properties. This analysis is possible in A/B Tests but might not be possible in MAB as sufficient data might not be available for underperforming variations.

4. Incorporating learnings from all variations (including the poor ones) into further business decisions

During the course of the test, MAB allocates most traffic to the best-performing variation. This means that poor-performing variations do not get enough traffic to reach statistical confidence. So, while you may know the conversion rate of the best-performing variation with confidence, similar confidence may not be available for the poor-performing ones. If getting this knowledge is important for a business decision (perhaps you want to know how much worse the losing variation is than the best one), an A/B test is the way to go.
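
A rough way to see this effect is to compare the uncertainty around each variation’s conversion rate after a bandit has skewed the traffic. The sketch below uses a simple normal-approximation interval, chosen here for brevity and not taken from VWO’s statistics engine, and the visitor and conversion counts are hypothetical.

```python
import math


def interval_95(conversions, visitors):
    """Approximate 95% interval for a conversion rate (normal approximation)."""
    rate = conversions / visitors
    margin = 1.96 * math.sqrt(rate * (1 - rate) / visitors)
    return rate - margin, rate + margin


# Hypothetical end-of-test counts after a bandit skewed traffic toward the winner.
winner_low, winner_high = interval_95(conversions=940, visitors=9400)
loser_low, loser_high = interval_95(conversions=30, visitors=600)

print(f"winner: {winner_low:.3f} to {winner_high:.3f}")
print(f"loser:  {loser_low:.3f} to {loser_high:.3f}")
```

With these assumed counts, the interval around the losing variation is nearly three times as wide as the one around the winner, so the size of the gap between the two is known far less precisely than the winner’s own rate.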

Summing up

If you’re new to the world of Conversion and Experience Optimization, and you are not running tests yet, start now. According to Bain & Co[3], businesses that continuously improve customer experience grow 4% – 8% faster than their competitors. Both A/B testing and MAB are effective optimization methodologies. MAB is a great alternative for optimizers who are pressed for time and can part with statistical significance in exchange for more conversions in a short window. Reach out to us at sales@vwo.com if you want a brush with MAB. Alternatively, you can sign up here for a trial.

