How to Leverage Bad Test Results
This article is a transcription of one of VWO’s Masters of Conversion webinars. It features Christopher Nolan, a data-driven growth strategist at ShipBob, a tech-enabled fulfillment service company, along with Vipul Bansal from VWO.
Key learnings from bad test results
- When you build a strong foundation for your testing by identifying the correct combination of tools, you will have confidence in your data.
- It’s not enough to call a test a failure when it doesn’t meet your KPIs. You need to take a step back and analyze why something that everyone agreed had benefits failed. When you do this, you will derive a ton of value that will guide your subsequent tests. An integrated testing system with analytics and tag management tools will help you with this analysis and bring forward important insights.
- As with failed tests, it is necessary to analyze the ‘why’ of test results even when a test is a success. Don’t take the results at face value; dig deeper into what got you there.
- When you’re communicating test results to the leaders in your company, show them the confidence you have in your data. Talk about your test results against the KPIs, and the insights you derived from them. Be prepared with the next steps; tell them what you plan to do with these insights. With time, teams will be as happy with insights as they are with positive results.
Pros and cons of A/B testing set-ups
An efficient experimentation set-up comprises an A/B testing tool to test your hypotheses, integrated analytics to gauge user behavior and site engagement, and a tag management tool which is a developer-less code implementation software that allows you to inject scripts.
Pros:
- With a testing set up (such as VWO) in place, it is easier to define goals and create hypotheses based on your business’ KPIs, and replicate the same across tests.
- The testing set up offers you built-in visual reporting tools, such as heatmaps, scrollmaps, etc.
- The set up enables you to derive insights from raw data with the help of its built-in statistical significance measurement tool.
Cons:
- You can only track user-defined goals, which means you need to define a goal before the test in order for it to be tracked.
- There is some restriction on what you can track in your goals and the number of ways in which you can slice your reporting is also limited.
Advantages of CMS and Web Analytics integrated testing tool
- It is easier to track user data on the web pages you are testing by installing the tag manager high on the page.
- It is easier to install tag managers within a CMS like WordPress, without needing a developer per se, as CMS offer plenty of plugin options.
- You can form segments and custom dimensions within the CMS if it is integrated with Google Analytics, which can help track user behavior to achieve your business goals.
Biggest failures led to the biggest testing wins
The BIG failed test by Chris was on BigCommerce’s pricing page. And that led to Chris’ biggest testing win.
Chris took what he calls the “big swing” by redesigning the pricing page at BigCommerce, a SaaS company. The “big swing” failed: it resulted in a 4% decrease in the primary KPI, the trial conversion rate, and a 25% decrease in the demo conversion rate, which was a secondary goal. However, with the help of heatmaps, scrollmaps, and navigation summaries, Chris gathered insights that helped him find what was impacting the KPIs. These insights were:
- Mobile users liked the new design
Using the scrollmap, it was found that visitors on the shortened mobile variation found it easier to scroll through the pricing page and see all the features and plans before they made a decision.
- The payment link is incredibly important
The navigation summary of the control version showed that visitors were scrolling down and clicking on a small link (the ‘Learn more’ link shown in the image below). This link was removed in the variation.
- Toggles between Monthly and Annual plan modes drove more taps
The monthly-annual toggle on the pricing page drove more taps, and both toggle engagement and scroll depth correlated with the trial conversions that did occur. This correlation could not have been deduced from the heatmap alone: it only showed user behavior, and read next to the missed KPIs it could have suggested that these elements were hurting conversion.
These insights led the team to create a V2 of the pricing page. This page had the toggle placed at the left bottom of the page and payment link retained from the control, based on the navigation summaries of the pricing page.
It saw a 15% lift in the trial conversion rate and 45% lift in the demo conversion rate.
The successful test of the pricing page at ShipBob
Previous learnings anchor your next experiment – when Chris joined ShipBob, the first thing he looked at based on his experience was once again the pricing page.
The form abandonment rate on the quote form linked from ShipBob’s pricing page was around 30%. Instead of testing the ‘Get started here’ call-to-action (CTA) button, which was placed above the fold and was the main CTA, he ran a test on the ‘Request a quote’ link.
This link was placed in a no man’s land at the bottom right of the page, and 11% of all users landing on that page were tapping this tiny link.
Chris’ learnings at BigCommerce paid off. He chose to make the ‘Request a quote’ link the main CTA in place of ‘Get started here’, which led to account creation. The result was a massive 104% lift in quote requests, tracked using link tracking in Google Analytics.
His variation was a winner. However, he did not stop there: an intensive study of the navigation summaries, link tracking, and scrollmap data for the page guided a further decision that helped maintain the conversion rate.
After the successful test, he considered removing a small link from his winning variation (shown in the image below). But link tracking events configured in GA showed that users were still interacting with that small link at the bottom of the page, and that it was contributing to the overall conversion.
Effectively communicate test results to the leadership
- Highlight the importance of the test and your main KPIs.
- Bring to the table the key insights from the test results.
- Show your variations in the form of images (screenshots) with KPIs, followed by an insight, if you have one. For example, in Chris’ case at BigCommerce:
- Show the methodology if you have Product Heads and Chief Technical Officers in the room. For example, if they ask you how you tracked the link, you can show them the event you created along with the data in analytics.
- Link your data to your insights for credibility: If you have the Google analytics data pulled into an Excel spreadsheet, link your insight to it. It may be messy or raw but that is going to give you confidence that you’re not pulling these numbers out of thin air.
- Talk about specific next steps to answer their ‘What’s next?’
Learn from GTM & GA in test results
Use Google Tag Manager and Google Analytics in your testing to their full potential by digging into and deriving insights from:
- Navigation summaries
- Video engagement
- Chatbot engagement
- Scrollmap and link tracking
- Form fill engagement
- Custom dimension by pushing GA client ID
Some of the viewers’ questions Chris answered:
Q: What if, in a failed test, you are unable to find nuggets of data? Do you keep digging?
A: I think that you dig until you have looked at what you feel confident in and sometimes a failed test is just a failed test. I give it all that I have, that’s reasonable, and there’s no reason to get stuck on a test if we can’t find meaningful insight to move forward from.
Q: Would you recommend going for video recordings, or should one just focus on the data getting in from Google analytics or tag manager?
A: I think video recordings are valuable, including screen/session recording, heat mapping, and scrollmapping. They give you a place to view the actions of your visitors that you desired and also a segment of folks to further analyze.
Q: How important is it to retest a test winner?
A: Before you run any tests with any testing tool, run what I would call an A/A test. So, run a test where you do nothing to manipulate. Just be sure that you have confidence in the testing tool before you actually do the testing.
Q: At what point do you determine that you need to turn back and define your initial hypothesis to retest?
A: If you have a hundred conversions per variation, and you don’t have any data that suggests anything is strong win or loss, then maybe it’s time to reevaluate the hypothesis.
Q: How does the leadership actually react to the results and numbers? How do you personally think that leadership should take these insights?
A: I think that a good leader or a good stakeholder values the learning side of testing; however, that’s not always going to be the case. Have the confidence in your results, have the conviction in your original hypothesis, accept the failure, and have that next step.
Q: Did your companies help you in the entire optimization effort?
A: I wore multiple hats for optimization efforts at work. Typically, there is a whole testing ecosystem. However, how many people exist within that ecosystem is up to your business and how much resources they provide to it.
Here is the transcript of the entire webinar. Read on for more detailed insights:
Great. Thank you all for showing up. I know it’s hard to make time on a Wednesday morning, but it’s very much appreciated and I hope you find value with this.
So without further ado, let’s jump in. Insights beyond the primary KPI.
How do you leverage those test results that are deemed bad or failures or just don’t drive business in the way that you expect?
[1.42-3.38] Chris’ professional journey and the scope of this webinar
My name’s Chris and I’m here to guide you through this. The main focus is growth and has been for my entire career. I’ve worked across B2B and B2C companies, agency life and on B2B SaaS for the last three or four years.
So, there are a lot of examples in this webinar, that will focus on the latter. But there’s a ton of insight here for really any industry.
A quick overview of the presentation. The goal is to do three things.
One, build that foundation for your tracking and reporting on A/B tests, so that you can have confidence in reporting internally, externally, to a leadership group, within your own team, wherever you need that confidence in the data.
Two, build that foundation for how you analyze your test results once you have that foundational reporting and confidence in your data. And then three, like I said, how do you communicate that out?
And then really wanted to touch on other uses for a good track-stack beyond just A/B testing.
I wanted to touch on what this doesn’t include.
This isn’t going to be a walkthrough of how to set up Google Analytics, Google Tag Manager, or testing tools. I’m also not really going to talk about hypothesis creation or program strategy. Again, that’s something I’m very passionate about, but this is focused more on that foundation for testing. It’s also not going to be an explicit recommendation for any testing tool. Just want to make that clear.
And it doesn’t intentionally include terrible jokes, but I again cannot guarantee that!
[3.38-4.47] Three components of any basic testing set up
So, step one–it’s building that track-stack, right? Building that foundation.
There are really three components to any good basic testing setup. One, and it’s required, is a testing tool; you can’t test without some website experience manipulation. As you will know, the main purpose of that is to change the experience for the user and to allocate traffic accordingly to your control and whatever number of experiences you’re manipulating.
The two recommended tools for this track-stack that I highly advocate for are an analytics tool and a tag management tool.
Analytics is going to be used to measure your user behavior and site engagement. Most software companies or eCommerce companies will have some analytics set up.
And then tag management is really just a developer-less code implementation software. It allows you to do things like inject scripts and so much more beyond that we will get through.
But these are really the three foundational components of that testing, tracking foundation.
[04.50-07.43] Pros and cons of testing set ups
So if we’ve looked at what’s required and what’s recommended, I like to kind of look at pros and cons of the required versus the aggregate of all three components stacked.
With the testing tool, especially these days, you really can define and replicate goals and KPIs easily. Most testing tools require this before you can even launch a test.
What’s also really nice is that they spend a lot of time on visual reporting. Everyone knows the power of a chart in a good presentation, and that comes out of the box. And then finally there’s built-in statistical significance measurement, test duration recommendations, things that you just won’t get if you’re looking at raw data.
So there is a ton of value in just leveraging the testing tool for results.
But, what it’s limited by, there are three main things. One, you can only track user-defined goals. So, that means that you have to define a goal before a test in order for that goal to be tracked.
Two, you can only slice up your reporting in a limited number of ways. Again, I’ve put an asterisk here because a lot of these tools are getting very savvy with how to segment and track with more customization.
But at a basic level, you know, you have a device, you have a source and it’s relatively predefined so it’s not nonexistent, but it’s limited. And then lastly, it’s very difficult to backfill data beyond something like a page view.
So if you have what I call an ‘Aha’ moment, right? Hindsight’s 20/20, where it’s like, well, I really should have been tracking this prior to launching this test; it’s very difficult to do that with a standard testing tool.
And so that’s the benefit of the combination of these three things. It’s all of the pros of that testing tool by itself. Plus, you get access to all of the predefined goals that your business already has in existence within web analytics tools.
So you can look at the page views, the time on site, the source. Things that might be available in testing tools.
You can also see things like where did this user land before they made this conversion, what was the path that drove them into this testing experience.
You know, if you’re testing on something that’s not a landing page, oftentimes there’s value in understanding how the user got to that test.
There’s a lot more flexibility with user-defined goals, and this is where the tag management comes into play. And again, we will get much deeper into this.
You can implement things like scroll tracking, link tracking, form field engagement, things that are available in heat maps, scrollmaps, a lot of really nifty technology but not necessarily in a one-stop shop like in analytics, where you can take that data and cross reference it with other conversion metrics that you’re trying to understand.
Now that said, there are cons to this and I want to be clear with that. This requires integration set up with the testing tool. It’s not easily replicable like a predefined goal would be within a test.
[07.44 – 10.01] Advantages of having an integrated testing tool
You have to set up typically a custom dimension within your analytics tool and within the testing tool. It’s not a ton of work, but it can be cumbersome if you’re testing across a lot of different experiences at once.
So you have to do things like pull your data into Google Sheets or Excel and use the chart functionality there. There’s no built-in statistical significance reporting, which can be difficult if you’re trying to make quick decisions.
That said, I have a ton of resources for A/B test calculators, please feel free to reach out to me after the presentation and I can provide that.
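As a rough illustration of what such a calculator does with raw analytics counts, here is a minimal Python sketch of one common approach, a two-proportion z-test. It is not necessarily the statistic that any particular testing tool or calculator uses, and the function name and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare variation B against control A.

    Returns (relative_lift, z_score, two_sided_p_value).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("visitor counts must be positive")
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variance: every visitor or no visitor converted")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    relative_lift = (rate_b - rate_a) / rate_a if rate_a > 0 else math.inf
    return relative_lift, z, p_value


if __name__ == "__main__":
    # Hypothetical counts, as they might be pulled from an analytics export.
    lift, z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"lift={lift:+.1%}  z={z:.2f}  p={p:.4f}")
```

A small p-value suggests the difference between the two arms is unlikely to be noise; what threshold counts as "small" is a decision for your own testing program.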
Test measurement – how do you set this up? Again, the presentation is not geared towards installation but wanted to provide some value here. Testing tool, again, the recommendation there, typically you want to install this in the header of your site directly rather than through something like a tag manager.
You can often implement directly with your content management system–your CMS, or you can hand it over to the developer if you have that resource and then just have them install it. Best recommendation is install it very high up on the page.
Web analytics. You can just install it directly on the site as well. You can install it through your CMS, or you can leverage Google Tag Manager to install Google Analytics. And typically with Tag Manager, you can install it within your CMS.
There are a lot of plugins, you know, defaults within most good CMSs, like WordPress or Craft, that will allow for this.
This slide is indicative of how this presentation is going to look moving forward. A lot of links, a lot of texts.
Again, similar theme here: if you want to integrate testing tools with Google Analytics, which I highly recommend, there are a couple of steps. Here are a couple of integrations and links to that. And then in order to use this, you’ll need to create those custom dimensions and segments.
Don’t get overwhelmed. It’s a lot easier than it sounds and there’s really good documentation out there.
I’ve linked to two of the ones that I really trust (above slide). So that’s building your foundation, right? That’s step one. We’ve seen the importance and the benefits of either or now.
Now what does that bring to the value of your testing? That’s kind of the crux of this presentation.
[10.02-24.22] Failure stories – digging deeper into the results
So what I’ve done is I brought in a couple of real-world examples from my experience. Number one, you know, was when I was at BigCommerce, my first job out of agency.
We did a lot of testing on our pricing page. This was the control experience of our pricing page when I got there.
And so, you know, we have our four plans. It’s a pretty standard SaaS pricing page: annual benefits, features, some calls to action beneath the feature set, and then really, you know, a phone number call to action, some tech support, and FAQs.
What we tested against was what I like to call a big swing, which I am in favor of.
I don’t completely advocate for isolated, variable-isolation A/B testing. It can be boring, and it doesn’t always drive the value to the business that something like a redesign can drive.
And so we took a big swing. We said, let’s change the layout and the design. Let’s actually add this monthly annual pricing as something more interactive.
Let’s bring the plan cards a little bit closer to their feature sets. Let’s drastically reduce the length. Let’s add in tool tips, let’s bring in some social proof and some more value and FAQs and consultancy. So there’s a lot of changes on this page.
And what we saw was not what we expected. A 4% decrease in our main KPI, which was that trial conversion rate, ‘Try it free’, the main CTA at BigCommerce at the time, and then a 25% decrease in that demo conversion rate; that’s a slightly upmarket call to action.
And what else was the result of that?
A pretty unhappy design team who busted their butts to get this out the door in time and an executive team who’d waited on this test for that duration of time that it took the design and development team to build out.
And now we’re sitting here with what could be deemed a failed test. So this stinks, and this is going to happen if you’re testing, and there are really two ways to go about it.
One, you can say, let’s scrap that page and let’s just move forward with the control and we will see what we can test there. The second is to dig a little bit deeper and say, okay, this test failed but why?
So in order to look kind of beyond that main KPI, I really leveraged in this instance, three pieces of data and a fourth that’s not exclusive to this Google analytics–Google tag manager set up.
So I had installed scroll depth tracking. This is native to Google Tag Manager now, and I’ve included instructions there. It pushes the depth of scroll, at whatever percentages you’d like, say 10%, 25%, 50% (I like to do 10% intervals), into Google Analytics as an event.
It’s very nifty. You can look by variation at scroll depth at a very granular level. Similarly, I looked at element engagement. A lot of testing tools do really well with this, where you can basically point and click and you’ll have tracking on an element as a goal.
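To make the scroll-depth idea concrete, here is a small Python sketch of the logic only: given how far down the page a visitor got, which thresholds were reached, and what events would be recorded per variation. It is not Google Tag Manager configuration, and the field names and the "scroll depth" category are hypothetical.

```python
def thresholds_reached(max_scroll_percent, step=10):
    """Return every threshold (in percent) at or below the deepest scroll."""
    if not 0 <= max_scroll_percent <= 100:
        raise ValueError("scroll depth must be between 0 and 100")
    if not 0 < step <= 100:
        raise ValueError("step must be between 1 and 100")
    return [t for t in range(step, 101, step) if t <= max_scroll_percent]


def scroll_events(variation, max_scroll_percent, step=10):
    """Build one event per threshold reached, tagged with the test variation."""
    return [
        {
            "event_category": "scroll depth",  # hypothetical naming
            "event_action": f"{threshold}%",
            "event_label": variation,
        }
        for threshold in thresholds_reached(max_scroll_percent, step)
    ]
```

Because each event carries the variation, the share of visitors reaching, say, the 50% mark can be compared between control and variation in the same place as the conversion data.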
It’s separate from your main goal and so the challenge is that you can’t necessarily say what effect or impact did that engagement with that element have on conversion.
You can really just see differences in element engagement. I’ll get to why that’s important in a second. And then navigation summaries.
Honestly, if I were to recommend one major feature in Google Analytics to anyone who’s getting started and trying to really crack their website open, it’s navigation summaries. I’ve included an overview and I’ll show you the value there.
But it’s something that I highly advocate digging deeper into and really leveraging that as one of the more valuable assets that Google analytics provides. So, what did we see?
We saw that there was really no change in how many users scrolled down to the bottom of that features chart, even with the massive shortening of that chart. So that was a really interesting insight.
We saw that users are scrolling regardless of how long we make it; they’re looking to see all of the features and all of the plans before they make a decision.
We also saw a massive engagement with our monthly toggle. So, we previously had this little snippet, this little message that said 10% off. But when we show that value in real tangible numbers, it seems to really draw the user in.
And what we saw again, what’s really important about this being in Google Analytics is that the higher scroll depth and the engagement with the annual and monthly toggle correlated to a higher trial conversion rate.
So we could have seen in a heat map that scroll depth and toggle engagement were high, but KPIs were not met and we could have deduced that those things are poor for conversion. But in reality they were what were driving the conversions that did occur.
Secondly, we saw that by device there was a massive difference. Those aggregate results that I mentioned earlier, right, a negative conversion rate difference in both trial and demo, were in fact mostly due to desktop users. Mobile users had a 10% increase in trial conversion rate. And so why?
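The point that an aggregate result can hide opposite movements by device can be sketched in a few lines of Python. This is only an illustration of the segmentation arithmetic; the function name, the row layout, and the "all" key for the aggregate are assumptions, not an export format from any tool.

```python
from collections import defaultdict


def lift_by_segment(rows, control="control", variant="variation"):
    """rows: iterable of (arm, segment, visitors, conversions).

    Returns {segment: relative lift of variant over control}, plus an
    "all" entry for the aggregate across segments.
    """
    totals = defaultdict(lambda: [0, 0])  # (segment, arm) -> [visitors, conversions]
    for arm, segment, visitors, conversions in rows:
        for key in ((segment, arm), ("all", arm)):
            totals[key][0] += visitors
            totals[key][1] += conversions

    lifts = {}
    for segment in {seg for seg, _arm in totals}:
        c_visitors, c_conversions = totals.get((segment, control), (0, 0))
        v_visitors, v_conversions = totals.get((segment, variant), (0, 0))
        if c_visitors == 0 or v_visitors == 0 or c_conversions == 0:
            continue  # lift is undefined without a control baseline
        control_rate = c_conversions / c_visitors
        variant_rate = v_conversions / v_visitors
        lifts[segment] = (variant_rate - control_rate) / control_rate
    return lifts
```

Fed with made-up rows where desktop converts worse and mobile converts better in the variation, the "all" entry can come out negative even though the mobile entry is positive, which is the pattern described above.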
We saw that users on mobile had much less scroll than users on desktop. Because we were able to implement different engagement tracking for this ‘try it free’ versus this ‘try it free’, we were able to deduce that that bottom CTA engagement was what was really driving that value.
And so lastly, we saw that the scroll on the desktop was driving clicks on this very small call to action down here. There’s this little payment processing link that says, ‘Learn more’.
And we saw that in the control experience, where that link existed (we’d removed it from the variation, not thinking about it), users clicked on that at a very high rate. And they were two times more likely to convert than the users who didn’t. And it’s this tiny link on the bottom left.
In order to figure this out, I looked at navigation summaries to understand how many users clicked on this. And then I created a sequence segment for users who went from the pricing page to this payments page. And then I looked at the conversion rate within that sequence segment.
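For readers who want to see what a sequence segment computes, here is a minimal Python sketch over session data. It assumes each session is an ordered list of page paths plus a converted flag, and it treats "went from A to B" as "B viewed at any point after A"; the page paths and function names are hypothetical.

```python
def followed_by(pages, first, then):
    """True if `then` is viewed at any point after `first` in the session."""
    if first not in pages:
        return False
    return then in pages[pages.index(first) + 1:]


def conversion_by_sequence(sessions, first, then):
    """sessions: iterable of (ordered page paths, converted flag).

    Returns (rate inside the segment, rate outside it); a rate is None
    when that group has no sessions.
    """
    inside = [0, 0]   # [sessions, conversions]
    outside = [0, 0]
    for pages, converted in sessions:
        bucket = inside if followed_by(pages, first, then) else outside
        bucket[0] += 1
        bucket[1] += int(bool(converted))

    def rate(bucket):
        return bucket[1] / bucket[0] if bucket[0] else None

    return rate(inside), rate(outside)
```

Comparing the two rates returned by `conversion_by_sequence(sessions, "/pricing", "/payments")` is the same kind of comparison as "users who clicked the link versus users who didn’t".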
Again, sounds complicated. I’ve linked to how easy this truly is and found that again, the conversion rate was massive. And so we take these three insights–one, mobile users like this new scheme, two, the payments link is incredibly important, and three, the monthly toggle is driving conversion.
Now that said, we don’t really have evidence to suggest that the new layout is driving better conversion. We don’t have evidence to suggest that a truncated feature set on desktop is driving conversion.
But what we did was we kind of backtracked and said, let’s take the control, right, and let’s see how we can take these insights and build them into a V2 of our original test. So you can see how we did that.
We revamped the design back to that kind of gray white color scheme. We added in the monthly annual toggle down on the left, we maintained the payment processing link and then we actually kept the experience similar but we did alter the design a little bit.
And I think this is really important and this is something that my mentors at my previous agency really advocated for.
Don’t be afraid of trying different experiences for mobile and desktop. You want a site to be responsive, you don’t want it to break between experiences, but it’s okay for a mobile user to have a different experience than a desktop user.
Oftentimes, that’s what really is going to drive your business forward in a way that you wouldn’t expect.
And when we tested this, you know, I don’t want to bury the lede: a 15% lift in trial conversion rate and a 45% lift in demo conversion rate.
So you know, just to take a large step back here, we had a test that took, you know, I don’t want to throw anyone under the bus here, three to six months of work, to design out, to QA, to develop, to get into a testing tool, to set up, to set proper expectations, to make sure we had stakeholder approval and product marketing, and then to launch.
And at the end of it, we had negative results. And so to be able to take these insights that you gleaned from kind of that, tag manager, analytics, testing tool ecosystem and say, look, let’s not jump the gun here. Let’s see why and let’s see if there’s any benefit in what we all agreed was a better user experience.
And so, I really liked that example as a real-world example of how to take what anyone could comfortably call a failed test and move forward with it.
Here’s why it’s so cool. Right now I’m at ShipBob. We have a pricing page. It’s not one to one. We’re not pure SaaS, but it’s very similar, I guess, like a website model. We have a free account creation with this ‘Get started now’ call to action. We have an ability to talk to sales to request a quote.
The first thing that I looked at, based on my experience, should come as no surprise. I saw that our main call to action above the fold was ‘Get started now’. It was that account creation.
And then we had a link in what I kind of like to call a no user’s land down here, hanging out in the bottom right, that led to requesting a quote. And you’ll never guess what we found.
11% of all users on that page clicked on that tiny link, and that’s the fourth highest of any component on that page; that includes navigation, that includes login, that includes the main call to action.
We saw that the users who clicked on that converted to a quote at around 70%. So that’s, if you think about it, around a 30% form abandonment rate on that quote form, which is a lower abandonment rate than we typically see on our quote form.
And so, you know, using that insight that I gleaned from a failed test experiment previously and bringing it to my new experience, there’s only one test that can follow after this. It’s testing the quote call to action, in the kind of main CTA spot. So that’s what we did.
We tested ‘Request a quote’ rather than ‘Get started now’ and we saw that we had a massive lift.
It was 104% increase in quote requests. As expected, a reduction in accounts created because we removed that as the main call to action. But on aggregate, it’s a 31% lift.
No one in their right mind is walking away from that.
But as you should be skeptical of failed results, you should be equally skeptical of what seemed like really good results.
Sometimes it’s not necessarily what you expect, right? So the first question that comes into my mind, as kind of a seasoned skeptic: was it actually the button, or were people just clicking on that link?
Were they, again, scrolling past the original button, so that the link is really what we need to promote on the page? It sounds silly, but you can answer it with link tracking.
Again, Google Tag Manager has this natively as a variable to set up, and I’ve linked to it.
What it does, with the way I’ve set it up, is push in every single link click on our site with the following event scheme.
So the event category is ‘links on links’; forgive me, it is corny. The event action is whatever the text on that link was; that is going to come through in Google Analytics as the event action.
So you can see here in this screenshot that the event category is that top link engagement category, and the event action is going to come through as this click text.
I like to put the destination URL as the event label because if there are, let’s say, multiple ‘Learn more’ links on your homepage and you want to know which link was clicked, you might need that tertiary piece of data to say, oh, that’s the ‘Learn more’ link that goes to the pricing, and that’s the ‘Learn more’ link that goes to the features.
And it’s very important to be able to distinguish. On this page, we only had one link that said, fill out this form. So it wasn’t necessary to dive deeper into that event label.
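The category / action / label scheme described here can be sketched in Python as a plain data structure. This is an illustration of the scheme, not Google Tag Manager or Google Analytics code; the field names and the default category string are hypothetical.

```python
from collections import Counter


def link_click_event(click_text, destination_url, category="link engagement"):
    """Map one link click onto the category / action / label scheme."""
    return {
        "event_category": category,          # hypothetical category name
        "event_action": click_text.strip(),  # the visible text of the link
        "event_label": destination_url,      # where the link points
    }


def clicks_by_link(events):
    """Count clicks per (link text, destination) pair."""
    return Counter((e["event_action"], e["event_label"]) for e in events)
```

Counting by the pair rather than by link text alone is what keeps two ‘Learn more’ links with different destinations from being merged into one number.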
Surprisingly, we found that the ‘Request a quote’ button drove a substantial amount of conversions, as we would expect, but people were still clicking on the ‘Learn more’ link.
So a reasonable person, could very much say, well, okay, cool, we added the CTA, it’s working, get rid of that link. We don’t need it down there, but it’s still working. And so this to me is the benefit of even looking past the successful tests because you want to maintain that lift.
There’s a joke in the A/B testing world where sometimes you see four or five positive test results, you go to the business and they’re like, why are we down?
Sometimes the reality of it is that test results don’t correlate with strong business numbers; that’s sometimes how it works. So you gotta do all you can to maintain the momentum of a successful test as you implement.
It’s really important to say, okay, not only did we fail, why, and how can we respond, but also why did we win, and how can we be sure to implement that here and also across any similar page on our site? Especially with things like landing pages: if you find a win on a landing page, take that template and start testing it across different campaigns, different channels. There’s a ton of value in that.
So that’s again, two real-world examples for being able to take a failed test or a successful test and look deeper and not just accept the result on its face. I think there’s a ton of value in that.
[24.23-27.33] How important is it to communicate your results effectively?
What’s next? You gotta be able to communicate this, right?
So this is advice from a seasoned loser. I have lost, I even confidently say that I’ve lost more tests than I’ve won and I’m still in business. I still get jobs.
It’s the reality of it. If you’re testing with purpose, you’re going to lose a lot. And so one of the things that you can do to kind of build your brand internally, if you’re in-house, or if you’re in an agency and you’re looking to build your brand with clients, is sell confidence in the thoroughness of that data and your analysis.
So if you’re working with folks who have that testing mentality, they’re going to be comfortable with the insight just as much as they’re comfortable with immediate results.
As you keep losing and you keep bringing insights, they’re eventually gonna get tired of that. But, build the confidence in the fact that you can find meaningful insight from your tests before you’re so concerned with every single test being a win.
Similar to that, don’t lie. I’ve had the instinct to be like, oh, maybe I can slice and dice this data differently. No, be real. It’s very cool that you found this tiny link is important to the business, but don’t start with that. The agenda that I typically follow, and that’s worked for me across internal and external presentations, is the hypothesis, why you think it’s important, and your main key performance indicators.
Take some screenshots of the variations, show the result with the main KPI, and then if you have insight beyond that, go ahead and bring that in, and then talk about next steps, right? Everyone wants to know, okay, cool, we saw this. What’s next?
One of the things that I’ve really found value in, and that a mentor of mine highly advocates for, is linking to your data. I think there’s a ton of value in that.
So especially to begin with, if you have this Google Analytics data and you’ve pulled it into an Excel spreadsheet, maybe it’s raw, maybe it’s messy, link to it. That’s going to give confidence that you’re not pulling these numbers out of somewhere that’s, you know, not precise.
Similarly, don’t be afraid to show the methodology, especially for the CPOs and CTOs in the room. They want to know, okay, cool, you’re tracking links. How are you doing that or how on earth are you tracking scroll depth at a quantitative level?
I often do this in either the results or insights section of my test decks. And then, really importantly, here we’re talking about very granular sampling and leveraging really, really granular data to get to these conclusions.
Google Analytics will often sample the data. You have to be really careful with this. Do not report on sampled data. If you have a yellow check mark in the top left corner of any of your analyses, take note of it.
If you do have Google Analytics 360, you can pull an unsampled report. If you have the free version, my recommendation is to either reduce the complexity, or just remove your segment and use it for directional data for your own purposes, or reduce the date range.
Typically, anything over a month is going to be sampled when you’re segmenting heavily. If you keep it to maybe week by week and aggregate that data, you’ll have more of a chance of that data not being sampled.
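One way to apply the week-by-week approach is to split the reporting window into seven-day chunks, pull each chunk separately, and add the results together. The Python sketch below only builds the date ranges; `weekly_ranges` is a hypothetical helper name, and the actual report pulls from Google Analytics are left out.

```python
from datetime import date, timedelta

def weekly_ranges(start, end):
    """Split the inclusive range [start, end] into chunks of at most 7 days.

    Hypothetical helper for illustration: each (chunk_start, chunk_end)
    pair would be used as the date range of one separate report pull.
    """
    ranges = []
    chunk_start = start
    while chunk_start <= end:
        # A chunk covers 7 calendar days, or fewer if the window ends first.
        chunk_end = min(chunk_start + timedelta(days=6), end)
        ranges.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return ranges

# Example: a one-month window becomes five shorter pulls.
for chunk_start, chunk_end in weekly_ranges(date(2020, 1, 1), date(2020, 1, 31)):
    print(chunk_start, "to", chunk_end)
```

Adding weekly numbers together works for additive metrics such as sessions or conversions. It does not work for unique users, because the same user can appear in more than one week.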
And lastly, again, not going to go through all of this. Just really want this to be valuable beyond the presentation.
[27.35-28.54] How Chris leverages GTM and GA in his current role
Here are some of the things that I use Google Tag Manager for in my current role. Some of it’s superfluous, but a lot of it has really brought value, especially form field engagement.
If you’re on an eCommerce site or you’re working on direct response landing pages, form field engagement is massive. It’ll tell you where in the form people are dropping off, not just that they’re leaving your page. And that’s incredibly valuable insight.
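To illustrate what form field engagement data can tell you, here is a minimal Python sketch that turns per-field engagement counts into drop-off rates. The field names and counts are made up, and `field_drop_off` is a hypothetical helper, not something provided by Google Tag Manager or Google Analytics.

```python
def field_drop_off(field_counts):
    """Compute the share of users lost between consecutive form fields.

    field_counts: list of (field_name, users_who_engaged) in form order.
    Returns a list of (field_name, drop_off_rate) for every field after
    the first, where drop_off_rate is relative to the previous field.
    """
    results = []
    for (_, prev_users), (name, users) in zip(field_counts, field_counts[1:]):
        # Guard against a previous field that nobody engaged with.
        rate = 1 - users / prev_users if prev_users else 0.0
        results.append((name, rate))
    return results

# Illustrative numbers only (assumed, not from the webinar).
counts = [("email", 100), ("phone", 80), ("company", 40)]
for name, rate in field_drop_off(counts):
    print(f"{name}: {rate:.0%} of users dropped off before engaging")
```

In this made-up example the biggest loss happens at the company field, which is the kind of signal that tells you where in the form to focus the next test.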
Here are some basic recommendations for Google Analytics. One thing I would focus on here is navigation summaries.
If you have the time to really spend digging into your navigation summaries, you will find value. I guarantee it. It is my first stop whenever I go into Google Analytics, and I go into Google Analytics a lot.
I know that I didn’t really have a chance to speak to eCommerce in this. Like I said, I’ve included some insights for eCommerce companies; they’re in the PDF version of this. Apologies for not mentioning it right now, but I just want to clearly reiterate that the value of this webinar is not exclusively in this presentation.
I want all the attendees to be able to take this, take it back to their business and use this over time.
[28.55-48.33] Viewers’ questions answered
Vipul from VWO:
Perfect. That was quite an insightful presentation, Chris. Not everyone actually talks about failures openly, and I appreciate that you shared examples of such failed experiments and how you were able to derive insights even from them, took those insights to your leadership, and they took it well.
So that’s really a commendable effort. I think it’s a good time that we can now move on to the question and answer section.
So Mark is asking, a high level takeaway here could be that even if your test failed, you may be able to find some nuggets of data or hope in the results. What if you don’t, do you keep digging?
What if in a failed test, you are unable to find nuggets of data? Do you keep digging?
That’s a really good question, Mark. You know, there’s a point in time where you need to come back up from every rabbit hole. I think that you dig until you have looked at everything you feel you need to be confident, and sometimes a failed test is just a failed test.
If you can say, you know, I looked at scroll depth, and there’s really no difference that gleans insight. I looked at call to action engagement. Similarly, I looked at form field engagement.
You know, nothing here screams out why this test is a loser. It’s just a poor experience. And I think there’s still value in communicating that. I gave it all that I have that’s reasonable, and there’s no reason to get stuck on this test, as we can’t find meaningful insight to move forward from.
So sometimes you do have to scrap the test.
Vipul from VWO:
Right. I hope you got the answer Mark. Chad Candeva, I’m sorry if I pronounced your name incorrectly.
So Chad is asking about collecting data. Would you recommend us to go to video recordings, or should one just focus on the data coming in from Google Analytics or Tag Manager?
Would you recommend us to go to video recordings or should one just focus on the data coming in from Google Analytics or Tag Manager?
So I think video recordings are valuable. I think heat mapping and scroll mapping are also valuable.
I think what they do is they give you a place to look, right. And what you have with your tracking stack or your setup in Google Analytics is an ability to say, okay, looks like three out of 10 users were leaving this page before, you know, they took the action that we desired there.
Is there something more there or it looks like tons of people are scrolling past our main call to action and are clicking on reviews? What is the conversion rate of those folks who click on reviews?
So it basically gives you a segment of folks to further analyze. I still think there’s tons of value in video recording and I leverage it internally. That’s a good question.
Vipul from VWO:
Makes sense. I think the more data you can get, the better it is, right? The video recordings will definitely give you a more qualitative understanding of what’s really happening, while Google Analytics will give you more quantitative insight.
So, I think if you are able to mix both of them, if you’re able to derive insights from both of them, it will give you more clarity there. So Rishi Rawat is asking. Oh, okay. That’s an important question. That’s a good question. How important is it to retest a test winner?
How important is it to retest a test winner?
You know, that is an excellent question. And here’s what I would say. Before you run any tests with any testing tool, run what I would call an A/A test. Right? So, run a test where you do nothing to manipulate, and I’m using finger quotes here, “your variation”, and just be sure to calibrate the tool with your site.
It’s just like if you’re using a cooking thermometer, and you have the same temperature of water and you put the thermometer in one and it says 50, you put it in the other and it’s at 75. It’s not an accurate measurement tool, right?
So I would say first have confidence in the testing tool. Secondly, be comfortable with retesting, but maybe throw in a third variation. So, hey, you know, this just doesn’t make any sense.
I changed the call to action from like “don’t click this button” to “click me for free” and it still didn’t win. Well maybe test, you know, “try us free” and maybe just keep going down that road.
Especially with smaller sample sizes, there are false positives and there are false negatives. That’s the reality of testing.
So if you run into a situation where you still have a lot of confidence in your hypothesis, and the data that you’ve gathered suggests that’s a good test, you know, there’s no harm in retesting. Just be sure that you have confidence in the testing tool before you do that.
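The A/A calibration check described above can be sketched in a few lines of Python. This uses a standard pooled two-proportion z-test, which is an assumption here: the testing tool you use may compute significance differently, and `two_proportion_p_value` is a hypothetical helper name. In an A/A test both arms show the same page, so a small p-value should be rare; if it keeps happening, be suspicious of the setup.

```python
from math import erfc, sqrt

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates.

    Uses a pooled two-proportion z-test (an assumed method, chosen for
    illustration). conv_* are conversion counts, n_* are visitor counts.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (0% or 100% conversion in both arms).
        return 1.0
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return erfc(abs(z) / sqrt(2))

# Illustrative A/A numbers (assumed): both arms serve the same page.
p = two_proportion_p_value(conv_a=103, n_a=2000, conv_b=97, n_b=2000)
print(f"A/A p-value: {p:.3f}")
```

A single A/A run that looks fine does not prove the tool is calibrated, and a single significant one does not prove it is broken; the point is to see whether the tool reports differences where none should exist.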
Vipul from VWO:
Right. And I think it’s as important to test the winner as it is important to test the failures. And that makes the entire testing process a very iterative process.
So you have to keep doubting yourself. You have to keep doubting the numbers that you are getting and keep iterating on them. That’s the best way to understand what the customer or the visitor is trying to say, in terms of, you know, building a better experience.
So thanks again for the great question, Rishi, and thanks, Chris, for answering it smartly. The next question is from Brandon. Brandon is asking, at what point do you determine that you need to turn back and define your initial hypothesis to retest?
What point do you determine that you need to turn back and define your initial hypothesis to retest?
So that’s a really good question. I typically look at a threshold of like a hundred conversions per variation. If you have a hundred conversions per variation and you don’t have any data that suggests a strong win or loss, then maybe it’s time to reevaluate the hypothesis.
Now with smaller sample sizes, I would wait two to three weeks, check back in, see if anything’s changed. And if not, maybe it’s time to go back. But typically either a hundred conversions per variation or you know, two, three, four weeks.
Vipul from VWO:
Okay. So I hope you got the answer, Brandon. So, yeah. There’s one question coming from Brian Macy, who is asking, how does this apply to the STD testing sites?
How does this apply to the STD testing sites?
That’s awesome. One of my agency clients was an STD testing site, so it applies very, very directly. Here’s my experience, to answer your question directly: when I was working in that space, there were a lot of complaints from people who had scheduled a test and expected it to be delivered to their home.
And this business model was you schedule a test and you go to the doctor where you had your test scheduled. So we tested making users select a location before they could even get into the checkout.
And we saw that we had reduced the number of users in the checkout, but we had a higher conversion rate in that checkout.
Using the form field engagement, we saw that users who engaged with the form at checkout, once we showed them the map, were like three or four times more likely to complete it because they were fully aware of what it entailed.
So I think there’s a ton of value in this, especially with form field tracking on checkout and with things like engagement. With scroll, say you have something that says, you know, by ordering this service, you should expect blank, blank and blank, and you know how far down the page that is, and you know that only 30% of people are seeing it.
Well, then maybe you need to move that up the page. So, I have a lot of hypotheses about the STD testing space because I was an agency partner for one of them for probably a year and a half, and I used this very often in that space.
Vipul from VWO:
Great. I hope you got the answer Brian. So, one thing, Chris, that I have to highlight from this entire presentation is your passion for tracking and testing in particular. So I’m just curious to know what makes you passionate about testing and tracking.
What makes you passionate about testing and tracking?
Yeah. I mean, everyone loves testing, right? Everyone I’ve ever met in any business is curious about A/B testing. I think what people often miss is what the more granular tracking from the wins and losses can mean for the entire business.
So, I’m passionate about extrapolating out what seems like very innocuous or granular data points to answer more complex questions in the business. And so I think there’s a ton of value there.
I also studied psychology in school, which doesn’t seem like the necessary corollary there, but the psychology that I studied was very methods-forward, and so everything we did had to be rigidly studied, rigidly measured, and you couldn’t deduce a valid or invalid hypothesis without first establishing the foundation for that experiment.
And I think there’s a ton of value in applying a scientific method to testing. So that’s really what got me into it. I think tracking is a complement to all of that.
And once you start digging into how granular you can get with tracking, sometimes, like I admitted, I go overboard. I track way too much stuff that maybe I use once a quarter. But it’s nice to have.
I always just go back to that “hindsight is 20/20.” In my experience, and you know, it’s been a while that I’ve been in this space, it’s always better to be like, yes, I do have the answer to that seemingly tangential question, rather than having to say no, but I’ll set it up so that I can answer your question next time.
So I think that’s really the value of this is having this robust tracking environment, an ecosystem where you have confidence that you can answer the question that any stakeholder in your business is going to ask.
Vipul from VWO:
Right. I mean, that makes sense, because when you have these answers, they are not just opinions, right?
They are based on real data that you have seen and tracked, and that basically helps you in, you know, coming forward as a person who has done the research and knows what he’s saying. So that is a plus as well. So I am actually touched by the passion that you carry and would love to absorb some part of that as well.
So yeah, the second question that I had was regarding one of your slides and, I think, the other examples that you’ve shared as well.
You mentioned that there was a 4% decrease on BigCommerce’s pricing page, right?
So once you run a test, you invest a lot of effort and time into it and the test does not give you the results that you were expecting or maybe went a little bit on the negative side. So when you take these numbers to your leadership, right, what is their reaction? How does the leadership actually react to those numbers and how do you personally think that the leadership should take these insights?
How does the leadership actually react to the results and numbers? How do you personally think that the leadership should take these insights?
Yeah, I think, I think it’s a great question. I think it gets back to one of the previous questions about, you know, what do you do if you’ve dug and dug and dug in and you’re still failing? I think, you know, it very much depends on who you’re presenting to.
I’ve worked in, like I said, a lot of different businesses and a lot of different stakeholders and some embrace the value of testing and embrace that a loss is not only going to happen, but it’s probably gonna happen often.
And some, you know, expect the person that they’re paying an agency fee to, you know, deliver immediately. And so there’s really not a blanket answer, I would say. You know, this is the theme, right?
Have the confidence in your results, have the conviction in your original hypothesis, accept the failure, and have that next step.
So if you have, if you have a failed test and you’ve dug and you’ve dug, then you say, okay, I’ve dug next step. We’re going to go back to the drawing board, right? And here’s a couple of the other hypotheses that I had for this set of pages or page or check out and we’re just going to keep going down the list, right?
I think that a good leader or a good stakeholder in this space values the learning side of testing, but in reality, that’s not always going to be the case.
So I would say go in with conviction, go in with confidence in your data, and accept the criticism of the results. You don’t have to push back to leadership, obviously, but don’t accept the criticism of the hypothesis.
Just say, yeah, you’re right, it’s a failed hypothesis, that’s why we test. And our next one is one that we have a lot of confidence in, and here’s why.
Vipul from VWO:
That makes sense. The criticism should be taken, and not with a pinch of ego or something. So yeah, that’s great advice for all the leaders who might be listening now or in the future.
So there’s, again, another question from Mark. He’s referring back to your comment on a hundred conversions or two weeks. Do you remember speaking about this, Chris?
Chris: Yeah. So that’s like the threshold for.
Vipul from VWO:
So he’s asking, if I get 10,000 conversions and 500,000 visits a day, can I trust something as statistically significant in 24 hours, or do we get into the realm of false positives?
If I get 10,000 conversions and 500,000 visits a day, can I trust something as statistically significant in 24 hours or do we get into the realm of false positives?
So, okay, Mark, now I know what you’re working with. So I used to work for a motorcycle parts vendor and that was similar numbers. Tons of conversions a day, tons of traffic.
I would say wait 2 weeks. I would say, you know, make sure that you’re giving time for the variation to really play out and the control to play out.
Now that said, if you’re a week in and you’re seeing a 140% lift in your variation, or worse, as a business stakeholder, a hundred percent loss, cut it. And I would say be healthily skeptical of the positives.
Let those run, because it’s not hurting the business. If you need to cut the cord on the negatives, cut the cord. No one’s going to get mad at you for that, and maybe come back and test it later on.
With that kind of traffic, I mean, you can be running four or five tests a month. So I would say if it’s positive and stat sig has been reached and held for at least a day or two beyond the initial launch of the test, go ahead and be comfortable with that, and then monitor.
This is the value of Google Analytics. If you have annotations, go ahead and annotate in Google Analytics: this is when I launched that test and had all of the traffic allocated to it within the testing tool.
And then look back at it in a week and if you see the performance isn’t as you’d expect, then go ahead and revisit.
That’s what I typically do is I say, okay, if it’s high volume, high traffic, we have a win. Let’s draw a line in the sand here, let all traffic go to it and see if that mean is consistent.
And if not, then maybe revisit it. If we do see consistent lift as we’d expect, go ahead and implement, or you know, take it out of the testing tool and hard-code it, however you see fit, to make that experience more consistent for your users.
But that’s a really good question. I’ve worked in that space as well and it’s, it’s really exciting, but it’s also really easy to draw conclusions very quickly.
So I would say, you know, be responsible to the business and then be skeptical of the lift.
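The idea that significance should be “reached and held” rather than trusted the first time it appears can be sketched as a simple rule. In the Python below, `significance_held` is a hypothetical helper, and the 0.05 threshold and the two-day hold are assumptions for illustration; the daily p-values would come from your testing tool.

```python
def significance_held(daily_p_values, alpha=0.05, hold_days=2):
    """Return True if the most recent `hold_days` daily checks were all
    below `alpha`, so the result has held rather than flickered.

    daily_p_values: p-values in chronological order, one per day.
    alpha and hold_days are illustrative defaults, not fixed rules.
    """
    if hold_days < 1 or len(daily_p_values) < hold_days:
        return False
    return all(p < alpha for p in daily_p_values[-hold_days:])

# Significant on the last two checks: comfortable to act, then monitor.
print(significance_held([0.20, 0.04, 0.03]))
# Significant once, then not, then again: keep the test running.
print(significance_held([0.04, 0.20, 0.03]))
```

Checking a test every day and stopping at the first significant reading raises the chance of a false positive, which is why a rule like this asks for the result to persist before anyone acts on it.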
Vipul from VWO:
I hope you noted all of that down, Mark; some really great advice there from Chris. Oh, I’m sorry, Chris, we went a bit over time, but this is such an interesting session that I just have one more question for you.
Do you have a team at ShipBob or did you have a team at BigCommerce that helped you in the entire optimization effort?
Did your companies help you in the entire optimization effort?
And you know, sometimes you need to pull in the data analyst, you need to pull in the designers, you need to pull in the web developers, you need to pull in the stakeholders, you can pull in the product marketers for messaging, right? With all of this, there’s an ecosystem in testing.
I don’t want to discourage people from being a one man band or a one woman band and getting in there and testing yourself and leveraging some of the capacities that a testing tool might provide in terms of resources for things in analytics and design and development. Typically there is a whole testing ecosystem.
However, how many people exist within that ecosystem is up to your business and up to how much they provide resources to it. But it’s a lot of hats to wear.
Vipul from VWO:
Right. And then sometimes if a single person is handling all the things, it might become overwhelming. So for some organizations, of course, it’s better if you start building a team and take the optimization program forward.
Perfect! So I think that’s it for today. Thanks again, Chris, for a wonderful and such an insightful session. I’m sure the audience must have loved it too. Thanks again and have a great day, guys.
Thank you all, and thank you for putting this on. Great day!