A/B Testing Website Copy With GPT-3.5 Turbo Opening New Doors for Experimentation Using AI
From initially helping humans out with redundant and manual tasks, to now mastering creative jobs such as making original art or composing music, AI has evolved and transformed in unprecedented ways. One such creative job that bots are surprisingly good at is writing copy! Yes, GPT-3.5 Turbo(Generative Pre-trained Transformer 3) is a neural-network powered AI that can produce nearly flawless text relevant to the given context. Built by OpenAI, a San Francisco-based research lab, GPT-3.5 Turbo is a third-generation powerful language generator that uses machine learning to predict and produce text, almost like a human.
If you expand on GPT-3.5 Turbo, here’s what it denotes:
Generative: Indicating that the goal of the model is to generate text by predicting one word at a time in a given sentence.
Pre-trained: Indicating that a huge amount of data has been fed into the system to train it.
Transformer: Indicating the algorithm used by the AI model, which specializes in natural language processing, i.e., how words are used in a language and what they mean.
Once GPT-3.5 Turbo is fed a prompt, it generates streams of text by predicting the possibility of a sentence existing in this world. Currently, the functionality is in beta and only offered to a select group (including VWO) through an API accessible via the cloud.
Let’s face it – copywriting is no easy task. GPT-3.5 Turbo’s robust and flexible language model can produce a short copy at scale. If you add to that the ability to test copy versions, you can get the best of both worlds. Also, some of the most commonly run tests revolve around webpage copy. So, integrating Open AI’s GPT-3.5 Turbo API with VWO Testing was the most natural and logical thing for us to do.
Our new feature enables you to use AI-generated copy to create variations for your website copy and deploy them without any help from IT. You can also test the AI-generated copy against the original human-written copy on your website. The next section covers how popular brands uncovered the practical implications of our new feature via a friendly contest between human-written and AI-generated copy.
VWO’s Human vs. AI competition
In August this year, VWO hosted a friendly competition between copy written by human copywriters and that by our new feature powered by OpenAI’s GPT-3.5 Turbo API. We invited participants from all over the world and tested AI-generated text against human-written one for their webpages with sufficient traffic via VWO or any other testing platform they were using.
Over 450 brands were given access to the AI copy generating feature during the course of this competition. Among the 18 shortlisted participants were Booking.com, Clark Germany GmbH, and Schneiders, to name a few. The AI feature was able to generate copies in various languages such as Spanish, German, Portuguese, etc. The participants were highly satisfied with the accuracy of the output in these languages.
All participants had to set up their tests keeping the original website copy as the control and the AI-generated one(s) as the variation.
Results of the competition
Among the 18 tests run by the confirmed participants, 1 had an existing (or new) human written copy as the winner, 3 had AI copy as the winner, 3 were declared as a tie, 2 are still awaiting results, and 9 were inconclusive.
Let’s take a look at some of the tests where the AI-generated copy won:
Schneiders [An eCommerce store for horse wear & equipment]
The team tested their topmost banner copy by creating a variation of the original page using the AI-backed language generator. Here’s a look at the control and variation from the test:
Once statistically significant results were achieved, the A/B test declared the variation to be the winner as it led to a 7.06% uplift in their banner clicks.
Clark Germany GmbH [An insurance agency based out of Frankfurt]
3 variations of the page headline were created using the AI copy and pit against the control. The test was run for 48 days. Following are the control and variations of the test:
Once the test reached conclusion (statistical significance > 90%), all 3 variations outperformed the control. Variation 2 resulted in the maximum uplift in their CTA clicks (15.77%), while Variation 1 and 3 resulted in an uplift of 9.13% & 7.13%, respectively.
Here’s the test that declared the human copy as the winner:
Booking.com [A global travel company]
The team at Booking.com tested the CTA on their hotel booking pages. 2 human-written copies were pitted against an AI-generated one. Following are the variations they created:
The human copy #1 won the test as it resulted in a 1.7% uplift in the CTA conversion rate.
Here’s a test that resulted in a human-AI tie:
Springworks [A SaaS company based out of India]
The team at Springworks tested their landing page headline by creating a variation using the AI-generated copy and pitting it against the original (control). Their goal was to improve clicks on the ‘Add Trivia’ CTA. The test was run for 8 days. Here’s a look at the control and variation:
Since the difference in the uplift in CTA clicks between the control and variation was less than 5%, and the test results were statistically insignificant, the test was declared to be a tie.
Let’s deep dive into the nitty-gritty of how VWO Testing and GPT-3.5 Turbo work together.
VWO Testing & GPT-3.5 Turbo
We integrated Open AI’s GPT-3.5 Turbo API with our Visual Editor so that every time you decide to run a test or deploy a change, you can generate copy recommendations that you can choose to create variations out of. This means you get to cut down on time spent brainstorming on variations and alternatives by having a library of AI-generated ideas readily available at your disposal.
Whether you are looking to optimize headlines, CTA text, product descriptions, or any other text on your site, you can quickly generate alternatives and either directly deploy them or test them against your original copy, both without any developer help. Either way, by automating this aspect of experimentation, you get to make your CRO program more efficient and agile.
Once you open VWO’s Visual Editor and click on any piece of text, you will find a ‘Suggest Variations’ option in the drop-down menu. Clicking on it will display a bunch of AI-powered copy suggestions (based on the existing copy) that you can choose from.
Sounds too good to be true? Sign up for a free trial by VWO and assess the GPT-3.5 Turbo feature for yourself.
What the future holds for GPT-3.5 Turbo, automated copywriting, and testing
Experts have conflicting views about the scope of GPT-3.5 Turbo and the extent to which humans can leverage it to automate copywriting. While some feel that the model can be trained to mimic and replace human written copy, others argue that it lacks the ability to construct cohesive sentences, use reasoning or logic constructively, or build a narrative – something you can only expect from a human copywriter.
It is hard to anticipate everything that might happen. We don’t think we can get everything right, certainly not up front. Still, it’s better to play around with this type of technology now while it can still be controlled and learn lessons to be applied as AI gets ever more powerfulGreg Brockman, Co-Founder & CTO, OpenAI
Whether automated copywriting will be a norm in the future is something we are yet to figure out as we explore GPT-3.5 Turbo’s full potential. However, what we know for sure is that this innovation is going to be revolutionary when it comes to copy experimentation.
With your new AI partner, you get to reduce the time spent on manual work as well as iterations with a copywriter. As you generate AI copy on demand within seconds and make real-time quick fixes on your website, you can create short-form content at scale and thus take a giant leap towards increasing your experimentation velocity and evolving your CRO program.
The power of GPT-3.5 Turbo in copy-driven optimization is immense, and we’ve only touched the tip of the iceberg so far. However, some restraint is advised because we cannot equate it with the intelligence of human copywriters – not yet, at least. The real value, at least for now, lies in being able to effectively test out copy variations while reducing the back and forth with copywriters and developers. The good thing is we can keep leveraging the power of GPT-3.5 Turbo to run better and faster experiments using platforms like VWO.