Why you should embrace imperfect A/B testing: 5 strategies
A/B tests are an essential component of continuous product improvement. They allow you to make informed decisions that drive real improvements for your digital experience.
They seem straightforward. You compare two product versions by exposing specific user groups to each version. Then, you assess which version performs better.
But A/B testing can get pretty tricky. It’s not just about numbers. You face competing priorities, insufficient resources, and sometimes, fewer users than you'd like for testing.
You might not always have the luxury of perfect testing conditions, but that’s ok! Embracing imperfect testing conditions can still yield valuable insights into your digital experience. In this blog post, we'll explore five scenarios with less-than-ideal A/B testing conditions so you can learn how to make informed decisions regardless of the situation.
Let’s dive in.
Breaking down A/B testing
Let’s start with an example...
A SaaS company wants to increase the conversion rate for an onboarding flow. After conducting user research and analyzing data to identify friction points, the team suggests removing a step that requires users to select their job titles from a drop-down list.
User research indicated that new users spent more time than expected on this step. Sometimes, users didn’t fall neatly into one of the job categories. Other users groaned at having to enter this info before getting to poke around the new tool.
But while the proposed change seems like an easy way to reduce friction, the Data Team is concerned. They're tasked with reporting product adoption by user persona, and removing this step would undermine their ability to do so.
The product manager faces a dilemma: is increasing conversions by removing the high-friction step worth losing data on the user's role? To weigh the tradeoff, the PM needs to know how much improvement to expect from the new onboarding flow.
An A/B test can answer this question.
The ideal A/B test design
In principle, the design of an experiment is straightforward: you randomize 50% of users to receive a new onboarding flow, and the remaining 50% receive the original flow. The 50/50 split maximizes the power to detect a difference between the two groups. You can reduce the required duration of the experiment via the CUPED method by including pre-experiment characteristics of the users. After you complete the experiment, standard statistical methods can compare the difference in completion rate between the two groups. And the randomization of users into each group ensures any difference is not due to systematic bias.
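To make that concrete, here's a minimal sketch of what the final comparison might look like in Python. The completion counts are hypothetical, and the sketch omits the CUPED adjustment, but it shows the basic two-proportion comparison of completion rates between the groups:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical completion counts from the onboarding experiment.
control_completed, control_total = 1150, 2500      # original flow
treatment_completed, treatment_total = 1250, 2500  # new flow (drop-down step removed)

p_control = control_completed / control_total
p_treatment = treatment_completed / treatment_total

# Two-proportion z-test on the difference in completion rate.
p_pooled = (control_completed + treatment_completed) / (control_total + treatment_total)
se = np.sqrt(p_pooled * (1 - p_pooled) * (1 / control_total + 1 / treatment_total))
z = (p_treatment - p_control) / se
p_value = 2 * (1 - norm.cdf(abs(z)))

print(f"Lift: {p_treatment - p_control:+.1%}, z = {z:.2f}, p = {p_value:.4f}")
```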
However, running A/B tests on real customers has complications that may require you to deviate from the ideal A/B test design.
That's ok! In fact, it's often exactly the right thing to do.
5 tricks for A/B testing in an imperfect environment
Let's look at five scenarios where you can deviate from ideal testing conditions and still get beneficial results.
#1: Randomize customer accounts instead of users
A customer or “account” contains multiple users. It’s critical to distinguish which users belong to which account during experimentation to avoid negatively impacting a customer’s experience.
For example, suppose a single customer account has fourteen users, and eight of them receive the new onboarding flow while the other six receive the original. Users within the same account experiencing different onboarding flows can lead to unnecessary confusion.
One way to avoid this confusion is to randomize customers, not users, to receive the new onboarding flow.
Benefit: This avoids confusing customers by ensuring that all users within an account land in the same group of the experiment.
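If you're curious what account-level assignment looks like in code, one common approach is to hash the account ID (rather than the user ID) so every user in an account deterministically lands in the same group. Here's an illustrative sketch; the experiment name and the 50/50 split are assumptions:

```python
import hashlib

def assign_group(account_id: str, experiment: str = "onboarding-v2",
                 treatment_share: float = 0.5) -> str:
    """Deterministically bucket an entire account into one experiment group."""
    digest = hashlib.sha256(f"{experiment}:{account_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    return "treatment" if bucket < treatment_share else "control"

# Every user belonging to account "acme-corp" sees the same onboarding flow.
print(assign_group("acme-corp"))
```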
#2: Experiment on fewer users
To reduce risk, some product teams prefer to run an experiment on less than 50% of the customer base. Exposing fewer users to the A/B test reduces its statistical power, so the experiment must run longer to detect a meaningful impact.
Benefit: If a product change introduces an unexpected problem, it’s better to learn that when less than half of your user base has experienced it.
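To see how exposure affects runtime, here's a rough sketch using statsmodels. The baseline rates, signup volume, and 20% exposure level are made-up numbers purely for illustration:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical goal: detect a lift from 46% to 50% completion at 80% power.
effect = proportion_effectsize(0.50, 0.46)
users_per_group = NormalIndPower().solve_power(effect, alpha=0.05, power=0.8)

weekly_new_users = 1000  # assumed signup volume

for exposure in (1.0, 0.2):  # share of new users entering the experiment
    weeks = 2 * users_per_group / (weekly_new_users * exposure)
    print(f"{exposure:.0%} exposure -> roughly {weeks:.1f} weeks to reach sample size")
```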
#3: Exclude certain customer groups
For various reasons, some customers may need to be assigned (not randomized) to a specific group of the A/B test. For example, you may want to avoid testing a new product feature on a high-value customer who has complained about past experimental features. Alternatively, a high-churn-risk customer who has asked for a particular feature may be better served by being assigned to receive it.
However, customers who are assigned rather than randomized can bias the test results, so you should exclude them from the final analysis.
Benefit: Reduce the risk of disrupting strategic customers and those who are potential churn threats.
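In practice, this often comes down to flagging the hand-assigned accounts and filtering them out before the statistical comparison. A hypothetical sketch with pandas (the column names are assumptions):

```python
import pandas as pd

# Hypothetical assignment log: accounts flagged `forced=True` were hand-assigned
# (strategic or churn-risk customers) rather than randomized.
assignments = pd.DataFrame({
    "account_id": ["a1", "a2", "a3", "a4"],
    "group":      ["treatment", "control", "treatment", "control"],
    "forced":     [False, False, True, False],
})

# Forced accounts still get their assigned experience, but only randomized
# accounts feed into the statistical comparison.
analysis_set = assignments[~assignments["forced"]]
print(analysis_set)
```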
#4: Prioritize for customer value and business impact, not A/B results
Fast-moving teams make iterative changes to the product continuously. In these cases, your team doesn’t have the luxury of waiting four weeks for an A/B test to conclude before making other improvements.
For instance, you may have a bug in your onboarding flow that only impacts a specific segment of users. From an A/B testing perspective, it would be better to wait until the test concludes to make changes that affect the success rate.
However, from a customer experience and business outcomes perspective, you should fix the bug as soon as possible.
Most teams should optimize for customer experience and business outcomes in these situations and account for additional changes in the analysis of the experiment.
Benefit: Minimal customer and business disruption and maximum impact.
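One simple way to account for a mid-experiment change in the analysis is to record when it shipped and report results before and after that date, checking that the gap between groups stays stable. A hypothetical sketch (the dates and data are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical event log: one row per user who entered the onboarding flow.
events = pd.DataFrame({
    "group":     ["control", "treatment", "control", "treatment"] * 2,
    "completed": [0, 1, 1, 1, 0, 1, 1, 0],
    "timestamp": pd.to_datetime(["2024-03-01"] * 4 + ["2024-03-15"] * 4),
})

# Suppose the bug fix shipped mid-experiment on 2024-03-10.
bug_fix_date = pd.Timestamp("2024-03-10")
events["period"] = np.where(events["timestamp"] < bug_fix_date, "pre-fix", "post-fix")

# Completion rate by group and period: a stable gap between groups across
# periods suggests the mid-experiment fix didn't distort the treatment effect.
print(events.groupby(["period", "group"])["completed"].mean())
```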
#5: Define rollout decision criteria
After an A/B test concludes, the Product Manager must decide if they will roll out the new feature to the user base. In addition to the A/B test’s results, the rollout decision should incorporate the costs and benefits of full deployment.
Even if the A/B test shows neutral or slightly positive results, rolling the feature out to the entire user base might not be the best decision. If a feature costs substantial money to maintain going forward but provides only a slight lift, the team may decide to kill it.
Conversely, you may still roll out a feature even if your A/B test has negative results. For example, suppose you're A/B testing the first step of a major redesign. The results are negative, but qualitative research makes the team confident that the full redesign will still deliver significant improvements. As long as the negative impact is small, rolling out this first step can still be the right call.
Benefit: Not all improvements are created equal. Improvements should be feasible, viable, and scalable from a business perspective. Make sure to assess each accordingly.
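A rough back-of-the-envelope check can make the cost-benefit tradeoff explicit. Every number below is a hypothetical assumption:

```python
# Hypothetical rollout check; all figures are assumptions, not test output.
observed_lift = 0.004             # +0.4 pp onboarding completion from the A/B test
annual_signups = 50_000
value_per_activated_user = 120    # estimated annual value per activated user ($)
annual_maintenance_cost = 40_000  # cost to keep the feature running ($)

expected_annual_benefit = observed_lift * annual_signups * value_per_activated_user
print(f"Benefit ${expected_annual_benefit:,.0f} vs. maintenance ${annual_maintenance_cost:,.0f}")
# Here the maintenance cost outweighs the expected benefit, so the team might
# kill the feature despite a positive test result.
```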
Conclusion
Here’s a quick recap of all the optimization levers we’ve covered:
Users vs. customers
Ideal: Randomize individual users into groups.
In practice: Randomize customers (accounts) into groups to reduce customer confusion.

Experiment on fewer users
Ideal: A 50/50 split maximizes the power of an A/B test.
In practice: Consider exposing less than half of your customers to an experimental feature.

Exclude certain groups
Ideal: Randomize all users to one of the two groups.
In practice: Assign strategic or at-risk customers to a specific group when needed, and exclude them from the final analysis.

Consider other product changes
Ideal: Don't confound the test with other changes while it runs.
In practice: Balance the purity of the A/B test against the need for continuous product improvements.

Rollout decision criteria
Ideal: Determine whether the A/B test had a positive finding.
In practice: Incorporate other costs and benefits into the final rollout decision.
As you venture further into product optimization, remember that A/B testing isn't just a tool; it's your ally for quicker learning, minimal risk, and, ultimately, achieving digital success.
Happy testing!
Want to learn more about A/B testing? Check out our comprehensive guide.