PM Best Practices: Creating a culture of experimentation (even at a B2B company!)

Any PM at a B2C company will tell you that being able to run and analyze experiments is a critical skill. A/B tests, for example, are the PM’s bread and butter: by splitting users into groups, assigning a variation of a feature to each group, and statistically evaluating the results, you can empirically determine which variation performs best.

Here’s the key, though. For these results to be actionable, you need something important: sample size. Statistics 101 tells you an experiment is not conclusive if you haven’t run it on enough people for the differences to be statistically significant. 
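To make the sample-size point concrete, here is a minimal sketch of the textbook normal-approximation formula for comparing two conversion rates. The function name, the default 5% significance level and 80% power, and the example rates are illustrative assumptions, not numbers from Heap.

```python
import math
from statistics import NormalDist


def required_sample_size(p_control: float, p_variant: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of users needed in EACH group to tell two rates apart.

    Uses the standard normal approximation for a two-sided test of two proportions.
    """
    if p_control == p_variant:
        raise ValueError("The two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_control - p_variant
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    # Hypothetical rates: a small lift versus a large lift on a 10% baseline.
    print(required_sample_size(0.10, 0.12))
    print(required_sample_size(0.10, 0.20))
```

Under these assumed rates, detecting a lift from 10% to 12% takes thousands of users per group, while a jump from 10% to 20% takes far fewer. Small effects are exactly what a small user base cannot resolve.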

For established B2C companies with tens of thousands of users, statistical significance isn’t usually a problem. But what about B2B companies, many of which have fewer (if potentially higher-paying) customers? Should having a smaller user base prevent a B2B PM from bringing a rigorous and data-driven approach to launching new features? Of course not! At Heap we believe PMs can always put data to work testing ideas and surfacing insights.

Even when statistics are not on our side, we PMs can still create a culture of experimentation. Here’s how.

1. Experimentation is about validating hypotheses

At its core, experimentation is about validating hypotheses. It’s this mindset that creates your culture, far more than any tool will. There’s a big difference between saying “let’s build something and see if people adopt it/buy it,” and, “let’s find out if this new feature will impact this user behavior.” In the former case, you’re basically hoping and waiting. In the latter, you’re formulating a hypothesis around how a new feature will improve your customer’s experience of your product, then testing it rigorously.

In practice, Heap PMs use different methods to ensure that our decisions are guided by hypotheses:

  • Our product briefs track advanced metrics.

For example, when we shipped Suggested Reports (a new feature that offers users some common business questions they can answer with Heap) we wanted to know if this new feature would make it easier for newer users of Heap to figure out how the tool worked. Our hypothesis was that if we offered suggested uses for Heap within the Heap tool itself, novice users would find more value in the product.

To measure and report on a new feature like this, most PMs would turn to a metric like “number of people who used the new feature.” After all, if lots of people are using the feature, it must be working, right?

The problem is that this metric doesn’t tell you who those users are. If all the users of this new feature turn out to be experienced users, that would mean the new feature wasn’t doing what it was supposed to do—that it wasn’t making it easier for novice users to use the Heap tool. (Of course, if lots of experienced users ended up using the feature, that might give us a different and equally useful piece of information—perhaps the feature was useful for experienced users in a way we hadn’t anticipated.)

So instead of a simple “number of users who used the feature,” we looked at weekly Heap usage overall. What we learned was that after we launched Suggested Reports, the number of users who ran a query in their first week of installing Heap went up. In other words, novice users were doing more with Heap right away. Our hypothesis was right!

The key is that we wouldn’t have learned this if we’d stuck to the standard metrics.
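As a rough illustration of that kind of metric, the sketch below computes the share of new users who ran a query within seven days of installing, split by whether they installed before or after a launch date. The record fields (`installed_on`, `query_dates`) and both function names are hypothetical, and the post reports a count where this sketch uses a rate so that cohorts of different sizes can be compared.

```python
from datetime import date, timedelta


def first_week_query_rate(users: list[dict]) -> float:
    """Share of users who ran at least one query within 7 days of installing."""
    if not users:
        return 0.0
    activated = 0
    for user in users:
        deadline = user["installed_on"] + timedelta(days=7)
        if any(user["installed_on"] <= day < deadline for day in user["query_dates"]):
            activated += 1
    return activated / len(users)


def split_by_launch(users: list[dict], launch_day: date) -> tuple[list[dict], list[dict]]:
    """Separate users who installed before the launch from those who installed after."""
    before = [u for u in users if u["installed_on"] < launch_day]
    after = [u for u in users if u["installed_on"] >= launch_day]
    return before, after


if __name__ == "__main__":
    # Made-up records, for illustration only.
    users = [
        {"installed_on": date(2020, 1, 6), "query_dates": []},
        {"installed_on": date(2020, 1, 8), "query_dates": [date(2020, 1, 20)]},
        {"installed_on": date(2020, 2, 3), "query_dates": [date(2020, 2, 4)]},
        {"installed_on": date(2020, 2, 5), "query_dates": [date(2020, 2, 11)]},
    ]
    before, after = split_by_launch(users, launch_day=date(2020, 2, 1))
    print(first_week_query_rate(before), first_week_query_rate(after))
```

Comparing the two cohorts ties the measurement back to the hypothesis about novice users, which raw feature-usage counts cannot do.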

  • We build in systems to make sure we track our hypotheses on a longer time scale and compile our learnings in one place.

It’s easy to get caught up in day-to-day work and forget to check back on a previous launch to see how well it did. To prevent that from happening, we have a built-in process of writing “after-action reports” after each launch. You can read more about after-action reports here.

2. Experimentation works better when it’s de-risked

Having a culture of experimentation means that you can test new ideas on a smaller subset of your user base and still get valuable insights. Why is this important? Because many hypotheses will be wrong. (That’s how experimentation works!) De-risking allows you to test hypotheses while minimizing the potential of any experiment to wreak havoc on your site.

What are some of the tactics a B2B PM can use to de-risk bigger projects and not waste engineering resources?

  • Time-boxing work. When we explore new areas without a clear idea of how we might bring these ideas to market, we like to time-box the effort we put into the exploration. (By “time-box” we mean that we limit experimentation to a specific time period.) Saying “we will spend the next two weeks exploring this solution” brings clarity to the engineering team and other stakeholders, and helps avoid wasting time working on something that might never get to production. If after two weeks the experiment deserves more attention, we’ll give it a new time box.
  • Knowing when to invest in building the right foundations. There’s also a technical aspect to de-risking projects: being able to roll out new features gradually and roll them back if something goes wrong. Some people think that only very large companies can afford to do that, but at Heap we’ve made the decision to invest in our foundations early to have a robust way of deploying code to only certain customers. We can also properly monitor the changes we make. (If you’re technical, we recently released an experimental framework that allows us to serve different versions of the Heap JavaScript snippet to different customers, along with advanced telemetry to monitor the code being deployed.) This investment in foundations is already paying off, as it allows us to roll out improvements to our customers faster and minimize the risk associated with a change.
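One common way to implement a gradual rollout is deterministic percentage bucketing, sketched below. This is a generic technique and not a description of Heap’s framework; the function names, the feature key, and the customer IDs are all hypothetical.

```python
import hashlib


def rollout_bucket(customer_id: str, feature: str) -> int:
    """Map a customer to a stable bucket in the range 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{customer_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(customer_id: str, feature: str, rollout_percent: int) -> bool:
    """True if the customer falls inside the current rollout percentage."""
    return rollout_bucket(customer_id, feature) < rollout_percent


if __name__ == "__main__":
    customers = [f"customer-{i}" for i in range(1000)]  # made-up IDs
    for percent in (0, 10, 50, 100):
        enabled = sum(is_enabled(c, "new-snippet", percent) for c in customers)
        print(percent, enabled)
```

Because each customer’s bucket is fixed, raising the percentage only adds customers, and setting it back to 0 acts as a rollback. Hashing the feature name together with the customer ID keeps the same customers from always landing in the first wave of every rollout.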

3. Experimentation is about getting feedback from the real world

What are the different ways a B2B PM can get sufficient feedback from the real world without running an A/B test? User interviews are great, but can be time-consuming. They can also introduce bias towards the companies that were interviewed. At Heap, we also use other methods:

  • Painted doors in the product. Painted doors are an easy, cheap way to test customer excitement about potential features. Here’s how they work: in your app, you show users a link to a new feature. The thing is, you haven’t actually built that feature yet. So when users click the link, they end up on a survey, or on an option to contact us to learn more. Simply tracking the number of people who click the link is a great way to gauge interest in a feature without actually developing it. Of course, you should limit yourself to one or two painted door experiments in your product at any time, or you will make the user experience very frustrating!
  • Identifying anecdotal evidence that collectively translates into a real need. The key here is to track customers’ ad-hoc feature requests in a centralized place. At Heap, we use Productboard to keep track of the thousands of feature requests we get. As a PM, I get great satisfaction from seeing patterns emerge, even between customers of different sizes and different industries. And knowing that I am not over-indexing on the customers I’ve talked with most recently (or who are the most insistent!) gives me peace of mind. 

4. Experimentation is about giving teams the freedom to test wild ideas

The word “experiment” still summons an image of the mad scientist, hair askew, lab coat stained with phosphorescent chemicals. Although cliché, there’s some truth to this stereotype, even in modern organizations: when you experiment you let your ideas go wild in a controlled environment.

Above I talked about how we de-risk our ideas. On a more fun note, every year we also run a “Heap hack week.” For one week, we all (everyone in the company!) pause ongoing product work and instead work on exciting projects that we wouldn’t necessarily have tackled otherwise. Many of you may be familiar with the “hack week” concept, but at many companies such events end up being devalued in favor of more immediate priorities. At Heap, we strongly value experimentation, so we set clear guidelines for hack week:

  • Teams should be cross-functional: go-to-market teams and engineering teams often partner on new ideas
  • Ideas need to be bold: they would be a big deal for the company if put into production
  • Ideas need to be creative: people should think outside of the box and generate ideas that might not necessarily fall on any team’s roadmap

This year’s hack week generated more than 20 new projects, and helped the PM team realize the impact and engineering effort associated with them. We rapidly adjusted our roadmaps to make room for several of them! 

At Heap, we believe that the size of your user base doesn’t limit your commitment to experimentation. Feel free to try these ideas (or, ahem, experiment with them) and see if they work for you!

5. Bonus: Experimentation is about not listening to the HiPPO (highest-paid person’s opinion)

OK, I am cheating here. We don’t really have a best practice in place for this. It’s just something that happens organically at Heap. (We’ve been recognized as a best place to work, and we’re hiring!)