How Autocapture Actually Works
Over 10,000 businesses rely on Heap to deeply understand how users engage with their products. For the modern Product Manager, this is a necessary, even obvious, part of the job. But there’s a well-known problem: it’s not always an easy thing to do. If you’ve ever tried to understand user behavior, it’s likely that you’ve had to deal with the typical tradeoff of Product Analytics, which is this:
On one hand, you can quickly and easily spin up the free version of Google Analytics, but you end up lacking important data points.
Or, on the other hand, you can embark on the tall task of implementing tracking code, waiting for data, re-implementing, and so on. In this scenario, you can make smarter decisions with better data, but at a large cost.
Heap eliminates this tradeoff.
So how can a product team reliably get the data needed to inform business decisions with truth, while also avoiding an endless cycle of tracking code, bottlenecks, and stale events that nobody trusts?
Our answer is Autocapture + Virtualization:
Autocapture — capture all user interactions automatically, with one snippet of code (a minimal sketch of the install follows below).
Virtualization — organize those user interactions into flexible, retroactive, and easy-to-create Event Definitions, which apply a semantic name to the Autocaptured data.
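To make “one snippet of code” concrete, here’s a minimal sketch of what installation amounts to. The real snippet is copied verbatim from your Heap project’s setup page; only the final `heap.load` call is shown here, and 'YOUR_ENV_ID' is a placeholder, not a real value.

```ts
// Minimal install sketch, not the verbatim snippet from Heap's setup page.
// The pasted snippet defines window.heap, loads the tracker script asynchronously,
// and queues any API calls made before the script finishes loading.
declare const heap: { load: (envId: string) => void };

// The snippet ends with the call that turns Autocapture on for your environment.
// 'YOUR_ENV_ID' is a placeholder for the ID shown on your Heap setup page.
heap.load('YOUR_ENV_ID');
```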
This approach introduces a new paradigm to the analytics world, and for those accustomed to legacy solutions, it’s not always clear how it can possibly work well.
When teams first start to explore Heap, a few obvious benefits jump out: ease of implementation, speed to insight, retroactive data, etc. That said, there are several non-obvious aspects of Heap’s approach. This post will explore 7 of them, in case misconceptions have been holding you back from checking it out yourself.
#1 Autocapture works seamlessly with manual event tagging.
Autocapture helps our customers automate a very large portion of their analytics implementation, resulting in better data, smarter decisions, and more agile, data-driven product development. That said, we know that each business is unique, and Autocapture may not cover every distinct data point our customers want to analyze.
There are a number of things that Heap simply cannot (and should not) know without user input. Let’s say you want Heap to analyze user engagement based on how much an account is paying per year, or based on their Subscription Plan. What if you have a backend event, like “Payment Processed”, or “Free Trial Ended” that you need to track? What about understanding the state of certain users, like whether or not they’re currently on a free trial, or in an A/B test?
Not only are these pieces of information potentially sensitive, but our customers have nuanced use cases and analysis goals that we need to support. This means that we need to provide our customers a way to add custom data to Heap when necessary.
The types of use cases mentioned above fall outside of Autocaptured data, but fortunately, manual tagging is an option in Heap that works in harmony with everything that does get Autocaptured.
Heap has a number of APIs that allow our customers to send data into Heap the old-fashioned way when it makes sense. These event and user data endpoints are very similar to those that you might be used to in other analytics tools. The difference is that our APIs supplement Autocaptured data, rather than being the center of your entire implementation.
To put a number on it, just over 90% of the events used in Heap Reports across all of our customers are Virtual Events; the remaining ~10% are manually tagged Events.
We typically advise that teams manually capture 5-10 core KPIs with our Track API (actions like “Payment Processed” or “Sign Up Submitted”, for example). The events that should be manually tracked in Heap typically have some common qualities. They tend to be mission-critical, not automatically captured by a client-side user interaction, and/or unlikely to change much over time.
Because your core KPIs don’t change very much, limiting manual tracking to just these events keeps your implementation less brittle and far lower-maintenance than a typical 100% manual implementation. After that, the hundreds of other ad-hoc events, such as “Opened Project Dropdown”, are thoroughly covered by Autocapture.
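As a rough sketch of what that small manual layer might look like with the client-side Track API (the event names and properties below are just the examples from this post, not a prescribed schema):

```ts
// Hedged sketch: a handful of core KPIs sent explicitly, alongside everything
// Autocapture collects on its own. Property names and values are illustrative.
declare const heap: {
  track: (eventName: string, properties?: Record<string, string | number | boolean>) => void;
};

// Mission-critical actions that aren't a single client-side interaction, or that you
// never want to drift as the UI changes, are the usual candidates.
heap.track('Sign Up Submitted', { referrer: 'newsletter' });

// A backend milestone like this would more often go through Heap's server-side track
// endpoint; it's shown client-side here only to keep the sketch self-contained.
heap.track('Payment Processed', { plan: 'Annual', amountUsd: 490 });
```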
#2 Autocapture + Virtual Events = Best-In-Class Data Governance.
For those accustomed to legacy analytics tools, it might be natural to think, “Capturing everything means that my team has this enormous dataset, but nowhere to begin.”
The reality here requires a deeper explanation of Virtual Events. This is where everything begins in Heap.
All the raw data that Autocapture collects sits quietly under the surface, out of view, waiting to be called into action. Once you need a data point, you create a Virtual Event, which is just a human-readable name that points to the underlying data. Virtual Events are the units of analysis used in actual reporting.
Here’s the kicker – you don’t need an engineer to create the Virtual Event; Heap has a point-and-click UI that lets anyone do it (screenshot below). On top of that, since we’ve been collecting data in the background all along, the Virtual Event that you create is retroactive to the moment you installed Heap.
Before you can analyze an event in Heap, you have to Define it. Check out documentation on our Event Visualizer if you want to understand how this process actually works (it’s pretty easy).
This means a couple of things:
Users can only see and analyze Defined Events. All of the raw Autocaptured data lives in the background, which helps eliminate confusion and overwhelm.
Anyone (with the right permissions) can create a defined event. Autocapture is intended to help teams move faster. Sometimes, moving faster is just about doing something yourself, rather than waiting for your engineering team to add tracking code.
Virtual Events can be created at any time, which means the data is available retroactively – back to the moment Heap was installed. If you didn’t set up an event ahead of time, the data is still there when you need it.
Virtual Events can be modified at any time. If you want to modify the Event name, change the event criteria, or even combine multiple events together, any user can do that on the fly, and the data applies retroactively. This is the beauty of a Virtual Dataset! (A conceptual sketch of this model follows below.)
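To make the points above concrete, here’s a purely conceptual sketch of what a Virtual Event amounts to. None of this is Heap’s actual internal code or schema; the field names are assumptions made for illustration. The idea is simply a named, editable definition applied retroactively over raw Autocaptured interactions.

```ts
// Conceptual model only: Heap's real event definitions are created in its UI,
// and the raw-interaction fields below are illustrative, not Heap's storage schema.
interface RawInteraction {
  type: 'click' | 'change' | 'submit' | 'pageview';
  targetText?: string;   // visible text of the element, e.g. "Sign Up"
  pagePath: string;      // URL path where the interaction happened
  timestamp: number;
}

interface VirtualEvent {
  name: string;                              // the semantic, human-readable name
  matches: (e: RawInteraction) => boolean;   // definition criteria, editable at any time
}

// Hypothetical definition: "Clicked Sign Up" on the homepage.
const clickedSignUp: VirtualEvent = {
  name: 'Clicked Sign Up',
  matches: (e) => e.type === 'click' && e.targetText === 'Sign Up' && e.pagePath === '/',
};

// Because raw interactions are stored from the day Heap was installed, applying the
// definition (or re-applying it after an edit) over that history is what makes
// Virtual Events retroactive.
const countOccurrences = (history: RawInteraction[]) =>
  history.filter(clickedSignUp.matches).length;
```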
#3 Autocaptured data can easily be augmented with richer context.
Any useful analytics implementation requires some number of “custom properties”. These go by different names (variables, attributes, etc.), but they mean the same general thing – pieces of information that either tell you something about a user, such as their Role or Name, or something about an Event, like Dollar Value or Number of Notifications.
In Heap, custom properties like this are not included in what we Autocapture, and for a good reason. Attempting to automatically capture custom properties would pose a serious security threat to our customers.
Since trust and security are our #1 priority, we have 2 ways to easily augment Autocaptured data with the rich, contextual information that matters for your needs.
Snapshots. Snapshots are unique to Heap, so let’s start here with a basic example. A hotel booking app has an event for “Click Book Trip”, which gets Autocaptured. When analyzing that event, it’s probably also useful to know things like the page where the event happened or the device type of the user. These are event-level properties that are included in Autocapture. But what about more contextual information like “Number of Nights Booked” and “Total Cost” for the stay?
This is where Snapshots come in. Snapshots allow Heap users to attach additional custom properties to an Autocaptured Event. More specifically, admins in a Heap account can capture the text of an element on the page, or the value of a JavaScript expression, to enrich an event with the necessary context. Snapshot properties are not retroactive, but importantly, they are added to Autocaptured Events from within our UI, rather than in your codebase.
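For the hotel-booking example, the JavaScript an admin might paste into a Snapshot could look like the sketch below. The selectors are assumptions about that app’s markup, not real class names, and these expressions live in Heap’s UI rather than in your codebase.

```ts
// Hypothetical Snapshot expressions, evaluated on the page when "Click Book Trip" fires.
// Selector names are made up for illustration.

// "Total Cost": read the text of the booking summary's total element.
document.querySelector('.booking-summary .total-cost')?.textContent?.trim();

// "Number of Nights Booked": count the night rows in the summary.
document.querySelectorAll('.booking-summary .night-row').length;
```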
APIs. As mentioned above, in certain cases you’ll need to enrich the Autocaptured dataset with user- and event-level information. This includes custom Events that aren’t Autocaptured, like “Support Ticket Opened”, user details like “Subscription Type”, and user state, like whether someone is currently “On Free Trial”.
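Here’s a hedged sketch of the client-side APIs involved, using the examples above. The property names and values are illustrative, and a backend event like “Support Ticket Opened” could equally be sent through Heap’s server-side endpoints instead.

```ts
// Heap's client-side APIs for the context Autocapture deliberately doesn't collect.
declare const heap: {
  track: (eventName: string, properties?: Record<string, string | number | boolean>) => void;
  addUserProperties: (properties: Record<string, string | number | boolean>) => void;
  identify: (identity: string) => void;
};

// Tie the autocaptured activity to your own user ID once someone logs in.
heap.identify('user-1234');

// User-level details and state, e.g. plan and trial status.
heap.addUserProperties({ 'Subscription Type': 'Annual', 'On Free Trial': false });

// A custom event with no client-side interaction behind it.
heap.track('Support Ticket Opened', { channel: 'in-app' });
```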
#4 Heap keeps your analytics up-to-date as your apps evolve.
Teams that make frequent enhancements to their site or application are typically the ones who benefit most from Autocapture & Virtualization. Software and business agility are top-of-mind for most Product teams nowadays, but all too often, legacy analytics solutions present bottlenecks and data blackouts that prevent smart product decisions from being made quickly.
At Heap, we’re all about agility, which is why we built Autocapture & Virtualization with rapid iteration in mind. Our customers are able to move quickly and intelligently without having to consider the tactics of event tracking.
In the traditional paradigm of analytics, you need to remember to add tracking code as new pages, features, and elements get added to the site. If you forget to do this, all of the customer interactions with those things are lost in a Data Blackout – a period of time where no data was ever collected for an event. Worse yet, it’s usually the initial interactions with new features that are the most important to understand.
Even if you do remember to track new events, launching changes to your site or app becomes a tedious exercise that is at best an additional step in each sprint, and at worst, a disincentive to moving fast.
Autocapture doesn’t mind how often you iterate – it’s always collecting events in the background. When something changes in your app or site, Combo Events (a special type of Virtual Event) are the key to updating your analytics on the fly.
Let’s talk about how those work.
Let’s take a simple example and say you change a blue button that says “Sign Up” to a green button that says “Sign Up Today!” (we’ll assume the markup of the site also changes beyond a simple text and color swap). Depending on how the Event is defined in Heap, the count for the blue button’s Event will drop to 0, and Heap will let you know.
At that point, you can create a new version of the event for the new green button. All you have to do then is create a Combo Event that merges the old version with the new version, and just like that, you have historical tracking for every iteration of a site change. No coding, no lost data.
#5 Autocapture and Virtualization create the ultimate foundation for data trust.
Untrustworthy data is the root of all evil when it comes to a data-driven product strategy. Brian Balfour explains this well – he calls it the Data Wheel of Death. “Untrustworthy Data” isn’t very meaningful as a blanket term. Instead, we at Heap tend to think about untrustworthy data as 4 distinct categories: Stale Data, Unclear Data, Inaccurate Data, and No Data. You can read more about the four categories of untrustworthy data here.
Autocapture + Virtualization turns out to be a pretty good antidote to some of the sneakiest and most sinister root causes of Untrustworthy Data. Because Autocapture collects event data in its most detailed form, with no preconceptions about how Events should be set up, and Virtualization handles the organization afterward, the result is a flexible, complete, and trustworthy dataset. A couple of important points follow from this.
When it comes to user interactions, you can be sure you have the event data you need. There is no “I don’t know, we never tracked that”. Worst case scenario, you just need to define the necessary event(s). This is how you avoid the “No Data” Problem.
Event names and definition criteria can be changed at any time. Because Heap separates the Autocapture of raw data from the organization layer of Virtual Events, Events can be modified on the fly, so you can maintain a clean dataset without risking lost data if you make a mistake (this includes the Combo Event example discussed above). Gone are the days of five different events that all mean “Sign Up” (signup, Signup – New, Sign Up, Signup – 2018, and so on). This is how you avoid the “Unclear Data” and “Stale Data” Problems.
Autocapture is protected from the human error that comes with an old-school analytics implementation. In an environment with manual event tags, it’s common to have Events that don’t end up telling the full story that they intend to. Certain data points can be missed, code can be implemented incorrectly, and event names can get mixed up. With Autocapture, all interactions are captured regardless of where or how they happen. With no room for human error, Autocapture helps create the most accurate dataset out there. This is how you avoid the “Inaccurate Data” Problem.
#6 You have control over where your data lives, whether in our Analysis UI or in another data store.
Exporting data from Heap to other systems is a core value-add in our product, and for many of our customers, a core reason they decided to partner with us. Because Autocapture collects the most comprehensive user behavior dataset available, the value of Heap is multiplied when that data is easily made available elsewhere. In our quest to help teams make smart decisions, it would be a disservice to limit your use of Heap’s data to our Analysis Interface.
Today, Heap has direct tie-ins with a handful of cloud data solutions such as Snowflake, Redshift, BigQuery, and Amazon S3. These integrations allow you to automatically push Virtualized Heap data into those solutions without API calls or code.
For anyone who works with data warehouses, there are a couple of important points about how we handle pushing data into your enterprise data warehouse (EDW).
We only send the Virtual Events that you define and choose to send over. No need to worry about Heap jacking up your storage with the mountain of data that we Autocapture.
Heap pushes the data over in a highly structured manner, with tables for each layer of our data model – Users, Sessions, and Events. These objects are intuitively related to one another, allowing for more straightforward querying and simpler integration with other datasets (see the rough sketch after this list).
The data sync is retroactive, just like everything else in Heap. If you installed Heap 6 months ago, define an event today, and then choose to push that event into your data warehouse, you will receive a table with 6 months’ worth of data the next time a sync happens.
Heap manages the data sync. It shouldn’t be your job to wire up API calls or create ETL jobs for behavioral data. Heap fully manages the data transfer from our product into your Data Warehouse.
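As a rough sketch of that Users / Sessions / Events structure, the shapes below show how the layers join together. The column names are assumptions made for illustration, not Heap’s exact export schema.

```ts
// Illustrative shapes only: the point is three related layers, joined by IDs.
interface WarehouseUser {
  user_id: string;
  identity?: string;     // your own ID, if you use heap.identify
}

interface WarehouseSession {
  session_id: string;
  user_id: string;       // joins to Users
  session_time: string;
}

// Rows for the Virtual Events you choose to sync; each row is one occurrence.
interface WarehouseEvent {
  event_id: string;
  session_id: string;    // joins to Sessions
  user_id: string;       // joins to Users
  time: string;
}
```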
#7 Security and Compliance are the most critical part of any analytics platform. This is our #1 focus.
Protection of our customers’ data is the #1 priority here at Heap. For this reason, we want to be very clear about the intentional limits of Autocapture and the precautions we take to avoid putting our customers and their users at risk.
In the context of security and compliance, it’s important to be very clear about what Autocapture covers. Out of the box, Heap tracks only that behaviors happen: nothing about the users who perform those interactions, and nothing sensitive about the behaviors themselves (remember – this is where our APIs and Snapshots come in).
For example, if a user enters their Social Security Number into a field within your app, Heap will automatically track that something was typed into the field, but will NOT capture the value. It is possible to send relevant, non-sensitive user-level data into Heap, but this is a proactive, deliberate process that is managed by the admin user on the account, not something that we attempt to get right automatically.
Part of our unrelenting effort to protect the data of our customers and their users has been making sure to meet industry privacy and security standards. This includes SOC 2 Types 1 and 2 compliance, as well as GDPR compliance, among others.
We have also built a handful of technical options into the product as an additional line of defense against privacy risks.
Any analytics product with APIs is prone to accidentally being sent sensitive information by the user. To mitigate this, we have automatic PII detection built into the product that is designed to recognize when sensitive data does end up in Heap so we can stop and eliminate any potential issues before they blow up. Additionally, we have a user deletion API that makes it simple for our users to handle user deletion requests under GDPR.
Being selective and intentional about the data we ingest has been a priority since day 1. Heap was built from the ground up with both Autocapture and privacy top-of-mind, which is the only way to make the two work together. This is much harder to get right when Autocapture is bolted onto a traditional analytics platform as an add-on, which some other analytics tools have attempted in the past. When that happens, disaster can strike.
Heap’s mission is to help teams make smarter decisions with truth, and we believe that Autocapture is the way that all product teams will eventually unlock the insights that fuel those decisions. Reach out to us here if you’d like to learn more!