How To Scale Your Data Practices: Startup Edition

Heap Product Manager Milene Darnis recently gave a talk on how startups can go about scaling their data practices. Here she summarizes her key points and offers some recommendations for startups aiming to align more of their business decisions around data. You can watch her talk (in French!) here, and read a shorter version of these ideas on the efounders Medium page.

As a data-engineer-turned-product-manager, I’ve been lucky enough to live on both sides of the data debate. While I’ve spent years promoting the data scientist’s conviction that the right data can answer everything, I’ve also come to appreciate how businesspeople who haven’t spent their lives thinking about data often see it. (In my experience, they’re generally open-minded and know data is useful, but they often worry about whether they’re using it properly and are unsure how data should fit into their decision-making.)

I still believe that most startups tend to under-utilize their data, and recently I’ve spent time thinking about how people and companies can start to ramp up their data practices. At a company like Heap, whose mission is to spread the gospel of data-driven decision-making, it’s not so hard. But for other startups, it takes work to make data an integral part of decisions across the org.

In the spirit of helping companies scale their data practices, here are a few key mindsets around data, and some best practices for building a strong relationship with data.

3 key mindsets around scaling data

Mindset #1: Good (and less good) reasons to use data

Data isn’t always the answer. (Gasp!) There are certainly many opportunities for businesses to leverage data. But there are also misguided ones. Situations where data is a good idea include informing roadmap decisions, identifying bugs in your product, and understanding your customers and prospects. Situations where using data won’t result in better outcomes include: justifying decisions that are already made, avoiding customer discussions (particularly with a limited pool of users), and looking for confirmation in vanity metrics.

Before embarking on a data journey, it’s worth thinking about how you want to use data to help your business. For many people—particularly those used to making decisions from their gut—a commitment to data means major changes to how their business operates. Be prepared for these changes, particularly the psychological shifts!

Mindset #2: Start small

I see this all the time: companies can’t wait to “start using AI,” but haven’t built up the infrastructure or processes that make an advanced use-case like AI viable for them. Lots of work needs to happen before a data org is mature enough to tackle this kind of project. 

The good news is that the steps it takes to adopt a major project like AI still offer many benefits to your business. As a first step, work on supplying clean data to business teams: things like MRR, ARR, and other revenue metrics. No, they’re not AI, but it’s nearly impossible to run a successful business without keeping tabs on them. Once you’ve done this, you can start collecting product data to inform the product team’s decisions: roadmaps, new features, and feature improvements.
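As a rough illustration of what "clean revenue metrics" can mean in practice, here is a minimal sketch that computes MRR and ARR from a list of subscriptions. The `Subscription` type, its fields, and the sample customers are hypothetical, and ARR is approximated here as 12 times MRR, which is a common simplification rather than the only definition.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    customer: str
    monthly_price: float  # recurring amount billed per month (hypothetical field)
    active: bool


def mrr(subscriptions):
    """Monthly recurring revenue: sum of monthly prices over active subscriptions."""
    return sum(s.monthly_price for s in subscriptions if s.active)


def arr(subscriptions):
    """Annual recurring revenue, approximated as 12 x MRR (a simplifying assumption)."""
    return 12 * mrr(subscriptions)


# Illustrative sample data, not real customers.
subs = [
    Subscription("acme", 100.0, True),
    Subscription("globex", 250.0, True),
    Subscription("initech", 80.0, False),  # churned, so excluded from MRR
]

current_mrr = mrr(subs)
current_arr = arr(subs)
```

The point of a sketch like this is less the arithmetic than the agreement it forces: everyone on the business team should be working from the same definition of "active" and the same source of subscription data.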

Once you’ve put these processes in place, you can start consolidating business data sets to generate models for forecasting. And once these data functions are up and running, you’ll be ready to add AI and predictive features.

Mindset #3: Prerequisites for using data 

Before launching any major data initiatives, it’s important to establish some baseline standards. First, do you have enough data? Each of the steps above requires you to be collecting a certain amount of data; if you can’t reliably report on ARR, or product users, or sales, or customers, then you know what your immediate tasks should be.

Second, what are your goals? These don’t have to be fully fleshed-out, but knowing what long-term goals you’re shooting for (AI, predictive forecasts, etc.) can make sure your preliminary steps are put together properly. Whatever your big goals are, you’ll likely start by implementing reliable systems for collecting, storing, and exploring data. Knowing what these systems will ultimately be used for can guide the way you organize them now.

Third, can you afford to hire data talent? This question involves more than budget; if you want to retain talent, you’ll need to offer them learning opportunities and career development. 

Best practices for scaling data at your startup

Best practice #1: User segmentation & key metrics

A good first step for getting started with data is to map out key metrics and offer basic user segmentation. (Basically, user segmentation involves dividing your users into groups based on relevant features, such as job title, actions they typically take in the product, platform, and so on.) Traditionally, user segmentation is done by looking at user actions through two lenses: time-based criteria and action-based criteria. Time-based criteria tell you when a user performed a specific action (e.g. they onboarded this week), while action-based criteria tell you whether a user performed that action at all (e.g. they completed onboarding). Both are important; both can give you actionable information.
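To make the two lenses concrete, here is a small sketch that builds an action-based segment and a time-based segment from the same event log. The event tuples, user IDs, action names, and dates are all made up for illustration; a real product analytics tool would give you this through its own interface.

```python
from datetime import date, timedelta

# Hypothetical event log: (user_id, action, day the action happened)
events = [
    ("u1", "completed_onboarding", date(2024, 5, 6)),
    ("u2", "completed_onboarding", date(2024, 4, 1)),
    ("u3", "viewed_pricing", date(2024, 5, 7)),
]


def action_segment(events, action):
    """Action-based: users who performed the action at any time."""
    return {user for user, act, _ in events if act == action}


def time_segment(events, action, start, end):
    """Time-based: users who performed the action between start and end, inclusive."""
    return {
        user
        for user, act, day in events
        if act == action and start <= day <= end
    }


week_start = date(2024, 5, 6)
week_end = week_start + timedelta(days=6)

onboarded_ever = action_segment(events, "completed_onboarding")
onboarded_this_week = time_segment(
    events, "completed_onboarding", week_start, week_end
)
```

Here `onboarded_ever` contains both users who completed onboarding, while `onboarded_this_week` narrows that group to the one who did so in the chosen week.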

Another key set of metrics to start with involves those that capture activity at each stage of your customer journey or funnel. For these, I recommend the AARRR framework (also known as Pirate Metrics!). Besides being fun to say, “AARRR” helps you remember the stages of the user journey: acquisition (sometimes framed as awareness), activation, retention, revenue, referral.
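One simple way to put these stages to work is to count users at each stage and look at the conversion rate from one stage to the next. The sketch below assumes you already have a count per stage; the stage labels follow the list above, and the numbers are invented for illustration.

```python
# Stage labels are just dictionary keys; rename them to match your own funnel.
STAGES = ["awareness", "activation", "retention", "revenue", "referral"]


def funnel_conversion(stage_counts, stages):
    """Return the step-to-step conversion rate for each consecutive pair of stages."""
    rates = {}
    for prev, curr in zip(stages, stages[1:]):
        prev_count = stage_counts[prev]
        # Guard against division by zero when a stage has no users.
        rates[f"{prev}->{curr}"] = (
            stage_counts[curr] / prev_count if prev_count else 0.0
        )
    return rates


# Invented counts for illustration only.
counts = {
    "awareness": 1000,
    "activation": 400,
    "retention": 200,
    "revenue": 50,
    "referral": 10,
}

rates = funnel_conversion(counts, STAGES)
```

Reading the rates side by side makes it easier to see which step of the journey loses the most users, which is usually a better place to focus than the top-line totals.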

Best practice #2: Building your data stack

If you’re a startup with 50-200 employees, you want to keep your data stack simple. I’d recommend starting with product data, using something like, oh, Heap. To this you want to plug in sources for business data: payment data from Stripe, customer information from Salesforce, and commerce information from Shopify. Lastly, you’ll want a simple server-side database like MySQL to store and manage your application data.

Once you’ve got these in place, you can think about data warehouse options. I recommend cloud solutions like Snowflake or Amazon Redshift. ETL tools like Heap Connect can help aggregate and transfer your data to them. Finally, you’ll need a top layer of tools to explore and analyze this data. Examples here include tools like Looker and Tableau for BI, or Mode and Jupyter if you need a more data science-based approach.
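The payoff of bringing these sources together is being able to ask questions that span them, such as how product usage relates to revenue per customer. Here is a deliberately tiny sketch of that idea in plain Python. The row shapes and field names (`customer_id`, `sessions`, `mrr`) are hypothetical and do not reflect the real export schemas of Heap, Stripe, or any other tool; in practice this join would more likely be a SQL query in your warehouse.

```python
# Hypothetical exports; field names are illustrative, not real vendor schemas.
product_usage = [
    {"customer_id": "c1", "sessions": 42},
    {"customer_id": "c2", "sessions": 3},
]
payments = [
    {"customer_id": "c1", "mrr": 500.0},
    {"customer_id": "c3", "mrr": 120.0},
]


def join_by_customer(usage_rows, payment_rows):
    """Combine usage and payment rows into one record per customer.

    Customers missing from one source keep a default of zero for that source,
    which mirrors a full outer join with defaults filled in.
    """
    merged = {}
    for row in usage_rows:
        record = merged.setdefault(row["customer_id"], {"sessions": 0, "mrr": 0.0})
        record["sessions"] = row["sessions"]
    for row in payment_rows:
        record = merged.setdefault(row["customer_id"], {"sessions": 0, "mrr": 0.0})
        record["mrr"] = row["mrr"]
    return merged


by_customer = join_by_customer(product_usage, payments)
```

Note that the join only works because both sources share a customer identifier. Agreeing on that shared key early is one of the cheapest ways to keep a simple stack useful as it grows.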

Best practice #3: Hiring for data

Data is a growing field with lots of new job titles. It’s worth having a sense of what skills each title requires, and what each role can bring to your team:

  • Business analysts: Excel is their weapon of choice. Business analysts track business metrics (monthly revenue, operating costs, etc.), tie data back to the business, and provide collateral for investors.
  • Data analysts: Data analysts use SQL to answer more advanced questions, and often help product teams. For instance, a data analyst could help segment users by behavior in the product to help a product manager orient their roadmap. 
  • Data scientists: Data scientists are typically more technical, and often come from academia. They are statistics gurus who use their skills to analyze the past, run experiments, and design models that predict the future.
  • Data engineers: Data engineers are responsible for building and maintaining data infrastructure. They connect different sources of data, model data, and keep data clean for users across the company.
  • Machine Learning engineers: ML engineers typically live more in eng than statistics, and are responsible for putting into production the predictive models given to them by the data scientists. 

Best practice #4: Centralized vs distributed data org

The proliferation of data-based practices across the business world has caused two major changes (at least): it’s raised expectations of data-literacy across all teams, and it’s started to change the ways data teams work. Whereas earlier teams tended to adopt a centralized approach in which data-oriented tasks would be handled by the data team alone, today this structure produces too many bottlenecks. It also deprives teams of the data they need to do their jobs.

As data practices grow, I recommend taking a more distributed approach. Have your data team set up and maintain your data stack, but make sure the team also makes that stack available to teams across the business, and easy for them to use. Data teams should also partner with other teams on complex projects whose success requires a data analyst or data scientist.


I hope this information proves useful to those of you trying to expand data practices at your startup! For more information, check out Heap.