Unlock 2025 Benchmark data → Access insights to stay ahead in the digital experience race.

Get the Report
skip to content
Loading...
    • Why Product Analytics And what can it do for you?
    • How Heap Works A video guide
    • How Heap Compares Heap vs. competitors
    • Product Analytics + Digital Experience Analytics A deeper dive
    • The Future of Insights A comic book guide
    Watch a Demo
  • Data Insights

    • Journeys Visual maps of all user flows
    • AI CoPilot Analytics for everyone
    • Web Analytics Integrate key web metrics
    • Session Replay Complete context with a single click
    • Heatmaps Visualize user behavior instantly
    • Heap Illuminate Data science that pinpoints unknown friction

    Data Analysis

    • Segments User cohorts for actionable insights
    • Dashboards Share insights on critical metrics
    • Charts Analyze everything about your users
    • Playbooks Plug-and-play templates and analyses

    Data Foundation

    • Capture Automatic event tracking and apis
    • Mobile Track and analyze your users across devices
    • Enrichment Add context to your data
    • Integrations Connect bi-directionally to other tools

    Data Management

    • Governance Keep data clean and trusted
    • Security & Privacy Security and compliance made simple
    • Infrastructure How we build for scale
    • Heap Connect Send Heap data directly to your warehouse
  • Solutions

    • Funnel Optimization Improve conversion in user flows
    • Product Adoption Maximize adoption across your site
    • User Behavior Understand what your users do
    • Product Led Growth Manage PLG with data

    Industries

    • SaaS Easily improve acquisition, retention, and expansion
    • Retail and eComm Increase purchases and order value
    • Healthcare Build better digital patient experiences
    • Financial Services Raise share of wallet and LTV

    Heap For Teams

    • Product Teams Optimize product activation, conversion and retention
    • Marketing Teams Optimize acquisition performance and costs
    • Data Teams Optimize behavioral data without code
  • Pricing
  • Support

    • Heap University Video Tutorials
    • Help Center How to use Heap
    • Heap Plays Tactical how-to guides
    • Professional Services

    Resources

    • Down the Funnel Our complete blog and content library
    • Webinars & Events Events and webinar recordings
    • Press News from and about Heap
    • Careers Join us

    Ecosystem

    • Customer Community Join the conversation
    • Partners Technology and Solutions Partners
    • Developers
    • Customers Stories from over 9,000 successful companies
  • Free TrialRequest Demo
  • Log In
  • Free Trial
  • Request Demo
  • Log In

All Blogs

Working Around a Case Where the Postgres Planner Is "Not Very Smart"

Matt Dupree
August 2, 20216 min read
  • Facebook
  • Twitter
  • LinkedIn

Heap is a digital insights platform that automatically captures web and mobile behavior like page views, clicks, and taps. We recently shipped Effort Analysis, a way for Heap customers to see the median number of interactions and seconds engaged between each step within a funnel. Here’s what it looks like:

Example of Effort Analysis

To build this feature, we needed to write a query that could quickly scan more than a billion rows of event data. But when we first started working on this feature, it took too long to scan the data.

Fortunately, we found a way to double the speed of this feature's p90 performance. To do this, we had to work around a case where the Postgres planner is, according to the Postgres docs, “not very smart.” This Postgres quirk surprisingly prevented an existing index from supporting an index-only scan. This post is about the quirk that caused our performance problem and the workaround we leveraged to achieve a 2x performance win.

What’s an index-only scan?

First, a quick refresher on index-only scans. An index is a secondary data structure in your database that the query planner can use to make some queries faster. For example, an index on the user_id column can make lookups of a specific user_id faster. An index-only scan is a special kind of operation in which the database can satisfy a query with just the information that’s in the index, without having to read the table’s actual rows.

For example, a typical index at Heap contains the time of an event and the user_id of the user who performed that event:

That index is created on a table with a schema like this one:

Given this table and index, the query planner would use an index-only scan for a query like SELECT user_id FROM events WHERE time > now() - ‘7 days’::interval because the only value we need, user_id, is in the index.

For a slightly different query like SELECT * FROM events WHERE time > now() - ‘7 days’::interval, the query planner couldn’t use an index-only scan because we need columns that are not in this index. The planner may* use an index scan instead.

An index scan, as the name implies, scans the index for rows that satisfy the query predicate. However, it also reads in pages of data from the table itself to return and/or perform computations on values that aren’t in the index. This additional step often makes index scans slower than index-only scans.

Our performance problem

Many of our clients record more than 100 million events a week, meaning a 90-day funnel would require scanning more than a billion rows. As a result, the p90 for the query was about 20 seconds, much slower than we wanted for our initial launch. Here’s a simplified version of the initial query that powered Effort Analysis:

Initially, the query that powered Effort Analysis wasn’t using an index-only scan. This surprised us because we had an index that looked like the following:

The reason this index didn’t yield an index-only scan is because of a surprising Postgres quirk.

The quirk: functions and operators can prevent index-only scans

Here’s what was happening: although data ->> ‘type’ is in the index, the query planner thinks it also needs to have data in the index, even though data itself isn’t ever referenced in the query except as a part of the data ->> ‘type’ expression. If that doesn’t seem like a smart way for the planner to work, the Postgres documentation agrees with you:

“PostgreSQL's planner is currently not very smart about such cases.”

This problem arises any time a query uses a function or operator on a column. (Operators are just syntactic sugar for functions.*) If your query references f(x), you must include f(x) and x in your index to get an index-only scan. This isn’t a random hack; it's the suggested workaround for this issue in the Postgres docs.

However, in our case, x is data, and because data contains most of the data associated with an event (e.g., type, page url where event occurred, etc.),*** this workaround would result in an unacceptably large index.

The workaround: predicates of partial indexes can enable index-only scans

We’ve just seen that if your query includes a column as a function parameter or operand, it doesn’t matter if that column is in the index; you still won’t get an index-only scan. If, however, your column is only used in a predicate that matches the predicate of the partial index, the planner is smart enough to realize that it doesn’t need to visit the table to filter down the result, even if your column is a function parameter or operand.****

This is the workaround we leveraged for our performance win. To better understand it, let’s look at how it applies to our above example query. Notice that in our example query, we only reference data ->> ‘type’ in the WHERE clause of the query. We aren’t actually returning the value of data ->> ‘type’. So the only reason Postgres would need to visit the table would be to ensure that data ->> ‘type’ IN (‘click’, ‘change’, ‘touch’) is true for each row of the returned result, but because of how partial indexes are constructed, this visit isn’t necessary.

As a refresher, partial indexes only contain entries for rows that match a predicate. For example, this partial index only contains entries where data ->> ‘type’ IN (‘click’, ‘change’, ‘touch’):

Because Postgres knows that ea_index only has entries where data ->> ‘type’ IN (‘click’, ‘change’, ‘touch’), a query like: SELECT time FROM funnel_events WHERE data ->> ‘type’ IN (‘click’, ‘change’, ‘touch’) would result in an index-only scan.

Our example query references the columns step_num, time, num_actions, time_engaged, and data. step_num, time, num_actions, and time_engaged are in the index, and data is only referenced in the where clause via data ->> ‘type’ IN (‘click’, ‘change’). Because the query predicate matches the partial index predicate, the planner knows that the index only contains values that satisfy the query predicate, and it doesn’t need to visit the table to filter down the results of scanning the index. Thus, we wind up with an index-only scan for our example query.

The performance win

This workaround lead to a 2x p90 performance improvement for Effort Analysis, and it made our p70 and p50 query faster as well:

Index brings global Effort p90 down to 10-14 seconds. Comparing 592 funnels with index to 284 without.

Only after looking at these results did we fully understand the quirk that caused this performance issue and the workaround that solved it. In some sense, we got lucky. We hope that by sharing these Postgres facts, other teams can be more deliberate about improving the performance of their queries.

If you like nerding out about optimizing Postgres and/or citus queries, @-me on Twitter @philosohacker, and if you’re not getting enough opportunities to optimize queries at your current gig, we’re hiring! Check out our team and open roles.

I’d like to thank John Krauss, Dave Robinson, and Dan Robinson for their help with reviewing early drafts of this post.


*Whether Postgres uses an index scan or a sequential scan depends on the percentage of rows it anticipates it will need to fetch from the table.

**This is noted in the “User Defined Operators” chapter of the Postgres docs.

***We know that there are lots of downsides to relying this heavily on a JSONB column.

****The docs have a nice example demonstrating this fact.

Matt Dupree

Was this helpful?
PreviousNext

Related Stories

See All

  • Creative visualization of AI CoPilot capability
    article

    Heap announces new generative AI CoPilot

    Heap, the leader in product analytics, unveils AI CoPilot’s open beta today.

  • Heap.io
    article

    What’s Next in Experience Analytics?

    What does the future of analytics hold, and what does it mean for you?

  • Heap.io
    article

    Building a Retention Strategy, Part 2: Connecting Activities to Revenue with a Metrics Tree

    If you read one post from this series, it should be this one.

Better insights. Faster.

Request Demo
  • Platform
  • Capture
  • Enrichment
  • Integrations
  • Governance
  • Security & Privacy
  • Infrastructure
  • Heap Illuminate
  • Segments
  • Charts
  • Dashboards
  • Playbooks
  • Use Cases
  • Funnel Optimization
  • Product Adoption
  • User Behavior
  • Product Led Growth
  • Customer 360
  • SaaS
  • Retail and eComm
  • Financial Services
  • Why Heap
  • Why Product Analytics
  • How Heap Works
  • How Heap Compares
  • ROI Calculator
  • The Future of Insights
  • Resources
  • Blog
  • Content Library
  • Events
  • Topics
  • Heap University
  • Community
  • Professional Services
  • Company
  • About
  • Partners
  • Press
  • Careers
  • Customers
  • DEI
  • Support
  • Request Demo
  • Help Center
  • Contact Us
  • Pricing
  • Social
    • Twitter
    • Facebook
    • LinkedIn
    • YouTube

© 2025 Heap Inc. All Rights Reserved.

  • Legal
  • Privacy Policy
  • Status
  • Trust