Text Only: How We Use Pseudo R2 to Automate Analysis Suggestions

David Robinson

August 24, 2021

3 min read

Null Model funnel results

Table with header, 1 row, and 3 columns

| Starts | Successes | Conversion Rate |

| 1000 | 200 | 20% |

Line plot of likelihood of 200/1000 successes

Line plot with title: “Likelihood of 200/1000 successes as a function of probability p”

x-axis is p; ranges from 0.1% to 90%

y-axis is 200 * log(p) + 800 * log(1 - p)

The curve has a single maximum at p = 20%, where the log-likelihood is -500.4, and falls to its lowest values at the two ends of the range, 0.1% and 90%.
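The curve described above can be reproduced with a short sketch. It evaluates the y-axis expression, 200 * log(p) + 800 * log(1 - p), on a grid of probabilities and picks the highest point. The function name `log_likelihood` and the 0.1% grid step are illustrative assumptions, not taken from the original post.

```python
import math

def log_likelihood(p, successes=200, starts=1000):
    """Log-likelihood of `successes` out of `starts` at probability p
    (the constant binomial coefficient term is omitted, as in the plot)."""
    failures = starts - successes
    return successes * math.log(p) + failures * math.log(1 - p)

# Candidate probabilities from 0.1% to 90%, in steps of 0.1% (assumed grid)
grid = [i / 1000 for i in range(1, 901)]
best_p = max(grid, key=log_likelihood)

print(best_p)                            # 0.2
print(round(log_likelihood(best_p), 1))  # -500.4
```

The maximum lands on the observed conversion rate, 200/1000, which is why each group's own rate is the natural choice when scoring a grouped funnel.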

Grouped funnel results

Table with header, 2 rows, and 4 columns

| Desktop | 500 | 150 | 30% |

| Mobile | 500 | 50 | 10% |

Table results of a 200/1000 conversion rate

Table with header, 2 rows, and 4 columns

| Desktop | 200 | 200 | 100% |

| Mobile | 800 | 0 | 0% |

Table results for our three funnels

Table with header, 5 rows, and 7 columns. First row is gray, next 2 rows ("Partially explanatory") are light blue, last 2 rows ("Perfectly explanatory") are green.

| Null model | Overall | 1000 | 200 | 20% | -500.4 | 0% |

| Partially explanatory | Desktop | 500 | 150 | 30% | -468.0 | 6.40% |

| Partially explanatory | Mobile | 500 | 50 | 10% | -468.0 | 6.40% |

| Perfectly explanatory | Desktop | 200 | 200 | 100% | 0.0 | 100% |

| Perfectly explanatory | Mobile | 800 | 0 | 0% | 0.0 | 100% |
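The log-likelihood and pseudo-R^2 columns can be recomputed from the starts and successes alone. The sketch below assumes pseudo-R^2 is one minus the ratio of the grouped log-likelihood to the null log-likelihood (McFadden's form), which is consistent with the figures in the table but is not spelled out on this page. The function names are hypothetical.

```python
import math

def group_log_likelihood(starts, successes):
    """Log-likelihood of one group, scored at its own observed conversion rate."""
    ll = 0.0
    for count in (successes, starts - successes):
        if count > 0:  # treat 0 * log(0) as 0
            ll += count * math.log(count / starts)
    return ll

def pseudo_r2(groups):
    """groups: list of (starts, successes) pairs.
    Returns (model log-likelihood, pseudo-R^2).
    Assumes the overall conversion rate is strictly between 0% and 100%,
    otherwise the null log-likelihood is 0 and the ratio is undefined."""
    total_starts = sum(starts for starts, _ in groups)
    total_successes = sum(successes for _, successes in groups)
    ll_null = group_log_likelihood(total_starts, total_successes)
    ll_model = sum(group_log_likelihood(s, k) for s, k in groups)
    return ll_model, 1 - ll_model / ll_null

funnels = {
    "Null model": [(1000, 200)],
    "Partially explanatory": [(500, 150), (500, 50)],
    "Perfectly explanatory": [(200, 200), (800, 0)],
}
for name, groups in funnels.items():
    ll, r2 = pseudo_r2(groups)
    print(f"{name}: log-likelihood {ll:.1f}, pseudo-R^2 {r2:.2%}")
```

Under this assumption the null model scores 0% and the perfectly explanatory split scores 100%. The Desktop/Mobile split comes out at a log-likelihood of about -468.0 and a pseudo-R^2 of roughly 6.5%; any small gap from the tabulated percentage is presumably down to rounding.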

Table results when conversion rates across groups are similar

Table with header, 2 rows, and 7 columns

| Similar success rates | Desktop | 500 | 105 | 21% | -500.1 | 0.06% |

| Similar success rates | Mobile | 500 | 95 | 19% | -500.1 | 0.06% |

Heatmap of how the difference in conversion rates affects pseudo-R²

Heatmap with title “Pseudo-R^2 across two groups depends on the difference in conversion rate”

Subtitle: For two groups of equal size

x-axis: Conversion rate in Group 1

y-axis: Conversion rate in Group 2

Color scale is a rainbow with label “Pseudo-R^2”, ranging from blue to red

The graph is blue along the x=y diagonal (pseudo-R^2 is low), then grows warmer as the conversion rates of x and y differ, until the maximum of pseudo-R^2 = 100% at the top left (0%, 100%) and the bottom right (100%, 0%) of the graph.

Table results when one group is far more common

Table with header, 2 rows, and 7 columns

| Uneven composition | Desktop | 990 | 200 | 20.20% | -498.2 | 0.44% |

| Uneven composition | Mobile | 10 | 0 | 0% | -498.2 | 0.44% |

Outcomes of all five models

The five outcomes listed above in order: Null model, similar success rates, uneven composition, partially explanatory, and perfectly explanatory.

As the log likelihoods increase from -500.4 to 0 across these, the Pseudo-R^2 goes from 0% to 100%.

Heatmap of pseudo-R² across two groups

Heatmap with title “Pseudo-R^2 across two groups depends on the composition and difference in conversion rate”

Subtitle: Subplots each show a different composition, from 1:99 to 99:1.

x-axis: Conversion rate in Group 1

y-axis: Conversion rate in Group 2

Color scale is a rainbow with label “Pseudo-R^2”, ranging from blue (0%) to red (100%)

The graph is divided into 9 subplots, one per composition: “1% in Group 1” on the top left, followed by 10%, 20%, 30%, 50%, 70%, 80%, 90%, and 99%.

Every graph is blue along the x=y diagonal (pseudo-R^2 is low) and grows warmer farther from it. However, the closer the composition is to 1% or to 99%, the more of the graph is blue (pseudo-R^2 is low no matter the conversion rate).
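The composition effect described above can be checked numerically. This is a minimal sketch that again assumes pseudo-R^2 is one minus the ratio of the grouped log-likelihood to the null log-likelihood. It works with shares and rates rather than counts, so the values are per-start averages. The names `xlogy` and `two_group_pseudo_r2` and the example rates of 30% and 10% are illustrative choices.

```python
import math

def xlogy(weight, prob):
    """weight * log(prob), with 0 * log(0) treated as 0."""
    return weight * math.log(prob) if weight > 0 else 0.0

def two_group_pseudo_r2(share_1, rate_1, rate_2):
    """share_1: fraction of starts in Group 1; rates are conversion probabilities.
    Assumes the overall rate is strictly between 0 and 1."""
    share_2 = 1 - share_1
    overall = share_1 * rate_1 + share_2 * rate_2

    def avg_ll(share, rate, p):
        # Per-start log-likelihood of a group converting at `rate`, scored at p
        return share * (xlogy(rate, p) + xlogy(1 - rate, 1 - p))

    ll_null = avg_ll(share_1, rate_1, overall) + avg_ll(share_2, rate_2, overall)
    ll_model = avg_ll(share_1, rate_1, rate_1) + avg_ll(share_2, rate_2, rate_2)
    return 1 - ll_model / ll_null

# Same pair of conversion rates (30% vs 10%), different compositions
for share in (0.01, 0.10, 0.50, 0.90, 0.99):
    r2 = two_group_pseudo_r2(share, 0.30, 0.10)
    print(f"{share:.0%} in Group 1: pseudo-R^2 = {r2:.2%}")
```

With the rates held fixed, the value is largest near an even split and shrinks toward zero as one group dominates, which matches the mostly blue subplots at 1% and 99%.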

How pseudo-R² handles three groups

Table with header, 3 rows, and 6 columns

| Desktop | 480 | 91 | 19.00% | -496 | 0.88% |

| Mobile | 480 | 93 | 19.40% | -496 | 0.88% |

| Tablet | 40 | 16 | 40% | -496 | 0.88% |
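The three-group figures follow the same arithmetic as the two-group case, with one term per group. The worked equation below assumes pseudo-R^2 is one minus the ratio of the grouped log-likelihood to the null log-likelihood (McFadden's form); the symbols n_g (starts in group g) and k_g (successes in group g) are notation introduced here for illustration.

```latex
\ell_{\text{model}}
  = \sum_{g} \left[ k_g \log\frac{k_g}{n_g} + (n_g - k_g)\log\frac{n_g - k_g}{n_g} \right]

\ell_{\text{model}}
  = \underbrace{91\log\tfrac{91}{480} + 389\log\tfrac{389}{480}}_{\text{Desktop}}
  + \underbrace{93\log\tfrac{93}{480} + 387\log\tfrac{387}{480}}_{\text{Mobile}}
  + \underbrace{16\log\tfrac{16}{40} + 24\log\tfrac{24}{40}}_{\text{Tablet}}
  \approx -496.0

\ell_{\text{null}} = 200\log(0.2) + 800\log(0.8) \approx -500.4

R^2_{\text{pseudo}} = 1 - \frac{\ell_{\text{model}}}{\ell_{\text{null}}}
  \approx 1 - \frac{-496.0}{-500.4} \approx 0.88\%
```

Tablet converts at twice the rate of the other two groups, but it accounts for only 40 of the 1000 starts, so it moves the total log-likelihood very little.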