Bill Babeaux for PopSQL

Posted on • Originally published at popsql.com

The Easy Way to Find Customers Likely to Churn

Now more than ever, you have to hold onto every customer you can.

Customers don’t simply love your product one minute and fall out of love the next. Usually their usage declines gradually before they leave.

The good news: a single proactive outreach can reverse a downward trend.

The challenge: efficiently finding customers with declining usage.

Why SQL?

With just graphing tools, it’s nearly impossible to spot usage trends across many customers at once. It’s neon spaghetti:

And examining customers one by one? That would take ages.

Using statistical functions

PostgreSQL’s statistical functions help you rapidly sift through the noise. We explore these functions further in our template on Linear Regression in SQL.

Below we’ll use just the regr_slope() function:

| Function         | Argument Type    | Return Type      | Description                                                                   |
|------------------|------------------|------------------|-------------------------------------------------------------------------------|
| regr_slope(Y, X) | double precision | double precision | slope of the least-squares-fit linear equation determined by the (X, Y) pairs |

This function first creates a trendline that fits your data. It then tells you the slope of that line (your algebra teacher was right! You will use this stuff in your real job 🤓 ).

What regr_slope() is doing:
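
As a quick sanity check, here is a minimal sketch on made-up numbers (the toy rows below are hypothetical and do not come from the events table). Usage falls by exactly 2 actions per user each week, so the fitted slope should come back as -2. The second column recomputes the slope as covariance divided by variance, which is the standard least-squares formula:

```sql
-- Hypothetical toy data: four weeks, usage drops by 2 each week.
with toy(week, actions_per_user) as (
  values (1, 10), (2, 8), (3, 6), (4, 4)
)
select
  regr_slope(actions_per_user, week) as slope,  -- note the argument order: (Y, X)
  covar_pop(actions_per_user, week)
    / var_pop(week::double precision) as slope_by_hand  -- same value, from the formula
from toy;
-- Both columns return -2.
```

The argument order is easy to get backwards: the value you are tracking (Y) goes first, and the time axis (X) goes second.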

SQL you can copy / paste

Here’s the query (we’ll break it down below):

-- DATA PREP
with action_data AS (
  select
    extract('week' from time) as week, -- quick hack to turn week into a numeric so we can use regr_slope() function
    team_id, -- we're looking at team usage
    count(name)::numeric / count(distinct user_id) as actions_per_user -- we want to see actions per user to normalize; casting before dividing avoids integer division
  from events
  where time >= '{{start_date}}' and time < '{{end_date}}'::date + 1 -- your date range; make sure start_date is a Monday and end_date is a Sunday (the range runs through the end of that Sunday)
  group by 1,2
)

-- FINDING SLOPE OF ALL CUSTOMER USAGE TRENDLINES
select
  team_id,
  count(week) as weeks_considered,
  round(regr_slope(actions_per_user,week)::numeric,2) as slope
from action_data
group by 1
having count(week) >= 5 -- let's say we want at least 5 weeks of data
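  and regr_slope(actions_per_user,week) < 0 -- assumed completion (the original line is cut off here): keep only teams whose usage is trending down
order by slope; -- assumed: most negative slope first, matching the sample output below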

Data prep

The first part of the query in the CTE action_data is just data prep.

The extract() function turns our timestamp into a numeric week number, as regr_slope() doesn’t accept timestamps as parameters. This hack works great except when your date range crosses a calendar year, where the week number resets to 1 (workaround at the bottom of the post).
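
To see the year-boundary problem concretely, here is a small sketch using four hand-picked dates around New Year 2021 (the dates are only an illustration). It shows the week number resetting while the ISO year changes:

```sql
-- Four dates around New Year 2021; extract('week' ...) returns the ISO 8601 week number.
select
  d,
  extract('week' from d)    as week,     -- resets from 53 to 1 at the boundary
  extract('isoyear' from d) as iso_year  -- the year that the ISO week belongs to
from (values
  (date '2020-12-21'),  -- week 52 of 2020
  (date '2020-12-28'),  -- week 53 of 2020
  (date '2021-01-03'),  -- still week 53 of ISO year 2020
  (date '2021-01-04')   -- week 1 of 2021
) as t(d);
```

A trendline fitted through weeks 52, 53, 1 would treat January as if it came before December, which is why a single calendar year is the safe range for this hack.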

For the extract() function, a week goes from Monday to Sunday. Avoid partial weeks by starting your date range with a Monday and ending on a Sunday 👍
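
If you'd rather not look up a calendar, one possible way to snap a rough date range to whole weeks is date_trunc('week', ...), which returns the Monday of the given week. This is a sketch with hypothetical input dates (raw_start and raw_end are made-up names):

```sql
-- Hypothetical inputs: any two dates bounding the period you care about.
with bounds as (
  select
    date '2020-04-01' as raw_start,
    date '2020-05-13' as raw_end
)
select
  date_trunc('week', raw_start)::date   as start_date, -- Monday of the week containing raw_start
  date_trunc('week', raw_end)::date - 1 as end_date    -- Sunday before the week containing raw_end
from bounds;
-- With these inputs: start_date = 2020-03-30 (a Monday), end_date = 2020-05-10 (a Sunday).
```

Note that this moves the start backwards to the previous Monday, so the range can begin a few days earlier than the date you typed in.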

The output of the CTE action_data looks like this:

| week | team_id | actions_per_user |
|------|---------|------------------|
| 12   | 93336   | 28               |
| 13   | 93336   | 6                |
| 14   | 93336   | 10               |
| ...  | ...     | ...              |
| 12   | 92982   | 26               |
| 13   | 92982   | 1                |
| 14   | 92982   | 2                |
| ...  | ...     | ...              |

Finding slope of all customer usage trendlines

In the second part of the query, regr_slope() fits a trendline to actions_per_user against week for each team, then returns the slope of that trendline. Again, visualized:

If the slope is negative, then the usage trend is negative. The more negative the output, the steeper the decline in their usage. We added the weeks_considered column to ensure we had enough data points to see a trend. You can see that in the output:

| team_id | weeks_considered | slope |
|---------|------------------|-------|
| 97003   | 5                | -5.70 |
| 77503   | 9                | -4.93 |
| 95535   | 5                | -4.23 |
| 92982   | 5                | -3.11 |
| …       | ...              | …     |

☎️ Contact the customers with the worst slope! Their usage of your product is plummeting. You could be letting great customers slip away!
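
Before pointing the query at real data, it can help to watch the grouping and the minimum-weeks filter work on a tiny data set. The sketch below swaps the action_data CTE for three made-up teams (all values are hypothetical) and sorts the result so the steepest decline comes first:

```sql
-- Hypothetical stand-in for the action_data CTE: three made-up teams.
with action_data(week, team_id, actions_per_user) as (
  values
    (12, 1, 28), (13, 1, 22), (14, 1, 16), (15, 1, 10), (16, 1, 4),  -- team 1: steady decline
    (12, 2, 5),  (13, 2, 7),  (14, 2, 9),  (15, 2, 11), (16, 2, 13), -- team 2: steady growth
    (14, 3, 30), (15, 3, 20), (16, 3, 10)                            -- team 3: only 3 weeks of data
)
select
  team_id,
  count(week) as weeks_considered,
  round(regr_slope(actions_per_user, week)::numeric, 2) as slope
from action_data
group by team_id
having count(week) >= 5  -- drops team 3: too few data points
order by slope;          -- most negative slope first
-- Returns team 1 (slope -6.00), then team 2 (slope 2.00).
```

Team 3 has the steepest drop of all, but with only three data points it is filtered out. That is the trade-off of the weeks_considered threshold.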

Seeing the SQL

Once more, to help you visualize what PostgreSQL is doing, we've graphed the trends for each of these teams in the output above:

If you'd like to see an individual customer's behavior, here’s the query we used:

select
    extract('week' from DATE_TRUNC('week',time)) as week,
    (COUNT(name) / COUNT(distinct user_id))::integer as actions_per_user
from events
where team_id = 97003 -- or 77503, 95535, 92982
group by 1;

Try it yourself?

Run this template against our sample database that mirrors real startup data. You can also run this query on your own data, so long as you have a table that tracks events and includes a timestamp and a user/team ID.

Bonus Content

  • The inverse is also true: customers with strong positive slopes are growing 📈
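  • Here's the aforementioned workaround for data that spans a change of calendar year. It casts the year and week to text, glues them together, and casts the result back to an integer. It's clunky, but it works! Two details matter: zero-pad the week so that week 9 becomes 202009 rather than 20209, and use isoyear so the year matches the ISO week number. The combined value still jumps at the boundary (202053 to 202101, for example), so treat slopes that span New Year as approximate.
with action_data AS (
 select
  (extract('isoyear' from time)::text || lpad(extract('week' from time)::text, 2, '0'))::integer as yearweek,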
... -- rest of query continues as above
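
If the jump at the year boundary matters for your analysis, one alternative (not the approach used in this post, just a sketch) is to count whole weeks from a fixed Monday. The reference date 1970-01-05 is an arbitrary choice; any Monday works. The result rises by exactly 1 per week, including across New Year:

```sql
-- Sketch: a week counter that keeps increasing across calendar years.
select
  d,
  (date_trunc('week', d)::date - date '1970-01-05') / 7 as week_index  -- whole weeks since Monday 1970-01-05
from (values
  (date '2020-12-21'),
  (date '2020-12-28'),
  (date '2021-01-04'),
  (date '2021-01-11')
) as t(d);
-- week_index increases by exactly 1 per row, with no jump at New Year.
```

In the main query, the same expression applied to the time column could stand in for the week column, since regr_slope() only needs evenly spaced numbers on the X axis.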

Photo by Sarah Brown on Unsplash
