PopSQL

# The Easy Way to Find Customers Likely to Churn

Bill Babeaux Originally published at popsql.com ・4 min read

Now more than ever, you have to hold onto every customer you can.

Customers don’t simply love your product one minute and fall out of love the next. Normally there’s a declining trend in usage of your product over time.

The good news: a single proactive outreach can reverse a downward trend.

The challenge: efficiently finding customers with declining usage.

## Why SQL?

With just graphing tools, it’s near impossible to spot trends in customer usage. It’s neon spaghetti:

And examining customers one by one? That would take ages.

## Using statistical functions

PostgreSQL’s statistical functions help you rapidly sift through the noise. We explore these functions further in our template on Linear Regression in SQL.

Below we’ll use just the `regr_slope()` function:

``````| Function         | Argument Type    | Return Type      | Description                                                                   |
|------------------|------------------|------------------|-------------------------------------------------------------------------------|
| regr_slope(Y, X) | double precision | double precision | slope of the least-squares-fit linear equation determined by the (X, Y) pairs |
``````

This function first creates a trendline that fits your data. It then tells you the slope of that line (your algebra teacher was right! You will use this stuff in your real job 🤓 ).

What regr_slope() is doing:

## SQL you can copy / paste

Here’s the query (we’ll break it down below):

``````-- DATA PREP
with action_data AS (
select
extract('week' from time) as week, -- quick hack to turn week into a numeric so we can use regr_slope() function
team_id, -- we're looking at team usage
(count(name) / count(distinct user_id))::numeric as actions_per_user -- we want to see actions per user to normalize
from events
where time between '{{start_date}}' and '{{end_date}}' -- your date range, make sure your start_date is a Monday and end_date is a Sunday
group by 1,2
)

-- FINDING SLOPE OF ALL CUSTOMER USAGE TRENDLINES
select
team_id,
count(week) as weeks_considered,
round(regr_slope(actions_per_user,week)::numeric,2) as slope
from action_data
group by 1
having count(week) >= 5 -- let's say we want at least 5 weeks of data
and regr_slope
``````

### Data prep

The first part of the query in the CTE `action_data` is just data prep.

The `extract()` function turns our timestamp into a numeric, as `regr_slope()` doesn’t accept timestamps as parameters. This hack works great except at the change of calendar years (workaround at the bottom of the post).

For the `extract()` function, a week goes from Monday to Sunday. Avoid partial weeks by starting your date range with a Monday and ending on a Sunday 👍

The output of the CTE `action_data` looks like this:

``````| week | team_id | actions_per_user |
|------|---------|------------------|
| 12   | 93336   | 28               |
| 13   | 93336   | 6                |
| 14   | 93336   | 10               |
| ...  | ...     | ...              |
| 12   | 92982   | 26               |
| 13   | 92982   | 1                |
| 14   | 92982   | 2                |
| ...  | ...     | ...              |
``````

### Finding slope of all customer usage trendlines

In the second part of the query, the `regr_slope()` creates a trendline between `actions_per_user` and `week` for each team, then returns the slope of that trendline. Again, visualized:

If the slope is negative, then the usage trend is negative. The more negative the output, the steeper the decline in their usage. We added the `weeks_considered` column to ensure we had enough data points to see a trend. You can see that in the output:

``````| team_id | weeks_considered | slope |
|---------|------------------|-------|
| 97003   | 5                | -5.70 |
| 77503   | 9                | -4.93 |
| 95535   | 5                | -4.23 |
| 92982   | 5                | -3.11 |
| …       | ...              | …     |
``````

☎️ Contact the customers with the worst slope! Their usage of your product is plummeting. You could be letting great customers slip away!

### Seeing the SQL

Once more, to help you visualize what PostgreSQL is doing, we've graphed the trends for each of these teams in the output above:

If you'd like to see an individual customer's behavior, here’s the query we used:

``````select
extract('week' from DATE_TRUNC('week',time)) as week,
(COUNT(name) / COUNT(distinct user_id))::integer as actions_per__user
from events
where team_id = 97003 -- or 77503, 95535, 92982
group by 1;
``````

## Try it yourself?

Run this template against our sample database that mirrors real startup data. You can also run this query on your own data, so long as you have a table that tracks events and includes a timestamp and a user/team ID.

## Bonus Content

• The inverse is also true: customers with strong positive slopes are growing 📈
• Here's the aforementioned workaround if your data spans a changing calendar year involves casting from integer to text and back to integer. But it works!
``````with action_data AS (
select
(extract('year' from time)::text || extract('week' from time)::text)::integer as yearweek,
... -- rest of query continues as above
``````

Photo by Sarah Brown on Unsplash

Posted on by:

### Bill Babeaux

Hi! I'm Bill and I love SQL.