<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: PopSQL</title>
    <description>The latest articles on DEV Community by PopSQL (@popsql).</description>
    <link>https://dev.to/popsql</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F2664%2F968ff586-a247-4441-8972-a096d6958f11.png</url>
      <title>DEV Community: PopSQL</title>
      <link>https://dev.to/popsql</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/popsql"/>
    <language>en</language>
    <item>
      <title>The Easy Way to Find Customers Likely to Churn</title>
      <dc:creator>Bill Babeaux</dc:creator>
      <pubDate>Thu, 03 Sep 2020 20:47:13 +0000</pubDate>
      <link>https://dev.to/popsql/the-easy-way-to-find-customers-likely-to-churn-4ljg</link>
      <guid>https://dev.to/popsql/the-easy-way-to-find-customers-likely-to-churn-4ljg</guid>
      <description>&lt;p&gt;Now more than ever, you have to hold onto every customer you can.&lt;/p&gt;

&lt;p&gt;Customers don’t simply love your product one minute and fall out of love the next. Normally there’s a declining trend in usage of your product over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The good news:&lt;/strong&gt; a single proactive outreach can reverse a downward trend. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The challenge:&lt;/strong&gt; efficiently finding customers with declining usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why SQL?
&lt;/h2&gt;

&lt;p&gt;With just graphing tools, it’s near impossible to spot trends in customer usage. It’s neon spaghetti:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpopsql.com%2Fstatic%2Fimages%2Ftemplates%2Ftoo_many_lines.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpopsql.com%2Fstatic%2Fimages%2Ftemplates%2Ftoo_many_lines.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And examining customers one by one? That would take ages.&lt;/p&gt;
&lt;h2&gt;
  
  
  Using statistical functions
&lt;/h2&gt;

&lt;p&gt;PostgreSQL’s &lt;a href="https://www.postgresql.org/docs/9.5/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE" rel="noopener noreferrer"&gt;statistical functions&lt;/a&gt; help you rapidly sift through the noise. We explore these functions further in our &lt;a href="https://popsql.com/sql-templates/analytics/linear-regression-in-sql" rel="noopener noreferrer"&gt;template on Linear Regression in SQL&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Below we’ll use just the &lt;code&gt;regr_slope()&lt;/code&gt; function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| Function         | Argument Type    | Return Type      | Description                                                                   |
|------------------|------------------|------------------|-------------------------------------------------------------------------------|
| regr_slope(Y, X) | double precision | double precision | slope of the least-squares-fit linear equation determined by the (X, Y) pairs |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function first creates a trendline that fits your data. It then tells you the slope of that line (your algebra teacher was right! You will use this stuff in your real job 🤓 ).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What regr_slope() is doing:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpopsql.com%2Fstatic%2Fimages%2Ftemplates%2Fchurn_trendline.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpopsql.com%2Fstatic%2Fimages%2Ftemplates%2Fchurn_trendline.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL you can copy / paste
&lt;/h2&gt;

&lt;p&gt;Here’s the query (we’ll break it down below):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- DATA PREP&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;action_data&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="k"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'week'&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- quick hack to turn week into a numeric so we can use regr_slope() function&lt;/span&gt;
    &lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- we're looking at team usage&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;))::&lt;/span&gt;&lt;span class="nb"&gt;numeric&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;actions_per_user&lt;/span&gt; &lt;span class="c1"&gt;-- we want to see actions per user to normalize&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
  &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="k"&gt;between&lt;/span&gt; &lt;span class="s1"&gt;'{{start_date}}'&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="s1"&gt;'{{end_date}}'&lt;/span&gt; &lt;span class="c1"&gt;-- your date range, make sure your start_date is a Monday and end_date is a Sunday&lt;/span&gt;
  &lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;-- FINDING SLOPE OF ALL CUSTOMER USAGE TRENDLINES&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;weeks_considered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regr_slope&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actions_per_user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;slope&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;action_data&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;having&lt;/span&gt; &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="c1"&gt;-- let's say we want at least 5 weeks of data&lt;/span&gt;
  &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;regr_slope&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Data prep
&lt;/h3&gt;

&lt;p&gt;The first part of the query in the CTE &lt;code&gt;action_data&lt;/code&gt; is just data prep.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;extract()&lt;/code&gt; function turns our timestamp into a numeric, as &lt;code&gt;regr_slope()&lt;/code&gt; doesn’t accept timestamps as parameters. This hack works great &lt;strong&gt;except at the change of calendar years&lt;/strong&gt; (workaround at the bottom of the post).&lt;/p&gt;

&lt;p&gt;For the &lt;code&gt;extract()&lt;/code&gt; function, a week goes from Monday to Sunday. Avoid partial weeks by starting your date range with a Monday and ending on a Sunday 👍&lt;/p&gt;

&lt;p&gt;The output of the CTE &lt;code&gt;action_data&lt;/code&gt; looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| week | team_id | actions_per_user |
|------|---------|------------------|
| 12   | 93336   | 28               |
| 13   | 93336   | 6                |
| 14   | 93336   | 10               |
| ...  | ...     | ...              |
| 12   | 92982   | 26               |
| 13   | 92982   | 1                |
| 14   | 92982   | 2                |
| ...  | ...     | ...              |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Finding slope of all customer usage trendlines
&lt;/h3&gt;

&lt;p&gt;In the second part of the query, the &lt;code&gt;regr_slope()&lt;/code&gt; creates a trendline between &lt;code&gt;actions_per_user&lt;/code&gt; and &lt;code&gt;week&lt;/code&gt; for each team, then returns the slope of that trendline. Again, visualized:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpopsql.com%2Fstatic%2Fimages%2Ftemplates%2Fchurn_trendline.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpopsql.com%2Fstatic%2Fimages%2Ftemplates%2Fchurn_trendline.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the slope is negative, then the usage trend is negative. The more negative the output, the steeper the decline in their usage. We added the &lt;code&gt;weeks_considered&lt;/code&gt; column to ensure we had enough data points to see a trend. You can see that in the output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| team_id | weeks_considered | slope |
|---------|------------------|-------|
| 97003   | 5                | -5.70 |
| 77503   | 9                | -4.93 |
| 95535   | 5                | -4.23 |
| 92982   | 5                | -3.11 |
| …       | ...              | …     |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;☎️  Contact the customers with the worst slope! Their usage of your product is plummeting. You could be letting great customers slip away!&lt;/p&gt;

&lt;h3&gt;
  
  
  Seeing the SQL
&lt;/h3&gt;

&lt;p&gt;Once more, to help you visualize what PostgreSQL is doing, we've graphed the trends for each of these teams in the output above:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpopsql.com%2Fstatic%2Fimages%2Ftemplates%2Fchurn_detection_per_team.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpopsql.com%2Fstatic%2Fimages%2Ftemplates%2Fchurn_detection_per_team.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you'd like to see an individual customer's behavior, here’s the query we used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="k"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'week'&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;DATE_TRUNC&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'week'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;distinct&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;))::&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;actions_per__user&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;team_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;97003&lt;/span&gt; &lt;span class="c1"&gt;-- or 77503, 95535, 92982&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Try it yourself?
&lt;/h2&gt;

&lt;p&gt;Run this template against our &lt;a href="https://popsql.com/sql-templates/analytics/exploring-sample-dataset?utm_source=devto&amp;amp;utm_medium=social&amp;amp;utm_term=user-churn&amp;amp;utm_campaign=sql-templates" rel="noopener noreferrer"&gt;sample database&lt;/a&gt; that mirrors real startup data. You can also run this query on your own data, so long as you have a table that tracks events and includes a timestamp and a user/team ID.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus Content
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The inverse is also true: customers with strong positive slopes are growing 📈&lt;/li&gt;
&lt;li&gt;Here's the aforementioned workaround if your data spans a changing calendar year involves casting from integer to text and back to integer. But it works!
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;action_data&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
 &lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'year'&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'week'&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)::&lt;/span&gt;&lt;span class="nb"&gt;integer&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;yearweek&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="c1"&gt;-- rest of query continues as above&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;span&gt;Photo by &lt;a href="https://unsplash.com/@sweetpagesco?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Sarah Brown&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/churn?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

</description>
      <category>sql</category>
      <category>startup</category>
      <category>database</category>
      <category>learning</category>
    </item>
    <item>
      <title>How to Measure Cohort Retention using BigQuery</title>
      <dc:creator>Abdisalan</dc:creator>
      <pubDate>Fri, 28 Aug 2020 22:02:27 +0000</pubDate>
      <link>https://dev.to/popsql/how-to-measure-cohort-retention-using-bigquery-29ig</link>
      <guid>https://dev.to/popsql/how-to-measure-cohort-retention-using-bigquery-29ig</guid>
      <description>&lt;p&gt;With some simple visitor data, you can tell how you're retaining your customers on a month by month basis. &lt;strong&gt;That's super valuable!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can follow along at home. On the Google Cloud Platform, we'll use one of BigQuery's public datasets (details below). It's real visitor data from the Google Store in 2017.&lt;/p&gt;

&lt;p&gt;If you want to use your own data, the queries in this guide can be converted to work with your favorite SQL dialect. All your data needs is a timestamp of when your users visit your site and a visitor ID to track them.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;What This Analysis Will Produce&lt;br&gt;
Cohort Retention Definition&lt;br&gt;
Cohort Retention Framework&lt;br&gt;
BigQuery Data&lt;br&gt;
SQL&lt;br&gt;
Part One: Find Our Cohorts&lt;br&gt;
Part Two: Measure Activity After Cohort Month&lt;br&gt;
Final: Divide the Number of Remaining Users by the Cohort Size&lt;br&gt;
Conclusion&lt;br&gt;
Future Use and Modifications&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What This Analysis Will Produce
&lt;/h2&gt;

&lt;p&gt;Here's a graph showing what percent of users came back to the Google Store after visiting for the first time. It's grouped by what month they first visited and shows what percent remain after subsequent months.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsqmoc6rykodqzxoxh5il.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsqmoc6rykodqzxoxh5il.png" alt="Chart showing Google Store Cohort Retention in 2017"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here are the cohort sizes, for the more detail oriented folks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbakyby5fzr83vl7f6oal.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbakyby5fzr83vl7f6oal.png" alt="Another Table showing Cohort Retention"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Cohort Retention Definition
&lt;/h1&gt;

&lt;p&gt;What's a cohort? Essentially a group of people.&lt;/p&gt;

&lt;p&gt;How are we grouping our customers? I've decided to group them by the month that they visited the site.&lt;/p&gt;

&lt;p&gt;How do we know if we retained a user? We can know if they visit the site again in a subsequent month. For example if they visited in June then again in July, they were retained. But the months are independent — They could visit in June, and not show up again until August. We wouldn't count July, but would count August.&lt;/p&gt;

&lt;h1&gt;
  
  
  Cohort Retention Framework
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Group all our users into cohorts based on the month they first came to the store and count the total.&lt;/li&gt;
&lt;li&gt;Count how many users came back each month, starting from their cohort month.&lt;/li&gt;
&lt;li&gt;Calculate the percentage of cohort remaining after each month&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  BigQuery Data
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Feel free to skip this section if you don't want to use the example data from BigQuery.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;BigQuery has a large number of public datasets and Google Store Analytics from 2017 is one of them.&lt;/p&gt;

&lt;p&gt;You'll need to make a Google Cloud Platform account, create a project, and &lt;a href="https://popsql.com/learn-sql/bigquery/accessing-bigquery-public-datasets" rel="noopener noreferrer"&gt;add the public dataset to your project.&lt;/a&gt; Next, since we can't edit the public dataset, you'll need to &lt;a href="https://cloud.google.com/bigquery/docs/copying-datasets" rel="noopener noreferrer"&gt;copy the data from the public dataset to your project.&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  SQL
&lt;/h1&gt;

&lt;p&gt;In BigQuery, I named my project "Cohort Analysis" and named my duplicate table "Google Store Analytics" so you'll see the prefix &lt;code&gt;cohort-analysis-284422.Google_Store_Analytics&lt;/code&gt; often in this article. To follow along, replace this prefix with the name of your project and table in BigQuery.&lt;/p&gt;

&lt;p&gt;For each part, I'll also be creating views to make the queries reusable and easier to work with!&lt;/p&gt;

&lt;h2&gt;
  
  
  Part One: Find Our Cohorts
&lt;/h2&gt;

&lt;p&gt;First we find out the first time each customer visited the site this year and create a view so that we can reference this query later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;view&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.cohort`&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="k"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;timestamp_seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;visitStartTime&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cohort_month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fullVisitorId&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.ga_sessions_2017*`&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;fullVisitorId&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example Results:
cohort_month    fullVisitorId   
2017-05-12    0206596068278667912
2017-05-12    3310123534725675314
2017-04-25    3153380067864919818
2017-06-23    1216639212736323098
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also find out the size of each cohort by counting the number of unique ids that show up for the first time in a month.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;view&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.cohort_size`&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="k"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;month&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;cohort_month&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cohort_month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;num_users&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.cohort`&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;cohort_month&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example Results:
cohort_month    num_users   
1             52613
2             49133
3             55346
4             52873
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Part Two: Measure Activity After Cohort Month
&lt;/h2&gt;

&lt;p&gt;Next, we find what months there's been activity after their cohort month. For example If their cohort was January and they came to the site on February, then it notes activity on month one — one month after their cohort month.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;view&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.user_activities`&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;date_diff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;timestamp_seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;visitStartTime&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="k"&gt;C&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cohort_month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;MONTH&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;month_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;GA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fullVisitorId&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.ga_sessions_2017*`&lt;/span&gt; &lt;span class="n"&gt;GA&lt;/span&gt;
&lt;span class="k"&gt;left&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.cohort`&lt;/span&gt; &lt;span class="k"&gt;C&lt;/span&gt;
&lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;GA&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fullVisitorId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;C&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fullVisitorId&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example Results:
month_number    fullVisitorId
0             7868243379843705173
0             5418231530547311489
0             4704421551407457903
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we count how many users were retained in each month after their cohort month and name it retention_table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;view&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.retention_table`&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="k"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MONTH&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="k"&gt;C&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cohort_month&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cohort_month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;num_users&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.user_activities`&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;
&lt;span class="k"&gt;left&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.cohort`&lt;/span&gt; &lt;span class="k"&gt;C&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fullVisitorId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;C&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fullVisitorId&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example Results:
cohort_month    month_number    num_users
1             0             52613
1             1             2198
1             2             908
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Final: Divide the Number of Remaining Users by the Cohort Size
&lt;/h2&gt;

&lt;p&gt;Finally we get the cohort percent retention after each month.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;view&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.final`&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cohort_month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_users&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;total_users&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_users&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_users&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;percentage&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.retention_table`&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;
&lt;span class="k"&gt;left&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="nv"&gt;`cohort-analysis-284422.Google_Store_Analytics.cohort_size`&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cohort_month&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cohort_month&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cohort_month&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should end up with a final result that looks like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Foctang4f78enp1dtvyzh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Foctang4f78enp1dtvyzh.png" alt="Table showing final cohort retentions"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this data I also made this bar graph to show the drop off in users each month.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsqmoc6rykodqzxoxh5il.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsqmoc6rykodqzxoxh5il.png" alt="Chart showing Google Store Cohort Retention in 2017"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Again, figuring out how you're retaining your users is super valuable! You can see in this chart that the January cohort is consistently more retained than other cohorts. This could be for various reasons and knowing why could help your business capitalize on this fact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Use and Modifications
&lt;/h2&gt;

&lt;p&gt;Anyone who has used Google Analytics on their site knows that this cohort analysis is immediately available to them for free. But this manual method is much more flexible!&lt;/p&gt;

&lt;p&gt;Imagine that you also want to find retention numbers for customers who have paid for a product? What about customers who have used your latest feature or were referred by a friend? There are tons of insights you can get from visitor data which makes this data super powerful for guiding your business.&lt;/p&gt;

&lt;p&gt;Thank you for reading and I'll see you in the next article!&lt;/p&gt;

&lt;p&gt;✌🏾&lt;/p&gt;

</description>
      <category>sql</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
