loading...
Cover image for Why you need SQL window functions (part 2)

Why you need SQL window functions (part 2)

helenanders26 profile image Helen Anderson Originally published at helenanderson.co.nz ・3 min read

Data Analysis (14 Part Series)

1) Speed up your queries with indexes 2) Resources for beginner data analysts 3 ... 12 3) SQL concepts from A to Z 4) 6 ways to search with System Tables 5) Pearl Jam, Nirvana and Beautiful Data Visualisations 6) Entity Relationship Diagrams explained by Sonic the Hedgehog 7) How to use the VLOOKUP Excel function 8) SQL Joins without the Venn diagrams 9) SQL aggregate functions and Mariah Carey 10) Why I use SQL CTEs 11) How to customise SQL Server Management Studio (SSMS) 12) Have you VACUUMed your tables lately? 13) Why you need SQL window functions (part 1) 14) Why you need SQL window functions (part 2)

Welcome back to the world of window functions.

Window functions are closely related to aggregate functions. But rather than collapsing all the rows we want into one row, we keep them and add a new column with a running total, rank or moving average. This becomes our ‘window frame.’

It becomes clearer with examples and pictures, so if you’re interested in using this powerful tool, keep reading.


1 – Introduction
2 – Setting the scene
3 – Bucket rows into groups
4 – Access a total from an earlier row
5 – Use Window Functions to perform calculations


Introduction

Window functions come in three main types. These are:

Aggregate Window Functions

These use aggregate functions like SUM, COUNT, MAX, MIN over a set of rows and return a single result from the query.

Ranking Window Functions

These assign a ‘rank’ to a set of rows and using RANK, DENSE_RANK, ROW_NUMBER, NTILE

Value Window Functions

These use LAG, LEAD, FIRST_VALUE, LAST_VALUE to access a previous row without having to do a self-join.


Setting the scene

This post is picking up where we left off in Part One, where we used window functions to answer questions from a Sales Manager who is setting targets for the year.

In the last post we covered aggregate window functions and how to use RANK(). In this post, we’ll be exploring how to use NTILE(), then exploring how we can use value window functions to answer questions using sales data.


Alt Text


Bucket rows into groups

When we left her in the last post we had provided the Sales Manager two tables. One showing daily sales with a running total, and all sales ranked by their dollar value.

This time the Sales Manager would like to give a bonus to the territory who has bought in the top 25% of customers over the last year.

To do this we can use NTILE() to bucket the dollar values into four groups that represent four quartiles.

select
  territoryid,
  customerid,
  sum(subtotal) as subtotal,
  ntile(4) over(order by sum(subtotal)) as bucket
from 
  sales.salesorderheader 
where sales_date between 
'2018-01-01' and '2019-01-01'
group by 
  territoryid, 
  customerid

NTILE() can be used to bucket groups of rows and can be adjusted by changing the number in the brackets.


Alt Text


Access a total from an earlier row

The Sales Manager needs one last thing before the targets for next year can be locked in. A table that shows the ‘big customers’ identified earlier and the dates they are making orders.

What the Sales Manager needs for this last piece of work is an idea of how many days there are between orders for these big customers. If she can anticipate when they are going to buy next we can get in touch with them earlier.

select
  customerid, 
  subtotal, 
  sale_date,
  lag(sale_date) over(order by sale_date) as last_order,
from sales.salesorderheader 
where 
  customerid = 11078
  and sale_date between 
'2018-01-01' and '2019-01-01'

By using LAG() we are able to reach up and return the result from the previous row and add it to a new column. LEAD() works in the same way but in reverse.


Alt Text


Use Window Functions to perform calculations

To make this useful for the Sales Manager we can use CAST and a new column to calculate the difference between the dates.

select
  customerid, 
  subtotal, 
  sale_date,
  lag(sale_date) over(order by sale_date) as last_order,
  cast(sale_date - (lag(sale_date) 
over(order by sale_date)) as int) as days_between
from sales.salesorderheader 
where 
  customerid = 11078
  and sale_date between '2018-01-01' and '2019-01-01'

Alt Text


I hope you’ve enjoyed these two posts on SQL window functions. How do you use these in your work? Do you find them a useful and powerful tool?


Read more


This post originally appeared on helenanderson.co.nz

Data Analysis (14 Part Series)

1) Speed up your queries with indexes 2) Resources for beginner data analysts 3 ... 12 3) SQL concepts from A to Z 4) 6 ways to search with System Tables 5) Pearl Jam, Nirvana and Beautiful Data Visualisations 6) Entity Relationship Diagrams explained by Sonic the Hedgehog 7) How to use the VLOOKUP Excel function 8) SQL Joins without the Venn diagrams 9) SQL aggregate functions and Mariah Carey 10) Why I use SQL CTEs 11) How to customise SQL Server Management Studio (SSMS) 12) Have you VACUUMed your tables lately? 13) Why you need SQL window functions (part 1) 14) Why you need SQL window functions (part 2)

Posted on May 26 by:

helenanders26 profile

Helen Anderson

@helenanders26

Making applications go faster at Raygun, AWS Data Hero, and tag moderator on Dev.to. Database concept you don’t understand? Let me know, I’ll write a post!

Discussion

markdown guide
 

One issue with window functions is that you can't use the resulting value in the WHErE subclause or as a JOIN predicate. However you can get around this by storing the original query in a common table expression and then you CAN use the value as it is now encapsulated in the driving dataset

 

Thanks Aaron, that's a great tip :D

 

The best part of ntile being I use it a lot to provide custom names to the rankings. For e.g. giving names like HIGH, AVERAGE and LOW based on the groupings done by ntile. A combination of case, ntile and sum covers a lot of use-cases.