Lessons learnt from the 8 week sql challenge: window functions & ranking

#sql #bigquery #8weeksqlchallenge

👩🏼‍💻 using Google BigQuery SQL syntax
📁 check the Github repository for context

For the longest time I've had a love/hate relationship with WINDOW functions in SQL but recently had a moment where I finally started seeing some of the light.

DENSE_RANK RANK and ROW_NUMBER are amongst the more powerful WINDOW functions and it's important to understand how they behave differently so you make the right choice for your use case.

Let's take a closer look at an example to help frame things up.

The query below ...

SELECT 
s.customer_id,
s.order_date,
m.product_name, 
ROW_NUMBER () OVER (PARTITION 
  BY customer_id ORDER BY order_date ASC) AS row_number,
RANK () OVER (PARTITION 
  BY customer_id ORDER BY order_date ASC) AS rank,
DENSE_RANK () OVER(PARTITION 
  BY customer_id ORDER BY order_date ASC) AS dense_rank
FROM sqlchallenge_week1.sales AS s
  JOIN sqlchallenge_week1.menu AS m
    ON s.product_id = m.product_id
ORDER BY customer_id;

... will generate the output below.

Can you tell from the image above how a WINDOW function actually behaves and how it's different from a standard AGGREGATE function?

WINDOW functions create a partition of the table based on a set of parameters. In our the case the partition is created for every customer id, the records in that partition are sorted by order date ascending.
Next we're calling a function on that partition where for every row in a value is generated that will go into a new column that gets added to the table.

When you look at the example in our table you'll notice that results are not grouped meaning the same amount of rows (14) went in as came out. And that's how WINDOW functions (also called analytical functions) are different from AGGREGATE functions. In case of our example an AGGREGATE function like COUNT() would have grouped all the rows per customer id to then count the rows in each group and return a single value per customer id resulting in the output below.

Now let's go back to our WINDOW functions and take a look at how RANK, DENSE_RANK and ROW_NUMBER are different from each other.

ROW_NUMBER
Every row in the partition or window gets assigned a number which starts at 1 being the first row in the ordered partition and so on. You'll have noticed every row in the window is assigned a unique number and the order of those numbers is sequential with increments of 1.

RANK
When assigning a number to each row in the partition or window we follow the same logic as ROW_NUMBER. The only difference being that rows with the same order date receive the same rank number, and the rank number for every row in the next subset isn't incremented by 1 but 1 + number of rows in the previous subset.

DENSE_RANK
For DENSE_RANK every row in our subset in the partition gets assigned a unique number which is incremented by 1 for next subset.

Hope you find this useful, let me know in the comments if any questions.

📚 For a broader view on windows functions I can recommend the following blog posts:

An Easy Guide to Advanced SQL Window Functions

Take your SQL from Good to Great: Part 4 (Window Functions)