Data Analysis (14 Part Series)
Welcome to the world of window functions.
Window functions are closely related to aggregate functions. But rather than collapsing all the rows we want into one row, we keep them. Then we add a new column with a running total, rank or moving average. This becomes our ‘window frame.’
It becomes clearer with examples and pictures, so if you’re interested in using this powerful tool, keep reading.
Window functions come in three main types. These are:
These use aggregate functions like SUM, COUNT, MAX, MIN over a set of rows and return a single result from the query.
These assign a ‘rank’ to a set of rows and using RANK, DENSE_RANK, ROW_NUMBER, NTILE
These use LAG, LEAD, FIRST_VALUE, LAST_VALUE to access a previous row without having to do a self-join.
To get started with Window Functions lets begin with the syntax for an aggregate window function and how each part works.
You will need:
- The function you want to perform: AVG, SUM, COUNT
- An indication you want to use this Function over multiple rows: OVER
- How you want to group your rows – PARTITION
- How you want to order your rows – ORDER BY
- To make sense of the syntax and show the use cases I’m going to tackle some real-life problems faced by a Sales Analyst.
In this example, the Sales Manager has come to us with a request.
She is setting targets for the team for next year and needs historic data for her decision making. We have data available in the sales database but need it in a format that’s easier to use than the raw table.
First, she’d like to see daily sales totals but also wants to be able to see each orderid, so she can ‘drill down’ if need be.
We could tackle this using SUM to total up all the rows on the Orders table. But this collapses down the order details. A window function will allow us to see each order with a running total for the Sales Manager.
select sale_date, salesorderid, subtotal, sum(subtotal) over(partition by sale_date order by salesorderid) as total_sales from sales.salesorderheader where orderdate between '2018-01-01 00:00:00:000' and '2018-12-31 00:00:00:000' order by sale_date
By using a window function, we can see each order for each day, and it’s total, with a running total along the side. Other aggregate functions work the same way, so you can use COUNT, AVG, MIN or MAX, or in combination.
The Sales Manager is back.
She was happy with the table created for setting targets but now needs some strategies to increase sales. This time she wants to see sales for 2018 by the customer and dollar value. If she can spot when those big sales were in the past, maybe she could go knock on their door again?
To give her the data she needs we need to use a Window Function with a RANK.
select sale_date, salesorderid, subtotal, rank() over(order by subtotal desc) as sales_rank from sales.salesorderheader where orderdate between '2018-01-01 00:00:00:000' and '2018-12-31 00:00:00:000' order by sale_date
Using RANK() in a window function ranks each row, in this case, by subtotal. The Sales Manager can now decide if she wants to target those customers again in the coming year.
If there are identical rows, like in the example above, they all receive the same rank and places 2 and 3 are skipped. To get around this we can use DENSE_RANK which would have kept rank 2 and 3 before moving on to fourth place.
There’s much more to learn and useful ways to use window functions, so check out part two in this series.
This post originally appeared on helenanderson.co.nz