Nick

Posted on Jan 4, 2021 • Edited on Jan 7, 2021 • Originally published at wasabigeek.com

Window Functions, Visualized - Rankings

#postgres #sql

In the previous post, we talked about how window function calls work. Today, we'll dive deeper into the window functions that are available, starting with the ranking functions: row_number, rank, dense_rank and percent_rank.

We'll be using the same expenses table from the previous post, which had entries like:

description	cost
bus ride	3
lunch	15
...

In the following examples, our window will cover the whole expenses table (a single partition, since there is no PARTITION BY), ordered from highest to lowest cost:

SELECT
  [ window function ] OVER(ORDER BY cost DESC),
  expenses.description,
  expenses.cost
FROM expenses
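If you'd like to follow along, here is a minimal sketch of a table that matches the results shown in this post. The column types, the id column and the insertion order are my assumptions; only the description and cost values come from the result tables below.

```sql
-- Hypothetical schema: the types and the id column are assumptions.
CREATE TABLE expenses (
  id          SERIAL PRIMARY KEY,
  description TEXT    NOT NULL,
  cost        INTEGER NOT NULL
);

-- The ten rows that appear in the result tables in this post,
-- inserted in an arbitrary order.
INSERT INTO expenses (description, cost) VALUES
  ('bus ride', 3), ('lunch', 15), ('groceries', 60), ('dinner', 35),
  ('taxi', 20), ('lunch', 15), ('supper', 15), ('tea break', 5),
  ('bus ride', 4), ('bus ride', 3);

-- Sanity check: the rows from highest to lowest cost.
SELECT description, cost
FROM expenses
ORDER BY cost DESC;
```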

row_number

Let's start with row_number:

... ROW_NUMBER() OVER(ORDER BY cost DESC), ...

All row_number does is assign a running number to each row in the window, starting from 1:

row_number	description	cost
1	groceries	60
2	dinner	35
3	taxi	20
4	lunch	15
5	lunch	15
6	supper	15
7	tea break	5
8	bus ride	4
9	bus ride	3
10	bus ride	3

dense_rank

dense_rank is more interesting. Looking back at the previous example, we can see that a few entries have the same cost. The Postgres documentation calls rows that tie on the ORDER BY a peer group, and dense_rank numbers the groups instead of the rows, so every row in a peer group gets the same number and no numbers are skipped.

So changing the SQL to this:

... DENSE_RANK() OVER(ORDER BY cost DESC), ...

Gives us this:

dense_rank	description	cost
1	groceries	60
2	dinner	35
3	taxi	20
4	lunch	15
4	lunch	15
4	supper	15
5	tea break	5
6	bus ride	4
7	bus ride	3
7	bus ride	3

Note: Postgres doesn't guarantee how rows are ordered within a peer group. In my small example, it looks like an implicit ORDER BY id DESC was added, but the Postgres docs say that when no sort order is specified, rows are returned in an unspecified order, so don't rely on it 🤷‍♂️.
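If you need a stable order within a peer group, one option is to add a tie-breaker yourself. The sketch below uses a small inline copy of a few rows with a hypothetical id column (the ids are made up for illustration). For row_number the tie-breaker goes inside OVER; for dense_rank it only goes in the outer ORDER BY, because adding it to the window ordering would split the peer group.

```sql
-- Inline sample rows; the id values are hypothetical.
WITH expenses (id, description, cost) AS (
  VALUES (1, 'lunch', 15), (2, 'taxi', 20), (3, 'lunch', 15), (4, 'supper', 15)
)
SELECT
  -- tie-breaker inside the window: numbering is now deterministic
  ROW_NUMBER() OVER (ORDER BY cost DESC, id) AS row_number,
  -- no tie-breaker here, so the three rows costing 15 stay peers
  DENSE_RANK() OVER (ORDER BY cost DESC)     AS dense_rank,
  description,
  cost
FROM expenses
-- the outer ORDER BY fixes the order in which rows are returned
ORDER BY cost DESC, id;
```

Here taxi gets row_number 1 and dense_rank 1, while the three rows costing 15 get row_number 2, 3, 4 (in id order) and all share dense_rank 2.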

rank

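rank has one big difference from dense_rank - it leaves "gaps" after a peer group. Every row in a peer group still shares one number, but the next group continues from its row position instead of the next integer. Put another way, rank is the row_number of the first row in the peer group.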

Let's compare the results of each function to show the difference:

dense_rank	rank	description	cost
1	1	...	...
2	2	...	...
3	3	...	...
4	4	...	...
4	4	...	...
4	4	...	...
5	7	...	...
6	8	...	...
7	9	...	...
7	9	...	...
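One way to convince yourself of the difference is to compute both numbers by hand. This is only a sketch for building intuition, not how Postgres implements it: rank is one more than the number of rows with a strictly higher cost, and dense_rank is one more than the number of distinct higher costs. The inline rows are copied from the result tables above, and the *_by_hand column names are made up.

```sql
WITH expenses (description, cost) AS (
  VALUES ('groceries', 60), ('dinner', 35), ('taxi', 20), ('lunch', 15),
         ('lunch', 15), ('supper', 15), ('tea break', 5), ('bus ride', 4),
         ('bus ride', 3), ('bus ride', 3)
)
SELECT
  DENSE_RANK() OVER w AS dense_rank,
  -- one more than the number of distinct costs above this row
  1 + (SELECT COUNT(DISTINCT e2.cost)
       FROM expenses AS e2
       WHERE e2.cost > e.cost) AS dense_rank_by_hand,
  RANK() OVER w AS rank,
  -- one more than the number of rows above this row
  1 + (SELECT COUNT(*)
       FROM expenses AS e2
       WHERE e2.cost > e.cost) AS rank_by_hand,
  e.description,
  e.cost
FROM expenses AS e
WINDOW w AS (ORDER BY cost DESC)
ORDER BY e.cost DESC;
```

For tea break, six rows cost more (so rank is 7) but only four distinct costs are higher (so dense_rank is 5), which matches the table above.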

percent_rank

This is an interesting one - how it's derived is a bit of a mouthful, so let's show by example, starting with the query:

... PERCENT_RANK() OVER(ORDER BY cost DESC), ...

And the result:

percent_rank	description	cost
0	groceries	60
0.11...	dinner	35
0.22...	taxi	20
0.33...	lunch	15
0.33...	lunch	15
0.33...	supper	15
0.66...	tea break	5
0.77...	bus ride	4
0.88...	bus ride	3
0.88...	bus ride	3

Instead of a running number, we get the "relative rank" from 0 to 1 inclusive, computed as (rank - 1) / (total rows - 1). This is useful if you need to know how close something is to the top of the ordering - though, as the above example shows, it does not end with 1 if there are ties for last place: the last peer group shares rank 9 out of 10 rows, giving 8/9 = 0.88... (a deeper explanation here).
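To make the derivation concrete, here is a sketch that computes (rank - 1) / (total rows - 1) by hand next to the built-in function. The by_hand column name is made up, the inline rows are copied from the result table above, and the NULLIF/COALESCE guard for a one-row table is my addition (the built-in returns 0 in that case).

```sql
WITH expenses (description, cost) AS (
  VALUES ('groceries', 60), ('dinner', 35), ('taxi', 20), ('lunch', 15),
         ('lunch', 15), ('supper', 15), ('tea break', 5), ('bus ride', 4),
         ('bus ride', 3), ('bus ride', 3)
)
SELECT
  PERCENT_RANK() OVER w AS percent_rank,
  -- (rank - 1) / (total rows - 1); the cast avoids integer division,
  -- and NULLIF/COALESCE yield 0 instead of dividing by zero for one row
  COALESCE(
    CAST(RANK() OVER w - 1 AS double precision)
      / NULLIF(COUNT(*) OVER () - 1, 0),
    0
  ) AS by_hand,
  description,
  cost
FROM expenses
WINDOW w AS (ORDER BY cost DESC)
ORDER BY cost DESC;
```

COUNT(*) OVER () uses an empty window, so it counts every row in the table regardless of ordering.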

side-by-side

Finally, let's look at the results side by side:

row_number	dense_rank	rank	percent_rank	description	cost
1	1	1	0	groceries	60
2	2	2	0.11...	dinner	35
3	3	3	0.22...	taxi	20
4	4	4	0.33...	lunch	15
5	4	4	0.33...	lunch	15
6	4	4	0.33...	supper	15
7	5	7	0.66...	tea break	5
8	6	8	0.77...	bus ride	4
9	7	9	0.88...	bus ride	3
10	7	9	0.88...	bus ride	3
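A query along these lines should produce the table above. It's a sketch using an inline copy of the rows; the named window w is just a convenience so the ORDER BY isn't repeated four times. Since row_number has no tie-breaker here, which of the tied rows gets which number may differ from run to run.

```sql
WITH expenses (description, cost) AS (
  VALUES ('groceries', 60), ('dinner', 35), ('taxi', 20), ('lunch', 15),
         ('lunch', 15), ('supper', 15), ('tea break', 5), ('bus ride', 4),
         ('bus ride', 3), ('bus ride', 3)
)
SELECT
  ROW_NUMBER()   OVER w AS row_number,
  DENSE_RANK()   OVER w AS dense_rank,
  RANK()         OVER w AS rank,
  PERCENT_RANK() OVER w AS percent_rank,
  description,
  cost
FROM expenses
-- every function shares the same window definition
WINDOW w AS (ORDER BY cost DESC)
ORDER BY cost DESC;
```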

I hope that helped! Here's an sqlfiddle you can play around with.

In future articles, we'll look into more window functions. Follow me on Twitter to be informed of the next one!
