Brian Muriithi

Posted on Apr 21

SQL Window Functions and CTEs

#sql #beginners #database #postgressql

You can do quite a lot with some of the most popular SQL queries such as SELECT, WHERE, and JOIN. But there exists some question that you cannot answer with these queries such as What is each student's mark compared to their own average?" or "Can we rank results without losing any rows?"

That is where window functions and CTEs come in. They sound intimidating, but once they click, you will wonder how you ever wrote SQL without them.

Part 1 - Window Functions

What even is a "window"?

Think about a regular GROUP BY query. When you group data, the individual rows collapse into summary rows. You lose the detail.

A window function is different. It lets you run a calculation across a set of rows while still keeping every single row in your result. The "window" is just the group of rows the function is looking at for each calculation.

Here is a simple way to picture it: imagine you have a classroom of students with exam scores. You want to see each student's score AND the class average on the same row. With GROUP BY alone, you cannot do that - you get one row for the average, not one row per student. A window function gives you both.

The basic syntax

SELECT
    column_name,
    FUNCTION_NAME() OVER (PARTITION BY column ORDER BY column)
FROM table_name;

The OVER() clause is what makes it a window function. Everything inside OVER() defines the window - which rows to include and in what order.

PARTITION BY - splits rows into groups (like GROUP BY, but rows are kept)
ORDER BY - sets the order within each group

ROW_NUMBER, RANK, and DENSE_RANK

These three are probably the most common window functions you will use early on. They all assign numbers to rows, but they handle ties differently.

SELECT
    student_id,
    marks,
    ROW_NUMBER()  OVER (ORDER BY marks DESC) AS row_num,
    RANK()        OVER (ORDER BY marks DESC) AS rank_num,
    DENSE_RANK()  OVER (ORDER BY marks DESC) AS dense_rank_num
FROM exam_results;

Say two students both scored 85. Here is what each function does:

marks	ROW_NUMBER	RANK	DENSE_RANK
90	1	1	1
85	2	2	2
85	3	2	2
78	4	4	3

ROW_NUMBER just counts - every row gets a unique number, no matter what
RANK gives tied rows the same number, then skips ahead (no rank 3 here)
DENSE_RANK gives tied rows the same number but does not skip - the next rank is always one higher Which one to use depends on the context. For leaderboards where gaps matter, use RANK. For position tracking where you need every position filled, use DENSE_RANK.

PARTITION BY - running calculations per group

This is where window functions really shine. PARTITION BY lets you do calculations per group without collapsing your data.

Here is a real example - showing each exam result alongside that student's personal average:

SELECT
    student_id,
    marks,
    ROUND(AVG(marks) OVER (PARTITION BY student_id), 2) AS student_avg
FROM exam_results;

Result:

student_id	marks	student_avg
1	72	79.00
1	86	79.00
2	91	88.50
2	86	88.50

Every row stays. Each student's average is calculated only across their own rows.

LAG - looking at the previous row

LAG() lets you pull a value from a previous row into the current row. It is very useful for tracking progress over time.

SELECT
    student_id,
    marks,
    LAG(marks) OVER (PARTITION BY student_id ORDER BY exam_date) AS previous_mark,
    marks - LAG(marks) OVER (PARTITION BY student_id ORDER BY exam_date) AS improvement
FROM exam_results;

student_id	marks	previous_mark	improvement
1	72	NULL	NULL
1	86	72	14
2	86	NULL	NULL
2	91	86	5

The first result per student is NULL because there is no previous row to look back at. That is expected and totally fine.

NTILE - splitting data into bands

NTILE(n) divides your rows into n roughly equal groups. It is great for performance banding.

SELECT
    result_id,
    marks,
    NTILE(3) OVER (ORDER BY marks DESC) AS band
FROM exam_results;

This gives you:

Band 1 - top third of scores
Band 2 - middle third
Band 3 - bottom third

Part 2 - CTEs (Common Table Expressions)

What is a CTE?

A CTE is basically a named, temporary result set that you define at the top of your query and then use below it - like giving a subquery a proper name.

The syntax looks like this:

WITH cte_name AS (
    SELECT ...
    FROM ...
)
SELECT *
FROM cte_name;

That WITH keyword kicks it off. Everything inside the parentheses is the CTE - it runs first, gets a name, and then your main query can reference it like a regular table.

Why bother? Can't I just use a subquery?

You can. But CTEs are much easier to read and debug, especially when queries get long. Compare these two approaches:

With a subquery:

SELECT student_id, avg_mark
FROM (
    SELECT student_id, ROUND(AVG(marks), 2) AS avg_mark
    FROM exam_results
    GROUP BY student_id
) AS student_averages
WHERE avg_mark > 75;

With a CTE:

WITH student_averages AS (
    SELECT student_id, ROUND(AVG(marks), 2) AS avg_mark
    FROM exam_results
    GROUP BY student_id
)
SELECT student_id, avg_mark
FROM student_averages
WHERE avg_mark > 75;

Same result. But the CTE version reads like a story - first we build the averages, then we filter them. When you come back to this query a week later, you will thank yourself.

A practical CTE example

Say we want to find patients from our hospital database who have had more than one appointment, and we want to show their name alongside the count.

WITH appointment_counts AS (
    SELECT
        patient_id,
        COUNT(appointment_id) AS total_appointments
    FROM appointments
    GROUP BY patient_id
)
SELECT
    p.full_name,
    ac.total_appointments
FROM patients p
JOIN appointment_counts ac
    ON p.patient_id = ac.patient_id
WHERE ac.total_appointments > 1
ORDER BY ac.total_appointments DESC;

The CTE handles the counting. The main query handles the joining and filtering. Each part does one job - and you can read it top to bottom without getting lost.

Quick recap

Window functions:

Run calculations across rows without collapsing them
OVER() is what makes a function a window function
PARTITION BY groups rows, ORDER BY sets the order within the group
ROW_NUMBER, RANK, DENSE_RANK for ranking
LAG for comparing to previous rows
NTILE for banding data into groups CTEs:
Named temporary result sets defined with WITH
Make long queries much easier to read and debug
Can be chained - one CTE can reference a previous one
Great for separating logic into clear steps Both tools do one important thing - they make your SQL easier to reason about. And when queries are easy to read, they are easy to fix, easy to share, and easy to build on.

Follow me along on this journey as we explore Data Science and Analytics

DEV Community