Forem

Cover image for Window Functions in PostgreSQL using DBeaver
Sharon M.
Sharon M.

Posted on • Edited on

Window Functions in PostgreSQL using DBeaver

Overview


In the previous article, we saw how joins allow us to combine data from multiple related tables. Using a small retail store database, we connected the departments, employees and sales tables to answer various questions about the business.

Joins are essential in relational databases in that, when information is stored across several tables, joins bring the data together so that it can be analyzed and actually make sense.

However, many analytical questions require something slightly different. Sometimes we first combine tables using joins, and other times we work directly with a single table. In both cases, the goal is the same: to analyze how rows relate to other rows in the dataset.

For example, using our data we might want to answer questions like:

  • Which employee has the highest sales within each department?
  • What is the running total of sales over time?
  • How does each sale compare to the previous sale?
  • How can employees be ranked based on their performance?

These types of questions require calculations that look at multiple rows at the same time while still keeping every row visible in the result. This is where window functions become useful.

Window functions allow SQL to perform calculations across a group of related rows without collapsing the results into a single summary row. Unlike traditional aggregate functions such as SUM() or AVG(), which return one result for a group of rows, window functions perform calculations across multiple rows, but they keep every row visible in the result.

In PostgreSQL, a function becomes a window function when it is used together with the OVER() clause. The OVER() clause defines the window, which is the set of rows the function will use during its calculation.

Window functions are commonly used to:

  • Rank rows within groups (assign ranking positions to rows based on values such as sales performance or salary).
  • Calculate running totals (compute cumulative values across ordered rows, such as tracking total sales over time).
  • Compare values between consecutive rows.
  • Divide data into performance groups (divide rows into buckets based on metrics like revenue or sales volume).
  • Analyze trends over time (examine how values change across an ordered sequence such as dates or transactions).

We will see these as we demonstrate each window function using the store dataset.


Key Components of Window Functions

Window functions rely on several components that determine how rows are grouped and ordered during a calculation. The most important element is the OVER() clause which may also include:

  • PARTITION BY
  • ORDER BY
  • Window frame clauses (Optional)

OVER()

As we earlier stated, a regular SQL function becomes a window function when it is used together with OVER(). OVER() defines the set of rows (the window) that the function should consider during the calculation.

General syntax:

function_name(expression)
OVER (
    PARTITION BY column
    ORDER BY column
)
Enter fullscreen mode Exit fullscreen mode

PARTITION BY

PARTITION BY divides rows into groups (partitions). Each partition is processed independently by the window function.

Example:

PARTITION BY department_id
Enter fullscreen mode Exit fullscreen mode

This ensures calculations are performed separately for each department.

ORDER BY

The ORDER BY inside the OVER() statement defines the order of rows within each partition.

Example:

ORDER BY sale_date
Enter fullscreen mode Exit fullscreen mode

With time you will observe that ordering is important when performing calculations such as ranking, running totals or comparisons between rows.

Frame Clause (Optional)

A window frame further limits which rows are used in a calculation within a partition. Frame clauses define which rows relative to the current row should be included in the calculation.

Example:

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Enter fullscreen mode Exit fullscreen mode

This frame starts from the first row of the partition and continues up to the current row. It is commonly used when calculating cumulative totals.


The Store Database

To demonstrate window functions, we will continue using the same retail store dataset introduced in the previous article on joins.

The database contains three tables:

  • departments
    departments table Each department has multiple employees.

  • employees
    employees table Each employee belongs to one department.

  • sales
    sales table Employees generate sales that are recorded in this table.


Demonstrating Window Functions Using the Store Tables

In PostgreSQL, there is no completely separate window function library. Instead, several built-in functions can operate as window functions when they are used with the OVER() clause. These functions generally fall into two categories:

1. Dedicated Window Functions

Dedicated window functions are specialized analytical functions designed to operate across a set of rows related to the current row without collapsing the result into a single output row.

We have:

- Ranking Functions

Ranking functions assign a ranking value to each row within a partition. They include:


a. ROW_NUMBER()

Assigns a unique, sequential number to each row within a partition. The numbering starts at 1 and increases based on the order specified in the ORDER BY clause. If PARTITION BY is used, the numbering restarts for each partition.

This function is often used to:

  • Determine the order of events.
  • Retrieve the first or latest record in a group.
  • Rank rows uniquely even when values are identical.

Example:

Question: In what order did each employee record their sales transactions?

SELECT e.employee_name, s.sale_date, s.amount,
ROW_NUMBER() OVER (
        PARTITION BY e.employee_id
        ORDER BY s.sale_date
    ) AS sale_sequence
FROM employees e
JOIN sales s
ON e.employee_id = s.employee_id;
Enter fullscreen mode Exit fullscreen mode

JOIN - The employees and sales tables are joined so that each sale can be associated with the employee who recorded it.

PARTITION BY - This divides the result set into partitions for each employee. Each employee’s sales are processed independently and the row numbering resets for each employee.

ORDER BY - This sorts the sales chronologically within each employee’s partition.

Output:
ROW_NUMBER()


b. RANK()

Assigns a ranking value to each row based on the order defined in the ORDER BY clause. If two or more rows have the same value, they receive the same rank. The next rank number is then skipped, creating a gap in the ranking sequence. For example, if two rows share rank 1, the next row will receive rank 3 instead of 2.

This function is often used to:

  • Rank employees by performance.
  • Identify top-selling employees.
  • Determine position within a leaderboard.

Currently, there are no ties in my dataset. To demonstrate how RANK() behaves when ties occur, I inserted an additional sale for Daenerys:

INSERT INTO sales (employee_id, sale_date, amount)
VALUES (4, '2025-01-04', 4000);
Enter fullscreen mode Exit fullscreen mode

Output:
inserted an additional sale for Daenerys

Example

Question: Which sales transactions generated the highest revenue?

SELECT 
    e.employee_name,
    s.amount,
    RANK() OVER (
        ORDER BY s.amount DESC
    ) AS sales_rank
FROM employees e
JOIN sales s
ON e.employee_id = s.employee_id;
Enter fullscreen mode Exit fullscreen mode

Output:
RANK()


c. DENSE_RANK()

Also assigns a rank to each row but without gaps in the sequence, even if there are ties. This function is useful when continuous ranking positions are required in reports or analyses.

Example:

Question: Which sales transactions generated the highest revenue (while maintaining a continuous ranking sequence)?

SELECT 
    e.employee_name,
    s.amount,
    DENSE_RANK() OVER (
        ORDER BY s.amount DESC
    ) AS sales_dense_rank
FROM employees e
JOIN sales s
ON e.employee_id = s.employee_id;
Enter fullscreen mode Exit fullscreen mode

Output:
DENSE_RANK()

NOTE: The difference between RANK() and DENSE_RANK() becomes clear when two rows share the same value. In such cases, RANK() introduces gaps in the ranking sequence, while DENSE_RANK() maintains a continuous ranking order.


d. NTILE(n)

Divides rows into n roughly equal-sized groups (tiles or buckets). Then it returns the bucket number assigned to each row.

This function is often used in:

  • Grouping customers into spending levels.
  • Dividing employees into performance groups.
  • Creating quartiles (NTILE(4)), deciles (NTILE(10)) or percentiles for analysis.

Example:

Question: How can employees be divided into performance groups based on their total sales?

SELECT e.employee_name,
    SUM(s.amount) AS total_sales,
    NTILE(2) OVER (
        ORDER BY SUM(s.amount) DESC
    ) AS performance_group
FROM employees e
JOIN sales s
ON e.employee_id = s.employee_id
GROUP BY e.employee_name;
Enter fullscreen mode Exit fullscreen mode

Output:
NTILE(n)


- Offset (Navigation) Functions

In the next part we will look into navigation (offset) functions such as LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE() and NTH_VALUE().

I hope this article helps you to better understand how window functions are used in SQL.

Top comments (0)