DEV Community

Cover image for How and When to Use SQL Window Functions
Musa Atlıhan
Musa Atlıhan

Posted on • Edited on

3 1

How and When to Use SQL Window Functions

You may think SQL window functions are similar to aggregating but it actually defines a window of rows with a given length around the current row.

You can do,

  1. aggregate operations: SUM, COUNT, MAX, MIN
  2. ranking: RANK, DENSE_RANK, ROW_NUMBER, NTILE
  3. value: LAG, LEAD, FIRST_VALUE, LAST_VALUE

Let's give a concrete example to understand what exactly a window function is doing.

Consider you are a data scientist working on a classification task and you want to predict if an event will occur within the next three hours. In your data, the label is 1 if the event occurred at that exact time and 0 otherwise. Therefore you need a backfill operation to build the target. You can do it by using SQL window functions. Let's create the data frame first,

We don't need to create a database, we will use the pandasql library for executing SQL queries over pandas data frames (download via pip install pandas and pip install pandasql).

# import libraries 
import pandas as pd

data = [
    ("2020-01-01 06:00:00", 0),
    ("2020-01-01 07:00:00", 0),
    ("2020-01-01 08:00:00", 0),
    ("2020-01-01 09:00:00", 0),
    ("2020-01-01 10:00:00", 0),
    ("2020-01-01 11:00:00", 1),
    ("2020-01-01 12:00:00", 0),
    ("2020-01-01 13:00:00", 0),
    ("2020-01-01 14:00:00", 0),
    ("2020-01-01 15:00:00", 0),
]

# create data frame
df = pd.DataFrame(data, columns=["datetime", "event"]) 

df.head(10)
Enter fullscreen mode Exit fullscreen mode

Here is the dataframe:

dataframe head

Let's solve the problem:

from pandasql import sqldf

pysqldf = lambda q: sqldf(q, globals())

solution = pysqldf(
"""
SELECT datetime, event, MAX(event) OVER
(
    ORDER BY datetime ASC
    ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
) as target
FROM df
"""
)

solution.head(10)
Enter fullscreen mode Exit fullscreen mode

We ordered the rows using datetime column in ascending order and filled the window rows with the MAX value. This is the data frame we get with the target column:

solution

We labeled the previous 3 rows before 11am in the target column.

API Trace View

How I Cut 22.3 Seconds Off an API Call with Sentry 🕒

Struggling with slow API calls? Dan Mindru walks through how he used Sentry's new Trace View feature to shave off 22.3 seconds from an API call.

Get a practical walkthrough of how to identify bottlenecks, split tasks into multiple parallel tasks, identify slow AI model calls, and more.

Read more →

Top comments (0)

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay