How and When to Use SQL Window Functions

#sql #datascience #postgres #database

You may think SQL window functions are similar to aggregating but it actually defines a window of rows with a given length around the current row.

You can do,

aggregate operations: SUM, COUNT, MAX, MIN
ranking: RANK, DENSE_RANK, ROW_NUMBER, NTILE
value: LAG, LEAD, FIRST_VALUE, LAST_VALUE

Let's give a concrete example to understand what exactly a window function is doing.

Consider you are a data scientist working on a classification task and you want to predict if an event will occur within the next three hours. In your data, the label is 1 if the event occurred at that exact time and 0 otherwise. Therefore you need a backfill operation to build the target. You can do it by using SQL window functions. Let's create the data frame first,

We don't need to create a database, we will use the pandasql library for executing SQL queries over pandas data frames (download via pip install pandas and pip install pandasql).

# import libraries 
import pandas as pd

data = [
    ("2020-01-01 06:00:00", 0),
    ("2020-01-01 07:00:00", 0),
    ("2020-01-01 08:00:00", 0),
    ("2020-01-01 09:00:00", 0),
    ("2020-01-01 10:00:00", 0),
    ("2020-01-01 11:00:00", 1),
    ("2020-01-01 12:00:00", 0),
    ("2020-01-01 13:00:00", 0),
    ("2020-01-01 14:00:00", 0),
    ("2020-01-01 15:00:00", 0),
]

# create data frame
df = pd.DataFrame(data, columns=["datetime", "event"]) 

df.head(10)

Here is the dataframe:

Let's solve the problem:

from pandasql import sqldf

pysqldf = lambda q: sqldf(q, globals())

solution = pysqldf(
"""
SELECT datetime, event, MAX(event) OVER
(
    ORDER BY datetime ASC
    ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
) as target
FROM df
"""
)

solution.head(10)

We ordered the rows using datetime column in ascending order and filled the window rows with the MAX value. This is the data frame we get with the target column:

We labeled the previous 3 rows before 11am in the target column.

DEV Community

How and When to Use SQL Window Functions

Top comments (0)