This post shows how to use the FIRST_VALUE()
window function in SQL to find the first occurrence in a partitioned group. We'll use a practical example with an employees
table.
Setup
First, let's create and populate our employees
table.
CREATE TABLE employees (emp_name VARCHAR, department VARCHAR, hire_date DATE);
INSERT INTO employees (emp_name, department, hire_date)
VALUES ('Alice', 'Sales', '2022-03-15'),
('Bob', 'Sales', '2023-01-20'),
('Charlie', 'Sales', '2022-08-01'),
('David', 'Engineering', '2021-07-10'),
('Eve', 'Engineering', '2021-05-22'),
('Frank', 'HR', '2024-01-05');
Finding the First Hired Employee
Now, we use FIRST_VALUE()
to find the first employee hired in each department. The PARTITION BY department
clause groups our data, and ORDER BY hire_date
sorts it to find the "first" one.
SELECT emp_name,
department,
hire_date,
FIRST_VALUE(emp_name) OVER (PARTITION BY department
ORDER BY hire_date ASC) AS first_hired_in_dept
FROM employees;
Output
emp_name | department | hire_date | first_hired_in_dept |
---|---|---|---|
Alice | Sales | 2022-03-15 | Alice |
Charlie | Sales | 2022-08-01 | Alice |
Bob | Sales | 2023-01-20 | Alice |
Eve | Engineering | 2021-05-22 | Eve |
David | Engineering | 2021-07-10 | Eve |
Frank | HR | 2024-01-05 | Frank |
The first_hired_in_dept
column is added to every row, showing who was hired first in that row's department.
Extracting a Unique List Per Department
To get a clean list of just the first-hired employee for each department, you can use a Common Table Expression (CTE) with DISTINCT
.
WITH DepartmentFirstHires AS
(SELECT department,
FIRST_VALUE(emp_name) OVER (PARTITION BY department
ORDER BY hire_date ASC) AS first_hired_in_dept
FROM employees)
SELECT DISTINCT department,
first_hired_in_dept
FROM DepartmentFirstHires;
Output
department | first_hired_in_dept |
---|---|
Sales | Alice |
Engineering | Eve |
HR | Frank |
Alternatives to FIRST_VALUE()
Here are a couple of other common ways to get the same result.
Using ROW_NUMBER()
The ROW_NUMBER()
window function can rank employees in each department by their hire date. You then select the one ranked first (rn = 1
).
WITH RankedEmployees AS
(SELECT emp_name,
department,
hire_date,
ROW_NUMBER() OVER (PARTITION BY department
ORDER BY hire_date ASC) AS rn
FROM employees)
SELECT emp_name,
department,
hire_date
FROM RankedEmployees
WHERE rn = 1;
Using a JOIN
with a Subquery
Another method is to find the minimum hire date for each department in a subquery and then join the table back to itself to find the matching employee.
SELECT e.department,
e.emp_name
FROM employees e
JOIN
(SELECT department,
MIN(hire_date) AS min_hire_date
FROM employees
GROUP BY department) AS first_hires_per_dept ON e.department = first_hires_per_dept.department
AND e.hire_date = first_hires_per_dept.min_hire_date;
Both of these are effective alternatives, though window functions like FIRST_VALUE()
and ROW_NUMBER()
are often more readable and performant for this type of problem.
Key Takeaways
-
FIRST_VALUE
is a window function that requires anOVER()
clause. - Use
PARTITION BY
to define your groups andORDER BY
to define what "first" means. - Window functions enrich every row; they don't aggregate them. Use
DISTINCT
or a subquery if you need a summary. - Be mindful of ties in your
ORDER BY
clause. Add more columns to theORDER BY
to ensure deterministic results if needed.
Which approach do you prefer? Why?
Let's discuss in the comments!
Bonus
For your convenice, here is a Python script that tests these queries using duckdb
:
import duckdb
import pandas
# Connect to an in-memory DuckDB database
con = duckdb.connect(database=":memory:", read_only=False)
# 1. Setup: Create and populate the employees table
setup_sql = """
CREATE TABLE employees (emp_name VARCHAR, department VARCHAR, hire_date DATE);
INSERT INTO employees (emp_name, department, hire_date)
VALUES ('Alice', 'Sales', '2022-03-15'),
('Bob', 'Sales', '2023-01-20'),
('Charlie', 'Sales', '2022-08-01'),
('David', 'Engineering', '2021-07-10'),
('Eve', 'Engineering', '2021-05-22'),
('Frank', 'HR', '2024-01-05');
"""
con.execute(setup_sql)
print("--- Table created and populated. ---")
print("")
# 2. Test FIRST_VALUE() query
first_value_sql = """
SELECT emp_name,
department,
hire_date,
FIRST_VALUE(emp_name) OVER (PARTITION BY department
ORDER BY hire_date ASC) AS first_hired_in_dept
FROM employees;
"""
print("--- Testing FIRST_VALUE() ---")
result = con.execute(first_value_sql).fetchdf()
print(result)
print("")
# 3. Test FIRST_VALUE() with DISTINCT to get a unique list
first_value_distinct_sql = """
WITH DepartmentFirstHires AS
(SELECT department,
FIRST_VALUE(emp_name) OVER (PARTITION BY department
ORDER BY hire_date ASC) AS first_hired_in_dept
FROM employees)
SELECT DISTINCT department,
first_hired_in_dept
FROM DepartmentFirstHires;
"""
print("--- Testing FIRST_VALUE() with DISTINCT ---")
result = con.execute(first_value_distinct_sql).fetchdf()
print(result)
print("")
# 4. Test ROW_NUMBER() alternative
row_number_sql = """
WITH RankedEmployees AS
(SELECT emp_name,
department,
hire_date,
ROW_NUMBER() OVER (PARTITION BY department
ORDER BY hire_date ASC) AS rn
FROM employees)
SELECT emp_name,
department,
hire_date
FROM RankedEmployees
WHERE rn = 1;
"""
print("--- Testing ROW_NUMBER() alternative ---")
result = con.execute(row_number_sql).fetchdf()
print(result)
print("")
# 5. Test JOIN with subquery alternative
join_subquery_sql = """
SELECT e.department,
e.emp_name
FROM employees e
JOIN
(SELECT department,
MIN(hire_date) AS min_hire_date
FROM employees
GROUP BY department) AS first_hires_per_dept ON e.department = first_hires_per_dept.department
AND e.hire_date = first_hires_per_dept.min_hire_date;
"""
print("--- Testing JOIN with subquery alternative ---")
result = con.execute(join_subquery_sql).fetchdf()
print(result)
print("")
# Close the connection
con.close()
Top comments (0)