Dror Atariah

Posted on Jul 9

I was there first! Finding the First in Every Group using SQL 🔥

This post shows how to use the FIRST_VALUE() window function in SQL to find the first occurrence in a partitioned group. We'll use a practical example with an employees table.

Setup

First, let's create and populate our employees table.

CREATE TABLE employees (emp_name VARCHAR, department VARCHAR, hire_date DATE);

INSERT INTO employees (emp_name, department, hire_date)
VALUES ('Alice', 'Sales', '2022-03-15'),
       ('Bob', 'Sales', '2023-01-20'),
       ('Charlie', 'Sales', '2022-08-01'),
       ('David', 'Engineering', '2021-07-10'),
       ('Eve', 'Engineering', '2021-05-22'),
       ('Frank', 'HR', '2024-01-05');

Finding the First Hired Employee

Now, we use FIRST_VALUE() to find the first employee hired in each department. The PARTITION BY department clause groups our data, and ORDER BY hire_date sorts it to find the "first" one.

SELECT emp_name,
       department,
       hire_date,
       FIRST_VALUE(emp_name) OVER (PARTITION BY department
                                   ORDER BY hire_date ASC) AS first_hired_in_dept
FROM employees;

Output

emp_name	department	hire_date	first_hired_in_dept
Alice	Sales	2022-03-15	Alice
Charlie	Sales	2022-08-01	Alice
Bob	Sales	2023-01-20	Alice
Eve	Engineering	2021-05-22	Eve
David	Engineering	2021-07-10	Eve
Frank	HR	2024-01-05	Frank

The first_hired_in_dept column is added to every row, showing who was hired first in that row's department.

Extracting a Unique List Per Department

To get a clean list of just the first-hired employee for each department, you can use a Common Table Expression (CTE) with DISTINCT.

WITH DepartmentFirstHires AS
  (SELECT department,
          FIRST_VALUE(emp_name) OVER (PARTITION BY department
                                      ORDER BY hire_date ASC) AS first_hired_in_dept
   FROM employees)
SELECT DISTINCT department,
                first_hired_in_dept
FROM DepartmentFirstHires;

Output

department	first_hired_in_dept
Sales	Alice
Engineering	Eve
HR	Frank

Alternatives to `FIRST_VALUE()`

Here are a couple of other common ways to get the same result.

Using `ROW_NUMBER()`

The ROW_NUMBER() window function can rank employees in each department by their hire date. You then select the one ranked first (rn = 1).

WITH RankedEmployees AS
  (SELECT emp_name,
          department,
          hire_date,
          ROW_NUMBER() OVER (PARTITION BY department
                             ORDER BY hire_date ASC) AS rn
   FROM employees)
SELECT emp_name,
       department,
       hire_date
FROM RankedEmployees
WHERE rn = 1;

Using a `JOIN` with a Subquery

Another method is to find the minimum hire date for each department in a subquery and then join the table back to itself to find the matching employee.

SELECT e.department,
       e.emp_name
FROM employees e
JOIN
  (SELECT department,
          MIN(hire_date) AS min_hire_date
   FROM employees
   GROUP BY department) AS first_hires_per_dept ON e.department = first_hires_per_dept.department
AND e.hire_date = first_hires_per_dept.min_hire_date;

Both of these are effective alternatives, though window functions like FIRST_VALUE() and ROW_NUMBER() are often more readable and performant for this type of problem.

Key Takeaways

FIRST_VALUE is a window function that requires an OVER() clause.
Use PARTITION BY to define your groups and ORDER BY to define what "first" means.
Window functions enrich every row; they don't aggregate them. Use DISTINCT or a subquery if you need a summary.
Be mindful of ties in your ORDER BY clause. Add more columns to the ORDER BY to ensure deterministic results if needed.

Which approach do you prefer? Why?
Let's discuss in the comments!

Bonus

For your convenice, here is a Python script that tests these queries using duckdb:

import duckdb
import pandas

# Connect to an in-memory DuckDB database
con = duckdb.connect(database=":memory:", read_only=False)

# 1. Setup: Create and populate the employees table
setup_sql = """
CREATE TABLE employees (emp_name VARCHAR, department VARCHAR, hire_date DATE);
INSERT INTO employees (emp_name, department, hire_date)
VALUES ('Alice', 'Sales', '2022-03-15'),
       ('Bob', 'Sales', '2023-01-20'),
       ('Charlie', 'Sales', '2022-08-01'),
       ('David', 'Engineering', '2021-07-10'),
       ('Eve', 'Engineering', '2021-05-22'),
       ('Frank', 'HR', '2024-01-05');
"""
con.execute(setup_sql)
print("--- Table created and populated. ---")
print("")

# 2. Test FIRST_VALUE() query
first_value_sql = """
SELECT emp_name,
       department,
       hire_date,
       FIRST_VALUE(emp_name) OVER (PARTITION BY department
                                   ORDER BY hire_date ASC) AS first_hired_in_dept
FROM employees;
"""
print("--- Testing FIRST_VALUE() ---")
result = con.execute(first_value_sql).fetchdf()
print(result)
print("")

# 3. Test FIRST_VALUE() with DISTINCT to get a unique list
first_value_distinct_sql = """
WITH DepartmentFirstHires AS
  (SELECT department,
          FIRST_VALUE(emp_name) OVER (PARTITION BY department
                                      ORDER BY hire_date ASC) AS first_hired_in_dept
   FROM employees)
SELECT DISTINCT department,
                first_hired_in_dept
FROM DepartmentFirstHires;
"""
print("--- Testing FIRST_VALUE() with DISTINCT ---")
result = con.execute(first_value_distinct_sql).fetchdf()
print(result)
print("")


# 4. Test ROW_NUMBER() alternative
row_number_sql = """
WITH RankedEmployees AS
  (SELECT emp_name,
          department,
          hire_date,
          ROW_NUMBER() OVER (PARTITION BY department
                             ORDER BY hire_date ASC) AS rn
   FROM employees)
SELECT emp_name,
       department,
       hire_date
FROM RankedEmployees
WHERE rn = 1;
"""
print("--- Testing ROW_NUMBER() alternative ---")
result = con.execute(row_number_sql).fetchdf()
print(result)
print("")


# 5. Test JOIN with subquery alternative
join_subquery_sql = """
SELECT e.department,
       e.emp_name
FROM employees e
JOIN
  (SELECT department,
          MIN(hire_date) AS min_hire_date
   FROM employees
   GROUP BY department) AS first_hires_per_dept ON e.department = first_hires_per_dept.department
AND e.hire_date = first_hires_per_dept.min_hire_date;
"""
print("--- Testing JOIN with subquery alternative ---")
result = con.execute(join_subquery_sql).fetchdf()
print(result)
print("")

# Close the connection
con.close()

DEV Community

I was there first! Finding the First in Every Group using SQL 🔥

Setup

Finding the First Hired Employee

Output

Extracting a Unique List Per Department

Output

Alternatives to `FIRST_VALUE()`

Using `ROW_NUMBER()`

Using a `JOIN` with a Subquery

Key Takeaways

Bonus

Top comments (0)

Setup

Finding the First Hired Employee

Output

Extracting a Unique List Per Department

Output

Alternatives to FIRST_VALUE()

Using ROW_NUMBER()

Using a JOIN with a Subquery

Key Takeaways

Bonus

Alternatives to `FIRST_VALUE()`

Using `ROW_NUMBER()`

Using a `JOIN` with a Subquery