DEV Community

Dror Atariah
Dror Atariah

Posted on

I was there first! Finding the First in Every Group using SQL 🔥

This post shows how to use the FIRST_VALUE() window function in SQL to find the first occurrence in a partitioned group. We'll use a practical example with an employees table.

Setup

First, let's create and populate our employees table.

CREATE TABLE employees (emp_name VARCHAR, department VARCHAR, hire_date DATE);

INSERT INTO employees (emp_name, department, hire_date)
VALUES ('Alice', 'Sales', '2022-03-15'),
       ('Bob', 'Sales', '2023-01-20'),
       ('Charlie', 'Sales', '2022-08-01'),
       ('David', 'Engineering', '2021-07-10'),
       ('Eve', 'Engineering', '2021-05-22'),
       ('Frank', 'HR', '2024-01-05');
Enter fullscreen mode Exit fullscreen mode

Finding the First Hired Employee

Now, we use FIRST_VALUE() to find the first employee hired in each department. The PARTITION BY department clause groups our data, and ORDER BY hire_date sorts it to find the "first" one.

SELECT emp_name,
       department,
       hire_date,
       FIRST_VALUE(emp_name) OVER (PARTITION BY department
                                   ORDER BY hire_date ASC) AS first_hired_in_dept
FROM employees;
Enter fullscreen mode Exit fullscreen mode

Output

emp_name department hire_date first_hired_in_dept
Alice Sales 2022-03-15 Alice
Charlie Sales 2022-08-01 Alice
Bob Sales 2023-01-20 Alice
Eve Engineering 2021-05-22 Eve
David Engineering 2021-07-10 Eve
Frank HR 2024-01-05 Frank

The first_hired_in_dept column is added to every row, showing who was hired first in that row's department.

Extracting a Unique List Per Department

To get a clean list of just the first-hired employee for each department, you can use a Common Table Expression (CTE) with DISTINCT.

WITH DepartmentFirstHires AS
  (SELECT department,
          FIRST_VALUE(emp_name) OVER (PARTITION BY department
                                      ORDER BY hire_date ASC) AS first_hired_in_dept
   FROM employees)
SELECT DISTINCT department,
                first_hired_in_dept
FROM DepartmentFirstHires;
Enter fullscreen mode Exit fullscreen mode

Output

department first_hired_in_dept
Sales Alice
Engineering Eve
HR Frank

Alternatives to FIRST_VALUE()

Here are a couple of other common ways to get the same result.

Using ROW_NUMBER()

The ROW_NUMBER() window function can rank employees in each department by their hire date. You then select the one ranked first (rn = 1).

WITH RankedEmployees AS
  (SELECT emp_name,
          department,
          hire_date,
          ROW_NUMBER() OVER (PARTITION BY department
                             ORDER BY hire_date ASC) AS rn
   FROM employees)
SELECT emp_name,
       department,
       hire_date
FROM RankedEmployees
WHERE rn = 1;
Enter fullscreen mode Exit fullscreen mode

Using a JOIN with a Subquery

Another method is to find the minimum hire date for each department in a subquery and then join the table back to itself to find the matching employee.

SELECT e.department,
       e.emp_name
FROM employees e
JOIN
  (SELECT department,
          MIN(hire_date) AS min_hire_date
   FROM employees
   GROUP BY department) AS first_hires_per_dept ON e.department = first_hires_per_dept.department
AND e.hire_date = first_hires_per_dept.min_hire_date;
Enter fullscreen mode Exit fullscreen mode

Both of these are effective alternatives, though window functions like FIRST_VALUE() and ROW_NUMBER() are often more readable and performant for this type of problem.

Key Takeaways

  • FIRST_VALUE is a window function that requires an OVER() clause.
  • Use PARTITION BY to define your groups and ORDER BY to define what "first" means.
  • Window functions enrich every row; they don't aggregate them. Use DISTINCT or a subquery if you need a summary.
  • Be mindful of ties in your ORDER BY clause. Add more columns to the ORDER BY to ensure deterministic results if needed.

Which approach do you prefer? Why?
Let's discuss in the comments!

Bonus

For your convenice, here is a Python script that tests these queries using duckdb:

import duckdb
import pandas

# Connect to an in-memory DuckDB database
con = duckdb.connect(database=":memory:", read_only=False)

# 1. Setup: Create and populate the employees table
setup_sql = """
CREATE TABLE employees (emp_name VARCHAR, department VARCHAR, hire_date DATE);
INSERT INTO employees (emp_name, department, hire_date)
VALUES ('Alice', 'Sales', '2022-03-15'),
       ('Bob', 'Sales', '2023-01-20'),
       ('Charlie', 'Sales', '2022-08-01'),
       ('David', 'Engineering', '2021-07-10'),
       ('Eve', 'Engineering', '2021-05-22'),
       ('Frank', 'HR', '2024-01-05');
"""
con.execute(setup_sql)
print("--- Table created and populated. ---")
print("")

# 2. Test FIRST_VALUE() query
first_value_sql = """
SELECT emp_name,
       department,
       hire_date,
       FIRST_VALUE(emp_name) OVER (PARTITION BY department
                                   ORDER BY hire_date ASC) AS first_hired_in_dept
FROM employees;
"""
print("--- Testing FIRST_VALUE() ---")
result = con.execute(first_value_sql).fetchdf()
print(result)
print("")

# 3. Test FIRST_VALUE() with DISTINCT to get a unique list
first_value_distinct_sql = """
WITH DepartmentFirstHires AS
  (SELECT department,
          FIRST_VALUE(emp_name) OVER (PARTITION BY department
                                      ORDER BY hire_date ASC) AS first_hired_in_dept
   FROM employees)
SELECT DISTINCT department,
                first_hired_in_dept
FROM DepartmentFirstHires;
"""
print("--- Testing FIRST_VALUE() with DISTINCT ---")
result = con.execute(first_value_distinct_sql).fetchdf()
print(result)
print("")


# 4. Test ROW_NUMBER() alternative
row_number_sql = """
WITH RankedEmployees AS
  (SELECT emp_name,
          department,
          hire_date,
          ROW_NUMBER() OVER (PARTITION BY department
                             ORDER BY hire_date ASC) AS rn
   FROM employees)
SELECT emp_name,
       department,
       hire_date
FROM RankedEmployees
WHERE rn = 1;
"""
print("--- Testing ROW_NUMBER() alternative ---")
result = con.execute(row_number_sql).fetchdf()
print(result)
print("")


# 5. Test JOIN with subquery alternative
join_subquery_sql = """
SELECT e.department,
       e.emp_name
FROM employees e
JOIN
  (SELECT department,
          MIN(hire_date) AS min_hire_date
   FROM employees
   GROUP BY department) AS first_hires_per_dept ON e.department = first_hires_per_dept.department
AND e.hire_date = first_hires_per_dept.min_hire_date;
"""
print("--- Testing JOIN with subquery alternative ---")
result = con.execute(join_subquery_sql).fetchdf()
print(result)
print("")

# Close the connection
con.close()
Enter fullscreen mode Exit fullscreen mode

Top comments (0)