Retiago Drago

Posted on

# The Number of Employees Which Report to Each Employee | LeetCode | MSSQL

## The Problem

Consider an `Employees` table with the following schema:

Column Name Type
employee_id int
name varchar
reports_to int
age int

`employee_id` is the primary key for this table. This table contains information about the employees and the id of the manager they report to. Some employees do not report to anyone (`reports_to` is null). For this problem, a manager is defined as an employee who has at least one other employee reporting to them. The task is to write an SQL query to report the ids and the names of all managers, the number of employees who report directly to them, and the average age of the reports rounded to the nearest integer.

## Explanation

Consider the following `Employees` table:

employee_id name reports_to age
9 Hercy null 43
6 Alice 9 41
4 Bob 9 36
2 Winston null 37

The expected output would be:

employee_id name reports_count average_age
9 Hercy 2 39

Here, Hercy has two people report directly to him, Alice and Bob. Their average age is `(41+36)/2 = 38.5`, which is `39` after rounding it to the nearest integer.

## The Solution

We will discuss four different SQL approaches to solve this problem, explaining their main differences, strengths, weaknesses, and underlying structures.

### Source Code 1: Join and Group By

This code uses a JOIN operation to combine rows from two or more tables based on a related column between them.

``````SELECT
m.employee_id,
m.name,
COUNT(e.reports_to) [reports_count],
ROUND(AVG(e.age * 1.0), 0) [average_age]
FROM Employees e JOIN Employees m ON e.reports_to = m.employee_id
GROUP BY
m.employee_id,
m.name
ORDER BY m.employee_id
``````

The GROUP BY clause groups the `employee_id` and `name` from the `m` alias (managers), calculates the count of reports and the average age. The `ROUND` function is used to round the average age to the nearest integer. The result is then ordered by `employee_id`.

This query runtime is 1559ms, beating 33.60% of other submissions on LeetCode.

### Source Code 2: Subquery, Join, and Group By

This code utilizes subqueries to first isolate the needed columns before performing the join operation, potentially improving performance.

``````SELECT
m.employee_id,
m.name,
COUNT(e.reports_to) [reports_count],
ROUND(AVG(e.age * 1.0), 0) [average_age]
FROM (
SELECT
reports_to,
age
FROM Employees
) e JOIN (
SELECT
employee_id,
name
FROM Employees
) m ON e.reports_to = m.employee_id
GROUP BY
m.employee_id,
m.name
ORDER BY m.employee_id
``````

By separating the `reports_to` and `age` in subquery `e`, and `employee_id` and `name` in subquery `m`, the join operation may be faster. This query has a runtime of 1446ms, beating 46.6% of other submissions on LeetCode.

### Source Code 3: Subquery, Join, Group By, and Calculated Average

This code also uses subqueries for the join operation, but it calculates the average differently by explicitly calculating the sum and dividing by the count.

``````SELECT
m.employee_id,
m.name,
COUNT(e.reports_to) [reports_count],
ROUND(SUM(e.age) * 1.0 / COUNT(e.reports_to), 0) [average_age]
FROM (
SELECT
reports_to,
age
FROM Employees
) e JOIN (
SELECT
employee_id,
name
FROM Employees
) m ON e.reports_to = m.employee_id
GROUP BY
m.employee_id,
m.name
ORDER BY m.employee_id
``````

By manually calculating the average (sum of ages divided by count), the rounding to the nearest integer can be more precise. This query runtime is 1122ms, beating 93.38% of other submissions on LeetCode.

### Source Code 4: Join, Group By, and Calculated Average

This code is similar to Source Code 3, but it does not use subqueries, reverting to a more direct join operation.

``````SELECT
m.employee_id,
m.name,
COUNT(e.reports_to) [reports_count],
ROUND(SUM(e.age) * 1.0 / COUNT(e.reports_to), 0) [average_age]
FROM Employees e JOIN Employees m ON e.reports_to = m.employee_id
GROUP BY
m.employee_id,
m.name
ORDER BY m.employee_id
``````

This solution combines the direct join approach from Source Code 1 and the explicit average calculation from Source Code 3. This query runtime is 1207ms, beating 79.18% of other submissions on LeetCode.

## Conclusion

From these solutions, we learn that the choice of SQL constructs and the way we express computations can impact performance significantly. Interestingly, using subqueries did not always lead to better performance, which emphasizes that SQL optimization can be a complex topic. The top-performing solution was Source Code 3, followed by Source Code 4, Source Code 2, and then Source Code 1. However, it's important to note that LeetCode's performance does not perfectly represent real-world database performance, as many other factors come into play, including indexing, database design, and data distribution.

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.