zchtodd

Posted on Oct 7, 2019

#SQL30 Day 5: Hierarchies

#sql #database #postgres #challenge

Welcome to the SQL showdown series!

What is this and how does it work?

I'm committing to publishing a SQL challenge every day for 30 days. In each post I'll describe my solution to the last day's challenge. I'll follow that up with a full description of the challenge for the next day.

Write your own solution in the comments! Let's see who can come up with the most creative solutions.

I'll add connection details for a PostgreSQL database containing test data for the challenge. Solutions are by no means limited to PostgreSQL, but it's there if you want an easy way to test your query!

Challenge #5: Hierarchies

Today we're going to tackle dealing with hierarchical or graph-like data. Imagine something like the Oracle of Bacon that finds the most direct relationship from one actor to Kevin Bacon. In our case, we'll use a made-up corporate org chart to represent a hierarchy of relationships.

Here's the challenge:

Can you take a specific employee ID and return a path up the org chart to the CEO, listing all of the employees on that path?

The only table in this challenge is the employee table. These are the columns that make up the employee table:

employee_id
name
manager_id

Here's an example to give you a better idea of the output you're after:

In this case, we start with Rudy and proceed up the chain until we reach Tonie, who has no manager, and who we assume is the CEO.

Sandbox Connection Details

I have a PostgreSQL database ready for you to play with.

The password is the same as the username! To query the employee table:

SELECT * FROM day5.employee;

Solution for Challenge #4

This is the question we were trying to answer with yesterday's SQL challenge about video game sales:

Can you produce a report that displays one year per row, and the aggregated global sales by genre for that year as columns?

The task is to turn rows into columns, so that we end up with every genre as a column. This is kind of like an Excel pivot table.

Some databases, such as SQL Server, have a PIVOT function that does exactly this. PostgreSQL, however, does not have the PIVOT function. The tablefunc extension module provides a CROSSTAB function for accomplishing the same thing, but let's stick with a vanilla PostgreSQL installation.

Here's how I went about solving this challenge:

SELECT
  year AS "Year",
  SUM(CASE WHEN genre = 'Action' THEN global_sales END) AS "Action",
  SUM(CASE WHEN genre = 'Adventure' THEN global_sales END) AS "Adventure",
  SUM(CASE WHEN genre = 'Fighting' THEN global_sales END) AS "Fighting",
  SUM(CASE WHEN genre = 'Misc' THEN global_sales END) AS "Misc",
  SUM(CASE WHEN genre = 'Platform' THEN global_sales END) AS "Platform",
  SUM(CASE WHEN genre = 'Puzzle' THEN global_sales END) AS "Puzzle",
  SUM(CASE WHEN genre = 'Racing' THEN global_sales END) AS "Racing",
  SUM(CASE WHEN genre = 'Role-Playing' THEN global_sales END) AS "Role-Playing",
  SUM(CASE WHEN genre = 'Shooter' THEN global_sales END) AS "Shooter",
  SUM(CASE WHEN genre = 'Simulation' THEN global_sales END) AS "Simulation",
  SUM(CASE WHEN genre = 'Sports' THEN global_sales END) AS "Sports",
  SUM(CASE WHEN genre = 'Strategy' THEN global_sales END) AS "Strategy"
FROM day4.videogame GROUP BY year ORDER BY year;

If you've never seen CASE statements before, they're similar to ternaries in a language like JavaScript. If we take the first column, it would be similar to the following in JavaScript:

genre == "Action" ? global_sales : 0;

Usually you'll see CASE statements that have an ELSE clause, but here I'm relying on the fact that the default ELSE clause returns NULL, which when passed to the SUM function is equivalent to zero.

By grouping on year, each of the SUM functions is only adding up the sales for one genre over that year after the CASE statement filters out other genres.

This approach is obviously a little tedious, and only works when you know your desired columns ahead of time.

Good luck!

Have fun, and I can't wait to see what you come up with! I'll be back tomorrow with a solution to this problem and a new challenge.

Top comments (1)

Sam Mason • Oct 29 '19 • Edited

found a bit more time! have known about recursive queries for a while but never had to use them outside of exercises like this.

the direct answer to your question would be something like:

WITH RECURSIVE tree AS (
    SELECT * FROM employee WHERE employee_id = 1
  UNION ALL
    SELECT p.*
    FROM employee p
      JOIN tree c ON p.employee_id = c.manager_id
)
SELECT employee_id, name FROM tree;

but it's more interesting to handle the case of more than one employee, and how you do something sensible with the results. if you're expecting this to be handled by code that's good with trees/graphs like this then, the following seems nice:

WITH RECURSIVE tree AS (
    SELECT * FROM employee WHERE employee_id IN (1, 6)
  UNION
    SELECT p.*
    FROM employee p, tree c
    WHERE p.employee_id = c.manager_id  -- gratuitous change of syntax!
)
SELECT employee_id, name, manager_id FROM tree;

the only meaningful change is to use a UNION instead of a UNION ALL, but suppose a plain UNION could have been used in the first query.

if you wanted something that's directly nice for presentation then it might be useful to track the path up the tree:

WITH RECURSIVE tree AS (
    SELECT employee_id AS leaf_employee_id, 0 AS depth, * FROM employee
  UNION ALL
    SELECT leaf_employee_id, depth + 1, p.*
    FROM employee p, tree c
    WHERE p.employee_id = c.manager_id
)
SELECT leaf_employee_id, depth, employee_id, name
FROM tree
WHERE leaf_employee_id IN (1, 2)
ORDER BY 1, 2;