DEV Community

Cover image for Self-Join in Sheets, SQL, Python, and JavaScript
Peter Hozák
Peter Hozák

Posted on

Self-Join in Sheets, SQL, Python, and JavaScript

Let's start with a self-referencing data table employees:

id  name                 manager_id
1   Djan Seriy Anaplian
2   Fal 'Ngeestra
3   Turminder Xuss       1           --->  Djan Seriy Anaplian
4   Jase                 2           --->  Fal 'Ngeestra
Enter fullscreen mode Exit fullscreen mode

How can we get the other record?

Sheets

As with so many questions in Excel, Numbers, Calc, or Sheets, let's solve it by adding a column with an INDEX and MATCH formula:

# 1. find row number of the matching record
=match(C2, A:A, 0)

# 2. get value from the other record
=index(B:B, match(C2, A:A, 0))

# 3. handle errors
=iferror(index(B:B, match(C2, A:A, 0)), "")
Enter fullscreen mode Exit fullscreen mode

You can try out the example in Sheets:
google sheets with INDEX and MATCH formula for self-join employees table

SQL

In virtually all database query languages, joins to a table itself work exactly like joins between tables:

select
  e.name employee,
  m.name manager
from employees e
left join employees m on e.manager_id = m.id
Enter fullscreen mode Exit fullscreen mode

Try out in SQL Fiddle:
left join in sql

Python

To represent the "normal form" of our data in Python, let's choose a dictionary:

employees = {
    1: {'name': 'Djan Seriy Anaplian'},
    2: {'name': 'Fal \'Ngeestra'},
    3: {'name': 'Turminder Xuss', 'manager_id': 1},
    4: {'name': 'Jase', 'manager_id': 2},
}
Enter fullscreen mode Exit fullscreen mode

We want to change it to a list of nested dictionaries, replacing each manager_id pointer with a live manager object, to give ourselves the most freedom to explore. Using imperative procedural code style with a walrus operator := and dict.pop(), that might look like following:

for em in employees.values():
    if manager_id := em.pop('manager_id', None):
        em['manager'] = employees[manager_id]

employees = list(employees.values())
employees[0]['short'] = 'Anaplian'
employees[1]['short'] = 'Fal'
Enter fullscreen mode Exit fullscreen mode

This example lives on Repl.it. Will it work for a deeper management hierarchy? Would it break for a cyclical structure?

Let me know if you would like to see other code styles in Python, functional like JS below or FRP like the Sheets above (with Pandas).

JavaScript / TypeScript

We can start with the exact same employees object as in the Python example, but this time we choose a more team-friendly approach inspired by functional programming style that avoids mutation and side effects, because the nested objects might be used in other parts of the project. We also want to avoid too many levels of nesting to keep it simple, so we replace manager_id with an optional immutable string managerName:

const employeesWithManager = Object.values(employees)
  .map(({name, manager_id}) => ({
    name,
    ...(manager_id && {
      managerName: employees[manager_id].name
    }),
  }))
Enter fullscreen mode Exit fullscreen mode

You can play with a TypeScript version on TS Playground, or run the JavaScript example directly in your browser console:
js example in browser console

Have you noticed how we can use the spread syntax in object literal ({...expr}) with a more complex expression, not just a variable name? Can you refactor the code to use an explicit variable at that place? Which version is more readable for you and for your project coworkers?


👋 Hi, I am Peter Hozák, a.k.a. Aprillion. Do you have any follow-up or similar questions? Let's discuss in the comments below or on Twitter.

Top comments (0)