Each week I'll be combing through the #HelpMeCode hashtag on Twitter and looking for questions that I believe I can help answer and would benefit from a deeper dive than the Twitter platform offers.
Liquid error: internal
My post today is in response to a query from @AudTheCodeWitch. She has a specific question about a fairly common relational database structure
Ok #helpmecode, I'm working on structuring the database for a classroom library app. A User is either a Student OR a Teacher, and a Student belongs to a Teacher. I think I need a joining table, but I'm getting confused. Any help?
Database Normalization and SQL helps us understand how to structure our tables and define the relationships between them. When we're learning the examples chosen are usually straightforward to explain and understand.
This example, from Airtable, uses Museums and Works to demonstrate how one Museum may "own" many Works but any given Work will always belong to one and only one Museum. This is known as a "One-to-Many" relationship (one Museum, many Works) and conventions of Normalization and database design give us clues on how to organize our database objects to support this data.
From there you might imagine that, for any given Work, there are one or more Creators. And Creators certainly can have more than one Work. This is a "Many-to-Many" relationship, and there are conventions for how to efficiently store and retrieve data like this as well.
But AudTheCodeWitch raises an interesting question! What happens when that relationship becomes self-referential? Her example, of Students and Teachers, is a good one. They're both "Users" in her system, and outside of the Student/Teacher distinction there isn't any difference. Another common example of this structure is Employees and Managers. Employees are often also Managers, and may even have their own Manager as well. It's a very circular style.
When modeling this structure in OOP the answer is a little easier. You'd have your User class, and from that inherit both Students and Teachers. But what's the right way to model this in a Normalized relational database?
Self-referencing joins! There is no reason why the "foreign" key in a table cannot refer back to the key of the same table it's stored in. This allows us to maintain normalization and, in my opinion, does not detract from the legibility of the table structure. It's also perfectly valid for either one-to-many or many-to-many relationships.
To continue with AudTheCodeWitch's example, in her Twitter thread she indicates that she initially imagined that each student could have one and only one teacher. Sound familiar? That's the description of a one-to-many relationship. While she certainly could have two separate tables, the simpler solution is to simply have a field in the Users table that is a FK to the Id of that same Users table. It could be named "TeacherUserId" and would be populated for any User object that had a teacher.
In this model the one-to-many relationship is modeled perfectly and without any additional database entities. It's also clear, from a glance, which Users are Students (they'll have a value in the TeacherUserId field!)
On top of that, it can model recursive references without any changes. If we're using it for Employees and Managers instead, each record can both have it's ManagerUserId field filled in, as well as having it's Id in another record's ManagerUserId.
AudTheCodeWitch does acknowledge that she could imagine a scenario where a Student has more than one teacher. That makes sense, Students often have multiple classes with multiple teachers. The self-referential join is still valid even though we're now looking at a many-to-many relationship. Just like a traditional many-to-many relationship we would model this with an intermediary table.
This adds an additional mapping table, but all normalized many-to-many relationships do. And it is still fairly self-documenting.
So there you have it, the self-referential join in all it's glory. Not so scary and immediately useful for a variety of scenarios.
Querying this model is no more difficult than any other, it just requires thoughtful use of aliases. A sample query for the one-to-many implementation:
SELECT S.Name AS StudentName, T.Name AS TeacherName FROM Users S JOIN Users T ON S.TeacherUserId = T.Id
*Find all of Ben's training content here, at CBT Nuggets - http://learn.gg/ben-finkel *