Nearly any data model of non-negligible complexity will include at least one (if not several) many-to-many relationships. This relationship emerges when, given two tables A and B, instances of A could have many instances of B, and instances of B could have many instances of A. Successfully implementing this relationship is a bit of stumbling block for people new to working with or building databases. However, the strategy for implementing this relationship in relational databases is important to understand. Believe it or not, it’s also not all that complicated.
Once you’ve implemented this relationship a few times, the strategy becomes rather intuitive. However, I remember this concept being one of the hardest ones for me and my classmates to finally understand when it was introduced in second year of undergrad. Hopefully, this article will help you internalize the many-to-many relationship like I eventually did.
The many-to-many (m2m) relationship is a relationship that occurs in data modeling when two entities are related but neither belong to the other and both can have many instances of each other. Here are some examples of many-to-many relationships:
- a pet can have many owners, and owners can have many pets
- a language can be spoken by many people, and a person can speak many languages
- a class can have many students, and a student can be in many classes
- a venue can have many visitors, and a visitor can visit many venues
To better understand the many-to-many relationship, let’s briefly talk about the one-to-many relationship to see how the m2m differs:
- a country can have many cities, but a city can only be in one country
- a car company can make many car models, but each car model is made by only one company
In each of these examples, note that there is a sense of one entity belonging to another.
One-to-many relationships are easy to model in the database. For example, consider the country-city example. The country India has many cities such as Mumbai, Hyderabad, and Kolkata, but Kolkata only belongs to the country of India.
So, in the database, we probably have a
Country table that looks like this:
City table plus relationship to the
Country table would look like this:
This is simple enough, and we can query for all cities in a country by simply writing something like
SELECT City.Name FROM City JOIN Country ON Country.Id = City.CountryId WHERE Country.Name = 'India' and we’d get a result set including Mumbai, Hyderabad, Kolkata, and more.
But this kind of modeling is easy and we’ve only used it as an example to better understand our next section: modeling the many-to-many relationship.
Now let’s consider the more complex many-to-many relationship. Specifically, let’s consider the instance of many people speaking many languages. For example, we might have a Person table and Language table like so:
Now, suppose we have two people: Erik and Lina. Erik speaks English and Spanish, Lina speaks English and Arabic. This is what we mean by many-to-many relationships; Lina speaks many languages–English and Arabic–and English is spoken by many people–Erik and Lina. So how do we relate the
Language tables to list out all of the languages a person speaks, or all of the people that speak a language?
The answer is to create a new table, the associative table. The associative table sits between the two entities in a many-to-many relationship and actually has a one-to-many relationship with each. That is, given tables A and B and an associative table C, A has many Bs through C and B has many As through C; put another way, A has many Cs, B has many Cs, but C has one A and one B.
Note: The associative table are also sometimes called association, bridge, intermediary, join, junction, crosswalk, or linking table (and there’s probably many more names for it). I learned the term “associative” table in undergrad, and I think that’s its official name in relational theory, so that’s what we’ll stick with in this article.
Let’s make this more concrete by creating an associative table for the
Language relationship. It’s common practice to name association tables as an amalgamation of the tables they join, so we’ll call our associative table
PersonLanguage. I’d create such a table with the following SQL:
CREATE TABLE PersonLanguage( Id INT IDENTITY(1,1) PRIMARY KEY, PersonId INT FOREIGN KEY REFERENCES Person(Id), LanguageId INT FOREIGN KEY REFERENCES Language(Id) )
Notice that we specified two foreign keys on this table,
LanguageId, pointing to the
Language tables respectively. The table structure now looks like this:
Now, there will be a record in the
PersonLanguage table for every language an individual
Person speaks. As a result, there will also be a record in the
PersonLanguage table for every person that speaks a
Language. If we want to see the list of languages Lina speaks, we could run the following query:
SELECT l.name FROM PersonLanguage pl JOIN Person p ON p.id = pl.PersonId JOIN Language l ON l.Id = pl.LanguageId WHERE p.name = 'Lina'
and we’d get results like:
Likewise, if we want a list of all the people that speak English, we could run a query like this:
SELECT p.name FROM PersonLanguage pl JOIN Person p ON p.id = pl.PersonId JOIN Language l ON l.Id = pl.LanguageId WHERE l.name = 'English'
This would give us the following results:
The above examples show the most basic implementation of building associative tables to break up many-to-many relationships. The example
PersonLanguage associative table above only has two columns (besides its own ID column) for pointing to
We can actually add columns to this table as well and capture interesting information. For example, what if we wanted to record which language was a person’s mother tongue? In other words, if Erik grew up speaking English and learned Spanish later, but Lina grew up speaking Arabic and learned English later, how would we capture that?
Thanks to the associative table, the answer is pretty easy. We just have to add a column to
PersonLanguage that indicates if the
Language record captured is the
Person record’s primary language. In T-SQL, I’d run the following
ALTER TABLE PersonLanguage ADD PrimaryLanguage BIT
and set the
PrimaryLanguage value to
1 for languages the person learned from birth. By capturing this data on the associative table, we can now write a query to generate a report of every person and their mother tongue like so:
SELECT p.name AS 'Person', l.Name AS 'Primary Language' FROM PersonLanguage pl JOIN Person p ON p.id = pl.PersonId JOIN Language l ON l.Id = pl.LanguageId WHERE pl.PrimaryLanguage = 1
and we’d get results like:
That is just one example of how to extend the functionality of our database with associative tables.
In this brief article, we learned how to break up many-to-many relationships in relational databases. The general approach is to place an associative table between the two tables and have a one-to-many relationship with them. Then we saw how we can use associative tables to capture nuanced data and create interesting reports. This is an important concept to understand in any kind of software or data engineering work that uses relational databases. As always, I hope this was helpful to you and don’t be afraid to contact me with any questions you might have!