SQL 101: All about SQL JOINs
Helen Anderson
Nov 17 '18
Updated on Feb 12, 2019
・3 min read
Joining tables is the first big learning curve after getting your head around SQL basics. More often than not you won't find everything you need in one table so will need to learn which join to use when.
If you're familiar with the VLOOKUP function in Excel, you'll be just fine.
T-SQL supports cross, inner and outer JOIN clauses, and UNION operators to combine datasets. While there are plenty of guides to JOINs that use venn diagrams to get the point across, I prefer something more visual.
INNER JOIN
Use an INNER join, or just a JOIN, when you want to find the match between two tables. You need to have a key on both tables that you join ON, and that's where the match happens. Any results where there is not a match are discarded.
select
o.order_item,
i.inventory_item
from orders o
inner join inventory i
on o.order_item = i.inventory_item
LEFT OUTER JOIN
Use a LEFT OUTER join when you want to find a match between two tables, but also show a NULL where there is no match from the right table. Like the INNER JOIN you need a key to join ON.
A RIGHT OUTER JOIN does the same but in reverse.
select
o.order_item,
i.inventory_item
from orders o
left join inventory i
on o.order_item = i.inventory_item
CROSS JOIN
A CROSS JOIN joins everything with everything. There is no need to provide a key to join on and it can result in a very big data set, (and a really big image so you'll have to use your imagination when reviewing the image below).
Proceed with caution.
select
o.order_item,
i.inventory_item
from orders o
cross join inventory i
UNION
While a JOIN combines rows of columns horizontally a UNION combines the results vertically. Using a UNION combines the result of two queries into one column and removes duplicates. If your query has multiple columns, they need to be in the same order to complete the UNION.
select *
from
orders
union
select *
from
inventory
UNION ALL
The UNION ALL combines the results of two queries the same as a UNION but keeps the duplicates in the result.
select *
from
orders
union all
select *
from
inventory
The best way to get your head around JOINs is to start using them. If you aren't working with a SQL database already, check out SQLZoo or Hacker Rank to experiment with JOINs.
Read more about JOINs with katiekodes
Let me know what other key concepts you think would be useful for complete beginners and junior data analysts.
Photo by Pixabay on Pexels
(open source and free forever ❤️)
Helen, you not explain FULL JOIN. Why?
My wild guess would be that’s because
FULL JOIN
is an evidence of a design flaw. If you find yourself doingFULL JOIN
your database structure requires in an immediate surgery.That's not at all true. They're very rare in my experience, but even
GOTO
has its place and there's been much less harm wrought by full joins.For an example, a data model I'm currently working with involves a 1:1 relationship between
stations
andsites
: a station can be deployed to a site, a site may be empty (no station), or a station may be held in reserve (no site). I want to see all my stations and all my sites, with deployed stations correctly lined up with their sites. A full outer join betweenstations
andsites
onstations.site_id
produces exactly the desired output. Anything else would be clumsy at best, and the only remotely appropriate alternative model simply moves the foreign key to the other side.Well, I am not a SQL architect by any mean, but a join table
sites_to_stations
would not only allow to avoidFULL JOIN
but also attach some attributes to links (like linked_at or like) in the future for free.I already get metadata for free (it's event sourced), so a junction table even with the appropriate constraints to enforce 1:1 vs m:m would just take up space and make other queries more complicated.
Got it. Thanks for the explanation, I was wrong. In this case it indeed sounds as the best solution.
Also, the junction table approach still doesn't give me what I want, since the un-junctioned records collide on the join criteria:
To get unaffiliated lefts and rights to show up, you have to run the query twice and union the results:
Indeed. This is a perfect example of how blinkered are we (I :) might be with our background, experience and expertise.
Say, 20 times in my life I saw
FULL JOIN
in the legacy DBs, that I always could have refactored to the joined tables. I never met the example of a kind you just shared (btw, thank you again and again for taking time doing that, I really appreciate and value it.)And—voilà—I stupidly made the wrong assumption based on my experience, expertise and all that crap.
Very enlightening.
You're welcome! :) I should say that the case I mentioned with sites and stations is, to my memory, one of maybe two or three times I've actually written a full join in application code over more than a decade of designing databases and writing SQL. It's really rare, but an important tool nonetheless.
I have a use case for
FULL JOIN
.FULL JOIN
s are fantastic when, for one reason or another (perhaps you built the database just as a way to combine some external data sources ad-hoc), your database has acquired data in two different tables that includes different fact tables about the same real-world entities.See my example in my recent blog post "Every SQL Join You'll Ever Need".
You'll see that it's easy to make a table that looks like this:
Out of a table that looks like this:
And a table that looks like this:
It may not be a terribly common business need in a well-normalized long-term data store (or, as you say, one that doesn't "require immediate surgery"), but it's a really common one in ETL / "set up the latest database management wants" work. :-)
I could certainly also see myself using a
FULL JOIN
in the "stations
andsites
" types of "what sort of data I'm dealing with" situations Dian described. Other people have a point that often times when a "business need" requires two such tables to be joined so often, one would think they might have attributes of their relationships that need to be stored and therefore need a junction table, so that's worth thinking about as one designs their database! But as Dian points out ... not necessarily.I'd give that it definitely does seem to be far more common, business-question-wise, to want to see the intersection of two data sets or to see "definitely thing A, but only things B/C/D/etc. if applicable", than it is to want to see "whatever -- but put it together on the same line if it seems related"! :-) But I wouldn't call having a business need to look for such joins a "flaw" by any stretch of the imagination.
Aleksei …
FULL JOIN will do basically a LEFT JOIN and a RIGHT JOIN, correct?
Yes, you are correct in that I have never found too many practical uses as … no databases would normally have this situation in production use. But I still think it should be in the list.
There ARE limited cases like linking all reps to all products and finding which reps have NOT sold what products.
Dale
I was plain wrong on the subject.
The beauty of development ... there is never a right and wrong! Some things are logical until the day that they aint ... and vice-versa.
Dale
It's not always right. Sometime I using this. Common in big complex query with joining subquery
I do it when trying to debug legacy databases. I don't do it in production code though.
Hey, nice write up! I think another topic worth mentioning for beginners is a light (but IMO necessary) introduction to the cost of SQL queries. A JOIN is typically is an operation that may take days or just a minute depending on multiple factors — I think that knowing some of those factors (like having indices) would be empowering to beginner devs. We can also argue that this is a more advanced topic. :-)
Thanks!
You're absolutely right, it's tricky to keep beginner posts light while covering a lot of those technical factors, like trying to keep efficient and running smoothly.
My next post on the list is on indexes. I'm finding it tricky to keep it at an intro level and more conversational but will hopefully get there in the next few days.
Yep, working with indexes is a very important topic. They can downright make or break the performance of a database! I think it's a good idea to rely on a lot of diagrams, and perhaps animated GIFs explaining how the rows are inserted into the index.
Also I find the library metaphor very useful to build a mental model:
Say you're in a library and you're looking for a book. Without any hints, you'll have to search through all the books to find it. But if books are put on shelves in alphabetical order or by genre, that's already much easier! So, put that information in a book at the entrance and voilà — you've got an index!
I'm tossing up as to whether I explain how a B-tree works or just stick to a high-level explanation. That will dictate the strategy I take with gifs, images and which direction I take it.
I also enjoy the library metaphor:
"Databases are like libraries.
Tables are like books stored in a library.
Rows are like pages of a book and to make getting to the page you need quicker, you need to create an index. Flipping through a textbook page by page looking for that one page you need is going to take time, the same way querying millions of rows in an unindexed table is going be time-consuming and tedious."
I think a high-level B-tree explanation would be interesting — but perhaps you could warn that this needs some CS fundamentals (trees)?
Your approach to the library metaphor is interesting — I actually thought of it one level up, i.e. "a table is like a library" because I was thinking about a book as a row in a table. For example, a book has many properties (like genre, publish date, content, title, authors…) that we can relate to columns and relationships. Also, I'd be cautious not to cause confusion because "page" also have a specific meaning in the realm of databases (grouping of rows stored on disk). :-)
Anyway, excited to read this next article!
This is why I’ve held off tackling indexes so far. I’d like to pitch it to beginners but feel like I’m going to get too far into CS topics I wasn’t intending to explain.
Back to the draft :)
My take on indexes is now up if you’re interested dev.to/helenanders26/sql-series-sp...
Great post, Helen! Next time I'm logged into my website, I'll throw it into a "suggested reading" section of my "Every SQL Join..." post.
I stuck to "same people, different tables" joins because in Salesforce-land, most of what beginners do is beat their heads against the wall trying to combine customer data from different sources into a single table-shaped output, and I think a fair amount of my audience may come from Salesforce-land. (Also, I was having trouble thinking of meaningful examples that would be good for all my join types when sticking to proper PK-FK relationships.)
But I wanted to get around to also showing how joins could be used to look up more traditional PK-FK-connected details. Maybe now I don't have to ... I can just link to your examples. :-D
Thank you, that's great feedback. :)
Do you find SOQL to be very different to T-SQL over in Salesforce land? I'd love to read more about how they differ and any tips on making it easier to switch gears.
Yup, hugely different! You don’t get to DO joins of your own choosing at all. There’s a limited set of sub selects, “parent table lookups,” and semi-joins you can do between tables that already have PK-FK relationships, but there’s no “build whatever you want.” The closest thing, in-platform, is using their Java-like language to hand-join (iterate yourself imperatively) table dumps in memory. The seminal explanation of what you CAN do in pure SOQL is Jeff Douglas’s “SOQL, how I query thee, let me count the ways” (blog post)
Wow, sounds like a tough sell for a SQL/Excel analyst to get their head around. From what I understand it's what you use in Wave to do more than the basics.
That sounds more like SAQL, I think, which I haven’t played with because Wave (now “Einstein Analytics?”) costs extra and we don’t have it. But it DOES have a query language that lets you join at your leisure, I believe I recall from presentations on it. Looked way more like SQL, I believe. SOQL comes free as part of Salesforce for basic queries against the tables you create. SOQL is mostly custom-join-free, I suppose, both for simplicity and to keep non-DB-type amateurs from executing long-running cross-join queries on a server shared with other companies.
Hmmm I’m going to have to investigate the SOQL/SAQL a bit more. It sounds like I’ll be supporting analysts using one of them. It would be good to know which one!
Thanks for the nice intro! I would recommend having a look at sql-joins.leopard.in.ua/ for visualizing JOINS ☺️
Helen …
Thanks for the quick examples (here and elsewhere).
I think you should expand on what LEFT Vs RIGHT JOIN mean (ie. the table specified first is the 'left' table).
For CROSS JOIN I tend to think of it as an "EVERYTHING JOIN" and it can generate a MASSIVE amount of data and is generally bad. If you CROSS JOIN two tables with 1,000 rows each you get 1,000,000 rows.
There are practical uses for CROSS JOIN but they are few and far between. For example:
"I will generate a list of all reps linked to all products and you can then cross out the ones you don't want" has been quite common sort of thing.
I was surprised to see UNION discussed here but … yep it is a join or sorts. However, I think it is a whole new topic all on it's own.
Further topics? Sub-query? Co-related Subquery - and why is can be SOOO bad and should be avoided at all costs with anything other than small datasets!
Dale
Hi Dale
Thanks for all your comments, I appreciate the feedback.
You’re right, I could have given more of an explanation as to what I meant by LEFT and RIGHT. The image doesn’t speak for itself so it could have been clearer.
The Cross join is more of a warning than anything else. As you say, things can get out of hand very quickly.
I included Unions at a very high level to show that you can ‘join’ tables both horizontally and vertically but agree it is a blog post on its own.
Thank you for suggestions too, I’ll be posting a few more data blog posts before the year is out!
Helen
JOIN all the things :-D
Nice post!
Thanks Rhymes! :D
I like that you used actual tables and not Venn diagrams.
Thanks, glad you found it useful
Nice post, Thanks! :)
You're welcome!
Thanks for writing this!
You’re welcome, thanks for all your comments :)