DEV Community

Tim
Tim

Posted on

Hive SQL (and T-SQL): How To Write A Left Anti Join and Why

Previously, we've look at the join functionality of LEFT JOIN. If we recall from that lesson, we learned that a LEFT JOIN will join records that match and don't match on the condition specified. Since we're using the same example over and over again for easily remembering the result, we'll remember that two of our records in Table1 matched with Table2, but two results didn't match, so we saw nulls as the result. When there isn't a match (because the records don't exist), we get a null as the return. What if we only want to get the records in Table1 that don't match in Table2? In the video, SQL Basics: How To Use A LEFT ANTI JOIN and Why, we look at accomplishing this challenge using the LEFT ANTI JOIN. Since not all SQL languages support LEFT ANTI JOIN, I show this syntax two different ways, one of which is generally supported.

Some questions that are answered in the video:

  • Note the tables that we're using and what values are identical based on the column names and what values differ based on the column names. We'll be using these tables throughout these videos. As a note on aliasing tables - the first table always gets a "t1" while the second table always gets a "t2" and so on and so forth (the 5th table would get a "t5"). Finally, consider that some SQL languages may not support the JOINs that we discuss. As a quick reminder, not all SQL languages may support the exact syntax shown.
  • In first our example, we use a LEFT ANTI JOIN without directly calling it (like we could do in some SQL languages), but by using a null filter on the LEFT JOIN condition. What is the purpose of specifying nulls in this example?
  • To accomplish the same results in the first example, how can we LEFT ANTI JOIN Table1 with Table2? How does this save us time in development?
  • What would happen if we used a LEFT ANTI JOIN from Table1 on Table2 for the column Letter? What would our results be? Why? As we see, a big part of what we're doing here is filtering out records when we have two data sets with similarities and differences. If you've ever completed an assignment that involved "compare" and "contrast" then you've done something similar in functionality as you had to identify the similarities and the differences, which is what we're doing here with data.

As for a real world example with LEFT ANTI JOINs, consider a health example where we have populations with different traits, but all that have the same disease. We could use a LEFT ANTI JOIN to filter out these differences. This would possibly allow us to further our analysis (may or may not be useful). The key with any data set where we use a LEFT ANTI JOIN is that must have columns that will be used in our join condition where we do the filtering. In other words, we would filter on trait or multiple traits depending on how we broke down our data in the example. The same applies if we were to use LEFT ANTI JOINs with sales data, housing data, etc. We must have columns where we'll apply our filtering condition.

For more SQL lessons, you can continue to watch the SQL Basics series.

Top comments (0)