DEV Community

Tim
Tim

Posted on

Hive SQL: How To Write A LEFT SEMI JOIN

When we INNER JOIN from one table to another table with a column or set of columns and we select all columns, we'll get the results returned for both tables. This is true for some other join types as well, but the focus of this video will be for INNER JOINs since a LEFT SEMI JOIN is useful in contexts where we want specific results. In some cases with our joins, we only want the result set for one table, not both. Suppose that we wanted the result set for the LEFT table only in the join condition? We could use the standard SQL syntax of selecting all columns in the first table with the * operator filtered on the first table. This would return all columns in the first table and we'd still have the option of selecting columns from the second (and latter) tables. However, if we want to give ourselves the option of only selecting from the first table in our join, we can use a LEFT SEMI JOIN. In the video, SQL Basics: How To Use A LEFT SEMI JOIN and Why, we look at using this functionality.

Some questions that are answered in the video:

  • For this video, we use a self join table since our join will only involve one table. Since we join the table onto itself, any column that we use in the join condition will match provided that we don't change any part of the condition (which we do in our example and I show why).
  • As a quick reminder, not all SQL languages may support the exact syntax shown. Different SQL languages may have different approaches to solving the same problem or developing the same solution.
  • As we see with the results of the INNER JOIN, what output issue occurs when we join Table1 and Table2 when it comes to columns? For developers with application experience, what would be the issue here? Keep in mind that with INNER JOINs, this is seldom an issue, but worth considering in this case because it does come up.
  • How does a LEFT SEMI JOIN solve the above issue?
  • When would we consider using a LEFT SEMI JOIN? What could be the possible issue?

Given that fat finger and code issues occur, a LEFT SEMI JOIN can help us restrict the result set. In a business situation where we want to check a table's values against another table (or set of tables) and whether those values exist in other tables while only returning the results of the first table, a LEFT SEMI JOIN can help us. This functionality absolutely prevents us from possibly returning data in other tables, whereas using an INNER JOIN would allow us to possibly select columns from other tables than the first. For this reason, a LEFT SEMI JOIN can also be a more secure join type, if our business problem is one of filtering values in one tables that exist in other tables.

Top comments (0)