If you’re following along with my SQL Series, welcome to Part 2 🎉 🎉
In the first part, we focused on the basics of retrieving and filtering data from a single table. Now, we’re taking the next big step: learning how to work with data spread across multiple tables.
We’ll break down the different types of joins, look at practical use cases, and then explore how subqueries can make your queries more flexible and easier to maintain.
What are SQL JOINS?
In relational databases, data is often stored across multiple normalized tables. For example, employees and departments might live in two separate tables. But in real-world business queries, you almost always need combined information.
A JOIN in SQL allows you to query data from two or more tables based on a related column between them (often a primary key and foreign key relationship).
Syntax:
SELECT column_list
FROM table1
JOIN table2
ON table1.common_column = table2.common_column;
When to Use JOINS
JOINS should be used when:
- Your data is normalized into multiple tables (e.g., Customers, Orders, Products).
- You want to generate comprehensive reports combining multiple entities (e.g., customer details + their purchases).
- You need data validation, ensuring only matching records across tables are retrieved.
- You’re building reports or dashboards where relationships between entities matter.
Different Types of SQL JOINs
Here are the main types of joins in SQL:
- INNER JOIN → Returns records that have matching values in both tables
- LEFT (OUTER) JOIN → Returns all records from the left table, and the matched records from the right table
- RIGHT (OUTER) JOIN → Returns all records from the right table, and the matched records from the left table
- FULL (OUTER) JOIN → Returns all records when there is a match in either left or right table
- CROSS JOIN → represents the Cartesian product (all combinations).
Example Dataset:
For this guide, we’ll be working with a classic use case that most organizations deal with throughout.
We’ll use two main tables:
- Employees
- Departments
employee_id | name | department_id | salary | hire_date |
---|---|---|---|---|
1 | Alice | 101 | 70000 | 2019-01-15 |
2 | Bob | 102 | 55000 | 2020-03-12 |
3 | Charlie | NULL | 40000 | 2018-07-23 |
4 | Diana | 103 | 62000 | 2017-09-10 |
5 | Ethan | 101 | 72000 | 2021-06-19 |
6 | Farah | 104 | 58000 | 2022-02-14 |
7 | George | NULL | 45000 | 2016-11-05 |
department_id | department_name | location |
---|---|---|
101 | HR | New York |
102 | IT | San Francisco |
103 | Marketing | Chicago |
104 | Finance | Boston |
105 | Operations | Dallas |
Have a look at the tables and the data, were you able to catch any gaps? If not, don't worry I got you.
INNER JOIN
When we perform a simple JOIN our result only includes rows that match our ON
condition.
👉 Example: Find all employees who are assigned to a valid department.
SELECT e.name, e.salary, d.department_name, d.location
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;
Output:
name | salary | department_name | location |
---|---|---|---|
Alice | 70000 | HR | New York |
Bob | 55000 | IT | San Francisco |
Diana | 62000 | Marketing | Chicago |
Ethan | 72000 | HR | New York |
Farah | 58000 | Finance | Boston |
Note:
INNER
is the default, soJOIN
is equivalent toINNER JOIN
.
LEFT JOIN
Returns all rows from the left table (employees), even if they don’t have a matching row from the right table (department).
If no match exists, the right-side columns return NULL
.
👉 Example: List all employees, including those without departments.
SELECT e.name, e.salary, d.department_name
FROM employees e
LEFT JOIN departments d
ON e.department_id = d.department_id;
Output:
name | salary | department_name |
---|---|---|
Alice | 70000 | HR |
Bob | 55000 | IT |
Charlie | 40000 | NULL |
Diana | 62000 | Marketing |
Ethan | 72000 | HR |
Farah | 58000 | Finance |
George | 45000 | NULL |
RIGHT JOIN
A RIGHT JOIN returns all rows from the right table(department), plus matching rows from the left(employees).
If no match exists, the left-side columns return NULL
.
👉 Example: List all departments, including those without employees.
SELECT e.name, d.department_name
FROM employees e
RIGHT JOIN departments d
ON e.department_id = d.department_id;
Output:
name | department_name |
---|---|
Alice | HR |
Ethan | HR |
Bob | IT |
Diana | Marketing |
Farah | Finance |
NULL | Operations |
FULL OUTER JOIN
Returns all records when there is a match in left (employees) or right (departments) table records.
Use FULL OUTER JOIN when you want a complete picture of employees and departments, including mismatches on both sides.
👉 Example: Show a complete view of employees and departments, including unmatched rows on both sides.
SELECT e.name, d.department_name
FROM employees e
FULL OUTER JOIN departments d
ON e.department_id = d.department_id;
Output:
name | department_name |
---|---|
Alice | HR |
Ethan | HR |
Bob | IT |
Diana | Marketing |
Farah | Finance |
Charlie | NULL |
George | NULL |
NULL | Operations |
Note:
FULL OUTER JOIN
andFULL JOIN
are equivalent.
CROSS JOIN
Returns the Cartesian product of both tables (every employee with every department).
It is rarely used in business queries, but useful for generating test data or combinations.
👉 List all employee-department combinations.
SELECT e.name, d.department_name
FROM employees e
CROSS JOIN departments d;
Notice that cross joins don’t require an ON
statement. You’re not really joining on any columns!
UNION
Sometimes we just want to stack one dataset on top of the other. Well, the UNION
operator allows us to do that.
The UNION
operator automatically removes duplicate rows from the result set.
Syntax:
SELECT *
FROM table1
UNION
SELECT *
FROM table2;
SQL has strict rules for appending data:
- Tables must have the same number of columns.
- The columns must have the same data types in the same order as the first table.
Primary Key vs Foreign Key
Relational databases are built on the idea of relationships between tables. These relationships are created using primary keys and foreign keys.
Primary Key:
- A Primary Key is a column (or set of columns) that uniquely identifies each row in a table.
- No two rows can have the same primary key value.
- It cannot contain
NULL
In the Departments table above, department_id
is the primary key because each department must have a unique ID.
Foreign Key:
- A Foreign Key is a column in one table that refers to the primary key in another table.
- It creates the relationship between two tables.
In the Employees table, department_id
is a foreign key referencing the Departments table.
Without primary and foreign keys, JOINS wouldn’t know how to link tables together.
Using WITH in SQL
Sometimes we want to combine two tables, but one of those tables isn’t available directly, instead, it’s the result of another calculation.
For example, HR might want to know:
👉 What is the average salary per department, and which employees belong to those departments?
This requires two steps:
- First, calculate the average salary for each department.
- Then, join this result with the Employees table so we can see which employees belong to those departments.
Step 1: Calculate average salary per department
SELECT department_id,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;
Output:
department_id | avg_salary |
---|---|
101 | 71000 |
102 | 55000 |
103 | 62000 |
104 | 58000 |
NULL | 42500 |
Step 2: Use WITH to build a reusable result
Instead of nesting the above query directly, we wrap it with a CTE:
WITH dept_avg AS (
SELECT department_id,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT e.name, e.salary, d.department_name, dept_avg.avg_salary
FROM employees e
JOIN departments d
ON e.department_id = d.department_id
JOIN dept_avg
ON e.department_id = dept_avg.department_id;
Output:
name | salary | department_name | avg_salary |
---|---|---|---|
Alice | 70000 | HR | 71000 |
Ethan | 72000 | HR | 71000 |
Bob | 55000 | IT | 55000 |
Diana | 62000 | Marketing | 62000 |
Farah | 58000 | Finance | 58000 |
At this point in our SQL journey we know we can query a
database to retrieve desired results. However, what happens when we query a database but we really only need a subset of the results returned?
Let's explore the use of something called a subquery that gives us the same functionality as a join, but with much more readability.
Subqueries
As the name suggests, a subquery is an internal query nested inside of an external query. They can be nested inside of SELECT
, INSERT
, UPDATE
, or DELETE
statements.
Anytime a subquery is present, it gets executed before the external statement is run.
Syntax:
SELECT column1, column2, ...
FROM table1
WHERE column operator (SELECT column FROM table2 WHERE condition);
-
column1, column2, ...
: A list of column names you want to retrieve. -
table1
: The main table from which you want to select data. - column: The column you want to compare or filter against in the main query.
-
operator
: An operator such as =, >, <, IN, NOT IN, etc. -
table2
: The table from which you want to retrieve data for comparison. -
condition
: The condition that specifies which records to select from the subquery.
When to use: When you need to filter, calculate, or aggregate before applying conditions in your main query.
👉 Example: Find employees earning more than the average salary.
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
In this statement, the subquery SELECT statement would be executed first, resulting in a list of average salaries of the employees.
Comparison Operators
Subqueries are often combined with operators for filtering.
- = → Equal to
- > → Greater than
- < → Less than
- >= → Greater than or equal to
- <= → Less than or equal to
- <> or != → Not equal to
In and Not In Clauses
The more common ways to use subqueries is with the use of an IN
or NOT IN
clause
When an IN
clause is used, results retrieved from the external query must appear within the subquery results.
Similarly, when a NOT IN
clause is used, results retrieved from the external query must not appear within the subquery results.
Example (IN): Employees in HR or IT
SELECT name
FROM employees
WHERE department_id IN (101, 102);
Result: Alice, Bob, Ethan
Example (NOT IN): Employees not in HR or IT
SELECT name
FROM employees
WHERE department_id NOT IN (101, 102);
Result: Charlie, Diana, Farah, George
Exists and Not Exists
The EXISTS and NOT EXISTS clauses are used in SQL to test whether a subquery returns any rows.
EXISTS → Returns TRUE
if the subquery produces at least one row.
NOT EXISTS → Returns TRUE
if the subquery produces no rows.
Unlike IN / NOT IN, which collect a list of values from the subquery and then compare them, EXISTS simply checks for the presence of rows.
Example: Show all employees who are assigned to a department.
SELECT *
FROM employees e
WHERE EXISTS (
SELECT *
FROM departments d
WHERE d.department_id = e.department_id
);
- The subquery checks if a department exists with the same department_id as the employee.
- If it finds one, EXISTS is true and the employee is included.
- If not, that employee is skipped.
Example: Show employees who are not assigned to any department.
SELECT *
FROM employees e
WHERE NOT EXISTS (
SELECT *
FROM departments d
WHERE d.department_id = e.department_id
);
- For each employee, the subquery checks the departments table.
- If no department matches the employee’s department_id, then NOT EXISTS returns true, and that employee appears in the result.
Wrapping Up
Congratulations on taking another step in the SQL journey. 🙌
In the next part of the series, we’ll build on this foundation to tackle even more advanced SQL concepts.
Until then, here’s a friendly reminder that you’ve leveled up:
Top comments (0)