DataLane

Posted on Sep 8

The Ultimate Guide to SQL Joins and Subqueries Explained

#sql #tutorial #programming #database

If you’re following along with my SQL Series, welcome to Part 2 🎉 🎉

In the first part, we focused on the basics of retrieving and filtering data from a single table. Now, we’re taking the next big step: learning how to work with data spread across multiple tables.

We’ll break down the different types of joins, look at practical use cases, and then explore how subqueries can make your queries more flexible and easier to maintain.

What are SQL JOINS?

In relational databases, data is often stored across multiple normalized tables. For example, employees and departments might live in two separate tables. But in real-world business queries, you almost always need combined information.

A JOIN in SQL allows you to query data from two or more tables based on a related column between them (often a primary key and foreign key relationship).

Syntax:

SELECT column_list
FROM table1
JOIN table2
ON table1.common_column = table2.common_column;

When to Use JOINS

JOINS should be used when:

Your data is normalized into multiple tables (e.g., Customers, Orders, Products).
You want to generate comprehensive reports combining multiple entities (e.g., customer details + their purchases).
You need data validation, ensuring only matching records across tables are retrieved.
You’re building reports or dashboards where relationships between entities matter.

Different Types of SQL JOINs

Here are the main types of joins in SQL:

INNER JOIN → Returns records that have matching values in both tables
LEFT (OUTER) JOIN → Returns all records from the left table, and the matched records from the right table
RIGHT (OUTER) JOIN → Returns all records from the right table, and the matched records from the left table
FULL (OUTER) JOIN → Returns all records when there is a match in either left or right table
CROSS JOIN → represents the Cartesian product (all combinations).

Example Dataset:

For this guide, we’ll be working with a classic use case that most organizations deal with throughout.

We’ll use two main tables:

Employees
Departments

employee_id	name	department_id	salary	hire_date
1	Alice	101	70000	2019-01-15
2	Bob	102	55000	2020-03-12
3	Charlie	NULL	40000	2018-07-23
4	Diana	103	62000	2017-09-10
5	Ethan	101	72000	2021-06-19
6	Farah	104	58000	2022-02-14
7	George	NULL	45000	2016-11-05

department_id	department_name	location
101	HR	New York
102	IT	San Francisco
103	Marketing	Chicago
104	Finance	Boston
105	Operations	Dallas

Have a look at the tables and the data, were you able to catch any gaps? If not, don't worry I got you.

INNER JOIN

When we perform a simple JOIN our result only includes rows that match our ON condition.

👉 Example: Find all employees who are assigned to a valid department.

SELECT e.name, e.salary, d.department_name, d.location
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;

Output:

name	salary	department_name	location
Alice	70000	HR	New York
Bob	55000	IT	San Francisco
Diana	62000	Marketing	Chicago
Ethan	72000	HR	New York
Farah	58000	Finance	Boston

Note: INNER is the default, so JOIN is equivalent to INNER JOIN.

LEFT JOIN

Returns all rows from the left table (employees), even if they don’t have a matching row from the right table (department).

If no match exists, the right-side columns return NULL.

👉 Example: List all employees, including those without departments.

SELECT e.name, e.salary, d.department_name
FROM employees e
LEFT JOIN departments d
ON e.department_id = d.department_id;

Output:

name	salary	department_name
Alice	70000	HR
Bob	55000	IT
Charlie	40000	NULL
Diana	62000	Marketing
Ethan	72000	HR
Farah	58000	Finance
George	45000	NULL

RIGHT JOIN

A RIGHT JOIN returns all rows from the right table(department), plus matching rows from the left(employees).

If no match exists, the left-side columns return NULL.

👉 Example: List all departments, including those without employees.

SELECT e.name, d.department_name
FROM employees e
RIGHT JOIN departments d
ON e.department_id = d.department_id;

Output:

name	department_name
Alice	HR
Ethan	HR
Bob	IT
Diana	Marketing
Farah	Finance
NULL	Operations

FULL OUTER JOIN

Returns all records when there is a match in left (employees) or right (departments) table records.

Use FULL OUTER JOIN when you want a complete picture of employees and departments, including mismatches on both sides.

👉 Example: Show a complete view of employees and departments, including unmatched rows on both sides.

SELECT e.name, d.department_name
FROM employees e
FULL OUTER JOIN departments d
ON e.department_id = d.department_id;

Output:

name	department_name
Alice	HR
Ethan	HR
Bob	IT
Diana	Marketing
Farah	Finance
Charlie	NULL
George	NULL
NULL	Operations

Note: FULL OUTER JOIN and FULL JOIN are equivalent.

CROSS JOIN

Returns the Cartesian product of both tables (every employee with every department).

It is rarely used in business queries, but useful for generating test data or combinations.

👉 List all employee-department combinations.

SELECT e.name, d.department_name
FROM employees e
CROSS JOIN departments d;

Notice that cross joins don’t require an ON statement. You’re not really joining on any columns!

UNION

Sometimes we just want to stack one dataset on top of the other. Well, the UNION operator allows us to do that.

The UNION operator automatically removes duplicate rows from the result set.

Syntax:

SELECT *
FROM table1
UNION
SELECT *
FROM table2;

SQL has strict rules for appending data:

Tables must have the same number of columns.
The columns must have the same data types in the same order as the first table.

Primary Key vs Foreign Key

Relational databases are built on the idea of relationships between tables. These relationships are created using primary keys and foreign keys.

Primary Key:

A Primary Key is a column (or set of columns) that uniquely identifies each row in a table.
No two rows can have the same primary key value.
It cannot contain NULL

In the Departments table above, department_id is the primary key because each department must have a unique ID.

Foreign Key:

A Foreign Key is a column in one table that refers to the primary key in another table.
It creates the relationship between two tables.

In the Employees table, department_id is a foreign key referencing the Departments table.

Without primary and foreign keys, JOINS wouldn’t know how to link tables together.

Using WITH in SQL

Sometimes we want to combine two tables, but one of those tables isn’t available directly, instead, it’s the result of another calculation.

For example, HR might want to know:

👉 What is the average salary per department, and which employees belong to those departments?

This requires two steps:

First, calculate the average salary for each department.
Then, join this result with the Employees table so we can see which employees belong to those departments.

Step 1: Calculate average salary per department

SELECT department_id,
       AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;

Output:

department_id	avg_salary
101	71000
102	55000
103	62000
104	58000
NULL	42500

Step 2: Use WITH to build a reusable result

Instead of nesting the above query directly, we wrap it with a CTE:

WITH dept_avg AS (
    SELECT department_id,
           AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT e.name, e.salary, d.department_name, dept_avg.avg_salary
FROM employees e
JOIN departments d
  ON e.department_id = d.department_id
JOIN dept_avg
  ON e.department_id = dept_avg.department_id;

Output:

name	salary	department_name	avg_salary
Alice	70000	HR	71000
Ethan	72000	HR	71000
Bob	55000	IT	55000
Diana	62000	Marketing	62000
Farah	58000	Finance	58000

At this point in our SQL journey we know we can query a
database to retrieve desired results. However, what happens when we query a database but we really only need a subset of the results returned?

Let's explore the use of something called a subquery that gives us the same functionality as a join, but with much more readability.

Subqueries

As the name suggests, a subquery is an internal query nested inside of an external query. They can be nested inside of SELECT, INSERT, UPDATE, or DELETE statements.

Anytime a subquery is present, it gets executed before the external statement is run.

Syntax:

SELECT column1, column2, ...
FROM table1
WHERE column operator (SELECT column FROM table2 WHERE condition);

column1, column2, ...: A list of column names you want to retrieve.
table1: The main table from which you want to select data.
column: The column you want to compare or filter against in the main query.
operator: An operator such as =, >, <, IN, NOT IN, etc.
table2: The table from which you want to retrieve data for comparison.
condition: The condition that specifies which records to select from the subquery.

When to use: When you need to filter, calculate, or aggregate before applying conditions in your main query.

👉 Example: Find employees earning more than the average salary.

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

In this statement, the subquery SELECT statement would be executed first, resulting in a list of average salaries of the employees.

Comparison Operators

Subqueries are often combined with operators for filtering.

= → Equal to
> → Greater than
< → Less than
>= → Greater than or equal to
<= → Less than or equal to
<> or != → Not equal to

In and Not In Clauses

The more common ways to use subqueries is with the use of an IN or NOT IN clause

When an IN clause is used, results retrieved from the external query must appear within the subquery results.

Similarly, when a NOT IN clause is used, results retrieved from the external query must not appear within the subquery results.

Example (IN): Employees in HR or IT

SELECT name
FROM employees
WHERE department_id IN (101, 102);

Result: Alice, Bob, Ethan

Example (NOT IN): Employees not in HR or IT

SELECT name
FROM employees
WHERE department_id NOT IN (101, 102);

Result: Charlie, Diana, Farah, George

Exists and Not Exists

The EXISTS and NOT EXISTS clauses are used in SQL to test whether a subquery returns any rows.

EXISTS → Returns TRUE if the subquery produces at least one row.

NOT EXISTS → Returns TRUE if the subquery produces no rows.

Unlike IN / NOT IN, which collect a list of values from the subquery and then compare them, EXISTS simply checks for the presence of rows.

Example: Show all employees who are assigned to a department.

SELECT *
FROM employees e
WHERE EXISTS (
    SELECT *
    FROM departments d
    WHERE d.department_id = e.department_id
);

The subquery checks if a department exists with the same department_id as the employee.
If it finds one, EXISTS is true and the employee is included.
If not, that employee is skipped.

Example: Show employees who are not assigned to any department.

SELECT *
FROM employees e
WHERE NOT EXISTS (
    SELECT *
    FROM departments d
    WHERE d.department_id = e.department_id
);

For each employee, the subquery checks the departments table.
If no department matches the employee’s department_id, then NOT EXISTS returns true, and that employee appears in the result.

Wrapping Up

Congratulations on taking another step in the SQL journey. 🙌

In the next part of the series, we’ll build on this foundation to tackle even more advanced SQL concepts.

Until then, here’s a friendly reminder that you’ve leveled up:

DEV Community

The Ultimate Guide to SQL Joins and Subqueries Explained

What are SQL JOINS?

When to Use JOINS

Different Types of SQL JOINs

INNER JOIN

LEFT JOIN

RIGHT JOIN

FULL OUTER JOIN

CROSS JOIN

UNION

Primary Key vs Foreign Key

Using WITH in SQL

Subqueries

Comparison Operators

In and Not In Clauses

Exists and Not Exists

Wrapping Up

Top comments (0)