Suppose you manage a small retail store. The store has multiple departments i.e Electronics, Clothing, and Groceries. Each department has employees assigned to it. And those employees generate sales.
Now you want to evaluate performance.
You might ask:
- Which employee belongs to which department?
- How much revenue has each employee generated?
- Are there employees who have not recorded any sales?
At this point, the data you need is not stored in a single table. It is distributed across several related tables. That is how relational databases are designed. Data is separated logically to reduce redundancy and maintain consistency.
To demonstrate how this data can be combined meaningfully, I will use PostgreSQL as the database management system. I am connecting to a locally installed PostgreSQL server using DBeaver, which serves as the SQL client for writing and executing queries.
In this article, I will walk through the process step by step:
Table of Contents
1. Creating a Database and Schema
Create the Database
A database is the top-level container that stores schemas, tables, functions, and other database objects. This statement creates a new PostgreSQL database named demo.
CREATE DATABASE demo;
After creating the database, I connect to it in DBeaver so that all subsequent objects are created inside demo.
Create a Schema
A schema organizes database objects within a database. It acts as a namespace, preventing naming conflicts and grouping related tables logically. This statement creates a schema named joins_window.
CREATE SCHEMA joins_window;
In this case, the schema joins_window will contain all tables used in this demonstration.
Set the Search Path
By default, PostgreSQL searches for tables in the public schema. Setting the search path ensures that any tables created or queried will reference the joins_window schema automatically. Syntax:
SET search_path TO joins_window;
This keeps the work organized and avoids having to prefix every table with the schema name.
2. Creating the Tables
To demonstrate joins effectively, I created three related tables:
- departments
- employees
- sales
Each table represents a distinct entity in the store system, and the relationships between them are enforced using primary and foreign keys.
Table 1: departments
Syntax:
CREATE TABLE departments (
department_id SERIAL PRIMARY KEY,
department_name VARCHAR(50)
);
INSERT INTO departments (department_name) VALUES
('Electronics'),
('Clothing'),
('Groceries');
The department_id column is defined as a primary key.
This guarantees that each department has a unique identifier.
The SERIAL keyword automatically generates sequential integer values for each new row.
Table 2: employees
Syntax:
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
employee_name VARCHAR(50),
department_id INT,
salary NUMERIC,
FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
INSERT INTO employees (employee_name, department_id, salary) VALUES
('Arya', 1, 50000),
('Baelish', 1, 60000),
('Cersei', 2, 40000),
('Daenerys', 2, 45000),
('Sansa', 3, 35000);
Here, employee_id is the primary key.
The department_id column is a foreign key referencing departments(department_id). This establishes a relationship between employees and departments.
The foreign key constraint enforces referential integrity. In practical terms, it prevents inserting an employee with a department that does not exist in the departments table.
This design models a one-to-many relationship:
- One department can have many employees.
- Each employee belongs to exactly one department.
Table 3: sales
Syntax:
CREATE TABLE sales (
sale_id SERIAL PRIMARY KEY,
employee_id INT,
sale_date DATE,
amount NUMERIC,
FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);
INSERT INTO sales (employee_id, sale_date, amount) VALUES
(1, '2025-01-01', 2000),
(1, '2025-01-02', 3000),
(2, '2025-01-01', 4000),
(3, '2025-01-01', 1500),
(4, '2025-01-03', 2500),
(5, '2025-01-02', 1000);
Again, sale_id is the primary key.
The employee_id column is a foreign key referencing employees(employee_id). This ensures that every sale is linked to a valid employee.
This creates another one-to-many relationship:
- One employee can record multiple sales.
- Each sale belongs to one employee.
3. Demonstrating Joins Using the Store Tables
With the tables now defined and populated, we can begin examining how joins operate. The data we need is distributed across separate tables by design. Departments are stored independently from employees, and sales are stored independently from employees. This separation prevents redundancy and enforces data integrity. However, it also means that answering even simple analytical questions requires combining tables.
That combination is achieved using joins. Here is a list of the joins we will be looking into:
a) INNER JOIN - returns only matching rows (intersection of both tables)
b) LEFT JOIN - preserves all rows from the left table.
c) RIGHT JOIN - preserves all rows from the right table.
d) FULL JOIN - preserves all rows from both tables.
e) CROSS JOINS - generates all possible row combinations (Cartesian product).
f) NATURAL JOIN - automatically matches columns with identical names.
g) SELF JOIN - joins a table to itself to model hierarchical relationships.
In the sections that follow, some additional rows may be inserted to clearly demonstrate how different join types behave when matching data is present and when it is null.
a) INNER JOIN
An INNER JOIN returns only the rows where the join condition evaluates to true in both tables. In other words, it keeps only matching records.
General Syntax:
SELECT table1.column1, table2.column2
FROM table1
INNER JOIN table2
ON table1.common_field = table2.common_field;
The ON clause defines the relationship between the two tables. Without it, PostgreSQL would not know how the rows should be matched.
Example:
SELECT
e.employee_name,
d.department_name,
e.salary
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;
PostgreSQL compares employees.department_id with departments.department_id. For every employee row, it searches for a department with the same department_id. When a match is found, the rows are combined into a single result row. If no match is found, that employee would not appear in the result. It returns only the intersection between the two tables.
b) LEFT JOIN
A LEFT JOIN guarantees that all rows from the left table appear in the result set. The key difference from an INNER JOIN is this:
If a matching row does not exist in the right table, the left table row is still preserved. The columns from the right table are filled with NULL.
General Syntax:
SELECT table1.column1, table2.column2
FROM table1
LEFT JOIN table2
ON table1.common_field = table2.common_field;
At the moment, every employee in our dataset has at least one sale. That would make the LEFT JOIN behave similarly to an INNER JOIN. To demonstrate the difference clearly, we insert an employee who has no associated sales:
INSERT INTO employees (employee_name, department_id, salary)
VALUES ('Jon', 1, 48000);
Now Jon exists in the employees table but has no corresponding row in the sales table. Let us see how the output will be.
Example:
Question: Which employees exist in the company, and what sales have they recorded, if any?
SELECT
e.employee_name,
s.sale_date,
s.amount
FROM employees e
LEFT JOIN sales s
ON e.employee_id = s.employee_id
ORDER BY e.employee_name;
PostgreSQL starts with the employees table (the left table). For each employee, it attempts to find matching rows in the sales table using employee_id. If matches are found, the rows are combined. If no match is found, the employee row is still returned.
c) RIGHT JOIN
A RIGHT JOIN is structurally similar to a LEFT JOIN, but while a LEFT JOIN guarantees all rows from the left table, a RIGHT JOIN guarantees all rows from the right table. And if a matching row does not exist in the left table, the right table row is still returned. The columns from the left table are filled with NULL.
General Syntax:
SELECT table1.column1, table2.column2
FROM table1
RIGHT JOIN table2
ON table1.common_field = table2.common_field;
The table written after RIGHT JOIN is the one that will always be preserved (all rows from the right table).
Example:
Question: What sales were recorded, and who is associated with each one?
SELECT
e.employee_name,
s.sale_date,
s.amount
FROM employees e
RIGHT JOIN sales s
ON e.employee_id = s.employee_id;
PostgreSQL starts with the sales table (the right table). For each sale, it attempts to find a matching employee using employee_id. If a match is found, the rows are combined. If no match is found, the sale row is still returned.
When we performed a LEFT JOIN between employees and sales, Jon appeared with NULL values in the sales columns because he had no recorded sales. In this RIGHT JOIN, Jon does not appear at all because the right table is sales. Jon has no row in the sales table, therefore, there is nothing for the RIGHT JOIN to preserve on his behalf.
d) FULL JOIN
A FULL JOIN combines the qualities of both LEFT and RIGHT joins. In that, no row from either table is excluded. If a row exists in one table but not in the other, it still appears in the result and the columns from the missing side are filled with NULL.
General Syntax:
SELECT table1.column1, table2.column2
FROM table1
FULL OUTER JOIN table2
ON table1.common_field = table2.common_field;
Example:
Question: Provide a complete view of employee records and sales transactions.
SELECT
e.employee_name,
s.sale_date,
s.amount
FROM employees e
FULL OUTER JOIN sales s
ON e.employee_id = s.employee_id
ORDER BY e.employee_name;
PostgreSQL attempts to match employees to sales using employee_id. Where a match exists, the rows are combined.
e) NATURAL JOINS
A NATURAL JOIN is a type of join that automatically matches columns with the same name in both tables. It is important to note:
- NATURAL JOIN only works when columns have the same name.
- Warning: NATURAL JOIN can be dangerous if tables share multiple same-named columns. NATURAL JOINS will join on all shared column names, not just the one you intended.
In our schema, both employees and departments contain a column named department_id.
General Syntax:
SELECT column_list
FROM table1
NATURAL JOIN table2;
Example:
SELECT
employee_name,
department_name
FROM employees
NATURAL JOIN departments;
f) CROSS JOINS
A CROSS JOIN produces a Cartesian product of two tables, that means every row from the first table is combined with every row from the second table.
So, if table A contains m rows and table B contains n rows, the result of a CROSS JOIN will contain: m × n rows.
General Syntax:
SELECT column_list
FROM table1
CROSS JOIN table2;
Example:
Question: What are all possible employee–department assignment combinations?
SELECT
e.employee_name,
d.department_name
FROM employees e
CROSS JOIN departments d;
g) SELF JOIN
A SELF JOIN occurs when a table is joined to itself. At first glance, that may seem unnecessary, I honestly thought the same thing when I came across this join. Why would a table need to join to itself? The answer actually lies in hierarchical relationships. What does that mean? Many real-world structures are recursive in nature:
- An employee reports to another employee (Maybe a manager).
- A product category contains subcategories.
- A comment replies to another comment.
In each case, the relationship exists within the same table.
Hence why we use SELF JOIN, it allows us to treat the same table as two logical instances and define a relationship between them.
Now this is where we slightly modify the employees table to make it meaningful. We will introduce a reporting structure in the employees table by adding a manager_id column:
ALTER TABLE employees
ADD COLUMN manager_id INT;
The manager_id will reference another employee_id in the same table.
Next, we assign managers:
UPDATE employees SET manager_id = 2 WHERE employee_id = 1; -- Arya reports to Baelish
UPDATE employees SET manager_id = 2 WHERE employee_id = 3; --Cersei reports to Baelish
UPDATE employees SET manager_id = 4 WHERE employee_id = 5; -- Sansa reports to Daenerys
General Syntax:
SELECT columns
FROM table_name t1
JOIN table_name t2
ON t1.related_column = t2.primary_key;
Example:
Question: For each employee, who is their assigned manager?
SELECT
e.employee_name AS employee,
m.employee_name AS manager
FROM employees e
LEFT JOIN employees m
ON e.manager_id = m.employee_id;
The table employees is referenced twice where e represents the employee and m represents the manager. PostgreSQL matches e.manager_id to m.employee_id, where a match exists, the employee is paired with their manager. If an employee has no assigned manager, the manager column appears as NULL.
Inclusive vs Exclusive Joins
Inclusive joins return matching rows along with unmatched rows from one or both tables.
Exclusive joins return only unmatched rows by filtering NULL values after an outer join.
Exclusive joins = Outer Join + NULL filter. These are not separate SQL keywords, but rather combinations of outer joins with filtering conditions using WHERE ... IS NULL.
This used in analytics to:
- Find customers without orders.
- Find products never sold.
- Find users who didn’t log in.
- Find orphaned records. You can learn more about Inclusive vs Exclusive Joins here.
I hope this article helps you to better understand how joins are used in SQL.













Top comments (0)