Kithokoi James

Posted on Mar 1 • Edited on Mar 12

SQL Joins and Window Functions: A Beginner Friendly Guide.

#database #sql #data

If you’re learning data analytics, SQL will quickly become your daily tool.

But then, two topics decide whether you’ll feel confident or constantly confused: joins and window functions. Joins are how you combine related data across tables, and window functions are how you do calculations across a set of table rows, running totals, and comparisons without destroying the detail in your result set.

When you understand these two well, SQL stops feeling like memorizing commands and starts feeling like reasoning.

A simple way to remember the difference is this: a join helps you bring in columns from another table, while a window function helps you compute insights across rows while keeping every row visible.

In this article we will look at both of them and see how we can work with them.

What are Joins and How do Tables Connect to Tell a Story

A database is usually split into multiple tables because it's cleaner and prevents repeatition of data. For example, customers are stored in one table, orders in another, and products in another.

That structure is good design, but it creates a practical question, how do you view a complete story, like “show me each order with the customer name and the items inside it”? And that's where joins come into play.

A join works by matching rows across tables using a condition written after ON. Most of the time, that condition matches a foreign key to a primary key. Briefly, one table stores a reference to another table, and SQL uses that reference to connect them.

Imagine four common tables: customers(customer_id, name), orders(order_id, customer_id, order_date, total_amount), order_items(order_id, product_id, qty, unit_price), and products(product_id, product_name). An order belongs to a customer, and order items belong to an order. Those “belongs to” relationships are exactly what joins are built for.

Types of Joins

-1. INNER JOINS
An inner join returns only the rows where a match exists on both sides. If you join orders to customers, SQL will keep only the orders that successfully match a customer record. This is the join you use when you only want valid, connected data.

SELECT o.order_id, o.order_date, c.name AS customer_name, o.total_amount
FROM orders o
INNER JOIN customers c
  ON o.customer_id = c.customer_id;

This query produces one row per order, assuming each order has exactly one customer. If an order points to a customer ID that doesn’t exist, that order won’t show up, because INNER JOIN keeps only confirmed matches.

-2. LEFT JOIN
A left join keeps every row from the left table, even when there is no match on the right.

When no match exists, the right-side columns become NULL. This join is extremely common because it lets you keep your “main list” intact.

For example, if you want to list all customers and show any orders they might have, you’d use a LEFT JOIN. Customers with no orders still appear, which is often exactly what you need when analyzing conversion or engagement.

SELECT c.customer_id, c.name, o.order_id, o.order_date
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id;

With this querry, you’ll see customers who have never ordered, and their order columns will be NULL

-3. RIGHT JOIN
A right join is the same idea as a LEFT JOIN but reversed. It keeps everything from the right table. It’s not wrong, but many analysts avoid it because it’s hard to read. If you swap the table order, you can express it as a LEFT JOIN and keep your queries consistent.

The Join Mistakes People Make That Destroy Accuracy

Joins are not difficult because the syntax is hard. Joins are difficult because it’s easy to get correct looking results that are mathematically wrong.

The first major trap is row multiplication. When you join a “one” table to a “many” table, you will repeat the “one” row for each matching “many” row. For example, one order can have multiple order items. If you join orders to order_items, the order row repeats once per item. That repetition is correct, but it becomes confusing when you sum values from the repeated side.

Here’s a common mistake: calculating revenue from orders.total_amount after joining to items.

SELECT SUM(o.total_amount) AS revenue
FROM orders o
JOIN order_items oi
  ON o.order_id = oi.order_id;

If an order has five items, o.total_amount appears five times, and your revenue becomes inflated. The fix here is simple once you see it, don’t sum a value from a table after it has been multiplied. Either sum directly from orders without joining, or calculate revenue from items using qty * unit_price.

SELECT SUM(oi.qty * oi.unit_price) AS revenue
FROM order_items oi;

The second trap is turning a LEFT JOIN into an INNER JOIN by accident.
This happens when you filter right-table columns in the WHERE clause. A LEFT JOIN produces NULL for missing matches, but a WHERE filter removes those NULL rows.

This query looks reasonable, but it’s misleading:

SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2026-01-01';

Customers with no orders have o.order_date = NULL, and they get filtered out. To keep customers even when there’s no order, move the filter into the join condition.

SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id
 AND o.order_date >= DATE '2026-01-01';

The third trap is joining on unstable fields like names. Names can repeat, change, and contain typos. In analytics, you should join using IDs or real keys whenever possible.

Window Functions

Window functions solve a different problem. Many times, you want to calculate something “across a set of rows,” but you don’t want to collapse the result the way GROUP BY does.

If you run:

SELECT customer_id, SUM(total_amount)
FROM orders
GROUP BY customer_id;

You’ll get one row per customer. That’s useful, but you lose the order-level detail. Window functions let you compute totals, ranks, and comparisons while keeping each order row visible.

The core idea is simple, a window function performs a calculation over a “window” of related rows defined by OVER(...). You can split rows into groups using PARTITION BY and define ordering inside each group using ORDER BY.

How Joins and Window Functions Work Together

In real analytics, you often join first to bring in context, then apply window functions to compute insights.

You might join orders to customers to get names, then rank customers by spending, or compute running totals per customer over time.

The key is to stay clear on what each tool is doing. Joins change the shape of your data by combining rows from tables. Window functions add calculations across related rows without collapsing them. If you keep that difference in your head, your SQL becomes far more predictable.

Summary: The Key Ideas to Remember

Joins connect tables. INNER JOIN keeps only matching rows, while LEFT JOIN keeps all rows from the left and fills missing matches with NULLs. Joins can multiply rows, so you must be careful when summing after a join, otherwise your totals may be inflated.

Also, filtering right-table columns in a WHERE clause after a LEFT JOIN often defeats the purpose of the LEFT JOIN, so filters should be placed in the ON clause when you need to keep unmatched left rows.

Window functions calculate across related rows while keeping each row in the result. PARTITION BY defines the group, ORDER BY defines the sequence, and window frames define how many rows are included in calculations.

Top comments (1)

dyagzy • May 21

This is a great explanation to the subject, adding the final result of each of the example queries above would have made it more easier to follow along as we read the article. Overall it is a really great article and explanation