SQL Joins and Window Functions: The Tools That Changed How I Query Data

#database #sql #data #beginners

       **  INTRODUCTION**

when I first started learning SQL, joins confused the hell out of me. And window functions? Forget about it. I'd see queries with ROW_NUMBER() OVER (PARTITION BY...) and my brain would just shut down.
But here's the thing: once these concepts clicked, my entire approach to data analysis changed. Suddenly I could answer questions that used to require multiple queries or even exporting to Excel. I could rank products by sales within each category. Compare this month's revenue to last month's in a single query. Find duplicate records in seconds.
So if you're struggling with joins or have no idea what window functions are, don't worry. I've been there. Let me walk you through this stuff the way I wish someone had explained it to me.
Understanding SQL Joins
What Joins Actually Do
Think of joins as a way to combine information from different tables based on something they have in common.
Let's say you've got two tables: one with customer information (names, emails, addresses) and another with orders (what was purchased, when, how much). These tables are separate, but they're connected—each order has a customer_id that points back to the customers table.
A join lets you answer questions like "Show me all orders along with the customer names" without having to manually look up each customer. SQL does the matching for you based on that customer_id field.
The Different Types of Joins (And When to Use Each One)
Here's where it gets interesting. There are several types of joins, and picking the wrong one can give you totally wrong results. Let me break them down with real examples.
INNER JOIN -
The "Only Show Matches" Join
This is the most common join you'll use. An INNER JOIN only returns rows where there's a match in BOTH tables.

SELECT 
    c.customer_name,
    o.order_date,
    o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

What this does: Shows all orders with their customer names. But here's the catch—if a customer has never placed an order, they won't show up in these results. And if somehow an order exists without a matching customer (shouldn't happen, but databases get messy), that order won't show up either.
I use INNER JOIN when I only care about records that exist in both tables. Like "show me actual purchases with customer details"—I don't need to see customers who've never bought anything.
LEFT JOIN
The "Keep Everything From the First Table" Join
LEFT JOIN (also called LEFT OUTER JOIN) keeps ALL rows from the first table, even if there's no match in the second table.

SELECT 
    c.customer_name,
    c.email,
    o.order_date,
    o.total_amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;

LEFT JOIN orders o ON c.customer_id = o.customer_id;
This query shows every customer, whether they've placed orders or not. Customers with no orders will have NULL values in the order_date and total_amount columns.
When do I use this? When I want to find gaps or missing data. "Which customers have never purchased anything?" Run this query and filter for NULL order dates. Super useful for identifying inactive customers or finding data quality issues.
RIGHT JOIN
RIGHT JOIN is just like LEFT JOIN, but backwards. It keeps everything from the second table.

SELECT 
    c.customer_name,
    o.order_date,
    o.total_amount
FROM customers c
RIGHT JOIN orders o ON c.customer_id = o.customer_id;

RIGHT JOIN orders o ON c.customer_id = o.customer_id;
Honestly? I barely use RIGHT JOIN. Anything you can do with a RIGHT JOIN, you can do with a LEFT JOIN by just switching the table order. Most developers stick with LEFT JOIN for consistency.
FULL OUTER JOIN - The "Keep Everything" Join
FULL OUTER JOIN returns all rows from both tables. If there's a match, great. If not, you get NULLs.

SELECT 
    c.customer_name,
    o.order_id,
    o.total_amount
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;

FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
This shows all customers (even those without orders) AND all orders (even orphaned ones without matching customers). It's less common, but I've used it for data reconciliation—finding records that should match but don't.
CROSS JOIN
The Match Everything to Everything" Join
CROSS JOIN creates every possible combination between two tables. No join condition needed.

SELECT 
    p.product_name,
    s.store_name
FROM products p
CROSS JOIN stores s;

If you have 100 products and 10 stores, this returns 1,000 rows—every product paired with every store.
I've used this exactly twice in my career. Once to generate a template for planning which products go in which stores, and once by accident when I forgot to add a WHERE clause and crashed the database. Be careful with CROSS JOIN—it can generate massive result sets fast.
SELF JOIN
Joining a Table to Itself
Sometimes you need to compare rows within the same table. That's a self join.

SELECT 
    e1.employee_name AS employee,
    e2.employee_name AS manager
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.employee_id;

This shows each employee alongside their manager's name. Both come from the employees table, but you're using it twice with different aliases (e1 and e2).
I use self joins for hierarchical data—org charts, category trees, anything where records reference other records in the same table.

Window Functions: Your Secret Weapon
If joins changed how I combine data, window functions changed how I analyse it. They let you perform calculations across sets of rows that are related to the current row without collapsing everything into groups like GROUP BY does.
What Makes Window Functions Different
Here's the key difference: GROUP BY collapses rows. Window functions don't.
With GROUP BY:

SELECT customer_id, SUM(total_amount)
FROM orders
GROUP BY customer_id;

You get one row per customer showing their total. You lose the individual order details.
With a window function:

SELECT 
    customer_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER (PARTITION BY customer_id) as customer_total
FROM orders;

You keep every order row AND you can see each customer's total on each row. You're adding a calculated column without losing detail.
This is huge for reporting. You can show individual transactions while also displaying running totals, rankings, or comparisons—all in one query.
The Core Window Functions I Use All The Time
ROW_NUMBER() - Assigning Unique Row Numbers
ROW_NUMBER() assigns a unique number to each row within a partition.

SELECT 
    customer_name,
    order_date,
    total_amount,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as order_sequence
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

This numbers each customer's orders chronologically. Their first order gets 1, second gets 2, and so on.
I use this constantly for finding "first" or "last" records. Want each customer's most recent order? Add WHERE order_sequence = 1 after wrapping this in a subquery.
RANK - Ranking with Ties
These are like ROW_NUMBER() but handle ties differently.

SELECT 
    product_name,
    sales_amount,
    RANK() OVER (ORDER BY sales_amount DESC) as sales_rank,

FROM product_sales;

Say three products tie for second place with $5,000 in sales:

RANK() gives them all rank 2, then jumps to rank 5 for the next product

I use RANK() when I want the ranking to reflect the total number of items (like competition rankings). I use DENSE_RANK() when I want consecutive numbers.
SUM(), AVG(), MIN(), MAX() as Window Functions
You can use aggregate functions as window functions by adding OVER().

SELECT 
    order_date,
    total_amount,
    SUM(total_amount) OVER (ORDER BY order_date) as running_total,
    AVG(total_amount) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as seven_day_avg
FROM orders;

This shows each order with a running total of all sales up to that point, plus a 7-day moving average.
The ROWS BETWEEN 6 PRECEDING AND CURRENT ROW part is called a frame clause. It defines which rows to include in the calculation. Super powerful for trend analysis.
LAG() and LEAD() - Accessing Adjacent Rows
LAG() looks at previous rows. LEAD() looks at following rows.

SELECT 
    order_date,
    total_amount,
    LAG(total_amount) OVER (ORDER BY order_date) as previous_day_sales,
    total_amount - LAG(total_amount) OVER (ORDER BY order_date) as day_over_day_change
FROM daily_sales;

This compares each day's sales to the previous day. You can calculate day-over-day changes, week-over-week growth, or any sequential comparison.
I use LAG() all the time for "compare to previous period" questions that used to require self-joins.

PARTITION BY

Divides the data into groups. Like GROUP BY, but doesn't collapse rows. Optional—if you skip it, the window is the entire result set.
ORDER BY
Determines the order for ranking, row numbering, or frame calculations. Required for some functions (like ROW_NUMBER), optional for others.
Frame clause - Defines which rows within the partition to include. Only needed for running totals, moving averages, and similar calculations.

SUMMARY
After Using SQL for a while, I've realized that joins and window functions are really the two skills that separate basic querying from actual data analysis. Joins are all about connecting tables—INNER JOIN when you only want records that match in both tables, LEFT JOIN when you want to keep everything from your main table and just add matching info from another (or spot the gaps with NULLs), and occasionally FULL OUTER JOIN when you're doing data reconciliation. The key is always getting your join condition right, because a wrong join doesn't throw an error, it just gives you garbage results. Window functions changed the game for me because they let you do calculations across rows without losing the detail that GROUP BY would collapse. ROW_NUMBER() helps you find first or last records, RANK() handles competitive rankings, and using SUM() or AVG() with OVER() gives you running totals and moving averages. LAG() and LEAD() are lifesavers for comparing sequential data without messy self-joins. The real power comes from PARTITION BY, which divides your data into groups, and ORDER BY, which controls how those calculations flow. What makes these tools essential is that they turn what used to be multiple queries—or worse, exporting to Excel—into single, clean SQL statements. Joins connect your related data, window functions analyse patterns while keeping row-level detail, and together they're the foundation of pretty much every complex query you'll write.

DEV Community

SQL Joins and Window Functions: The Tools That Changed How I Query Data

Top comments (0)