Let us talk about joins and windows functions in sql
Joins combine tables
Window Functions analyze data without collapsing it
Many beginners confuse the two. Let’s break them down properly step y step with real examples, clear explanations, and practical queries
Part 1- SQL Joins — Combining Data Across Tables
Imagine you have two tables:
customers
customer_id | first_name|second_name
1 | Alice |Johnson
2 | Bob |Njagi
orders
order_id| customer_id |Order_amount
1 | 1 | 250
2 | 1 | 300
3 | 2 | 150
If you want to see who made which order, you need a ##JOIN##.
What is a JOIN?
A JOIN allows you to combine rows from two or more tables based on a related column.
In simple terms:
A JOIN connects tables using a common key.
INNER JOIN
Returns only matching rows.
SELECT customer_id, order_id,
order_amount,
FROM customers,
INNER JOIN orders,
ON customer_id = customer_id;
Result:
Alice | 1 | 250
Alice | 2 | 300
Bob | 3 | 150
If there’s no match, the row is excluded
LEFT JOIN
Returns all rows from the left table, even if there’s no match.
SELECT customer_name,
order_id,orders_amount,
FROM customers,
LEFT JOIN orders,
ON customer_id = customer_id;
If a customer has no orders, they still appear with NULL values for order columns.
RIGHT JOIN
Opposite of LEFT JOIN returns all rows from the right table.
FULL OUTER JOIN
Returns all rows from both tables matched where possible.
Key Insight About Joins
Joins increase columns.
They bring data from multiple tables into a single result set.
They do NOT calculate ranking, running totals, or row by row analytics.
That’s where Window Functions come in.
Part 2- Window Functions Analyzing Without Collapsing Data
Window functions are different.
They:
Do NOT reduce rows (unlike GROUP BY)
Perform calculations across related rows
Allow row-level analytics
This is extremely important.
Example question
What if we want:
Total spending per customer, but still show each individual order?
If you use GROUP BY:
SELECT customer_id,
SUM(amount) AS total_spent,
FROM orders
GROUP BY customer_id;
You get
1 | 550
2 | 150
But you lose individual orders.
Enter Window Functions
SELECT order_id,customer_id,amount,
SUM(amount) OVER (PARTITION BY customer_id)
AS total_spent
FROM orders;
Result:
1 | 1 | 250 | 550
2 | 1 | 300 | 550
3 | 2 | 150 | 150
Now you have
Each order
AND total per customer
Without collapsing rows
That’s a better way to do it.
Understanding OVER()
The magic happens inside the OVER() clause.
PARTITION BY
Groups rows logically (like GROUP BY), but does not collapse them.
ORDER BY
Defines order within each partition(basically how you want them show).
Example: Ranking orders by amount.
SELECT order_id, customer_id,amount,
RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC)
AS customer_rank,
FROM orders;
This ranks each customer’s orders separately.
Common Window Functions You Should Know
- ROW_NUMBER()
Gives unique row numbers.
ROW_NUMBER() OVER (ORDER BY amount DESC)
- RANK()
Gives same rank for ties, skips numbers.
- DENSE_RANK()
Gives same rank for ties, does NOT skip numbers.
- SUM() OVER()
Running totals
SELECT order_id, amount,
SUM(amount) OVER (ORDER BY order_id)
AS running_total
FROM orders;
- LAG() and LEAD()
Compare rows to the ones before or the ones after.
SELECT order_id, amount,
LAG(amount) OVER (ORDER BY order_id)
AS previous_amount
FROM orders;
Very useful for time-series analysis
Joins vs Window Functions
The Real Difference
Here’s the more clearer distinction
Joins
- Combine tables
- Increase columns
- Used to bring related data
- Based on keys
Window Functions
- Analyze rows within a table
- Add calculated insights
- collapsing rows
- Based on partitions and order
When Should You Use Each?
Use JOIN when-
You need data from multiple tables
You’re connecting facts and dimensions
You’re building analytical datasets
Use Window Functions when-
You need ranking
You need running totals
You need comparisons between rows
-You want aggregates without GROUP BY
In everyday analytics and data engineering, you often use BOTH together.
Example
SELECT customer_name, order_id,order_amount,
SUM(order_amount) OVER (PARTITION BY customer_id) AS total_spent
FROM customers
JOIN orders
ON customer_id = customer_id;
This combines tables AND applies analytics
That’s production-level SQL.
In conclusion
Joins connect tables using keys.
Window functions perform analytics without collapsing rows.
GROUP BY reduces rows, window functions do not.
PARTITION BY is like GROUP BY, but keeps detail rows.
Modern data work heavily relies on window functions.
If you’re serious about becoming strong in SQL especially as a Data Engineer mastering both concepts is non-negotiable.
you can check more of my articles on https://www.linkedin.com/in/thomas-wangeci-065469194/



Top comments (0)