DEV Community

mary thandy
mary thandy

Posted on

A Guide to SQL Joins and Window Functions

Data, in its raw state, is seldom found in a single neat table. The reality of working in analytics or data engineering is needing to synthesize information from disjointed sources and compute metrics that rely on understanding relationships and patterns across a dataset. This is where SQL excels. If you want to move beyond simple filtering and aggregation, you must master two foundational tools: Joins and Window Functions.

Understanding SQL Joins

Joins are the mechanism we use to reunite this data into meaningful results. A join connects rows from two or more tables based on a related column between them.
Joins are how you put the puzzle pieces back together to get a full picture.

Below are the most common types of joins:

A. INNER JOIN (The Matchmaker)
This is the most common join. It only returns rows where there is a matching value in both tables. If a row doesn't have a match, it is excluded.
Use Case:Showing a list of customers who have actually placed orders.
B. LEFT JOIN (The "Don't Leave Me" Join)
This returns all rows from the left table, and the matched rows from the right table. If there is no match on the right, you get NULL values.
Use Case: Finding all customers, including those who haven't bought anything yet (great for identifying "cold" leads).
C. RIGHT JOIN
The exact opposite of a Left Join. It returns all rows from the right table.
Pro Tip: Most developers just use LEFT JOIN and switch the table order to keep the code easier to read.
D. FULL OUTER JOIN (The "Everyone Invited" Join)
Returns all rows when there is a match in either the left or right table. If there’s no match, the missing side gets a NULL.

Use Case: Merging two lists (like a list of local leads and a list of international leads) to see everyone at once.

Basic Syntax

To join tables, you need a "Key" (usually an ID) that exists in both tables.

SELECT 
    tableA.column_name, 
    tableB.column_name
FROM TableA
INNER JOIN TableB 
    ON TableA.common_id = TableB.common_id;
Enter fullscreen mode Exit fullscreen mode

Real-World Example

Imagine you have a Students table and a Clubs table.

Students Table
| id | name |
|:---|:---|
| 1 | Alice |
| 2 | Bob |
| 3 | Charlie |

Clubs Table
| student_id | club_name |
|:---|:---|
| 1 | Chess |
| 2 | Music |
| 5 | Drama |

Which Join should you use?

"Who is in a club?" → INNER JOIN (Alice and Bob).

"List all students and their clubs (even if they have none)?" → LEFT JOIN (Alice, Bob, and Charlie with a NULL club).

"Show all clubs, even if the student ID doesn't exist in our table?" → RIGHT JOIN (Chess, Music, and Drama with a NULL name).

Introducing Window Functions

A window function performs a calculation across a set of table rows that are somehow related to the current row. Critically, it does not cause rows to become grouped into a single output row.
Standard SQL aggregation (GROUP BY) is powerful, but it comes with a major limitation: it reduces multiple rows into a single summary row.

Key Window Function Syntax

The definitive keyword is OVER, which defines the "window" or the set of rows the function will operate on. It typically has two optional but powerful clauses:

  • PARTITION BY: This divides the window into groups (e.g., partition by department or by region). The function runs independently on each partition.
  • ORDER BY: This defines the logical sequence of rows within each partition.

Common Categories of Window Functions

  • Ranking Functions: Ideal for finding the "top" items within a category. Key functions include ROW_NUMBER() (sequential number, no ties), RANK() (numbers with ties that skip the next number), and DENSE_RANK() (numbers with ties that do not skip numbers).
  • Aggregate Window Functions: You can use standard aggregates like SUM(), AVG(), or COUNT() within the OVER()clause to calculate things like running totals or moving averages.
  • Value-based Functions: These are used to compare the current row to surrounding rows.LAG()(looks at a previous row) and LEAD() (looks at a following row) are essential for identifying changes over time, like calculating day-over-day growth.

Example :You want to see each sale and the total accumulated sales so far.

SELECT 
    sale_date, 
    amount,
    SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;
Enter fullscreen mode Exit fullscreen mode

Summary

Here is a summary of the most crucial concepts from the article:

  • Joins are for Relationship Management: Joins are essential because relational databases are normalized. They allow you to reconstruct data across different tables based on common columns.
  • Left Join is a Key Tool: The LEFT JOIN is one of the most practical and frequently used joins, specifically for identifying mismatches or ensuring all records in a primary table are retained.
  • Window Functions Prescribe Rows: The fundamental difference between standard aggregation (GROUP BY)and window functions is that window functions retain the individual rows.
  • The OVER() Clause is the 'Window': The OVER() clause defines the specific set of rows a function operates on, often customized with PARTITION BY (to define subgroups) and ORDER BY (to define the order within subgroups).
  • Ranking and Change-Over-Time Analysis: Window functions make calculations like ranking (DENSE_RANK()), running totals (SUM()), and day-over-day growth (LAG()) intuitive and direct.

Top comments (0)