At this point most of us don't really write SQL from scratch anymore. We describe what we need, tweak a prompt, maybe adjust a few lines and move on, because the query will get written anyway and the job still gets done.
Interviews, however, seem to have chosen stability over evolution. You are handed a schema, a problem statement and a blank editor, and the expectation is that you will reconstruct a correct query on the spot, calmly and without external help, as if this were still the default way SQL is produced in real projects rather than something we mostly stopped doing years ago. It feels slightly time-shifted, but it is also the format that continues to decide who passes and who does not.
Complaining about it does not change much, so the more practical question is how to prepare without turning it into an exercise in memorizing fifty unrelated queries that only make sense in isolation. The reassuring part is that most of these problems are not random at all. They fall into a small number of recurring patterns and once you start recognizing them, the task shifts from remembering syntax to recognizing structure: grouping, picking one row per entity, comparing sequential records, checking whether something exists, stitching start and stop events, finding streaks, detecting overlapping ranges.
Even if you do not remember the exact syntax, you usually know the direction, and direction is far more valuable in a blank editor than perfect recall of keywords, especially in a world where syntax is the easiest thing to generate.
What "SQL patterns" actually mean
When I talk about SQL patterns, I do not mean memorizing functions or collecting a mental list of keywords. A pattern is not ROW_NUMBER or CASE or EXISTS. It is a recurring way of looking at a problem and recognizing its shape before you even think about the exact syntax.
When you read a task carefully, certain signals tend to appear. Words like "latest", "per user", "previous", "missing", "at least N" are rarely accidental. They usually point to a specific class of solution, even if the schema and business context change every time.
"Latest per user" is almost never just aggregation; it usually means ranking within a group and selecting one row. "Previous value" is not about grouping at all; it is about sequential comparison and ordered data. "Users without orders" is not really about joining tables for the sake of it; it is about checking whether something exists or does not exist.
That is what I mean by patterns. Most SQL interview problems fall into a relatively small set of these structures. Real problems often combine two of them, like aggregation plus existence or ranking plus conditional logic, but once you can see the individual shapes, the combinations stop being scary.
One more thing before we start. Throughout this article I use CTEs (the WITH ... AS syntax) to break queries into readable steps. If you have not used them before, a CTE is just a named subquery that you define at the top and reference below. It does not change what the query does, it just makes it easier to read. Think of it as giving a name to an intermediate result so you do not have to nest everything inside everything else.
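As a tiny illustration (the orders table here is made up), these two queries return the same result:

```sql
-- Nested version: the intermediate result lives inline
SELECT user_id
FROM (
    SELECT user_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY user_id
) counted
WHERE order_count >= 10;

-- CTE version: the same intermediate result gets a name first
WITH counted AS (
    SELECT user_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY user_id
)
SELECT user_id
FROM counted
WHERE order_count >= 10;
```

The longer the query gets, the more the second form pays off.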
Let's look at the patterns.
1. Aggregation
Aggregation is probably the most common pattern you will see and also the one that looks misleadingly simple. At its core it is about collapsing multiple rows into a single result per entity and making a decision at that level.
The signals are usually straightforward. Phrases like "for each user," "per department," "per product" or anything that sounds like "how many" or "how much" are almost always pointing in this direction. If the problem can be rephrased as "for each X, calculate Y" you are very likely dealing with aggregation.
What changes from task to task is the story. Sometimes you are counting reports, sometimes orders, sometimes transactions, sometimes computing averages or sums. What does not change is the approach: define the grouping key, compute an aggregate and possibly filter based on that aggregate. That means GROUP BY, one of the aggregate functions (COUNT, SUM, AVG, etc.), and HAVING when the filter applies to the result of the aggregation rather than to individual rows.
Let's take a classic example from the LeetCode Top SQL 50 list, Managers with at Least 5 Direct Reports.
It asks you to find managers who have at least five people reporting to them. The signal here is "at least 5 direct reports," which is a count with a threshold, so aggregation with a HAVING clause. The grouping key is managerId, because you want to count how many employees share the same manager.
SELECT e2.name
FROM (
    SELECT managerId, COUNT(*) AS c
    FROM Employee
    GROUP BY managerId
    HAVING c >= 5
) res
INNER JOIN Employee e2
    ON res.managerId = e2.id;
The subquery groups employees by their manager, counts each group, keeps only those with five or more, and the outer join brings back the manager's name. Group, count, filter, look up the detail you need.
Common mistakes to watch for: trying to return columns that are not part of the group, using WHERE when you need HAVING (remember, WHERE filters rows before grouping, HAVING filters after) and misidentifying what the grouping key actually is, which usually means you misread what "per entity" means in the problem.
If you want more practice with the same pattern, try Find Followers Count and Number of Unique Subjects Taught by Each Teacher; both are pure single-table aggregation with no distractions. For a trickier variation, Customers Who Bought All Products uses the same GROUP BY + HAVING structure but compares against a subquery to check that a customer has every product, not just a fixed threshold.
2. Conditional Aggregation
Conditional aggregation is what happens when plain aggregation is not enough because you need to split the result by some condition without splitting the query itself. Instead of writing separate queries for each category and combining them, you compute multiple metrics in one pass. By the way, this is also how you pivot rows into columns in databases that do not have a dedicated PIVOT keyword.
The signals are tasks that ask for several counts or sums side by side, usually broken down by status, type or category. "Count how many approved and how many rejected," "total revenue from domestic vs international," "number of completed orders and number of cancelled orders per user." Whenever you see a problem that wants multiple metrics from the same table grouped the same way but filtered differently, you are looking at conditional aggregation.
The approach is wrapping CASE WHEN inside your aggregate functions. Instead of filtering rows out, you tell each aggregate which rows to care about. SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) counts only approved rows, while COUNT(*) still counts everything. Same grouping, same pass, different conditions.
A good example from the LeetCode Top SQL 50 list is Monthly Transactions I.
The task asks you to report for each month and country the total number of transactions, the number of approved ones, the total amount and the approved amount. The signal is that you need both "all transactions" and "only approved" metrics in the same result. That is two different filters on the same grouping, which is conditional aggregation.
SELECT
    DATE_FORMAT(trans_date, '%Y-%m') AS month,
    country,
    COUNT(*) AS trans_count,
    SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
    SUM(amount) AS trans_total_amount,
    SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY month, country;
You group by month and country, then use unconditional aggregates for the totals and conditional ones for the approved subset. One query, one scan, all four metrics at once.
Common mistakes to watch for: writing separate queries or subqueries for each metric when a single CASE WHEN inside the aggregate would do, forgetting the ELSE 0 (SUM skips the resulting NULLs, but a group where no rows match then returns NULL instead of 0, which quietly breaks downstream arithmetic) and confusing this pattern with WHERE filtering, which would remove the rows you still need for the total counts.
If you want more practice, try Queries Quality and Percentage and Count Salary Categories; both use conditional logic inside aggregates on a single table.
3. Top N Per Group (Ranking)
This is the pattern people confuse with aggregation most often and the difference matters. Aggregation collapses a group into one number. Ranking keeps the actual rows but picks which ones you want from each group.
The signals are phrases like "latest," "highest," "most recent," "top 3," combined with a per entity qualifier like "per user," "per department," "per category." The key tell is that the problem wants you to return full rows, not just a count or a sum. If someone asks for "the highest salary per department," they do not want the number, they want the employee. That is ranking, not aggregation.
The approach is window functions: ROW_NUMBER, RANK or DENSE_RANK partitioned by the group and ordered by whatever defines "top." You compute the rank in a subquery or CTE, then filter on it in the outer query. The choice between the three functions depends on how you want to handle ties. ROW_NUMBER gives exactly one row per position regardless of ties. RANK leaves gaps after ties (1, 1, 3). DENSE_RANK does not leave gaps (1, 1, 2), which is what you usually want when the problem says "top 3" and means "top 3 distinct values."
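To make the tie handling concrete, here is a sketch over a hypothetical Employee table containing salaries 500, 500 and 400; the comment shows how each function ranks them:

```sql
-- For salaries 500, 500, 400 (descending):
--   ROW_NUMBER: 1, 2, 3    RANK: 1, 1, 3    DENSE_RANK: 1, 1, 2
SELECT
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
    RANK()       OVER (ORDER BY salary DESC) AS rnk,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM Employee;
```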
A good example is Department Top Three Salaries from the LeetCode Top SQL 50 list.
The task asks you to find employees who earn one of the top three unique salaries in their department. The signal is "top three" + "in each department" and the problem wants employee names and salaries back, not just numbers. That is ranking. Since "top three" here means three distinct salary levels, you need DENSE_RANK.
WITH RankedSalaries AS (
    SELECT
        e.name AS employee,
        e.salary,
        e.departmentId,
        DENSE_RANK() OVER (
            PARTITION BY e.departmentId
            ORDER BY e.salary DESC
        ) AS salary_rank
    FROM Employee e
)
SELECT
    d.name AS Department,
    r.employee,
    r.salary
FROM RankedSalaries r
JOIN Department d
    ON r.departmentId = d.id
WHERE r.salary_rank <= 3;
The CTE ranks every employee within their department by salary descending, then the outer query keeps only those with rank 3 or less and joins to get the department name.
Common mistakes to watch for: using ROW_NUMBER when the problem expects ties to be preserved (two people with the same salary should both appear), using RANK instead of DENSE_RANK when the problem says "top 3" and means three distinct values rather than three positional ranks, and forgetting that the window function has to go in a subquery or CTE, because you cannot filter on it in the WHERE clause of the query where it is computed.
If you want more practice, try Product Sales Analysis III, which asks for the first year of sales per product. It can be solved with a simple MIN subquery or with RANK() OVER (PARTITION BY product_id ORDER BY year), which makes it a good problem to compare both approaches. For something harder, FAANG Stock Min-Max combines ranking with aggregation: you first compute monthly prices, then rank twice to find the highest and lowest per ticker.
4. Sequential Analysis (LAG / LEAD)
This pattern shows up whenever the problem asks you to compare a row with its neighbor. Not a different group, not an aggregate, but the row right before or right after in some ordered sequence.
The signals are words like "previous," "next," "change," "difference," "consecutive" or anything that implies ordered comparison. Time based data is the most common context: "compared to yesterday," "change from last month," "three consecutive days." But it also applies to any ordered sequence, like consecutive IDs or ranked entries.
The approach is LAG and LEAD window functions. LAG gives you access to the previous row's value, LEAD gives you the next one. You define the order with ORDER BY inside the window and then you can compare, subtract or check conditions between the current row and its neighbor. The result usually goes into a subquery or CTE because you need to compute the shifted value first and then filter on it.
A good example is Consecutive Numbers from the LeetCode Top SQL 50 list.
The task asks you to find all numbers that appear at least three times consecutively. The signal is "consecutive," which means you need to look at neighbors in sequence, not count occurrences globally. This is not aggregation. You need to check that a row's value equals both its previous and its next value.
WITH cte AS (
    SELECT
        num,
        LEAD(num) OVER (ORDER BY id) AS next,
        LAG(num) OVER (ORDER BY id) AS prev
    FROM Logs
)
SELECT DISTINCT num AS ConsecutiveNums
FROM cte
WHERE num = next AND num = prev;
The CTE adds two columns to each row: the next value and the previous value. The outer query keeps only rows where all three match, meaning the number appears at least three times in a row. DISTINCT handles cases where a number appears consecutively more than three times.
Common mistakes to watch for: forgetting that LAG and LEAD return NULL for the first and last rows respectively (which can break comparisons if you do not account for it), ordering by the wrong column (the order has to reflect the actual sequence, not just any column) and trying to solve sequential problems with self joins when a window function would be simpler and more readable.
If you want more practice, try Rising Temperature, which asks you to find days where the temperature was higher than the previous day. It is a simpler version of the same pattern, just two consecutive rows instead of three and worth trying both with LAG and with a self join to see which reads better. For something harder, Repeated Payments uses LAG partitioned by three columns to detect duplicate credit card charges within a 10 minute window.
5. Event Pairing
This is my favorite pattern to ask about in interviews because it tests whether someone can recognize that rows in a table are not always independent records. Sometimes two rows are two halves of the same event and the real information only appears when you stitch them together.
The signals are tables where each entity has multiple rows with a status or event type column: "start" and "stop," "open" and "close," "login" and "logout." The problem then asks for a duration, a gap or a total time. Whenever you see a table that logs state changes and the question asks about the time between them, you are looking at event pairing. This uses the same tool as sequential analysis (chapter 4) but solves a fundamentally different kind of problem: instead of comparing neighbors, you are combining them into one record.
The approach is LEAD (or LAG): for each "start" row, grab the next row's timestamp to find the matching stop. Partition by the entity, order by time and you have your pairs.
A good example is Average Time of Process per Machine.
The task asks you to compute the average processing time per machine, where each process has a "start" and "end" row. The signal is a table with activity_type being either "start" or "end," and the question asking for time between them. Two rows, one event.
WITH paired AS (
    SELECT
        machine_id,
        activity_type,
        timestamp,
        LEAD(timestamp) OVER (
            PARTITION BY machine_id, process_id
            ORDER BY timestamp
        ) AS end_time
    FROM Activity
)
SELECT
    machine_id,
    ROUND(AVG(end_time - timestamp), 3) AS processing_time
FROM paired
WHERE activity_type = 'start'
GROUP BY machine_id;
The CTE uses LEAD to attach the next timestamp to each row within the same machine and process. The outer query filters for "start" rows only, so end_time is the matching stop, then averages the difference per machine.
This problem can also be solved with conditional aggregation (MAX(CASE WHEN 'start' ...) and MAX(CASE WHEN 'end' ...) grouped by process), which works well when start and stop share an explicit key. But LEAD is the more general tool, especially when rows just alternate in order and there is no shared key tying them together.
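For completeness, here is a sketch of that conditional aggregation variant; it relies on start and end rows sharing a process_id, as they do in this problem:

```sql
SELECT
    machine_id,
    ROUND(AVG(end_ts - start_ts), 3) AS processing_time
FROM (
    -- Pivot each process's start/end pair into a single row
    SELECT
        machine_id,
        process_id,
        MAX(CASE WHEN activity_type = 'start' THEN timestamp END) AS start_ts,
        MAX(CASE WHEN activity_type = 'end' THEN timestamp END) AS end_ts
    FROM Activity
    GROUP BY machine_id, process_id
) per_process
GROUP BY machine_id;
```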
Common mistakes to watch for: assuming that start and stop always alternate perfectly (real data has gaps, missing events and duplicates), forgetting to filter for only "start" rows after using LEAD (otherwise you also pair stop with the next start) and using the wrong partition, which pairs events from different entities together.
6. Gaps and Islands
This pattern is about finding streaks in data, consecutive runs of rows that share some property, where the grouping is not explicit anywhere in the table. Nobody marked where a streak starts or ends. You have to discover it from the sequence itself.
The signals are words like "consecutive," "streak," "in a row," "continuous," "uninterrupted." The problem gives you ordered data and asks you to find groups of rows that form an unbroken chain: consecutive days of activity, consecutive years of filing, consecutive months of subscription. The difference from sequential analysis (chapter 4) is that you are not just comparing neighbors, you are identifying entire groups of consecutive rows and measuring or filtering them.
The approach is a classic technique that looks like a trick the first time you see it but becomes second nature quickly. The idea is: if you have a sequence of consecutive values (like dates or years) and you subtract a ROW_NUMBER from each value, the result is the same for all rows in the same consecutive run and different when there is a gap. That constant becomes your group identifier.
Think of it this way. If a user made purchases on days 5, 6, 7, 10, 11, the ROW_NUMBER within that user would be 1, 2, 3, 4, 5. Subtract ROW_NUMBER from the day: 5−1=4, 6−2=4, 7−3=4, 10−4=6, 11−5=6. The first streak gets group 4, the second gets group 6. The actual numbers do not matter, what matters is that consecutive rows produce the same group value.
A good example is User Shopping Sprees.
The task asks you to find users who made purchases on 3 or more consecutive days. The signal is "3 or more consecutive days," which is exactly island detection: find streaks of consecutive dates and check their length.
WITH daily_purchases AS (
    SELECT DISTINCT
        user_id,
        transaction_date::date AS purchase_date
    FROM transactions
),
islands AS (
    SELECT
        user_id,
        purchase_date,
        purchase_date - ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY purchase_date
        )::int AS grp
    FROM daily_purchases
)
SELECT DISTINCT user_id
FROM islands
GROUP BY user_id, grp
HAVING COUNT(*) >= 3
ORDER BY user_id;
The first CTE deduplicates to one row per user per day, since a user might make multiple purchases on the same day and you care about distinct days. The second CTE applies the classic trick: purchase_date - ROW_NUMBER() produces the same value for consecutive dates, giving each streak its own group identifier. The outer query groups by user and island, keeps only streaks of 3 or more days and returns the distinct user IDs.
Common mistakes to watch for: forgetting to deduplicate before applying ROW_NUMBER (duplicate dates break the consecutive subtraction trick), not partitioning ROW_NUMBER by the right entity (which merges streaks from different users into one sequence).
If you want more practice, try Consecutive Filing Years on DataLemur, which is the same pattern with years instead of dates and adds a product filter on top.
7. Deduplication (Picking One Row)
This pattern overlaps with ranking (chapter 3) but the intent is different. With ranking you are selecting the top entries from a group. With deduplication you are cleaning up data that should not have multiple rows in the first place.
The signals are words like "duplicate," "remove," "keep only one," "unique per user" or "latest record." Sometimes the problem explicitly says delete. Sometimes it asks you to return a result as if duplicates did not exist. Either way the core task is the same: define what makes two rows "the same," decide which one survives, and get rid of the rest.
The approach is ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...). You partition by whatever defines a duplicate group, order by whatever decides which row wins, and then either keep rn = 1 in a SELECT or delete everything that is not rn = 1. The same mechanic works for both reading and cleaning.
A good example is Delete Duplicate Emails from the LeetCode Top SQL 50 list.
The task asks you to delete all duplicate emails, keeping only the row with the smallest ID for each email. The signal is right in the title: "delete duplicate." The grouping key is email, and the tie breaker is ID.
DELETE FROM Person
WHERE id NOT IN (
    SELECT id FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (
                PARTITION BY email
                ORDER BY id
            ) AS rn
        FROM Person
    ) ranked
    WHERE rn = 1
);
The inner query partitions by email, orders by ID, and assigns a row number. The middle query keeps only rn = 1, which is the smallest ID per email. The outer DELETE removes everything else. If this were a SELECT problem instead of a DELETE, you would just use the inner two layers and return the result directly.
This can also be solved with a self join (DELETE p FROM Person p, Person q WHERE p.email = q.email AND p.id > q.id) which is shorter but less general. The ROW_NUMBER approach scales better because if the problem changes from "smallest ID" to "most recent timestamp" you just change the ORDER BY, and if it changes from DELETE to SELECT you just drop the outer layer.
Common mistakes to watch for: reaching for DISTINCT when the problem needs you to actually choose which row to keep (DISTINCT deduplicates the output but gives you no control over which row's data you get), not defining the ordering properly so you keep a random row instead of the one the problem specifies and forgetting about tie breaking when two rows are identical on the ordering column as well.
8. Existence / Anti-Existence
This pattern is about checking whether something exists or does not exist in a related table, without actually needing any data from that table. You are not joining to pull columns. You are joining to answer a yes or no question.
The signals are phrases like "users without orders," "never purchased," "did not make," "at least one" or "has no matching." Whenever the problem is asking you to filter one table based on the presence or absence of rows in another table, you are looking at existence logic.
The approach has two common forms. The first is EXISTS / NOT EXISTS with a correlated subquery. The second is LEFT JOIN + IS NULL, where you join to the related table and then filter for rows where the join found nothing. Both do the same thing. LEFT JOIN is often more intuitive for people who think in terms of joins, while EXISTS can be more readable when the condition is complex.
A good example is Customer Who Visited but Did Not Make Any Transactions.
The task asks you to find customers who visited but did not make any transaction during that visit and count how many times that happened. The signal is "visited but did not make any transactions," which is classic anti-existence: you want visits where no matching transaction exists.
SELECT customer_id, COUNT(*) AS count_no_trans
FROM Visits v
LEFT JOIN Transactions t
    ON t.visit_id = v.visit_id
WHERE t.transaction_id IS NULL
GROUP BY customer_id;
The LEFT JOIN keeps all visits regardless of whether a transaction exists. The WHERE transaction_id IS NULL filters down to only the visits with no match. Then you group by customer and count. The same result could also be written with NOT EXISTS:
SELECT customer_id, COUNT(*) AS count_no_trans
FROM Visits v
WHERE NOT EXISTS (
    SELECT 1 FROM Transactions t
    WHERE t.visit_id = v.visit_id
)
GROUP BY customer_id;
Both are valid. Pick whichever reads more naturally to you in the moment.
Common mistakes to watch for: using INNER JOIN when you need anti-existence (which drops exactly the rows you are looking for), using NOT IN with a subquery that can return NULLs (if any value in the subquery result is NULL, NOT IN returns nothing, which is one of the most subtle bugs in SQL) and confusing this pattern with a regular join when the problem does not actually need any columns from the second table.
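The NOT IN trap deserves a quick demonstration. With a hypothetical Employees(employee_id, name, manager_id) table where the top manager has manager_id = NULL:

```sql
-- "Employees who manage no one." If any manager_id is NULL, the NOT IN
-- comparison evaluates to UNKNOWN for every row and the query returns nothing:
SELECT name
FROM Employees
WHERE employee_id NOT IN (SELECT manager_id FROM Employees);

-- NOT EXISTS behaves as expected, NULLs or not:
SELECT name
FROM Employees e
WHERE NOT EXISTS (
    SELECT 1
    FROM Employees m
    WHERE m.manager_id = e.employee_id
);
```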
If you want more practice, try Employees Whose Manager Left the Company, where you need to find employees whose manager ID does not exist in the employees table anymore. For something harder, Reactivated Users applies the same anti-existence logic to time series: find users who logged in this month but did not log in the previous month.
9. Self-Join / Pairwise Comparison
This pattern comes up when all the data lives in one table and you need to compare rows from that table against each other. There is no second table to join to. The relationship is inside the dataset itself.
The signals are problems where entities in a table reference other entities in the same table: employees and their managers, friends and friend requests, records that need to be compared with other records from the same source. Anything where the problem says "find pairs," "compare with other rows in the same table" or where a column like manager_id or reports_to points back to the same table's primary key.
The approach is joining the table to itself with aliases. You treat one copy as the "main" row and the other as the "related" row and the join condition defines the relationship between them. The key is being precise about that condition, because a sloppy self join can easily produce duplicate pairs or cartesian explosions.
This might look similar to sequential analysis (chapter 4) since both compare rows within the same table. The difference is the type of relationship. LAG/LEAD works when rows are neighbors in an ordered sequence. Self join works when rows are connected by a key or condition that has nothing to do with ordering. You cannot solve "find each employee's manager" with LAG because there is no ordering where the next row is your manager. The connection is hierarchical, not positional, and a self join is the only way to follow it.
A good example is The Number of Employees Which Report to Each Employee.
The task asks you to find, for each manager, how many employees report to them and the average age of those employees. The signal is reports_to pointing back to employee_id in the same table. One table, two roles: the employee and their manager.
SELECT
    m.employee_id,
    m.name,
    COUNT(e.employee_id) AS reports_count,
    ROUND(AVG(e.age)) AS average_age
FROM Employees e
JOIN Employees m
    ON e.reports_to = m.employee_id
GROUP BY m.employee_id, m.name
ORDER BY m.employee_id;
One copy of the table (e) represents employees, the other (m) represents managers, and the join connects each employee to their manager through the reports_to foreign key. From there it is just aggregation: count the reports, average their age, group by manager.
Common mistakes to watch for: forgetting to alias the table differently on each side (which makes the query ambiguous), using the wrong join direction so you get employees without reports instead of managers with reports and not accounting for rows that match themselves when the join condition allows it, which can inflate counts or create false pairs.
If you want more practice, try Rising Temperature, which joins the Weather table to itself to compare each day's temperature with the previous day. It shows self join used for sequential comparison rather than hierarchical relationships.
10. Cartesian Expansion (CROSS JOIN)
This pattern shows up when the data you need does not exist yet. The table has records of what happened, but the problem wants you to report on every possible combination, including the ones where nothing happened. You need to generate the full matrix first and then fill in the actuals.
The signals are problems that expect rows in the output for combinations that have no data. "Show attendance for every student in every subject" means you need a row even if a student never took that subject. "Revenue per product per month" means you need a row even for months with zero sales. Whenever the output should include missing combinations with zeroes or NULLs, you are looking at a CROSS JOIN.
The approach is to generate all possible combinations first with CROSS JOIN, then LEFT JOIN to the actual data to fill in what exists. The CROSS JOIN builds the skeleton, the LEFT JOIN adds the flesh, and anything that did not match stays as zero or NULL.
A good example is Students and Examinations.
The task asks you to report how many times each student attended each exam. The catch is that every student should appear with every subject, even if they never took it. The signal is three tables (Students, Subjects, Examinations) where the output needs every student × subject pair, not just the ones with exam records.
SELECT
    s.student_id,
    s.student_name,
    sub.subject_name,
    COUNT(e.subject_name) AS attended_exams
FROM Students s
CROSS JOIN Subjects sub
LEFT JOIN Examinations e
    ON s.student_id = e.student_id
    AND sub.subject_name = e.subject_name
GROUP BY s.student_id, s.student_name, sub.subject_name
ORDER BY s.student_id, sub.subject_name;
The CROSS JOIN between Students and Subjects creates every possible student-subject pair. The LEFT JOIN to Examinations attaches actual attendance records where they exist. COUNT(e.subject_name) counts only the non-NULL matches, so students who never took a subject get zero. Without the CROSS JOIN, those zero rows would simply be missing from the output.
Common mistakes to watch for: not recognizing that you need a CROSS JOIN in the first place (many people try to solve this with just LEFT JOIN and wonder why combinations with no data are missing), exploding row counts by cross joining large tables without understanding how many combinations you are generating and using COUNT(*) instead of COUNT(column_from_left_joined_table), which counts the NULL rows as 1 instead of 0.
If you want more practice, try 3-Topping Pizzas on DataLemur, which cross joins a table to itself three times to generate all possible 3-topping combinations and uses < comparisons to eliminate duplicates and enforce alphabetical order in one step.
11. Interval Overlap
This pattern comes up when you have rows that represent ranges, usually time ranges and you need to find where they intersect. It is surprisingly common in real interviews, especially for companies that deal with scheduling, bookings or resource allocation.
The signals are problems that mention "overlapping," "conflicting," "at the same time," "double booked" or "concurrent." Whenever entities have a start and an end, and the question asks whether any of them collide, you are looking at interval overlap.
The approach relies on one condition that is worth memorizing: two intervals [a_start, a_end] and [b_start, b_end] overlap when a_start < b_end AND b_start < a_end. It is easier to think about it the other way: two intervals do not overlap when one ends before the other starts. The overlap condition is simply the negation of that.
Here is a problem to illustrate. Imagine a meeting_rooms table where each row is a booking:
meeting_rooms
+------------+---------+---------------------+---------------------+
| booking_id | room_id | start_time          | end_time            |
+------------+---------+---------------------+---------------------+
| 1          | A       | 2024-03-01 09:00    | 2024-03-01 10:00    |
| 2          | A       | 2024-03-01 09:30    | 2024-03-01 10:30    |
| 3          | A       | 2024-03-01 11:00    | 2024-03-01 12:00    |
| 4          | B       | 2024-03-01 09:00    | 2024-03-01 10:30    |
| 5          | B       | 2024-03-01 10:00    | 2024-03-01 11:00    |
+------------+---------+---------------------+---------------------+
The task: find all pairs of bookings that conflict, meaning they are in the same room and their time ranges overlap.
SELECT
    a.booking_id AS booking_1,
    b.booking_id AS booking_2,
    a.room_id
FROM meeting_rooms a
JOIN meeting_rooms b
    ON a.room_id = b.room_id
    AND a.booking_id < b.booking_id
    AND a.start_time < b.end_time
    AND b.start_time < a.end_time;
The self join pairs every two bookings in the same room. a.booking_id < b.booking_id ensures you get each conflicting pair once instead of twice. The last two conditions are the overlap check: A starts before B ends, and B starts before A ends.
For room A, bookings 1 and 2 overlap (9:00–10:00 and 9:30–10:30). Booking 3 does not conflict with either because it starts at 11:00. For room B, bookings 4 and 5 overlap (9:00–10:30 and 10:00–11:00).
Common mistakes to watch for: getting the overlap condition wrong (people often check a_start BETWEEN b_start AND b_end, which misses cases where A fully contains B); forgetting a.booking_id < b.booking_id and getting duplicate pairs or rows matching themselves; and using <= instead of < when the business rule says that a meeting ending at 10:00 and another starting at 10:00 do not conflict.
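You can verify the whole thing end to end by loading the sample data into SQLite (via Python's sqlite3 module) and running the conflict query. The timestamps are stored as strings, which compare correctly here because they all share one fixed format:

```python
import sqlite3

# Load the meeting_rooms sample data and run the self-join overlap check.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meeting_rooms (
    booking_id INTEGER, room_id TEXT, start_time TEXT, end_time TEXT
);
INSERT INTO meeting_rooms VALUES
    (1, 'A', '2024-03-01 09:00', '2024-03-01 10:00'),
    (2, 'A', '2024-03-01 09:30', '2024-03-01 10:30'),
    (3, 'A', '2024-03-01 11:00', '2024-03-01 12:00'),
    (4, 'B', '2024-03-01 09:00', '2024-03-01 10:30'),
    (5, 'B', '2024-03-01 10:00', '2024-03-01 11:00');
""")
pairs = conn.execute("""
    SELECT a.booking_id, b.booking_id, a.room_id
    FROM meeting_rooms a
    JOIN meeting_rooms b
      ON a.room_id = b.room_id
     AND a.booking_id < b.booking_id
     AND a.start_time < b.end_time
     AND b.start_time < a.end_time
    ORDER BY a.booking_id
""").fetchall()
print(pairs)  # [(1, 2, 'A'), (4, 5, 'B')]
```

Exactly the two conflicting pairs from the walkthrough come back, and booking 3 appears nowhere.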
If you want to practice this pattern on real problems, User Concurrent Sessions and Merge Overlapping Events in the Same Hall are both good interval overlap exercises, though both require a paid subscription.
12. Running Totals / Cumulative Metrics
This pattern is about accumulating values as you move through an ordered sequence. You are not grouping rows into buckets like aggregation, and you are not comparing with neighbors like LAG/LEAD. You are keeping a running value that grows with each row.
The signals are words like "running total," "cumulative," "so far," "up to this point," "as of each date." Sometimes the problem does not use those words directly but describes a threshold that gets checked row by row, which is a running total in disguise.
The approach is SUM() OVER (ORDER BY ...). The ORDER BY inside the window defines the sequence and the SUM accumulates as it moves through it. Add PARTITION BY if the accumulation resets per group.
A good example is Last Person to Fit in the Bus.
The task asks you to find the last person who can board a bus with a 1000 kg weight limit, where people board in order of their turn column. The problem does not say "running total" but that is exactly what it is: you accumulate weight person by person and stop when you hit the limit.
WITH cumulative AS (
    SELECT
        person_name,
        SUM(weight) OVER (ORDER BY turn) AS total_weight
    FROM Queue
)
SELECT person_name
FROM cumulative
WHERE total_weight <= 1000
ORDER BY total_weight DESC
LIMIT 1;
The CTE computes a running total of weight ordered by turn. The outer query keeps only rows where the total is still within the limit and picks the last one. No GROUP BY anywhere, because this is not aggregation. Each row retains its identity and gets a cumulative value attached to it.
Common mistakes to watch for: omitting the ORDER BY inside the window function (which makes the accumulation order undefined and the results unpredictable); confusing this with GROUP BY (which collapses rows, while a window function keeps them); and partitioning when you should not, or vice versa, which either resets the running total too often or never resets it when it should.
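The query runs unchanged in SQLite, which has supported window functions since version 3.25. Here is the same CTE against a small Queue table with invented names and weights (not the original problem's data):

```python
import sqlite3

# Running total of weight in boarding order, then the last person who fits.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Queue (turn INTEGER, person_name TEXT, weight INTEGER);
INSERT INTO Queue VALUES
    (1, 'Alice', 250), (2, 'Bob', 350), (3, 'Carol', 400), (4, 'Dave', 200);
""")
row = conn.execute("""
    WITH cumulative AS (
        SELECT person_name,
               SUM(weight) OVER (ORDER BY turn) AS total_weight
        FROM Queue
    )
    SELECT person_name
    FROM cumulative
    WHERE total_weight <= 1000
    ORDER BY total_weight DESC
    LIMIT 1
""").fetchone()
print(row[0])  # Carol: 250 + 350 + 400 = 1000, exactly at the limit
```

The cumulative column reads 250, 600, 1000, 1200, so Carol is the last person within the 1000 kg limit and Dave does not board.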
If you want more practice, try Restaurant Growth, which asks for a rolling 7 day average of restaurant spending. It uses the same SUM() OVER (ORDER BY ...) mechanic but with a RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW frame, adding a layer of complexity on top of the basic running total.
Conclusion
You do not need to memorize twelve patterns. You need to solve enough problems that when you read a new one, something clicks and you think "I have seen this shape before." That is the whole point. The blank editor is not asking you to recall syntax. It is asking whether you know where you are going. If you do, the rest is just typing.
There are other patterns you will encounter, but most of them are combinations of these twelve. "Find employees earning above their department average" sounds like its own thing, but it is really just a window function (AVG() OVER (PARTITION BY department)) attached to each row and then filtered, the same window machinery as running totals applied to a different aggregate. Pivoting rows into columns is just conditional aggregation with CASE WHEN inside SUM or COUNT. Once you have the building blocks, the combinations assemble themselves.
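To make the "department average" combination concrete, here is a minimal sketch in SQLite with an invented employees table. Note that window functions cannot appear in WHERE, so the window result is computed in a subquery and filtered outside:

```python
import sqlite3

# Attach the department average to every row with a window AVG,
# then keep the rows that beat it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Ana', 'eng', 120), ('Ben', 'eng', 90),
    ('Cy',  'ops', 70),  ('Di',  'ops', 80);
""")
names = [r[0] for r in conn.execute("""
    SELECT name
    FROM (
        SELECT name, salary,
               AVG(salary) OVER (PARTITION BY department) AS dept_avg
        FROM employees
    )
    WHERE salary > dept_avg
    ORDER BY name
""")]
print(names)  # ['Ana', 'Di']
```

The eng average is 105 and the ops average is 75, so only Ana and Di clear their department's bar.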
For quick reference, here is the cheat sheet:
| Pattern | Signal | Core tool |
|---|---|---|
| Aggregation | "for each," "how many," "per user" | GROUP BY + HAVING |
| Conditional aggregation | multiple metrics, status breakdown | CASE WHEN inside SUM/COUNT |
| Top N per group | "latest," "highest," "top 3 per" | DENSE_RANK() OVER (PARTITION BY) |
| Sequential analysis | "previous," "next," "consecutive" | LAG / LEAD |
| Event pairing | "start/stop," "duration," "session" | LEAD partitioned by entity |
| Gaps and islands | "streak," "consecutive days," "in a row" | ROW_NUMBER subtraction trick |
| Deduplication | "duplicate," "keep one," "unique per" | ROW_NUMBER() WHERE rn = 1 |
| Existence / anti-existence | "without," "never," "has no" | LEFT JOIN IS NULL / NOT EXISTS |
| Self-join | "same table reference," "reports to" | JOIN table to itself with aliases |
| Cartesian expansion | "all combinations," "missing pairs" | CROSS JOIN + LEFT JOIN |
| Interval overlap | "conflicting," "double booked" | a_start < b_end AND b_start < a_end |
| Running totals | "cumulative," "so far," "up to" | SUM() OVER (ORDER BY) |