Part of the "SQL: Zero to Ninja" series.
Ever pulled a table and got back 10,000 rows when you only wanted the 12 that matter? You scroll and scroll, hunting for the paid orders, the gmail users, the cheap products. That is exhausting. SQL has a cleaner way: tell the database the rule up front, and let it hand you only the rows that pass.
The idea in one line
WHERE keeps only the rows that match your rule, and throws the rest away before you ever see them.
The metaphor: WHERE is a bouncer at the door
Picture a club with a bouncer at the door. Every row in your table is a person trying to get in. The bouncer has a rule on a clipboard. He checks each person one by one. Pass the rule, you are in. Fail it, you are out.
all rows --> [ bouncer: status = 'paid' ] --> only paid rows get in
WHERE is that bouncer. It looks at each row, checks it against your rule, and only lets the matching ones into your result.
The basic rule: comparison operators
The simplest rule is "this column equals that value".
SELECT name, total
FROM orders
WHERE status = 'paid'; -- only rows where status is exactly 'paid'
The bouncer also understands other comparisons:
| Operator | Means |
|---|---|
= |
equal |
<> |
not equal |
!= |
not equal (same thing) |
< |
less than |
> |
greater than |
<= |
less than or equal |
>= |
greater than or equal |
SELECT id, total
FROM orders
WHERE total > 100; -- only big orders
Combining rules: AND, OR, NOT
One rule is nice. Real life needs a few at once. The bouncer can check several things.
-
ANDmeans both must pass. -
ORmeans at least one must pass. -
NOTflips a rule around.
SELECT id, total, status
FROM orders
WHERE status = 'paid' AND total > 100; -- paid AND big
SELECT id, status
FROM orders
WHERE status = 'paid' OR status = 'shipped'; -- either one is fine
Why parentheses matter (the wrong way first)
Here is a trap that bites almost everyone. Say you want orders that are paid or shipped, and over 100.
A junior often writes this:
-- WRONG: this does not mean what you think
SELECT id, total, status
FROM orders
WHERE status = 'paid' OR status = 'shipped' AND total > 100;
AND is stronger than OR, so the database reads it like this:
status = 'paid' OR (status = 'shipped' AND total > 100)
So you get every paid order, even a 5 dollar one, plus shipped orders over 100. Surprise! That is not what you asked for.
The fix is parentheses. They tell the bouncer how to group the rules:
-- RIGHT: group the OR, then apply the AND
SELECT id, total, status
FROM orders
WHERE (status = 'paid' OR status = 'shipped') AND total > 100;
When you mix AND and OR, always add parentheses. It costs you two characters and saves you an hour of confusion.
Handy shortcuts: IN and BETWEEN
When you check one column against a list of values, IN is cleaner than a pile of ORs.
-- instead of: status = 'paid' OR status = 'shipped' OR status = 'refunded'
SELECT id, status
FROM orders
WHERE status IN ('paid', 'shipped', 'refunded');
When you want a range, BETWEEN reads nicely (and it includes both ends):
SELECT id, total
FROM orders
WHERE total BETWEEN 10 AND 100; -- 10 and 100 are both included
Fuzzy matching: LIKE with % and _
Sometimes you do not know the exact value. You want "anything ending in @gmail.com". That is LIKE, with two wildcards:
-
%matches any number of characters (even zero). -
_matches exactly one character.
SELECT name, email
FROM users
WHERE email LIKE '%@gmail.com'; -- any email that ends in @gmail.com
SELECT email
FROM users
WHERE email LIKE 'a%'; -- any email that starts with the letter a
SELECT name
FROM users
WHERE name LIKE '_at'; -- 'cat', 'bat', 'hat'... three letters ending in 'at'
The NULL trap (read this twice)
NULL is not zero and not an empty string. NULL means "unknown". The value is simply not there.
Here is the classic mistake:
-- WRONG: this returns nothing, ever
SELECT name FROM users WHERE country = NULL;
Why does it return nothing? Think about it. You are asking "is this unknown thing equal to unknown?" The honest answer is "I do not know", and the bouncer only lets in rows that are clearly true. "I do not know" is not true, so the row is rejected. Every time.
You cannot use = with NULL. You need a special check:
-- RIGHT
SELECT name FROM users WHERE country IS NULL; -- has no country
SELECT name FROM users WHERE country IS NOT NULL; -- has some country
Remember: for NULL, use IS NULL and IS NOT NULL, never = or <>.
A real case
Your boss asks: "Show me all paid orders over 100 from users in Egypt."
That is three rules. The country lives on the users table, so we filter there, and the order rules live on orders.
SELECT orders.id, orders.total, users.name
FROM orders
JOIN users ON users.id = orders.user_id
WHERE users.country = 'EG'
AND orders.status = 'paid'
AND orders.total > 100;
Three rules joined with AND, so a row only gets in if it passes all three. That is the bouncer doing exactly what you told him.
Gotchas
-
Mixing AND and OR without parentheses.
ANDbinds tighter, so group yourORs in parentheses or you get surprise rows. -
Using
= NULL. It never matches. UseIS NULL/IS NOT NULL. -
Forgetting the quotes. Text values need single quotes (
'paid'). Numbers do not (100).status = paidwill error because the database looks for a column namedpaid.
Recap
-
WHEREis a bouncer: it keeps only the rows that pass your rule. - Compare with
=,<>,<,>,<=,>=. - Combine rules with
AND,OR,NOT, and always parenthesize when you mixANDandOR. -
INchecks a list,BETWEENchecks a range (both ends included),LIKEdoes fuzzy matching with%and_. - For NULL, use
IS NULLandIS NOT NULL.= NULLnever works.
Your turn
Write one query that finds users whose email is not a gmail address and who do have a country set. Two rules, joined with AND. If you can explain why you used IS NOT NULL instead of <> NULL, you have got it.
Next up: Part 04: ORDER BY and LIMIT, where we sort our rows and grab just the top few.
Top comments (1)
Part 03 Practice: WHERE, Filtering Rows
Time to be the bouncer. Use the shared schema (
users,orders,products,order_items). Try each one yourself before peeking at the solution.1. Cheap products
Find the name and price of every product that costs less than 20.
Solution
Why: a single comparison rule,
<, keeps only the rows under 20.2. Paid or shipped orders
List the id and status of orders whose status is either
'paid'or'shipped'. UseIN.Solution
Why:
INis a clean way to check one column against a list of values, instead of writing twoORs.3. Mid-priced products with a range
Find products with a price between 50 and 200 (both included).
Solution
Why:
BETWEENincludes both ends, so 50 and 200 both count.4. Gmail users
Find the name and email of every user whose email ends in
@gmail.com.Solution
Why:
%matches any characters before@gmail.com, so any gmail address passes.5. Users with no country (the NULL trap)
Find the name of every user who has no country set.
Solution
Why: NULL means "unknown", so
= NULLnever matches. You must useIS NULL.6. The big one: parentheses matter
Find orders that are (
'paid'or'shipped') and have a total over 100. Group your conditions correctly.Solution
Why: without the parentheses,
ANDbinds tighter thanOR, and you would get every paid order plus only the shipped ones over 100. The parentheses force the grouping you actually want.You just played bouncer six times. Next stop is sorting and paging in Part 04. Keep going, you are doing great.