The sql order by clause is the only thing standing between a relational engine and the inconvenient truth that a table has no inherent order. The standard says so explicitly: a SELECT without ORDER BY returns a set, and the engine is free to hand you back rows in whatever sequence the planner finds cheapest — index scan order, hash-bucket order, parallel-worker arrival order, or the physical order of pages on disk after the last VACUUM. The moment you need a deterministic sequence — for pagination, for a report, for a leaderboard, for a window function that depends on row order — you have to say so.
That sounds easy until you start writing order by sql for real systems and meet the four traps that quietly break production queries: sql order by descending applies only to the column it touches (not the whole key list), NULL values sort in different positions on different engines, order by descending sql plus LIMIT can either be a zero-cost index walk or a full table sort depending on whether you indexed the column, and OFFSET-based pagination silently turns into a sequential scan once users page past row 10,000. This guide takes those four traps apart in detail, with worked interview answers, dialect-by-dialect notes for Postgres, MySQL, SQL Server, Snowflake, and BigQuery, and the keyset-pagination pattern every senior data engineering interview now expects you to reach for.
When you want hands-on reps immediately after reading, browse sorting practice library →, drill the ORDER BY + LIMIT drills →, and rehearse on SQL sorting problems →.
On this page
- Why ORDER BY is more subtle than it looks
- ASC vs DESC and the default behaviour
- NULLS FIRST / NULLS LAST across dialects
- Multi-column sorts and tie-breaking
- ORDER BY plus LIMIT, indexes, and performance
- Cheat sheet — ORDER BY recipes
- Frequently asked questions
- Practice on PipeCode
1. Why ORDER BY is more subtle than it looks
The relational rule: a table is a set; ORDER BY is the only thing that imposes a sequence
The one-sentence invariant: a SQL table has no order, a SQL result set has no order, and only a top-level ORDER BY produces a deterministic row sequence. Subqueries, CTEs, views, derived tables, and the inputs to set operations like UNION are all unordered — even if you wrote ORDER BY inside them, the optimiser may legally drop it. That subtlety is the seed of every "but it was sorted on my laptop" bug in junior code.
The four things ORDER BY actually does.
-
Imposes a total order on the final result set. The clause runs after
SELECTproduces rows, so it can reference output columns, aliases, and column positions. -
Drives
LIMIT/OFFSET/FETCH FIRSTsemantics. "Top 10 by revenue" is meaningless without anORDER BY revenue DESC—LIMIT 10over an unsorted set is "any 10 rows the engine felt like returning." -
Feeds window functions that are order-dependent.
ROW_NUMBER() OVER (ORDER BY created_at)andLAG() / LEAD()need a deterministic ordering inside the window. (Note: theORDER BYinside a window clause is unrelated to the outerORDER BYof the query.) -
Stabilises pagination. Cursor-based / keyset pagination is built directly on the
ORDER BYkey list — if the key isn't unique, page boundaries are non-deterministic.
Where ORDER BY fits in the logical execution order.
-
FROM— locate base tables. -
WHERE— row-level filtering. -
GROUP BY— collapse rows into groups. -
HAVING— group-level filtering. -
SELECT— project columns / aliases / expressions. -
ORDER BY— after SELECT, so you can sort by aliases. -
LIMIT/OFFSET— applied last.
That ordering is why order by in sql can reference column aliases (SELECT price * qty AS revenue FROM sales ORDER BY revenue DESC) in nearly every modern dialect — ORDER BY sees the projected columns. The handful of exceptions are old Oracle versions and a couple of niche embedded engines; assume the alias works on Postgres, MySQL 8+, SQL Server, Snowflake, BigQuery, DuckDB, ClickHouse, and Redshift.
Where ORDER BY silently does nothing.
-
Inside a CTE that is not the final query.
WITH x AS (SELECT … ORDER BY a) SELECT * FROM x— the inner ORDER BY may be discarded; only the outerSELECT * FROM x ORDER BY ais guaranteed. -
Inside a view. Postgres in particular warns that a view's
ORDER BYis a "hint" and is dropped when the planner inlines the view. -
Inside a subquery feeding
UNION/INTERSECT/EXCEPT. The set operation has no ordering input. The only meaningfulORDER BYis on the outer statement. -
Inside
INSERT … SELECTfor some engines. Postgres makes no guarantee that rows are physically inserted in the SELECT order; MySQL preserves it forMyISAMbut notInnoDB.
Dialect defaults at a glance.
-
Postgres / Snowflake / BigQuery / Oracle:
ASCdefault → NULLS LAST;DESCdefault → NULLS FIRST. (Postgres lets you flip with explicitNULLS FIRST/NULLS LAST.) -
MySQL / SQL Server / SQLite: NULL is treated as the smallest value →
ASCputs NULLs first,DESCputs NULLs last. No nativeNULLS FIRST/LASTsyntax (added only to recent Snowflake-aligned MySQL variants). -
DuckDB / Vertica / DB2: support
NULLS FIRST/LASTsyntax explicitly.
These differences look academic until you migrate a report from Postgres to SQL Server and discover that "top 10 unpaid invoices" suddenly shows a different first row because NULL paid_at values moved.
What interviewers listen for.
- Do you say "a table has no order" without prompting? — senior signal.
- Do you reach for an explicit
NULLS FIRST / NULLS LASTwhen NULLs are part of the column? — required. - Do you mention that
order by sqlcan reference aliases and computed expressions? — useful tutoring signal. - Do you flag
OFFSETpagination ageing as a known scaling pitfall, and propose keyset pagination? — distinguishing senior answer.
Worked example — does this query have a deterministic order?
Detailed explanation. Junior engineers often assume that a SELECT with a GROUP BY "comes out grouped" — and on small datasets it sometimes does, by accident, because the hash aggregate iterates buckets in insertion order. Move to a parallel plan, switch engines, or grow the data, and the apparent order vanishes. The only way to guarantee output order is ORDER BY on the outer query.
Question. Given the orders table below, which of the following queries has a guaranteed row order?
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;SELECT * FROM (SELECT * FROM orders ORDER BY id) AS o;SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id ORDER BY total DESC;
Input.
| id | customer_id | amount |
|---|---|---|
| 1 | 7 | 50 |
| 2 | 7 | 30 |
| 3 | 9 | 10 |
| 4 | 9 | 80 |
Code.
-- 1. Group only — undefined order
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;
-- 2. ORDER BY inside a derived table — engine may drop it
SELECT * FROM (SELECT * FROM orders ORDER BY id) AS o;
-- 3. Outer ORDER BY — deterministic
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC;
Step-by-step explanation.
- Query 1 produces two rows but the engine never promised an order. On Postgres a hash aggregate could return
9, 7or7, 9depending on hash collisions and parallel-worker arrival. - Query 2's inner
ORDER BY idis not guaranteed to propagate. The standard treats the subquery as a set; the planner is allowed (and on many engines does) discard the ORDER BY. - Query 3 is the only deterministic option — the outer
ORDER BY total DESCis binding. If two customers tie ontotal, the order between them is still undefined; for a fully deterministic answer add a tie-breaker likeORDER BY total DESC, customer_id ASC.
Output.
| Query | Deterministic? | Why |
|---|---|---|
| 1 | No | no outer ORDER BY |
| 2 | No | inner ORDER BY may be dropped |
| 3 | Almost — needs tie-breaker | outer ORDER BY is binding; ties undefined |
Rule of thumb. Every query whose output is consumed by a human, a UI, a CSV download, or a paginated API should end with a top-level ORDER BY whose key list ends in a unique column (typically a primary key) — that final column eliminates ties.
SQL interview question on deterministic ordering
A senior interviewer often shapes this round as: "Here's a paginated leaderboard endpoint that occasionally repeats users across page boundaries. The query is SELECT user_id, score FROM leaderboard ORDER BY score DESC LIMIT 50 OFFSET ?. Why does it repeat, and how would you fix it without breaking the UI contract?"
The fix is to make the order by in sql key list unique by appending a tie-breaker. A single score value shared by ten users will let pages 2 and 3 reshuffle that block on every request — keyset pagination on (score, user_id) is the deterministic answer.
Solution Using ORDER BY score DESC, user_id DESC and a keyset cursor
-- v1: append a unique tie-breaker — fixes deterministic ordering even with OFFSET
SELECT user_id, score
FROM leaderboard
ORDER BY score DESC, user_id DESC
LIMIT 50 OFFSET 100;
-- v2: keyset / cursor-based pagination — stays O(log N) and resists row insertions
SELECT user_id, score
FROM leaderboard
WHERE (score, user_id) < (:cursor_score, :cursor_user_id)
ORDER BY score DESC, user_id DESC
LIMIT 50;
Step-by-step trace.
| Page | Cursor (score, user_id) | Predicate | Result block |
|---|---|---|---|
| 1 | (none) |
LIMIT 50 only |
top 50 rows ordered by (score DESC, user_id DESC) |
| 2 | (980, 9123) — last row of page 1 | (score, user_id) < (980, 9123) |
next 50 rows strictly after the cursor |
| 3 | (940, 1734) | (score, user_id) < (940, 1734) |
next 50 rows |
| New row inserted at score=970 between pages 2 and 3 | (940, 1734) | predicate unchanged | page 3 unaffected — the new row would have been on page 2 only if the cursor was older |
Output:
| Endpoint | Page 2 (OFFSET style) cost | Page 2 (keyset) cost | Behaviour under inserts |
|---|---|---|---|
| Leaderboard top 1000 | O(N) — engine scans 100 rows it then discards | O(log N + 50) — index seek + 50 scans | OFFSET shifts rows; keyset stable per cursor |
Why this works — concept by concept:
-
Unique tie-breaker — appending
user_idto the sort key collapses every tie group of length > 1 into a strict total order. Without it, the optimiser is free to return tied rows in any sequence each time the query runs. -
Tuple comparison —
(score, user_id) < (s, u)is a single tuple comparison; Postgres, MySQL 8+, SQLite, and DuckDB all rewrite it to the lexicographic equivalent so a(score DESC, user_id DESC)index can serve both the comparison and the sort with no row sort step. - Keyset vs OFFSET — OFFSET pagination scans-and-discards every row up to the offset; keyset uses the index to seek directly to the cursor and walks forward. The cost gap explodes past page 100.
- Insert resilience — keyset pagination is anchored to the last row's key, so new rows inserted after the cursor is captured do not shift page boundaries.
- Cost — OFFSET pagination = O(offset + page_size); keyset pagination = O(log N + page_size). For 10k+ rows of pagination depth, the gap is 100× to 10,000×.
SQL
Topic — order by + limit
ORDER BY + LIMIT and pagination problems (SQL)
Where ORDER BY is silently dropped — the four "looked sorted on my laptop" bugs
Junior engineers usually meet order by in sql through the obvious top-level SELECT … ORDER BY … form. The bugs start when ORDER BY appears inside something else — a CTE, a view, a subquery, an INSERT … SELECT — and the engine quietly drops it. Knowing exactly where ORDER BY is binding and where it isn't is the difference between a query that ships and a query that ships and still works after the planner upgrades.
Bug 1 — ORDER BY inside a CTE.
WITH ranked AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
FROM logins
ORDER BY created_at DESC -- this is dropped
)
SELECT * FROM ranked WHERE rn = 1;
The ORDER BY inside the CTE is removed by the planner. The good news is the ROW_NUMBER() OVER (… ORDER BY created_at DESC) is independent of the outer ORDER BY — window functions have their own ordering. The bug fires only when someone assumes the CTE's rows arrive in the outer query in sorted order.
Bug 2 — ORDER BY in a view.
Postgres explicitly warns "ORDER BY in a view is a hint; it may be dropped." The fix is to put ORDER BY on the outer query that selects from the view, not on the view definition itself.
Bug 3 — ORDER BY in a subquery feeding UNION.
(SELECT id, name FROM users WHERE region = 'APAC' ORDER BY name)
UNION ALL
(SELECT id, name FROM users WHERE region = 'EMEA' ORDER BY name);
The inner ORDER BYs are decoration. UNION ALL outputs rows in whatever order is cheapest. To get a final sort, wrap the whole thing: SELECT * FROM (… UNION ALL …) t ORDER BY name.
Bug 4 — INSERT … SELECT … ORDER BY.
Most engines do not guarantee that rows are inserted in the SELECT's order. Sequential primary keys may not match the ORDER BY order. The only guarantee is the final SELECT's order if you re-query with an ORDER BY.
Worked example — surface the row's rank correctly across a CTE
Detailed explanation. A common pattern: select the latest record per user. The naive impulse is to ORDER BY inside a CTE and pick the first row per user. The robust pattern uses ROW_NUMBER() because the window function's own ORDER BY is binding, while a CTE-level ORDER BY may be dropped.
Question. From a logins table, write a query that returns the most recent login row per user_id, plus the running rank of each login per user. The result must be stable on Postgres, MySQL 8+, and Snowflake.
Input.
| user_id | login_at |
|---|---|
| 7 | 2026-05-25 09:00 |
| 7 | 2026-05-27 14:00 |
| 7 | 2026-05-26 08:00 |
| 9 | 2026-05-22 10:00 |
| 9 | 2026-05-28 11:00 |
Code.
WITH ranked AS (
SELECT
user_id,
login_at,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_at DESC) AS rn
FROM logins
)
SELECT user_id, login_at
FROM ranked
WHERE rn = 1
ORDER BY user_id ASC;
Step-by-step explanation.
- Inside the CTE the window's ORDER BY ranks each user's logins from newest to oldest —
rn = 1is the most recent. - There is intentionally no outer-CTE ORDER BY — adding one would be ignored anyway, and it would mislead the reader.
- The outer
SELECTfilters down to one row per user, then imposes a final, binding ORDER BY onuser_idto produce a deterministic result set. - Using
ROW_NUMBER()rather thanRANK()guarantees exactly one row per user even when two logins land on the same timestamp; add a unique tie-breaker (e.g.login_id ASC) inside the OVER clause if your data permits ties.
Output.
| user_id | login_at |
|---|---|
| 7 | 2026-05-27 14:00 |
| 9 | 2026-05-28 11:00 |
Rule of thumb. Put the ORDER BY where it binds: inside OVER (… ORDER BY …) for window functions, and on the outermost SELECT for the final result. Anywhere else, treat it as decoration that the planner is free to discard.
2. ASC vs DESC and the default behaviour
sql order by descending is per-column, not per-key-list — every column needs its own DESC
The single most common ORDER BY bug is reading ORDER BY a, b DESC as "sort both columns descending" — the actual semantics are "sort by a ascending (the default), then within each tie, sort by b descending." DESC binds only to the column immediately before it, exactly like a unary postfix operator. To sort every column descending you have to write each DESC explicitly.
The minimum rules for ASC and DESC.
-
ASCis the default.ORDER BY ais identical toORDER BY a ASC. -
DESCis per-column.ORDER BY a, b DESC≡ORDER BY a ASC, b DESC. To reverse both, writeORDER BY a DESC, b DESC. -
Direction interacts with collation. Strings sort according to the column's collation —
'A'and'a'may compare equal under case-insensitive collations and unequal under binary collations. -
Direction does not affect equality.
ORDER BY x DESCdoes not change howx = yis evaluated elsewhere; it only flips the comparator inside the sort.
The three traps every junior engineer hits at least once.
-
The DESC-binds-to-one trap.
ORDER BY a, b DESC→ "I asked for both DESC, but only b is." -
The stringly-typed-number trap.
ORDER BY id_texton aVARCHARcolumn with values'1', '2', '10', '11'yields'1', '10', '11', '2'— the lexicographic order. Fix:ORDER BY CAST(id_text AS INTEGER)or fix the schema. -
The collation trap.
ORDER BY nameon a Postgres column withCcollation sorts'Apple', 'apple', 'banana'as'Apple', 'apple', 'banana'(uppercase first); withen_US.utf8it sorts case-insensitively as'Apple', 'apple', 'banana'or'apple', 'Apple', 'banana'depending on locale ties.
Computed expressions in ORDER BY.
You can ORDER BY any expression, not just a bare column. The engine evaluates the expression after the SELECT projection and sorts on the result. This is how you build:
-
Case-driven custom order.
ORDER BY CASE WHEN status = 'urgent' THEN 0 WHEN status = 'normal' THEN 1 ELSE 2 END, created_at DESC. -
Sort by length, hash, modulo.
ORDER BY LENGTH(name),ORDER BY MOD(user_id, 7),ORDER BY MD5(email)(deterministic shuffle for sampling). -
Sort by alias.
SELECT price * quantity AS revenue … ORDER BY revenue DESC. -
Sort by column position.
ORDER BY 1, 2— concise but fragile; if someone re-orders the SELECT list, the ORDER BY silently changes. Acceptable in throwaway queries, dangerous in stored procedures.
Direction with mixed columns — the worked patterns.
| Goal | Correct ORDER BY |
|---|---|
| Newest first, then alphabetical name | created_at DESC, name ASC |
| Highest revenue, ties broken oldest first | revenue DESC, created_at ASC, id ASC |
| Status priority, then SLA breach, then id | CASE status WHEN 'urgent' THEN 0 WHEN 'normal' THEN 1 ELSE 2 END, sla_breach_at ASC, id ASC |
| Lexicographic reverse of name |
name DESC (binary collation) or REVERSE(name) ASC
|
Worked example — fix a "both columns descending" bug
Detailed explanation. Production-bug story: a paginated table view was supposed to show "most recent first, highest priority first." The reviewer wrote ORDER BY created_at, priority DESC and shipped. The list looked roughly right because most recent rows happened to also be high priority — until a backlog of week-old high-priority tickets quietly drifted off the first page. The cause was the DESC binding only to priority; created_at was still ascending.
Question. Given the tickets table, write the ORDER BY clause that sorts by created_at descending then priority descending (1 = highest), so the most recent high-priority ticket appears first.
Input.
| id | created_at | priority | summary |
|---|---|---|---|
| 1 | 2026-05-25 09:00 | 1 | DB outage |
| 2 | 2026-05-26 11:00 | 2 | UI bug |
| 3 | 2026-05-26 14:00 | 1 | API 500 |
| 4 | 2026-05-27 09:00 | 3 | typo |
| 5 | 2026-05-27 10:00 | 1 | payment fail |
Code.
-- WRONG — only priority is DESC; created_at stays ASC
SELECT * FROM tickets ORDER BY created_at, priority DESC;
-- RIGHT — both columns explicitly DESC
SELECT * FROM tickets ORDER BY created_at DESC, priority DESC;
Step-by-step explanation.
- In the wrong version,
ORDER BY created_at, priority DESCis parsed ascreated_at ASC, priority DESC. The oldest row (id=1) appears first. - In the right version,
created_at DESCputs the newest day first; within the newest day,priority DESC(where 1 = highest priority means 1 < 3) actually puts priority=3 first. Watch for this — if "priority" is a small-is-high-priority numeric value you usually wantpriority ASC("smallest priority number first"). - Re-reading the requirement — "highest priority first" with
priority=1= highest — the correct sort iscreated_at DESC, priority ASC, id ASC.
Output (corrected).
| id | created_at | priority |
|---|---|---|
| 5 | 2026-05-27 10:00 | 1 |
| 4 | 2026-05-27 09:00 | 3 |
| 3 | 2026-05-26 14:00 | 1 |
| 2 | 2026-05-26 11:00 | 2 |
| 1 | 2026-05-25 09:00 | 1 |
Rule of thumb. When the column semantics are "smaller number = better" (priority 1 = highest, rank 1 = top), use ASC. The instinct to write DESC for "I want the best first" is the bug — sort direction must match the numeric semantics, not the English semantics.
SQL interview question on string sorting
The probe usually sounds like: "We have a version column of type VARCHAR with values like '1.2.0', '1.10.1', '1.2.3'. The product team complains that ORDER BY version returns 1.10.1 before 1.2.3. Why, and how do you fix it?"
This is the lexicographic-vs-natural-sort trap. SQL string sort is character-by-character on the collation's comparator — '1.10.1' and '1.2.3' are compared character 1 ('1' = '1'), character 2 ('.' = '.'), character 3 ('1' < '2') — so '1.10.1' sorts first.
Solution Using STRING_TO_ARRAY + numeric cast
-- Postgres-native: parse and cast the version string to an int array, sort on the array
SELECT id, version
FROM releases
ORDER BY
string_to_array(version, '.')::int[], -- natural-sort
id ASC; -- unique tie-breaker
Step-by-step trace.
| version (string) | string_to_array → int[] | sort key |
|---|---|---|
1.2.0 |
{1, 2, 0} |
(1, 2, 0) |
1.10.1 |
{1, 10, 1} |
(1, 10, 1) |
1.2.3 |
{1, 2, 3} |
(1, 2, 3) |
2.0.0 |
{2, 0, 0} |
(2, 0, 0) |
After the array sort:
-
(1, 2, 0)— major=1, minor=2, patch=0. -
(1, 2, 3)— same major+minor, patch=3 wins over patch=0. -
(1, 10, 1)— major=1, minor=10 sorts after minor=2 because integers, not strings. -
(2, 0, 0)— different major bucket.
Output:
| id | version |
|---|---|
| 11 | 1.2.0 |
| 12 | 1.2.3 |
| 19 | 1.10.1 |
| 41 | 2.0.0 |
Why this works — concept by concept:
- Natural-sort vs lexicographic sort — version strings are dotted numeric tuples, not free text. Comparing them character-by-character mis-orders any minor that crosses the 10/100 boundary. Parsing them into integer arrays restores the natural semantics.
-
Array comparison in Postgres —
int[]arrays compare element-by-element (lexicographic on integers), which is exactly the multi-column tuple comparison you want for(major, minor, patch). -
Schema-level fix — if you control the schema, store
(major INT, minor INT, patch INT)columns separately andORDER BY major, minor, patch. The query-timestring_to_arraycast is a workaround for tables you don't own. -
Cross-engine equivalents — MySQL:
ORDER BY CAST(SUBSTRING_INDEX(version, '.', 1) AS UNSIGNED), CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(version, '.', 2), '.', -1) AS UNSIGNED), …. SQL Server:PARSENAME(REPLACE(version, '.', ' '), …). Snowflake / BigQuery:SPLIT(version, '.')then cast each element. - Cost — natural sort is O(N log N) on K-element arrays = O(N · K · log N); for typical version strings K ≤ 4, so essentially linear-log.
SQL
Topic — sorting (SQL)
Sorting problems (SQL)
Collation and case sensitivity — when "alphabetical" disagrees with itself
ORDER BY name looks innocent, but in production it interacts with the column's collation: the rule the engine uses to compare characters. Two sound-alike rows can sort in a different relative position on different engines, or even on the same engine after a locale upgrade. Knowing the collation surface is the difference between a stable report and a list that subtly reshuffles after the next OS / glibc upgrade.
The collation triad.
- Character set — the alphabet (ASCII, Latin-1, UTF-8). Determines which bytes are legal.
-
Collation — the comparison rule (
C/en_US.utf8/de_DE.utf8/ja_JP). Determines comparison order. -
Case sensitivity — a property of the collation.
Cis case-sensitive (uppercase before lowercase by ASCII code).en_US.utf8is locale-aware and dictionary-style (typically case-insensitive on Postgres for ties).
Sorting strings — the four common gotchas.
-
Case order flips by collation.
'Apple' < 'apple'onC, but'Apple' = 'apple'on a case-insensitive collation. -
Accent-folding.
'café' < 'cafz'onC(because theébyte is high). On a locale-aware collation,'café' < 'cafz'may compare like'cafe' < 'cafz'— same direction but for different reasons. -
Diacritic order varies by language. Swedish puts
'å'after'z', German puts'ä'near'a'. -
Numbers as strings. Already covered:
'1', '10', '11', '2'lexicographic; cast to integer to fix.
Per-query collation overrides.
Postgres lets you ORDER BY with an explicit collation:
-- Force binary (byte-wise) compare
SELECT name FROM users ORDER BY name COLLATE "C";
-- Force English locale-aware compare
SELECT name FROM users ORDER BY name COLLATE "en_US.utf8";
SQL Server uses COLLATE Latin1_General_BIN or COLLATE Latin1_General_CI_AS. MySQL uses ORDER BY name COLLATE utf8mb4_bin. The principle is identical across engines: collation is a property of the comparison, and it can be overridden per query.
Worked example — case-insensitive sort with deterministic tie-break
Detailed explanation. A dropdown of "tags" must sort case-insensitively ('JavaScript' and 'javascript' should appear next to each other), but the output must be stable across query runs (so a UI snapshot test doesn't flake). Lowercasing inside ORDER BY plus a unique tie-breaker is the cleanest pattern.
Question. Given the tags table, write an ORDER BY that sorts tags case-insensitively in ascending order, with a deterministic tie-break.
Input.
| id | name |
|---|---|
| 1 | JavaScript |
| 2 | python |
| 3 | Java |
| 4 | javascript |
| 5 | Python |
Code.
SELECT id, name
FROM tags
ORDER BY LOWER(name) ASC, id ASC;
Step-by-step explanation.
-
LOWER(name)normalises every row's name to lowercase for comparison purposes only — the SELECT still returns the original mixed-casename. - The sort happens on the lowercased string, so
'JavaScript'and'javascript'are equal and'Java'<'JavaScript'<'Python'<'python'. - Wait —
'JavaScript'and'javascript'both lowercase to'javascript', so they tie. Theid ASCtie-breaker resolves: id=1 (JavaScript) before id=4 (javascript). - The same logic applies to the two
Pythonrows. - A locale-aware alternative is
ORDER BY name COLLATE "en_US.utf8"on Postgres orORDER BY name COLLATE utf8mb4_general_cion MySQL — same result, different implementation.
Output.
| id | name |
|---|---|
| 3 | Java |
| 1 | JavaScript |
| 4 | javascript |
| 2 | python |
| 5 | Python |
Rule of thumb. When user-visible text might mix cases, ORDER BY on LOWER() (or a case-insensitive collation) plus a unique tie-breaker. The collation approach is faster (no function call) when an expression index exists; the LOWER() approach is more portable.
3. NULLS FIRST / NULLS LAST across dialects
Where NULL sorts is a dialect decision, and the answer is different on Postgres, MySQL, and SQL Server
NULL is not a value — it's the absence of a value — which means equality comparisons against NULL are themselves NULL (NULL = NULL is not true). For sorting, every engine had to make a pragmatic decision: pick a position. They picked differently, and the choice is part of the SQL personality of each dialect.
The dialect matrix in five bullets.
-
PostgreSQL. Default behaviour:
ASC→ NULLS LAST;DESC→ NULLS FIRST. ExplicitNULLS FIRST/NULLS LASTsyntax is fully supported. Treats NULL as "infinitely large" by default. -
MySQL (8.0+). Default behaviour: NULL is treated as smaller than any value →
ASCputs NULLs first;DESCputs NULLs last. No nativeNULLS FIRST / NULLS LASTsyntax (the recent 8.x line accepts the keywords only on certain function returns, not on plain columns). -
SQL Server. Default behaviour: NULL is the smallest value →
ASCputs NULLs first;DESCputs NULLs last. NoNULLS FIRST / NULLS LASTkeywords. UseCASE WHEN col IS NULL THEN 1 ELSE 0 ENDorIIF(col IS NULL, ...)as a workaround. -
Snowflake / BigQuery. Default behaviour:
ASC→ NULLS LAST;DESC→ NULLS FIRST, with fullNULLS FIRST / NULLS LASTsyntax. Same direction as Postgres. -
Oracle / DB2. Default behaviour and syntax match Postgres:
ASC→ NULLS LAST default; explicitNULLS FIRST/NULLS LASTworks.
The universally-portable workaround.
ORDER BY col IS NULL, col — works on every dialect because IS NULL returns a boolean (true / false), which sorts false (0) before true (1). So:
-
ORDER BY col IS NULL, col ASC→ non-NULLs ascending, NULLs last. -
ORDER BY col IS NULL DESC, col DESC→ NULLs first, non-NULLs descending. -
ORDER BY col IS NOT NULL, col DESC→ NULLs first, non-NULLs descending (alternate phrasing).
This is the safe pattern for cross-dialect queries, ORM-generated SQL that may run on multiple backends, and migration windows where you're moving from one engine to another.
Common NULL traps in ORDER BY.
-
Joining sparse columns. A
LEFT JOINagainstpaymentsproducespaid_at = NULLfor unpaid orders.ORDER BY paid_at DESCon Postgres puts NULLs first — your "most recent payment" report leads with unpaid orders. Fix:ORDER BY paid_at DESC NULLS LAST. -
NULL in a pagination key. Keyset pagination falls over if any sort-key column can be NULL — the
(col, id) < (?, ?)tuple comparison treats NULL specially. Either filter NULLs out, or useCOALESCE(col, sentinel). -
NULL in a window function ORDER BY.
ROW_NUMBER() OVER (ORDER BY last_login)numbers users with NULLlast_loginas either the lowest or highest depending on engine — verify before relying on it.
The portable patterns side-by-side.
| Goal | Postgres / Snowflake / BigQuery | MySQL | SQL Server | Universal |
|---|---|---|---|---|
| NULLs last, ascending | ORDER BY col ASC NULLS LAST |
ORDER BY col IS NULL, col ASC |
ORDER BY CASE WHEN col IS NULL THEN 1 ELSE 0 END, col ASC |
ORDER BY col IS NULL, col ASC |
| NULLs first, descending | ORDER BY col DESC NULLS FIRST |
ORDER BY col IS NULL DESC, col DESC |
ORDER BY CASE WHEN col IS NULL THEN 0 ELSE 1 END, col DESC |
ORDER BY col IS NULL DESC, col DESC |
| NULLs first, ascending | ORDER BY col ASC NULLS FIRST |
ORDER BY col ASC (NULL is smallest) |
ORDER BY col ASC (NULL is smallest) |
ORDER BY col IS NULL DESC, col ASC |
| NULLs last, descending | ORDER BY col DESC NULLS LAST |
ORDER BY col DESC (NULL is smallest, so last in DESC) |
ORDER BY col DESC (same) |
ORDER BY col IS NULL, col DESC |
Worked example — "most recent payment, blanks at the bottom"
Detailed explanation. A reporting query joins orders with payments and shows orders sorted by paid_at descending. Some orders haven't been paid yet, so paid_at IS NULL. On Postgres the default would place those NULL rows first, which is the opposite of what the reviewer wants.
Question. Write the ORDER BY clause so that paid orders appear first, most-recently-paid at the top, and unpaid orders sink to the bottom, on Postgres, MySQL, and as a portable variant.
Input.
| id | paid_at |
|---|---|
| 1 | 2026-05-27 |
| 2 | NULL |
| 3 | 2026-05-25 |
| 4 | NULL |
| 5 | 2026-05-26 |
Code.
-- Postgres / Snowflake / BigQuery / Oracle — explicit NULLS LAST
SELECT id, paid_at
FROM orders
ORDER BY paid_at DESC NULLS LAST, id ASC;
-- MySQL — no NULLS LAST syntax; use the IS NULL trick
SELECT id, paid_at
FROM orders
ORDER BY paid_at IS NULL, paid_at DESC, id ASC;
-- SQL Server — no NULLS LAST; use a CASE expression
SELECT id, paid_at
FROM orders
ORDER BY
CASE WHEN paid_at IS NULL THEN 1 ELSE 0 END,
paid_at DESC,
id ASC;
Step-by-step explanation.
- The Postgres version is the cleanest.
paid_at DESC NULLS LASTflips the engine's default (which would put NULLs first in DESC). - The MySQL version uses
paid_at IS NULLas a boolean — false (0) sorts before true (1), so non-NULLs come first. - The SQL Server version is functionally identical to MySQL —
CASE WHEN col IS NULL THEN 1 ELSE 0 ENDis a 0/1 sort key prepended to the real sort. - The unique tie-breaker
id ASCat the end ensures a deterministic order between any two orders paid on the same day.
Output.
| id | paid_at | position |
|---|---|---|
| 1 | 2026-05-27 | 1 (most recent paid) |
| 5 | 2026-05-26 | 2 |
| 3 | 2026-05-25 | 3 |
| 2 | NULL | 4 (NULLs LAST) |
| 4 | NULL | 5 |
Rule of thumb. Whenever you ORDER BY a column that can be NULL, ask the requirement explicitly: "Where do the NULLs go?" — and write NULLS LAST / NULLS FIRST (or the IS NULL workaround) accordingly. Default behaviour is dialect-dependent and is the single most common source of "the report changed when we migrated" bugs.
SQL interview question on NULL positioning across dialects
A senior probe sounds like: "We're moving a report from MySQL to Postgres. The query has ORDER BY last_login DESC LIMIT 100. The product team reports the top of the list has changed. Explain why and write a portable fix."
The cause is the dialect difference: MySQL treats NULL as the smallest value, so last_login DESC puts NULLs last. Postgres treats NULL as the largest by default, so last_login DESC puts NULLs first. Suddenly the top of the list is full of last_login = NULL (i.e. users who never logged in) instead of the recently-active users the report wanted.
Solution Using a portable IS NULL sort prefix
-- Portable form — works identically on Postgres, MySQL, SQL Server, Snowflake, BigQuery
SELECT user_id, last_login
FROM users
ORDER BY
last_login IS NULL, -- false (0) first → non-NULLs lead
last_login DESC,
user_id ASC -- unique tie-breaker
LIMIT 100;
Step-by-step trace.
| user_id | last_login | IS NULL → | sort key |
|---|---|---|---|
| 7 | 2026-05-27 09:00 | 0 | (0, 2026-05-27 09:00) |
| 9 | NULL | 1 | (1, NULL) |
| 4 | 2026-05-27 12:00 | 0 | (0, 2026-05-27 12:00) |
| 2 | 2026-05-26 08:00 | 0 | (0, 2026-05-26 08:00) |
| 11 | NULL | 1 | (1, NULL) |
After sorting by the two-key tuple (IS NULL ASC, last_login DESC, user_id ASC):
- All rows with
IS NULL = 0come first, sorted bylast_login DESC. - Within those, ties on
last_login(none here) would be resolved byuser_id ASC. - The two
last_login = NULLrows come last, ordered byuser_id ASCbecause all other keys are equal.
Output:
| user_id | last_login |
|---|---|
| 4 | 2026-05-27 12:00 |
| 7 | 2026-05-27 09:00 |
| 2 | 2026-05-26 08:00 |
| 9 | NULL |
| 11 | NULL |
Why this works — concept by concept:
-
Boolean as sort key —
IS NULLreturns a boolean; every modern engine sorts booleans as 0/1 (or false/true), and false sorts before true. That gives you a free, portable "NULLs last" prefix. -
Decoupling NULL placement from value direction — the
IS NULLterm decides position (last / first); thelast_login DESCterm decides direction. They no longer fight each other. -
Deterministic tie-break —
user_id ASCon the end guarantees a stable, repeatable order even across multiple NULL rows. -
Portability — the same query runs identically on every dialect that supports
IS NULL, which is all of them. No more migration surprises. -
Cost — adding a leading boolean key barely affects sort cost — it's a single byte of comparison per row. The dominant cost is still the index scan or external sort on
last_login.
SQL
Topic — NULL handling (SQL)
NULL handling and ordering (SQL)
COALESCE as a NULL-aware sort sentinel — when you need NULLs to participate in the comparison
The IS NULL boolean prefix is the cleanest way to position NULLs at one end. The other pattern is COALESCE(col, sentinel) — replacing NULLs with a chosen value so they sort inside the value range rather than at one end. This is useful when "missing" should mean "treat as default value X" (zero, infinity, today, end-of-time).
When to reach for which.
-
NULLs at the end / start (no value semantics). Use
IS NULLboolean prefix orNULLS FIRST/LAST. Cleanest. -
NULLs should behave like a specific value. Use
COALESCE(col, sentinel). Lets the NULL "blend in" with the real values. -
NULL means "earliest possible".
COALESCE(start_date, '1900-01-01')— places NULL-start records ahead of everything else. -
NULL means "latest possible / not yet expired".
COALESCE(end_date, '9999-12-31')— places NULL-end records after every real end date.
Example pattern.
-- Sort subscriptions by end_date ascending; NULL means "not expired" → treat as far future
SELECT subscription_id, end_date
FROM subscriptions
ORDER BY COALESCE(end_date, DATE '9999-12-31') ASC, subscription_id ASC;
This is functionally equivalent to ORDER BY end_date ASC NULLS LAST on Postgres, but stays consistent on engines without that syntax. It is also the right pattern when "NULL" carries a domain meaning (active subscription, infinite end date) that you want to make explicit in the sort.
Worked example — sort users by subscription status with a sensible NULL default
Detailed explanation. A reporting view of subscriptions: paying customers first (sorted by amount descending), then trial users (no amount yet, but they have a trial_started_at), then churned customers (canceled_at IS NOT NULL). NULL amount for trials must place them after paying customers, not before.
Question. Sort subscriptions so paying customers appear first by amount descending, then trial customers (NULL amount) by trial_started_at ascending, then canceled customers last by canceled_at descending. Make the final order deterministic.
Input.
| id | status | amount | trial_started_at | canceled_at |
|---|---|---|---|---|
| 1 | paying | 99 | NULL | NULL |
| 2 | paying | 199 | NULL | NULL |
| 3 | trial | NULL | 2026-05-25 | NULL |
| 4 | trial | NULL | 2026-05-26 | NULL |
| 5 | canceled | NULL | NULL | 2026-05-20 |
| 6 | canceled | NULL | NULL | 2026-05-22 |
Code.
SELECT id, status, amount, trial_started_at, canceled_at
FROM subscriptions
ORDER BY
CASE status
WHEN 'paying' THEN 0
WHEN 'trial' THEN 1
WHEN 'canceled' THEN 2
ELSE 3
END,
amount DESC NULLS LAST,
trial_started_at ASC NULLS LAST,
canceled_at DESC NULLS LAST,
id ASC;
Step-by-step explanation.
- The
CASEexpression maps the categoricalstatusto a numeric priority bucket. Paying customers (bucket 0) sort before trial customers (bucket 1) and canceled customers (bucket 2). - Within each bucket only one of the remaining keys is meaningful — paying rows have non-NULL
amount, trial rows have non-NULLtrial_started_at, canceled rows have non-NULLcanceled_at.NULLS LASTensures NULLs in a column don't drift ahead of the meaningful values. - The final
id ASCis the deterministic tie-breaker; without it, two paying customers with the sameamountwould be in undefined order. - On engines without
NULLS LAST(MySQL, SQL Server), substituteIS NULLboolean prefix per term — e.g.amount IS NULL, amount DESC— but readability suffers.
Output.
| id | status | amount | trial_started_at | canceled_at |
|---|---|---|---|---|
| 2 | paying | 199 | NULL | NULL |
| 1 | paying | 99 | NULL | NULL |
| 3 | trial | NULL | 2026-05-25 | NULL |
| 4 | trial | NULL | 2026-05-26 | NULL |
| 6 | canceled | NULL | NULL | 2026-05-22 |
| 5 | canceled | NULL | NULL | 2026-05-20 |
Rule of thumb. When a column has a domain-specific meaning for NULL (active / infinite / unknown / pending), encode that meaning either in a CASE bucket (categorical) or a COALESCE sentinel (continuous). Don't ship a sort that depends on the engine's default NULL position when the data has a real intent.
4. Multi-column sorts and tie-breaking
Multi-column ORDER BY is a cascade — later keys only matter when earlier keys tie
The mental model in one line: each column in the ORDER BY list resolves the ties left by the columns before it. Until ties exist, later columns are irrelevant; once a tie group forms, the next column gets a chance. The full sort is stable across the whole key list as long as the last column produces no ties (i.e. is unique).
The five sort-list patterns every interview probes.
-
Lexicographic on a tuple.
ORDER BY country, city, name— strict left-to-right tie resolution. -
Mixed directions.
ORDER BY rating DESC, price ASC, created_at DESC— each column has its own direction; they're independent. -
Computed expressions.
ORDER BY CASE WHEN status='urgent' THEN 0 ELSE 1 END, sla_at— case-driven priority then ascending SLA. -
Column position.
ORDER BY 1, 2 DESC— same semantics, terse, fragile. -
Stable pagination.
ORDER BY created_at DESC, id DESC— always end the key list with a unique column for cursor stability.
Why "stable sort" is not a thing you can rely on.
Computer scientists talk about stable sorts (rows that compare equal preserve their input order) and unstable sorts (no such guarantee). SQL engines pick algorithms based on the planner's cost model — they may use a parallel quicksort (unstable), an external merge sort (stable), or an in-memory sort (depends on implementation). Two consecutive runs of the same query can therefore return tied rows in different orders. The portable fix is to make ties impossible by adding a unique tie-breaker column (typically the primary key) as the last sort term.
Tie-breakers and pagination — the cursor invariant.
Cursor-based pagination keeps a "last seen row's sort key" and queries WHERE (sort_keys) < (cursor). If the sort key list is not unique, the cursor straddles ties — page 2 may re-show rows from page 1, or skip rows entirely. The rule is non-negotiable: the last column in the ORDER BY list must be unique.
Composable case-when patterns.
You can drive arbitrary priority orders with CASE expressions inside the ORDER BY:
-- "Active first, then suspended, then deleted; within each, by latest activity"
ORDER BY
CASE status
WHEN 'active' THEN 0
WHEN 'suspended' THEN 1
ELSE 2
END,
last_active_at DESC,
user_id ASC;
This is functionally equivalent to mapping status to a numeric priority, then sorting numerically. It composes cleanly with DESC per term, and the engine evaluates the CASE once per row during the sort.
Worked example — three-key cascade with mixed direction
Detailed explanation. A common report: "Top sellers per region, by revenue, with newest-first as the inner tie-break." The natural key is (region, revenue, created_at) — but the directions differ per column: region ascending (alphabetical), revenue descending, created_at descending, and a final unique id to keep things deterministic.
Question. Given the sellers table, write the ORDER BY clause that produces "region ascending, revenue descending, then most recent first, then deterministic by id."
Input.
| id | region | revenue | created_at |
|---|---|---|---|
| 1 | APAC | 100 | 2026-05-10 |
| 2 | APAC | 100 | 2026-05-12 |
| 3 | EMEA | 200 | 2026-05-09 |
| 4 | EMEA | 150 | 2026-05-15 |
| 5 | APAC | 80 | 2026-05-13 |
Code.
SELECT id, region, revenue, created_at
FROM sellers
ORDER BY
region ASC,
revenue DESC,
created_at DESC,
id ASC;
Step-by-step explanation.
-
region ASCproduces two buckets:APAC(ids 1, 2, 5) andEMEA(ids 3, 4). - Inside the
APACbucket,revenue DESCranks revenue=100 above revenue=80. So far the order isAPAC/100(ids 1, 2),APAC/100(tie), thenAPAC/80. - The two
APAC/100rows tie on revenue.created_at DESCresolves: id=2 (2026-05-12) before id=1 (2026-05-10). - The final
id ASCdoesn't change anything in this dataset (no remaining ties) but is the safety net. -
EMEAbucket: only two rows, revenue=200 beats revenue=150. No ties.
Output.
| position | id | region | revenue | created_at |
|---|---|---|---|---|
| 1 | 2 | APAC | 100 | 2026-05-12 |
| 2 | 1 | APAC | 100 | 2026-05-10 |
| 3 | 5 | APAC | 80 | 2026-05-13 |
| 4 | 3 | EMEA | 200 | 2026-05-09 |
| 5 | 4 | EMEA | 150 | 2026-05-15 |
Rule of thumb. Read the requirement aloud; for every "then" in English add a column to the ORDER BY. End with a unique column — there is no exception worth defending.
SQL interview question on stable cursor pagination
A senior probe: "Our infinite-scroll feed sometimes shows the same post twice. The query is SELECT * FROM posts ORDER BY popularity DESC LIMIT 50 OFFSET ?. Walk through why it duplicates and write the keyset-pagination fix."
The duplicate root cause is that popularity is not unique — hundreds of posts share popularity = 500. Between page 1 and page 2 the engine returns the tied block in different orders (because sort is not stable across runs), so a row in position 50 on page 1 can re-appear in position 1 on page 2. The fix is a unique tie-breaker and keyset pagination on the tuple.
Solution Using (popularity, id) keyset cursor
-- Page 1
SELECT id, title, popularity
FROM posts
ORDER BY popularity DESC, id DESC
LIMIT 50;
-- Page N (anchored to the last row of the previous page)
SELECT id, title, popularity
FROM posts
WHERE (popularity, id) < (:cursor_popularity, :cursor_id)
ORDER BY popularity DESC, id DESC
LIMIT 50;
Step-by-step trace.
| Page | Cursor (popularity, id) | Predicate | Rows returned |
|---|---|---|---|
| 1 | (none) | none | top 50 by (popularity DESC, id DESC)
|
| 2 | (500, 9123) — last row of page 1 | (popularity, id) < (500, 9123) |
next 50 strictly less than the cursor |
| 3 | (480, 1734) — last row of page 2 | (popularity, id) < (480, 1734) |
next 50 |
The tuple comparison (popularity, id) < (P, I) is shorthand for "popularity < P OR (popularity = P AND id < I)" — exactly the lexicographic break the multi-column sort defines.
Output:
| Pagination strategy | Duplicate / skip risk | Deepest-page cost | Resilient to inserts? |
|---|---|---|---|
LIMIT 50 OFFSET 1000 (no tie-breaker) |
high — ties reorder | O(offset + page) | no — new rows shift pages |
LIMIT 50 OFFSET 1000 + id DESC tie-breaker |
none | O(offset + page) | no — same shift |
Keyset (popularity, id) < (?, ?)
|
none | O(log N + page) | yes — cursor anchored |
Why this works — concept by concept:
-
Unique tie-breaker — appending
idto the sort key collapses everypopularitytie group to a strict total order. Two consecutive runs return rows in the same order. -
Tuple comparison —
(popularity, id) < (P, I)expresses the lexicographic compare exactly; Postgres rewrites it to use a(popularity DESC, id DESC)index seek in one step. -
Index alignment — a B-tree on
(popularity DESC, id DESC)can serve the query with a single seek to the cursor plus a 50-row forward walk. No sort step, no temp buffer. - Cursor stability under inserts — rows inserted after the cursor's popularity / id pair fall to a later page automatically; rows inserted before the cursor would have appeared on earlier pages — they don't retroactively shift the current page boundary.
- Cost — keyset pagination is O(log N + page_size) regardless of page depth, vs OFFSET pagination's O(offset + page_size) which degrades linearly with depth.
SQL
Topic — pagination
Pagination problems (SQL, keyset cursors)
ORDER BY by column position — concise, but treat it like a footgun
ORDER BY 1, 2 DESC references the column position in the SELECT list rather than the column name. It is part of the SQL standard, supported on every modern engine, and saves typing — but it is fragile in production code and a known reviewer red flag in long-lived queries.
Why it tempts people.
-
Short.
ORDER BY 1, 2is shorter thanORDER BY product_category, sales_total. - No alias collisions. When SELECT contains computed expressions, positional ORDER BY avoids the need to alias them.
- Quick exploration. Ad-hoc analytics in a notebook benefit from terseness.
Why production code avoids it.
-
Reordering SELECT silently changes the sort. Someone reorders the SELECT list for readability —
ORDER BY 1, 2now sorts on completely different columns. No syntax error, just wrong results. - GROUP BY by position has the same brittleness. Mixing positional GROUP BY and positional ORDER BY multiplies the risk.
-
Linters / formatters often discourage it. SQLFluff defaults to flagging it;
sqlformatadds a warning comment.
The two production-acceptable uses are:
-
Computed expressions you do not want to alias.
SELECT a + b, c FROM t ORDER BY 1 DESC— the expression is unique to the query, and aliasing it just to reference it in ORDER BY adds noise. - Throwaway analytics in a notebook. Anything you wouldn't commit to git.
For long-lived queries, always sort by name or by alias.
Worked example — refactor a fragile positional ORDER BY
Detailed explanation. A reviewer flags the query below for using positional ORDER BY. The risk is real — a future contributor reorders the SELECT list to put region before revenue, and the report silently starts sorting by region first.
Question. Refactor SELECT region, revenue, customers FROM regional_sales ORDER BY 2 DESC, 1 ASC, 3 DESC; to use named columns and add a deterministic tie-breaker.
Input.
| region | revenue | customers |
|---|---|---|
| APAC | 1200 | 30 |
| EMEA | 900 | 22 |
| APAC | 1200 | 40 |
| AMER | 1200 | 25 |
| EMEA | 900 | 18 |
Code.
-- BEFORE (fragile)
SELECT region, revenue, customers
FROM regional_sales
ORDER BY 2 DESC, 1 ASC, 3 DESC;
-- AFTER (refactored)
SELECT region, revenue, customers, id
FROM regional_sales
ORDER BY
revenue DESC,
region ASC,
customers DESC,
id ASC; -- unique tie-breaker
Step-by-step explanation.
- The original ORDER BY references positions 2, 1, 3 —
revenue, region, customers. Reordering the SELECT to(revenue, region, customers)would silently change the sort. - The refactor restates the columns by name; the sort is now self-documenting and survives SELECT-list reorderings.
- Adding
id(the primary key) as the last sort key guarantees deterministic order between rows that tie on all visible columns — important if a downstream consumer (UI snapshot test, CSV diff) depends on row order. - The cost is identical to the positional version; this is a readability and correctness refactor.
Output.
| region | revenue | customers |
|---|---|---|
| AMER | 1200 | 25 |
| APAC | 1200 | 40 |
| APAC | 1200 | 30 |
| EMEA | 900 | 22 |
| EMEA | 900 | 18 |
Rule of thumb. Use positional ORDER BY only when the column it refers to is a one-off expression in the SELECT. For every other case, name the column.
5. ORDER BY plus LIMIT, indexes, and performance
A B-tree index on the ORDER BY column turns "sort the table" into "walk the index"
ORDER BY plus LIMIT is one of the highest-leverage optimisation targets in any OLTP system, because the same query — SELECT * FROM events ORDER BY created_at DESC LIMIT 10 — runs in a millisecond with the right index or several seconds without one at 10 million rows. The difference is whether the planner can satisfy the sort by walking an index in order, or has to read every row and run an external sort.
The five facts every senior interview wants you to land.
-
A B-tree index physically stores keys in sorted order. Walking the index produces rows in
ORDER BY column ASCorder with no sort step. Postgres and MySQL can also walk it backward forDESC— same index, same physical structure, both directions. - LIMIT pushes down through ORDER BY. The engine stops as soon as it has the top K rows. With an index this is K index-tuples; without one, it's still O(N log K) (a heap-based "top-K sort") rather than a full sort.
-
Multi-column ORDER BY needs a composite index.
ORDER BY a, b DESCis best served by an index on(a, b DESC)(Postgres supports per-column direction on indexes; MySQL since 8.0 too). Without the matching composite, the engine falls back to a sort. -
OFFSETis the silent killer.LIMIT 50 OFFSET 1000000makes the engine fetch the first 1,000,050 rows, throw away 1,000,000, and return the last 50. Use keyset pagination at scale. -
Index-only scan beats heap fetch. If
SELECT colonly references columns covered by the index, Postgres can skip the heap entirely (visibility map permitting). This is the difference between 30 µs and 3 ms per row at scale.
Index direction and the ASC/DESC duality.
A Postgres B-tree on (created_at) stores keys in ascending order. The planner can:
- Walk the index left-to-right to satisfy
ORDER BY created_at ASC(no sort). - Walk the index right-to-left to satisfy
ORDER BY created_at DESC(no sort, same index). - For mixed-direction multi-column sorts (e.g.
ORDER BY a ASC, b DESC), the single index(a)is not enough; either build(a ASC, b DESC)or accept a sort step.
MySQL since 8.0 supports descending indexes (CREATE INDEX … (a DESC, b ASC)) — earlier versions ignored the DESC keyword and stored ascending, then walked backward in memory.
Composite-index column order.
For ORDER BY a, b you need the index column order to match the leftmost prefix of the sort key. (a, b) works; (b, a) does not. The same rule governs which queries the index can serve:
| Query | Index (a, b)
|
Index (b, a)
|
|---|---|---|
WHERE a = ? ORDER BY b |
yes (range walk under leading equality) | no — wrong leading column |
WHERE b = ? ORDER BY a |
no | yes |
ORDER BY a, b |
yes (full index walk) | no |
ORDER BY b, a |
no | yes |
This is also why "add an index for every column" is bad advice — one composite index typically replaces three single-column ones and serves more query shapes.
The EXPLAIN reading shortcut.
- See
Sortin the plan? — the engine ran a sort. Cost is O(N log N) or O(N log K) for top-K; if N is large, you want an index. - See
Index Scanwith noSortabove it? — the index served the order. The cost line will read likecost=0.43..8.46 rows=10instead ofcost=0..43012 rows=10 (after sort). - See
Index Only Scan? — the engine didn't even touch the heap. This is the gold standard. - See
Bitmap Heap Scanunder anORDER BY? — the bitmap path is unordered; a sort step always follows. Bitmap is the wrong path forORDER BY + LIMITqueries.
LIMIT + OFFSET pitfalls — the production reality.
Pagination via OFFSET scales linearly with depth. The cost at page 1 is the same as the cost at page 1,000,000 minus the offset. For real applications:
- Page 1–10: OFFSET is fine — the cost difference is invisible.
- Page 10–100: OFFSET adds noticeable latency (10–100 ms).
- Page 100+: OFFSET becomes the dominant cost (hundreds of ms).
- Page 10,000+: OFFSET is unusable. Switch to keyset pagination — it stays O(log N + page) regardless of depth.
Worked example — turn a 3-second sort into a 5 ms index walk
Detailed explanation. A common reality: the "latest 50 orders" endpoint slows linearly with table size. At 100k rows it's fine, at 10M rows it's 3 seconds. The plan shows a sequential scan plus a sort. Adding a B-tree on created_at is a 30-second migration that takes the query to 5 ms.
Question. The query SELECT id, total FROM orders ORDER BY created_at DESC LIMIT 50; runs in 3 seconds at 10M rows. Show the EXPLAIN before, build the right index, and show EXPLAIN after.
Input (schema before).
| Column | Type |
|---|---|
| id | bigserial PK |
| customer_id | bigint |
| total | numeric(10,2) |
| created_at | timestamptz |
No indexes other than the PK on id.
Code.
-- BEFORE: sequential scan + top-K sort
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total
FROM orders
ORDER BY created_at DESC
LIMIT 50;
-- Limit (cost=320145..320150 rows=50)
-- -> Sort (cost=320145..345128 rows=10000000)
-- Sort Key: created_at DESC
-- -> Seq Scan on orders (cost=0..183120 rows=10000000)
-- BUILD: B-tree on created_at (per-column DESC redundant; index is bidirectional)
CREATE INDEX idx_orders_created_at ON orders (created_at);
-- AFTER: index scan, no sort
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total
FROM orders
ORDER BY created_at DESC
LIMIT 50;
-- Limit (cost=0.43..2.18 rows=50)
-- -> Index Scan Backward using idx_orders_created_at on orders (cost=0.43..350421 rows=10000000)
Step-by-step explanation.
- The before plan reads every row (Seq Scan, 10M rows), runs an external sort to find the 50 newest, and returns them. Cost dominated by the sort.
-
CREATE INDEX idx_orders_created_at ON orders (created_at)builds a B-tree sorted bycreated_atascending. Build cost is one full table scan + log-N writes. - The after plan walks the index backward (right-to-left) — Postgres prefix
Index Scan Backward— collecting the 50 newest entries, then heap-fetchesid, totalfor each. No sort node. Cost drops from345128to2.18. - To eliminate the heap fetch as well (index-only scan), include
totalin the index:CREATE INDEX … ON orders (created_at, total)or useINCLUDE (total)in Postgres 11+.
Output.
| Stage | Plan | Latency (10M rows) |
|---|---|---|
| Before | Seq Scan + Top-K Sort | ~3,000 ms |
| After (single-column index) | Index Scan Backward + heap fetch | ~5 ms |
| After (covering index) | Index Only Scan Backward | ~1 ms |
Rule of thumb. Any "latest N" / "top N" endpoint on a table > 100k rows should have a B-tree index whose leading column matches the leftmost ORDER BY column. Cover the SELECT list if you want index-only scans.
SQL interview question on a slow-paginated feed
A senior probe: "We have SELECT * FROM posts ORDER BY created_at DESC, id DESC LIMIT 50 OFFSET 100000;. The first page returns in 5 ms; page 2,000 takes 1.5 seconds. The table has an index on created_at. What's happening, and how do you fix it without changing the API contract?"
The OFFSET-aging problem: at OFFSET 100,000 the engine reads 100,050 index entries, heap-fetches each, then discards the first 100,000. The index walk is O(N) in the offset. The fix is keyset pagination on the (created_at, id) tuple plus a matching composite index.
Solution Using a composite (created_at, id) index + keyset cursor
-- One-time: composite index matching the ORDER BY tuple
CREATE INDEX idx_posts_created_at_id ON posts (created_at DESC, id DESC);
-- Replace OFFSET with keyset
SELECT id, title, created_at
FROM posts
WHERE (created_at, id) < (:cursor_created_at, :cursor_id)
ORDER BY created_at DESC, id DESC
LIMIT 50;
Step-by-step trace.
| Strategy | Index | Plan | Latency at depth 100k |
|---|---|---|---|
OFFSET 100000 + single-col idx (created_at)
|
(created_at) |
Index Scan Backward + 100k heap fetches | ~1,500 ms |
OFFSET 100000 + composite idx (created_at DESC, id DESC)
|
(created_at DESC, id DESC) |
Index Scan + 100k row skips | ~1,200 ms |
Keyset (created_at, id) < (?, ?) + composite idx |
(created_at DESC, id DESC) |
Index Seek → 50-row forward walk | ~3 ms |
The crucial change is eliminating the offset, not just adding the index. Even with a perfect index, OFFSET 100,000 walks 100k entries.
Output:
| Page depth | OFFSET strategy | Keyset strategy |
|---|---|---|
| 1 (depth 0) | 3 ms | 3 ms |
| 100 (depth 5k) | 90 ms | 3 ms |
| 1,000 (depth 50k) | 700 ms | 3 ms |
| 2,000 (depth 100k) | 1,500 ms | 3 ms |
| 10,000 (depth 500k) | 8,000 ms | 3 ms |
Why this works — concept by concept:
-
Composite index matching the sort tuple —
(created_at DESC, id DESC)stores keys in the exact order the query wants them. The planner can seek to the cursor and walk forward; no sort step, no heap thrash. -
Tuple comparison as a single seek —
(created_at, id) < (C, I)translates to a B-tree seek operation. Postgres, MySQL 8+, DuckDB, and SQLite all support this pattern. - Keyset vs OFFSET — OFFSET is "skip N rows the engine has to materialise"; keyset is "jump to the cursor and walk." For depths > 1000 the difference is two to four orders of magnitude.
-
Backward index walk for DESC — Postgres' planner reads the same B-tree right-to-left for descending sorts. You do not need a "DESC" index for descending order on a single column — but for mixed direction (e.g.
created_at DESC, id ASC), explicit per-column index direction matters. -
Stable under inserts — the cursor is anchored to actual values. New rows with
created_at > cursor.created_atsimply appear on earlier pages (where they belong) without shifting the current page. - Cost — keyset pagination = O(log N + page_size) regardless of page depth; OFFSET pagination = O(offset + page_size), which is linear in depth.
SQL
Topic — order by + limit
ORDER BY + LIMIT and indexed sorts (SQL)
Mixed-direction sorts and per-column index direction
A B-tree on (a) serves both ORDER BY a ASC and ORDER BY a DESC because the engine can walk the same index in either direction. The story changes the moment your ORDER BY has mixed directions across columns — ORDER BY a ASC, b DESC cannot be served by a single-direction index on (a, b) without an additional sort step.
The single-column case (easy).
-
ORDER BY a ASC↔ index(a)walked forward. -
ORDER BY a DESC↔ index(a)walked backward. - No need for a
DESCindex on a single column.
The multi-column same-direction case (still easy).
-
ORDER BY a ASC, b ASC↔ index(a, b)walked forward. -
ORDER BY a DESC, b DESC↔ index(a, b)walked backward.
The multi-column mixed-direction case (the trap).
-
ORDER BY a ASC, b DESC— no single-direction index(a, b)can produce this physically. The engine either sorts in memory or uses a per-column direction index:CREATE INDEX … ON t (a ASC, b DESC). - Postgres has supported per-column direction since 8.3. MySQL since 8.0. SQL Server: yes. Snowflake: indirect (it picks the right micro-partition order). BigQuery: it's a columnar store, sort works regardless.
When to build the directional composite index.
- Mixed-direction sort is hot on a large table (e.g. a feed).
- The sort is part of pagination — repeated execution.
- You see
Sortstep in the EXPLAIN plan with high cost.
For one-off reports or low-traffic queries, accept the sort step and don't proliferate indexes.
The covering / INCLUDE index trick.
Postgres 11+ supports INCLUDE to append non-key columns to the leaf nodes of a B-tree:
CREATE INDEX idx_orders_created_at_inc
ON orders (created_at)
INCLUDE (total, customer_id);
The index now serves SELECT total, customer_id FROM orders ORDER BY created_at DESC LIMIT 50 as an Index Only Scan — the heap is never touched. The cost drops from "log N seeks + 50 heap fetches" to "log N seeks + zero heap fetches", which is often a 5-10× improvement on cold cache.
Worked example — feed query with mixed-direction sort
Detailed explanation. A social feed sorts posts by pinned DESC, created_at DESC, id DESC. pinned is a boolean flag, mostly false. Without the right index the engine does a full scan + sort; with the right per-column direction index it does an index walk.
Question. A social-network app has a posts table with 50M rows. The feed query is SELECT id, author_id, title, created_at FROM posts ORDER BY pinned DESC, created_at DESC, id DESC LIMIT 30;. Show the index you'd build, and what the EXPLAIN looks like before and after.
Input.
| Column | Type | Cardinality |
|---|---|---|
| id | bigserial PK | 50M |
| author_id | bigint | 1M |
| pinned | boolean | 0.01% true |
| created_at | timestamptz | unique enough |
| title | text | n/a |
Code.
-- Composite, per-column direction. Postgres 8.3+ / MySQL 8.0+ / SQL Server / DuckDB
CREATE INDEX idx_posts_feed
ON posts (pinned DESC, created_at DESC, id DESC)
INCLUDE (author_id, title);
-- The feed query
SELECT id, author_id, title, created_at
FROM posts
ORDER BY pinned DESC, created_at DESC, id DESC
LIMIT 30;
Step-by-step explanation.
- The index's leading column
pinned DESCmatches the sort's leading column —truerows are visited first. - The second column
created_at DESCis also descending, matching the sort. - The third column
id DESCis the tie-breaker. Withidbeing the primary key, the sort tuple is now strictly ordered with no ties. - The
INCLUDE (author_id, title)clause appends two read-only columns into the leaf nodes — the SELECT can be served entirely from the index. No heap fetch. - Postgres' plan changes from
Sort + Seq Scan + Limit(cost ~ 1.5M) toIndex Only Scan + Limit(cost ~ 30).
Output.
| Stage | Plan | Latency (50M rows, cold cache) |
|---|---|---|
| Before (no index) | Seq Scan + Top-K Sort | ~18,000 ms |
| After (sort-aligned index, key only) | Index Scan + 30 heap fetches | ~40 ms |
| After (covering index with INCLUDE) | Index Only Scan, no heap | ~3 ms |
Rule of thumb. For high-traffic feed / leaderboard / chronological-list queries on large tables, build a per-column direction composite index that matches the ORDER BY tuple exactly, and INCLUDE the SELECT-list columns to get an index-only scan. Cost amortises after a few hundred queries.
SQL interview question on a hot leaderboard with frequent ties
A senior probe: "We have a leaderboard SELECT user_id, score FROM leaderboard ORDER BY score DESC LIMIT 100; running 500 times/second. There are 200 million rows. The current index (score) works at low traffic but starts queuing under load. Walk through the index changes and explain what the planner is doing."
The pattern is: composite index (score DESC, user_id ASC) with INCLUDE (display_name) if display_name is in the SELECT, plus keyset pagination for "infinite scroll" requests.
Solution Using a covering composite index + keyset
-- The covering index
CREATE INDEX idx_leaderboard_score
ON leaderboard (score DESC, user_id ASC)
INCLUDE (display_name);
-- Top-100 query
SELECT user_id, score, display_name
FROM leaderboard
ORDER BY score DESC, user_id ASC
LIMIT 100;
-- Keyset pagination (page N anchored to cursor)
SELECT user_id, score, display_name
FROM leaderboard
WHERE (score, user_id) < (:cursor_score, :cursor_user_id)
ORDER BY score DESC, user_id ASC
LIMIT 100;
Step-by-step trace.
| Request | Index path | Heap fetches | Latency |
|---|---|---|---|
| Top-100 (single-col idx, no INCLUDE) | Index Scan Backward on (score)
|
100 | ~3 ms |
| Top-100 (covering idx with INCLUDE) | Index Only Scan on (score DESC, user_id ASC)
|
0 | ~0.4 ms |
| Page 100 via OFFSET 10000 + covering idx | Index Only Scan + 10k row skips | 0 | ~80 ms |
| Page 100 via keyset + covering idx | Index Seek to cursor + 100-row walk | 0 | ~0.4 ms |
Output:
| Strategy | Throughput (queries/sec single core) | p99 latency |
|---|---|---|
| Seq Scan + Top-K Sort (no index) | ~5 | 2,000 ms |
(score) index |
~3,000 | 4 ms |
Covering (score DESC, user_id ASC) INCLUDE (display_name)
|
~25,000 | 0.6 ms |
| Covering + keyset for pagination | ~25,000 | 0.6 ms across all pages |
Why this works — concept by concept:
-
Sort-aligned composite —
(score DESC, user_id ASC)is exactly the sort tuple. The planner walks the index in order; no sort step, no buffer. -
Unique tie-breaker —
user_idis the primary key; no ties exist on the full tuple. Cursor stability is guaranteed. - Covering with INCLUDE — the SELECT-list columns live inside the index leaves, so the heap is never touched. This is the largest single-step optimisation for read-heavy workloads.
-
Index Only Scan visibility — Postgres can only do Index Only Scans if the visibility map says all referenced pages are all-visible. After a heavy update workload run
VACUUMto refresh the visibility map. - Keyset replaces OFFSET — same covering index, but pagination becomes O(log N + page_size) regardless of depth. Combined with the covering trick, even page 10,000 stays under a millisecond.
- Cost — read latency = O(log N) seek + O(page_size); index write amplification ≈ 2× (key plus included columns); storage ≈ 1.5× of the original table for this index. Worth it on read-dominated paths.
SQL
Topic — order by + limit
Top-K and leaderboard problems (SQL, indexed sorts)
Cheat sheet — ORDER BY recipes
-
Stable pagination (cursor anchor).
ORDER BY created_at DESC, id DESC LIMIT 50— always end with a unique column. -
NULL-safe sort (portable).
ORDER BY col IS NULL, col ASC— false-before-true puts NULLs last on every dialect. -
NULL-safe sort (Postgres / Snowflake / BigQuery).
ORDER BY col ASC NULLS LAST— native and self-documenting. -
Custom priority order.
ORDER BY CASE WHEN status='urgent' THEN 0 WHEN status='normal' THEN 1 ELSE 2 END, created_at DESC— drives non-alphabetical priority sequences. -
Sort by computed expression.
ORDER BY LENGTH(name),ORDER BY revenue / NULLIF(orders, 0) DESC,ORDER BY MD5(id::text)(deterministic shuffle). -
Multi-key, mixed direction.
ORDER BY country ASC, revenue DESC, created_at DESC, id ASC— read every direction explicitly. -
Descending keyset pagination.
WHERE (created_at, id) < (?, ?) ORDER BY created_at DESC, id DESC LIMIT 50— production pagination on any depth. -
Ascending keyset pagination.
WHERE (created_at, id) > (?, ?) ORDER BY created_at ASC, id ASC LIMIT 50— symmetric. -
Index recipe. Build
(sort_col_1, sort_col_2, …, unique_tie_breaker)matching the ORDER BY tuple. AddINCLUDEcolumns to enable index-only scans. -
Top-K reporting (small K).
ORDER BY metric DESC LIMIT 10is a top-K query — a sort node with a heap of size K is O(N log K); index walks are O(log N + K). Always index the metric. -
Versioned strings.
ORDER BY string_to_array(version, '.')::int[](Postgres) — natural sort for dotted-numeric versions; mirror for MySQL withSUBSTRING_INDEX, SQL Server withPARSENAME. -
Window functions need their own ORDER BY.
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC)— independent of the outer ORDER BY of the query.
Frequently asked questions
Can I ORDER BY a column that is not in the SELECT list?
Yes, in nearly every modern dialect. ORDER BY runs after SELECT but it has access to every column of every table named in the FROM clause, not just the projected list. SELECT name FROM users ORDER BY created_at DESC is valid. The two exceptions are: (1) SELECT DISTINCT queries, where the ORDER BY can only reference columns that appear in the SELECT (because DISTINCT collapses identical rows and the engine can't see the originals), and (2) queries with GROUP BY, where the ORDER BY can reference only grouped columns or aggregates. Outside those two cases, you can sort by anything in scope.
Why does MySQL not support NULLS LAST?
Historical reasons. MySQL's ORDER BY predates the SQL:2003 standard that introduced explicit NULLS FIRST / NULLS LAST keywords, and the original MySQL behaviour fixed NULL as the smallest value (so ASC puts NULLs first, DESC puts NULLs last). Adding the keywords now would change the meaning of existing queries on some edge cases, so the MySQL team has consistently chosen the portable workaround pattern instead: ORDER BY col IS NULL, col toggles between "NULLs last" and "NULLs first" in a fully ANSI-compatible way that works on every dialect.
Does ORDER BY use an index?
It can, if the index is aligned with the sort key list. A B-tree index on the leftmost ORDER BY column lets the planner walk the index in order and avoid a sort node — that's the "Index Scan" line in EXPLAIN. For multi-column sorts, the index must match the leading prefix of the sort tuple (and per-column direction on engines that support directional indexes). Without a matching index, the engine falls back to an external sort, which scales as O(N log N). For top-K queries with LIMIT, an index brings the cost down from O(N log N) to O(log N + K) — often a 100× to 1000× speedup at scale.
ORDER BY plus LIMIT — what's the cost?
With a matching index, ORDER BY col LIMIT K is O(log N + K) — the planner seeks the start of the sorted range and walks K entries. Without an index, the engine does a top-K sort: it scans every row but only maintains a heap of size K, so the cost is O(N log K) instead of O(N log N). At 10M rows, top-K with K=10 is still ~30M comparisons — fast in memory but several seconds on disk. The lesson: build the index. ORDER BY + LIMIT is the single highest-leverage place to add a B-tree.
Why is my paginated query slow at page 1000?
OFFSET-aging. LIMIT 50 OFFSET 100000 makes the engine fetch 100,050 rows and discard 100,000 of them. Even with a perfect index, the cost grows linearly with the offset — page 1 is 5 ms, page 1000 is 1500 ms. The production fix is keyset pagination: keep a cursor (sort_col, id) from the last row of the previous page and query WHERE (sort_col, id) < (cursor_sort, cursor_id) ORDER BY sort_col DESC, id DESC LIMIT 50. Tuple comparison plus a matching composite index gives O(log N + page_size) regardless of depth.
Does ORDER BY guarantee stable order across runs?
Only if the sort key list is unique. SQL engines pick sort algorithms based on the planner's cost model — they may use a parallel quicksort (unstable), an external merge sort (stable), or an in-memory sort (depends on implementation). Two consecutive runs of the same query can return tied rows in different orders. The portable guarantee comes from making ties impossible: end the ORDER BY list with a unique column (typically the primary key). After that, every run produces the same row sequence, every cursor is well-defined, and every pagination boundary is stable.
Practice on PipeCode
- Drill the sorting practice library → for end-to-end ORDER BY problems on multi-column sorts, NULLs, and tie-breakers.
- Rehearse the ORDER BY + LIMIT drills → for pagination, top-K, and keyset cursors.
- Sharpen SQL sorting problems → when the interviewer wants production-style SQL, not pseudocode.
- Pair sorting with NULL handling drills (SQL) → for the dialect-aware NULL positioning patterns.
- Stretch into window function problems → where ORDER BY inside the window drives
ROW_NUMBERandLAG / LEAD. - Practise the company-tagged drills: Meta sorting problems →, Amazon sorting problems →, Walmart sorting problems →.
- For the broader interview surface, read top data engineering interview questions →.
- Stack the prerequisites with the only 5 skills you need to become a data engineer →.
- Take the deep dive with the full SQL for data engineering interviews — from zero to FAANG → course.
- For the design-round muscles, work through ETL system design for DE interviews →.
- Reinforce the modelling side with data modelling for DE interviews →.
Pipecode.ai is Leetcode for Data Engineering — every ORDER BY concept above ships with hands-on practice rooms where you sort real datasets, debug real NULL placement, and trace real keyset cursors against real B-tree indexes. Start with the sorting library and work outward; PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine.





Top comments (0)