DEV Community

Cover image for SQL ORDER BY, NULLS FIRST/LAST & Multi-Column Sorts
Gowtham Potureddi
Gowtham Potureddi

Posted on

SQL ORDER BY, NULLS FIRST/LAST & Multi-Column Sorts

The sql order by clause is the only thing standing between a relational engine and the inconvenient truth that a table has no inherent order. The standard says so explicitly: a SELECT without ORDER BY returns a set, and the engine is free to hand you back rows in whatever sequence the planner finds cheapest — index scan order, hash-bucket order, parallel-worker arrival order, or the physical order of pages on disk after the last VACUUM. The moment you need a deterministic sequence — for pagination, for a report, for a leaderboard, for a window function that depends on row order — you have to say so.

That sounds easy until you start writing order by sql for real systems and meet the four traps that quietly break production queries: sql order by descending applies only to the column it touches (not the whole key list), NULL values sort in different positions on different engines, order by descending sql plus LIMIT can either be a zero-cost index walk or a full table sort depending on whether you indexed the column, and OFFSET-based pagination silently turns into a sequential scan once users page past row 10,000. This guide takes those four traps apart in detail, with worked interview answers, dialect-by-dialect notes for Postgres, MySQL, SQL Server, Snowflake, and BigQuery, and the keyset-pagination pattern every senior data engineering interview now expects you to reach for.

PipeCode blog header for a SQL ORDER BY deep dive — bold white headline 'SQL ORDER BY' with subtitle 'NULLS FIRST/LAST · Multi-Column · Pagination' and a stylised sorted-list ASC/DESC visual on a dark gradient with purple, green, orange, and blue accents and a small pipecode.ai attribution.

When you want hands-on reps immediately after reading, browse sorting practice library →, drill the ORDER BY + LIMIT drills →, and rehearse on SQL sorting problems →.


On this page


1. Why ORDER BY is more subtle than it looks

The relational rule: a table is a set; ORDER BY is the only thing that imposes a sequence

The one-sentence invariant: a SQL table has no order, a SQL result set has no order, and only a top-level ORDER BY produces a deterministic row sequence. Subqueries, CTEs, views, derived tables, and the inputs to set operations like UNION are all unordered — even if you wrote ORDER BY inside them, the optimiser may legally drop it. That subtlety is the seed of every "but it was sorted on my laptop" bug in junior code.

The four things ORDER BY actually does.

  • Imposes a total order on the final result set. The clause runs after SELECT produces rows, so it can reference output columns, aliases, and column positions.
  • Drives LIMIT / OFFSET / FETCH FIRST semantics. "Top 10 by revenue" is meaningless without an ORDER BY revenue DESCLIMIT 10 over an unsorted set is "any 10 rows the engine felt like returning."
  • Feeds window functions that are order-dependent. ROW_NUMBER() OVER (ORDER BY created_at) and LAG() / LEAD() need a deterministic ordering inside the window. (Note: the ORDER BY inside a window clause is unrelated to the outer ORDER BY of the query.)
  • Stabilises pagination. Cursor-based / keyset pagination is built directly on the ORDER BY key list — if the key isn't unique, page boundaries are non-deterministic.

Where ORDER BY fits in the logical execution order.

  • FROM — locate base tables.
  • WHERE — row-level filtering.
  • GROUP BY — collapse rows into groups.
  • HAVING — group-level filtering.
  • SELECT — project columns / aliases / expressions.
  • ORDER BYafter SELECT, so you can sort by aliases.
  • LIMIT / OFFSET — applied last.

That ordering is why order by in sql can reference column aliases (SELECT price * qty AS revenue FROM sales ORDER BY revenue DESC) in nearly every modern dialect — ORDER BY sees the projected columns. The handful of exceptions are old Oracle versions and a couple of niche embedded engines; assume the alias works on Postgres, MySQL 8+, SQL Server, Snowflake, BigQuery, DuckDB, ClickHouse, and Redshift.

Where ORDER BY silently does nothing.

  • Inside a CTE that is not the final query. WITH x AS (SELECT … ORDER BY a) SELECT * FROM x — the inner ORDER BY may be discarded; only the outer SELECT * FROM x ORDER BY a is guaranteed.
  • Inside a view. Postgres in particular warns that a view's ORDER BY is a "hint" and is dropped when the planner inlines the view.
  • Inside a subquery feeding UNION / INTERSECT / EXCEPT. The set operation has no ordering input. The only meaningful ORDER BY is on the outer statement.
  • Inside INSERT … SELECT for some engines. Postgres makes no guarantee that rows are physically inserted in the SELECT order; MySQL preserves it for MyISAM but not InnoDB.

Dialect defaults at a glance.

  • Postgres / Snowflake / BigQuery / Oracle: ASC default → NULLS LAST; DESC default → NULLS FIRST. (Postgres lets you flip with explicit NULLS FIRST / NULLS LAST.)
  • MySQL / SQL Server / SQLite: NULL is treated as the smallest valueASC puts NULLs first, DESC puts NULLs last. No native NULLS FIRST/LAST syntax (added only to recent Snowflake-aligned MySQL variants).
  • DuckDB / Vertica / DB2: support NULLS FIRST/LAST syntax explicitly.

These differences look academic until you migrate a report from Postgres to SQL Server and discover that "top 10 unpaid invoices" suddenly shows a different first row because NULL paid_at values moved.

Visual diagram showing ASC vs DESC defaults — top half shows ORDER BY a, b DESC and explains only b is DESC, a stays ASC (subtle gotcha); bottom half shows the string-vs-numeric sorting trap ('1, 10, 11, 2' vs 1, 2, 10, 11); on a light PipeCode card.

What interviewers listen for.

  • Do you say "a table has no order" without prompting? — senior signal.
  • Do you reach for an explicit NULLS FIRST / NULLS LAST when NULLs are part of the column? — required.
  • Do you mention that order by sql can reference aliases and computed expressions? — useful tutoring signal.
  • Do you flag OFFSET pagination ageing as a known scaling pitfall, and propose keyset pagination? — distinguishing senior answer.

Worked example — does this query have a deterministic order?

Detailed explanation. Junior engineers often assume that a SELECT with a GROUP BY "comes out grouped" — and on small datasets it sometimes does, by accident, because the hash aggregate iterates buckets in insertion order. Move to a parallel plan, switch engines, or grow the data, and the apparent order vanishes. The only way to guarantee output order is ORDER BY on the outer query.

Question. Given the orders table below, which of the following queries has a guaranteed row order?

  1. SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;
  2. SELECT * FROM (SELECT * FROM orders ORDER BY id) AS o;
  3. SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id ORDER BY total DESC;

Input.

id customer_id amount
1 7 50
2 7 30
3 9 10
4 9 80

Code.

-- 1. Group only — undefined order
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;

-- 2. ORDER BY inside a derived table — engine may drop it
SELECT * FROM (SELECT * FROM orders ORDER BY id) AS o;

-- 3. Outer ORDER BY — deterministic
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Query 1 produces two rows but the engine never promised an order. On Postgres a hash aggregate could return 9, 7 or 7, 9 depending on hash collisions and parallel-worker arrival.
  2. Query 2's inner ORDER BY id is not guaranteed to propagate. The standard treats the subquery as a set; the planner is allowed (and on many engines does) discard the ORDER BY.
  3. Query 3 is the only deterministic option — the outer ORDER BY total DESC is binding. If two customers tie on total, the order between them is still undefined; for a fully deterministic answer add a tie-breaker like ORDER BY total DESC, customer_id ASC.

Output.

Query Deterministic? Why
1 No no outer ORDER BY
2 No inner ORDER BY may be dropped
3 Almost — needs tie-breaker outer ORDER BY is binding; ties undefined

Rule of thumb. Every query whose output is consumed by a human, a UI, a CSV download, or a paginated API should end with a top-level ORDER BY whose key list ends in a unique column (typically a primary key) — that final column eliminates ties.

SQL interview question on deterministic ordering

A senior interviewer often shapes this round as: "Here's a paginated leaderboard endpoint that occasionally repeats users across page boundaries. The query is SELECT user_id, score FROM leaderboard ORDER BY score DESC LIMIT 50 OFFSET ?. Why does it repeat, and how would you fix it without breaking the UI contract?"

The fix is to make the order by in sql key list unique by appending a tie-breaker. A single score value shared by ten users will let pages 2 and 3 reshuffle that block on every request — keyset pagination on (score, user_id) is the deterministic answer.

Solution Using ORDER BY score DESC, user_id DESC and a keyset cursor

-- v1: append a unique tie-breaker — fixes deterministic ordering even with OFFSET
SELECT user_id, score
FROM leaderboard
ORDER BY score DESC, user_id DESC
LIMIT 50 OFFSET 100;

-- v2: keyset / cursor-based pagination — stays O(log N) and resists row insertions
SELECT user_id, score
FROM leaderboard
WHERE (score, user_id) < (:cursor_score, :cursor_user_id)
ORDER BY score DESC, user_id DESC
LIMIT 50;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Page Cursor (score, user_id) Predicate Result block
1 (none) LIMIT 50 only top 50 rows ordered by (score DESC, user_id DESC)
2 (980, 9123) — last row of page 1 (score, user_id) < (980, 9123) next 50 rows strictly after the cursor
3 (940, 1734) (score, user_id) < (940, 1734) next 50 rows
New row inserted at score=970 between pages 2 and 3 (940, 1734) predicate unchanged page 3 unaffected — the new row would have been on page 2 only if the cursor was older

Output:

Endpoint Page 2 (OFFSET style) cost Page 2 (keyset) cost Behaviour under inserts
Leaderboard top 1000 O(N) — engine scans 100 rows it then discards O(log N + 50) — index seek + 50 scans OFFSET shifts rows; keyset stable per cursor

Why this works — concept by concept:

  • Unique tie-breaker — appending user_id to the sort key collapses every tie group of length > 1 into a strict total order. Without it, the optimiser is free to return tied rows in any sequence each time the query runs.
  • Tuple comparison(score, user_id) < (s, u) is a single tuple comparison; Postgres, MySQL 8+, SQLite, and DuckDB all rewrite it to the lexicographic equivalent so a (score DESC, user_id DESC) index can serve both the comparison and the sort with no row sort step.
  • Keyset vs OFFSET — OFFSET pagination scans-and-discards every row up to the offset; keyset uses the index to seek directly to the cursor and walks forward. The cost gap explodes past page 100.
  • Insert resilience — keyset pagination is anchored to the last row's key, so new rows inserted after the cursor is captured do not shift page boundaries.
  • Cost — OFFSET pagination = O(offset + page_size); keyset pagination = O(log N + page_size). For 10k+ rows of pagination depth, the gap is 100× to 10,000×.

SQL
Topic — order by + limit
ORDER BY + LIMIT and pagination problems (SQL)

Practice →

Where ORDER BY is silently dropped — the four "looked sorted on my laptop" bugs

Junior engineers usually meet order by in sql through the obvious top-level SELECT … ORDER BY … form. The bugs start when ORDER BY appears inside something else — a CTE, a view, a subquery, an INSERT … SELECT — and the engine quietly drops it. Knowing exactly where ORDER BY is binding and where it isn't is the difference between a query that ships and a query that ships and still works after the planner upgrades.

Bug 1 — ORDER BY inside a CTE.

WITH ranked AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
    FROM logins
    ORDER BY created_at DESC      -- this is dropped
)
SELECT * FROM ranked WHERE rn = 1;
Enter fullscreen mode Exit fullscreen mode

The ORDER BY inside the CTE is removed by the planner. The good news is the ROW_NUMBER() OVER (… ORDER BY created_at DESC) is independent of the outer ORDER BY — window functions have their own ordering. The bug fires only when someone assumes the CTE's rows arrive in the outer query in sorted order.

Bug 2 — ORDER BY in a view.

Postgres explicitly warns "ORDER BY in a view is a hint; it may be dropped." The fix is to put ORDER BY on the outer query that selects from the view, not on the view definition itself.

Bug 3 — ORDER BY in a subquery feeding UNION.

(SELECT id, name FROM users WHERE region = 'APAC' ORDER BY name)
UNION ALL
(SELECT id, name FROM users WHERE region = 'EMEA' ORDER BY name);
Enter fullscreen mode Exit fullscreen mode

The inner ORDER BYs are decoration. UNION ALL outputs rows in whatever order is cheapest. To get a final sort, wrap the whole thing: SELECT * FROM (… UNION ALL …) t ORDER BY name.

Bug 4 — INSERT … SELECT … ORDER BY.

Most engines do not guarantee that rows are inserted in the SELECT's order. Sequential primary keys may not match the ORDER BY order. The only guarantee is the final SELECT's order if you re-query with an ORDER BY.

Worked example — surface the row's rank correctly across a CTE

Detailed explanation. A common pattern: select the latest record per user. The naive impulse is to ORDER BY inside a CTE and pick the first row per user. The robust pattern uses ROW_NUMBER() because the window function's own ORDER BY is binding, while a CTE-level ORDER BY may be dropped.

Question. From a logins table, write a query that returns the most recent login row per user_id, plus the running rank of each login per user. The result must be stable on Postgres, MySQL 8+, and Snowflake.

Input.

user_id login_at
7 2026-05-25 09:00
7 2026-05-27 14:00
7 2026-05-26 08:00
9 2026-05-22 10:00
9 2026-05-28 11:00

Code.

WITH ranked AS (
    SELECT
        user_id,
        login_at,
        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_at DESC) AS rn
    FROM logins
)
SELECT user_id, login_at
FROM ranked
WHERE rn = 1
ORDER BY user_id ASC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Inside the CTE the window's ORDER BY ranks each user's logins from newest to oldest — rn = 1 is the most recent.
  2. There is intentionally no outer-CTE ORDER BY — adding one would be ignored anyway, and it would mislead the reader.
  3. The outer SELECT filters down to one row per user, then imposes a final, binding ORDER BY on user_id to produce a deterministic result set.
  4. Using ROW_NUMBER() rather than RANK() guarantees exactly one row per user even when two logins land on the same timestamp; add a unique tie-breaker (e.g. login_id ASC) inside the OVER clause if your data permits ties.

Output.

user_id login_at
7 2026-05-27 14:00
9 2026-05-28 11:00

Rule of thumb. Put the ORDER BY where it binds: inside OVER (… ORDER BY …) for window functions, and on the outermost SELECT for the final result. Anywhere else, treat it as decoration that the planner is free to discard.


2. ASC vs DESC and the default behaviour

sql order by descending is per-column, not per-key-list — every column needs its own DESC

The single most common ORDER BY bug is reading ORDER BY a, b DESC as "sort both columns descending" — the actual semantics are "sort by a ascending (the default), then within each tie, sort by b descending." DESC binds only to the column immediately before it, exactly like a unary postfix operator. To sort every column descending you have to write each DESC explicitly.

The minimum rules for ASC and DESC.

  • ASC is the default. ORDER BY a is identical to ORDER BY a ASC.
  • DESC is per-column. ORDER BY a, b DESCORDER BY a ASC, b DESC. To reverse both, write ORDER BY a DESC, b DESC.
  • Direction interacts with collation. Strings sort according to the column's collation — 'A' and 'a' may compare equal under case-insensitive collations and unequal under binary collations.
  • Direction does not affect equality. ORDER BY x DESC does not change how x = y is evaluated elsewhere; it only flips the comparator inside the sort.

The three traps every junior engineer hits at least once.

  • The DESC-binds-to-one trap. ORDER BY a, b DESC → "I asked for both DESC, but only b is."
  • The stringly-typed-number trap. ORDER BY id_text on a VARCHAR column with values '1', '2', '10', '11' yields '1', '10', '11', '2' — the lexicographic order. Fix: ORDER BY CAST(id_text AS INTEGER) or fix the schema.
  • The collation trap. ORDER BY name on a Postgres column with C collation sorts 'Apple', 'apple', 'banana' as 'Apple', 'apple', 'banana' (uppercase first); with en_US.utf8 it sorts case-insensitively as 'Apple', 'apple', 'banana' or 'apple', 'Apple', 'banana' depending on locale ties.

Computed expressions in ORDER BY.

You can ORDER BY any expression, not just a bare column. The engine evaluates the expression after the SELECT projection and sorts on the result. This is how you build:

  • Case-driven custom order. ORDER BY CASE WHEN status = 'urgent' THEN 0 WHEN status = 'normal' THEN 1 ELSE 2 END, created_at DESC.
  • Sort by length, hash, modulo. ORDER BY LENGTH(name), ORDER BY MOD(user_id, 7), ORDER BY MD5(email) (deterministic shuffle for sampling).
  • Sort by alias. SELECT price * quantity AS revenue … ORDER BY revenue DESC.
  • Sort by column position. ORDER BY 1, 2 — concise but fragile; if someone re-orders the SELECT list, the ORDER BY silently changes. Acceptable in throwaway queries, dangerous in stored procedures.

Direction with mixed columns — the worked patterns.

Goal Correct ORDER BY
Newest first, then alphabetical name created_at DESC, name ASC
Highest revenue, ties broken oldest first revenue DESC, created_at ASC, id ASC
Status priority, then SLA breach, then id CASE status WHEN 'urgent' THEN 0 WHEN 'normal' THEN 1 ELSE 2 END, sla_breach_at ASC, id ASC
Lexicographic reverse of name name DESC (binary collation) or REVERSE(name) ASC

Worked example — fix a "both columns descending" bug

Detailed explanation. Production-bug story: a paginated table view was supposed to show "most recent first, highest priority first." The reviewer wrote ORDER BY created_at, priority DESC and shipped. The list looked roughly right because most recent rows happened to also be high priority — until a backlog of week-old high-priority tickets quietly drifted off the first page. The cause was the DESC binding only to priority; created_at was still ascending.

Question. Given the tickets table, write the ORDER BY clause that sorts by created_at descending then priority descending (1 = highest), so the most recent high-priority ticket appears first.

Input.

id created_at priority summary
1 2026-05-25 09:00 1 DB outage
2 2026-05-26 11:00 2 UI bug
3 2026-05-26 14:00 1 API 500
4 2026-05-27 09:00 3 typo
5 2026-05-27 10:00 1 payment fail

Code.

-- WRONG — only priority is DESC; created_at stays ASC
SELECT * FROM tickets ORDER BY created_at, priority DESC;

-- RIGHT — both columns explicitly DESC
SELECT * FROM tickets ORDER BY created_at DESC, priority DESC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. In the wrong version, ORDER BY created_at, priority DESC is parsed as created_at ASC, priority DESC. The oldest row (id=1) appears first.
  2. In the right version, created_at DESC puts the newest day first; within the newest day, priority DESC (where 1 = highest priority means 1 < 3) actually puts priority=3 first. Watch for this — if "priority" is a small-is-high-priority numeric value you usually want priority ASC ("smallest priority number first").
  3. Re-reading the requirement — "highest priority first" with priority=1 = highest — the correct sort is created_at DESC, priority ASC, id ASC.

Output (corrected).

id created_at priority
5 2026-05-27 10:00 1
4 2026-05-27 09:00 3
3 2026-05-26 14:00 1
2 2026-05-26 11:00 2
1 2026-05-25 09:00 1

Rule of thumb. When the column semantics are "smaller number = better" (priority 1 = highest, rank 1 = top), use ASC. The instinct to write DESC for "I want the best first" is the bug — sort direction must match the numeric semantics, not the English semantics.

SQL interview question on string sorting

The probe usually sounds like: "We have a version column of type VARCHAR with values like '1.2.0', '1.10.1', '1.2.3'. The product team complains that ORDER BY version returns 1.10.1 before 1.2.3. Why, and how do you fix it?"

This is the lexicographic-vs-natural-sort trap. SQL string sort is character-by-character on the collation's comparator — '1.10.1' and '1.2.3' are compared character 1 ('1' = '1'), character 2 ('.' = '.'), character 3 ('1' < '2') — so '1.10.1' sorts first.

Solution Using STRING_TO_ARRAY + numeric cast

-- Postgres-native: parse and cast the version string to an int array, sort on the array
SELECT id, version
FROM releases
ORDER BY
    string_to_array(version, '.')::int[],   -- natural-sort
    id ASC;                                 -- unique tie-breaker
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

version (string) string_to_array → int[] sort key
1.2.0 {1, 2, 0} (1, 2, 0)
1.10.1 {1, 10, 1} (1, 10, 1)
1.2.3 {1, 2, 3} (1, 2, 3)
2.0.0 {2, 0, 0} (2, 0, 0)

After the array sort:

  1. (1, 2, 0) — major=1, minor=2, patch=0.
  2. (1, 2, 3) — same major+minor, patch=3 wins over patch=0.
  3. (1, 10, 1) — major=1, minor=10 sorts after minor=2 because integers, not strings.
  4. (2, 0, 0) — different major bucket.

Output:

id version
11 1.2.0
12 1.2.3
19 1.10.1
41 2.0.0

Why this works — concept by concept:

  • Natural-sort vs lexicographic sort — version strings are dotted numeric tuples, not free text. Comparing them character-by-character mis-orders any minor that crosses the 10/100 boundary. Parsing them into integer arrays restores the natural semantics.
  • Array comparison in Postgresint[] arrays compare element-by-element (lexicographic on integers), which is exactly the multi-column tuple comparison you want for (major, minor, patch).
  • Schema-level fix — if you control the schema, store (major INT, minor INT, patch INT) columns separately and ORDER BY major, minor, patch. The query-time string_to_array cast is a workaround for tables you don't own.
  • Cross-engine equivalents — MySQL: ORDER BY CAST(SUBSTRING_INDEX(version, '.', 1) AS UNSIGNED), CAST(SUBSTRING_INDEX(SUBSTRING_INDEX(version, '.', 2), '.', -1) AS UNSIGNED), …. SQL Server: PARSENAME(REPLACE(version, '.', ' '), …). Snowflake / BigQuery: SPLIT(version, '.') then cast each element.
  • Cost — natural sort is O(N log N) on K-element arrays = O(N · K · log N); for typical version strings K ≤ 4, so essentially linear-log.

SQL
Topic — sorting (SQL)
Sorting problems (SQL)

Practice →

Collation and case sensitivity — when "alphabetical" disagrees with itself

ORDER BY name looks innocent, but in production it interacts with the column's collation: the rule the engine uses to compare characters. Two sound-alike rows can sort in a different relative position on different engines, or even on the same engine after a locale upgrade. Knowing the collation surface is the difference between a stable report and a list that subtly reshuffles after the next OS / glibc upgrade.

The collation triad.

  • Character set — the alphabet (ASCII, Latin-1, UTF-8). Determines which bytes are legal.
  • Collation — the comparison rule (C / en_US.utf8 / de_DE.utf8 / ja_JP). Determines comparison order.
  • Case sensitivity — a property of the collation. C is case-sensitive (uppercase before lowercase by ASCII code). en_US.utf8 is locale-aware and dictionary-style (typically case-insensitive on Postgres for ties).

Sorting strings — the four common gotchas.

  • Case order flips by collation. 'Apple' < 'apple' on C, but 'Apple' = 'apple' on a case-insensitive collation.
  • Accent-folding. 'café' < 'cafz' on C (because the é byte is high). On a locale-aware collation, 'café' < 'cafz' may compare like 'cafe' < 'cafz' — same direction but for different reasons.
  • Diacritic order varies by language. Swedish puts 'å' after 'z', German puts 'ä' near 'a'.
  • Numbers as strings. Already covered: '1', '10', '11', '2' lexicographic; cast to integer to fix.

Per-query collation overrides.

Postgres lets you ORDER BY with an explicit collation:

-- Force binary (byte-wise) compare
SELECT name FROM users ORDER BY name COLLATE "C";

-- Force English locale-aware compare
SELECT name FROM users ORDER BY name COLLATE "en_US.utf8";
Enter fullscreen mode Exit fullscreen mode

SQL Server uses COLLATE Latin1_General_BIN or COLLATE Latin1_General_CI_AS. MySQL uses ORDER BY name COLLATE utf8mb4_bin. The principle is identical across engines: collation is a property of the comparison, and it can be overridden per query.

Worked example — case-insensitive sort with deterministic tie-break

Detailed explanation. A dropdown of "tags" must sort case-insensitively ('JavaScript' and 'javascript' should appear next to each other), but the output must be stable across query runs (so a UI snapshot test doesn't flake). Lowercasing inside ORDER BY plus a unique tie-breaker is the cleanest pattern.

Question. Given the tags table, write an ORDER BY that sorts tags case-insensitively in ascending order, with a deterministic tie-break.

Input.

id name
1 JavaScript
2 python
3 Java
4 javascript
5 Python

Code.

SELECT id, name
FROM tags
ORDER BY LOWER(name) ASC, id ASC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. LOWER(name) normalises every row's name to lowercase for comparison purposes only — the SELECT still returns the original mixed-case name.
  2. The sort happens on the lowercased string, so 'JavaScript' and 'javascript' are equal and 'Java' < 'JavaScript' < 'Python' < 'python'.
  3. Wait — 'JavaScript' and 'javascript' both lowercase to 'javascript', so they tie. The id ASC tie-breaker resolves: id=1 (JavaScript) before id=4 (javascript).
  4. The same logic applies to the two Python rows.
  5. A locale-aware alternative is ORDER BY name COLLATE "en_US.utf8" on Postgres or ORDER BY name COLLATE utf8mb4_general_ci on MySQL — same result, different implementation.

Output.

id name
3 Java
1 JavaScript
4 javascript
2 python
5 Python

Rule of thumb. When user-visible text might mix cases, ORDER BY on LOWER() (or a case-insensitive collation) plus a unique tie-breaker. The collation approach is faster (no function call) when an expression index exists; the LOWER() approach is more portable.


3. NULLS FIRST / NULLS LAST across dialects

Where NULL sorts is a dialect decision, and the answer is different on Postgres, MySQL, and SQL Server

NULL is not a value — it's the absence of a value — which means equality comparisons against NULL are themselves NULL (NULL = NULL is not true). For sorting, every engine had to make a pragmatic decision: pick a position. They picked differently, and the choice is part of the SQL personality of each dialect.

Visual dialect-matrix of NULL positioning in ORDER BY — four columns (Postgres, MySQL, SQL Server, Snowflake/BigQuery), two rows (ASC default, DESC default); each cell shows where NULL goes and the supported syntax; a workaround pill (ORDER BY col IS NULL, col) at the bottom; on a light PipeCode card.

The dialect matrix in five bullets.

  • PostgreSQL. Default behaviour: ASC → NULLS LAST; DESC → NULLS FIRST. Explicit NULLS FIRST / NULLS LAST syntax is fully supported. Treats NULL as "infinitely large" by default.
  • MySQL (8.0+). Default behaviour: NULL is treated as smaller than any valueASC puts NULLs first; DESC puts NULLs last. No native NULLS FIRST / NULLS LAST syntax (the recent 8.x line accepts the keywords only on certain function returns, not on plain columns).
  • SQL Server. Default behaviour: NULL is the smallest value → ASC puts NULLs first; DESC puts NULLs last. No NULLS FIRST / NULLS LAST keywords. Use CASE WHEN col IS NULL THEN 1 ELSE 0 END or IIF(col IS NULL, ...) as a workaround.
  • Snowflake / BigQuery. Default behaviour: ASC → NULLS LAST; DESC → NULLS FIRST, with full NULLS FIRST / NULLS LAST syntax. Same direction as Postgres.
  • Oracle / DB2. Default behaviour and syntax match Postgres: ASC → NULLS LAST default; explicit NULLS FIRST / NULLS LAST works.

The universally-portable workaround.

ORDER BY col IS NULL, col — works on every dialect because IS NULL returns a boolean (true / false), which sorts false (0) before true (1). So:

  • ORDER BY col IS NULL, col ASC → non-NULLs ascending, NULLs last.
  • ORDER BY col IS NULL DESC, col DESC → NULLs first, non-NULLs descending.
  • ORDER BY col IS NOT NULL, col DESC → NULLs first, non-NULLs descending (alternate phrasing).

This is the safe pattern for cross-dialect queries, ORM-generated SQL that may run on multiple backends, and migration windows where you're moving from one engine to another.

Common NULL traps in ORDER BY.

  • Joining sparse columns. A LEFT JOIN against payments produces paid_at = NULL for unpaid orders. ORDER BY paid_at DESC on Postgres puts NULLs first — your "most recent payment" report leads with unpaid orders. Fix: ORDER BY paid_at DESC NULLS LAST.
  • NULL in a pagination key. Keyset pagination falls over if any sort-key column can be NULL — the (col, id) < (?, ?) tuple comparison treats NULL specially. Either filter NULLs out, or use COALESCE(col, sentinel).
  • NULL in a window function ORDER BY. ROW_NUMBER() OVER (ORDER BY last_login) numbers users with NULL last_login as either the lowest or highest depending on engine — verify before relying on it.

The portable patterns side-by-side.

Goal Postgres / Snowflake / BigQuery MySQL SQL Server Universal
NULLs last, ascending ORDER BY col ASC NULLS LAST ORDER BY col IS NULL, col ASC ORDER BY CASE WHEN col IS NULL THEN 1 ELSE 0 END, col ASC ORDER BY col IS NULL, col ASC
NULLs first, descending ORDER BY col DESC NULLS FIRST ORDER BY col IS NULL DESC, col DESC ORDER BY CASE WHEN col IS NULL THEN 0 ELSE 1 END, col DESC ORDER BY col IS NULL DESC, col DESC
NULLs first, ascending ORDER BY col ASC NULLS FIRST ORDER BY col ASC (NULL is smallest) ORDER BY col ASC (NULL is smallest) ORDER BY col IS NULL DESC, col ASC
NULLs last, descending ORDER BY col DESC NULLS LAST ORDER BY col DESC (NULL is smallest, so last in DESC) ORDER BY col DESC (same) ORDER BY col IS NULL, col DESC

Worked example — "most recent payment, blanks at the bottom"

Detailed explanation. A reporting query joins orders with payments and shows orders sorted by paid_at descending. Some orders haven't been paid yet, so paid_at IS NULL. On Postgres the default would place those NULL rows first, which is the opposite of what the reviewer wants.

Question. Write the ORDER BY clause so that paid orders appear first, most-recently-paid at the top, and unpaid orders sink to the bottom, on Postgres, MySQL, and as a portable variant.

Input.

id paid_at
1 2026-05-27
2 NULL
3 2026-05-25
4 NULL
5 2026-05-26

Code.

-- Postgres / Snowflake / BigQuery / Oracle — explicit NULLS LAST
SELECT id, paid_at
FROM orders
ORDER BY paid_at DESC NULLS LAST, id ASC;

-- MySQL — no NULLS LAST syntax; use the IS NULL trick
SELECT id, paid_at
FROM orders
ORDER BY paid_at IS NULL, paid_at DESC, id ASC;

-- SQL Server — no NULLS LAST; use a CASE expression
SELECT id, paid_at
FROM orders
ORDER BY
    CASE WHEN paid_at IS NULL THEN 1 ELSE 0 END,
    paid_at DESC,
    id ASC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The Postgres version is the cleanest. paid_at DESC NULLS LAST flips the engine's default (which would put NULLs first in DESC).
  2. The MySQL version uses paid_at IS NULL as a boolean — false (0) sorts before true (1), so non-NULLs come first.
  3. The SQL Server version is functionally identical to MySQL — CASE WHEN col IS NULL THEN 1 ELSE 0 END is a 0/1 sort key prepended to the real sort.
  4. The unique tie-breaker id ASC at the end ensures a deterministic order between any two orders paid on the same day.

Output.

id paid_at position
1 2026-05-27 1 (most recent paid)
5 2026-05-26 2
3 2026-05-25 3
2 NULL 4 (NULLs LAST)
4 NULL 5

Rule of thumb. Whenever you ORDER BY a column that can be NULL, ask the requirement explicitly: "Where do the NULLs go?" — and write NULLS LAST / NULLS FIRST (or the IS NULL workaround) accordingly. Default behaviour is dialect-dependent and is the single most common source of "the report changed when we migrated" bugs.

SQL interview question on NULL positioning across dialects

A senior probe sounds like: "We're moving a report from MySQL to Postgres. The query has ORDER BY last_login DESC LIMIT 100. The product team reports the top of the list has changed. Explain why and write a portable fix."

The cause is the dialect difference: MySQL treats NULL as the smallest value, so last_login DESC puts NULLs last. Postgres treats NULL as the largest by default, so last_login DESC puts NULLs first. Suddenly the top of the list is full of last_login = NULL (i.e. users who never logged in) instead of the recently-active users the report wanted.

Solution Using a portable IS NULL sort prefix

-- Portable form — works identically on Postgres, MySQL, SQL Server, Snowflake, BigQuery
SELECT user_id, last_login
FROM users
ORDER BY
    last_login IS NULL,         -- false (0) first → non-NULLs lead
    last_login DESC,
    user_id ASC                  -- unique tie-breaker
LIMIT 100;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

user_id last_login IS NULL → sort key
7 2026-05-27 09:00 0 (0, 2026-05-27 09:00)
9 NULL 1 (1, NULL)
4 2026-05-27 12:00 0 (0, 2026-05-27 12:00)
2 2026-05-26 08:00 0 (0, 2026-05-26 08:00)
11 NULL 1 (1, NULL)

After sorting by the two-key tuple (IS NULL ASC, last_login DESC, user_id ASC):

  1. All rows with IS NULL = 0 come first, sorted by last_login DESC.
  2. Within those, ties on last_login (none here) would be resolved by user_id ASC.
  3. The two last_login = NULL rows come last, ordered by user_id ASC because all other keys are equal.

Output:

user_id last_login
4 2026-05-27 12:00
7 2026-05-27 09:00
2 2026-05-26 08:00
9 NULL
11 NULL

Why this works — concept by concept:

  • Boolean as sort keyIS NULL returns a boolean; every modern engine sorts booleans as 0/1 (or false/true), and false sorts before true. That gives you a free, portable "NULLs last" prefix.
  • Decoupling NULL placement from value direction — the IS NULL term decides position (last / first); the last_login DESC term decides direction. They no longer fight each other.
  • Deterministic tie-breakuser_id ASC on the end guarantees a stable, repeatable order even across multiple NULL rows.
  • Portability — the same query runs identically on every dialect that supports IS NULL, which is all of them. No more migration surprises.
  • Cost — adding a leading boolean key barely affects sort cost — it's a single byte of comparison per row. The dominant cost is still the index scan or external sort on last_login.

SQL
Topic — NULL handling (SQL)
NULL handling and ordering (SQL)

Practice →

COALESCE as a NULL-aware sort sentinel — when you need NULLs to participate in the comparison

The IS NULL boolean prefix is the cleanest way to position NULLs at one end. The other pattern is COALESCE(col, sentinel) — replacing NULLs with a chosen value so they sort inside the value range rather than at one end. This is useful when "missing" should mean "treat as default value X" (zero, infinity, today, end-of-time).

When to reach for which.

  • NULLs at the end / start (no value semantics). Use IS NULL boolean prefix or NULLS FIRST/LAST. Cleanest.
  • NULLs should behave like a specific value. Use COALESCE(col, sentinel). Lets the NULL "blend in" with the real values.
  • NULL means "earliest possible". COALESCE(start_date, '1900-01-01') — places NULL-start records ahead of everything else.
  • NULL means "latest possible / not yet expired". COALESCE(end_date, '9999-12-31') — places NULL-end records after every real end date.

Example pattern.

-- Sort subscriptions by end_date ascending; NULL means "not expired" → treat as far future
SELECT subscription_id, end_date
FROM subscriptions
ORDER BY COALESCE(end_date, DATE '9999-12-31') ASC, subscription_id ASC;
Enter fullscreen mode Exit fullscreen mode

This is functionally equivalent to ORDER BY end_date ASC NULLS LAST on Postgres, but stays consistent on engines without that syntax. It is also the right pattern when "NULL" carries a domain meaning (active subscription, infinite end date) that you want to make explicit in the sort.

Worked example — sort users by subscription status with a sensible NULL default

Detailed explanation. A reporting view of subscriptions: paying customers first (sorted by amount descending), then trial users (no amount yet, but they have a trial_started_at), then churned customers (canceled_at IS NOT NULL). NULL amount for trials must place them after paying customers, not before.

Question. Sort subscriptions so paying customers appear first by amount descending, then trial customers (NULL amount) by trial_started_at ascending, then canceled customers last by canceled_at descending. Make the final order deterministic.

Input.

id status amount trial_started_at canceled_at
1 paying 99 NULL NULL
2 paying 199 NULL NULL
3 trial NULL 2026-05-25 NULL
4 trial NULL 2026-05-26 NULL
5 canceled NULL NULL 2026-05-20
6 canceled NULL NULL 2026-05-22

Code.

SELECT id, status, amount, trial_started_at, canceled_at
FROM subscriptions
ORDER BY
    CASE status
        WHEN 'paying'   THEN 0
        WHEN 'trial'    THEN 1
        WHEN 'canceled' THEN 2
        ELSE                 3
    END,
    amount DESC NULLS LAST,
    trial_started_at ASC NULLS LAST,
    canceled_at DESC NULLS LAST,
    id ASC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The CASE expression maps the categorical status to a numeric priority bucket. Paying customers (bucket 0) sort before trial customers (bucket 1) and canceled customers (bucket 2).
  2. Within each bucket only one of the remaining keys is meaningful — paying rows have non-NULL amount, trial rows have non-NULL trial_started_at, canceled rows have non-NULL canceled_at. NULLS LAST ensures NULLs in a column don't drift ahead of the meaningful values.
  3. The final id ASC is the deterministic tie-breaker; without it, two paying customers with the same amount would be in undefined order.
  4. On engines without NULLS LAST (MySQL, SQL Server), substitute IS NULL boolean prefix per term — e.g. amount IS NULL, amount DESC — but readability suffers.

Output.

id status amount trial_started_at canceled_at
2 paying 199 NULL NULL
1 paying 99 NULL NULL
3 trial NULL 2026-05-25 NULL
4 trial NULL 2026-05-26 NULL
6 canceled NULL NULL 2026-05-22
5 canceled NULL NULL 2026-05-20

Rule of thumb. When a column has a domain-specific meaning for NULL (active / infinite / unknown / pending), encode that meaning either in a CASE bucket (categorical) or a COALESCE sentinel (continuous). Don't ship a sort that depends on the engine's default NULL position when the data has a real intent.


4. Multi-column sorts and tie-breaking

Multi-column ORDER BY is a cascade — later keys only matter when earlier keys tie

The mental model in one line: each column in the ORDER BY list resolves the ties left by the columns before it. Until ties exist, later columns are irrelevant; once a tie group forms, the next column gets a chance. The full sort is stable across the whole key list as long as the last column produces no ties (i.e. is unique).

Visual diagram of multi-column sort tie-breaking — left a mini-table sorted first by country (ASC), tied rows then sorted by city (ASC), tied rows then sorted by name (ASC); arrows show how ties cascade through keys; a small annotation about pagination + unique tie-breaker; on a light PipeCode card.

The five sort-list patterns every interview probes.

  • Lexicographic on a tuple. ORDER BY country, city, name — strict left-to-right tie resolution.
  • Mixed directions. ORDER BY rating DESC, price ASC, created_at DESC — each column has its own direction; they're independent.
  • Computed expressions. ORDER BY CASE WHEN status='urgent' THEN 0 ELSE 1 END, sla_at — case-driven priority then ascending SLA.
  • Column position. ORDER BY 1, 2 DESC — same semantics, terse, fragile.
  • Stable pagination. ORDER BY created_at DESC, id DESC — always end the key list with a unique column for cursor stability.

Why "stable sort" is not a thing you can rely on.

Computer scientists talk about stable sorts (rows that compare equal preserve their input order) and unstable sorts (no such guarantee). SQL engines pick algorithms based on the planner's cost model — they may use a parallel quicksort (unstable), an external merge sort (stable), or an in-memory sort (depends on implementation). Two consecutive runs of the same query can therefore return tied rows in different orders. The portable fix is to make ties impossible by adding a unique tie-breaker column (typically the primary key) as the last sort term.

Tie-breakers and pagination — the cursor invariant.

Cursor-based pagination keeps a "last seen row's sort key" and queries WHERE (sort_keys) < (cursor). If the sort key list is not unique, the cursor straddles ties — page 2 may re-show rows from page 1, or skip rows entirely. The rule is non-negotiable: the last column in the ORDER BY list must be unique.

Composable case-when patterns.

You can drive arbitrary priority orders with CASE expressions inside the ORDER BY:

-- "Active first, then suspended, then deleted; within each, by latest activity"
ORDER BY
    CASE status
        WHEN 'active' THEN 0
        WHEN 'suspended' THEN 1
        ELSE 2
    END,
    last_active_at DESC,
    user_id ASC;
Enter fullscreen mode Exit fullscreen mode

This is functionally equivalent to mapping status to a numeric priority, then sorting numerically. It composes cleanly with DESC per term, and the engine evaluates the CASE once per row during the sort.

Worked example — three-key cascade with mixed direction

Detailed explanation. A common report: "Top sellers per region, by revenue, with newest-first as the inner tie-break." The natural key is (region, revenue, created_at) — but the directions differ per column: region ascending (alphabetical), revenue descending, created_at descending, and a final unique id to keep things deterministic.

Question. Given the sellers table, write the ORDER BY clause that produces "region ascending, revenue descending, then most recent first, then deterministic by id."

Input.

id region revenue created_at
1 APAC 100 2026-05-10
2 APAC 100 2026-05-12
3 EMEA 200 2026-05-09
4 EMEA 150 2026-05-15
5 APAC 80 2026-05-13

Code.

SELECT id, region, revenue, created_at
FROM sellers
ORDER BY
    region      ASC,
    revenue     DESC,
    created_at  DESC,
    id          ASC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. region ASC produces two buckets: APAC (ids 1, 2, 5) and EMEA (ids 3, 4).
  2. Inside the APAC bucket, revenue DESC ranks revenue=100 above revenue=80. So far the order is APAC/100 (ids 1, 2), APAC/100 (tie), then APAC/80.
  3. The two APAC/100 rows tie on revenue. created_at DESC resolves: id=2 (2026-05-12) before id=1 (2026-05-10).
  4. The final id ASC doesn't change anything in this dataset (no remaining ties) but is the safety net.
  5. EMEA bucket: only two rows, revenue=200 beats revenue=150. No ties.

Output.

position id region revenue created_at
1 2 APAC 100 2026-05-12
2 1 APAC 100 2026-05-10
3 5 APAC 80 2026-05-13
4 3 EMEA 200 2026-05-09
5 4 EMEA 150 2026-05-15

Rule of thumb. Read the requirement aloud; for every "then" in English add a column to the ORDER BY. End with a unique column — there is no exception worth defending.

SQL interview question on stable cursor pagination

A senior probe: "Our infinite-scroll feed sometimes shows the same post twice. The query is SELECT * FROM posts ORDER BY popularity DESC LIMIT 50 OFFSET ?. Walk through why it duplicates and write the keyset-pagination fix."

The duplicate root cause is that popularity is not unique — hundreds of posts share popularity = 500. Between page 1 and page 2 the engine returns the tied block in different orders (because sort is not stable across runs), so a row in position 50 on page 1 can re-appear in position 1 on page 2. The fix is a unique tie-breaker and keyset pagination on the tuple.

Solution Using (popularity, id) keyset cursor

-- Page 1
SELECT id, title, popularity
FROM posts
ORDER BY popularity DESC, id DESC
LIMIT 50;

-- Page N (anchored to the last row of the previous page)
SELECT id, title, popularity
FROM posts
WHERE (popularity, id) < (:cursor_popularity, :cursor_id)
ORDER BY popularity DESC, id DESC
LIMIT 50;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Page Cursor (popularity, id) Predicate Rows returned
1 (none) none top 50 by (popularity DESC, id DESC)
2 (500, 9123) — last row of page 1 (popularity, id) < (500, 9123) next 50 strictly less than the cursor
3 (480, 1734) — last row of page 2 (popularity, id) < (480, 1734) next 50

The tuple comparison (popularity, id) < (P, I) is shorthand for "popularity < P OR (popularity = P AND id < I)" — exactly the lexicographic break the multi-column sort defines.

Output:

Pagination strategy Duplicate / skip risk Deepest-page cost Resilient to inserts?
LIMIT 50 OFFSET 1000 (no tie-breaker) high — ties reorder O(offset + page) no — new rows shift pages
LIMIT 50 OFFSET 1000 + id DESC tie-breaker none O(offset + page) no — same shift
Keyset (popularity, id) < (?, ?) none O(log N + page) yes — cursor anchored

Why this works — concept by concept:

  • Unique tie-breaker — appending id to the sort key collapses every popularity tie group to a strict total order. Two consecutive runs return rows in the same order.
  • Tuple comparison(popularity, id) < (P, I) expresses the lexicographic compare exactly; Postgres rewrites it to use a (popularity DESC, id DESC) index seek in one step.
  • Index alignment — a B-tree on (popularity DESC, id DESC) can serve the query with a single seek to the cursor plus a 50-row forward walk. No sort step, no temp buffer.
  • Cursor stability under inserts — rows inserted after the cursor's popularity / id pair fall to a later page automatically; rows inserted before the cursor would have appeared on earlier pages — they don't retroactively shift the current page boundary.
  • Cost — keyset pagination is O(log N + page_size) regardless of page depth, vs OFFSET pagination's O(offset + page_size) which degrades linearly with depth.

SQL
Topic — pagination
Pagination problems (SQL, keyset cursors)

Practice →

ORDER BY by column position — concise, but treat it like a footgun

ORDER BY 1, 2 DESC references the column position in the SELECT list rather than the column name. It is part of the SQL standard, supported on every modern engine, and saves typing — but it is fragile in production code and a known reviewer red flag in long-lived queries.

Why it tempts people.

  • Short. ORDER BY 1, 2 is shorter than ORDER BY product_category, sales_total.
  • No alias collisions. When SELECT contains computed expressions, positional ORDER BY avoids the need to alias them.
  • Quick exploration. Ad-hoc analytics in a notebook benefit from terseness.

Why production code avoids it.

  • Reordering SELECT silently changes the sort. Someone reorders the SELECT list for readability — ORDER BY 1, 2 now sorts on completely different columns. No syntax error, just wrong results.
  • GROUP BY by position has the same brittleness. Mixing positional GROUP BY and positional ORDER BY multiplies the risk.
  • Linters / formatters often discourage it. SQLFluff defaults to flagging it; sqlformat adds a warning comment.

The two production-acceptable uses are:

  • Computed expressions you do not want to alias. SELECT a + b, c FROM t ORDER BY 1 DESC — the expression is unique to the query, and aliasing it just to reference it in ORDER BY adds noise.
  • Throwaway analytics in a notebook. Anything you wouldn't commit to git.

For long-lived queries, always sort by name or by alias.

Worked example — refactor a fragile positional ORDER BY

Detailed explanation. A reviewer flags the query below for using positional ORDER BY. The risk is real — a future contributor reorders the SELECT list to put region before revenue, and the report silently starts sorting by region first.

Question. Refactor SELECT region, revenue, customers FROM regional_sales ORDER BY 2 DESC, 1 ASC, 3 DESC; to use named columns and add a deterministic tie-breaker.

Input.

region revenue customers
APAC 1200 30
EMEA 900 22
APAC 1200 40
AMER 1200 25
EMEA 900 18

Code.

-- BEFORE (fragile)
SELECT region, revenue, customers
FROM regional_sales
ORDER BY 2 DESC, 1 ASC, 3 DESC;

-- AFTER (refactored)
SELECT region, revenue, customers, id
FROM regional_sales
ORDER BY
    revenue   DESC,
    region    ASC,
    customers DESC,
    id        ASC;     -- unique tie-breaker
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The original ORDER BY references positions 2, 1, 3 — revenue, region, customers. Reordering the SELECT to (revenue, region, customers) would silently change the sort.
  2. The refactor restates the columns by name; the sort is now self-documenting and survives SELECT-list reorderings.
  3. Adding id (the primary key) as the last sort key guarantees deterministic order between rows that tie on all visible columns — important if a downstream consumer (UI snapshot test, CSV diff) depends on row order.
  4. The cost is identical to the positional version; this is a readability and correctness refactor.

Output.

region revenue customers
AMER 1200 25
APAC 1200 40
APAC 1200 30
EMEA 900 22
EMEA 900 18

Rule of thumb. Use positional ORDER BY only when the column it refers to is a one-off expression in the SELECT. For every other case, name the column.


5. ORDER BY plus LIMIT, indexes, and performance

A B-tree index on the ORDER BY column turns "sort the table" into "walk the index"

ORDER BY plus LIMIT is one of the highest-leverage optimisation targets in any OLTP system, because the same query — SELECT * FROM events ORDER BY created_at DESC LIMIT 10 — runs in a millisecond with the right index or several seconds without one at 10 million rows. The difference is whether the planner can satisfy the sort by walking an index in order, or has to read every row and run an external sort.

Visual diagram of ORDER BY + LIMIT performance — left a table without index showing a full-scan + sort cost spike, right the same table with a B-tree index on the ORDER BY column showing index-walk + LIMIT 10 (no sort); a small keyset-pagination annotation; on a light PipeCode card.

The five facts every senior interview wants you to land.

  • A B-tree index physically stores keys in sorted order. Walking the index produces rows in ORDER BY column ASC order with no sort step. Postgres and MySQL can also walk it backward for DESC — same index, same physical structure, both directions.
  • LIMIT pushes down through ORDER BY. The engine stops as soon as it has the top K rows. With an index this is K index-tuples; without one, it's still O(N log K) (a heap-based "top-K sort") rather than a full sort.
  • Multi-column ORDER BY needs a composite index. ORDER BY a, b DESC is best served by an index on (a, b DESC) (Postgres supports per-column direction on indexes; MySQL since 8.0 too). Without the matching composite, the engine falls back to a sort.
  • OFFSET is the silent killer. LIMIT 50 OFFSET 1000000 makes the engine fetch the first 1,000,050 rows, throw away 1,000,000, and return the last 50. Use keyset pagination at scale.
  • Index-only scan beats heap fetch. If SELECT col only references columns covered by the index, Postgres can skip the heap entirely (visibility map permitting). This is the difference between 30 µs and 3 ms per row at scale.

Index direction and the ASC/DESC duality.

A Postgres B-tree on (created_at) stores keys in ascending order. The planner can:

  • Walk the index left-to-right to satisfy ORDER BY created_at ASC (no sort).
  • Walk the index right-to-left to satisfy ORDER BY created_at DESC (no sort, same index).
  • For mixed-direction multi-column sorts (e.g. ORDER BY a ASC, b DESC), the single index (a) is not enough; either build (a ASC, b DESC) or accept a sort step.

MySQL since 8.0 supports descending indexes (CREATE INDEX … (a DESC, b ASC)) — earlier versions ignored the DESC keyword and stored ascending, then walked backward in memory.

Composite-index column order.

For ORDER BY a, b you need the index column order to match the leftmost prefix of the sort key. (a, b) works; (b, a) does not. The same rule governs which queries the index can serve:

Query Index (a, b) Index (b, a)
WHERE a = ? ORDER BY b yes (range walk under leading equality) no — wrong leading column
WHERE b = ? ORDER BY a no yes
ORDER BY a, b yes (full index walk) no
ORDER BY b, a no yes

This is also why "add an index for every column" is bad advice — one composite index typically replaces three single-column ones and serves more query shapes.

The EXPLAIN reading shortcut.

  • See Sort in the plan? — the engine ran a sort. Cost is O(N log N) or O(N log K) for top-K; if N is large, you want an index.
  • See Index Scan with no Sort above it? — the index served the order. The cost line will read like cost=0.43..8.46 rows=10 instead of cost=0..43012 rows=10 (after sort).
  • See Index Only Scan? — the engine didn't even touch the heap. This is the gold standard.
  • See Bitmap Heap Scan under an ORDER BY? — the bitmap path is unordered; a sort step always follows. Bitmap is the wrong path for ORDER BY + LIMIT queries.

LIMIT + OFFSET pitfalls — the production reality.

Pagination via OFFSET scales linearly with depth. The cost at page 1 is the same as the cost at page 1,000,000 minus the offset. For real applications:

  • Page 1–10: OFFSET is fine — the cost difference is invisible.
  • Page 10–100: OFFSET adds noticeable latency (10–100 ms).
  • Page 100+: OFFSET becomes the dominant cost (hundreds of ms).
  • Page 10,000+: OFFSET is unusable. Switch to keyset pagination — it stays O(log N + page) regardless of depth.

Worked example — turn a 3-second sort into a 5 ms index walk

Detailed explanation. A common reality: the "latest 50 orders" endpoint slows linearly with table size. At 100k rows it's fine, at 10M rows it's 3 seconds. The plan shows a sequential scan plus a sort. Adding a B-tree on created_at is a 30-second migration that takes the query to 5 ms.

Question. The query SELECT id, total FROM orders ORDER BY created_at DESC LIMIT 50; runs in 3 seconds at 10M rows. Show the EXPLAIN before, build the right index, and show EXPLAIN after.

Input (schema before).

Column Type
id bigserial PK
customer_id bigint
total numeric(10,2)
created_at timestamptz

No indexes other than the PK on id.

Code.

-- BEFORE: sequential scan + top-K sort
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total
FROM orders
ORDER BY created_at DESC
LIMIT 50;
--   Limit  (cost=320145..320150 rows=50)
--     ->  Sort  (cost=320145..345128 rows=10000000)
--           Sort Key: created_at DESC
--           ->  Seq Scan on orders  (cost=0..183120 rows=10000000)

-- BUILD: B-tree on created_at (per-column DESC redundant; index is bidirectional)
CREATE INDEX idx_orders_created_at ON orders (created_at);

-- AFTER: index scan, no sort
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total
FROM orders
ORDER BY created_at DESC
LIMIT 50;
--   Limit  (cost=0.43..2.18 rows=50)
--     ->  Index Scan Backward using idx_orders_created_at on orders  (cost=0.43..350421 rows=10000000)
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The before plan reads every row (Seq Scan, 10M rows), runs an external sort to find the 50 newest, and returns them. Cost dominated by the sort.
  2. CREATE INDEX idx_orders_created_at ON orders (created_at) builds a B-tree sorted by created_at ascending. Build cost is one full table scan + log-N writes.
  3. The after plan walks the index backward (right-to-left) — Postgres prefix Index Scan Backward — collecting the 50 newest entries, then heap-fetches id, total for each. No sort node. Cost drops from 345128 to 2.18.
  4. To eliminate the heap fetch as well (index-only scan), include total in the index: CREATE INDEX … ON orders (created_at, total) or use INCLUDE (total) in Postgres 11+.

Output.

Stage Plan Latency (10M rows)
Before Seq Scan + Top-K Sort ~3,000 ms
After (single-column index) Index Scan Backward + heap fetch ~5 ms
After (covering index) Index Only Scan Backward ~1 ms

Rule of thumb. Any "latest N" / "top N" endpoint on a table > 100k rows should have a B-tree index whose leading column matches the leftmost ORDER BY column. Cover the SELECT list if you want index-only scans.

SQL interview question on a slow-paginated feed

A senior probe: "We have SELECT * FROM posts ORDER BY created_at DESC, id DESC LIMIT 50 OFFSET 100000;. The first page returns in 5 ms; page 2,000 takes 1.5 seconds. The table has an index on created_at. What's happening, and how do you fix it without changing the API contract?"

The OFFSET-aging problem: at OFFSET 100,000 the engine reads 100,050 index entries, heap-fetches each, then discards the first 100,000. The index walk is O(N) in the offset. The fix is keyset pagination on the (created_at, id) tuple plus a matching composite index.

Solution Using a composite (created_at, id) index + keyset cursor

-- One-time: composite index matching the ORDER BY tuple
CREATE INDEX idx_posts_created_at_id ON posts (created_at DESC, id DESC);

-- Replace OFFSET with keyset
SELECT id, title, created_at
FROM posts
WHERE (created_at, id) < (:cursor_created_at, :cursor_id)
ORDER BY created_at DESC, id DESC
LIMIT 50;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Strategy Index Plan Latency at depth 100k
OFFSET 100000 + single-col idx (created_at) (created_at) Index Scan Backward + 100k heap fetches ~1,500 ms
OFFSET 100000 + composite idx (created_at DESC, id DESC) (created_at DESC, id DESC) Index Scan + 100k row skips ~1,200 ms
Keyset (created_at, id) < (?, ?) + composite idx (created_at DESC, id DESC) Index Seek → 50-row forward walk ~3 ms

The crucial change is eliminating the offset, not just adding the index. Even with a perfect index, OFFSET 100,000 walks 100k entries.

Output:

Page depth OFFSET strategy Keyset strategy
1 (depth 0) 3 ms 3 ms
100 (depth 5k) 90 ms 3 ms
1,000 (depth 50k) 700 ms 3 ms
2,000 (depth 100k) 1,500 ms 3 ms
10,000 (depth 500k) 8,000 ms 3 ms

Why this works — concept by concept:

  • Composite index matching the sort tuple(created_at DESC, id DESC) stores keys in the exact order the query wants them. The planner can seek to the cursor and walk forward; no sort step, no heap thrash.
  • Tuple comparison as a single seek(created_at, id) < (C, I) translates to a B-tree seek operation. Postgres, MySQL 8+, DuckDB, and SQLite all support this pattern.
  • Keyset vs OFFSET — OFFSET is "skip N rows the engine has to materialise"; keyset is "jump to the cursor and walk." For depths > 1000 the difference is two to four orders of magnitude.
  • Backward index walk for DESC — Postgres' planner reads the same B-tree right-to-left for descending sorts. You do not need a "DESC" index for descending order on a single column — but for mixed direction (e.g. created_at DESC, id ASC), explicit per-column index direction matters.
  • Stable under inserts — the cursor is anchored to actual values. New rows with created_at > cursor.created_at simply appear on earlier pages (where they belong) without shifting the current page.
  • Cost — keyset pagination = O(log N + page_size) regardless of page depth; OFFSET pagination = O(offset + page_size), which is linear in depth.

SQL
Topic — order by + limit
ORDER BY + LIMIT and indexed sorts (SQL)

Practice →

Mixed-direction sorts and per-column index direction

A B-tree on (a) serves both ORDER BY a ASC and ORDER BY a DESC because the engine can walk the same index in either direction. The story changes the moment your ORDER BY has mixed directions across columns — ORDER BY a ASC, b DESC cannot be served by a single-direction index on (a, b) without an additional sort step.

The single-column case (easy).

  • ORDER BY a ASC ↔ index (a) walked forward.
  • ORDER BY a DESC ↔ index (a) walked backward.
  • No need for a DESC index on a single column.

The multi-column same-direction case (still easy).

  • ORDER BY a ASC, b ASC ↔ index (a, b) walked forward.
  • ORDER BY a DESC, b DESC ↔ index (a, b) walked backward.

The multi-column mixed-direction case (the trap).

  • ORDER BY a ASC, b DESC — no single-direction index (a, b) can produce this physically. The engine either sorts in memory or uses a per-column direction index: CREATE INDEX … ON t (a ASC, b DESC).
  • Postgres has supported per-column direction since 8.3. MySQL since 8.0. SQL Server: yes. Snowflake: indirect (it picks the right micro-partition order). BigQuery: it's a columnar store, sort works regardless.

When to build the directional composite index.

  • Mixed-direction sort is hot on a large table (e.g. a feed).
  • The sort is part of pagination — repeated execution.
  • You see Sort step in the EXPLAIN plan with high cost.

For one-off reports or low-traffic queries, accept the sort step and don't proliferate indexes.

The covering / INCLUDE index trick.

Postgres 11+ supports INCLUDE to append non-key columns to the leaf nodes of a B-tree:

CREATE INDEX idx_orders_created_at_inc
ON orders (created_at)
INCLUDE (total, customer_id);
Enter fullscreen mode Exit fullscreen mode

The index now serves SELECT total, customer_id FROM orders ORDER BY created_at DESC LIMIT 50 as an Index Only Scan — the heap is never touched. The cost drops from "log N seeks + 50 heap fetches" to "log N seeks + zero heap fetches", which is often a 5-10× improvement on cold cache.

Worked example — feed query with mixed-direction sort

Detailed explanation. A social feed sorts posts by pinned DESC, created_at DESC, id DESC. pinned is a boolean flag, mostly false. Without the right index the engine does a full scan + sort; with the right per-column direction index it does an index walk.

Question. A social-network app has a posts table with 50M rows. The feed query is SELECT id, author_id, title, created_at FROM posts ORDER BY pinned DESC, created_at DESC, id DESC LIMIT 30;. Show the index you'd build, and what the EXPLAIN looks like before and after.

Input.

Column Type Cardinality
id bigserial PK 50M
author_id bigint 1M
pinned boolean 0.01% true
created_at timestamptz unique enough
title text n/a

Code.

-- Composite, per-column direction. Postgres 8.3+ / MySQL 8.0+ / SQL Server / DuckDB
CREATE INDEX idx_posts_feed
ON posts (pinned DESC, created_at DESC, id DESC)
INCLUDE (author_id, title);

-- The feed query
SELECT id, author_id, title, created_at
FROM posts
ORDER BY pinned DESC, created_at DESC, id DESC
LIMIT 30;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The index's leading column pinned DESC matches the sort's leading column — true rows are visited first.
  2. The second column created_at DESC is also descending, matching the sort.
  3. The third column id DESC is the tie-breaker. With id being the primary key, the sort tuple is now strictly ordered with no ties.
  4. The INCLUDE (author_id, title) clause appends two read-only columns into the leaf nodes — the SELECT can be served entirely from the index. No heap fetch.
  5. Postgres' plan changes from Sort + Seq Scan + Limit (cost ~ 1.5M) to Index Only Scan + Limit (cost ~ 30).

Output.

Stage Plan Latency (50M rows, cold cache)
Before (no index) Seq Scan + Top-K Sort ~18,000 ms
After (sort-aligned index, key only) Index Scan + 30 heap fetches ~40 ms
After (covering index with INCLUDE) Index Only Scan, no heap ~3 ms

Rule of thumb. For high-traffic feed / leaderboard / chronological-list queries on large tables, build a per-column direction composite index that matches the ORDER BY tuple exactly, and INCLUDE the SELECT-list columns to get an index-only scan. Cost amortises after a few hundred queries.

SQL interview question on a hot leaderboard with frequent ties

A senior probe: "We have a leaderboard SELECT user_id, score FROM leaderboard ORDER BY score DESC LIMIT 100; running 500 times/second. There are 200 million rows. The current index (score) works at low traffic but starts queuing under load. Walk through the index changes and explain what the planner is doing."

The pattern is: composite index (score DESC, user_id ASC) with INCLUDE (display_name) if display_name is in the SELECT, plus keyset pagination for "infinite scroll" requests.

Solution Using a covering composite index + keyset

-- The covering index
CREATE INDEX idx_leaderboard_score
ON leaderboard (score DESC, user_id ASC)
INCLUDE (display_name);

-- Top-100 query
SELECT user_id, score, display_name
FROM leaderboard
ORDER BY score DESC, user_id ASC
LIMIT 100;

-- Keyset pagination (page N anchored to cursor)
SELECT user_id, score, display_name
FROM leaderboard
WHERE (score, user_id) < (:cursor_score, :cursor_user_id)
ORDER BY score DESC, user_id ASC
LIMIT 100;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Request Index path Heap fetches Latency
Top-100 (single-col idx, no INCLUDE) Index Scan Backward on (score) 100 ~3 ms
Top-100 (covering idx with INCLUDE) Index Only Scan on (score DESC, user_id ASC) 0 ~0.4 ms
Page 100 via OFFSET 10000 + covering idx Index Only Scan + 10k row skips 0 ~80 ms
Page 100 via keyset + covering idx Index Seek to cursor + 100-row walk 0 ~0.4 ms

Output:

Strategy Throughput (queries/sec single core) p99 latency
Seq Scan + Top-K Sort (no index) ~5 2,000 ms
(score) index ~3,000 4 ms
Covering (score DESC, user_id ASC) INCLUDE (display_name) ~25,000 0.6 ms
Covering + keyset for pagination ~25,000 0.6 ms across all pages

Why this works — concept by concept:

  • Sort-aligned composite(score DESC, user_id ASC) is exactly the sort tuple. The planner walks the index in order; no sort step, no buffer.
  • Unique tie-breakeruser_id is the primary key; no ties exist on the full tuple. Cursor stability is guaranteed.
  • Covering with INCLUDE — the SELECT-list columns live inside the index leaves, so the heap is never touched. This is the largest single-step optimisation for read-heavy workloads.
  • Index Only Scan visibility — Postgres can only do Index Only Scans if the visibility map says all referenced pages are all-visible. After a heavy update workload run VACUUM to refresh the visibility map.
  • Keyset replaces OFFSET — same covering index, but pagination becomes O(log N + page_size) regardless of depth. Combined with the covering trick, even page 10,000 stays under a millisecond.
  • Cost — read latency = O(log N) seek + O(page_size); index write amplification ≈ 2× (key plus included columns); storage ≈ 1.5× of the original table for this index. Worth it on read-dominated paths.

SQL
Topic — order by + limit
Top-K and leaderboard problems (SQL, indexed sorts)

Practice →


Cheat sheet — ORDER BY recipes

  • Stable pagination (cursor anchor). ORDER BY created_at DESC, id DESC LIMIT 50 — always end with a unique column.
  • NULL-safe sort (portable). ORDER BY col IS NULL, col ASC — false-before-true puts NULLs last on every dialect.
  • NULL-safe sort (Postgres / Snowflake / BigQuery). ORDER BY col ASC NULLS LAST — native and self-documenting.
  • Custom priority order. ORDER BY CASE WHEN status='urgent' THEN 0 WHEN status='normal' THEN 1 ELSE 2 END, created_at DESC — drives non-alphabetical priority sequences.
  • Sort by computed expression. ORDER BY LENGTH(name), ORDER BY revenue / NULLIF(orders, 0) DESC, ORDER BY MD5(id::text) (deterministic shuffle).
  • Multi-key, mixed direction. ORDER BY country ASC, revenue DESC, created_at DESC, id ASC — read every direction explicitly.
  • Descending keyset pagination. WHERE (created_at, id) < (?, ?) ORDER BY created_at DESC, id DESC LIMIT 50 — production pagination on any depth.
  • Ascending keyset pagination. WHERE (created_at, id) > (?, ?) ORDER BY created_at ASC, id ASC LIMIT 50 — symmetric.
  • Index recipe. Build (sort_col_1, sort_col_2, …, unique_tie_breaker) matching the ORDER BY tuple. Add INCLUDE columns to enable index-only scans.
  • Top-K reporting (small K). ORDER BY metric DESC LIMIT 10 is a top-K query — a sort node with a heap of size K is O(N log K); index walks are O(log N + K). Always index the metric.
  • Versioned strings. ORDER BY string_to_array(version, '.')::int[] (Postgres) — natural sort for dotted-numeric versions; mirror for MySQL with SUBSTRING_INDEX, SQL Server with PARSENAME.
  • Window functions need their own ORDER BY. ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) — independent of the outer ORDER BY of the query.

Frequently asked questions

Can I ORDER BY a column that is not in the SELECT list?

Yes, in nearly every modern dialect. ORDER BY runs after SELECT but it has access to every column of every table named in the FROM clause, not just the projected list. SELECT name FROM users ORDER BY created_at DESC is valid. The two exceptions are: (1) SELECT DISTINCT queries, where the ORDER BY can only reference columns that appear in the SELECT (because DISTINCT collapses identical rows and the engine can't see the originals), and (2) queries with GROUP BY, where the ORDER BY can reference only grouped columns or aggregates. Outside those two cases, you can sort by anything in scope.

Why does MySQL not support NULLS LAST?

Historical reasons. MySQL's ORDER BY predates the SQL:2003 standard that introduced explicit NULLS FIRST / NULLS LAST keywords, and the original MySQL behaviour fixed NULL as the smallest value (so ASC puts NULLs first, DESC puts NULLs last). Adding the keywords now would change the meaning of existing queries on some edge cases, so the MySQL team has consistently chosen the portable workaround pattern instead: ORDER BY col IS NULL, col toggles between "NULLs last" and "NULLs first" in a fully ANSI-compatible way that works on every dialect.

Does ORDER BY use an index?

It can, if the index is aligned with the sort key list. A B-tree index on the leftmost ORDER BY column lets the planner walk the index in order and avoid a sort node — that's the "Index Scan" line in EXPLAIN. For multi-column sorts, the index must match the leading prefix of the sort tuple (and per-column direction on engines that support directional indexes). Without a matching index, the engine falls back to an external sort, which scales as O(N log N). For top-K queries with LIMIT, an index brings the cost down from O(N log N) to O(log N + K) — often a 100× to 1000× speedup at scale.

ORDER BY plus LIMIT — what's the cost?

With a matching index, ORDER BY col LIMIT K is O(log N + K) — the planner seeks the start of the sorted range and walks K entries. Without an index, the engine does a top-K sort: it scans every row but only maintains a heap of size K, so the cost is O(N log K) instead of O(N log N). At 10M rows, top-K with K=10 is still ~30M comparisons — fast in memory but several seconds on disk. The lesson: build the index. ORDER BY + LIMIT is the single highest-leverage place to add a B-tree.

Why is my paginated query slow at page 1000?

OFFSET-aging. LIMIT 50 OFFSET 100000 makes the engine fetch 100,050 rows and discard 100,000 of them. Even with a perfect index, the cost grows linearly with the offset — page 1 is 5 ms, page 1000 is 1500 ms. The production fix is keyset pagination: keep a cursor (sort_col, id) from the last row of the previous page and query WHERE (sort_col, id) < (cursor_sort, cursor_id) ORDER BY sort_col DESC, id DESC LIMIT 50. Tuple comparison plus a matching composite index gives O(log N + page_size) regardless of depth.

Does ORDER BY guarantee stable order across runs?

Only if the sort key list is unique. SQL engines pick sort algorithms based on the planner's cost model — they may use a parallel quicksort (unstable), an external merge sort (stable), or an in-memory sort (depends on implementation). Two consecutive runs of the same query can return tied rows in different orders. The portable guarantee comes from making ties impossible: end the ORDER BY list with a unique column (typically the primary key). After that, every run produces the same row sequence, every cursor is well-defined, and every pagination boundary is stable.

Practice on PipeCode

Pipecode.ai is Leetcode for Data Engineering — every ORDER BY concept above ships with hands-on practice rooms where you sort real datasets, debug real NULL placement, and trace real keyset cursors against real B-tree indexes. Start with the sorting library and work outward; PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine.

Practice sorting now →
ORDER BY + LIMIT drills →

Top comments (0)