Joud Awad

Posted on May 30

24/30 Days System Design Questions

#systemdesign #softwareengineering #softwaredevelopment #database

Your API returns 10 million orders.
The frontend asks for "page 5."

Offset 40, limit 10. Simple enough — until your DBA messages you at 2 AM.

Query time: 4.2 seconds. Table scans climbing. Prod is lagging.

Here's the stack: PostgreSQL, 50M rows in orders, sorted by created_at DESC. Users can filter by status. You're paginating an admin dashboard used by 200 concurrent support agents.

Offset pagination worked fine at 10K rows. At 50M, it's reading and discarding 40 million rows just to return 10.

You need to fix this before the next sprint ships the customer-facing version — which will have 10x the traffic.

Here's your setup:

• Table: orders — 50M rows, indexed on created_at, user_id, status
• Query: sorted by created_at DESC, filtered by status
• Client: needs "previous/next" navigation + jump-to-page
• SLA: p99 < 200ms

Four engineers propose four different approaches:

A) Cursor pagination — encode the last seen created_at + id into a token, use it as a WHERE clause anchor
B) Keyset pagination — paginate using WHERE (created_at, id) < (last_created_at, last_id) directly
C) Deferred join — fetch the IDs first with offset, then join back to get full rows
D) Keep offset pagination but add a covering index on (status, created_at, id)

All four will improve query performance. Only one actually solves the problem at scale without breaking the UX your dashboard needs.

Pick one — A, B, C, or D — and tell me why. I'll drop the full breakdown in the comments (including why one of these is a senior engineer trap that looks right but falls apart under a specific constraint hiding in plain sight).

If your team is arguing about pagination strategy, this is the breakdown they need. Share it with them.

Drop your answer 👇

30DaysOfSystemDesign #SystemDesign #PostgreSQL #BackendEngineering

Top comments (4)

Joud Awad • May 30

Option C — Deferred join

JOIN ( SELECT id FROM orders WHERE status = 'pending' ORDER BY created_at DESC OFFSET 500000 LIMIT 10 ) ids ON o.id = ids.id

This is a legitimate optimization for offset pagination. The inner query fetches only IDs (fits in index, much faster), then the outer join fetches full rows for just 10 records.

It's real. It works. Shopify uses a version of this.

But: you still pay O(offset) cost on the inner query at deep pages. It helps — it does not solve. At page 50,000, you're still scanning 500,000 index entries. You shaved 80% of the cost and kept 20% of the problem.

Joud Awad • May 30

Option A — Cursor pagination ✅ CORRECT

Encode the last seen (created_at, id) into an opaque token. On the next request, decode it and use WHERE created_at < :last_ts OR (created_at = :last_ts AND id < :last_id) as your anchor.

Why this wins here:

• Query always hits the index at the anchor point — no scan, no discard
• p99 stays flat whether you're on page 1 or page 50,000
• Works with the status filter: WHERE status = 'pending' AND (created_at, id) < (anchor)
• The cursor is position-stable: new rows inserted before your cursor don't shift your page

The UX tradeoff: no jump-to-page. Only previous/next. But this is the right tradeoff for the dashboard. Support agents navigate sequentially. Nobody needs "jump to page 3,847."

At 50M rows with concurrent filters, cursor pagination is the only approach that hits p99 < 200ms reliably.

Joud Awad • May 30 • Edited

Option B — Keyset pagination (senior engineer trap ⚠️)

Keyset looks identical to cursor on the surface. The difference is subtle: keyset is a query pattern, cursor is a product decision.

Keyset uses WHERE (created_at, id) < (val1, val2) as a SQL construct. Cursor wraps that into an encoded token the client can't inspect or manipulate.

Here's where it falls apart: the status filter breaks row-tuple comparison in most query planners. WHERE status = 'pending' AND (created_at, id) < (anchor) doesn't use a composite index efficiently in PostgreSQL when the equality column (status) isn't part of the tuple comparison. You end up with a partial index scan that degrades under high cardinality filters.

Cursor pagination handles this by encoding the full WHERE context, not just the positional anchor. That's the gap. It's subtle, and it's the thing that bites you in production three months after you ship it....

Joud Awad • May 30

Option D — Covering index

Adding CREATE INDEX ON orders(status, created_at DESC, id DESC) is a good idea regardless of which approach you pick. It removes heap fetches for status-filtered queries.

But an index doesn't change the algorithmic complexity of OFFSET. It makes your table scans faster, not shorter. This is the "throw hardware at it" answer. It buys you time, not a solution.