DEV Community

Mohamed Idris
Offset Pagination Step by Step (with Sharding)

Why pagination exists

Imagine a workers table with 1,000,000 rows. A request like GET /workers that returns all of them would:

  • send a huge JSON payload over the wire
  • crash the browser trying to render it
  • hammer the database

The fix is simple: send a small slice at a time. That slice is called a page.

That is all pagination is. We will build the rest from this idea.

Cute mental model

Think of a deck of 100 cards. You cannot hand someone the whole deck and ask them to find one card fast. Instead you say:

"Take 10 cards at a time. When you finish, ask me for the next 10."

  • 10 cards = page size
  • "which group of 10" = page number
  • "ask me for the next 10" = next page link

That is the whole game.

Step 1: the two numbers we always need

To grab a slice from a database, every ORM needs two things:

  • take — how many rows to return (the slice size)
  • skip — how many rows to ignore from the start

In SQL these are LIMIT and OFFSET.

SELECT * FROM workers ORDER BY id ASC LIMIT 10 OFFSET 20;

That says: skip 20 rows, then give me 10. That is page 3 if each page has 10 rows.

Step 2: where the famous formula comes from

We do not want clients sending skip and take directly. That is awkward. Clients think in page numbers: "give me page 1, page 2, page 3".

So we accept a page number and convert it to skip/take ourselves.

If page size is 10:

Page   Skip   Take
1      0      10
2      10     10
3      20     10
4      30     10

See the pattern? skip = (page - 1) * size.

That is the formula in our codebase:

return {
  take: page.size,
  skip: (page.num - 1) * page.size,
  // ...
};

Why (page - 1) and not just page?

Because OFFSET means "how many rows to skip", not "which row index to start at". Page 1 should skip nothing (offset 0). Page 2 should skip one page worth of rows (offset 10). Page 3, two pages (offset 20). So we subtract 1 before multiplying.

If we used page * size instead, page 1 would skip 10 rows and the user would never see the first 10 records. The first page would be unreachable. The smallest valid offset is 0 (skip nothing). It is not 1, even though page numbers start at 1.
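As a sanity check, the conversion fits in a few lines. This is a standalone sketch; `toSkipTake` is a name invented for this article, not a function from the codebase.

```typescript
// Convert a 1-based page number into the skip/take pair the ORM needs.
interface SkipTake {
  skip: number;
  take: number;
}

function toSkipTake(pageNum: number, pageSize: number): SkipTake {
  return {
    take: pageSize,                 // rows to return
    skip: (pageNum - 1) * pageSize, // rows to jump over first
  };
}

console.log(toSkipTake(1, 10)); // { take: 10, skip: 0 }  — page 1 skips nothing
console.log(toSkipTake(3, 10)); // { take: 10, skip: 20 } — page 3 skips two pages
```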

Step 3: how does the client know there is a next page?

Two common answers:

Option A — return a total count. "There are 237 workers. You are on page 3 of 24."
Pros: client can render Page 3 / 24.
Cons: requires a COUNT(*) query on every request, which is slow on big tables.

Option B — just tell them if a next page exists. Return a next link or null.
Pros: cheap, simple, scales.
Cons: cannot show "page 3 of 24" in the UI.

This codebase picks Option B. The response shape is:

{
  "data": [ ... 10 workers ... ],
  "links": { "next": "https://api.example.com/workers?page=2" }
}

When next is missing, you have hit the end.

Step 4: how do we know if a next page exists?

The trick: peek. Run a second query that looks at the next page. If it has at least one row, send a next link. If not, end.

So every request runs two SQL queries:

-- 1. Fetch the page the client asked for (page 2, size 10)
SELECT * FROM workers ORDER BY id ASC LIMIT 10 OFFSET 10;

-- 2. Peek at the next page (page 3 — count its rows).
--    Note: COUNT(*) must wrap the LIMIT/OFFSET in a subquery, because
--    LIMIT applied directly to an aggregate limits the (single) result
--    row, not the rows being counted.
SELECT COUNT(*) FROM (
  SELECT 1 FROM workers ORDER BY id ASC LIMIT 10 OFFSET 20
) AS next_page;

If query 2 returns > 0, page 3 has data → send links.next = ?page=3.
If it returns 0, we are at the end → no next link.

A worked example

Table has 23 rows, page size 10. Client asks for page 2:

Query                  What it returns
LIMIT 10 OFFSET 10     rows 11–20 (10 rows, sent to client)
Peek OFFSET 20         count 3 → next page exists

Client follows the link to page 3:

Query                  What it returns
LIMIT 10 OFFSET 20     rows 21–23 (3 rows, sent to client)
Peek OFFSET 30         count 0 → end
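The worked example above can be replayed in memory. A minimal sketch, with a plain array standing in for the workers table (all names here are illustrative):

```typescript
// 23 rows, page size 10 — the same setup as the worked example.
const rows = Array.from({ length: 23 }, (_, i) => i + 1); // ids 1..23
const PAGE_SIZE = 10;

// In-memory equivalent of LIMIT/OFFSET.
function fetchPage(pageNum: number): number[] {
  const skip = (pageNum - 1) * PAGE_SIZE;
  return rows.slice(skip, skip + PAGE_SIZE);
}

// The "peek": does the next page have at least one row?
function hasNextPage(pageNum: number): boolean {
  return fetchPage(pageNum + 1).length > 0;
}

console.log(fetchPage(2));   // rows 11..20 (10 rows)
console.log(hasNextPage(2)); // true  — page 3 has 3 rows
console.log(fetchPage(3));   // rows 21..23 (3 rows)
console.log(hasNextPage(3)); // false — page 4 is empty, we are done
```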

Why not just check "did I get a full page back?"

A common shortcut: if you asked for 10 and got 10, assume there is more; if you got fewer than 10, you are done. It works except when the total is an exact multiple of the page size — with 30 rows, page 3 returns exactly 10, the shortcut says "probably more", client requests page 4 and gets an empty list. One wasted round trip per such case. The peek query avoids that.
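To see the shortcut misfire, here is the exact-multiple case in miniature (30 rows, page size 10; the variable names are made up for the demo):

```typescript
// 30 rows is an exact multiple of the page size — the shortcut's blind spot.
const ids = Array.from({ length: 30 }, (_, i) => i + 1);
const SIZE = 10;

const page3 = ids.slice(20, 30); // a full page of 10 rows

// Shortcut: "got a full page, so assume there is more" — wrong here.
const shortcutSaysMore = page3.length === SIZE;

// Peek: actually look at page 4 — correct.
const peekSaysMore = ids.slice(30, 40).length > 0;

console.log(shortcutSaysMore); // true  — client wastes a round trip on page 4
console.log(peekSaysMore);     // false — no next link, client stops cleanly
```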

A cheaper peek

You do not actually need to count all rows on the next page. You only need to know if at least one exists:

SELECT EXISTS (
  SELECT 1 FROM workers ORDER BY id ASC LIMIT 1 OFFSET 20
);

This stops as soon as it finds a single row past the offset, instead of counting every row on the next page. That makes it faster than COUNT(*), especially on big tables. The codebase uses COUNT, which is fine for small page sizes, but EXISTS + LIMIT 1 is the production-grade version.

Step 5: the page object

Instead of passing (num, size, shard) everywhere, we wrap them in a single object:

interface Page {
  num: number;
  size: number;
  shard?: number;
}

And a tiny helper builds it with safe defaults:

export function getPage(pageNum?: number, shard?: number): Page {
  return {
    num: pageNum ? pageNum : FIRST_PAGE,        // default to page 1
    size: PAGE_SIZE,                             // fixed at 10
    shard: shard !== undefined ? shard : DEFAULT_SHARD,
  };
}

If the client sends nothing, they get page 1, size 10, shard 0. Nice and forgiving.
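For readers who want to run it, here is a self-contained version of the helper with the constants inlined. The constant values mirror what the article describes (page 1, size 10, shard 0); treat it as a sketch, not the codebase verbatim:

```typescript
// Constants as described in the article.
const FIRST_PAGE = 1;
const PAGE_SIZE = 10;
const DEFAULT_SHARD = 0;

interface Page {
  num: number;
  size: number;
  shard?: number;
}

// Build a Page with safe defaults for anything the client omitted.
function getPage(pageNum?: number, shard?: number): Page {
  return {
    num: pageNum ? pageNum : FIRST_PAGE,
    size: PAGE_SIZE,
    shard: shard !== undefined ? shard : DEFAULT_SHARD,
  };
}

console.log(getPage());     // { num: 1, size: 10, shard: 0 } — all defaults
console.log(getPage(3, 2)); // { num: 3, size: 10, shard: 2 }
```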

Step 6: parsing the request (the NestJS decorator)

Clients send pagination as query params: ?page=2&shard=0. We turn that into a Page object once, in one place:

export const PaginationPage = createParamDecorator((_data, ctx) => {
  const request = ctx.switchToHttp().getRequest();
  const page = parseOptionalInt(request.query.page);
  const shard = parseOptionalInt(request.query.shard);
  return getPage(page, shard);
});

Now any controller can do:

async get(@PaginationPage() page: Page) { ... }

No manual parsing in every handler. Very clean.

Step 7: building the next link

Once we know the next page exists, we build a URL the client can call directly. Important detail: keep all the other query params the user sent (filters, sorting), only change pagination.

const url = new URL(`${request.protocol}://${request.get("Host")}${request.originalUrl}`);
const searchParams = new URLSearchParams(url.search);

searchParams.set("page", nextPage.num.toString());
if (nextPage.shard !== undefined) {
  searchParams.set("shard", nextPage.shard.toString());
}

searchParams.set overwrites just those keys. If the original URL was /workers?location=NY&page=1, the next link becomes /workers?location=NY&page=2. The filter survives.
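The same preservation behavior can be shown with the WHATWG URL API alone (global in Node). The function name and host below are made up for the demo:

```typescript
// Rebuild the next link while keeping every other query param intact.
function buildNextLink(
  currentUrl: string,
  nextPageNum: number,
  nextShard?: number,
): string {
  const url = new URL(currentUrl);
  url.searchParams.set("page", String(nextPageNum)); // overwrite only "page"
  if (nextShard !== undefined) {
    url.searchParams.set("shard", String(nextShard));
  }
  return url.toString();
}

console.log(buildNextLink("https://api.example.com/workers?location=NY&page=1", 2));
// https://api.example.com/workers?location=NY&page=2 — the filter survives
```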

This pattern is called HATEOAS: the server tells the client where to go next, instead of the client guessing the URL shape.

Step 8: now the twist — sharding

Heads up: if sharding is new to you, the basics above are already enough to understand pagination. This step is a bonus that explains the extra logic in this codebase. You can skim it on the first read and come back later.

Sharding means splitting one big table into smaller logical buckets. Each row has a shard column (0, 1, 2, ...). Queries always filter by one shard.

Why? On gigantic tables it spreads load and lets you query smaller subsets. In this project shards are limited to MAX_SHARDS = 10.

The pagination has to walk through shard 0 first, then shard 1, then shard 2... When shard 0 is exhausted, jump to shard 1 page 1. That is the second half of getNextPage:

// no more rows in current shard, try next shard
const nextShard = (currentPage.shard ?? DEFAULT_SHARD) + 1;

if (nextShard >= MAX_SHARDS) {
  return undefined;       // shards run 0..MAX_SHARDS - 1; we have walked them all
}

const pageInNextShard = getPage(FIRST_PAGE, nextShard);
const countInNextShard = await countOnPage(pageInNextShard, ...);

if (countInNextShard > 0) {
  return pageInNextShard;
}

return undefined;

Reading top to bottom:

  1. Try the next page in the same shard. Has data? Return it.
  2. Otherwise, jump to page 1 of the next shard. Has data? Return it.
  3. Otherwise, end.
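Here is that three-step decision replayed in memory. The shard contents, MAX_SHARDS value, and function names are made up for the demo; like the quoted code, it only peeks one shard ahead:

```typescript
const MAX_SHARDS = 3; // shards 0..2 in this demo
const SIZE = 10;

// Shard 0 has 23 rows, shard 1 has 12, shard 2 is empty.
const shards: number[][] = [
  Array.from({ length: 23 }, (_, i) => i),
  Array.from({ length: 12 }, (_, i) => i),
  [],
];

// How many rows does this page of this shard hold?
function countOnPage(shard: number, pageNum: number): number {
  if (shard >= MAX_SHARDS) return 0;
  const skip = (pageNum - 1) * SIZE;
  return shards[shard].slice(skip, skip + SIZE).length;
}

function getNextPage(
  shard: number,
  pageNum: number,
): { shard: number; num: number } | undefined {
  // 1. Try the next page in the same shard.
  if (countOnPage(shard, pageNum + 1) > 0) return { shard, num: pageNum + 1 };
  // 2. Otherwise, jump to page 1 of the next shard.
  const nextShard = shard + 1;
  if (nextShard >= MAX_SHARDS) return undefined;
  if (countOnPage(nextShard, 1) > 0) return { shard: nextShard, num: 1 };
  // 3. Otherwise, end.
  return undefined;
}

console.log(getNextPage(0, 1)); // { shard: 0, num: 2 } — still inside shard 0
console.log(getNextPage(0, 3)); // { shard: 1, num: 1 } — shard 0 exhausted, jump
console.log(getNextPage(1, 2)); // undefined — shard 2 is empty, truly done
```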

A picture

shard 0:  [page 1] -> [page 2] -> [page 3] -> done in this shard
                                                      |
                                                      v
shard 1:  [page 1] -> [page 2] -> done in this shard
                                          |
                                          v
shard 2:  [page 1] -> ...

The client never sees this complexity. They just keep following links.next.

Step 9: putting it all together

The full request flow for GET /workers?page=2:

1. Decorator parses ?page=2 into a Page object { num: 2, size: 10, shard: 0 }
2. Service calls queryParameters(page) -> { skip: 10, take: 10, where: { shard: 0 } }
3. Prisma runs SELECT ... LIMIT 10 OFFSET 10 WHERE shard = 0
4. Service calls getNextPage(...) which counts the next slice
5. Controller maps rows to DTOs and builds the next link
6. Client gets { data: [...], links: { next: "...?page=3&shard=0" } }

Every piece has one job. That is why each function looks small.

Useful notes you should not forget

  • Always include ORDER BY in paginated queries. Without it, databases can return rows in any order, and the same row could appear on two pages or be skipped. The codebase uses orderBy: { id: "asc" } for this reason.
  • Page size should be capped on the server, or clients can abuse it. A request like ?size=1000000 is the abuse: in one call, the server has to ask the DB for a million rows (heavy query, hogs a connection), hold them in memory (RAM spike), serialize them to JSON (CPU spike), and send them over the wire (bandwidth spike). One client doing that occasionally hurts. A few clients doing it on purpose is a denial-of-service against your own API.

Note that page does not carry the same risk. Whether the client asks for page=1 or page=500, the response size stays the same (one page worth). A deep OFFSET makes the query slow, but the server will not run out of memory. size is the dangerous knob — it controls how much work happens per request — so it must be bounded. Two common defenses: hard-code it (this codebase uses PAGE_SIZE = 10), or accept ?size=N from the client but clamp to a max (e.g. 100).
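The clamp defense is a one-liner. A sketch, assuming a hypothetical MAX_PAGE_SIZE of 100 (this codebase hard-codes the size instead):

```typescript
const DEFAULT_PAGE_SIZE = 10;
const MAX_PAGE_SIZE = 100; // hypothetical cap, not from the codebase

// Accept ?size=N from the client, but never let it exceed the cap.
function clampPageSize(requested?: number): number {
  if (!requested || requested < 1) return DEFAULT_PAGE_SIZE; // missing/invalid
  return Math.min(requested, MAX_PAGE_SIZE);
}

console.log(clampPageSize());        // 10  — missing param, use the default
console.log(clampPageSize(25));      // 25  — within bounds, honored
console.log(clampPageSize(1000000)); // 100 — abusive value clamped to the cap
```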

  • Default to page 1 if the param is missing or invalid. Be forgiving.
  • Skipping is O(N). OFFSET 100000 makes the database scan and discard 100,000 rows. That is fine for small offsets, painful for huge ones. See the next section.

When offset pagination is the wrong choice

You will hit two problems eventually:

Problem 1: deep pages are slow. OFFSET 1000000 LIMIT 10 makes the database walk through a million rows just to throw them away.

Problem 2: shifting data. If a row is inserted while the user paginates, page boundaries shift. They might see the same row twice or miss one.

The fix for both is cursor pagination: instead of "page 2", the client sends "give me 10 rows after id=42". The query becomes WHERE id > 42 ORDER BY id LIMIT 10, which uses an index and is fast no matter how deep you go.

You give up the ability to jump to "page 47" directly. You can only go forward (and sometimes backward). For infinite-scroll feeds this is perfect. For admin tables with page numbers, offset pagination is fine.
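To make the cursor idea concrete, here is the WHERE id > cursor pattern replayed on an in-memory array (names invented for the demo):

```typescript
interface Worker {
  id: number;
}

// 55 workers with ids 1..55 standing in for the table.
const workers: Worker[] = Array.from({ length: 55 }, (_, i) => ({ id: i + 1 }));

// In-memory equivalent of:
//   SELECT * FROM workers WHERE id > :cursor ORDER BY id ASC LIMIT :limit
function afterCursor(cursor: number, limit: number): Worker[] {
  return workers
    .filter((w) => w.id > cursor)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit);
}

const page = afterCursor(42, 10);
console.log(page[0].id);                       // 43 — first row after the cursor
const nextCursor = page[page.length - 1].id;
console.log(nextCursor);                       // 52 — the client sends this back
```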

This project uses offset pagination because the page sizes are small and the use case suits it. Knowing the alternative is gold in interviews.

Recap in one screen

  • Pagination = serve big lists in small slices.
  • Formula: skip = (page - 1) * size, take = size.
  • Page numbers are 1-based (page 1 is the first page), but OFFSET is a count of rows to skip. Page 1 must skip 0 rows. That is why we subtract 1 before multiplying.
  • Always order results.
  • Return a next link instead of a total count when you do not need page numbers in the UI.
  • To know if next exists, peek at the next slice with a count query.
  • Sharding adds an outer loop: walk pages within a shard, then jump to the next shard.
  • For very large datasets or live feeds, switch to cursor pagination.

If you can explain that list out loud without notes, you understand pagination better than 90 percent of candidates.
