The Four Pagination Strategies Every Backend Dev Should Know
I got tired of arguing about pagination in code review, so I wrote a single PHP service that exposes the same dataset through all four strategies side by side. Then I wrote an endpoint that inserts one row and hands you back the exact row that would be counted twice.
Every CRUD API eventually hits pagination. Every team rediscovers the same trade-offs the hard way. And every backend tutorial I've ever read shows offset pagination and calls it done, which is roughly like teaching SELECT * and calling it database design.
This is a post about the four pagination strategies that actually exist (offset, page-number, keyset, and cursor) and when each one is correct. It's backed by a small PHP service called pagination-demo that implements all four over the same in-memory dataset of 1000 rows, so you can curl them against each other and see the behaviour instead of just reading about it.
📦 GitHub: https://github.com/sen-ltd/pagination-demo
The problem in one paragraph
You have a list endpoint. Real users will have tens of thousands of rows in it. You can't return them all in one response. You need to let clients ask for a slice. How you do that has three consequences: how the client API feels, whether the database will scale, and whether the results are correct when rows are being inserted concurrently. Most pagination discussions only mention the first. The other two are where things go wrong.
The four strategies
Here's the trade-off table. We'll earn each row by the end of the article.
| Strategy | URL shape | Random access | Mutation-safe | Cost per page |
|---|---|---|---|---|
| Offset | `?page=3&per_page=20` | yes | no | O(N) scan |
| Page | `?page=3&per_page=20` + `Link:` | yes | no | O(N) scan |
| Keyset | `?last_id=42&per_page=20` | no | yes | O(log N) seek |
| Cursor | `?cursor=eyJpZCI6NDJ9` | no | yes | O(log N) seek |
Two things jump out immediately: the two "easy" strategies are unsafe, and the two "safe" strategies give up random access. That's not an accident. It's the entire essay.
Strategy 1: Offset pagination
public function paginate(Dataset $dataset, array $params): array
{
    $perPage = self::clampPerPage((int) ($params['per_page'] ?? self::DEFAULT_PER_PAGE));
    $page = max(1, (int) ($params['page'] ?? 1));
    $offset = ($page - 1) * $perPage;

    $all = $dataset->allById();
    $total = count($all);
    $items = array_slice($all, $offset, $perPage);

    return [
        'items' => $items,
        'pagination' => [
            'strategy' => 'offset',
            'page' => $page,
            'per_page' => $perPage,
            'offset' => $offset,
            'total' => $total,
            'total_pages' => $total === 0 ? 0 : (int) ceil($total / $perPage),
        ],
    ];
}
This is the one you already know. In real SQL it's LIMIT 20 OFFSET 40. The client says "give me page 3", the server does the multiplication, everyone goes home.
It has two problems, and both of them are genuinely bad.
Problem one: OFFSET N is O(N). Every database I know of implements OFFSET 40 as "read forty rows, throw them away, then start reading". Deep pages force the database to materialize every row before the offset just to discard it: page 500 at 20 rows per page means reading and throwing away 9,980 rows to return 20. If you have ever watched a background job slow down dramatically as it pages deeper into an export, that is this problem.
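A toy model of that cost (illustrative, not part of pagination-demo): the "database" below is a lazy row stream, and OFFSET shows up as rows read and discarded before the requested window even starts.

```php
<?php
// Toy model of OFFSET cost. The "database" is a generator, i.e. a
// sequential scan; OFFSET means reading and discarding every row
// before the requested window.

function rowsRead(int $offset, int $perPage): int
{
    $scan = (static function () {
        for ($id = 1; $id <= 1_000_000; $id++) {
            yield ['id' => $id];       // one row at a time, in order
        }
    })();

    $read = 0;
    $returned = 0;
    foreach ($scan as $row) {
        $read++;
        if ($read <= $offset) {
            continue;                  // OFFSET: read it, then throw it away
        }
        if (++$returned >= $perPage) {
            break;
        }
    }
    return $read;
}

echo rowsRead(40, 20), "\n";     // page 3: 60 rows touched for 20 returned
echo rowsRead(9980, 20), "\n";   // page 500: 10,000 rows touched for 20 returned
```

The deeper the page, the more work per request, even though the response size never changes.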
Problem two: concurrent writes shift the results. This is the one nobody mentions in the tutorial. Here's the failure mode:
- Client asks for `?page=1&per_page=20`. Gets the 20 newest rows, positions 0..19.
- A new row lands on top.
- Client asks for `?page=2&per_page=20`. Gets positions 20..39 of the new list, which, before the insert, were positions 19..38.
The row that was at position 19 on page 1 is now at position 0 on page 2. The client walks through all the pages, and one unfortunate row appears on both page 1 and page 2. If the client is a feed reader, the user sees a duplicate. If the client is an export job that writes rows to a file, the file contains duplicates. If the client is a worker that processes rows and marks them done, you process the same row twice and hope the downstream work is idempotent.
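The whole hazard fits in a few lines of plain PHP (a standalone sketch, not the demo endpoint itself):

```php
<?php
// Minimal sketch of the mutation hazard. A newest-first list is paged
// with array_slice; one insert between the two fetches shifts the
// window and duplicates a row.

$rows = range(100, 81);                  // ids 100..81, newest first
$page1 = array_slice($rows, 0, 5);       // [100, 99, 98, 97, 96]

array_unshift($rows, 101);               // a new row lands on top

$page2 = array_slice($rows, 5, 5);       // [96, 95, 94, 93, 92]

// id 96 was the last row of page 1 and is now the first row of page 2.
$dupes = array_values(array_intersect($page1, $page2));
print_r($dupes);                         // [96]
```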
I built a demo endpoint in pagination-demo that does this on purpose. GET /demo/mutation-hazard takes a DESC-by-id snapshot of page 1, inserts one new row at the top, re-fetches page 1 and page 2, and returns JSON that shows you the exact row that has been shifted from "end of old page 1" to "start of new page 2". You can go poke it after you run the container.
Strategy 2: Page-number pagination
Page-number pagination is offset pagination with a bow on it. The URL is a bit friendlier, and sometimes there's a Link: header (RFC 8288, which obsoleted RFC 5988) so clients don't have to build their own next/prev URLs, but the semantics are identical. Same O(N) scan. Same mutation hazard.
// $jsonResponse writes the JSON body; $base is the route's base URL.
$jsonResp = $jsonResponse($response, $result, 200);

$linkParts = [];
foreach ($result['links'] as $rel => $state) {
    $q = http_build_query([
        'page' => $state['page'],
        'per_page' => $state['per_page'],
    ]);
    // e.g. </page?page=3&per_page=20>; rel="next"
    $linkParts[] = sprintf('<%s?%s>; rel="%s"', $base, $q, $rel);
}
$jsonResp = $jsonResp->withHeader('Link', implode(', ', $linkParts));
I keep it as a separate strategy in the demo because the article's point is that page and offset should be judged together. Every team I've seen adopt page-based pagination thinks they've upgraded away from the offset hazards. They have not. They've made the URL prettier.
The one real upgrade here is the Link header itself, which is genuinely useful: it means the client library can just walk rel="next" until it runs out, instead of guessing URL shapes. But this is a quality-of-life improvement on top of a broken foundation.
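For completeness, here's what "walk rel=\"next\"" can look like on the client side. This is a minimal sketch with a regex parser that handles the simple Link headers the demo emits; a full RFC 8288 parser handles more edge cases.

```php
<?php
// Pull the rel="next" URL out of a Link header so a client can walk
// pages without guessing URL shapes. Handles the simple serialization
// the demo emits, not every legal RFC 8288 form.

function nextLink(string $linkHeader): ?string
{
    foreach (explode(',', $linkHeader) as $part) {
        if (preg_match('/<([^>]+)>;\s*rel="next"/', trim($part), $m)) {
            return $m[1];
        }
    }
    return null;   // no next page: we've walked off the end
}

$header = '</page?page=1&per_page=5>; rel="prev", </page?page=3&per_page=5>; rel="next"';
echo nextLink($header), "\n";   // /page?page=3&per_page=5
```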
Strategy 3: Keyset pagination
Keyset pagination (sometimes called seek pagination) is the one most backend engineers underuse. The client doesn't say "page 3". The client says "give me the next 20 rows after the id I last saw". In SQL it's:
SELECT * FROM rows WHERE id > :last_id ORDER BY id LIMIT :per_page
And the implementation in pagination-demo:
public function paginate(Dataset $dataset, array $params): array
{
    $perPage = OffsetStrategy::clampPerPage((int) ($params['per_page'] ?? self::DEFAULT_PER_PAGE));
    $lastId = max(0, (int) ($params['last_id'] ?? 0));

    $all = $dataset->allById();
    $taken = [];
    foreach ($all as $row) {
        if ($row['id'] <= $lastId) {
            continue;   // seek past everything the client has already seen
        }
        $taken[] = $row;
        if (count($taken) >= $perPage) {
            break;
        }
    }

    // Only advertise a next page when this one came back full.
    $nextLastId = count($taken) === $perPage
        ? $taken[count($taken) - 1]['id']
        : null;

    return [
        'items' => $taken,
        'pagination' => ['strategy' => 'keyset', 'per_page' => $perPage, 'last_id' => $lastId],
        'next' => $nextLastId === null ? null : ['last_id' => $nextLastId],
    ];
}
What does this buy you?
It's fast. WHERE id > :last_id ORDER BY id LIMIT 20 is a single B-tree seek on an indexed column. It is O(log N) no matter how deep into the feed you are. Page 1 and page 500 take the same time. The offset version's gradient of "slower the deeper you go" disappears completely.
It's correct under writes. New rows get inserted with a larger id than anything the client has seen. The query "rows with id > 10 limit 20" will never accidentally return the same row twice, no matter how many rows you insert at the top while the client is paginating. The row the client saw yesterday is still at id=10, today's new rows are at id=1001 upward, and the client just asks for "after 10" and gets 11..30 like nothing happened.
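You can demonstrate that property in a few lines. The take() helper below mirrors the demo's keyset loop over an ascending id list; the names are mine, not the demo's.

```php
<?php
// Sketch: keyset paging stays correct while rows are being inserted.
// take() returns up to $n ids strictly greater than $lastId.

function take(array $ids, int $lastId, int $n): array
{
    $out = [];
    foreach ($ids as $id) {
        if ($id <= $lastId) {
            continue;              // seek past everything already seen
        }
        $out[] = $id;
        if (count($out) >= $n) {
            break;
        }
    }
    return $out;
}

$ids = range(1, 40);
$page1 = take($ids, 0, 20);            // ids 1..20

$ids[] = 41;                           // concurrent inserts while the
$ids[] = 42;                           // client is between pages

$page2 = take($ids, end($page1), 20);  // ids 21..40: no overlap, no gap
var_dump(array_intersect($page1, $page2) === []);   // true
```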
But you give up random access. A client cannot jump to "page 5". This is the trade-off. You have to walk the pages in order. For most list endpoints (feeds, exports, background workers, notification inboxes) that's exactly what the client wants anyway. For a classic admin grid with a "jump to page 87" dropdown, keyset is a bad fit.
You also need a column that is indexed and monotonically increasing. Auto-increment primary keys work perfectly. created_at alone does not: two rows can share a millisecond timestamp and your pagination ends up skipping or duplicating. You need a tiebreaker. Which leads us to…
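The skipping failure is easy to reproduce. A standalone sketch (the row data is made up) with a naive "strictly after the last seen timestamp" filter:

```php
<?php
// Why created_at alone is not enough: two rows share a timestamp, and
// keyset on "created_at > last seen" silently skips one of them.

$rows = [
    ['id' => 1, 'created_at' => '2024-06-01 10:00:00'],
    ['id' => 2, 'created_at' => '2024-06-01 10:00:01'],
    ['id' => 3, 'created_at' => '2024-06-01 10:00:01'],  // tie with id 2
    ['id' => 4, 'created_at' => '2024-06-01 10:00:02'],
];

// First page of 2 ends at id 2, created_at '2024-06-01 10:00:01'.
$page1 = array_slice($rows, 0, 2);
$last = end($page1)['created_at'];

// Naive keyset on created_at alone: the tied row (id 3) is lost.
$page2 = array_values(array_filter(
    $rows,
    fn (array $r): bool => $r['created_at'] > $last
));
print_r(array_column($page2, 'id'));   // [4] -- id 3 never shows up
```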
Strategy 4: Cursor pagination
Cursor pagination is keyset pagination with the "where am I?" state encoded into an opaque base64 blob. Same query, same behaviour. The interesting parts are:
- The client no longer knows which columns are the sort key. It just hands the server back the string the server handed it.
- The cursor can carry a composite position like `(created_at, id)`, so you can sort by a non-unique column with a deterministic tiebreak.
- Because the cursor is opaque, you can change the sort columns on the server later without breaking existing clients: new cursors start using the new shape.
The encode/decode in pagination-demo:
public static function encodeCursor(array $state): string
{
    $json = json_encode($state, JSON_UNESCAPED_SLASHES);
    $b64 = base64_encode($json);

    // base64url, because cursors get pasted into query strings
    // where `+` and `/` get mangled.
    return rtrim(strtr($b64, '+/', '-_'), '=');
}

public static function decodeCursor(string $cursor): ?array
{
    if ($cursor === '') {
        return null;   // empty cursor means "start at the beginning"
    }

    $padded = strtr($cursor, '-_', '+/');
    $pad = strlen($padded) % 4;
    if ($pad !== 0) {
        $padded .= str_repeat('=', 4 - $pad);
    }

    $json = base64_decode($padded, true);
    if ($json === false) {
        throw new InvalidCursorException('invalid base64');
    }

    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['id'], $data['created_at'])) {
        throw new InvalidCursorException('missing cursor fields');
    }

    return ['id' => (int) $data['id'], 'created_at' => (string) $data['created_at']];
}
Three things here are worth calling out.
base64url, not base64. Cursors get pasted into query strings. Regular base64 uses + and /, which then have to be URL-encoded, which then get mangled by various middleboxes that also think they should URL-decode once more. Using - and _ instead sidesteps the whole mess. Strip the = padding too while you're at it.
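A self-contained round trip shows both properties at once. The two helpers below condense the demo's encode/decode (no class wrapper, no validation, for brevity); the timestamp deliberately contains a `+`.

```php
<?php
// Condensed copies of the demo's encodeCursor/decodeCursor, exercised
// with a timestamp that contains '+' to show the URL-safety point.

function encodeCursor(array $state): string
{
    $b64 = base64_encode(json_encode($state, JSON_UNESCAPED_SLASHES));
    return rtrim(strtr($b64, '+/', '-_'), '=');   // base64url, no padding
}

function decodeCursor(string $cursor): array
{
    $padded = strtr($cursor, '-_', '+/');
    if (($pad = strlen($padded) % 4) !== 0) {
        $padded .= str_repeat('=', 4 - $pad);     // restore '=' padding
    }
    return json_decode(base64_decode($padded, true), true);
}

$state = ['id' => 42, 'created_at' => '2024-06-01T12:00:00+02:00'];
$cursor = encodeCursor($state);

var_dump(strpbrk($cursor, '+/=') === false);      // safe in a query string
var_dump(decodeCursor($cursor) === $state);       // lossless round trip
```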
Tamper handling. The decoder returns null for an empty cursor (meaning "start at the beginning") but throws for anything malformed. In the Slim route that wraps it, I catch the exception and return a 400 with {"error": "invalid_cursor"}. Don't silently treat a garbled cursor as "start over": the client will paginate the entire dataset twice and wonder why their job took forever.
Composite sort key. In the demo I sort by (created_at DESC, id DESC) instead of just id. The cursor carries both fields, and the paginate loop uses a composite comparison so ties on created_at break on id. This is the general pattern: if your sort column isn't unique, the cursor must carry enough columns to uniquely identify a row.
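The composite comparison itself can be sketched as a small predicate. This is illustrative (my names, not the demo's): for a (created_at DESC, id DESC) sort, a row comes after the cursor position if it is strictly older, or tied on created_at with a smaller id.

```php
<?php
// Composite comparison for a (created_at DESC, id DESC) sort order.

function isAfterCursor(array $row, array $cursor): bool
{
    if ($row['created_at'] !== $cursor['created_at']) {
        // ISO-8601 strings compare correctly as plain strings.
        return $row['created_at'] < $cursor['created_at'];
    }
    return $row['id'] < $cursor['id'];            // tiebreak on id
}

$cursor = ['created_at' => '2024-06-01T12:00:00Z', 'id' => 42];

var_dump(isAfterCursor(['created_at' => '2024-06-01T12:00:00Z', 'id' => 41], $cursor)); // true
var_dump(isAfterCursor(['created_at' => '2024-06-01T12:00:00Z', 'id' => 42], $cursor)); // false
var_dump(isAfterCursor(['created_at' => '2024-06-02T09:00:00Z', 'id' => 7], $cursor));  // false
```

The general rule: however many columns are in the sort key, the cursor must carry all of them, and the predicate must compare them in the same order.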
The recommendation table
Here's what I actually tell teams. This is opinionated.
| Use case | Strategy | Why |
|---|---|---|
| Feed, timeline, notification inbox | Keyset or Cursor | Users scroll forward, new rows arrive, offset would double-count |
| Export / dump / background worker | Keyset | Deep pagination and mutation safety both matter |
| Admin grid with "jump to page N" UX | Page (with `Link` header) | Users need random access, row churn is low, and you control the UX |
| Search results over changing data | Cursor | Composite sort; opaque state lets you change the sort later |
| Anything over a relational DB bigger than ~10k rows | Not offset | You will be sorry |
"Don't use offset" is too strong for small tables and admin-facing tools where nothing is being written during pagination. But for any list that grows without bound and has concurrent writers, offset is a liability you will be debugging eventually.
Try it in 30 seconds
git clone https://github.com/sen-ltd/pagination-demo
cd pagination-demo
docker build -t pagination-demo .
docker run --rm -p 8000:8000 pagination-demo
Then in another terminal:
# Offset - easy but unsafe
curl -s "http://localhost:8000/offset?page=1&per_page=5"
# Keyset - give me rows after id=0
curl -s "http://localhost:8000/keyset?last_id=0&per_page=5"
# Cursor - opaque state, composite sort
curl -s "http://localhost:8000/cursor?per_page=5"
# Follow the cursor
FIRST=$(curl -s "http://localhost:8000/cursor?per_page=5")
NEXT=$(echo "$FIRST" | php -r 'echo json_decode(stream_get_contents(STDIN), true)["next"]["cursor"];')
curl -s "http://localhost:8000/cursor?per_page=5&cursor=$NEXT"
# Page - with RFC 5988 Link header
curl -sI "http://localhost:8000/page?page=2&per_page=5" | grep -i ^link
# The mutation hazard - inserts a row, shows the double-count
curl -s "http://localhost:8000/demo/mutation-hazard"
The mutation hazard endpoint is the one I'd run first. It returns JSON that literally tells you "this row you saw at position 19 on the first fetch is now at position 0 on the second fetch, after one row got inserted at the top". Once you've seen it in a real response body you will stop forgetting this property.
Closing
Pagination is one of those topics where the standard answer is wrong just often enough that teams keep rediscovering the same bugs. The fix isn't exotic: keyset pagination is just WHERE id > :last_id LIMIT 20. It's easier to implement than offset in some ways, because you don't need a total count. The real obstacle is that every tutorial teaches the broken version first, and by the time you hit the bugs in production you've already shipped ?page= in your API contract.
If you're designing a new list endpoint today: default to keyset. Reach for cursor if you need a composite sort. Reach for page-based only if users genuinely need to jump around, and accept the mutation hazard. Don't reach for raw offset at all unless the table is small enough that you'd also be comfortable returning the whole thing.
The full source is on GitHub under MIT: 39 PHPUnit tests, a multi-stage alpine Dockerfile that ships at 52 MB, no external dependencies beyond Slim and its PSR-7 bridge. If you've been meaning to write a "pagination explained" post for your juniors, feel free to point them at this one or steal the demo endpoint.
