DEV Community

Kaushikcoderpy

Posted on • Originally published at logicandlegacy.blogspot.com

The Offset Massacre — Why Cursor Pagination is Mandatory (2026)

Efficient Pagination: Moving Beyond OFFSET for Scalable Data Retrieval

Many applications rely on pagination to display large datasets, from product catalogs to social media feeds. While the OFFSET and LIMIT clauses are commonly taught for this purpose, they often become a significant performance bottleneck as data volumes grow. This article explores the inherent issues with OFFSET-based pagination and presents a more robust, scalable alternative: cursor-based pagination.

The Hidden Costs of Deep Pagination

Consider a scenario where an automated scraper systematically requests pages from a large product catalog API. As the scraper delves deeper into the dataset, perhaps reaching page=80000 on a table containing 20 million records, the database begins to struggle. A single query for this deep page, intended to retrieve 50 items, forces the database to scan and discard roughly four million preceding rows (80,000 pages × 50 items per page) before identifying the target subset. This sequential processing, especially under sustained load from multiple concurrent requests, can quickly exhaust CPU resources, leading to service degradation or even outages. Such incidents often highlight the critical need to re-evaluate the underlying pagination strategy.

The Performance Bottleneck of OFFSET

The fundamental flaw of OFFSET-based pagination lies in its execution. When a query specifies OFFSET N LIMIT M, the database doesn't magically "jump" to the Nth record. Instead, it typically performs a full scan from the beginning of the sorted result set, processes N records, discards them, and then retrieves the subsequent M records.

This linear scan means that the time taken to retrieve data scales proportionally with the offset value, resulting in O(N) complexity. Accessing the first page might be instantaneous, but retrieving page 10,000 of a large table at 50 items per page forces the database to process and discard 499,950 rows before returning a single result. This leads to unacceptable latency, increased CPU utilization, and poor database scalability.
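The O(N) behavior can be made concrete with a toy model of how the engine executes OFFSET N LIMIT M: walk the sorted rows from the start, counting every row touched before the requested window is reached. This is a simplified sketch, not a real query planner, but the row counts mirror the work a database performs:

```python
# Toy model of OFFSET N LIMIT M execution over a sorted, 1M-row result set.
rows = list(range(1, 1_000_001))

def offset_page(offset: int, limit: int):
    """Walk rows sequentially, counting every row touched along the way."""
    touched = 0
    page = []
    for row in rows:          # sequential walk from the very first row
        touched += 1
        if touched > offset:  # rows before the offset are processed, then discarded
            page.append(row)
            if len(page) == limit:
                break
    return page, touched

_, touched_shallow = offset_page(0, 50)
_, touched_deep = offset_page(500_000, 50)
print(touched_shallow, touched_deep)  # 50 vs 500050 rows processed
```

The shallow page touches 50 rows; the deep page touches 500,050 for the same 50-item result, which is exactly the proportional cost described above.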

Inconsistent User Experience

Beyond performance, OFFSET pagination introduces significant user experience issues, particularly in dynamic datasets. Imagine browsing a social media feed where new posts are constantly added. If a user views the first page and then requests the "next" page using OFFSET, any new items added before the current offset will shift existing records. This can lead to users seeing duplicate items across pages or, conversely, missing items entirely if records are deleted. This inconsistency stems from the OFFSET value being a fixed numerical position, which becomes unreliable in a rapidly changing data environment.
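The drift problem is easy to reproduce. The following sketch models a newest-first feed paginated by list slicing (the in-memory equivalent of OFFSET); two posts arriving between the first and second request push earlier items back into page 2, so the client sees them twice:

```python
# Toy simulation of offset drift on a newest-first feed.
feed = [f"post-{i}" for i in range(100, 0, -1)]  # post-100 (newest) ... post-1

PAGE = 10
page1 = feed[0:PAGE]                    # client reads page 1 (post-100 .. post-91)

# Two new posts arrive before the client requests page 2.
feed = ["post-102", "post-101"] + feed

page2 = feed[PAGE:2 * PAGE]             # OFFSET 10 now starts 2 items earlier

duplicates = set(page1) & set(page2)
print(sorted(duplicates))               # items the client sees on both pages
```

Here `post-91` and `post-92` appear on both pages; deleting rows instead of inserting them would shift the window the other way and silently skip items.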

Leveraging Cursor-Based Pagination

The solution to these challenges is cursor-based pagination. Instead of relying on a numerical offset, this method uses a "bookmark" or "cursor" to mark the last item retrieved. Typically, this cursor is a unique, indexed column like a primary key ID or a timestamp.

When a client requests the next set of data, it provides the cursor value of the last item it saw. The database then leverages its B-Tree index to efficiently locate this specific record and retrieve subsequent items. This approach transforms the lookup from an O(N) linear scan to an O(log N) indexed lookup, providing consistent, fast performance regardless of how deep into the dataset the user navigates.
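The seek pattern can be demonstrated with SQLite's standard library bindings. This is an illustrative sketch (the table name `feed` and the 1,000-row fixture are invented for the demo): the `WHERE id > ?` predicate lets the engine descend the primary-key B-Tree directly to the cursor position, which `EXPLAIN QUERY PLAN` reports as a SEARCH rather than a full SCAN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feed (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany("INSERT INTO feed VALUES (?, ?)",
                 [(i, f"item {i}") for i in range(1, 1001)])

def next_page(last_id: int, page_size: int = 50):
    # Keyset pattern: seek past the cursor via the primary-key index.
    return conn.execute(
        "SELECT id, content FROM feed WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size)).fetchall()

rows = next_page(last_id=500)
print(rows[0], rows[-1])  # batch starts immediately after the cursor

# The query plan confirms an index seek, not a sequential scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM feed WHERE id > 500 ORDER BY id LIMIT 50").fetchall()
print(plan)
```

The same `WHERE`/`ORDER BY`/`LIMIT` shape applies unchanged to PostgreSQL or MySQL, where the planner will likewise use an index range scan.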

Practical Implementation Example

Implementing cursor-based pagination is straightforward and doesn't require complex libraries. The core idea is to pass the identifier of the last item from the previous page as a parameter for the next request.

Consider this simplified FastAPI example, demonstrating the pattern:

```python
from fastapi import APIRouter, Query
from typing import Optional

router = APIRouter()

# Assume FeedItem is a SQLAlchemy model or similar ORM object
# with an 'id' column that is indexed and ordered.
class FeedItem:
    def __init__(self, id: int, content: str):
        self.id = id
        self.content = content

# Mock database interaction for demonstration purposes.
# In a real application, this would be a database query.
_mock_db = [FeedItem(i, f"Item {i}") for i in range(1, 1000001)]

@router.get("/api/v1/feed", response_model=dict)
def get_paginated_feed(
    # For the initial request, last_id can be 0 or None
    last_id: int = Query(0, description="The ID of the last item seen in the previous batch."),
    page_size: int = Query(50, ge=1, le=100)
) -> dict:
    """
    Retrieves a paginated list of feed items using cursor-based pagination.
    """
    # The critical SQL pattern: WHERE id > last_id ORDER BY id ASC LIMIT page_size
    # This leverages the index on 'id' for an efficient lookup.

    # Simulate the database query. In a real application this would be an ORM
    # query such as:
    # results = (session.query(FeedItem)
    #            .filter(FeedItem.id > last_id)
    #            .order_by(FeedItem.id.asc())
    #            .limit(page_size)
    #            .all())
    filtered_items = [item for item in _mock_db if item.id > last_id]
    sorted_items = sorted(filtered_items, key=lambda x: x.id)  # Ensure order for consistent pagination
    results = sorted_items[:page_size]

    # Determine the cursor for the next request
    next_cursor: Optional[int] = results[-1].id if results else None

    return {
        "data": [{"id": item.id, "content": item.content} for item in results],
        "next_cursor": next_cursor
    }
```

When a client makes the initial request (e.g., /api/v1/feed), last_id defaults to 0. The server returns the first page_size items and the id of the last item in that batch as next_cursor. For subsequent requests, the client sends /api/v1/feed?last_id={next_cursor_value}, allowing the database to directly locate and retrieve the next set of records without rescanning.
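From the client's side, consuming this API is a simple loop: follow `next_cursor` until the server signals the end. The sketch below stands in for the HTTP round trip with a local `fetch_page` function (a hypothetical stand-in for calling `/api/v1/feed?last_id=...`):

```python
# Hypothetical client loop: keep following next_cursor until it is None.
DB = list(range(1, 251))  # pretend these are the item ids on the server

def fetch_page(last_id: int, page_size: int = 100) -> dict:
    """Stand-in for GET /api/v1/feed?last_id={last_id}."""
    batch = [i for i in DB if i > last_id][:page_size]
    return {"data": batch, "next_cursor": batch[-1] if batch else None}

all_items, cursor = [], 0
while True:
    page = fetch_page(cursor)
    all_items.extend(page["data"])
    if page["next_cursor"] is None or not page["data"]:
        break
    cursor = page["next_cursor"]

print(len(all_items))  # every item retrieved exactly once
```

Because the cursor is an opaque value the client merely echoes back, the server is free to change its internal query strategy without breaking callers.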

Architectural Trade-offs

While cursor-based pagination offers superior performance and data consistency, it introduces a specific constraint on the user interface: the inability to directly jump to an arbitrary "page number." Since a cursor only points to the next logical item in a sequence, it inherently supports only "next" and "previous" navigation (though "previous" requires careful cursor management, often involving ordering in reverse).
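The "previous" direction mentioned above can be sketched as follows: query in descending order from the first id on the current page, then flip the batch back to ascending order for display. This is an in-memory illustration of the SQL pattern noted in the comment, not a full implementation:

```python
# Sketch of "previous" navigation via a reversed keyset query.
items = list(range(1, 101))  # ids 1..100, ascending

def previous_page(first_id: int, page_size: int = 10) -> list:
    # Equivalent SQL: WHERE id < :first_id ORDER BY id DESC LIMIT :page_size
    before = [i for i in reversed(items) if i < first_id][:page_size]
    return list(reversed(before))  # restore ascending order for display

print(previous_page(first_id=51))  # the ten ids just before 51
```

Note that a partial first page (fewer than `page_size` items before the cursor) falls out naturally, as the example returns whatever exists below the boundary.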

This limitation is why many applications employing cursor pagination, such as social media feeds, opt for an "infinite scroll" UI pattern. This design choice prioritizes backend scalability and responsiveness over random-access navigation, effectively transforming a technical constraint into a seamless user experience.

Verifying Performance Gains

To empirically demonstrate the performance difference, consider a practical experiment. A simple backend application can be set up to simulate both OFFSET and cursor-based pagination against a large dataset (e.g., 1,000,000 records).

When querying a deep "page" using OFFSET (e.g., retrieving items starting at offset 999,950), the execution time will visibly increase, reflecting the database's need to sequentially process and discard nearly a million rows. In contrast, a cursor-based query for the same data, using last_id=999950, will complete almost instantaneously. This stark difference in execution time, often orders of magnitude faster for cursor pagination, directly illustrates the efficiency gained by leveraging database indexes for direct data access.
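The experiment described above can be run end to end with SQLite's in-memory engine as a stand-in for a production database (table name and sizes are chosen for the demo). Both queries return the identical 50 rows; only the access path differs:

```python
import sqlite3
import time

# Build a 1,000,000-row table with an indexed integer primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [("x",) for _ in range(1_000_000)])
conn.commit()

t0 = time.perf_counter()
offset_rows = conn.execute(
    "SELECT id FROM items ORDER BY id LIMIT 50 OFFSET 999950").fetchall()
t_offset = time.perf_counter() - t0

t0 = time.perf_counter()
cursor_rows = conn.execute(
    "SELECT id FROM items WHERE id > 999950 ORDER BY id LIMIT 50").fetchall()
t_cursor = time.perf_counter() - t0

# Same rows either way; the cursor query typically finishes far faster
# because it seeks directly instead of walking 999,950 index entries.
print(f"OFFSET: {t_offset:.4f}s  cursor: {t_cursor:.4f}s")
```

Exact timings depend on the engine, cache state, and hardware, so treat the printed numbers as directional rather than a rigorous benchmark.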
