This article was originally published in my Personal Blog
Traditional pagination existed for many years. But it’s not useful for platforms with huge amount of user generated content. Luckily, there is an alternative solution for such systems.
Traditional pagination
Before we jump into the solution, let’s look at traditional pagination and it’s drawbacks. Let’s say we have a database table for forum posts that looks like this:
| ID | Title | Content | LastActiveDate |
| --- | --------- | ------------ | ------------------- |
| 1 | Some post | Some content | 2022-11-26T14:00:00 |
In this case, LastActiveDate
—indicates the date when the post was last active (i.e., a comment was published).
We want to display a front page with all the posts sorted by LastActiveDate
. Since we don’t know how much data we can have, we decide on an arbitrary page size of 10. And then, in order to display the posts on a particular page, we will use the following query:
SELECT * FROM posts
ORDER BY LastActiveDate
LIMIT ${pageSize}
OFFSET ${(currentPage - 1) * pageSize}
The math is very simple. Given a specific page, say 2, we will select all items starting from offset (2 - 1) * 10 = 10
, limiting the selection to 10 elements, and the result will be all the items in group of [10,20)
.
Usually, with traditional pagination, we also need to have another query in order to generate the visual pagination itself. And for that we need to know the total amount of pages we have, which can be done by:
const totalItems = await sql("SELECT count(ID) FROM posts") as number;
const totalPages = Math.ceil(totalItems / pageSize);
Drawbacks
This pagination style, however, has one major drawback. Let’s say I’ve navigated to page 1 of the forum and then decided to go make some tea. After coming back and seeing that no posts on the first page are of interest to me, I navigate to the second page and find out that it includes 2 posts from the first page.
This happens because during the time I was making and drinking my tea, some older posts that were not among the 10 posts on the first page — were updated, hence changing their LastActiveDate
to be the most recent. Upon navigating to the second page, I now see 10 more posts, but because of the shift in order, 2 posts from the first 10 — were pushed into the second 10 on the second page.
Usually, on platforms with low update rates, like a small forum — this drawback is not that critical. Occasionally users will see content on the next page, which appeared on the previous page — but the frequency during which it happens — is low. We can increase the page size to say 15, 20, 25, or even 30 posts per page — and since forums often have activity concentrated in few popular or new posts on the first page, the chance of seeing the same data on next pages — is small, and we can live with that.
However, for high throughput platforms like Twitter, or machine-to-machine streaming services — this case is unacceptable. Image navigating Twitter only to see the same content over and over again, because the rate at which content is created on Twitter is rapid, and thus content from the previous page is always pushed down to the next page.
In order to solve this problem, let me introduce — Cursor based pagination.
Cursor based pagination
Let’s say we want our forum pagination to never display data from previous pages. When a user navigates to the second page — she will always see data that suppose to be on that page. This can be achieved with cursor based pagination.
How cursor based pagination works
In cursor based pagination, we do not have logical pages. Instead, each page returns the data that suppose to be displayed on that page as well as a token to retrieve the next page. The token is usually opaque, and for the end user it has no meaning — but our system knows to read it and generate the next page based on it.
This brings me to the first drawback of cursor based pagination:
[!attention]
Cursor based pagination is forward. It does not support going back (i.e. navigating from second page back to first, or from third back to second); nor it supports skipping pages (i.e. going to page 5 from page 2).
On a platform like forum, this might not be very useful. Forums are typically meant to be browsed, so users always can jump to a specific page, navigate backwards and forwards, or event jump to the last page. However, — on platforms with infinite scrolling navigation, like feeds, this solution is perfect.
Cursor based pagination example
Let’s say our product manager came up with the terrible idea of turning our 90s looking forum with traditional pagination, to 2000s looking forum with infinite scroll feed. We’ve tried to warn him that this is a bad idea, but he insisted that users wanted it. And so let’s turn our forums’ traditional pagination — to cursor based instead.
First, we need to select the cursor. Since forum posts are often sorted in chronological order from newest to oldest post, it makes sense to use the LastActiveDate
as a cursor. Hence, the retrieval of a particular pages’ data can be done like this:
async function getPosts(cursor: null | Date) {
const posts = await table("posts").select("*").orderBy("LastActiveDate", "DESC").limit(PAGE_SIZE).where(q => {
if(cursor) {
q.where("LastActiveDate", "<", cursor);
}
});
return {posts, nextPageCursor: posts.length ? posts[posts.length - 1]["LastActiveDate"] : null}
}
Disclaimer: The query builder is a pseudocode, but can be achieved with tools like KnexJS
For every call, we fetch the first PAGE_SIZE
posts and provide a cursor to the next page, which is equal to the LastActiveDate
of the last post in our current batch (or null
if no posts exist). If the cursor
is null
— we start from the first page.
Since our posts are always sorted in chronologically descending order, our pagination becomes consistent.
Let’s try to do a visualization. Let’s imagine we have the following database:
| ID | Title | …. | LastActiveDate |
| --- | ------- | --- | ------------------- |
| 1 | Title 1 | | 2022-01-10T00:00:00 |
| 2 | Title 2 | | 2022-01-09T00:00:00 |
| 3 | Title 3 | | 2022-01-08T00:00:00 |
| 4 | Title 4 | | 2022-01-07T00:00:00 |
| 5 | Title 5 | | 2022-01-06T00:00:00 |
[!disclaimer]
I’ve intentionally sorted the posts byLastActiveDate
to easier visualize the process. Obviously, the data does not need to be sorted in advance, since SQLsSORT BY LastActiveDate DESC
handles the sorting.
For the sake of simplicity, let’s set our page size to 2. Upon first execution, we will get posts 1 and 2 and our nextPageCursor
will be equal to 2022-01-09T00:00:00
(which is the LastActiveDate
of the second post).
As we would like to retrieve the next page, we will provide the nextPageCursor
to our function, and since the query looks at posts older than 2022-01-09T00:00:00
we will get two more posts — IDs 3 and 4.
Now — let's come back to our tea scenario. Imagine I’m looking at the forum's first page with posts 1 and 2, and before clicking next, I go to drink tea. During that time, someone posts a new comment in post ID 3 — thus altering its LastActiveDate
to be today (let’s say today is 2022-01-11T00:00:00
). The table will now look like this:
| ID | Title | …. | LastActiveDate |
| --- | ------- | --- | ------------------- |
| 3 | Title 3 | | 2022-01-11T00:00:00 |
| 1 | Title 1 | | 2022-01-10T00:00:00 |
| 2 | Title 2 | | 2022-01-09T00:00:00 |
| 4 | Title 4 | | 2022-01-07T00:00:00 |
| 5 | Title 5 | | 2022-01-06T00:00:00 |
When I come back to the computer, I click next page. In our traditional navigation — my action would have returned posts with IDs 2 and 4, thus “leaking” the second post from the first page to the second.
Cursor based navigation, with the cursor of 2022-01-09T00:00:00
, would however return posts with IDs 4 and 5 as expected. Notice how post ID 3 is skipped and will never be seen by me, unless I start over from page 1. It’s important to point out that this flaw is not a problem of the cursor based pagination — but rather a problem with the way we’ve implemented cursor based pagination. Different use cases — have different implementations. It is possible to implement cursor base pagination which takes into account newer content as well. It, however, makes little sense to do so in a context of a forum, since in forums — going forward in pagination — means seeing older content.
As I’ve said before, we do lose some functionality — namely: going backwards (from page 2 to page 1 for example) and skipping pages (from page 2 to page 7 for example). So this approach is not useable for browsing content in a paginated manner, but it is a good approach to implement infinite scrolling without encountering duplicate content.
Bonus—don’t use dates and or timestamps as cursor
I could have ended the article here, but I’m pretty sure if I did so — I’d get criticized for using date as a cursor. And this will be a well deserved criticism.
You see — the entire point of cursor based navigation — is to create a consistent stream of data without repetitions. Yes — you give away some features, but in exchange you get fresh data for each page. As I’ve said — some systems might benefit from it.
However, — by using dates as cursors — you run into the same problem. You can encounter content repetition. Let’s assume we have a high traffic forum with a very active community. Let’s look at a subset of such forums posts table:
| ID | Title | …. | LastActiveDate |
| --- | ------- | --- | ------------------- |
| 675 | Title 1 | … | 2022-01-10T12:31:22 |
| 123 | Title 2 | … | 2022-01-10T12:30:48 |
| 534 | Title 3 | … | 2022-01-10T12:30:23 |
| 301 | Title 4 | …| 2022-01-10T12:30:23 |
| 231 | Title 5 | …| 2022-01-10T12:30:23 |
| 945 | Title 6 | …| 2022-01-10T12:30:17 |
Disclaimer: as earlier, the data is sorted by LastActiveDate
for demonstration purposes only.
For the sake of example, let’s say our page size is 2. And since we are looking at a subset of data, let’s say we hold a nextPageCursor
of 2022-01-10T12:31:22
. What will be the result of the invocation of getPosts("2022-01-10T12:31:22");
? If you’ve read through the article and understood it well, you would be correct to answer: posts 123 and 534. And what will be the nextPageCursor
value for that invocation? The correct answer is 2022-01-10T12:30:23
.
Now, here’s the problem. Let’s say you want to navigate to the next page using nextPageCursor = '2022-01-10T12:30:23'
. What would be the result? That’s true — post ID 945. Since the query we’ve written earlier, translates to:
SELECT * FROM posts
ORDER BY LastActiveDate DESC
WHERE LastActiveDate < '2022-01-10T12:30:23'
LIMIT 2
We are skipping posts ID 301 and ID 231. The hasty among you might just run and replace <
to <=
, but this introduces another issue. Now, instead of getting post ID 945 you get posts ID 534 and ID 301. So not only you are getting post ID 534 again, your nextPageCursor
is set to 2022-01-10T12:30:23
so you're essentially stuck in an endless loop (unless someone decides to update one of the posts).
Dates, although provide chronological order, are not providers of uniqueness. In a system with high rate of content, you might end up with collisions. And I know what you are thinking about: I’ll just add milliseconds! While milliseconds might help you to prevent this issue in human generated content (although there is still a chance of collision, depending on the traffic and the nature of the system) — they are not a solution for machine generated content in systems with high rate of content generation. There are other solutions, like content versioning, to provide uniqueness of the cursor.
Also keep in mind that cursors do not have to be bound to one specific dimension. I once designed a system where the cursor was a JSON that held multiple vectors for data retrieval. This JSON was then base64
encoded and was used as a cursor for navigation. So you could combine multiple fields, and any other data you would want, inside the cursor. As long as you can pass the cursor in the URL and your system can decode it — it’s a viable solution.
Top comments (0)