What is Pagination?
Pagination comes into the picture when you have a large dataset to be sent to the user and you choose to send it in chunks instead of sending all of it in a single response. It is pretty common that you would find its implementation in most of the sites that you visit on a daily basis.
There are two types of paginations that gets implemented.
Offset based pagination
Let's take dev.to for our 1st type example.
When you open dev.to
, in the feed, you see a list of posts sorted by some filter. This is the request that fetches the posts,
https://dev.to/search/feed_content?per_page=15&page=1&sort_by=hotness_score&sort_direction=desc&approved=&class_name=Article
There aren't page numbers on dev.to
similar to the ones you find on Amazon product listing page or a google search page.
The reason is that dev.to
has implemented an infinite scrolling wherein the next set of items are fetched in the background periodically to give the user a seamless experience.
When you scroll the feed continuously, these are the requests that are sent in the background.
Only two parameters change for every request in this type of pagination
- per_page - number of items required per page
- page - current page number
In the backend, the query would look something similar to this, (made it simple to understand the query better)
select * from posts order by rank limit 15 offset 0;
This is for the first page.
For any page, the variable that changes is the offset.
offset = current_page_number * number of items per page
The query for different pages would look like this,
select * from posts order by rank limit 15 offset 750; #page50
select * from posts order by rank limit 15 offset 775; #page51
select * from posts order by rank limit 15 offset 1500; #page100
There are a few important caveats of using this approach,
Performance Hit:
Before the explanation, hit these APIs on the browser directly and observe the time taken for each.
Request for page 1
https://dev.to/search/feed_content?per_page=15&page=1&sort_by=hotness_score&sort_direction=desc&approved=&class_name=Article
Request for page 650
https://dev.to/search/feed_content?per_page=15&page=650&sort_by=hotness_score&sort_direction=desc&approved=&class_name=Article
You should feel a slight delay in the second response.
Now you would also get an idea why some of your pagination APIs are taking more time as the page number increases.
Explanation:
If you look at the offset for 650th page, it is 9750.
What this essentially means is that to pick 15 items for the 650th page, MySQL has to pick 9750 records from the table and discard the rest of 9735 records one by one.
Imagine a use case where the offset is even higher, let's say 100,000. In this case, MySQL has to discard 99975 records to pick just 15 records and this happens on each request.
There has to be an efficient way to pick these 15 records right?
Blind Pagination
Since it relies on the limit and offset, this has a chance of showing duplicate data and skipping some data.
- Skipping some data
Let's say the user has seen the response of first two pages(30 posts), now consider I delete my post that was earlier on the first page.
Now when the posts are fetched for the third page, the user wouldn't notice any change but behind the scenes,
select * from posts order by rank limit 15 offset 30;
The post which was at 31st position would have now gone to the 30th position in the table since I deleted my post which was at 7th position.
As a result, the user would never be able to see this post unless he refreshes the page.
- Duplicate data
The same applies when a post jumps from 31st position to 30th position and now the 30th position post which the user has seen already will be shown again at the 31st position.
Having said these, offset based pagination is easy to implement and gives the flexibility to the user to jump to any specific page.
If you're using Ruby, there are a lot of gems that help you do offset based pagination within a few minutes. Here a few kaminari, will_paginate,pagy.
Cursor based pagination
Let's take Twitter feed for this example.
https://twitter.com/i/api/2/timeline/home.json?&cursor=HBbM%2Fv%2FnttbR2iQAAA%3D%3D&lca=true&ext=mediaStats%2ChighlightedLabel&pc=0
The cursor is the key parameter in this approach. The client receives a variable called cursor with the response. The cursor is a pointer to a specific item which is to be sent with the following request. The server uses the cursor to fetch the next set of items.
Here's an example of the response.
{
"posts": [...],
"next_cursor": "123456", # the post id of the next item
}
For example, this cursor could be the id of the first element in the next dataset. The simplified query,
select * from posts where id >= 123456 limit 15;
Furthermore, if the visited page is the last page, then the next_cursor will be empty.
{
"posts": [...],
"next_cursor": ""
}
The advantage of this approach is that MySQL picks only the required records (15 records) from the database.
Performance-wise, this approach gives the result in a constant time O(limit) irrespective of the page number.
Consider for the 10 millionth request
offset based approach - O(limit+offset) = O(15+10M) MySQL picks 10M records from the database and discards nearly everything.
cursor based approach - O(limit) = O(15) MySQL picks only 15 records
Look at the performance uplift. Cursor based approach is mighty impressive but there are a few things that stop developers from using it without a second thought.
It is not possible to jump to a random page in this approach as the server relies on the cursor to fetch the records.
This approach is not straightforward like offset and is trickier to implement sometimes.
The cursor must be based on a unique or sequential column in the table and pagination goes for a toss if the edge cases aren't tested.
There aren't a lot of gems/libraries that support the cursor-based approach compared to the offset based approach. So development time increases as well.
pagy-cursor, paging-cursor support cursor based pagination.
Cursor based pagination is an outright better option over offset based pagination in terms of performance but the decision to choose the approach depends on the use case and the kind of impact the pagination is going to have on the product itself. For much simpler paginations with comparatively less dataset, one might still prefer offset based pagination.
What is more important is to, not choose a particular pagination blindly. Instead, understand the trade-offs between the two and choose the one that suits the most.
Latest comments (2)
Good article! Thanks for sharing it. For what I understood the "Duplicate Data" and "Skipping some data" problems are common in every pagination technique. The fact that is not highlighted into the cursor strategy is because it is based in the ID that is a sequence. If for instance, we use the rank field do we have the same problem?
Yes, that is correct. We might face the same problem.
That's why I have quoted this,