Introduction
Working with LIMIT
and OFFSET
is one of the easiest way to create a paginated query. However it is not very performance friendly.
A better approach for a paginated query is using "cursor based" approach. In this article I have tried to compare both "cursor based" pagination and "offset based" to give a better understanding of them and find the best approach that fit our project.
What is the "Cursor based" pagination?
In this method instead of using OFFSET
OR SKIP
to scroll data to the last retrieved data a pointer to the last item is used for example user_id
.
Usually this pointer is used as a condition in our query, like $gte
in MongoDB or >
in SQL;
In the following sections of this article, I will implement "cursor based" method in a table with 20000 row (MySQL).
To keep reading time short I will implement Mongodb version in separate article
Preparing test in MySQL
To do a comparison between different methods I will create 20000 row in a table.
Let's start wih creating database
for our test.
CREATE DATABASE benchmark_pagination;
USE benchmark_pagination;
now create a table with 20000 row
CREATE TABLE users (
user_id BINARY(16) PRIMARY KEY DEFAULT (UUID_TO_BIN(UUID())),
first_name VARCHAR(50),
last_name VARCHAR(50),
birth_date DATE,
phone VARCHAR(50),
password VARCHAR(50),
INDEX (phone)
);
INSERT INTO users (first_name, last_name, birth_date, phone, password) values ('Hatty', 'Massenhove', '1998-06-19', '268-515-7621', 'mJOGJ36lAO');
.
.
.
Doing a simple select query for this table will result:
I use SQL_NO_CACHE
to disable query cache layer for testing
SELECT SQL_NO_CACHE BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
ORDER BY user_id ASC
LIMIT 10 OFFSET 10000
STATE | TOTAL_TIME | TIME_PERCENT | CALLS | TIME |
---|---|---|---|---|
Executing | 57.2 ms | 98.53% | 1 | 57.2 ms |
6a48c2b8-ba91-11ea-9d0e-0242ac130004 Hatty Massenhove
6a4ef217-ba91-11ea-9d0e-0242ac130004 Inga Scoyles
6a536ef0-ba91-11ea-9d0e-0242ac130004 Kenyon Della Scala
6a57c3bc-ba91-11ea-9d0e-0242ac130004 Devondra Gosselin
6a5bf956-ba91-11ea-9d0e-0242ac130004 Corney Perring *
6a611955-ba91-11ea-9d0e-0242ac130004 Shina Billo
6a65396f-ba91-11ea-9d0e-0242ac130004 Opalina Kienle
6a6a08ad-ba91-11ea-9d0e-0242ac130004 Riobard Godsmark
6a702fbe-ba91-11ea-9d0e-0242ac130004 Emalee Ferrarini
6a748fba-ba91-11ea-9d0e-0242ac130004 Netta Bellworthy
Look at above results as the base result for next two queries. First with LIMIT
and OFFSET
and second with Cursor Based
method.
We use row with
*
as the last item for better understanding and comparison
First query and results:
SELECT SQL_NO_CACHE BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
ORDER BY user_id ASC
LIMIT 10 OFFSET 10005
STATE | TOTAL_TIME | TIME_PERCENT | CALLS | TIME |
---|---|---|---|---|
Executing | 55.1 ms | 99.47% | 1 | 55.1 ms |
6a611955-ba91-11ea-9d0e-0242ac130004 Shina Billo
6a65396f-ba91-11ea-9d0e-0242ac130004 Opalina Kienle
6a6a08ad-ba91-11ea-9d0e-0242ac130004 Riobard Godsmark
6a702fbe-ba91-11ea-9d0e-0242ac130004 Emalee Ferrarini
6a748fba-ba91-11ea-9d0e-0242ac130004 Netta Bellworthy
6a78f019-ba91-11ea-9d0e-0242ac130004 Cornelia Perocci
6a7f6f72-ba91-11ea-9d0e-0242ac130004 Isabeau Griss
6a83a96e-ba91-11ea-9d0e-0242ac130004 Hubie Dawkes
6a87e9ea-ba91-11ea-9d0e-0242ac130004 Tabor Royson
6a8dfb29-ba91-11ea-9d0e-0242ac130004 Izaak Lombardo
Second query and results:
SELECT SQL_NO_CACHE BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
WHERE user_id > UUID_TO_BIN('6a5bf956-ba91-11ea-9d0e-0242ac130004')
ORDER BY user_id ASC
LIMIT 10
STATE | TOTAL_TIME | TIME_PERCENT | CALLS | TIME |
---|---|---|---|---|
Executing | 7.7 ms | 86.66% | 1 | 7.7 ms |
Things to consider (Unique field)!
It is obvious that using cursor to find last item is significantly better in term of response time but there is a problem, what should I do if I had to ORDER BY first_name
. Actually to do cursor baser
pagination you should use a unique field for sorting, In our case user_id
is a unique field so to do a select query with pagination with ORDER BY first_name
we can do as follow:
Again a simple select query:
SELECT SQL_NO_CACHE BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
ORDER BY first_name ASC, user_id ASC
LIMIT 10 OFFSET 10000
STATE | TOTAL_TIME | TIME_PERCENT | CALLS | TIME |
---|---|---|---|---|
Executing | 59.7 ms | 99.49% | 1 | 59.7 ms |
288fd856-ba91-11ea-9d0e-0242ac130004 Jammie Quigley
308ea632-ba91-11ea-9d0e-0242ac130004 Jammie Petrelli
5290078b-ba91-11ea-9d0e-0242ac130004 Jammie Quigley
57c98456-ba91-11ea-9d0e-0242ac130004 Jammie Petrelli
71daa45d-ba91-11ea-9d0e-0242ac130004 Jammie Quigley *
78f70797-ba91-11ea-9d0e-0242ac130004 Jammie Petrelli
8b085688-ba91-11ea-9d0e-0242ac130004 Jammie Quigley
90709ac7-ba91-11ea-9d0e-0242ac130004 Jammie Petrelli
3955aab9-ba91-11ea-9d0e-0242ac130004 Jan Semechik
6301bd18-ba91-11ea-9d0e-0242ac130004 Jan Semechik
Just like before we use this query's results as the base result for next two queries. First with LIMIT
and OFFSET
and second with Cursor Based
method.
First query and results:
SELECT SQL_NO_CACHE BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
ORDER BY first_name ASC, user_id ASC
LIMIT 10 OFFSET 10005
STATE | TOTAL_TIME | TIME_PERCENT | CALLS | TIME |
---|---|---|---|---|
Executing | 57 ms | 99.39% | 1 | 57 ms |
78f70797-ba91-11ea-9d0e-0242ac130004 Jammie Petrelli
8b085688-ba91-11ea-9d0e-0242ac130004 Jammie Quigley
90709ac7-ba91-11ea-9d0e-0242ac130004 Jammie Petrelli
3955aab9-ba91-11ea-9d0e-0242ac130004 Jan Semechik
6301bd18-ba91-11ea-9d0e-0242ac130004 Jan Semechik
8048650c-ba91-11ea-9d0e-0242ac130004 Jan Semechik
981ed0dc-ba91-11ea-9d0e-0242ac130004 Jan Semechik
25e93d81-ba91-11ea-9d0e-0242ac130004 Jana Whitlow
342acad9-ba91-11ea-9d0e-0242ac130004 Jana Aronstein
4e2e9f89-ba91-11ea-9d0e-0242ac130004 Jana Whitlow
Second query and results:
SELECT SQL_NO_CACHE BIN_TO_UUID(user_id) AS user_id,first_name,last_name
FROM users
WHERE
first_name >= 'Jammie'
AND (
user_id > UUID_TO_BIN('71daa45d-ba91-11ea-9d0e-0242ac130004')
OR
first_name > 'Jammie'
)
ORDER BY first_name ASC, user_id ASC
LIMIT 10
STATE | TOTAL_TIME | TIME_PERCENT | CALLS | TIME |
---|---|---|---|---|
Executing | 10.7 ms | 97.47% | 1 | 10.7 ms |
78f70797-ba91-11ea-9d0e-0242ac130004 Jammie Petrelli
8b085688-ba91-11ea-9d0e-0242ac130004 Jammie Quigley
90709ac7-ba91-11ea-9d0e-0242ac130004 Jammie Petrelli
3955aab9-ba91-11ea-9d0e-0242ac130004 Jan Semechik
6301bd18-ba91-11ea-9d0e-0242ac130004 Jan Semechik
8048650c-ba91-11ea-9d0e-0242ac130004 Jan Semechik
981ed0dc-ba91-11ea-9d0e-0242ac130004 Jan Semechik
25e93d81-ba91-11ea-9d0e-0242ac130004 Jana Whitlow
342acad9-ba91-11ea-9d0e-0242ac130004 Jana Aronstein
4e2e9f89-ba91-11ea-9d0e-0242ac130004 Jana Whitlow
Note
So basically using cursor based
pagination in many sceneries is much faster than LIMIT/OFFSET
pagination, but it is not good for all circumstances.
Some time the number of field that you have to sort
is big that makes cursor query complex and some time your table has a few row and has a small growth rate.
In these cases using simple codes are good enough.
Conclusion
This article supposed to be a notice that there are better ways to do simple things with better performance. In real project you may face with some serious data or giant tables so it's good to search before using those simple ways.
I hope this article could help people like me to learn about new topic;
Top comments (1)
Implementing the
cursor based
pagination is much harder than the offset version but the improvement in performance is unreal. Just for a table with 20000 rows, it improved the performance for upto 80%I think it is really useful to implement these queries for a few use cases which require a lot of pagination but doesn't need lots of sorting.