Data Pagination Techniques (3 Part Series)
Server-side pagination is a commonly-used feature in SQL databases. It helps when showing a huge set of results on user interfaces, it's required in RESTful APIs, and it comes to the rescue whenever you need to process large data in bulk and it doesn't fit in memory. The problem is that if it's done wrong it can be as inefficient as loading the full set or more.
There is a number of ways to implement pagination with SQL systems. I will go through the methods in this article with the advantages, disadvantages, and the reasonable scenarios of using each one.
The requested query is passed as it is without pagination. After the client gets the full set of data, it divides the data and shows it paginated to the user.
It reduces the number of requests to the database. The data can even be cached on the client-side to avoid future requests which makes later loads even faster.
Not suitable for regularly changing data. The dataset should be small to fit in memory and not painfully slow the initial load.
Small data sets which are not regularly updated and needed frequently such as the categories.
offset are standard SQL keywords and the first solution that comes to mind for pagination over datasets. The query syntax would look something like this:
SELECT * FROM products ORDER BY sale_date DESC WHERE sold = TRUE LIMIT 10 OFFSET 30;
Easy to implement which made many famous ORM solution use it. It's as easy as chaining the query function like this:
It also allows you to filter the table while pagination with the
WHERE statement. You can sort on any column and it still works. Which makes the query very customizable. Not the whole data is loaded into memory which means no running out of memory.
It can be HORRIBLE. The query performance goes downhill as the offset value increases. Let's say you are fetching page
n. How would offset skip the first
n-1? It wouldn't. It has to linearly scan the table for the first
n-1 and then load page
An offset means that a certain number of records will be skipped from the start, but what if new data where inserted in page
n-1 while page
n is being loaded? rows from page
n-1 will be pushed into page
n causing inconsistency, and worst; if the data is being updated by the paginating process, rows might be processed multiple times causing data inconsistency.
Great for user interfaces, easy to implement and very customizable to different filter/order preferences. As long as the deepness of the search results is limited. Perfect for pages where users paginate down the results with scrolling. Have this functionality on a large dataset with a "last page" button showing up on the interface and embrace yourself for the a database server out of service.
For migrations and large dataset processing, my advice: Offset the
SQL cursors makes the server do the pagination for you. All what you need to worry about is the query you want to paginate. It looks something like this:
-- Create a cursor for the query and open it DECLARE cur CURSOR FOR SELECT * FROM products; OPEN curEmail; -- Retrieve ten rows each time FETCH 10 FROM cur; FETCH 10 FROM cur; -- All done CLOSE cur;
Supports arbitrary queries. No performance drop when going further in the pages.
the implementation details is different from an SQL engine to the other but in general they held resources on the server and create locks. Clients can't share cursors and thus they would have to open each it's own which mean they can't share the same pagination.
One client which needs to paginate over large-sets of data and cares about pagination consistency. There are many types that can suite different applications:
READ_ONLY which avoids locks on the table,
STATIC which copies the result into a temporary table which is good when updates doesn't matter,
KEYSET which only copies the primary keys to provide you with updates but not with new rows,
DYNAMIC same as
KEYSET but primary keys copy is updated so it can also see newly inserted and deleted rows. ...etc. the availability of these solutions depends on the engine.
This technique is my favourite for data migrations. You need an indexed column to use it but that introduces great optimization while staying stateless on the server side.
The first page is fetched with the following statement:
SELECT * FROM products ORDER BY id ASC LIMIT 1000;
For the second page, the query uses the maximum
id value fetched in the first page. Let's say it's 1000, the query is as follows:
SELECT * FROM products WHERE id > 1000 ORDER BY id ASC LIMIT 1000;
An so on for the next pages.
Offers great scalability: It's as fast for the 1,000,000th page as for the first page. Also pagination consistency is preserved. It supports filtering and ordering on any column or multiple columns as long as you have indices on them.
In general, there is no way to jump to a specific page. However, If you have an auto incremented identifier in the dataset, that can be done with some simple calculations. To retrieve page
n, the lower bound for the query would be like this:
n * 1000 where 1000 is the page size or the limit.
Mostly any one. Very convenient for scalable applications where a lot of server requests are expected.