This post was originally posted on ivopereira.net.
Gone are the days when we didn't need to worry about database performance optimization.
This won't work if I want to sort the rows by a field name (attribute) which is not an ID / Primary Key.
SELECT id, first_name
FROM users
WHERE id > ? -- <- this will fail if I don't sort by id
ORDER BY first_name ASC
LIMIT 2;
The algorithm remains the same, except that instead of the ID we should remember, from the last record, the key by which the sorting was performed.
But then you also need to have an index on that key (column), otherwise it's back to a full table scan again.
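Putting these two comments together: a minimal sketch (schema and data invented here for illustration) of keyset pagination on a non-primary-key column, using the id as a tie-breaker and a composite index so the `WHERE` clause is a range scan rather than a full table scan:

```python
import sqlite3

# Hypothetical table: keyset pagination sorted by first_name, with id
# breaking ties between rows that share the same name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "Carol"), (2, "Alice"), (3, "Bob"), (4, "Alice"), (5, "Dave")])
# The index that backs the row-value comparison below.
conn.execute("CREATE INDEX idx_users_name_id ON users (first_name, id)")

def next_page(last_name=None, last_id=None, size=2):
    if last_name is None:  # first page: no cursor yet
        return conn.execute(
            "SELECT id, first_name FROM users "
            "ORDER BY first_name, id LIMIT ?", (size,)).fetchall()
    # Resume strictly after the last seen (first_name, id) pair.
    return conn.execute(
        "SELECT id, first_name FROM users "
        "WHERE first_name > ? OR (first_name = ? AND id > ?) "
        "ORDER BY first_name, id LIMIT ?",
        (last_name, last_name, last_id, size)).fetchall()

page1 = next_page()                             # [(2, 'Alice'), (4, 'Alice')]
page2 = next_page(page1[-1][1], page1[-1][0])   # [(3, 'Bob'), (1, 'Carol')]
```

The two-branch `WHERE` clause is the standard way to express a tuple comparison portably; engines that support row-value syntax can write it as `(first_name, id) > (?, ?)` instead.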
But the whole point of the blog post is a good one: signalling that a table scan is being performed when using OFFSET.
I wasn't aware of that; I was probably unconsciously assuming (like everyone else) that using OFFSET and LIMIT is somehow magically "performant". And apparently that's not the case.
In the general case, query optimization is based on indexes (and a lot of other information) anyway. The approach described in the post relies on primary key indexing, but when the ordering does not match the primary key ordering, an appropriate index is necessary.
This is a neat trick when IDs are incrementing integers, though that may not be a given in some cases. But ahem:
"insertions and deletes may happen meanwhile and we don't [want] outdated data!"
(Laughs in eventually consistent, highly indexed query models)
Consistency is an illusion. Strong Consistency doubly so.
Excellent point on consistency and that quote @sam_ferree !
The point I have made is referring to the benefits in an exact comparison of this cursor-based approach with the traditional SQL approach to this situation.
Would you suggest anything different here?
Nope, paging by ID is pretty nifty in this use case.
Is it lunch time already?
IMO offset-based pagination has weird consistency problems.
People are used to that in pagination. Go to eBay, sort by "ending soonest" and page through. As you go, items will disappear off the top of the results (because they... ended...) and flipping back through pages gets messy. I don't think that's unacceptable.
The same with sharing a link to page 1234 of some results - you know that by the time the recipient opens it, it'll probably be different, so if it's necessary, a "share results page" function could be built to handle it.
It would be better if they don't have to.
We have developed a Django pagination app addressing this performance issue. It utilizes primary keys and stores the values in cache. Also, cache timeout can be set to invalidate this cache. You can check this url: github.com/bilbeyt/django-fast-pag...
What a coincidence, I literally published the same article today!
I like your presentation more though :)
Why would the database engine perform a full table scan on a limit/offset where the limit/offset factor is indexed? Can you show the query planner output for a query that actually exhibits this behaviour? Perhaps this behaviour is specific to the database engine you are using? I run massive limit/offset pagination queries in Postgres, where the limit/offset columns are indexed, and the performance is fantastic. The query planner is adamant that no table scans are occurring.
Follow up thought - Are you perhaps doing something funky like limiting/offsetting inside or after a common table expression? Maybe the query planner is dropping the ball at a CTE boundary? That's the most common way I end up blowing my leg off, performance wise... 😭😂
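One way to settle this kind of question is to ask the planner directly. Here is a hedged sketch using SQLite's `EXPLAIN QUERY PLAN` as a stand-in (in Postgres you would use `EXPLAIN` / `EXPLAIN ANALYZE`); the schema is invented. Note that even when an index supplies the ordering, OFFSET still has to walk and discard the skipped rows, so whether that counts as a "full table scan" depends on the engine and the query:

```python
import sqlite3

# Invented table with an integer primary key so both queries can use it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 1001)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the step description in the last column.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

offset_plan = plan("SELECT * FROM users ORDER BY id LIMIT 10 OFFSET 500")
keyset_plan = plan("SELECT * FROM users WHERE id > 500 ORDER BY id LIMIT 10")
print(offset_plan)  # a SCAN step: rows are walked (and discarded) up to the offset
print(keyset_plan)  # a SEARCH step: the primary key index seeks straight to id > 500
```

The same experiment in Postgres with `EXPLAIN (ANALYZE)` shows the cost difference growing with the offset, which is the effect the post is describing.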
Because most popular DB engines don't do a full table scan; they already have optimizations for LIMIT/OFFSET.
This is a typical low-quality article, not researched well enough, written for the sake of content...
How to implement the pagination when ID is a UUID?
There should be no difference. The only problem you may run into is that a UUID does not provide any consistent ordering, so the output will look rather random. If this is a problem, then you may consider an alternative unique ID, for example a ULID.
Hi @mateusz ,
I would see that in the same category as the caveat I've described in the post. Having an auto-incremented ID as a key would give you a much more efficient solution than trying to wrap the strategy around a UUID.
It would be possible to use a workaround with a created_at column in combination with a UUID, but I would highly advise against it: given the performance hit you would take, you would be much better off sorting by ID.
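For completeness, a hypothetical sketch of that (created_at, UUID) workaround, so the trade-off is concrete. All names are invented; the UUID gives no ordering on its own, so created_at drives the sort and the UUID only breaks ties between equal timestamps:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, created_at INTEGER)")
rows = [(str(uuid.uuid4()), t) for t in (10, 10, 20, 30, 30)]
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
# Composite index so the row-value comparison below is a range scan.
conn.execute("CREATE INDEX idx_events_cursor ON events (created_at, id)")

def page_after(cursor, size=2):
    if cursor is None:  # first page
        return conn.execute(
            "SELECT created_at, id FROM events "
            "ORDER BY created_at, id LIMIT ?", (size,)).fetchall()
    ts, uid = cursor
    # Resume strictly after the last seen (created_at, id) pair.
    return conn.execute(
        "SELECT created_at, id FROM events "
        "WHERE created_at > ? OR (created_at = ? AND id > ?) "
        "ORDER BY created_at, id LIMIT ?", (ts, ts, uid, size)).fetchall()

p1 = page_after(None)
p2 = page_after(p1[-1])  # cursor = the last (created_at, id) row seen
```

It works, but every page carries a two-column cursor and the index is wider, which is the performance hit referred to above.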
I thought the same. In my opinion, incremental PKs are rarely used in mature systems.
What if you do this?
With PostgreSQL it will use the index to get your offset, no table scan needed.
It works for any b-tree based indexed column. So if you had an index on a column called first_name, you can do:
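A minimal, hypothetical sketch of such a query (table and data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cid"), (4, "Dee")])
# The b-tree index that makes the range predicate below an index seek.
conn.execute("CREATE INDEX idx_users_first_name ON users (first_name)")

# Resume after the last first_name seen on the previous page.
page = conn.execute(
    "SELECT id, first_name FROM users "
    "WHERE first_name > ? ORDER BY first_name LIMIT 2",
    ("Bob",)).fetchall()
print(page)  # [(3, 'Cid'), (4, 'Dee')]
```

(If first_name is not unique, you would add the id as a tie-breaker, as discussed earlier in the thread.)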
Hi there, thanks for the post.
Strangely, many developers have never had to deal with index based pagination. I have, but many haven't.
The main challenge with this approach is that a primary key / incrementing column will almost never be continuous after any kind of filtering, normally scope-related. A good example would be an online trading account: a user wants to see only their own trades.
So you may end up having to use an analytic function (ROW_NUMBER() OVER) to create the per-user trade ID, which requires the full result set to be available. That is essentially how OFFSET works.
Alternatively, this would need to be a scope-based, guaranteed continuously incrementing value, which becomes a design-time choice.
Where this works is if you store a daily table of data which all users will access with different pagination. Then this is powerful.
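Scoped filtering like the trades example above doesn't necessarily require ROW_NUMBER(), though: keyset pagination only needs "strictly after the last row I saw" within the filter, backed by a composite index. A hypothetical sketch (schema and data invented):

```python
import sqlite3

# The IDs visible to one user are not continuous, but keyset pagination
# doesn't need them to be: it seeks to the first id greater than the
# cursor within that user's rows, whatever that id happens to be.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany("INSERT INTO trades VALUES (?, ?)",
                 [(1, 7), (2, 9), (3, 7), (4, 9), (5, 7), (6, 7)])
# Composite index: the filter column first, then the cursor column.
conn.execute("CREATE INDEX idx_trades_user_id_id ON trades (user_id, id)")

def user_trades_after(user_id, last_id, size=2):
    return conn.execute(
        "SELECT id FROM trades WHERE user_id = ? AND id > ? "
        "ORDER BY id LIMIT ?", (user_id, last_id, size)).fetchall()

print(user_trades_after(7, 0))  # [(1,), (3,)]  -- user 7's first page
print(user_trades_after(7, 3))  # [(5,), (6,)]  -- next page, gaps irrelevant
```

The pages are dense even though user 7's IDs (1, 3, 5, 6) have gaps, which is the continuity concern raised above.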
Good Post and the other guy who wrote something similar.
Well, you CAN use it, but only if your table is small. :-)
But apart from that you're right, great post.
Amazing post! I use this specifically for infinite scrolling type paginations.
Great article, thank you! But does it matter for NoSQL databases like Elasticsearch or MongoDB?
Probably not; I'd think the query mechanism is much different.