Discussion on: The SQL I Love. Efficient pagination of a table with 100M records

View post

Replies for: Viach, Thanks for the prompt reply! I like the caching approach, in any case. It would further improve performance, since the user reveals their ...

I can think about the following approach:

We go thru the dataset from the very first record and split it into pages with a predefined size. The goal of the process is to have user_id boundaries as you mentioned before. Example for a page size of 10 000:

page 1, user_ids: 1 - 13 122
page 2, user_ids: 13 125 - 23 421
page 3, user_ids: 23 423 - 35 008
page 4, ...

You may notice that the difference between user_ids in boundaries is more than 10 000 - this is because we can have gaps (deleted users).

When you need to render all users for a page #3, you look into the cache and user the user_id boundaries for the purpose.

But when a user on page #3 is deleted - you need to recalculate the cache for all pages after this one. There is no reason to do that for pages #1 and #2 since boundaries for the users there are unchanged.

Sorry for the delay with the response this time.