DEV Community

Discussion on: The SQL I Love. Efficient pagination of a table with 100M records

Argyrios Samourkasidis • Edited

Viach,
Thanks for the prompt reply!

I like the caching approach, in any case.
It would further improve performance, since the user reveals their intentions when they submit the first query; the backend can then partition the result into pages. (We actually considered this caching technique here!)

I didn't put much thought into dataset immutability, though. Indeed, it seems to be an important factor.

What do you mean by caching, though? Caching the user_id boundaries (i.e. the first user_id of every page), or caching all the pages?

Viach Kakovskyi Author

I can think about the following approach:

We go through the dataset from the very first record and split it into pages of a predefined size. The goal of the process is to obtain the user_id boundaries you mentioned before. Example for a page size of 10 000:

  • page 1, user_ids: 1 - 13 122
  • page 2, user_ids: 13 125 - 23 421
  • page 3, user_ids: 23 423 - 35 008
  • page 4, ...

You may notice that the difference between the boundary user_ids of a page is more than 10 000. This is because the ids can have gaps (deleted users).
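A minimal sketch of that boundary-building pass, assuming the user_ids arrive in ascending order (for example from an ordered scan of the table). The function name `build_page_boundaries` and the tiny sample data are made up for illustration only:

```python
from typing import Iterable, List, Tuple


def build_page_boundaries(user_ids: Iterable[int], page_size: int) -> List[Tuple[int, int]]:
    """Return (first_id, last_id) per page; user_ids must be in ascending order."""
    boundaries: List[Tuple[int, int]] = []
    first = last = None
    count = 0
    for uid in user_ids:
        if count == 0:
            first = uid          # first id seen on a new page
        last = uid
        count += 1
        if count == page_size:   # page is full, record its boundaries
            boundaries.append((first, last))
            count = 0
    if count:                    # trailing, partially filled page
        boundaries.append((first, last))
    return boundaries


# Ids with gaps (3, 6, 7, 11, 12 are "deleted"), page size of 3
ids = [1, 2, 4, 5, 8, 9, 10, 13]
print(build_page_boundaries(ids, page_size=3))
```

Each page holds exactly `page_size` existing users, so the id range it spans grows whenever there are gaps inside it.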

When you need to render all users for page #3, you look into the cache and use its user_id boundaries for the query.
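Roughly, the lookup could look like the sketch below. It uses an in-memory SQLite database as a stand-in; the `users` table, its `name` column, the `fetch_page` helper, and the hard-coded `boundaries` list are all assumptions for the example, not the real schema or cache:

```python
import sqlite3
from typing import List, Tuple


def fetch_page(conn: sqlite3.Connection,
               boundaries: List[Tuple[int, int]],
               page_number: int) -> list:
    """Fetch one page using cached (first_id, last_id) boundaries; pages are 1-based."""
    first_id, last_id = boundaries[page_number - 1]
    # Range scan on the primary key instead of OFFSET/LIMIT
    return conn.execute(
        "SELECT user_id, name FROM users "
        "WHERE user_id BETWEEN ? AND ? ORDER BY user_id",
        (first_id, last_id),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}") for i in (1, 2, 4, 5, 8, 9, 10, 13)],
)

# Hypothetical cache contents for a page size of 3
boundaries = [(1, 4), (5, 9), (10, 13)]
rows = fetch_page(conn, boundaries, page_number=2)
print(rows)
```

The query touches only the rows between the two boundaries, so its cost should not depend on how deep the page is.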

But when a user on page #3 is deleted, you need to recalculate the cache for that page and all pages after it. There is no reason to do that for pages #1 and #2, since the boundaries for the users there are unchanged.
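A sketch of that partial recalculation, under the assumption that the remaining ids are available as a sorted list (in practice this would be a scan starting at the affected page's first boundary). The name `rebuild_after_delete` and the sample data are hypothetical:

```python
from bisect import bisect_left
from typing import List, Tuple


def rebuild_after_delete(remaining_ids: List[int],
                         boundaries: List[Tuple[int, int]],
                         deleted_id: int,
                         page_size: int) -> List[Tuple[int, int]]:
    """Recompute boundaries from the page that contained deleted_id onward.

    remaining_ids: sorted ids still present after the deletion.
    """
    # Find the page whose cached range covered the deleted id
    page_index = next(
        (i for i, (first, last) in enumerate(boundaries) if first <= deleted_id <= last),
        None,
    )
    if page_index is None:
        return list(boundaries)          # id fell outside every cached page

    kept = boundaries[:page_index]       # earlier pages are unchanged
    # Restart from the first surviving id at or after the affected page's old start
    start = bisect_left(remaining_ids, boundaries[page_index][0])
    tail = remaining_ids[start:]
    rebuilt = [
        (tail[i], tail[min(i + page_size, len(tail)) - 1])
        for i in range(0, len(tail), page_size)
    ]
    return kept + rebuilt


old = [(1, 4), (5, 9), (10, 13)]         # page size 3
remaining = [1, 2, 4, 5, 9, 10, 13]      # user 8 (on page 2) was deleted
print(rebuild_after_delete(remaining, old, deleted_id=8, page_size=3))
```

Page 1 keeps its cached boundaries, while page 2 and everything after it shift to absorb the deleted row.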

Sorry for the delay with the response this time.