5 Strategies for Random Records from DB

TLDR: I'm now using Strategy #5

If you've ever needed to fetch a random record from a database, whether in Rails or another framework or language, you may have found that there are several strategies to choose from.

Why Get Random Records?

Before we dive in, let's consider why you'd need random records at all. Random behavior is rarely desired in applications since it leads to unpredictable results. But sometimes business requirements demand it.

Common use cases include displaying random content (like a "random image" feature or "random author" on a book tracking site), or testing/debugging to catch edge cases and unexpected behavior.


Strategy #1 - Use RANDOM()

Use the database's built-in random function (RANDOM() for PostgreSQL/SQLite, RAND() for MySQL/MariaDB). This is the most robust approach, but slow as tables grow—around 260ms on my 1M-record test table.

Pros:

  • Most robust

Cons:

  • Gets slower as the table grows (~260ms on my 1M-record table)
# Rails 6+ requires Arel.sql to pass a raw SQL fragment to `order`
Author.order(Arel.sql('RANDOM()')).limit(1)
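
On MySQL/MariaDB the same idea uses RAND() instead (a sketch, with the same performance caveat):

Author.order(Arel.sql('RAND()')).limit(1)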

Strategy #2 - Pluck & Array.sample

Pluck all IDs from the database and use Ruby's Array#sample to pick one at random. Faster than Strategy #1 (~50ms on 1M records), but watch out for memory consumption as tables grow.

Pros:

  • Faster (~50ms on 1M records)

Cons:

  • Gets slower as the table grows, and you may run out of memory (especially on small VMs)
Author.find(Author.pluck(:id).sample)

Strategy #3 - find_by with Min/Max ID

A third strategy is to generate a random ID between the minimum and maximum ID values, then use find_by to look up the record. This strategy is extremely fast, taking around 17ms on a table with 1,000,000 records, but it can be brittle if there are deleted records.

Pros:

  • Extremely fast if no deleted records (~17ms on my test data)
  • Table growth doesn't affect lookup speed

Cons:

  • Requires numeric, sequential IDs (no GUIDs or varchar)
  • Breaks with deleted records (missing IDs)
  • Error handling helps, but gets slower with many gaps
# find_by needs a condition hash; this assumes sequential ids starting at 1
Author.find_by(id: rand(1..Author.count))

Strategy #4 - Random Offset

Use a random offset to look up a record. Not as fast (~120ms on 1M records), but works with any ID type and needs no error handling.

Pros:

  • No error handling needed, works with any ID type

Cons:

  • Slower and inconsistent: averages 120ms but varies from 20ms to 600ms depending on offset position (that's how offset works)
# rand(Author.count) yields 0..(count - 1), a valid range of offsets
Author.offset(rand(Author.count)).limit(1)

Strategy #5 - Where with Min/Max

My preferred approach: generate a random ID between min and max values, then use a where lookup. Extremely fast (1-6ms on 1M records) and handles deleted records gracefully.

Pros:

  • Blazing fast: 1-6ms
  • Table growth doesn't affect speed (faster than offset)
  • Handles deleted records by finding the next available match
  • Chainable with other scopes (e.g., Author.published_today.random_record; see the sketch after the code below), though this reimplements offset behavior and may narrow your data set. Still faster than using offset directly, given how offset works

Cons:

  • Less intuitive than offset, but offset gets slower with large datasets
# `random_id` can be memoized/cached, but beware of scoping
# before caching: you can get empty results where many are expected
random_id = rand(Author.minimum(:id)..Author.maximum(:id))
# order(:id) ensures we get the next existing record at or after random_id
Author.where("id >= ?", random_id).order(:id).limit(1)
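
Because Strategy #5 composes with other scopes, you can wrap it in a class method to get the Author.published_today.random_record usage from the pros list above. This is only a sketch under my own assumptions (the method name, the nil guard, and the wrap-around fallback are mine, not from the original post):

def self.random_record
  min_id = minimum(:id)
  return nil if min_id.nil? # empty scope, nothing to pick from
  random_id = rand(min_id..maximum(:id))
  # If random_id lands in a gap past every remaining id,
  # fall back to the lowest id in the current scope
  where("id >= ?", random_id).order(:id).first || order(:id).first
end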

I used to prefer Strategy #3 (even with occasional deleted records, it stayed fast). Strategy #4 also worked well. However, while answering a StackOverflow question that led to this blog post, I discovered Strategy #5—now my preferred option.


Tips and Caveats

A couple of tips to keep in mind when implementing these strategies:

Caching Author.count:

If the exact current count isn't critical, preload Author.count and memoize it (e.g., as Author.total_count), or cache it in an app-load config. This eliminates the count query overhead and speeds up Strategy #3 to under 5ms and Strategy #4 to about 100ms. Here's an example of how to do this:

# Memoized per class; stays stale until the process restarts
def self.total_count
  @total_count ||= count
end
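
If you'd rather share the cached count across processes and let it expire, a Rails.cache version works too (a sketch; the cache key and one-hour TTL are arbitrary choices of mine):

def self.total_count
  Rails.cache.fetch("authors/total_count", expires_in: 1.hour) { count }
end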

Be Careful with Optimizations:

However, I would warn against going overboard with optimizations here, such as caching min/max values: depending on the context, scoping, etc., it may lead to unexpected results. If you use Strategy #5, the bottleneck won't be in the database, so making three quick queries is preferable, unless it really, really matters for your WebScale™ app.

Error Handling:

For error handling, you have two options: retry lookups until you find an existing record (see the sketch below), or do several Strategy #3 lookups at once and randomly select from the results. Depending on how many gaps your data set has, Strategy #3 may not be the best choice.
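
Here's a minimal sketch of the retry option for Strategy #3 (the attempt cap and method name are my own safeguards, not from the original post):

def self.random_record_with_retry(max_attempts: 5)
  max_attempts.times do
    # rand can land on a deleted id, so retry on a miss
    record = find_by(id: rand(1..count))
    return record if record
  end
  nil # too many gaps in the id range; give up
end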


Your Thoughts?

Did I miss any approaches, or make a glaring blunder? Please comment and let me know :)


Originally published: February 20, 2024 · Last updated: January 15, 2025

Cross-published from my blog: https://kinvoki.com/programming/5-strategies-random-records-from-db/
