Julien Fitzpatrick

Posted on Sep 10, 2020

Don't have a CLUE how to get the most out of your Active Record queries?

#rails #ruby #database

Improve your Active Record querying skills by playing a game of Clue: it's you against the database! Just want to play and don't want to read this whole thing? Go straight to the repo!

Part One: What is Active Record?

If you've spent any amount of time working in Rails, even as a beginner, you've likely encountered some SQL. It's that not-so-pretty, sometimes quite verbose language you often need to use in order to query your database. Active Record's job is to heroically swoop in and provide a much more human-readable and human-understandable way of interacting with your Rails database.

Active Record refers to the layer of Rails that’s responsible for representing business data and logic, which is basically all the stuff in the database that’s like, for a blog it would be the users, the posts, the comments, etc. The data associated with running the business. It’s called “Active Record” because it’s an implementation of a pattern called “Active Record” that was described by Martin Fowler in his book, Patterns of Enterprise Application Architecture. (It’s also available here: https://www.martinfowler.com/eaaCatalog/activeRecord.html) In Active Record, objects carry both persistent data and behavior which operates on that data. So a User object in our database has both the data that we see in the database, as well as any behavior we’ve defined in its model.

The Active Record pattern, in turn, is actually a description of an Object Relational Mapping system (ORM), which is just a way of connecting the application objects (the users, the posts, the comments, etc) to their associated tables in the database. By using an ORM, we avoid the hassle of having to use SQL to access records in our database. The ORM is like a friendly file clerk, so instead of having to find things the hard way, you can just be like “hey, I want this user named Frank” and they’ll go and get it for you.

As you can see here, Drake is demonstrating his distaste for this SQL statement as compared to his obvious love of its equivalent Active Record query. Look how much simpler it is!

Lazy Loading

Active Record makes use of Active Record Relations. You may have noticed that any Active Record query that returns a collection (like where), as opposed to an individual object (find, find_by, first, and last), is not actually returning the array of database objects that it looks like it’s returning: it’s actually returning an Active Record Relation!

Character.where(name: "Mr. Boddy").class vs. Character.find_by_name("Mr. Boddy").class

For performance reasons, Active Record doesn’t want to tell the database to execute any queries until it absolutely has to, so most of the Active Record queries you run will return a relation rather than the actual database object(s). This allows us to chain a bunch of query methods together without taking a performance hit by executing the actual SQL query behind each one, and when it’s finally time to execute the query, Active Record and SQL will work together to figure out the best way to structure the query.

Of course, while we’re trying this out and running query methods in our Rails console, we’ll notice that queries ARE being executed, because the Rails console implicitly runs something like the .inspect method on relations, which requires queries to be run. So it’s a little misleading if we’re only running these methods in our Rails console.

Eager Loading

Since Active Record is set up to work this way, we can easily get into a situation where we unwittingly create an N+1 problem and absolutely hammer our database with unnecessary queries. For example:

Character.all.each { |char| puts char.weapon.name if char.weapon }

This is not only going to make a query to get all the characters, but because an association exists between characters and weapons, it’s also going to make a query for every single character’s weapon, which is a lot of queries! A common way to avoid this problem is by loading the weapons along with the characters ("side-loading") by using the includes method:

Character.all.includes(:weapon).each { |char| puts char.weapon.name if char.weapon }

So that’s Active Record, lazy loading, eager loading and an intro to a couple of Active Record query methods.

Now, on to the fun part!

Part Two: Rails::Clue

Before we get started, I invite you to fork or just clone my repo here so you can follow along (and try some queries of your own): https://github.com/julienfitz/rails_clue

If you never played Clue as a kid, I’ll give you a quick primer. The premise is that a person has been murdered and it’s up to us, the players, to figure out who the murderer is, where it took place, and which weapon was used. We’ll be doing this by using the Active Record query interface!

In the board game, you start by putting a character card, a weapon card, and a room card in an envelope, and those represent the murderer, the weapon, and the location. Players deduce what’s in the envelope by taking turns moving around the game board, suggesting a variety of combinations of murderers, weapons, and locations, and taking notes along the way.

Similarly, our database has one Envelope object that contains the murderer, the weapon, and the location. Much like the board game, we could just cheat and look at the envelope directly, but that spoils all the fun, so let’s not do that!

Let’s start by exploring what we have in our database right now. Just like the board game, we have Characters, Weapons, and Rooms. Let’s start by talking about pluck().

pluck

So first, let’s list all the Characters’ names. You might be tempted to run Character.all.map(&:name) and that would in fact list all the Character names for you! This is true. But first of all, map is a method on Enumerable. So it’s not part of the Active Record query interface. Instead let’s use pluck!

Character.pluck(:name)

What’s the difference? Why would we choose pluck over map in this instance?
- When we use .all.map, Rails fetches the data from the database SELECT "characters".* FROM "characters", and creates a collection of ActiveRecord objects using that data. Ruby then goes over the whole collection, takes the name property from each object, stores it in a new collection (the array), then returns the array.
- When we use pluck, we move the responsibility of fetching those names from Ruby straight to the database. This executes this query: SELECT "customers"."name" FROM "customers". pluck is much faster because it directly converts a database result into a Ruby Array, without constructing ActiveRecord objects in between. It’s not as noticeable on a small database like the one I'm working with, but very noticeable on larger ones. When I did this locally on my work machine, which is similarly small, there was a difference between 1.7ms and 0.8ms. So almost a full millisecond difference.
- Something else interesting about pluck is that because, unlike most of the methods that operate on relations, it triggers an immediate query and returns an actual array (rather than a relation), so you can’t chain additional query methods on top of it.
Anyway, so let’s get the names of all the Characters, all the Weapons, and all the Rooms using pluck. Cool. Another fun fact about pluck is you can query for more than one attribute at a time! So if I wanted to see all the Characters’ names AND their ids, we can run Character.pluck(:id, :name).

count

.count is also an Active Record query method. Let’s see how many GuestbookEntries we have with GuestbookEntry.count. Wow, that’s a lot of GuestbookEntries.

We can also use .count in combination with an attribute to see how many of that given class of objects have that attribute. This doesn’t really help us right now, since all the database objects have only one or two attributes, and they all have all of them. So if we look at Character.count and Character.count(:name) we’re gonna see the same number. But it’s cool to know you can do that!

find / find_by

One of the most basic query methods that most of us know of the top of our heads is .find(). Both .find() and .find_by() will each return only one result, so find_by will return the first matching result. Let’s pick a character at random. Since we have 10 Characters, each having an ID between 1 and 10, let’s use Character.find(rand(1..10)) to get a “random” character, and we’ll assign this character to a variable to make our lives easier.

How many guestbook entries does this character have?

char = Character.find(rand(1..10))
char.guestbook_entries.count

GuestbookEntries have a time_entered and time_exited, and the two attributes on our Envelope that are actually okay for us to access are murder_start_time and murder_end_time. We can use those attributes to see if our Character was in any particular room during that time.

where

We can use .where to see if any of our Character’s GuestbookEntries are from the time range of the murder! Let’s make our lives easier by assigning the murder start and end times to variables. Then we’ll run a basic where query to see if this character has any GuestbookEntries during the murder times.

murder_start = Envelope.murder_start_time
murder_end = Envelope.murder_end_time

char.guestbook_entries.where(time_entered: murder_start..murder_end, time_exited: murder_start..murder_end)

Now, the where method is super flexible, and there are many ways we can use it. You can pass it a string, an array, or a hash. You can also use where.not to find records that aren’t what you’re searching for. There’s lots of examples in the Active Record documentation.

joins

You know what might also be helpful? Every Weapon belongs to a character, BUT… not every Character has a weapon. What if we narrow down our characters to just those that have weapons? We can use an INNER JOIN, or in Active Record, the joins method! We can do this with:

Character.joins(:weapon)

We can chain pluck on there and get a list of just the names of the Characters who have weapons!

Character.joins(:weapon).pluck(:name)

How about we add rooms to that query so we can make sure we’re looking at characters who have both a weapon and at least one room?

Character.joins(:weapon, :rooms).pluck(:name)

distinct

Oof, what a mess! So many results! The thing about joins is that we’ll see duplicates if more than one room has the same character. So let’s use distinct to only grab a single record per unique value:

Character.joins(:weapon, :rooms).distinct.pluck(:name)

group

Out of curiosity, it won’t help us solve the mystery but just for fun so we can learn about another query method, let’s use group to see how many of these Characters who have weapons and rooms have been in each room, by room name. Group is generally more useful in combination with a math operation like “count” so that’s how we’re gonna use it.

Character.joins(:weapon, :rooms).distinct.group('rooms.name').count

having

We can chain having onto group to specify additional conditions on the query, so let’s see if we can add some conditions around the characters’ GuestbookEntries to help us narrow down who might have done it! Let’s narrow this down to characters who have guestbook entries with time_entered that’s equal to or after the murder_start_time and time_exited that’s equal to or earlier than the murder_end_time:

Character.joins(:weapon, :rooms).distinct.group('rooms.name').having('guestbook_entries.time_entered >= ? AND guestbook_entries.time_exited <= ?', murder_start, murder_end)

OMG! I think we have the murderer! Or at least, the version of this database that I'm currently working with has happened to work out this way (your results will likely vary, since the murderer is generated randomly). Now, this query could have returned more than one character, but it just happens that I currently only have one result, so this has got to be them! Let’s see which room we’re talking about here by tacking on a count real quick:

Character.joins(:weapon, :rooms).distinct.group('rooms.name').having('guestbook_entries.time_entered >= ? AND guestbook_entries.time_exited <= ?', murder_start, murder_end).count

Let’s see if Mrs Peacock is indeed the murderer (I’ve added an is_murderer? method to Characters). Since this query returns an array, and we already know there’s only one result, we can use take similarly to how we might otherwise use first , it’s just that take grabs a single record without any implicit ordering, so it might be a teeny bit faster than first sometimes, because it’s just doing a teeny bit less work.

Character.joins(:weapon, :rooms).distinct.group('rooms.name').having('guestbook_entries.time_entered >= ? AND guestbook_entries.time_exited <= ?', murder_start, murder_end).take.is_murderer?

Sweet! Let’s see if this room is the murder location:

Room.find_by_name("Lounge").is_scene_of_the_crime?

Amazing! All that’s left now is to figure out the murder weapon. If we want to do it the easy way, we can just look up which weapon Mrs. Peacock has, since there’s a relationship there. But! If you want to try your hand at some Active Record query method combos of your own, I invite you to do so to figure this one out!

What’s Mrs. Peacock’s weapon?

Character.find_by_name("Mrs. Peacock").weapon.name

Is it the murder weapon?

Character.find_by_name("Mrs. Peacock").weapon.is_murder_weapon?

Sweet! So we think it was Mrs. Peacock in the Lounge with the rope! Let’s see if we’re correct:

Envelope.take.whodunnit

And there we have it! Mystery solved! If you'd like to try your hand at this, feel free to fork my repo (or just clone it) and play around! https://github.com/julienfitz/rails_clue

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

We built pgai Vectorizer to simplify embedding management for AI applications—without needing a separate database or complex infrastructure. Since launch, developers have created over 3,000 vectorizers on Timescale Cloud, with many more self-hosted.

Forem

Don't have a CLUE how to get the most out of your Active Record queries?

Part One: What is Active Record?

Lazy Loading

Eager Loading

Part Two: Rails::Clue

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

Top comments (0)

The Next Generation Developer Platform

Read next

Mastering Data Cleaning: Your Guide to a Cleaner, Reliable Dataset 🚀

Idempotency in DynamoDB

SQL Interview Questions: A Comprehensive Guide for Developers

AI Prostate Cancer Detection Less Accurate Than Claimed Due to Real-World Image Processing Challenges