After reading the title of this post many of you are probably thinking...
DUH, I already know how to do that. But let’s just hold up a minute, because it might not be as obvious as you think. For example, how many of you have written code like this? I know I have!
User.where(:id => user_ids).each do |user|
# Lots of user processing
end
This code looks pretty good, right? If there are no user_ids this block will skip all of the user processing. That seems like a pretty good deal, so it must be fine. Unfortunately, that assumption is not quite right. Let me explain why.
When you execute that where clause, even with an empty array, it is actually going to hit MySQL.
(pry)> User.where(:id => [])
User Load (1.0ms) SELECT `users`.* FROM `users` WHERE 1=0
=> []
Notice the WHERE 1=0 statement at the end of the SQL. That is how ActiveRecord ensures no records are returned. Sure, it's a quick 1ms query, but if you are executing this chunk of code millions and millions of times, that quick query can easily overwhelm your database and slow you down. So how do you update this code to make it more performant?
You have two options. The first is to not run the MySQL lookup unless you absolutely have to, which you can do with an easy peasy array check in Ruby before you execute the block.
return unless user_ids.any?
User.where(:id => user_ids).each do |user|
# Lots of user processing
end
By doing this you can save yourself a worthless database hit and ensure that your database is not being overwhelmed by useless calls. It will also speed up your code. Say you were running this code 10k times: that empty MySQL lookup alone is going to take you over half a second.
(pry)> Benchmark.realtime do
> 10_000.times { User.where(:id => []) }
> end
=> 0.5508159045130014
If instead you skip that MySQL query by first checking whether any user_ids are present, then running a similar code block 10k times takes less than a thousandth of a second!
(pry)> ids = []
(pry)> Benchmark.realtime do
> 10_000.times do
> next unless ids.any?
> User.where(:id => ids)
> end
> end
=> 0.0006368421018123627
As you can see, there is a significant time difference between hitting MySQL unnecessarily 10k times and executing plain old Ruby 10k times. That difference can have a real impact on the performance of your application. A lot of people will look at this chunk of code
User.where(:id => user_ids).each do |user|
# Lots of user processing
end
and the first thing they say is, well, “Ruby is slow.” But that could not be further from the truth, since we just saw that the pure Ruby check is hundreds of times faster! In this case, Ruby is not slow; hitting the database is slow! Keep an eye out for situations like this in your code where you might be making database calls you don't expect.
Now some folks might be looking at this code and thinking, I’m not exactly writing code like that. Actually, I chained a bunch of scopes onto my where clause
users = User.where(:id => user_ids).active.tall.single
so I NEED to pass that empty array of user_ids, otherwise the scope chain breaks. Thankfully, even though ActiveRecord doesn't handle empty arrays well, it does give you an option for handling empty scopes, and that is the none scope.
none is an ActiveRecord query method that allows you to return a chainable relation with zero records without querying the database. Let’s see this in action. From earlier we know that a where clause with an empty set of IDs is going to hit the database.
(pry)> User.where(:id => []).active.tall.single
User Load (0.7ms) SELECT `users`.* FROM `users` WHERE 1=0 AND `users`.`active` = 1 AND `users`.`short` = 0 AND `users`.`single` = 1
=> []
But if we replace that where clause with the none scope, you can see that no database call is made and we can still chain our scopes together.
(pry)> User.none.active.tall.single
=> []
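If you want callers to get that behavior automatically, you could wrap the check in a small class method. Here is a rough sketch (the lookup_by_ids name is made up for illustration, and the scopes are the same pretend ones from above):

class User < ActiveRecord::Base
  # Returns a chainable relation either way, but skips the database
  # entirely when there are no ids to look up.
  def self.lookup_by_ids(user_ids)
    return none if user_ids.blank?
    where(:id => user_ids)
  end
end

# Scopes still chain, and an empty id list never hits MySQL:
users = User.lookup_by_ids(user_ids).active.tall.single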
Be on the lookout for tools like this in your frameworks that will allow you to work smarter with empty datasets. Even more importantly, NEVER ever assume your framework or gem is not making a database call when asked to process an empty dataset. For more tips on finding calls like these, check out my blog post on logging.
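One quick way to surface surprise queries while you are poking around in a console is to point ActiveRecord's logger at STDOUT (standard ActiveRecord configuration, shown here just as a reminder; development environments often do this already):

require "logger"

# Print every SQL statement ActiveRecord runs so unexpected queries,
# like the WHERE 1=0 one above, are easy to spot.
ActiveRecord::Base.logger = Logger.new(STDOUT)

User.where(:id => [])  # the WHERE 1=0 query now shows up in the output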
Real Life Application
This concept of using Ruby to prevent database hits is not just limited to MySQL; it can apply to any type of database! One place we found it useful at Kenna was when building what we call reports. Every night we build these colorful PDF reports for our clients.
These reports start with a reporting object, which holds all the information needed to build that report. Then, to build that pretty report page every night we have to make over 20 Elasticsearch requests along with multiple requests to Redis and MySQL.
We did a lot of work to make sure all these requests were super fast, but it was still taking us hours to build the reports. Eventually, the number of saved reports grew so much that we couldn’t finish them all overnight. When my team and I started trying to figure out how to solve this problem, the first thing we did was jump in a console to take a close look at what data our existing reports contained. After some digging, we found that out of the 25k reports in our system, over 40% of them were blank!!!
(pry)> Report.blank_reports.count
=> 10805
This means they contained no data, and if the reports contained no data, then what was the point of making all of those Elasticsearch, MySQL, and Redis requests when we knew they were going to return nothing?
LIGHTBULB! Don’t hit the databases if the report is empty! By skipping the reports that had no data we took our processing time from over 10 hrs down to 3. By adding a simple line of Ruby
def build(report)
return if report.blank?
# Processing
end
we were able to prevent a bunch of worthless database hits, which, in turn, sped up our processing tremendously. I like to refer to this strategy of using Ruby to shield your databases from requests as using database guards. In practice it’s simple, but I think it’s one of the easiest things to overlook when you are writing code. Every database hit uses resources, so make them all count!
If you are interested in other ways to prevent database hits using Ruby, check out my Cache Is King speech from RubyConf, which is what inspired this post.
Top comments (16)
Nice post 😊 I didn't know about none. I wonder why AR queries the db with 1=0 instead of doing whatever none does 🤔
Right? I went into the AR source code one weekend and tried to update it in such a way that none would be used when it saw an empty array. Broke a lot of specs and after 3 hours of trying to fix them I gave up. One day I will go back!
I feel that, lol. So vindicating when you finally get it after numerous independent attempts!
Brilliant Rails post.
Would love to see a post from a dev.to team member about performance optimizations you guys have done with this site to make it so fast. I'd imagine there are lots of neat tricks to share!
There're a couple of things I don't understand about this.
Why would ActiveRecord want to ensure that no records are returned with WHERE 1=0? Unless... do you want a "chainable relation" of foos to which you can later add items? That seems like a topsy-turvy way of working. What am I getting mixed up on?
If the service is smart enough to inject a condition if the query is expected to be empty, why isn't it smart enough to not bother running it in the first place?
Why isn't the query cached, since it's identical to the previous one and most databases will have a cheap memory cache for that sort of thing?
In short, I don't understand why Ruby would be asking the database for something it already knows is going to be empty.
Your assumption is correct: they do this so they can return an ActiveRecord relation. I dove into the ActiveRecord codebase trying to find a way to update it so that it would not hit the database and would return a none relation instead, but I was never able to figure it out and get all the specs passing.
What we are working with when dealing with these empty arrays is a method that we expect to return an ActiveRecord relation so that we can chain other things to it. This is why we want to use the none scope; then we don't have to place conditionals all over the codebase when using the method, since it handles that internally. Here is one of the methods we use it in.
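A simplified sketch of the idea (the real model, ids, and scopes are different, but the shape is the same):

def connectors
  # Hand back a chainable empty relation instead of hitting the
  # database when there is nothing to look up.
  return Connector.none if connector_ids.blank?
  Connector.where(:id => connector_ids)
end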
Then all over the codebase we can call connectors.something and never have to worry about it.
I tried to simplify the concept for the example, but in real life these queries are being executed across many workers, so the only layer of caching available is at the database level. Even then, we don't want to have to make a round trip to the database if we don't have to.
It is not actually Ruby doing the asking; it is Rails. When we take Ruby frameworks and use them, they can hide a lot of what is going on under the hood, and if you are not careful you can end up with lots of these types of useless database hits. The frameworks are trying to be as user friendly as possible, and in some cases that can cost you performance.
Hopefully that explanation makes sense. Let me know if you have any other questions!
Love all the code examples and how this is pretty much a broken down white paper but more approachable. This makes it so much easier to learn from your experience! I highly recommend everyone check out Molly's RubyConf talk, she rocked it!
@molly I've watched your speech 2 days ago. Great stuff!
@molly do you have any resources about caching strategies? What approach did you use for cache invalidation?
Shoot, I somehow totally missed this comment last month! Sorry for the late reply!
I don't really have any good resources to share since most of what we learned was through trial and error on the job. This is a big reason why I wanted to share our experiences so hopefully others can learn from them. Redis docs are also super awesome!
As for cache invalidation, that is always tricky. You really have to do what works best for your situation and use case. You want the expiration short enough that basically no one notices it's cached, yet long enough to provide a performance benefit. For some things a 5 minute cache gives us huge gains, while other caches have to be set for 24 hours to see any benefit. Take your best guess at what you think is a good number and then track your performance. Is it helping? How much is it helping? Is it noticeable at all to clients or customers? You will likely have to tweak your original guess, but eventually you will find the sweet spot. We have had to tweak more than one invalidation timeframe getting to where we are now, and I suspect we will be tweaking many more in the future.
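In Rails terms, that tuning usually comes down to the expires_in you pass when writing the cache. A hypothetical example (the key, the 5 minute TTL, and the build_summary method are made up for illustration):

# Cache an expensive report summary for 5 minutes; tune :expires_in based
# on how fresh the data needs to feel versus how much work the block saves.
summary = Rails.cache.fetch("report/#{report.id}/summary", :expires_in => 5.minutes) do
  report.build_summary # the expensive Elasticsearch/MySQL/Redis work
end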
Cool! Sometimes we think that making a Rails app perform better is hard, but you proved the opposite!
Awesome post! Learned a lot :)
Nice post and good speech 👏 👏
Last time I used WordPress, a few years ago, it was making over 80 database calls JUST to render the generic default homepage. That is just absolutely bonkers!
Nice post!
I would be even happier to have seen the performance improvements guided by monitoring/tracing data ^