The n+1 query problem is one of the most common scalability bottlenecks. If you’re comfortable with Rails and Active Model Serializers and already have a good idea of what our problem is going to be, then maybe you can just jump straight into the code here.
Say you’re fetching an array of Post objects at a GET endpoint and you also want to load the respective authors of the posts, embedding an author object within each of the post objects. Here’s a naive way of doing it:
class PostsController < ApplicationController
  def index
    posts = Post.all
    render json: posts
  end
end
class Post < ApplicationRecord
  belongs_to :author, class_name: 'User'
end
class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details

  belongs_to :author
end
For each of the n Post objects being rendered, a query will run to fetch the corresponding User object, so we end up running a total of n+1 queries. This is disastrous. And here’s how you fix it, by eager loading the User objects:
class PostsController < ApplicationController
  def index
    # Eager loads all the authors in one extra query
    # (a WHERE id IN (...) against the users table).
    posts = Post.includes(:author)
    render json: posts
  end
end
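For reference, this is roughly what the query log looks like with includes. The exact SQL varies by adapter and usage, but it typically boils down to two queries instead of n+1:

Post.includes(:author).to_a
# SELECT "posts".* FROM "posts"
# SELECT "users".* FROM "users" WHERE "users"."id" IN (1, 2, 3, ...)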
Until now there’s absolutely nothing new for veterans.
But let’s complicate this. Let’s assume that the site’s users are not stored in the same RDBMS as the posts; instead, the users are documents in MongoDB (for whatever reason). How do we modify our Post serializer to fetch the users optimally now? This would be going back to square one:
class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details, :author

  # Will run n Mongo queries for n posts being rendered.
  def author
    User.find(object.author_id)
  end
end
# User is now a Mongoid document, not an ActiveRecord model.
class User
  include Mongoid::Document
  include Mongoid::Timestamps
  # ...
end
Our users residing in a Mongo database is just one form of this predicament; you could substitute it with, say, fetching the users from a third-party HTTP service, or storing them in a completely different RDBMS. The essential problem remains the same: there’s no way to ‘join’ the users datastore with the posts table and get the response we want in a single query.
Of course, we can do better. We can fetch the entire response in two queries:
- Fetch all the posts without the author attribute (1 SQL query).
- Fetch all the corresponding authors by running a where-in query with the user IDs plucked from the array of posts (1 Mongo query with an IN clause).
posts = Post.all
author_ids = posts.pluck(:author_id)
authors = User.where(:_id.in => author_ids)
# Somehow pass the author objects to the post serializer and
# map them to the correct post objects. Can't imagine what
# exactly that would look like, but probably not pretty.
render json: posts, pass_some_parameter_maybe: authors
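To see why it wouldn’t be pretty, here’s a sketch of one manual approach. It assumes AMS 0.10, where custom options passed to render are exposed to the serializer as instance_options; authors_by_id is a made-up option name:

class PostsController < ApplicationController
  def index
    posts = Post.all
    # Build a lookup hash of authors keyed by their stringified IDs.
    authors = User.where(:_id.in => posts.pluck(:author_id))
                  .index_by { |user| user._id.to_s }
    render json: posts, authors_by_id: authors
  end
end

class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details, :author

  def author
    # Custom keys passed to `render` show up in instance_options.
    instance_options[:authors_by_id][object.author_id]
  end
end

It works, but the batching logic now lives in the controller, and every endpoint that renders posts has to remember to build and pass the lookup hash.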
So our original optimization problem has been reduced to “how do we make this code readable and maintainable?” The folks at Universe have come up with an absolute gem (too obvious?). BatchLoader has been incredibly helpful to me recently.
gem 'batch-loader'
bundle install
class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details, :author

  def author
    object.get_author_lazily
  end
end
class Post < ApplicationRecord
  def get_author_lazily
    # The current post object is added to the batch here,
    # which is eventually processed when the block executes.
    BatchLoader.for(self).batch do |posts, batch_loader|
      author_ids = posts.pluck(:author_id)
      User.where(:_id.in => author_ids).each do |user|
        post = posts.detect { |p| p.author_id == user._id.to_s }
        # 'Assign' the user object to the right post.
        batch_loader.call(post, user)
      end
    end
  end
end
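Note that the controller needs no changes at all; the same plain index action from the beginning now triggers exactly one SQL query for the posts and one Mongo query for all their authors:

class PostsController < ApplicationController
  def index
    # One SQL query for the posts, one Mongo query for all the authors.
    render json: Post.all
  end
end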
If you’re familiar with JavaScript Promises, think of the get_author_lazily method as returning a Promise which is evaluated later. That’s a decent analogy, I think, since BatchLoader uses lazy Ruby objects.
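To make the laziness concrete, here’s a toy example detached from our domain (the numbers are arbitrary):

lazy_value = BatchLoader.for(1).batch do |numbers, loader|
  # This block runs once, on first access, with all the batched inputs.
  numbers.each { |n| loader.call(n, n * 10) }
end

# Nothing has been loaded yet; lazy_value is a transparent proxy.
lazy_value + 0 # => 10; the first method call triggers the batch.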
One caveat: BatchLoader caches the loaded values, and to keep the responses up-to-date across requests you should add this to your config/application.rb:
config.middleware.use BatchLoader::Middleware
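If you use BatchLoader outside the request cycle too (say, in rake tasks or background jobs), the middleware won’t cover those; the gem also lets you drop the cache manually:

# Clears the cached values outside the request/response cycle.
BatchLoader::Executor.clear_current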
That’s basically it! We’ve solved an advanced version of the n+1 queries problem while keeping our code clean and using Active Model Serializers the right way.
One problem though. If you have a User serializer (Active Model Serializers works with Mongoid as well), it won’t be called for the lazily loaded author objects, unlike before. To fix this, we can use a Ruby block and serialize the author objects before they’re ‘assigned’ to the posts.
class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details, :author

  def author
    object.get_author_lazily do |author|
      # Serialize the author after it has been loaded.
      ActiveModelSerializers::SerializableResource
        .new(author)
        .as_json[:user]
    end
  end
end
class Post < ApplicationRecord
  def get_author_lazily
    # The current post object is added to the batch here,
    # which is eventually processed when the block executes.
    BatchLoader.for(self).batch do |posts, batch_loader|
      author_ids = posts.pluck(:author_id)
      User.where(:_id.in => author_ids).each do |user|
        # Run the user through the caller's block (serialization), if given.
        modified_user = block_given? ? yield(user) : user
        post = posts.detect { |p| p.author_id == user._id.to_s }
        # 'Assign' the (possibly serialized) user to the right post.
        batch_loader.call(post, modified_user)
      end
    end
  end
end
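For a quick sanity check, the rendered response would look roughly like this. The exact shape depends on your AMS adapter and the User serializer’s fields, so treat the values as illustrative:

# GET /posts
[
  {
    "id": 1,
    "title": "...",
    "details": "...",
    "author": { "id": "5d1f...", "name": "..." }
  }
]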
Here’s the entire code. Enjoy!
Top comments (5)

That is a great solution! I've never had the situation of loading related resources from a different database before, but if I ever do, hopefully I'll remember this article! Are there any other uses for BatchLoader that you've come across? Thanks.

BatchLoader was made for essentially similar scenarios. However, I think it has greater utility with GraphQL, where the n+1 hydra rears its head more frequently.

Ah, of course!

Hey, I already mentioned it in Usama's gist, but I've built a gem that tackles the problem of N+1 queries in Active Model Serializers. You can find it here: github.com/Bajena/ams_lazy_relatio...

Keep on rocking!

“A Quickstart Guide for Serializer in Rails — Fast JSON API vs Active Model Serializer” by Gitanjali Mule: link.medium.com/yNtj2zNn74