The n+1 query problem is one of the most common scalability bottlenecks. If you’re comfortable with Rails and Active Model Serializers and already have a good idea of what our problem is going to be, then maybe you can just jump straight into the code here.
Say you’re fetching an array of Post objects at a GET endpoint and you also want to load the respective authors of the posts, embedding an author object within each of the post objects. Here’s a naive way of doing it:
class PostsController < ApplicationController
  def index
    posts = Post.all
    render json: posts
  end
end
class Post < ApplicationRecord
  belongs_to :author, class_name: 'User'
end
class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details

  belongs_to :author
end
For each of the n Post objects being rendered, a query will run to fetch the corresponding User object, so we end up running a total of n+1 queries. This is disastrous. And here’s how you fix it, by eager loading the User objects:
class PostsController < ApplicationController
  def index
    # Eager loads all the authors in one extra query
    # (a WHERE id IN (...) against the users table).
    posts = Post.includes(:author)
    render json: posts
  end
end
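For reference, this is roughly what the query log looks like with includes. The exact SQL varies by adapter and usage, but it typically boils down to two queries instead of n+1:

Post.includes(:author).to_a
# SELECT "posts".* FROM "posts"
# SELECT "users".* FROM "users" WHERE "users"."id" IN (1, 2, 3, ...)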
Until now there’s absolutely nothing new for veterans.
But let’s complicate this. Let’s assume that the site’s users are not stored in the same RDBMS as the posts; instead, the users are documents in MongoDB (for whatever reason). How do we modify our Post serializer to fetch the users optimally now? This would be going back to square one:
class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details, :author

  # Will run n Mongo queries for n posts being rendered.
  def author
    User.find(object.author_id)
  end
end
# User is now a Mongoid document, not an ActiveRecord model.
class User
  include Mongoid::Document
  include Mongoid::Timestamps
  # ...
end
Our users residing in a Mongo database is just one form of this predicament; you could substitute it with, say, fetching the users from a third-party HTTP service, or storing them in a completely different RDBMS. The essential problem remains the same: there’s no way to ‘join’ the users datastore with the posts table and get the response we want in a single query.
Of course, we can do better. We can fetch the entire response in two queries:
- Fetch all the posts without the author attribute (1 SQL query).
- Fetch all the corresponding authors by running a where-in query with the user IDs plucked from the array of posts (1 Mongo query with an IN clause).
posts = Post.all
author_ids = posts.pluck(:author_id)
authors = User.where(:_id.in => author_ids)
# Somehow pass the author objects to the post serializer and
# map them to the correct post objects. Can't imagine what
# exactly that would look like, but probably not pretty.
render json: posts, pass_some_parameter_maybe: authors
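To see why it wouldn’t be pretty, here’s a sketch of one manual approach. It assumes AMS 0.10, where custom options passed to render are exposed to the serializer as instance_options; authors_by_id is a made-up option name:

class PostsController < ApplicationController
  def index
    posts = Post.all
    # Build a lookup hash of authors keyed by their stringified IDs.
    authors = User.where(:_id.in => posts.pluck(:author_id))
                  .index_by { |user| user._id.to_s }
    render json: posts, authors_by_id: authors
  end
end

class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details, :author

  def author
    # Custom keys passed to `render` show up in instance_options.
    instance_options[:authors_by_id][object.author_id]
  end
end

It works, but the batching logic now lives in the controller, and every endpoint that renders posts has to remember to build and pass the lookup hash.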
So our original optimization problem has been reduced to “how do we make this code readable and maintainable?” The folks at Universe have come up with an absolute gem (too obvious?). BatchLoader has been incredibly helpful to me recently.
gem 'batch-loader'
bundle install
class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details, :author

  def author
    object.get_author_lazily
  end
end
class Post < ApplicationRecord
  def get_author_lazily
    # The current post object is added to the batch here,
    # which is eventually processed when the block executes.
    BatchLoader.for(self).batch do |posts, batch_loader|
      author_ids = posts.pluck(:author_id)
      User.where(:_id.in => author_ids).each do |user|
        post = posts.detect { |p| p.author_id == user._id.to_s }
        # 'Assign' the user object to the right post.
        batch_loader.call(post, user)
      end
    end
  end
end
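Note that the controller needs no changes at all; the same plain index action from the beginning now triggers exactly one SQL query for the posts and one Mongo query for all their authors:

class PostsController < ApplicationController
  def index
    # One SQL query for the posts, one Mongo query for all the authors.
    render json: Post.all
  end
end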
If you’re familiar with JavaScript Promises, think of the get_author_lazily method as returning a Promise which is evaluated later. That’s a decent analogy, I think, since BatchLoader uses lazy Ruby objects.
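To make the laziness concrete, here’s a toy example detached from our domain (the numbers are arbitrary):

lazy_value = BatchLoader.for(1).batch do |numbers, loader|
  # This block runs once, on first access, with all the batched inputs.
  numbers.each { |n| loader.call(n, n * 10) }
end

# Nothing has been loaded yet; lazy_value is a transparent proxy.
lazy_value + 0 # => 10; the first method call triggers the batch.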
One caveat: BatchLoader caches the loaded values, and to keep the responses up-to-date across requests you should add this to your config/application.rb:
config.middleware.use BatchLoader::Middleware
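If you use BatchLoader outside the request cycle too (say, in rake tasks or background jobs), the middleware won’t cover those; the gem also lets you drop the cache manually:

# Clears the cached values outside the request/response cycle.
BatchLoader::Executor.clear_current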
That’s basically it! We’ve solved an advanced version of the n+1 queries problem while keeping our code clean and using Active Model Serializers the right way.
One problem though. If you have a User serializer (Active Model Serializers works with Mongoid as well), it won’t be called for the lazily loaded author objects, unlike before. To fix this, we can use a Ruby block and serialize the author objects before they’re ‘assigned’ to the posts.
class PostSerializer < ActiveModel::Serializer
  attributes :id, :title, :details, :author

  def author
    object.get_author_lazily do |author|
      # Serialize the author after it has been loaded.
      ActiveModelSerializers::SerializableResource
        .new(author)
        .as_json[:user]
    end
  end
end
class Post < ApplicationRecord
  def get_author_lazily
    # The current post object is added to the batch here,
    # which is eventually processed when the block executes.
    BatchLoader.for(self).batch do |posts, batch_loader|
      author_ids = posts.pluck(:author_id)
      User.where(:_id.in => author_ids).each do |user|
        # Run the user through the caller's block (serialization), if given.
        modified_user = block_given? ? yield(user) : user
        post = posts.detect { |p| p.author_id == user._id.to_s }
        # 'Assign' the (possibly serialized) user to the right post.
        batch_loader.call(post, modified_user)
      end
    end
  end
end
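For a quick sanity check, the rendered response would look roughly like this. The exact shape depends on your AMS adapter and the User serializer’s fields, so treat the values as illustrative:

# GET /posts
[
  {
    "id": 1,
    "title": "...",
    "details": "...",
    "author": { "id": "5d1f...", "name": "..." }
  }
]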
Here’s the entire code. Enjoy!
Top comments (5)

That is a great solution! I've never had the situation of loading related resources from a different database before, but if I ever do, hopefully I'll remember this article! Are there any other uses for BatchLoader that you've come across? Thanks.

BatchLoader was made for essentially similar scenarios. However, I think it has greater utility with GraphQL, where the n+1 hydra rears its head more frequently.

Ah, of course!

Hey, I already mentioned it in Usama's gist, but I've built a gem that tackles the problem of N+1 queries in Active Model Serializers. You can find it here: github.com/Bajena/ams_lazy_relatio...

Keep on rocking!

“A Quickstart Guide for Serializer in Rails — Fast JSON API vs Active Model Serializer” by Gitanjali Mule: link.medium.com/yNtj2zNn74