
Investigating the Performance of a Problematic Rails API Endpoint

Matt Culpepper · 8 min read

It recently came to my attention that one of our API endpoints was hogging a lot of resources and taking over 3 seconds to complete per request. I was able to cut average memory usage from 85mb to 7mb and average request duration from 3000ms to 150ms. I'm going to detail that process in this post.

latency graph before / after deployment

Let's get started by installing the tools we're going to use.

Set up bullet

bullet is a gem that helps you identify and correct N+1 queries.

We'll add this to our Gemfile:

group :development do
  gem 'bullet'
end

And this to our config/application.rb:

  config.after_initialize do
    Bullet.enable = true
    Bullet.rails_logger = true
  end
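
To see the shape of the problem Bullet looks for, here's a framework-free sketch (the FakeDB class and its data are invented for illustration, not part of the app in this post): loading users in one query and then fetching roles one user at a time issues 1 + N queries, while a batched lookup issues two regardless of how many users there are.

```ruby
# Minimal stand-in for a database that counts every "query" it receives.
class FakeDB
  attr_reader :query_count

  def initialize
    @users = [{ id: 1 }, { id: 2 }, { id: 3 }]
    @roles = { 1 => ['admin'], 2 => ['member'], 3 => ['member'] }
    @query_count = 0
  end

  def all_users
    @query_count += 1
    @users
  end

  def roles_for(user_id) # one query per call -- the "+1"s
    @query_count += 1
    @roles[user_id]
  end

  def roles_for_all(user_ids) # one batched query, like `includes`
    @query_count += 1
    @roles.slice(*user_ids)
  end
end

# N+1 pattern: 1 query for the users, then 1 more per user for roles.
db = FakeDB.new
db.all_users.each { |u| db.roles_for(u[:id]) }
puts db.query_count # => 4 (for 3 users)

# Eager-loading pattern: 2 queries total, no matter the user count.
db = FakeDB.new
users = db.all_users
db.roles_for_all(users.map { |u| u[:id] })
puts db.query_count # => 2
```

Bullet watches for exactly this access pattern and logs a warning when an association is loaded record-by-record instead of in a batch.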

Set up rack-mini-profiler

rack-mini-profiler is a middleware that provides database and memory profiling. Let's get that set up so we can get a closer look into what's causing our issues.

We need to add rack-mini-profiler and memory_profiler to our Gemfile below the database gem, and run bundle to install them.

gem 'pg'
gem 'rack-mini-profiler'
gem 'memory_profiler'

Next, we'll add this to config/initializers/mini_profiler.rb:

Rack::MiniProfiler.config.authorization_mode = :allow_all

When rack-mini-profiler is enabled, it saves the profiler output from previous requests and injects a badge into the next HTML page that is loaded. But we're in an API-only app, so in order to see the badge, we're going to have to serve an HTML page ourselves.

Note: If you're planning on checking this code into your repo, you'll want to add some sort of authorization around authorize_request.

Here's my PerformanceTestsController:

class PerformanceTestsController < ActionController::Base
  before_action do
    Rack::MiniProfiler.authorize_request
  end

  def index
  end
end

app/views/performance_tests/index.html:

<body></body>

config/routes.rb:

  get '/performance_tests', to: 'performance_tests#index'

Once you have that set up, open /performance_tests in your browser and you should see the badge in the top left.

rack-mini-profiler badge

Recreate the environment

When you're investigating a production performance problem, you want the environment you test in to be as similar to the production environment as possible. In Rails' development mode, class caching is disabled, so the time it takes to complete requests can vary wildly. I ran these tests in production mode on my local machine with a similar dataset to the one found in the prod db.

Isolate

We use Ember as our front end framework, which can make several calls to the API on page load. I wanted to isolate the problematic API call so I could repeat it as quickly as possible as many times as necessary. The endpoint requires an authentication header, so I just used Chrome's Copy as cURL function to grab everything I needed in one command.

Copy as cURL
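
Copy as cURL produces a single reusable command along these lines. The host, path, query parameter, and token below are placeholders for illustration -- the real values come from whatever request you copy:

```shell
curl 'http://localhost:3000/users?roles[]=admin' \
  -H 'Accept: application/vnd.api+json' \
  -H 'Authorization: Bearer <your-token-here>'
```

With that saved in a shell script, repeating the request during benchmarking is a single keystroke.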

Benchmark

Now that we have our environment and tools set up, it's time to get our hands dirty and try to figure out what's actually going on. The endpoint in question is UsersController#index:

def index
  users = User.joins(:roles).where(roles: { title: params[:roles] })          

  respond_with users, include: %w[roles groups]
end

Before we start making changes, the first thing we're going to want to do is to get a benchmark control with the code in its current state. That way we can ensure the changes we're going to make are actually improvements.

rack-mini-profiler has several options available to pass in through the pp= query parameter, but the two we're going to be using are pp=profile-memory and pp=enable.

The first request always seems to have higher resource usage than subsequent requests, so I always fire the request twice and take the benchmarks from the second request.

Here we go, let's get our memory control:

# All Users (`/users?pp=profile-memory`)
Total allocated: 60047355 bytes (744989 objects)
Total retained:  1356020 bytes (8851 objects)

In addition to memory usage, we're going to want to check the rack-mini-profiler badge that displays info on response timings and SQL queries. We do this by using the pp=enable query parameter then opening /performance_tests as described in the rack-mini-profiler setup section above.

# All Users (`/users?pp=enable`)
Total Duration: 7795ms
SQL Duration: 373ms
SQL Queries: 1139

😱

This is bad! Let's fix it.

Eliminating N+1 queries

The number of SQL queries being executed per request suggests that we've got some N+1 issues going on, so let's take a look at that first. We'll make one change and then run the benchmarks again.

Let's change the joins(:roles) to includes(:roles, :groups) so our roles and groups will be eager loaded.

def index
  users = User.includes(:roles, :groups).where(roles: { title: params[:roles] })

  respond_with users, include: %w[roles groups]
end

Here are the benchmarks with includes:

Total allocated: 436705757 bytes (4119179 objects)
Total retained:  4646110 bytes (33480 objects)

Total Duration: 7209ms
SQL Duration: 355ms
SQL Queries: 1130

Eager loading all of those roles actually caused the memory usage to increase 7x! The duration and queries decreased a bit, but this is obviously not the fix we were hoping for.

Let's use the rack-mini-profiler HTML badge to see the queries that are being executed.

rack-mini-profiler

When I expanded the 1130 sql link, I saw a lot of entries similar to this:

app/serializers/user_serializer.rb:72:in `employee_id'
app/controllers/v1/users_controller.rb:53:in `index'
app/controllers/application_controller.rb:48:in `set_current_attrs'
SELECT  "employees".* FROM "employees" WHERE "employees"."user_id" = $1 LIMIT $2; 

At this point, I think the issue lies mainly within the serializer, so let's take a look at what's going on in there.

class UserSerializer < ActiveModel::Serializer
  attributes :id,
             :email,
             :first_name,
             :last_name,
             :last_login_at,
             :employee_id

  has_one :employee
  has_many :assignments
  has_many :direct_roles
  has_many :roles, through: :assignments
  has_many :group_assignments
  has_many :groups, through: :group_assignments

  def employee_id
    object.employee&.id
  end
end

Now we're on to something! Every time a User object is serialized, we're issuing queries for each of the associations listed here. We can try to eager load each of these associations with includes, but what if we don't need these associations for the index action at all?

Real quick, let's check out the show action next to index in UsersController.

  def index
    users = User.includes(:roles, :groups).where(roles: { title: params[:roles] })

    respond_with users, include: %w[roles groups]
  end

  def show
    respond_with @user, include: %i[roles roles_tags assignments groups group_assignments groups.roles]
  end

show is serialized via the same UserSerializer class. It's starting to look like those associations were added to the serializer so they would be included on the show endpoint.

For now, I'm only making optimizations to index, so show and any other actions using UserSerializer need to be unaffected. I think the path forward is to create an index-specific serializer with a sparse fieldset -- we'll include only the data we need in the response.

# app/controllers/users_controller.rb
def index
  users = User.includes(:roles, :groups).where(roles: { title: params[:roles] })

  respond_with users, include: [:roles, :groups], each_serializer: Users::Index::UserSerializer
end

# app/serializers/users/index/user_serializer.rb
class Users::Index::UserSerializer < ActiveModel::Serializer
  attributes :id,
             :email,
             :first_name,
             :last_name,
             :last_login_at,
             :employee_id

  has_many :roles, through: :assignments
  has_many :groups, through: :group_assignments

  def employee_id
    object.employee&.id
  end
end

I removed all associations except the ones we want to side load, roles and groups. Let's check our numbers.

Total allocated: 242932074 bytes (2392253 objects)
Total retained:  2511484 bytes (18008 objects)

Total Duration: 3650ms
SQL Duration: 202ms
SQL Queries: 571

Our first big improvement! At this point, I checked where this endpoint was being called in our frontend apps and verified that we didn't need the associations that were removed.

But, 571 queries. Let's check the Bullet output in the Rails log to see if it's identified any N+1 queries.

USE eager loading detected
  User => [:employee]
  Add to your finder: :includes => [:employee]
Call stack
  /Users/mculp/sf/cs/app/serializers/users/index/user_serializer.rb:66:in `employee_id'

USE eager loading detected
  User => [:group_assignments]
  Add to your finder: :includes => [:group_assignments]
Call stack
  /Users/mculp/sf/cs/app/models/user.rb:229:in `roles'

USE eager loading detected
  User => [:assignments]
  Add to your finder: :includes => [:assignments]
Call stack
  /Users/mculp/sf/cs/app/controllers/v1/users_controller.rb:49:in `index'

Yep! Let's eager load employee, group_assignments, and assignments.

  def index
    users = User
              .includes(:roles, :groups, :employee, :group_assignments, :assignments)
              .where(roles: { title: params[:roles] })

    respond_with users, each_serializer: Users::Index::UserSerializer, include: [:roles, :groups]
  end

Numbers:

Total allocated: 80137296 bytes (825840 objects)
Total retained:  761444 bytes (5371 objects)

Total Duration: 1580ms
SQL Duration: 58ms
SQL Queries: 124

Another big improvement. Bullet is no longer screaming at us in the Rails log.

After checking rack-mini-profiler, I see that we still have an N+1:

app/models/user.rb:476:in `last_login_at'
app/controllers/v1/users_controller.rb:49:in `index'
app/controllers/application_controller.rb:48:in `set_current_attrs'
SELECT  "authentication_tokens".* FROM "authentication_tokens" WHERE "authentication_tokens"."user_id" = $1 AND "authentication_tokens"."on_behalf" = $2 ORDER BY "authentication_tokens"."id" DESC LIMIT $3;

Here's the code for last_login_at:

  def last_login_at
    token = authentication_tokens.where(on_behalf: false).last
    token&.last_used_at
  end

This one is trickier to fix. We can't simply eager load authentication_tokens, because calling where on the association builds a fresh query every time the method runs, bypassing any preloaded records.

However, what we can do is create a new scoped association and eager load it.

  # app/models/user.rb
  has_one :last_login_authentication_token,
    -> { where(on_behalf: false) },
    class_name: 'AuthenticationToken'

  def last_login_at
    last_login_authentication_token&.last_used_at
  end

  # app/controllers/users_controller.rb
  def index
    eager_load_associations = [
      :roles, :groups, :employee, :group_assignments,
      :assignments, :last_login_authentication_token
    ]

    users = User.includes(eager_load_associations).where(roles: { title: params[:roles] })

    respond_with users, each_serializer: Users::Index::UserSerializer, include: [:roles, :groups]
  end

This should take care of our last N+1 issue. Let's make sure:

Total allocated: 69663419 bytes (872929 objects)
Total retained:  302956 bytes (1818 objects)

Total Duration: 1250ms
SQL Duration: 26ms
SQL Queries: 12

It's looking good from a SQL standpoint! The rest of the time is being spent instantiating and serializing objects.
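
One way to confirm that serialization, not SQL, now dominates is to time the object-to-JSON step in isolation from a console. This is a generic sketch using Ruby's stdlib Benchmark and JSON with stand-in structs rather than the app's real models and serializers:

```ruby
require 'benchmark'
require 'json'

# Stand-in records; the real code would use ActiveRecord objects.
User = Struct.new(:id, :email, :first_name, :last_name)
users = Array.new(5_000) { |i| User.new(i, "user#{i}@example.com", 'First', 'Last') }

# Time just the object -> hash -> JSON step, with no database involved.
elapsed = Benchmark.realtime do
  users.map { |u| { id: u.id, email: u.email, name: "#{u.first_name} #{u.last_name}" } }.to_json
end

puts format('serialized %d records in %.1fms', users.size, elapsed * 1000)
```

If a timing like this accounts for most of the remaining request duration, a faster serializer is the next lever to pull.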

Let's take a look at what we can do to make some improvements on that front.

Further Optimizations

fast_jsonapi

fast_jsonapi is a gem by the engineering team at Netflix that promises serialization 25x faster than ActiveModel::Serializers. From their README:

"We want to ensure that with every change on this library, serialization time is at least 25 times faster than Active Model Serializers on up to current benchmark of 1000 records."

That sounds too good to be true, but it can't hurt to try it!

  # app/controllers/users_controller.rb
  def index
    eager_load_associations = [
      :roles, :groups, :employee, :group_assignments,
      :assignments, :last_login_authentication_token
    ]

    users = User.includes(eager_load_associations).where(roles: { title: params[:roles] })

    respond_with users, 
      each_serializer: Fast::Users::Index::UserSerializer,
      include: [:roles, :groups]
  end

# app/serializers/fast/users/index/user_serializer.rb
class Fast::Users::Index::UserSerializer
  include FastJsonapi::ObjectSerializer

  attributes :id,
             :email,
             :first_name,
             :last_name,
             :employee_id,
             :last_login_at

  has_many :roles, through: :assignments, serializer: Fast::Users::Index::RoleSerializer
  has_many :groups, through: :group_assignments, serializer: Fast::Users::Index::GroupSerializer

  attribute :employee_id do |object|
    object.employee&.id
  end
end

# app/serializers/fast/users/index/role_serializer.rb
class Fast::Users::Index::RoleSerializer
  include FastJsonapi::ObjectSerializer

  attributes :id, :title, :description
end

# app/serializers/fast/users/index/group_serializer.rb
class Fast::Users::Index::GroupSerializer
  include FastJsonapi::ObjectSerializer

  attributes :title, :description, :archived
end

Numbers:

Total allocated: 54130985 bytes (698850 objects)
Total retained:  189166 bytes (935 objects)

Total Duration: 707ms
SQL Duration: 21ms
SQL Queries: 6

Well, it's not 25x, but that's still a pretty impressive improvement. We're gonna roll with it.

Caching

fast_jsonapi also has built-in object caching, which uses Rails' cache_key under the hood for cache invalidation. I think it'd work well for our use case here, so let's try it.
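
Rails derives a record's cache key from its model name, id, and updated_at timestamp, so updating a record naturally invalidates its cached entry. Roughly, as a simplified sketch (not the actual Rails internals):

```ruby
# Simplified sketch of how Rails builds a cache key for a record:
# "<table>/<id>-<updated_at timestamp>". Updating the record changes
# updated_at, which changes the key, so the old entry is never read again.
def cache_key_for(table, id, updated_at)
  "#{table}/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S%6N')}"
end

updated = Time.utc(2020, 5, 18, 12, 30, 45)
puts cache_key_for('users', 42, updated)
# => users/42-20200518123045000000

# A later update produces a different key; the stale entry simply
# expires out of Redis on its own.
puts cache_key_for('users', 42, updated + 60)
```

This is why no explicit cache-busting code is needed below: any write that touches updated_at rotates the key.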

We're using Redis as the cache store, which was set up in config/environments/production.rb:

  if ENV['REDIS_URL']
    config.cache_store = :redis_store, ENV['REDIS_URL'], { expires_in: 12.hours }
  end

Now all we have to do is add this cache_options line to our serializer to cache each User object:

# app/serializers/fast/users/index/user_serializer.rb
class Fast::Users::Index::UserSerializer
  include FastJsonapi::ObjectSerializer

  cache_options enabled: true, cache_length: 12.hours

  attributes :id,
             :email,
             :first_name,
             :last_name,
             :employee_id,
             :last_login_at

  has_many :roles, through: :assignments, serializer: Fast::Users::Index::RoleSerializer
  has_many :groups, through: :group_assignments, serializer: Fast::Users::Index::GroupSerializer

  attribute :employee_id do |object|
    object.employee&.id
  end
end

Now, let's run the numbers.

Total allocated: 10239567 bytes (92500 objects)
Total retained:  413751 bytes (2609 objects)

Total Duration: 165ms
SQL Duration: 17ms
SQL Queries: 6

🥳🎉


Discussion


This is awesome! Both the benchmarking setup as well as the actual optimizations.

 

Matt,
Love the post! We've got an almost identical setup for Rails API.

Do you use the JSON:API standard in your responses? We don't (it's kinda bloated), so that's holding us back from moving from AMS to Fast JSON API. I'd like to use Fast JSON API, since it's consistent and we end up writing custom serializers (pluck -> hashes -> raw JSON) whenever perf matters.

 

Thanks!

We do use JSON:API. I think one of the biggest advantages of using it is exactly what you mentioned: there's a pretty nice ecosystem of tools that you can easily switch between.

I do agree that it can get bloated, especially when include is used wildly. But compound docs can also reduce the number of requests made to the server, so it can go both ways.

 

Matt,
Your post is simply amazing. Easy to understand and implement.

Thank you for sharing your knowledge with us.

 

Thank you, I'm glad you liked it!