Pulkit Goyal for AppSignal

Posted on Jan 4, 2023 • Originally published at blog.appsignal.com

Database Performance Optimization and Scaling in Rails

#ruby #rails

Web applications usually rely heavily on databases, for the most part. And as applications grow, databases grow too. We keep scaling web servers and background workers to keep up with the heavy load. But eventually, the database needs to keep up with all the new connections from these processes.

One way to tackle this is to grow a database with an app using vertical scaling. This means adding more CPU power and memory to the database server. But this is usually slow. You might have to copy all your data to the new server and then put the application down to change the database servers that it communicates with.

This is also usually a one-way operation. You can't keep adding/removing CPU power or memory from the database server based on your load.

This post will cover some alternative methods to fine-tune and scale a database under heavy load and improve the performance of your Rails app. We will focus on:

Splitting schema and data between multiple databases
Using read-only replica databases

Let's get going!

Disclaimer: Scaling Isn't Always Needed

As always, with any post about performance optimization and scaling, I would like to put up a standard disclaimer: make sure you have an issue before trying to solve it.

For example, on one of our apps in production, we tackled scaling and optimization only when we started processing more than 3 million background jobs a day.

This will, of course, be different for different apps. But usually, a good indicator of whether you need to optimize is if your database is always running at high CPU or memory usage.

A Separate Database for Your Rails Application

Sometimes, simply using a separate database server for a part of your app is a good solution. For example, your app might do two different things that don’t have a big overlap. Alternatively, maybe there's one very database-heavy feature, but it is only used rarely or by a small section of users. You should go for a separate database server if — and only if — there is a clear distinction between the parts of your app that will use different databases.

For example, a separate database is useful when you're audit logging in a high-frequency app (or have other very high-volume data that isn't necessarily accessed frequently). Let's see how we can set this up.

First, set up the database.yml to configure the second database. Let’s call the second database log. We'll add a log_default entry for the secondary database with a common configuration across all environments:

log_default: &log_default
  host: localhost
  port: <%= ENV["DB_PORT"] || 5432 %>
  adapter: postgresql
  encoding: unicode
  user: <%= ENV["DB_USER"] || "postgres" %>
  password: <%= ENV["DB_PASSWORD"] || "postgres" %>
  migrations_paths: db/log_migrate

The important bit here is the migrations_paths option that tells Rails where to find migrations related to this database.

Then, update database.yml to configure access to the log database for development, test, and production environments:

development:
  primary:
    <<: *default
    database: <%= ENV["DB_NAME"] %>
  log:
    <<: *log_default
    database: <%= ENV["LOG_DB_NAME"] %>

test:
  primary:
    <<: *default
    database: my_app_test
  log:
    <<: *log_default
    database: my_app_log_test

production:
  primary:
    adapter: postgresql
    encoding: unicode
    url: <%= ENV["DATABASE_URL"] %>
  log:
    adapter: postgresql
    encoding: unicode
    migrations_paths: db/log_migrate
    url: <%= ENV["DATABASE_LOG_URL"] %>

As you can see, we move the configuration for each environment one level deeper. The configuration for our primary database lives inside the primary key for each environment. The second database's configuration (that we call log) lives inside the log key. primary is a special keyword that tells Rails to use this database as the default.

The next step is to set up ActiveRecord to use this new database. We first create a base class that establishes a connection to this database (instead of a primary one) using establish_connection:

class LogRecord < ActiveRecord::Base
  self.abstract_class = true
  establish_connection :log
end

And then, we inherit from LogRecord instead of ApplicationRecord for all the records that should access the log DB:

class AuditLog < LogRecord
end

Note that you keep using the rails generators for generating models and migrations by appending the --database log option to target the second database. The migration task now automatically migrates both databases, but if you need to only migrate one, you can use db:migrate:primary or db:migrate:log tasks.

This works great if there is a clear distinction between two parts of the app. But what if you don’t have a clear idea of the database that's creating issues? There are still a couple of options left. Let's check them out.

Using Read-Only Replicas for Rails

Our second scaling option is to access data from read-only replicas while still writing to the primary database. This can improve performance for users who only browse parts of the app without performing any writing operations. Depending on your app's use case, this can be 80% of your users or 10% of them. So, evaluate the behavior of your users or your application use case before going down this route.

For an app with a majority of read-only users (like Twitter, for example), you can gain huge improvements in performance. New read-only replicas can be added/removed at will without affecting the primary database, which opens up options for auto-scalability.

Let's see how to set this up.

Setting up a Read-Only Replica

As usual, we will start with modifications to the database.yml config to include the replica:

production:
  primary:
    adapter: postgresql
    encoding: unicode
    url: <%= ENV["DATABASE_URL"] %>
  primary_replica:
    adapter: postgresql
    encoding: unicode
    url: <%= ENV["DATABASE_REPLICA_URL"] %>

Again, the primary key is a special key indicating that this database is the default. We can use any other key for the replica database configuration. Let’s choose primary_replica for sanity.

Then update the ApplicationRecord to configure a connection to multiple databases:

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :primary, reading: :primary_replica } if Rails.env.production?
end

Note that we add configuration for a production environment only in the above example. If you want to, you can set it up for all environments, after modifying your development setup to support replicas.

Finally, to use the read-only replica, you need to tell Rails when to use the primary database and when to use the replica. Rails provides a basic implementation for automatic role-switching based on the HTTP verb of the request out of the box. To enable it, add the following to your production.rb:

# Use Read-Only Databases on GET requests
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session

This tells Rails to use the replica for all GET or HEAD requests and the primary one otherwise. Notice the option that sets the delay to two seconds? That tells Rails to keep using the primary database for GET and HEAD requests if it is within two seconds of a write request in the same session.

Gotchas

Does all of this sound too simple? Well, we aren’t done yet. Depending on how your application is structured, you need to keep a few points in mind.

Firstly, ensure you do not write anything in a read-only action like index or show. If there are legitimate reasons for writing during those operations, you can manually switch the connection:

ActiveRecord::Base.connected_to(role: :writing) do
  # Your code here
end

A good example where this can be useful is if you are managing sessions in a database. If you do that, make sure you use the writing role. For example, you can override the save_record method in your session with authlogic:

class UserSession < Authlogic::Session::Base
  # Your session configuration

  def save_record(alternate_record = nil)
    ActiveRecord::Base.connected_to(role: :writing) do
      super
    end
  end
end

Another use case for automatic role-switching is when an app exposes a GraphQL API. All operations on a GraphQL API are POST, but conventionally, all queries are read-only, and mutations are read-write. In this case, to allow Rails to select the correct database, we can use a custom tracer in the schema.

First, create a tracer that switches roles based on the GraphQL document:

module Tracers
  class DatabaseRoleTracer
    EVENT_NAME = "execute_multiplex"

    # Execute on read only database if the operation has only queries.
    def trace(event, data)
      return yield unless Rails.env.production?
      return yield unless event == EVENT_NAME

      multiplex = data[:multiplex]
      ActiveRecord::Base.connected_to(role: role(multiplex)) do
        yield
      end
    end

    private

    def role(multiplex)
      if multiplex.queries.all?(&:query?)
        ActiveRecord::Base.reading_role
      else
        ActiveRecord::Base.writing_role
      end
    end
  end
end

And then use the tracer in your GraphQL Schema:

class MyAppSchema < GraphQL::Schema
  tracer Tracers::DatabaseRoleTracer.new

  # Your code here
end

Let’s get to the final alternative. What can you do if your application still has a huge load on your primary database? The answer: database sharding.

Database Sharding in Your Rails Application

This is the most complex of all the methods for scaling databases discussed in this post. So reach for it only if you need it. It will add considerable complexity to your application and needs to be done right to provide any real benefit.

There are two strategies to shard your database:

Distribute different tables on different nodes (vertical sharding).
Have the same schema across all nodes, but distribute data depending on certain parameters (horizontal sharding).

Vertical sharding is similar to our “multiple databases” strategy in terms of setup and pros and cons, so we will not discuss it again.

Let's discuss horizontal sharding and where it could be helpful.

One example is when you have a SaaS platform where a lot of data is associated with each user, but there is no overlap between the data of two users. In this case, splitting the data across nodes based on the user's id is the most logical way to go. Every signed-in user will have to access only a single database, so we won't have to reach out to multiple databases at the same time.

Setting up Horizontal Sharding

Let’s start with the configuration inside database.yml again:

production:
  primary:
    adapter: postgresql
    encoding: unicode
    url: <%= ENV["DATABASE_URL"] %>
  primary_replica:
    adapter: postgresql
    encoding: unicode
    url: <%= ENV["DATABASE_REPLICA_URL"] %>
  primary_shard_one:
    adapter: postgresql
    encoding: unicode
    url: <%= ENV["DATABASE_SHARD_ONE_URL"] %>
    primary_shard_one_replica:
    adapter: postgresql
    encoding: unicode
    url: <%= ENV["DATABASE_SHARD_ONE_REPLICA_URL"] %>

Then modify the ApplicationRecord to connect to different shards:

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to shards: {
    default: { writing: :primary, reading: :primary_replica },
    shard_one: { writing: :primary_shard_one, reading: :primary_shard_one_replica }
  }
end

Now we can use the shard option with ActiveRecord::Base.connected_to to switch shards. Like the automatic database role selector, Rails also provides an automatic shard switcher that can be activated inside production.rb like this:

Rails.application.configure do
  config.active_record.shard_selector = { lock: true }
  config.active_record.shard_resolver = ->(request) { Tenant.find_by!(host: request.host).shard }
end

The shard_resolver lambda is the most interesting part. The above implementation relies on the assumption that our application is accessed from different domains/subdomains and distinguishes between the shards. Modify it to fit your application needs (you might need to access cookies to identify a user before switching the shard).

As with vertical scaling, this strategy is usually a one-way street. Once you shard a database, it is very hard to “un-shard” it (since several databases could have the same ids for different objects). But this lays down the foundation to really scale your app to millions or billions of users when fitting everything on the same server is not an option.

If you want to read more about database sharding, see this write-up by Linode.

Measure Your Rails Database Performance with AppSignal

It can be tricky to keep an eye on the performance of your database without any other tools. Using AppSignal, you can easily track how your databases perform. See our AppSignal for Ruby page for more information.

You can also check out our post Test and Optimize Your Ruby on Rails Database Performance for more information on the metrics you can measure in AppSignal and how a dashboard might look.

Wrap Up

In this post, we discussed three major strategies to scale your database servers horizontally:

Splitting schema between multiple databases
Using read-only replica databases
Splitting data between multiple databases

All strategies have their pros and cons depending on your application use case and scale.

Rails has really upped its database game in its last few releases. This post has shown how easy it is to set up a Rails app to use multiple databases.

As always, if your application allows it, I suggest starting small with optimizations. The easiest optimization to start with is a read-only replica, then only move on when you need more scale.

Happy scaling!

P.S. If you'd like to read Ruby Magic posts as soon as they get off the press, subscribe to our Ruby Magic newsletter and never miss a single post!

DEV Community