How to use horizontal sharding in Rails 6.1

#ruby #todayilearned #database #scaling

NOTE: As of writing, Rails 6.1 has not been released, so to follow along you'll need to point your app at Rails master, which is a handy thing to know how to do anyway.

We saw some big news this week, as Rails continues to get better and better when it comes to supporting multiple databases.

Eileen Uchitelle and John Crepezzi shipped a commit to Rails master that allows full, out-of-the-box support for horizontal sharding.

Wait, shard what?

It is entirely possible you've gone your whole dev career without needing to shard, or even knowing vaguely what it is. That's okay. Rails has intentionally been built in such a way that you don't need to be a DB expert to use it.

However, here are a few links giving background on what sharding is, and maybe why you'd want to use it.

Forget it, I have no idea what I'm doing, but I want to shard NOW

Sweet! Me too.

First off, this is a Rails 6.1 feature (heh, which doesn't exist yet), so you'll need to spin up a Rails application off of the master branch.

You can do that by first generating a new rails app off of the latest stable branch:

rails new myapp --edge

Then, to point to the master branch, change the rails line in your Gemfile to the below:

gem 'rails', github: "rails/rails", branch: "master"

Rebundle and voilà!

Let's make a bunch of databases

Update your database.yml like so (much of what follows is taken from the commit's docs):

development:
  primary:
    <<: *default
    database: my_primary_database
  primary_replica:
    <<: *default
    database: my_primary_database
    replica: true
  primary_shard_one:
    <<: *default
    database: my_primary_shard_one
  primary_shard_one_replica:
    <<: *default
    database: my_primary_shard_one
    replica: true
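
If you want to sanity-check what each entry ends up with after the <<: *default merge, here is a small plain-Ruby sketch you can run outside of Rails. The default block and its sqlite3 settings are my assumption (your generated database.yml will have its own), and the script only parses YAML; it does not connect to anything.

```ruby
require "yaml"

# Assumption: a `default` anchor like the one `rails new` generates for SQLite3.
DATABASE_YML = <<~YAML
  default: &default
    adapter: sqlite3
    pool: 5

  development:
    primary:
      <<: *default
      database: my_primary_database
    primary_replica:
      <<: *default
      database: my_primary_database
      replica: true
    primary_shard_one:
      <<: *default
      database: my_primary_shard_one
    primary_shard_one_replica:
      <<: *default
      database: my_primary_shard_one
      replica: true
YAML

# aliases: true lets the parser resolve the *default references.
configs = YAML.safe_load(DATABASE_YML, aliases: true).fetch("development")

# Print the merged settings for every entry.
configs.each do |name, settings|
  role = settings["replica"] ? "replica" : "writer"
  puts "#{name}: adapter=#{settings["adapter"]} database=#{settings["database"]} (#{role})"
end
```

Every entry should report an adapter. An entry that skips the merge key would only have the keys written directly under it.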

You can use whatever database or adapter you want. I normally hook up Postgres, but today I'm playing around with SQLite3. For more advanced systems, these could all be pointing to different databases, local or remote, in vastly different locations. Pretty cool, huh?

In this setup from the docs, we have a primary and a primary_shard_one, and each has a replica. Nifty!

Then run:

rails db:create; rails db:migrate

Created database 'my_primary_database'
Created database 'my_primary_shard_one'
Created database 'db/test.sqlite3'

Time to fill the database

Let's generate a model:

rails g model Person name:string
rails db:migrate

#=>
== 20200303155421 CreatePeople: migrating ====================================
-- create_table(:people)
   -> 0.0026s
== 20200303155421 CreatePeople: migrated (0.0028s) ===========================

== 20200303155421 CreatePeople: migrating ====================================
-- create_table(:people)
   -> 0.0040s
== 20200303155421 CreatePeople: migrated (0.0041s) ===========================

And either in a seeds file or our console, run:

Person.create!(
  name: 'Frieda'
)

Person.create!(
  name: 'Bill'
)

Person.create!(
  name: 'Penelope'
)

Playing with the databases

Now, let's update our application_record.rb like so:

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to shards: {
    default: { writing: :primary, reading: :primary_replica },
    shard_one: { writing: :primary_shard_one, reading: :primary_shard_one_replica }
  }
end

We will now reload our console and see what we can do, using ActiveRecord::Base.connected_to:

ActiveRecord::Base.connected_to(role: :reading, shard: :shard_one) do
  Person.first
end
#=> nil


# Now let's write to :shard_one:
ActiveRecord::Base.connected_to(role: :writing, shard: :shard_one) do
  Person.create!(
    name: 'John'
  )
  Person.create!(
    name: 'Georgina'
  )
end

#=> #<Person id: 2, name: "Georgina", created_at: "2020-03-03 16:10:14", updated_at: "2020-03-03 16:10:14">

# And we'll read from :shard_one's replica:
ActiveRecord::Base.connected_to(role: :reading, shard: :shard_one) do
  Person.count
end
#=> 2

# And finally confirm that shard one's data has not been written to primary:

ActiveRecord::Base.connected_to(role: :reading, shard: :default) do
  Person.count
end
#=> 3

# This is correct: remember the three people we created earlier, which went to the primary database by default.
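
If the block-scoped switching feels a bit magical, here is a toy model in plain Ruby of the idea: a table mapping each shard and role to a connection name, plus a block helper that swaps the current route and restores it afterwards. To be clear, ToyRouter and everything inside it are hypothetical names of mine, and this is not how Rails implements connected_to. It only mirrors the shape of the connects_to hash from above.

```ruby
# Toy model only: NOT Rails' implementation of connected_to.
module ToyRouter
  # Same shape as the connects_to hash in application_record.rb.
  SHARDS = {
    default:   { writing: :primary,           reading: :primary_replica },
    shard_one: { writing: :primary_shard_one, reading: :primary_shard_one_replica }
  }.freeze

  DEFAULT_ROUTE = { role: :writing, shard: :default }.freeze

  # The route in effect for the current thread.
  def self.current_route
    Thread.current[:toy_route] || DEFAULT_ROUTE
  end

  # Switch role and shard for the duration of the block, then restore.
  def self.connected_to(role:, shard:)
    previous = Thread.current[:toy_route]
    SHARDS.fetch(shard).fetch(role) # raises KeyError for an unknown shard or role
    Thread.current[:toy_route] = { role: role, shard: shard }
    yield
  ensure
    Thread.current[:toy_route] = previous
  end

  # Which connection name the current route resolves to.
  def self.connection_name
    route = current_route
    SHARDS.fetch(route[:shard]).fetch(route[:role])
  end
end

puts ToyRouter.connection_name # primary
ToyRouter.connected_to(role: :reading, shard: :shard_one) do
  puts ToyRouter.connection_name # primary_shard_one_replica
end
puts ToyRouter.connection_name # primary
```

The bit worth noticing is the ensure: once the block exits, even via an exception, you are back on whatever route you had before.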

Not too tricky

That was... not as involved as one would expect. Cheers to the Rails core team for once again making things as straightforward and developer friendly as possible! 💖

You can use this code however you like, and write and read from databases in whatever architecture is justified by your business logic.
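
As one example of such business logic, here is a minimal sketch of picking a shard from a record key. The shard_for helper and the SHARD_NAMES list are hypothetical names of mine, not part of Rails, and simple modulo hashing is just one possible routing scheme (it reshuffles keys whenever you add a shard).

```ruby
require "zlib"

# Assumption: these match the shard names passed to connects_to.
SHARD_NAMES = %i[default shard_one].freeze

# Deterministically map a key (say, an account id) to one of the shards.
def shard_for(key)
  SHARD_NAMES[Zlib.crc32(key.to_s) % SHARD_NAMES.size]
end

puts shard_for("account-42")

# Inside the Rails app you could then write something like:
#
#   ActiveRecord::Base.connected_to(role: :writing, shard: shard_for(account_id)) do
#     Person.create!(name: 'Frieda')
#   end
```

Because the same key always hashes to the same shard, reads and writes for that key land in the same place.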

For the full details on using this feature in the wild, see the full PR here.