Ruby has so many awesome, easily accessible gems, but their ease of use can lull you into a sense of complacency. This is what happened to us at Kenna this past year, and we learned the hard way just how important it is to know your gems.
The Problem
This past year at Kenna we sharded our main MySQL database, and we chose to shard it by client.
So all of a client's data lives on its own database shard. This means every time we look up client data in MySQL, we need to know which database shard to query. Our sharding configuration,
{
'client_123' => 'shard_123',
'client_456' => 'shard_456',
'client_789' => 'shard_789'
}
which tells us which client belongs on which database shard, needed to be easily accessible because we had to read it every single time we made a MySQL request.
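To make that concrete, here is a rough sketch of what a per-request lookup has to do. SHARDING_CONFIG and the Vulnerability model are illustrative stand-ins, and it assumes the ar-octopus "using" block API:
# Illustrative only: find the client's shard, then point the query at it.
shard = SHARDING_CONFIG['client_123']          # => "shard_123"
Octopus.using(shard.to_sym) do
  Vulnerability.where(client_id: 123).count    # runs against shard_123
end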
We first turned to Redis to solve this problem because it was fast, and the configuration hash we wanted to store was not very large. But eventually, that configuration hash grew to roughly 13 kilobytes as we added more and more clients.
13 kilobytes might not seem like a lot of data, but if you are asking for 13 kilobytes millions of times, it adds up. Soon we were reading 7.8 MB/second from Redis, which was not going to be sustainable as we continued to grow. We had to figure out a way to access our sharding configuration that did not involve Redis if we had any hope of scaling.
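For context, the pattern we were trying to get away from looked roughly like this (a hedged sketch; the key name and structure are illustrative):
require 'redis'
require 'json'

redis = Redis.new

# On process boot: write the sharding configuration hash into Redis.
redis.set('sharding_config', SHARDING_CONFIG.to_json)

# Before EVERY MySQL request: read the ~13 KB hash back out of Redis.
config = JSON.parse(redis.get('sharding_config'))
shard  = config['client_123']   # => "shard_123"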
Finding a Solution
When we started trying to figure out how to solve this problem, one of the first things we did was take a close look at ActiveRecord's connection object. This is where ActiveRecord stores all the information it needs to talk to your database. Because of that, we thought it might be a good place to store our sharding configuration. So we jumped into a console to check it out. What we found was NOT an ActiveRecord connection object at all...
(pry)> ActiveRecord::Base.connection
=> #<Octopus::Proxy:0x000055b38c697d10
  @proxy_config=#<Octopus::ProxyConfig:0x000055b38c694ae8
Instead, we found an Octopus Proxy object that our Octopus sharding gem had created.
This was a complete surprise to us! After finding this, we immediately started digging into the Octopus gem source code to figure out what this class was doing. When we finally found the Octopus Proxy class, much to our delight, it had all these great helper methods we could use to access our sharding configuration:
module Octopus
class Proxy
attr_accessor :proxy_config
delegate :current_shard, :current_shard=,
:current_slave_group, :current_slave_group=,
:shard_names, :shards_for_group, :shards, :sharded,
:config, :initialize_shards, :shard_name, to: :proxy_config, prefix: false
end
end
BOOM! Redis load problem solved! Rather than hitting Redis before each MySQL request, all we had to do was reference our local ActiveRecord connection object.
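In practice the change is tiny. Here is roughly what the new lookup looks like (the return values are illustrative; the methods come from the delegations shown above):
conn = ActiveRecord::Base.connection   # actually an Octopus::Proxy

conn.shard_names     # => [:master, :shard_123, :shard_456, :shard_789]
conn.current_shard   # => the shard this connection is currently pointed at
conn.config          # the sharding configuration, already held in memory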
Lesson Learned
One of the big things we learned from this whole experience was how important it is to KNOW YOUR GEMS! It is crazy easy to include a gem in your Gemfile, but when you do, make sure you have a general understanding of how it works.
I am not saying you need to go and read the source code for every one of your gems. That would take forever. But maybe, the next time you add a new gem, consider setting it up manually for the first time in a console. This will allow you to see how it is configured and set up. If we had done this, we would have had a better understanding of how our Octopus sharding gem was configured, and we could have saved ourselves this entire Redis headache!
Another great way to learn more about how your gems are working is through logging! Set the logger level to debug for your gem and, after interacting with your app, check out the logs. They will give you some good insight into how that gem is working and interacting with the rest of your code and external services. For more information on using logs, check out my Live, Log, and Prosper post.
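For example, in a Rails console you can do something like this (other gems that expose a logger can be pointed at STDOUT the same way):
require 'logger'

# Send ActiveRecord's logging to the console and drop the level to DEBUG so
# queries and connection handling show up as you interact with your app.
ActiveRecord::Base.logger = Logger.new($stdout)
ActiveRecord::Base.logger.level = Logger::DEBUG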
Happy New Year and Happy Coding!
If you are interested in other ways to prevent database hits using Ruby, check out my Cache Is King talk from RubyConf, which is what inspired this post.
Top comments (13)
Curious: Have you monkeypatched any gem functionality to alter its behavior while problem-solving along these lines?
We have, probably more than I care to admit.
Luckily, most of the monkey patches are not permanent. Usually we will monkey patch a gem in our code for a quick fix and then issue a PR against the gem's open source repo with our update. A couple of recent times we have done this are:
1) With the redis-rack gem when dealing with blank session keys
2) Getting Honeybadger to respect sidekiq defined thresholds
Other times we get a little lazy and forget to open a PR against the gem's repo. Your question actually just reminded me of one monkey patch that we use to optimize some Resque code that others would probably appreciate. Excuse me while I go write myself a TODO for that 😊
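For anyone curious, the general shape of those quick fixes is usually something like this (SomeGem::Worker and the perform method are hypothetical stand-ins):
# config/initializers/some_gem_patch.rb
module SomeGemPatch
  def perform(*args)
    # tweaked behavior goes here, e.g. guarding against a blank session key
    super   # still calls the gem's original implementation
  end
end

SomeGem::Worker.prepend(SomeGemPatch)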
I tried to ask the question in the most neutral way in terms of judgment, knowing there's bound to be some rationalization.
I think it's a tool at the disposal of Rubyists that gets fairly polarized, but everybody does it at some point for practical reasons.
Do you know of any static code analyzers that can detect all monkeypatches against vendor gems in a codebase?
find config/initializers -type f -name '*active*'
:-) :troll:
I do not know of any off the top of my head and a quick Google search didn't turn up much. Given handy Ruby methods like source_location, I would bet it would not be hard to create one.
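A rough sketch of the idea, using ActiveRecord::Base#save purely as an example target:
# source_location returns the [file, line] where a method is defined. A path
# inside app/ or config/initializers/ rather than the gem directory is a
# strong hint the method has been monkey patched.
file, line = ActiveRecord::Base.instance_method(:save).source_location
puts "#{file}:#{line}"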
Rather than monkey patching, have you considered forking the gem, making your update, and installing the gem from that fork?
That way you avoid monkey patching, fix the issue and have a ready made pull request to the original repo.
Yep! Depending on the size of the monkey patch we will also do that. If it is a one liner then putting it in our code is so quick we go with that. If we are patching a couple methods then it is definitely easier to keep track of things by forking the gem right from the start.
The small caveat to forking the gem is we "have to" (not required yet, but we try to make it our best practice) fork it to our company organization. For one, this requires an admin, who sometimes takes a little bit to track down. It also means that if we issue an open source pull request, it looks like it is coming from Kenna. While this is great, I have found that usually the engineer who found and fixed the bug wants to be the one to issue the open source PR rather than doing it through Kenna, which means forking the gem twice. I know I am this way because I am always trying to improve my open source footprint.
Why is Kenna trying to enforce that? That does make it more difficult to contribute back, both by needing an admin to fork and by diminishing the individual developer's contribution. Seems like an odd policy. Is it something to do with licensing of the contribution (like where large projects require a contributor license agreement)?
We used to point at personal repo forks, but as we have grown the policy has changed a little bit.
The reason we do it this way is so we have better control over the gems we use, and in case a developer leaves. Relying on open source to be maintained is one thing, but sometimes you can't rely on a single person's fork to be maintained. For example, if a developer forks a repo, Kenna points to that developer's fork, and then the developer leaves Kenna and no longer maintains the fork, it's not a great situation. Ideally, the developer's fork gets merged and we go back to pointing at the original open source gem, but since that is not always the case, this is how Kenna has chosen to deal with it.
We require an admin to fork a repo to Kenna's organization to ensure we don't end up with a lot of unused repositories.
Would definitely love to hear how other organizations deal with personal repo forks!
That certainly makes sense regarding being able to maintain the fork within the company. I've not had this issue at a company I worked at, so don't have any further insight I'm afraid.
This is an interesting journey of discovery. I bet there are many developers that install a gem (or other library for some other language) for a specific use but never find out how much it could do for them.
Glad you solved your issue and stopped pulling all that data from Redis every request!
"...Soon we were reading 7.8 MB/second from Redis which was not going to be sustainable as we continued to grow...."
Was the option of a multi-node Redis cluster on dedicated hardware considered? 7.8 MB/s read from RAM does not seem all that terrible.
Could you clarify the solution a bit, please? Instead of hitting Redis before each MySQL request you will reference the local AR connection object; does the connection object keep the Redis data in memory? I am confused (not familiar with the Octopus gem).
"...Another great way to learn more about how your gems are working is through logging!..."
Great article Molly, I look forward to the next one!
Thanks David!
First point:
We honestly could have let this ride for a bit. 7.8 MB/sec was high but was not causing any issues for our beefy ElastiCache (in that logging post I mention that, due to other requests, we actually got to a level where it was causing other errors). However, we knew it was only going to get worse as we grew, and while we could have beefed up ElastiCache, we really wanted to save that as a last resort. We felt there had to be a way to be smarter about how we were getting the sharding data we needed, which is why we went poking around ActiveRecord.
A big theme for our SRE team over the past year has been "working smarter, not harder." Every time we come across high loads, we first look for ways to "fix" the loads. We use beefing up hardware as a last resort.
Second point:
The ActiveRecord/Octopus Proxy object actually reads the data from MySQL when the process starts up and then holds that hash in memory to be used throughout the lifetime of the process. Prior to this, when the process started up we would write the data from MySQL to Redis, and then every time we made a MySQL request we would read it from Redis.
Let me know if that makes sense or if you have any other questions!
Next post coming up will involve processing data in bulk, stay tuned 😁