
The Importance of Knowing Your Gems

Molly Struve (she/her) on January 02, 2019

Ruby has so many awesome, easily accessible gems, but their ease of use can lull you into a sense of complacency. This is what happened to us at Ke...
 

Curious: Have you monkeypatched any gem functionality to alter its behavior while problem-solving along these lines?

 

We have, probably more than I care to admit.

Luckily, most of the monkey patches are not permanent. Usually we monkey patch a gem in our code for a quick fix and then often open a PR against the gem's open source repo with our update. A couple of recent times we have done this:
1) With the redis-rack gem when dealing with blank session keys
2) Getting Honeybadger to respect Sidekiq-defined thresholds

Other times we get a little lazy and forget to open a PR against the gem's repo. Your question actually just reminded me of one monkey patch that we use to optimize some Resque code that others would probably appreciate. Excuse me while I go write myself a TODO for that 😊

 

"...more than I care to admit."

I tried to ask the question in the most neutral, non-judgmental way, knowing there's bound to be some rationalization involved.

I think it's a tool at the disposal of Rubyists that is fairly polarizing, but everybody uses it at some point for practical reasons.

Do you know of any static code analyzers that can detect all monkeypatches against vendor gems in a codebase?

find config/initializers -type f -name '*active*'

:-) :troll:

I do not know of any off the top of my head and a quick Google search didn't turn up much. Given handy Ruby methods like source_location I would bet it would not be hard to create one.
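To illustrate the source_location idea, here is a minimal sketch and not an existing tool. VendorThing, the initializer path, and methods_defined_outside are all made-up names, and class_eval with a fake file name only simulates a patch that lives in another file. A real checker would compare against the gem's install directory instead of a single file.

```ruby
# Stand-in for a class that ships with a gem (hypothetical).
class VendorThing
  def greet
    "hello"
  end

  def shout
    "HELLO"
  end
end

# File where the "vendor" code above was defined.
VENDOR_FILE = __FILE__

# Simulate a monkey patch living in an initializer by evaluating the
# redefinition under a different (made-up) file name.
VendorThing.class_eval(<<~PATCH, "config/initializers/vendor_thing_patch.rb", 1)
  def greet
    "patched"
  end
PATCH

# Returns [name, file, line] for every instance method of klass whose
# current definition does not come from expected_file.
def methods_defined_outside(klass, expected_file)
  found = klass.instance_methods(false).map do |name|
    file, line = klass.instance_method(name).source_location
    # source_location is nil for methods implemented in C.
    next if file.nil? || file == expected_file
    [name, file, line]
  end
  found.compact
end

PATCHED = methods_defined_outside(VendorThing, VENDOR_FILE)
PATCHED.each { |name, file, line| puts "#{name} redefined at #{file}:#{line}" }
```

This only sees methods that were redefined or added directly on the class. Patches applied through prepended modules would need a separate check of the ancestor chain.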

 

Rather than monkey patching, have you considered forking the gem, making your update and installing the gem from that fork?

That way you avoid monkey patching, fix the issue and have a ready made pull request to the original repo.

Yep! Depending on the size of the monkey patch we will also do that. If it is a one-liner, then putting it in our code is so quick that we go with that. If we are patching a couple of methods, then it is definitely easier to keep track of things by forking the gem right from the start.
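For anyone who has not pointed Bundler at a fork before, it is roughly a one-line Gemfile change. This is only a sketch: the organization name and branch below are hypothetical placeholders, not our actual setup.

```ruby
# Gemfile fragment (sketch). "example-org" and the branch name are
# hypothetical; git: and branch: are standard Bundler options.
gem "redis-rack",
    git: "https://github.com/example-org/redis-rack.git",
    branch: "fix-blank-session-keys"
```

Once the fix is merged and released upstream, the git: and branch: options can be dropped so the gem resolves from RubyGems again.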

The small caveat to forking the gem is that we "have to" (not required yet, but we try to make it our best practice) fork it to our company organization. For one, this requires an admin, who sometimes takes a little while to track down. It also means that if we issue an open source pull request, it looks like it is coming from Kenna. While this is great, I have found that the engineer who found and fixed the bug usually wants to be the one to issue the open source PR rather than doing it through Kenna, which means forking the gem twice. I know I am this way because I am always trying to improve my open source footprint.

Why is Kenna trying to enforce that? That does make it more difficult to contribute back, both by needing an admin to fork and by diminishing the individual developer's contribution. Seems like an odd policy. Is it something to do with licensing of the contribution (like where large projects require a contributor license agreement)?

We used to point at personal repo forks, but as we have grown the policy has changed a little bit.

The reason we do it this way is so we have better control over the gems we use, including when a developer leaves. Relying on open source to be maintained is one thing, but sometimes you can't rely on a single person's fork to be maintained. For example, if a developer forks a repo, Kenna points to that developer's fork, and then the developer leaves Kenna and no longer maintains the repo, it's not a great situation. Ideally, the developer's fork gets merged and we go back to pointing at the original open source gem, but since that is not always the case, this is how Kenna has chosen to deal with it.

We require an admin to fork a repo to Kenna's organization to ensure we don't end up with a lot of unused repositories.

Would definitely love to hear how other organizations deal with personal repo forks!

That certainly makes sense regarding being able to maintain the fork within the company. I've not had this issue at a company I worked at, so I don't have any further insight, I'm afraid.

 

This is an interesting journey of discovery. I bet there are many developers that install a gem (or other library for some other language) for a specific use but never find out how much it could do for them.

Glad you solved your issue and stopped pulling all that data from Redis every request!

 

"...Soon we were reading 7.8 MB/second from Redis which was not going to be sustainable as we continued to grow...."
Was the option of a multi-node Redis cluster on dedicated hardware considered? 7.8 MB/s read from RAM does not seem all that terrible.

Could you clarify the solution a bit, please? Instead of hitting Redis before each MySQL request, you will reference the local AR connection object; does the connection object keep the Redis data in memory? I am confused (not familiar with the Octopus gem).

"...Another great way to learn more about how your gems are working is through logging!..."
Log All the Things

Great article Molly, I look forward to the next one!

 

Thanks David!

First point:
We honestly could have let this ride for a bit. 7.8 MB/sec was high but was not causing any issues for our beefy ElastiCache. (In that logging post I mention that, due to other requests, we actually got to a level where it was causing other errors.) However, we knew it was only going to get worse as we grew, and while we could have beefed up ElastiCache, we really wanted to save that as a last resort. We felt there had to be a way to be smarter about how we were getting the sharding data we needed, which is why we went poking around ActiveRecord.

A big theme for our SRE team over the past year has been "working smarter, not harder". Every time we come across high loads, we first look for ways to "fix" the loads. We use beefing up hardware as a last resort.

Second point:
The ActiveRecord/Octopus Proxy object actually reads the data from MySQL when the process starts up and then holds that hash in memory for the lifetime of the process. Prior to this, when the process started up we would write the data from MySQL to Redis, and then every time we made a MySQL request we would read it from Redis.
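If it helps, the difference can be sketched roughly like this. It is a simplified illustration, not the Octopus gem's actual API: FakeRemoteStore stands in for wherever the sharding data comes from, and ShardConfig, shard_for, and the shard names are hypothetical.

```ruby
# Stand-in for the data source (hypothetical); counts how often it is read.
class FakeRemoteStore
  attr_reader :reads

  def initialize(data)
    @data = data
    @reads = 0
  end

  def fetch_all
    @reads += 1
    @data.dup
  end
end

# Hypothetical lookup object: loads the sharding table once when it is
# created and serves every later lookup from process memory.
class ShardConfig
  def initialize(source)
    @table = source.fetch_all.freeze # single read at process start-up
  end

  def shard_for(client_id)
    @table.fetch(client_id)
  end
end

STORE = FakeRemoteStore.new({ 1 => :shard_a, 2 => :shard_b })
CONFIG = ShardConfig.new(STORE)

# A thousand "requests" later, the store has still only been read once.
1000.times { CONFIG.shard_for(1) }
puts "store reads: #{STORE.reads}"
```

The trade-off in a design like this is that the in-memory copy is only as fresh as the last process start, so it suits data that rarely changes.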

Let me know if that makes sense or if you have any other questions!

Next post coming up will involve processing data in bulk, stay tuned 😁
