Rahul

Posted on Apr 14, 2021

Never assume that patch updates are always non-breaking!

#ruby #rails

We learnt it the hard way. 👇

Last Friday afternoon, we bumped our Rails version from 5.2.3 to 5.2.5. We had to do this because one of Rails' dependencies had been yanked very recently, and Rails quickly released a patch update to address the issue.

We meticulously went through the changelog because we didn't want this to break our system, especially since it was a Friday afternoon. (We have a history of Friday deployments causing outages :D )

The changelog said we were good to go.

We went ahead and prepared for the release.

Things looked alright on our staging (test) environment, so we deployed to prod. Our production deployment pipeline failed. (Eyebrows were raised at this moment.)

Until that point, deployments had been going through without any hassle because Docker was using a cached copy of the yanked gem. Now that a build had failed, the cache was gone, and we could no longer push any further deployments without fixing this. 🤦‍♂️

So how did it pass the sanity check on our staging?

Funnily enough, the release also added a log line to one of our background processes, and I only checked that the latest code was there on the new background job pod. What I didn't notice was that the application pods were crashing. The sanity check had effectively been done on the previous release 🤨

I made a small change to the buggy release and pushed it to staging, and this time we saw the issue on staging too.

The application pods weren't coming up because their health checks were failing. (We did not have any health checks for our background job pods, which is why those looked fine.)

So what's the big deal?

No further deployments could be pushed to production.
Our staging was down.

Shoot! Almost everyone was blocked in one way or the other.

At first, I thought this was an infra issue. But soon I realised, the health check request wasn't even going through. The API was broken. 🤯

We were also using a gem called Grape for our APIs, and the health checks were going through that API.
Yes, Grape broke!

Wait, a patch update of Rails that had almost nothing in the changelog broke Grape? YES!!!

So who's the culprit?

Rack. It was bumped from 2.0.7 to 2.2.3, which is a minor version bump and not a patch. (We missed this as there were a lot of dependent gems that got updated.)
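A check along these lines could have made the Rack bump stand out among all the other updated gems. This is only a minimal sketch: the `bump_level` helper and the hard-coded `changes` hash are hypothetical, and in practice the old and new versions would come from the diff of `Gemfile.lock`.

```ruby
require "rubygems"

# Split a version string into [major, minor], treating a missing minor as 0.
def major_minor(version)
  major, minor = Gem::Version.new(version).segments
  [major, minor || 0]
end

# Classify a version change as :major, :minor, :patch, or :none.
def bump_level(old_version, new_version)
  return :none if Gem::Version.new(old_version) == Gem::Version.new(new_version)

  old_major, old_minor = major_minor(old_version)
  new_major, new_minor = major_minor(new_version)

  return :major if old_major != new_major
  return :minor if old_minor != new_minor
  :patch
end

# The two version changes mentioned in this post (hard-coded for illustration).
changes = {
  "rails" => ["5.2.3", "5.2.5"],
  "rack"  => ["2.0.7", "2.2.3"],
}

# Keep only the gems that moved by more than a patch release.
risky = changes.select do |_name, (from, to)|
  %i[major minor].include?(bump_level(from, to))
end

risky.each do |name, (from, to)|
  puts "#{name}: #{from} -> #{to} (#{bump_level(from, to)})"
end
```

With the versions from this post, only rack is reported, since the Rails change itself stays within the patch level.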

Rack is the interface layer that sits between the web server and the application, so every request reaches either Grape or the Rails API through it. Something in how Rack handled responses had changed between these versions (god knows why), and the Grape version we were on wasn't ready for it. The cascading effect was that all the Grape APIs were failing, including the health checks, and our system was down.
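One way to guard against this kind of unintended bump is a pessimistic version constraint on the transitive dependency. The sketch below uses RubyGems' own `Gem::Requirement` to show what such a constraint would accept. The `~> 2.0.7` constraint and the `allowed?` helper are assumptions for illustration, not something we had in our Gemfile, and whether a given constraint is compatible with the rest of the bundle is for Bundler to resolve.

```ruby
require "rubygems"

# Hypothetical constraint, as it might appear in a Gemfile:
#   gem "rack", "~> 2.0.7"
# "~> 2.0.7" means ">= 2.0.7" and "< 2.1".
rack_constraint = Gem::Requirement.new("~> 2.0.7")

# True when the given version string satisfies the requirement.
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts allowed?(rack_constraint, "2.0.9") # later patch release: accepted
puts allowed?(rack_constraint, "2.2.3") # minor bump: rejected
```

Bundler also has a `--conservative` option for `bundle update`, which asks it not to update shared dependencies of the gem being updated.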

We now had no other option but to update grape to the latest version and hope that it fixes this issue!
Thankfully, it fixed the issue.

What other option did we have?

Had it not fixed the issue, we would have been forced to move all the APIs out of Grape and into the Rails API.
Just the thought of this made me claustrophobic, because it would have not only ruined my Friday night but also consumed my weekend!

Lucky escape indeed!

Though it was pretty scary when it happened, I would take this lesson any day.

Lessons learnt ✅ fortunately without any major outage. 🤞

PS: This post was originally tweeted as a thread here.

Top comments (2)

Daniel Orner • Apr 14 '21

Looks like the issue here is that you updated the version of Rails without paying attention to what the dependencies got updated to. If you're using bundle update, this will often grab the "latest possible" version of the gem you're updating and all its dependencies, not all of which are actually necessary.

In this case, I'm pretty sure the Rails patch update wasn't the thing that bumped up the Rack version, but the bundle update command itself. And the Rack version was a minor version bump, not a patch version bump.

tl;dr - this is a bit misleading, it wasn't that a patch broke your app, a minor update you didn't intend to do broke it.

Rahul • Apr 14 '21

Yes, you're right. It was the rack update that broke the app as I had mentioned in the post as well. We clearly missed this as there were a lot of dependent gems that got updated. I thought it was obvious to the readers that the minor version bump of rack broke grape and it was due to our negligence that this happened.

Nevertheless, I should have added the footnote that you quoted,

 It wasn't that a patch broke your app, a minor update you didn't intend to do broke it.