The Bike Shed
312: Spooky Stories
Chris evaluates the pros and cons between using Sidekiq or Active Job with Sidekiq. He sees exceptions everywhere.
Steph talks about an SSL error that she encountered recently. It's officially spooky season, y'all!
Transcript:
CHRIS: Additional radiation just makes Spider-Man more powerful.
STEPH: [laughs] Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Steph Viccari.
CHRIS: And I'm Chris Toomey.
STEPH: And together, we're here to share a bit of what we've learned along the way. Hey, Chris, what's new in your world?
CHRIS: Fall is in the air. It's one of those, like, came out of nowhere. I knew it was coming. I knew it was going to happen. But now it's time for pumpkin beer and pumpkin spice lattes, and exclusively watching the movie Hocus Pocus for the next month or so or some variation of those themes. But unrelated to that, I did a thing that I do once, let's call it every year or so, where I had to make the evaluation between Sidekiq or Active Job with Sidekiq, as the actual implementation as the background job engine that is running. And I just keep running through this same cycle.
To highlight it, Active Job is the background job system within Rails. It is a nice abstraction that allows you to connect to any of a number of them, so I think Delayed Job is one. Sidekiq is one. Resque is probably another. I'm sure there's a bunch of others. But historically, I've almost always used Sidekiq. Every project I've worked on has used Sidekiq. But the question is do you use Active Job with the adapter set to Sidekiq and then you're sort of living in both worlds, or do you lean in entirely and you use Sidekiq? And so that would mean that your jobs are defined to include Sidekiq::Worker because that's the actual thing that provides the magic as opposed to inheriting from Application Job. And then do you accept all of the trade-offs therein? And every time I go back and forth. And I'm like, well, but I want this feature, but I don't want that feature. But I want these things. So I've made a decision, but I want to talk ever so briefly through the decision points that were part of it. Have you done this back and forth? Are you familiar with the annoying choice that exists here?
STEPH: It's been a while since I've had the opportunity to make that choice. I'm usually joining projects where that decision has already been made. So I can't think of a recent time that I've thought through it. And my current project is using that combination of where we are using Active Job and Sidekiq.
CHRIS: So I think there's even a middle ground there where that was the configuration that I'd set up on the project that I'm working on. But you can exist in both worlds. And you can selectively opt for certain background jobs to be fully Sidekiq. And if you do that, then instead of saying, "Perform_later," You say, "Perform_async." And there are a couple of other configurations. It gives you access to the full Sidekiq API. And you can do things like hey, Sidekiq, here's the maximum number of retries or a handful of other things. But then you have to trade away a bunch of the niceties that Active Job gives.
So as an example, one thing that Active Job provides that's really nice is the use of GlobalID. So GlobalID is a feature that they added to Rails a while back. And it's a way to uniquely identify a given record within your system such that when you say perform_later, you can say, InvitationMailer.perform_later and then pass it a user record so like an instance of a user model. And what will happen in the background is that gets serialized, but instead of serializing the whole user object because we don't actually want that, it will do the GlobalID magic. And so it'll turn into, I think it's GID:// so almost like a URL. But then it'll be, I think, your application name/model name down the road. And the Perform method actually gets invoked via the background system. Then you will just get handed that user record back, but it's not the same instance of the user record. It sort of freezes and thaws it. It's really nice. It's a wonderful little feature. Sidekiq wants nothing to do with that.
STEPH: I'm so glad that you highlighted that feature because that was on my mind; I think this week where I was reviewing...somebody had made the comment where they were concerned about passing a record to a job and saying how that wouldn't play nicely with Sidekiq. And in the back of my mind, I'm like, yeah, that's right. But then I was also I'm pretty sure this got addressed, though. And I couldn't recall specifically if it was a Sidekiq enhancement or if it was a Rails enhancement. So you just cleared something up for me that I had not had time to confirm myself. So thanks.
CHRIS: Well, to be clear, this works if you are using Active Job with Sidekiq as the adapter, but not if you are using a true Sidekiq worker. So if you opt-out of the Active Job flow, then you have to say, "Perform_async," and if you pass it a record, that's not going to work out particularly nicely.
The other similar thing is that Sidekiq does not allow the use of keyword args, which, I'm going, to be honest, I really like keyword arguments, especially for background jobs or shuttling data through your system. And there's almost a lazy evaluation. I want some nicety to make sure that when I am putting something into a background job that I'm actually using the correct call signature, essentially passing the correct data in the correct shape. Am I passing a record, or am I passing the ID? Am I passing a list of options or a single option? Those sort of trade-offs that are really easy to subtly get wrong.
I came around on this one because I realized although Active Job does support keyword arguments, the way it does that is it just has a JSON serialization format for them. So a keyword argument turns into a positional array with an associated hash that allows for the lookup or whatever. Basically, again, they handle the details. You get to use keyword args, which is great, with the exception that when you're actually calling perform_later, that method perform_later is a method missing type magic method. So it does not actually check the keyword arguments at that point. You're basically just passing an options hash as opposed to true keyword arguments that would error because they don't match up. And so when I figured that out, I was like, oh, never mind. This doesn't actually do the thing that I care about. It's a little bit nicer in terms of the signature of the method when you're defining your background job itself, but it doesn't actually do any logical checking. It doesn't give me any safety or robustness within my system. So I don't care about that.
I did find a project called sidekiq-symbols, which does some things under the hood to how Sidekiq serializes and deserializes jobs, which I think gives largely the same behavior as Active Job. So I can now define my Sidekiq jobs with keyword arguments. Things will work. I can't use GlobalID. That's still out. But that's fine. I can do a little helper method that basically does the same thing as GlobalID or at least close approximation. But sidekiq-symbols lets me have keyword arg-like signatures in my methods; basically, it is. But again, it doesn't actually do any check-in when I'm enqueueing a job, and I am sad about that.
STEPH: Yeah, that's another interesting distinction. And I'm unsurprisingly with you that I would favor having keyword args and having that additional safety in place. Okay, so I've been keeping track. And so far, it sounds like we have two points because I'm doing a little scorecard here between Active Job and Sidekiq. And we have two points in favor of Active Job because they offer a GlobalID, which then allows us to pass in a record, and then it takes care of the serialization for us. And then also, keyword args, which I agree with you that's a really nice feature to have in place as well. So I'm curious, so it sounded like you're leaning towards Active Job, but I don't want to spoil the ending.
CHRIS: Yes, I could see why that's what you would be taking away from the conversation thus far. So again, just to reiterate, Active Job and Sidekiq with this sidekiq-symbols extension they both support keyword args, kind of. They support defining your job with keyword args and then enqueueing a job passing something that looks like keyword args. But it ends up...nobody's actually checking anything, so it's mostly like a syntactic nicety as opposed to any sort of correctness, which is still nicer, but it's not the thing that I actually want. Either way, nobody supports it, so it is not available to me. Therefore, it is not a consideration point.
The GlobalID thing is nice, but it is really, again, it's a nicety more than anything. I have gone, and I'm leaning in the direction of full Sidekiq and Sidekiq everywhere as opposed to Active Job in most cases, but then Sidekiq when we need it. And that's because Sidekiq just has a lot more power and a lot more functionality. So, in particular, Sidekiq has a feature which allows you to say...it's a block that you put at the top of your Sidekiq job that says retries exhausted or something. I think Sidekiq retries exhausted is the actual full name of that at that point, which is really unfortunate in my mind, but anyway, I'll deal. At that point, you know that Sidekiq has exhausted all of the retries, and you can treat it as failed.
I'm going, to be honest, I went on a quest to find a way to say, hey, I'm going to put some work into the background. It's really important for me to know if this work succeeds or if it fails. It's very easy to know if it succeeds because that just happens in-line in the method. But we can have an exception raised at basically any point; Sidekiq does a great job of catching those, of retrying, of having fundamental mechanisms there. But this is the best that I can get for this job failed. And so Active Job, as far as I can tell, does not have anything for this in order to say, yep, we are done. We are not going to keep working on this. This work has failed. It is dead.
Dead is; actually, I think the more correct term for where we're at because failed is a temporary state, and then you retry after a failure. Whereas dead is, this has gone through all of its retries, and it will never be run again. Therefore, we should treat this as not having run. And in my case, the thing that I want to do is inform the user that this operation that we were trying to do on their behalf has not succeeded, will not succeed. And please reach out or otherwise deal with the fact that we were unable to do the thing that they asked us to do. That feels like a really important thing for me to be able to do, to be able to communicate back to my users.
This is one of those situations where I'm looking at the available options, and I'm like, I feel like I can't be the only one who wants to know when something goes wrong. This feels like a thing that's important. But this is the best example that I've found, the Sidekiq retries exhausted block. And unfortunately, when I'm using it, it gets yielded the Sidekiq JSON blob deserialized, so it's like Ruby hash. But it's still like this blob of data. It's not the same data that gets passed into perform. And so, as a result, when I want to look up the record that was associated with it, I have to do this nested dig into the available hash of data. And it just feels like this is not a well-paved path. This is not something that is a deeply thought about or recommended use case. But again, I don't feel like I'm doing something weird here. Am I doing something weird, Steph, wanting to tell my users when I was unable to do the thing they asked me to do? [chuckles]
STEPH: That feels like a very rhetorical question. [laughs]
CHRIS: It does. I apologize. I'm leading the witness. But in your sincere heart of hearts, what do you think?
STEPH: No, that certainly doesn't sound weird. I'm actually thinking back to some of the jobs that cause me stress in regards to knowing when they failed and then having that communication of knowing that we've exhausted all the retries. And, of course, knowing when those retries are exhausted is incredibly helpful.
I am intrigued, though,, because you're highlighting that Active Job doesn't have the same option around setting the retry. And I'm trying to recall exactly how it's set. But I feel like I have set the retry count for Active Job. And maybe, as you mentioned before, that's because it's an abstraction, or I'm not sure if Active Job actually has that native support. So I feel a little confused there where I think my default instinct would have been Active Job does have that retry capability. But it sounds like you've discovered otherwise.
CHRIS: I'm not actually sure what Active Jobs core retry logic or option looks like. So fundamentally, as far as I understand it, Active Job is an abstraction. And under the hood, you're always connecting an adapter. So it's either going to be Sidekiq, or Resque, or Delayed Job, or other. And each of those systems, whichever system you have as the adapter, is the one that's actually going to be managing retries. And so I know Sidekiq happens to have as a default 25 retries. And that spans, I think it's a two-week exponential back off. And Sidekiq has some very robust logic that they have implemented as the way retries exist within Sidekiq. I'm not sure what that would look like if you're trying to express it abstractly because it is slightly different.
I know there was some good work that was done on Sidekiq to allow the Sidekiq options that's a method at the top level of the job, even if it's an Active Job job to express the retries. So that may be what you've seen, or there may be truly an abstraction that exists within Active Job, and then each adapter needs to know how to handle retries. But frankly, the what can Sidekiq do that Active Job can't? There's a whole bunch of stuff around limiting when you would retry limiting, enqueuing a job if there already exists one, when and how do those records get locked. There's a whole bunch of stuff.
Sidekiq has a lot of power under the hood. And so if we want to be leaning into that, that's why I'm leaning towards let's just be Sidekiq all the time. Let's become Sidekiq experts. Let's accept that as a deep architectural decision within the app as opposed to just relying on the abstraction. Because fundamentally, if we're just using Active Job, we're not going to have access to the full power of Sidekiq or whatever the underlying system is, so sort of that decision that I'm making, but I don't know specifically around the retries.
STEPH: Okay, thanks. That's really helpful. It's been a while since I've had to make this decision. I'm really enjoying you sharing your adventure because I'm trying to think what's the risk? If you don't use Active Job, what are the trade-offs? And you'd mentioned some of them around the GlobalID and keyword args, which are some niceties. But overall, if you don't go with the abstraction, if you lean into Sidekiq, the risk is then you want to migrate to a different enqueuing service. And something that we talk about is mitigating that risk, so then you can swap it out. That's also something I have never done or encountered where we've had to make that change. And it feels like a very low risk in my mind.
CHRIS: Sidekiq feels like the thing you would migrate to, not a thing you would migrate from. It feels like it is the most powerful. And if anything, I expect at some point we'll be upgrading to Sidekiq pro or enterprise or whatever the higher versions that you pay for, but you get more features there. So in that sense, that is the calculation. That's the risk trade-off in my mind is that we're leaning into this technology and coupling ourselves more closely to it.
But I don't see that as one that will reassess in the same way that people talk about Active Record and it being an ORM. And it's like, oh, we're abstracting the database underneath, and I'm like, no, I'm not. I'm always using Postgres. Please do not take Postgres. I'm not going to switch over to MySQL next week. That's totally fine if you start on MySQL. It's unlikely you're going to port over to Postgres. We may port to an entirely…like it's a Cassandra column store with a Kafka queue, I don't know, something weird down the road. But it's not going to be swapping out Postgres for MySQL or vice versa. Like you said, that's probably not a change that's going to happen. But that I think is the consideration.
The other consideration I have in my mind is Active Job is the abstraction that exists within Rails. And so I can treat it as the lowest common denominator, and folks joining the project, it's nice to have that familiarity. So perform_later is the method on the Active Job jobs, and it has a certain shape to it. People may be familiar with that. Mailers will automatically use Active Job just implicitly under the hood. And so there's a familiarity, a discoverability. It's just kind of up the middle choice. And so if I can stick with that, I think there's a nicety there. But in this case, I think I'm choosing I would like the power and consistency on the Sidekiq side, and so I'm leaning into that.
STEPH: Yeah, that makes a lot of sense to me. And I liked the other example you provided around things that were not likely to swap out and Postgres, MySQL, your database being one of them. And in favor of an example that I do have for something that...I do enjoy wrapping. It's not something that I adhere to strictly, but I do enjoy it when I have the space to make this choice. So I do enjoy wrapping HTTPClients, not just because then I can swap it out for a different HTTPClient, which frankly, that's also rare that I do that. Once I choose an HTTPClient, I'm probably pretty happy, and I don't need to swap it out.
But I really like being able to extend to the API specifically if they don't handle error responses in a way that I would like to or if they raise, and then I want to change the API to have a more thoughtful interface and where I don't have to rescue those errors. But instead, I can interact with this object that then represents an error state. So that was just one example that came to mind for things that I do enjoy having an abstraction around and not just so I can swap it out because that feels like a very low risk, but more frankly, so I can extend the API.
CHRIS: I definitely share the I almost always wrap APIs, or I try and hide whatever the implementation detail whether it be HTTPParty, or Faraday or whatever it is that I'm using and trying to hide that deeply within the system. And then I have whatever API client that we define. And that's what we're interacting with. It's interesting that you bring up errors and exceptions there because that's the one other thing that has caused me this...what I'm describing now seems perhaps like, oh, here's just a list of pros and cons, a simple decision was made, and there we are.
This represents some real soul searching on my part, if we will. And one of the last things that I ran into that was just so frustrating is that Sidekiq is explicitly built around the idea of exceptions; Sidekiq retries if there is an exception raised in the job, otherwise, it treats it as success, and that's it. That is the entirety of it. That is the story. But if you raise an exception in a job, then you can't test that job because now it's raising an exception. You can't test retries or this retry exhausted block that I'm trying to lean into. I'm like, I want to put that in a feature spec and say, oh, this job goes in the background, but it's in a failure state, and therefore, the user sees the failure message. Sorry, I can't do that because the only way to actually fail a job is via an exception.
And I've actually gone to some links in this application to try to introduce more structured data flow. I've talked a bunch about the command objects and the dry-monads and all those things. And I've really loved them where I've gotten to use them. But then I run into one of these edge cases where Sidekiq is like, no, no, no, you can't do that. And so now I have parts of my system that very purposefully return data as opposed to raising an exception. And I just have to turn around and directly raise that failure as an exception, and it just feels less expressive.
I actually just ran into the identical thing with Pundit. They have a little bit better control over it; I can choose whether or not I want the raising version or not. But I see exceptions everywhere, and I want a little more discrete data flow. [chuckles] That is my dream. So anyway, I chose Sidekiq is the summary here. And slowly, we're going to migrate entirely to Sidekiq. And I'm going to be totally fine with it. And I'm done griping now.
STEPH: This is your own little October Halloween movie, that I see exceptions everywhere.
CHRIS: They're so spooky.
STEPH: [laughs] That's cool about Pundit. I'm not sure I knew that, that you get to essentially turn on or off that exception flow behavior. On one hand, I'm like, that's nice. You get the option. On the other hand, I'm like, well, let's just not do it. Let's just never raise on people. But at least they give people options; that seems really cool.
CHRIS: They do give the option. I think you can choose different strategies there. And also, if we're being honest, I'm newer to Pundit. And I used a different thing, which was to get the Policy Object and ask it a question. I wanted to ask, is this enabled or not? Can a user do this or not? That should not raise an exception. I'm just asking a question. We're just being real chill about this. I just want to know some information. Let's flow some data through our system. We don't need exceptions for that.
STEPH: Why are you yelling at me? I just have a question. [laughs]
CHRIS: Yeah. I figured out how to be easy on that front. Sidekiq apparently has no be easy mode, but that's fine. You know what? We're going to make it work, and it's going to be fine. But it is interesting deciding which of these facets of the system that I'm building do I really care about? Which are the ones where I'm like, whatever, just pick something, and we'll move forward, it's not a big deal? Versus, we're actually going to be doing a lot of work in the background. This is the thing that I care about deeply. I want to know about failure and success. I want to really understand that and have a robust answer to what our architecture looks like there.
Similarly, Pundit for authorization. I believe that authorization will be a critical aspect of our system. It's typically a pretty important thing. But for us, I think we're going to have different types of users who can log in and see different subsets of data and having a consistent and concrete way that we have chosen to implement that we are able to test, that we're able to verify. I think that's another core competency within the app. But you only get to have so many of those. You can only be really good at a couple of things. And so I'm in that place where I'm like, which are our top five when I say are the things that I care a lot about? And then which are the things where I'm like, I don't know, whatever, just run with it?
STEPH: Just a little bit ago, I came so close to singing because you said the I want to know phrase again. And that, I'm realizing, [laughs] is a trigger for me and a song where I want to sing. I held it back this time.
CHRIS: It's smart. You got to learn anytime you sing on mic that is part of the permanent record.
STEPH: Edward Loveall at thoughtbot, since I sang in a recent episode, did the delightful thing where then he grabbed that clip of where you talk a little bit, and then I sing and then encouraged everyone to go listen to it. And in which I responded, like, I would highly recommend that you save your ears and don't listen to it. But yes, singing on the mic is a thing. I do it from time to time. I can't hold it back.
CHRIS: We all do. But since it doesn't seem that you're going to sing in this moment, I think I can probably wrap up my Odyssey of choosing between Sidekiq and Active Job. I hope those details were useful to anyone other than me. It was an adventure, so I figured I'd share it. But yeah, that about wraps it up on my side.
Mid-roll Ad
And now a quick break to hear from today's sponsor, Scout APM.
Scout APM is leading-edge application performance monitoring that's designed to help Rails developers quickly find and fix performance issues without having to deal with the headache or overhead of enterprise platform feature bloat. With a developer-centric UI and tracing logic that ties bottlenecks to source code, you can quickly pinpoint and resolve those performance abnormalities like N+1 queries, slow database queries, memory bloat, and much more.
Scout's real-time alerting and weekly digest emails let you rest easy knowing Scout's on watch and resolving performance issues before your customers ever see them. Scout has also launched its new error monitoring feature add-on for Python applications. Now you can connect your error reporting and application monitoring data on one platform.
See for yourself why developers call Scout their best friend and try our error monitoring and APM free for 14 days; no credit card needed. And as an added-on bonus for Bike Shed listeners, Scout will donate $5 to the open-source project of your choice when you deploy. Learn more at scoutapm.com/bikeshed. That's scoutapm.com/bikeshed.
STEPH: So, I would love to talk about an SSL error that I encountered recently. So one of the important processes in our application is sending data to another system. And while sending data to that other system, we started seeing the following error that the read "Certificate verify failed." And then in parens, it states, "Unable to get local issuer certificate." So upon seeing that error, I initially thought, okay, something is wrong with their SSL certificate or their SSL configuration. And that's not something that I have control over and can fix. So we should reach out and let them know to take a look at their SSL config.
But it turns out that their team already knew about the issue. They had recently updated or renewed their SSL cert, and they saw our messages were no longer being processed, and they were reaching out to us for help. So at that point, I'm still pretty sure that it's related to something on their end, and it's not something that I can really fix on our end. But we can help them troubleshoot. Maybe there's a workaround that we can add to still get messages processing while they're looking into their SSL config. It seemed like they still just needed help. So it was something that was still worth diving into.
So going back to the first error, I want to talk a little bit about it because I realized that I understand SSL just enough, just the surface to get by as a developer. But then, every time that I run into a specific error with it, then I really have to refresh my understanding as to what could be wrong, so then I can troubleshoot more effectively.
So for anyone that could use a refresher on that certificate verification process, when your browser or your server is connecting to a site that uses SSL, then your browser server, whichever one you're using, is going to download that site certificate and verify a couple of things. So it's going to check does the certificate contain the domain name of the website? So essentially, you gave us a certificate. Is this your certificate? Does it match the site that we're connecting to? Is this cert issued by a trusted certificate authority? So did someone that we trust give you this certificate? And is the cert still valid, or has it expired? So that part is pretty straightforward.
The second part, "Unable to get local issuer certificate," so that's the part I was less certain about. And I took this to mean that they had passed two of those three checks that their cert included the site's name, and it had not expired. But for some reason, we aren't able to determine if their cert was issued by someone that we should trust.
So following that journey, my next question was, so what are they giving us? So this is a tool that I don't get to use very often, but I reached for OpenSSL and, specifically, the s_client command, which connects to a specified domain and prints all certificates in the certificate chain. You may already know this, but the certificate chain is basically a fancy way of saying, show me all the certificates necessary to prove your site certificate was authorized by a trusted certificate authority.
CHRIS: I did not know that.
STEPH: Okay, I honestly didn't either. [laughs]
CHRIS: I liked that you thought I would, though. So thank you, but no. [chuckles]
STEPH: Yeah, it's one of those areas of SSL where I know just enough. But that was something that was new to me. I thought there was a site certificate, and I didn't realize that there is this chain of certificates that has to be honored.
So going back and looking through that output of the certificate chain, that's what highlighted to me that their server was giving us their certificate and saying, hey, you should trust our site certificate. It's legit because it was authorized by, let's say, XYZ certificate. And so if it were a proper certificate chain, then they would give us that XYZ cert. And essentially, we can use this chain of certificates to get back to a trusted authority that then everybody knows that we can trust. However, they weren't actually giving us a reference certificate; they were giving us something else. So essentially, they were saying, "Hey, look at our certificate and look at this very trustworthy reference that we have." But they're actually failing to give us that reference.
So to bring it all home, we can download that intermediate certificate that they reference; that is something that is publicly accessible. That's why we're able to then verify each certificate that's provided in that chain. We could go and download that intermediate certificate from that certificate authority. We could combine that with their site-specific certificate, include that in our request to their system, and then complete the certificate chain. And boom, we're back in business. But it was quite a journey.
CHRIS: That is quite the journey. And yeah, I definitely knew very little of that, although everything you're saying makes sense. And I have a bunch of cubbyholes in my brain for SSL knowledge. And the words you said all fit into the spaces that I have in my brain, but I didn't know a bunch of those pieces. So thank you for sharing that.
SSL and cryptography, more generally or password hashing or things like that, occupy this special place in my brain where I'm both really interested in them. And I will occasionally research them. If I see a blog article, I'll be like, oh yeah, I want to read more about this password hashing. And what's a Salt? And what's a Pepper? And what are we doing there? And what is BCrypt versus SCrypt? What are all these things? This is cool. And almost the arms race on the two sides of how do we demonstrate trust in a secure manner on the internet?
But at the same time, I am not allowed to do anything with this information. I outsource this as much as humanly possible because it's one of those things that you just should not do yourself and SSL perhaps even more so. So I have configured aspects of my password hashing. But I 100% just lean on the fact that Let's Encrypt exists in the world. And prior to that, it was a little more work. But frankly, earlier on in my career, I wasn't dealing with the SSL parts of things. But I'm so grateful to Let's Encrypt as a project that exists.
And now, on almost every platform that I work with, there's just a checkbox for please do the SSL work for me, make it good, make it work, and then I will be happy. And I'm so glad that that organization exists and really pushed the envelope also. I forget what it was, but it was only like three years ago where SSL was not actually nearly as common as it is now. And now it is pervasive and everywhere. And all of the sites have it, and so that is a wonderful thing. But I don't actually know much. I know that I should have it. I must have it. I should force it. That's true. So I push that out…
STEPH: Hello.
CHRIS: Are you trying to get me to sing? [chuckles]
STEPH: [laughs] No, but I did want to know if you get the reference, the Salt-N-Pepa.
CHRIS: Push It Real Good the song? Yeah, okay.
STEPH: Yeah, you got it. [chuckles]
CHRIS: I will just say the lyrics. I shall not sing the lyrics. I would say that, though, that yes, yes, they do that.
STEPH: Thank you for acknowledging my very terrible reference. Circling back just a little bit too in regards to...I'm with you; this is a world that is not one that I am very deeply technical in and something that I learned a fair amount while troubleshooting this particular SSL error. And it was very interesting. But there's also that concern where it's like, that was interesting. And we worked around the issue, but this also feels very fragile.
So we still haven't fixed it on their end where they are sending the wrong certificate. So then that's why we had to do more investigative work, and then download the certificate that they meant to send us, and then send back a complete certificate chain so that we don't have this error anymore. But should they change anything about their certificate, should they renew anything like that, then suddenly, we're going to break again. And then, the next developer is going to have to go through the same journey. And this wasn't a light journey. This was a good half-day journey to figure out what was going on and to spend the time, and then to also get that fix out to production. So it's a meaningful task that I don't want anyone else to have to go through.
But we are relying on someone else updating their configuration. So, on one hand, we're in a good spot until they are able to update. But on the other hand, I wrote a heck of a commit message for the next person just describing like, friend, just grab some coffee if we're going to chat. It's a very small code change, but you need to know the scoop. So should you need to replicate this because they've changed something, or if this happens…because we work with a number of systems that we send data to. So if someone else should run into a similar issue, they will understand some of the troubleshooting techniques that I used and be able to look up that chain and find out if there's a missing cert or something else they need to provide. So it feels like a win, but I'm also nervous for future selves, future developers.
So there's another approach that I haven't mentioned yet, but it was often a top recommendation for when dealing with SSL errors. And specifically, it was turning off SSL verification. And I saw that, and I was like, well, that won't work. I'm definitely sending sensitive, important data. And I need to verify that who I'm sending this to is really the person that I want to send this data to. So that was not an option for me. But it made me very nervous how often that was an approach that people would recommend and be like, oh, it's okay, just turn off SSL. You'll be fine. Like, don't worry about it.
CHRIS: I feel like this so perfectly fits into the...some of our work is finding the information and connecting the pieces together and making it work. But some of it is that heuristic sense, that voice in the back of your head that is like, wait, I'm sorry, what? You want me to just turn off the security perimeter and hope that the velociraptors won't come in? That doesn't seem like it's going to end well. I get that that's an easy option that we have available to us right now and will solve the immediate problem but then let's play this out. There are four or five Jurassic Park movies now that tell the story of that. So let's be careful.
STEPH: It always ends super well, though, right? Like, it's totally fine. [laughs]
CHRIS: [laughs] Exclusively. Although it's funny that you mentioned OpenSSL no verify because just this past week, I used that very same configuration. I think it was okay in my case; I’m pretty sure. But it is interesting because when I saw it, I was like, oh no, can't do that. Certainly not that. Don't turn off the security feature. That's the wrong way to deal with the issue.
But in the particular case that I'm working with, I'm using Redis, Heroku Redis, in particular, in a Heroku configuration. And the nature of how Heroku configures the Redis instances and the connectivity to our app into our dyno...I forget why. I read an article. They wrote it; Heroku wrote it. I trust them; they’re good. I've outsourced my trust to people that I do trust. The trust chain actually maps really well to the certificate trust chain. I trust that Heroku has taken security deeply seriously. And for some reason, their configuration of Redis requires that I turn on OpenSSL no verify mode. So I'm using this now both in Sidekiq, and then we're using our Redis instance for our Rails cache as well.
So in both cases, I said, "It's fine. Don't worry about it." I used the Don't worry about it configuration. And I didn't love it but I think it's okay. And partly, I'm trying to say this into the internet radio right now just in case anyone's listening who's like, no, no, no, you can't do that. That's bad. So I'm willing to be deeply wrong on the internet in favor of someone telling me and then I get to get out in front of it. But I think it's fine. Pretty sure it's fine. It should be fine.
STEPH: I love love love that you gave a very visual example of velociraptors, and then you're like, oh, but I turned it off. [laughs] So I'm going to start sending you a velociraptor gif each day.
CHRIS: I hope you do. I hope the internet holds you accountable to that.
STEPH: [laughs]
CHRIS: And I really look forward to [laughs] moving forward because that's a great way to start the day. Well, it doesn't need to start the day, but I look forward to them.
STEPH: [laughs] I am really intrigued because I'm with you. Like you said, there are certain entities that are in our trust chain where it's like, hey, you are running this for us, and so I do have faith and trust in you that you wouldn't steer me wrong and provide a bad recommendation. Someone on Stack Overflow telling me to turn off SSL verify uh; that’s not my trust chain. Heroku or someone else telling me I'm going to take it a little more seriously. And so I'm also interested in hearing from...what'd you say? You're speaking into the internet phone. [laughs] What'd you say?
CHRIS: I think I said internet radio. But yeah, in a way. I mean, we're recording over Skype right now. So in a manner of speaking, we're on the internet phone to make our internet radio show.
STEPH: [laughs] Oh goodness, the internet radio. I'm also intrigued to hear if other people are like, oh, no, no, no. Yeah, that sounds like an interesting scenario. Because I would think you'd still want your connection to...you said it's for Redis. So you still want that connection to be verified. But then if Redis itself can't have a specific...yeah, we're testing the boundaries of my SSL knowledge here as to how the heck you would even establish that SSL connection or the verification process.
CHRIS: Me too. And it also exists in an interesting space where Heroku is rather clear in their documentation about this. And it was a surprising claim when I saw it. And so, I don't expect them to be flippant about a thing that is important. Like, if they're like, "No, no, no, it is okay. You can turn off the security thing, don't worry." I trust that they're not just like, oh, we didn't think about it too much. But we figured why not? It's not a big deal. I'm sure that they have thought about it deeply because it is an important thing.
And so in a weird way, my trust of them and the severity of what this thing represents, I'm like, oh yeah, I super trust that because you're not going to get a major thing wrong. You might get a minor, small, subtle thing wrong. But this is a pretty major configuration change. As I say it, I'm now getting more worried. I'm now like, I feel fine about this. This doesn't seem like a problem at all. But then I keep saying stuff, and I'm like, oh no. That's why I love having a podcast; I find out things about myself as I talk into a microphone to you.
STEPH: We come here to share our deep, dark developer secrets.
Chris: Spooky developer therapy.
STEPH: But just to clarify, even though you've turned off the SSL verify, you're still connecting over SSL.
CHRIS: Yes, I believe that's the case. And if I'm remembering, I think the nature of how this works is they're using a self-signed certificate because of shared infrastructure or something, something that made sense when I read it. But it was the idea that they are doing a self-signed certificate. Therefore, to what you were talking about earlier, there isn't the certificate authority in the chain of those because it's self-signed. And so, they are not a trusted certificate authority. Therefore, that certificate that they have generated would not be trusted. But it does still allow for the SSL handshake and then communication to happen over SSL. It's just that fundamental question of trust. I'm saying, in this case, for reasons, it's okay. Trust me that I trust them. We're good. Which, again, I don't feel great about, but I think yes, it is still SSL, but it is a self-signed certificate. So we have to make this configuration change.
STEPH: Yeah, all of that makes sense. And it certainly sounds like you have been very thoughtful about that change and put in some investigative work. So on that note, I have a very unrelated bad joke for you.
CHRIS: I'm very excited.
STEPH: All right, here we go. All right, so what do you call an alligator wearing a vest?
CHRIS: I don't know. What do you call an alligator wearing a vest?
STEPH: An investigator.
[laughter]
On that note, shall we wrap up?
CHRIS: Oh, let's wrap up. We should also include a link in the show notes to the episode where you told the joke about the elephant hiding in the trees because that's one of my favorite jokes. You slayed me with that one. [laughs] But on that note, yes, let us wrap up. The show notes for this episode can be found at bikeshed.fm.
STEPH: This show is produced and edited by Mandy Moore.
CHRIS: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes,,as it really helps other folks find the show.
STEPH: If you have any feedback for this or any of our other episodes, you can reach us at @_bikeshed or reach me on Twitter @SViccari.
CHRIS: And I'm @christoomey
STEPH: Or you can reach us at hosts@bikeshed.fm via email.
CHRIS: Thanks so much for listening to The Bike Shed, and we'll see you next week.
All: Byeeeeeeeeee!!!
Announcer: This podcast was brought to you by thoughtbot. thoughtbot is your expert design and development partner. Let's make your product and team a success.