The Bike Shed
377: Error Handling
Joël is a mentor for RailsConf and got matched with a speaker. Stephanie has been having trouble stepping away from her work. It's frustrating when chasing down a bug because something's gone wrong, and you spend a whole afternoon figuring out where it is. Joël and Stephanie discuss error handling as a possible solution.
Transcript:
STEPHANIE: Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Stephanie Minn.
JOËL: And I'm Joël Quenneville. And together, we're here to share a bit of what we've learned along the way.
STEPHANIE: So, Joël, what's new in your world?
JOËL: So recently, RailsConf has closed out their CFP, and they've started sending out acceptances and rejections for proposals. And one thing that they do that I think is really nice is that they offer first-time speakers the ability to get matched with a speaker-mentor, somebody else who has given talks before that can help them prep their talk, listen to them rehearse, that kind of thing. And so they had put out a call for mentors last week. I responded to that, and I got matched with a speaker today.
STEPHANIE: Cool. Is this your first time being a speaker-mentor?
JOËL: First time for RailsConf. I've done it for another conference before.
STEPHANIE: That's really exciting. What do you like about playing that role?
JOËL: So I very much like prepping and giving talks myself. And I really value if there's something that I'm excited about sharing it, helping others build up that skill as well. So I think it's a great opportunity. I also remember what it was like when I was a first-time speaker and just how very nervous I was and not sure.
So I think having someone who can play that role is an opportunity to have a really powerful impact in what's oftentimes, I want to say, a monumental moment. But it's kind of like a milestone marker moment in someone's career, the first time I gave a talk at a conference. So you get to help them to make that moment the best it can be.
STEPHANIE: I love that, yeah. You make a really great point that after you've been speaking for a while, you maybe might forget what it felt like to give your first talk and how big of a deal it is. And in general, I think one thing I really love about Ruby Central conferences is how supportive they are of first-time speakers. So even in the CFP, they mentioned that they welcome first-time speakers and want to make sure to accept talks from those folks and then provide them support through this mentor program. And yeah, it just makes me feel really happy.
JOËL: Do you remember your first talk?
STEPHANIE: I do. So my very first talk I gave virtually at RubyConf in 2021. And then last year was actually my first in-person talk. And I remember even though it was technically my second talk; it was really my first talk in front of an audience. And I saw speakers in the Slack workspace asking questions about the AV setup, and I didn't even think to consider that in my preparation. So it was nice. Even though I didn't get set up with a mentor, to share a space with other experienced speakers and see what kinds of things they were asking about or what kinds of things they were sharing in that Slack space was helpful for me.
JOËL: So when you do a proposal, do you typically have an outline already built out, or is it mostly a concept that you're pitching, and then you maybe start with an outline? Or where do you go next after a proposal has been accepted?
STEPHANIE: That's a great question. I think first, I procrastinate for several months, [laughs], but I do try to write an outline in the proposal when I submit it so I do have a starting point. And I think that actually helps the CFP committee, too, when they are evaluating proposals to kind of get a better idea of what the talk will be about. And so, in my ideal world, I already have some structure, so by the time I've procrastinated to the point where it's a month or so before the conference itself, [laughs] I have an outline.
And I end up writing words, like, I will just write my talk as if it were an essay with this bullet point outline already. And I find that helpful for me because I definitely have a bit of a stream-of-consciousness productivity energy. And so if I just put it all out there, I will then go back and be very ruthless, I suppose, in my editing, and I think that's where the magic happens.
So I kind of let myself just word vomit all over the page. And then the real work comes in the editing process and organizing and making sure it sounds the way I would want it to sound when I'm speaking. And yeah, that's how it has worked for me so far.
JOËL: So you have a sort of a separate phase for sort of just stream of consciousness dumping and then separately editing. And having those two separate is an important part of your process.
STEPHANIE: Yeah, I think so. I don't do as well trying to imagine the structure and everything perfectly the first time around and then filling things in. I find that just putting everything out there and, you know, a lot of things get cut. But that works well for me. What about you? What is your typical conference talk writing process?
JOËL: I think mine is a little bit more iterative. I tend to put in some pieces that I like and then try to connect them together, try to make sure it's telling a story. I think a lot about the pedagogical side of things, where people are going to be confused, where they're going to have questions, where they might check out.
And then very early, start doing kind of draft rehearsals where I'm starting to work on the talk. And I will stop halfway through because, in my mind, I'm trying to seat myself in the audience and be a person who's listening. And there might be a moment where I'm like, wait a minute, you just jump from one thing to another, and I don't get the logical connection here. And I might pause right there in the rehearsal and add in, say, okay, we need a transitional point, or we need to explain a concept between these two.
And I keep doing that until I can get through the whole thing and then realize it's way too long and start cutting. And I cut aggressively, and now it's too short. And now I go through it again. And again, people have questions in the audience, hypothetical audience; I am the audience. And so I really kind of inflate it and then cut it down and re-inflate it and cut it down a bunch of times until I'm happy.
STEPHANIE: I like that a lot. That sounds right. That sounds very you to work even on a conference talk iteratively.
JOËL: It's very time-consuming. So I don't know it's the most efficient way to build a talk, but it's a process that works for me.
STEPHANIE: Yeah, that's true. And then there's value in the journey, even if the talk ends up changing from the very beginning to the end product.
JOËL: So the approach that you described for yourself, I think, where you have a rough draft, and you're separating the editing from almost like a creative process, reminds me a lot of an article that I read called "Mise en Place Writing" by...I'm not sure what their full name is. They go by the handle Swyx. This is an article about their process for writing, but I think it applies to conference talks as well. Have you seen this article?
STEPHANIE: I haven't. But that, I think, is similar to how I've thought about it or I've seen or heard other authors talk about their writing process and it being kind of similar where the creative work...they give themselves a lot of grace and just letting it be. And then the, like I mentioned, real work is in the editing process. It's kind of two different mindsets, I think.
JOËL: We'll link the article in the show notes.
STEPHANIE: I'm curious then how you incorporate visuals into your process because I think that's where my workflow is a little less successful because I'm not really thinking about visuals along with the words, and they do feel more like an afterthought. And I've always been really impressed when people who give talks can have a really visual and dynamic slide presentation. How does that work for you?
JOËL: So I think I try to avoid slides that are three bullet points in the slide, and then I'm going to talk about it for three or four minutes for each bullet point. People read those quickly and then check out. I'll oftentimes try to have, like, turn each of those bullet points into a full-on slide. And maybe it's just a title and a fun picture or something like that. What this ends up doing is I kind of really inflate my slide deck. I'm going through maybe 80 or 100 slides in a 30-minute presentation.
So it's multiple slides a minute. They move by really quickly. So I usually have either just an image or a header. I will usually start by just sketching it out with headers and then, where it makes sense, using an image. An image can be just for fun, or it can be something like a diagram where it is trying to illustrate a point.
STEPHANIE: Yeah, I like that. I think talks with a lot of slides that are mostly just images or something that you can grasp in a few seconds are really engaging because you're keeping it moving, and you don't really let people get bored. And so you show a new slide, and they look at it, but then they are able to direct their attention back to what you're saying.
JOËL: It's fun too with images because you can reuse them, and then they become a way to connect people back to a theme or let them know that you're making the same point again. A lot of talks, I will have a central theme that gets repeated. I'll often have a slide with some fun image with my key point on it. And then that slide will show up three or four times in my presentation oftentimes because each of the main points I'm trying to make kind of culminates at that same takeaway.
And so for example, in the talk I gave at RubyConf Mini last fall, I had a slide about writing Ruby code being delightful. I think having some children being happy with just a big title being like, "Oh, delightful," or something like that. And after each of my examples where we went from code that was less good to something that was more idiomatic, Ruby that was really fun to work with, I would finish on that slide and be like, hey, our code is now delightful.
And hopefully, that helped people with the takeaway of, like, we want to write delightful code. Ruby has tools to do that. And then, hopefully, they either remember the things they can do to get to that point or can look it up and find a talk online.
STEPHANIE: Yeah, I watched that talk, and I really vividly remember that slide and the theme that you were trying to hone in on. So I thought it was pretty effective. I think this makes me realize speaking, I mean; speaking is obviously a skill but even the process of creating a talk in that particular medium is also a vast skill and can go...there are so many different styles and flavors. But I really think that what you said will get me thinking next time I'm writing my talk and how I can better incorporate that kind of engagement with the audience and making sure that the way I deliver the talk is just as thoughtful as the content itself.
JOËL: Yeah, I've been putting a lot of thought into what makes a good talk and what elements are unique to my process, what elements can be useful to others because now I have to coach someone else on their process and say, "Hey, here's the thing that worked for me. Maybe this will be helpful for you." Or maybe it's just, "Have you tried this?" Or "I think audiences will be asking this question at this moment, what do you think of this?" So that's definitely been top of mind in a whole other dimension for me recently. How about you? What's new in your world?
STEPHANIE: So before we started recording, I was heads down deep in the muck of trying to write some tests, some RSpec tests on my client project. And the domain for this client project is really big. There are a lot of models. And I was starting to go deep into the factory setups for our test fixtures. And it was hairy. And I was just going further and further down the rabbit hole to the point where I was skipping lunch.
JOËL: Ooooh.
STEPHANIE: Yeah, I was like, I couldn't pull myself away from it, and I kind of regret it a little bit. [laughs] And so I was just thinking about, like, how can I incorporate taking breaks a little bit more and feeling better about stepping away from the work when I'm really deep in it? You and I had this standing appointment to record [laughs] a podcast, so that was kind of the signal to me that it was time to try to set it aside.
And I did end up taking the dog for a walk around the block beforehand to get some fresh air, but yeah, it was a little rough, I don't know. How do you deal with just being so deep in the code that you don't really want to resurface?
JOËL: That's hard because sometimes I'm feeling productive, and I don't want to stop because I feel like I might not get back into the flow quite as easily. Sometimes it's just out of frustration. It's like, oh, I'm just so close to getting this bug done. If I get this one more test to pass, then I'll be good. And I keep doing one more thing. And the next thing I know, I have skipped lunch, and it's late in the afternoon. And it's just like; it's been a frustrating day.
STEPHANIE: And you're cranky, yeah, yep. I know that feeling.
JOËL: I've stopped being productive for the past hour. But I'll be like, one more thing, one more thing.
STEPHANIE: I think I was in that place because I was starting to get deep into the internals of models completely unrelated to the test that I was writing, but that was just where the rabbit hole led me. And I think after this, I will go and ask in Slack for a pair because I think that would be really helpful right now. I've just reached the limits of what I know. And I'm almost positive that someone knows how to do this more efficiently than I do. So that was a bit of a signal to me, but it was very challenging untangling myself out of that headspace.
JOËL: Have you ever played the video game Civilization?
STEPHANIE: No, I haven't.
JOËL: It's a turn-based historical strategy game. The running joke about it is that people get really pulled into it. And they're always just saying, "I'm going to play one more turn, and then I'm going to be done for the evening." And the next thing you know, it's 4:00 a.m. And I think that sometimes applies to fixing one more failure, just getting one more file in that chain of figuring out what the bug is in code. It's a very similar feeling.
STEPHANIE: Yeah, I know exactly what you're talking about.
MID-ROLL AD:
Debugging errors can be a developer’s worst nightmare...but it doesn’t have to be. Airbrake is an award-winning error monitoring, performance, and deployment tracking tool created by developers for developers that can actually help cut your debugging time in half.
So why do developers love Airbrake? It has all of the information that web developers need to monitor their application - including error management, performance insights, and deploy tracking!
Airbrake’s debugging tool catches all of your project errors, intelligently groups them, and points you to the issue in the code so you can quickly fix the bug before customers are impacted.
In addition to stellar error monitoring, Airbrake’s lightweight APM helps developers to track the performance and availability of their application through metrics like HTTP requests, response times, error occurrences, and user satisfaction.
Finally, Airbrake Deploy Tracking helps developers track trends, fix bad deploys, and improve code quality.
Since 2008, Airbrake has been a staple in the Ruby community and has grown to cover all major programming languages. Airbrake seamlessly integrates with your favorite apps to include modern features like single sign-on and SDK-based installation. From testing to production, Airbrake notifiers have your back.
Your time is valuable, so why waste it combing through logs, waiting for user reports, or retrofitting other tools to monitor your application? You literally have nothing to lose. Head on over to airbrake.io/try/bikeshed to create your FREE developer account today!
JOËL: So it can be really frustrating when you're kind of chasing down a bug because something's gone wrong, and now you're spending a whole afternoon figuring out where it is. Do you ever find yourself maybe acting preemptively to try to prevent those sorts of things from happening in the first place? So maybe putting in some sort of guards or error handling or something like that so that your future self won't have to spend that afternoon.
STEPHANIE: That's a great point because the bug that I was facing just now was definitely something I think could have been avoided. It was a classic no method [laughs] on nil class error. And I am still unsure how that happened, and I hope to come back to it after this. But yeah, that certainly is a great topic to get into, error handling. I think it's been on my mind a little bit lately because I'm working on a full-stack feature that has user-facing errors and things we want to make sure that we communicate to the user so that they could hopefully do something about it or just contact customer support on this app.
But there are also some API calls that are kicked off in the process of the user submitting the form, and those can lead to a bunch of different failures. And we may or may not have already discovered what those failures could be, and there may or may not have been designs created for those different failure states. And I feel like I haven't quite gotten a handle on how to deal with all of the possible errors that can happen when implementing a full-stack feature or a vertical slice. Yeah, that has tripped me up a lot lately.
JOËL: I think my time working in Elm has really made me much more aware of the different ways that things can fail just because Elm's type system is very robust. It's very complete. And so it will point out to you every potential place that could have a failure and ask you to handle it because it doesn't want to get to a point where it doesn't know what to do and there's a runtime error for something like no method or something like that.
So if you've got a potential nullable value and you're trying to say, okay, take this and render it, the compiler will say, wait a minute, you did not handle the null case. Give me something to do with the null case, or I refuse to compile. And now you've got to handle that. If there's something that might feel like an HTTP request, again, the compiler would be like, well, but what about the failure case? You didn't tell me what to do on the failure case. This is an incomplete piece of code. I refuse to compile that.
So I think I've built now a little bit of anticipation because I know the compiler is going to tell me to do this. Now even when I write code that's not compiled like Ruby, my brain compiler is still like, oh, there's a nullable value here. You didn't check the null case. What are you going to do about that?
STEPHANIE: Yeah, that's a great point. I think the more experience I gain, the more possible errors I see in the world or out in the wild. When I think about developing on the web, you know, you mentioned HTTP requests, but also, if we fail to connect to the database or a job fails to enqueue, there are just so many places where things can go wrong. And it's almost like the more I learn about all those possible failures, the more anxious I am [laughs] to make sure that I've covered everything though I think there is some amount of just that being impossible.
And I'm particularly interested in figuring out what is enough because one thing that really I find quite painful is when you don't think through things enough and you just cross your fingers and hope it works and you ship it, and then your team is dealing with a lot of bugs or a lot of noisy error monitoring notifications afterwards. And so that's kind of what I'm trying to adjust for, I think.
JOËL: I think there's like two general classes of approach you can use to deal with that; one is to try to prevent errors altogether, and there's a variety of tools you could use for that. I'm thinking of either something like a type system or maybe test-driven development or even some sort of analysis tool. That could be diagramming, that could be decision tables, something like that. All those, I think, fall under better understanding of the edges of your system. Whereas sometimes you want to do the opposite and sort of really lean into, okay, errors will happen. How do we recover from them? How do we make them easy to diagnose in the future?
STEPHANIE: Yeah, that second bit is really interesting to me because I've started to try to think about the errors and who we want to notify about the errors. And so I feel like there are a few different categories of errors where if it's a validation error and it's something that the user can fix, you know, that we want to make sure to surface and tell them how they can fix it. If it's like a programming error, there's no value in showing that to the user.
And I'm sure that we've all seen a website that responded with a 500, but then we actually saw the error message itself, and we're like, ooh, this is kind of weird [laughs] to be seeing this. And so realizing, okay, that's not valuable to the user. But what should I be doing with it instead? And maybe that is hooking it up to whatever error monitoring service you use to make sure someone is alerted. Or, I don't know, even in the third case, like, what should a customer support team be notified about? And that kind of sits in between not quite a user-facing error but also not a bug, and that's a different category.
JOËL: So, something that is not necessarily a problem in the code, but you might want somebody in the company to know about and be notified about.
STEPHANIE: Yeah, exactly. Maybe not something that is so urgent that it needs to be flagged in real-time but goes somewhere, and someone will check on it at some point. [laughs] So you were mentioning that you now have a better sense of what could go wrong. How much time do you spend writing code to cover all of those different possibilities?
JOËL: Hmm, I don't know that I've ever put the time to quantify it. I would say a decent amount because you've got to think about...sometimes they're not even things that can go wrong per se. But they're off that very simple, linear happy path that you're thinking of. So you might think even for rendering some kind of view, and you've got some search results you're trying to display.
Have you considered an empty state? Is there a difference between initially loading the page or have not performed a search yet, and search but did not find any results? Those are things that are not necessarily errors, but they're not things you're thinking in mind when you're just writing that first happy path of, like, oh, load page, show results. I assume there's always a result set. And so those are things that are important for the user experience that you need to have, but that are kind of edge cases that you have to add in afterwards, or you have to think about.
And so I think that, for me, tends to fall under a similar category as okay; what if an error happened? Especially when you're dealing with kind of a full-stack situation where on the front end maybe you're making a result to a back end to pull down...let's say you're making a search and the back end is doing the actual search. You send up a query. Now you get back a failure.
Is that the same thing as getting no results back? Like, a success with no results, versus an error code, versus not making a query yet. So you've got like four or five states you've got to think about on the front end to display and how you're going to handle those. So I think thinking about those upfront is often really helpful.
STEPHANIE: As you were talking about that, I suppose I asked the question because I have experienced when those things are not thought about upfront, and then you discover them as you're implementing. And how much time do you use to kind of go into a little detouring trying to make sure that you have all of the edge cases covered, and at what point do you stop? Because you're like, I've covered what I can. And this ticket was supposed to only be three points [laughs] or whatever, you know.
JOËL: Yeah. And how do you keep a feature from ballooning when there are all these edge cases?
STEPHANIE: Yeah, exactly. It's a balance.
JOËL: Are there any techniques that you like to do when you...so you pick up a ticket that looks easy, but that might have a lot of these hidden edge cases in it. Are there techniques you like to use that might help you figure out those edge cases and maybe give you some follow-up questions that you might reach out to the product person to clarify? Or maybe it's mostly intuition and experience as a developer that you kind of figure out that, oh, these are the things we need to ask about. It's like, have you thought about an error state?
STEPHANIE: Yeah, that's a good question. In general, I'm a little suspicious of any ticket that doesn't include some kind of acceptance criteria about the unhappy path. And I certainly think it's a lot easier once you are embedded into this domain, and you have that expertise, and you are able to see the possible issues you'll run into. I do think that I like to do a little bit of coding just to kind of explore the space, and then that does give me more insight into how I might be able to follow up on the ticket.
So you mentioned techniques. Especially if they're written as user stories, I don't think they necessarily incorporate the flow or the procedure of how things are kicked off. And so when you're thinking about implementing it, you're like, oh, this actually needs to happen in the background, or this should be synchronous or not. And those are a lot of error states that I find are missing when I pick up a ticket.
And I think it also depends on which way you want to implement it what implementation is viable. And then maybe you bring it to a product person, and they are actually like, "No, we don't want it to work like that." And then you have to kind of rethink things a little bit. But yeah, certainly, the process of taking a user story and then doing an initial think-through of what approach you want to take definitely surfaces some potential unhappy paths.
JOËL: It's almost like prototyping it in your mind.
STEPHANIE: Yeah, I think so. I think it also depends a little bit on the team because if the engineer wrote the ticket, then there likely has been some thought about unhappy paths. But on other teams that I've been on when implementation is up to the person who picked it up rather than kind of spelled out for you by someone else who did that thinking, that's definitely an opportunity to pause, I think, and document which way you might want to go so that you can make sure that you account for the possible things that could go wrong that likely the user story didn't cover.
JOËL: Sometimes there are some edge cases or failure states that are just sort of built into the problem that you're solving. If you're having to make a background request, there's always a chance that that might fail because the network is not trustworthy. Sometimes though, those things just kind of come out of our implementation, the fact that we implemented it in a particular way.
And that's not something that you'd expect a product person to have to think about. That's more on us as developers to be like, oh yeah, well, I'm indexing into a hash and didn't think to check is a nested hash even present? Maybe that key isn't there. And now I've got a weird nil error, an undefined method. That's kind of on us rather than on, like, oh, a specific kind of thing that we can think about upfront.
STEPHANIE: Yeah, that's fair. And I think that is just an important part of the development process. Though you make a good point because I think that just kind of speaks to all of the different layers of things that can go wrong [laughs] and figuring out which ones are specific to your role as developer to account for, and then which are ones that you need to bring in or pull in a designer to chat about. It can be a little overwhelming. I'm overwhelmed just thinking about it.
[laughter]
JOËL: Yeah, errors are not a sort of monolithic class of things. They can't be an afterthought. But they're also not just a thing where it's like, oh yeah, do the error handling, and then you're good. We kind of lump a lot of things under the concept of errors, even if they might all eventually manifest as some kind of exception. I guess a true solution is just one giant top-level rescue nil.
STEPHANIE: [laughs] Very funny.
JOËL: So we've talked about a few different dimensions of errors where they might be sort of user-visible or not, or something that's more implementation-based versus inherent to the problem. One thing that we haven't looked at is the dimension of errors that might be recoverable versus not. Have you ever built a system where you had errors that could be recovered from and didn't crash the program?
STEPHANIE: Ooh, yes. That makes me think about retrying and especially what you're saying if things are happening in the background. Maybe there is an ephemeral error where the network timed out or something. But if it is given another shot, it might succeed on the second go. And I think there's a whole process of thinking about what happens when a process has to be retried and if there were any side effects that you didn't want to have committed the first time around, you know, but then something else that was supposed to happen and when the process happens again, things are very broken. So making sure that you are keeping things idempotent so that by undoing it again, there are not any unforeseen issues.
JOËL: I heard you say that word commit here, and that's kind of a keyword to my mind. I immediately think database transactions. Is that the sense that you're thinking about this term here? Or does it have another meaning for you in this context?
STEPHANIE: Yeah. I do think I used that word specifically because when I've run into this in the past, it has been around making database changes. I'm trying to think if there is another way that this might show up. I think even in something like sending an email, too, though it is a bit lower stakes. I've certainly, as a user, experienced when that goes wrong and just been [laughs] flooded with emails and being like, wow, this is annoying. And that's, I think, something valid to consider as well.
JOËL: Yeah. You don't want that email job to be a thing that gets retried and just keeps failing because there's a nil error after the email gets sent. And so we just re-enqueue it, re-enqueue it, re-enqueue it, and the person ends up receiving 500 emails.
STEPHANIE: What about you? Any thoughts about recoverable errors?
JOËL: Yeah. I think really common for me is thinking about that in the context of a background job because those are things that I think are specifically designed to be retried if they fail; at least, a lot of job enqueuing systems assume that. When we write them, we don't always take that into account, but that's the system that we're working in. So that can be something as simple as marking somewhere that you have sent that email so that you don't resend it if that job ever re-executes.
I think that goes to your point about idempotency earlier that you often want to write code that can get executed multiple times but doesn't necessarily do the action multiple times. It will do it at most once. And that's probably an interesting distinction to have is knowing what elements of your code need to execute at most once, versus as many times as the code is called, versus things that might get tried and then rolled back like a database transaction. And so then that will...I guess you could say it's at most once because you're writing it but unwriting it. But that plays out a little bit differently than something like an email where you can't undo sending an email outside of Gmail.
STEPHANIE: Yeah, I love that undo button. [laughs]
JOËL: You need some other mechanisms for that.
STEPHANIE: Yeah. As you were talking about that, I was also thinking about the idea of failing gracefully, which I think also ties into the idea of recoverable. So this is not a development-specific example. But the idea of an escalator no longer working well; at least you can use it as stairs. So that doesn't mean that everything is totally broken and people are unable to get from one floor to another. So maybe if there is a network request that's touching data and that fails, you can at least fall back on something that's cached. That mindset, I think, really is important to think about at all the different levels we are talking about.
JOËL: Yeah, or hopefully, even maybe some amount of graceful degradation. On a front-end app, you might not want to just crash the whole thing if one background request failed. So you can try again. You can be told, okay, try again in so much time. Maybe we automatically retry to make that same request with some sort of exponential backoff strategy. Or maybe we say, "Look, search is down for now. Here's a link if you want to go check a status page. Until then, other parts of the site are still working."
I feel like we're getting back into what makes great product design and how great product designers have to make failure conditions. It has to be at the forefront of the thinking that comes to designing that product.
STEPHANIE: Yeah, that's a good point. I think my initial feelings of being overwhelmed and stressed about dealing with errors may be because a lot of it falls on the developer if those things aren't accounted for. And we spoke a little bit earlier about, okay, what is within our realm or domain of actually being responsible for, and what can we loop in others for help with?
JOËL: So we've been talking a lot about different ways of preventing errors, different ways of recovering, generally trying to make the whole experience really smooth. A slightly different philosophy around errors is rather than preventing them early is to fail early, like, fail early and loudly. And maybe you recover, maybe you don't. That depends on the context. But instead of putting so much effort into preventing errors upfront, it's better to just crash a lot or to fail loudly and deal with the consequences, or have a strategy for dealing with failure because failure is inevitable. How do you feel about that philosophy?
STEPHANIE: Oh, I think it has a time and a place. One example I'm thinking of is if you don't want your application to be deployed if some configuration is not exactly how it needs to be for the app to run effectively. And so there is a matter of, like, okay, I really want to make sure that the DevOps team or the development team knows that something is very wrong because if this were to be deployed, the app would be unusable. And so that's an example to me of failing loudly but, ideally, not letting it affect end users because they're still using [laughs] the site on a different version. [laughs]
JOËL: Right. I guess the classic example of that is for a Rails app, doing a Hash#fetch() on the environment to load up your environment variables instead of using the square bracket syntax so that as the app is booting and executing those initializers, it will crash if it encounters one of those and then fail to deploy if you're doing a deploy via something like Heroku. I've even sometimes when I'm adding environment variables, purposefully had them loaded in an initializer rather than maybe like in a class later on, specifically so that it would crash the app and prevent deployment if that environment variable was not set.
STEPHANIE: Yeah, that's what I was thinking too, environment variables. Though I think even with that kind of mindset, you're either just delegating that responsibility to someone else down the line to either figure out how to accommodate or account for in a graceful manner. Or you are creating an environment where everyone is very stressed out [laughs] and having to fight fires. So I think it also requires a little bit of thought and isn't necessarily a strategy to just completely embody. [laughs]
JOËL: I've noticed that bugs and errors often accumulate at the boundaries of systems or even subsystems or modules within a program. Maybe the place to apply the strategy of failing early and loudly is particularly valuable at those boundaries. But internally, within a subsystem or a module, maybe it's nicer to use other strategies for error handling. Does that sound like maybe a useful distinction to you?
STEPHANIE: Yeah, I think subsystems was the keyword to me there because you don't want it to be such a catastrophe that it affects the usability of the app entirely. But that does still require some systems in place, I think, to respond to when that thing is failing loudly.
JOËL: I think an example that came to mind for me is like you were mentioning earlier, a full-stack application. And if you've got the back end that's providing an API and something is wrong, I don't want that API to give me back garbage data and try to pretend that everything's okay. Let that API give me back some kind of error code. And I in the front end, I already know that the network is inherently unreliable. I'm planning to handle errors at that point. So it's fine for the API back end to fail loudly in this case. In fact, I think that's the optimal solution.
STEPHANIE: Yeah, I think that's true because ideally, that error clues you into what kind of thing failed, and then maybe you can use that information more meaningfully than trying to guess at what happened with this bad data and then having to define some kind of error message in your app when ideally someone else who had more knowledge about it could have told you what went wrong.
JOËL: And I guess the problem with not failing loudly or with an explicit failure is that if you try to just pass on some sort of value that will pretend to be like what you initially asked for, whoever's consuming that doesn't know that something went wrong. So then you use this garbage data, and you do some things and pass it along to the next person. And eventually, it may cause a failure three or four steps down the line.
And now, trying to trace that, like, why did this fail? And it's not because something was wrong in Module D, or C, or B. It all comes back to oh, A had a problem but didn't crash or give us an error. It tried to pass its sort of best guess, like, this is probably okay. And then it just kind of moved all the way down the line.
STEPHANIE: I'm imagining external API developers everywhere just nodding their heads in agreement. [laughs]
JOËL: I've fought this on a local level as well. I was working on some kind of code for a JavaScript date picker plugin, and this was back in the jQuery days. It was some kind of...it was not a date picker. I think it was a typeahead drop-down thing. In some situations, I forget exactly how it would happen, but the input from there might be empty, but then that would get converted into an undefined value, which then JavaScript would convert to the string undefined, which would then get passed to something else that if it saw a string, it thought that was the thing that the user typed, and they would pass it through.
And I think maybe in the end, I was looking at a crash ten functions away in the front-end code that had to deal with the input from this typeahead and being like, why am I getting these undefined? Or maybe it was a string NaN or something like that. Like, why am I dealing with these weird strings that should never have come out of this? And it turned out it was just kind of an edge case. It wasn't addressed in this component further on, and then it was kind of leaking strings that everybody else thought was sensible up until three or four jumps further down the stack.
STEPHANIE: Yeah, that's a great point. I think it does go back to the idea of there being preventable errors. And then there are things that are truly not preventable because we live in a physical world [laughs] where computers talk to each other over the wire. And that distinction is, you know, perhaps the first being avoidable errors by writing resilient code. And the second being like, okay, in reality, there will be things that go wrong, and this is what we really have to watch out for.
On that note, shall we wrap up?
JOËL: Let's wrap up. [laughs]
STEPHANIE: Show notes for this episode can be found at bikeshed.fm.
JOËL: This show has been produced and edited by Mandy Moore.
STEPHANIE: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes. It really helps other folks find the show.
JOËL: If you have any feedback for this or any of our other episodes, you can reach us @_bikeshed, or you can reach me @joelquen on Twitter.
STEPHANIE: Or reach both of us at hosts@bikeshed.fm via email.
JOËL: Thanks so much for listening to The Bike Shed, and we'll see you next week.
ALL: Byeeeeeeee!!!!!!
ANNOUNCER: This podcast is brought to you by thoughtbot, your expert strategy, design, development, and product management partner. We bring digital products from idea to success and teach you how because we care. Learn more at thoughtbot.com.
Sponsored By:
- Airbrake: Deploy fearlessly and fix bugs faster with Airbrake Error & Performance Monitoring. Airbrake notifiers are available for all major programming languages and frameworks, and install in minutes, with an open-source SDK-based install and near-zero technical debt. Spend less time tracking down bugs and more time developing. Visit Frictionless error monitoring and performance insight for your app stack.