Over on Twitter, someone asked whether it should be a software developer's responsibility to fix production bugs ASAP, and whether they'd lack professional commitment if they didn't.
I've been through on-call rotations, and being called on Sundays to fix bugs, both in the capacity of the one doing the fixing, and in the capacity of the person who had to call a developer on a weekend. I have a lot to say on the matter, and it's a complex topic.
I believe that bugs in production code are partly an engineering management issue. It's not necessarily a programmer's fault, though it's very situation dependent. However, regardless of whose fault it is, if there is no on-call rotation in place, it shouldn't be the burden of the programmer to crawl out of bed at 4am on a Sunday to fix the problem; and it certainly should not be seen as a lack of professional commitment if they don't. If anything, programmers should be rewarded should they choose to do it.
Exactly why so many companies out there seem to dish out punishments when this happens is probably due to the power dynamics between programmers and their managers. Put simply, it's easier for managers to blame the bugs on bad programming than to accept fault for mismanaging the project, and then to force devs to fix the bugs under threat of losing their jobs (and let's not kid ourselves, saying "fix this, or do you lack professional commitment" is code for "fix this, otherwise this will look bad for you and your continued employment with us").
Here's why bugs in production can be a manager's fault: Managers ultimately were the ones who set the budget and timeline for projects. It's a manager's responsibility to ensure that the project is achievable with the team, the time, and the budget that is available. It's also the manager's responsibility to hire the right people for the team, to coach the team, and to protect the team against unnecessary external distractions.
- If the time allowed for the project was too short, that's the manager's responsibility.
- If the team for the project was too small, that's the manager's responsibility.
- If there were distractions and changes of plan, that's the manager's responsibility.
- If there was an expectation for maintaining uptime in production, that is also the manager's responsibility.
All of the above items are responsibilities of the manager. And when I say "manager", I actually mean those in leadership positions. It could be a whole management team. Individual programmers are not empowered to tackle any of the above items alone; they all happen at the team level or above.
The last point is key: if uptime needs to be maintained such that any bug in production must be fixed as soon as possible, then it is the responsibility of management to ensure that an on-call schedule is set up that devs have agreed to, so that they know in advance that they may need to respond and can plan their lives around it. If a bug happens in production, and the team weren't adequately prepared to respond to it, then managers literally did not do the job that they needed to do. This isn't simply lacking professional commitment; a manager who fails to do this lacks professional competency!
As for the bugs themselves: yes, programmers are the ones who wrote the code, and there are processes to find bugs and fix them before they reach production. There are also processes to continuously review the effectiveness of processes - Retrospectives in Agile, ISO 9001 in larger organisations. Guess who sets these reviews and processes up? Managers.
As for programmers who make a lot of mistakes and introduce bugs? For sure there are instances of inattentiveness that can cause bugs, which by chance are missed by whatever review and testing process was set up. Guess who's responsible for hiring the right people, coaching them, ensuring access to training and continued career progression, and in the worst case firing these programmers? Managers.
So we see that, up and down the cause/effect chain that leads to a bug getting into production, the task of the programmer is only part of it. The responsibility and culpability for bugs in production is almost always shared between programmers and managers. Or in short:
Having to unexpectedly call up a programmer outside office hours to fix a bug in production takes multiple failures of management, but only one failure of a programmer.
However, this isn't the whole story. Unfortunately, no company can afford to spend forever building a perfect product, because that would be both unprofitable and uncompetitive. Appropriate tradeoffs must be made. Achieving zero bugs is not reasonable, but neither are indefinite delays to release or a bad user experience. That balance is the biggest challenge in business. There are no correct answers, and not everyone will agree with the tradeoffs that have to be made.
The best we can do is to be open and humane at all levels of the company, and recognise our own mistakes when they happen rather than blaming the people who have the least power to argue back.