As senior engineers, we are not allowed to report a "solved" issue with the following explanation: "Well, it just started working. I don't know how, but it's fixed for now."
Why? Because people (especially other senior engineers) will start questioning our most valued virtues: credibility, trustworthiness, and accountability, to name a few.
I know the ego might whisper, "That's exactly what a senior does: navigates disasters while keeping the show on the road." Well, that's not quite accurate. Let me tell you why.
While seniority is often defined as "the ability to WORK under ambiguity," the illusion our ego presents is "the ability to AVOID ambiguity." Notice the difference: "work under" versus "avoid."
A junior engineer is hired to "avoid" ambiguity. Once they get stuck, they are supposed to reach out to a senior colleague who knows how to "work under" ambiguity.
Too much philosophy, right? I promise I'm neither a TEDx speaker nor a Scrum evangelist. I'm a hands-on person.
To make things concise and to the point, here is a list of responses that senior engineers can use to maintain their credibility and prove they are capable of working under ambiguity:
1- "I performed a time-boxed debugging session for one hour. As it was a showstopper, I had to do a hard reset, and it's working now. I have documented the entire debugging process and can get back to it later."
2- "I have at least three theories as to why it suddenly started working again, but no way to prove any of them for now. I have an idea for a Proof of Concept (PoC) to validate them next sprint."
3- "Although there were no code changes or recent deployments, I will reach out to the operations team to check if they changed anything related to server resources."
4- "It started working again after a short outage. I will schedule a brief meeting to brainstorm with the relevant people about the situation."
5- "I would be lying if I told you the specific reason it started working again. I'll be honest with you: the context is very low-priority, as this is a local sandbox for a PoC. Therefore, I recommend that we simply ignore it for now."
6- "Even though it started working again 'by itself,' I consider this a showstopper that we should tackle ASAP. This area of the application is tightly bound to both integrity and security, and we cannot tolerate ignoring known threat vectors."
7- Finally, for the truly desperate, this is a professional, drop-in replacement for, "I have no clue; it just worked": "I could not find any sign of a change that might explain why it broke and then started working again. Let's attribute this single incident to a race condition, but if it happens again, we must perform a Root Cause Analysis (RCA)."
P.S. Needless to say, it doesn't help to just say these things. We have to be honest when we use them.
 
 
              
 
    
Top comments (0)