DEV Community

Samson Tanimawo

Incident Command: The Skills They Don't Teach You

Running a production incident is a skill. Most of the skill isn't technical. Here's what nobody told me when I started running incidents.

Skill 1: Calling the cadence

During an incident, time warps. Everyone is heads-down in logs. Nobody remembers when they last updated the status channel.

The incident commander's job is to force a cadence: 'Update in 5 minutes. What do we know? What do we need?' Without this, the incident drags on because no one is aggregating context.

Skill 2: Saying 'I don't know, and here's what we're doing to find out'

Stakeholders want certainty. You can't give it. The temptation is to guess.

Don't guess. Say 'We don't know the cause yet. We're investigating X and Y. I'll update in 10 minutes.' Trust builds on honesty, not performance.

Skill 3: Interrupting your engineers

Your engineers are investigating. They don't want to stop and explain. But if you don't interrupt, you can't make decisions.

Do it anyway. Say 'I know you're busy. 30 seconds — what have you learned?' Most engineers will appreciate the structure, even if they complain.

Skill 4: Knowing when to stop investigating and start mitigating

The temptation in an incident is to find the root cause. The right action is usually to mitigate first and investigate second.

'We don't know why, but rolling back stops it' is a win. Don't feel bad about mitigating before you understand the cause; the post-mortem can figure out why.

Skill 5: Managing morale

Long incidents grind people down. Notice when your team is flagging. Bring in a relief shift. Say 'good job, let's take 10.' Acknowledge that it's hard.

The worst incidents I've been in were the ones where the team ran out of emotional energy before the problem was fixed. That's on the commander.

Skill 6: Declaring the incident over

Incidents often drag on past actual resolution because nobody wants to declare victory. Declare it. 'Issue resolved at 15:47. We'll keep monitoring for 30 minutes, but the incident is closed.' People need permission to exhale.

The real job

Incident command is emotional labor disguised as technical work. The best commanders I know are calm, honest, and generous with credit. The worst are the ones who try to be the smartest technical person in the room — that's not the job.

You can't learn this from a book. You learn it by running incidents badly and asking for feedback afterward. The feedback is usually uncomfortable. It's also how you get good.


Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com
