DEV Community

Danielius Navickas

The mental checklist I use when troubleshooting Linux servers

When a Linux server broke or something didn't go to plan, I used to panic and jump between unrelated commands. Over time, I realized that almost every issue fits the same pattern. This is the mental checklist I now fall back on, finally written down.

Step 1: What is broken?

  • Service not running?
  • Server unreachable?
  • Performance issue?
  • Permission issue?
  • Always define the failure first

Step 2: Is the system alive?

  • Can I SSH in?
  • Is the server responsive?
  • Is the disk full?
  • Is RAM exhausted?
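
The disk and RAM questions above are easy to script. Here is a minimal sketch in Python, assuming a Linux host that exposes `/proc/meminfo` with a `MemAvailable` field. The function names are my own and purely illustrative.

```python
import shutil


def disk_used_percent(path="/"):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def memory_used_percent(meminfo_text):
    """Estimate RAM usage from the text of /proc/meminfo.

    Relies on MemAvailable, the kernel's estimate of memory that can be
    handed to new workloads without swapping.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # most values are in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total
```

To use it, pass the contents of `/proc/meminfo` to `memory_used_percent` and compare both numbers against a threshold you trust. I would start around 90%, but that figure is an assumption, not a rule.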

Step 3: Is the service running?

  • Is the process running?
  • Did it fail to start?
  • Did it crash?
  • In my experience, this step alone settles roughly half of all issues
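
On a systemd-based distro, the three questions above map onto the state word that `systemctl is-active` prints. This is a rough sketch that assumes systemd is in use. The `run` parameter and the `explain` helper are hypothetical additions of mine so the logic can be exercised without a real service.

```python
import subprocess


def service_state(unit, run=subprocess.run):
    """Return the state word printed by `systemctl is-active <unit>`.

    Typical values are "active", "inactive", "failed" and "activating".
    `run` defaults to subprocess.run and is only swappable for testing.
    """
    result = run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit just means "not active"
    )
    return result.stdout.strip() or "unknown"


def explain(state):
    """Map a state word onto the questions in this step."""
    if state == "active":
        return "running"
    if state == "failed":
        return "crashed or failed to start: check the logs next"
    if state == "inactive":
        return "not running: stopped or never started"
    return f"in state '{state}': look closer"
```

For example, `explain(service_state("nginx.service"))` tells you in one line whether to move on to the logs. The unit name here is only a placeholder.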

Step 4: Check logs
Logs usually tell you:

  • Why it failed
  • What it tried to do
  • What it couldn't access
  • Learn to scan logs, not read every line
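
Scanning rather than reading can be as simple as filtering for the words that usually accompany a failure. This is a small Python sketch. The keyword list is my own guess at a sensible starting point, so adjust it to whatever your services actually print.

```python
import re

# Hypothetical keyword list: extend it with whatever your services emit.
SUSPICIOUS = re.compile(
    r"error|fail|denied|refused|timed out|no such file|out of memory",
    re.IGNORECASE,
)


def scan_log(lines, pattern=SUSPICIOUS, limit=20):
    """Return up to `limit` of the most recent matching lines.

    Each hit is a (line_number, text) pair so it is easy to jump to the
    surrounding context afterwards.
    """
    hits = [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]
    return hits[-limit:] if limit > 0 else []
```

Because `scan_log` accepts any iterable of lines, you can hand it an open file object directly. Which file to open depends on your distro and service.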

Step 5: What changed last?
Most issues come from:

  • Updates
  • Config edits
  • Permission changes
  • New files
  • Always ask: what changed?
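
One way to answer "what changed?" for config edits and new files is to list everything under a directory that was modified recently. This is a minimal sketch. The 24-hour default window is an arbitrary assumption of mine.

```python
import os
import time


def recently_changed(root, hours=24, now=None):
    """List files under `root` modified within the last `hours` hours,
    newest first."""
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime >= cutoff:
                changed.append((mtime, path))
    return [path for _mtime, path in sorted(changed, reverse=True)]
```

Running `recently_changed("/etc")` is a reasonable first look. Note that this only sees content changes: a `chmod` or `chown` updates a file's ctime rather than its mtime, so permission changes need a separate check.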

Step 6: Narrow scope

  • Is it one user or all users?
  • One service or the whole system?
  • One port or all networking?
  • This prevents panic
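
The "one port or all networking?" question can be tested directly by probing several ports on the same host and comparing the results. This is a small sketch using plain TCP connects. The helper names and the 2-second timeout are my own choices.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports, timeout=2.0):
    """Probe several ports so one failure can be compared against the rest."""
    return {port: port_open(host, port, timeout) for port in ports}
```

If `check_ports(host, [22, 80, 443])` shows a single closed port, I look at that one service or its firewall rule. If every port fails, the problem is more likely the network or the host itself.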

Step 7: Test ONE thing at a time

  • Make a small change
  • Restart the service
  • Observe
  • Never shotgun-fix

Step 8: Confirm + document

  • Is it fixed?
  • Why?
  • What would I do faster next time?
  • That's real troubleshooting
