When a Linux server broke or something didn't go to plan, I used to panic and jump between commands at random. Over time, I realized almost every issue fits the same pattern. This is the mental checklist I now fall back on, finally written down.
Step 1: What is broken?
- Service not running?
- Server unreachable?
- Performance issue?
- Permission issue?
- Always define failure first
Step 2: Is the system alive?
- Can I SSH in?
- Is the server responsive?
- Is the disk full?
- Is RAM exhausted?
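A minimal first-pass health check, as a sketch: three commands that answer the questions above (`df --output=pcent` assumes GNU coreutils; `free` comes from procps and may be missing on minimal systems, hence the fallback).

```shell
# First-pass health check: load, disk, and memory in three commands.
uptime                     # load average -- is the box overwhelmed?
df -h /                    # is the root filesystem full?
free -h 2>/dev/null || head -n 3 /proc/meminfo   # is RAM exhausted?

# one number worth glancing at before anything else
disk_use=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
echo "root filesystem at ${disk_use}% capacity"
```

If SSH itself hangs, that already narrows things: the problem is the host or the network, not your service.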
Step 3: Is the service running?
- Is the process running?
- Did it fail to start?
- Did it crash?
- This check alone rules out a large share of issues
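The check can be as small as this sketch; `nginx` is a hypothetical example service, swap in your own.

```shell
# Minimal "is it even running?" check -- nginx is a stand-in service name.
SERVICE=nginx
if pgrep -x "$SERVICE" > /dev/null 2>&1; then
    verdict="$SERVICE is running"
else
    verdict="$SERVICE is NOT running -- next question: did it crash, or never start?"
fi
echo "$verdict"
```

On systemd hosts, `systemctl status nginx` goes further and tells you whether the unit failed, exited, or was never started.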
Step 4: Check logs
Logs usually tell you:
- Why it failed
- What it tried to do
- What it couldn't access
- Learn to scan logs, not read every line
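Scanning in practice can look like this sketch; `/var/log/syslog` is a Debian/Ubuntu stand-in path, and the error keywords are just a starting set.

```shell
# Scan, don't read: pull only the lines that look like failures.
# /var/log/syslog is a Debian/Ubuntu stand-in; on systemd hosts,
# journalctl -u <service> --since "1 hour ago" does the same job.
LOG=/var/log/syslog
if [ -r "$LOG" ]; then
    summary="$(tail -n 200 "$LOG" | grep -ciE 'error|fail|denied') suspicious lines in the last 200 of $LOG"
else
    summary="$LOG not readable on this host"
fi
echo "$summary"
```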
Step 5: What changed last?
Most issues come from:
- Updates
- Config edits
- Permission changes
- New files
- Always ask: what changed?
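Two quick ways to answer "what changed?", sketched below. The 24-hour window and the `/etc` path are arbitrary starting points, and the package-log path is Debian/Ubuntu-specific.

```shell
# "What changed last?" -- config files touched in the last 24 hours.
recent=$(find /etc -type f -mmin -1440 2>/dev/null | head -n 20)
echo "recently modified under /etc:"
echo "${recent:-(none in the last 24h)}"
# package history (Debian/Ubuntu path; RPM systems: rpm -qa --last | head)
tail -n 5 /var/log/dpkg.log 2>/dev/null || echo "(no dpkg log on this host)"
```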
Step 6: Narrow scope
- Is it one user or all users?
- One service or the whole system?
- One port or all networking?
- This prevents panic
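The "one port or all networking?" question can be settled in one command; in this sketch, 80 is a hypothetical port for the broken service, and `ss` comes from iproute2 (older boxes have `netstat -ltn` instead).

```shell
# One port or all networking? 80 is a stand-in for the affected port.
PORT=80
if ss -ltn 2>/dev/null | grep -q ":$PORT "; then
    scope="something is listening on :$PORT -- look above the socket (app, config)"
else
    scope="nothing listening on :$PORT -- check the service first, then the firewall"
fi
echo "$scope"
```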
Step 7: Test ONE thing at a time
- Make a small change
- Restart service
- Observe
- Never shotgun-fix
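The discipline above in miniature: observe, change one thing, observe again. `health_probe` here is a stand-in for whatever your real check is (a `curl` probe, `systemctl is-active`, a test query).

```shell
# Observe, change ONE thing, observe again.
# health_probe is a placeholder -- swap in your real check.
health_probe() { df / > /dev/null 2>&1 && echo OK || echo FAIL; }

before=$(health_probe)
# ...apply exactly one change here, e.g. a single config edit + restart...
after=$(health_probe)
echo "before=$before after=$after"   # record both before the next change
```

If `before` and `after` differ, you know exactly which change did it, because there was only one.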
Step 8: Confirm + document
- Is it fixed?
- Why?
- What would I do faster next time?
- That's real troubleshooting
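Documenting doesn't have to be heavyweight. A sketch of a two-minute incident note (the path and headings are just examples):

```shell
# Two minutes of notes beat a perfect memory. The path is an example.
note="/tmp/incident-$(date +%Y%m%d-%H%M).md"
cat > "$note" <<'EOF'
## What broke
## Root cause
## Fix applied
## What I'd do faster next time
EOF
echo "template written to $note"
```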