Yoshik Karnawat

Posted on • Edited on • Originally published at Medium

Linux Observability: Troubleshooting Made Simple

No jargon, no complexity, just real command-line solutions.

Whether you're keeping systems running smoothly, managing deployments, or building backend services, these 10 commands will help you during 3 AM outages.

Linux Debugging Commands

1. iostat

Real-time disk performance statistics that show which storage devices are your performance bottlenecks.

iostat -x 1

Key indicators:

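  • %util - Percentage of time the device was busy. Consistently above 80% suggests a saturated disk, though SSDs and RAID arrays serve requests in parallel, so high %util alone is not conclusive for them
  • await - Average time in milliseconds for I/O requests to be served, including time spent in the queue (newer sysstat versions report r_await and w_await separately)
  • %iowait - Percentage of time the CPU sat idle while waiting for outstanding disk I/O (shown in the avg-cpu section)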
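
To apply the 80% rule of thumb without reading every row, the check can be scripted. Below is a minimal sketch, assuming the usual `iostat -x` layout with a header row that starts with `Device`; the function name `busy_devices` is made up for this example. It looks up the %util column by name because its position differs between sysstat versions.

```shell
# busy_devices [THRESHOLD]: read `iostat -x` text on stdin and print each
# device whose %util exceeds THRESHOLD (default 80).
busy_devices() {
    awk -v limit="${1:-80}" '
        # A blank line ends a device table; forget the column until the
        # next header so avg-cpu rows are never mistaken for devices.
        NF == 0 { col = 0; next }
        # Header row: remember which column holds %util.
        $1 ~ /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Data row: compare the %util field numerically.
        col && NF >= col && $col + 0 > limit + 0 { print $1, $col }
    '
}

# Example (not run here): iostat -x 1 3 | busy_devices 80
```

Keep in mind that the first report iostat prints covers the time since boot, so later samples say more about what is happening right now.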

2. vmstat

A comprehensive view of system resource usage including memory, CPU, and I/O activity.

vmstat 1

Key indicators:

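  • si/so - Memory swapped in from and out to disk per second (sustained non-zero values mean memory pressure)
  • wa - Percentage of CPU time spent waiting for I/O (high values point to a disk bottleneck)
  • r - Number of runnable processes, either running or waiting for CPU time (consistently more than the number of cores means CPU contention)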
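
A quick way to tell a one-off blip from sustained swapping is to count how many samples show any si/so activity. This is a rough sketch, assuming the default vmstat column header (the row starting with `r`); `swapping_samples` is a hypothetical helper name.

```shell
# swapping_samples: read vmstat output on stdin and print HITS/TOTAL,
# where HITS is the number of samples with non-zero swap-in or swap-out.
swapping_samples() {
    awk '
        # Column header row (vmstat repeats it periodically).
        $1 == "r" {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Sample rows start with a number (the r column).
        si && so && $1 ~ /^[0-9]+$/ {
            total++
            if ($si + $so > 0) hits++
        }
        END { printf "%d/%d\n", hits, total }
    '
}

# Example (not run here): vmstat 1 30 | swapping_samples
```

The first row vmstat prints is an average since boot, so a single hit there does not necessarily mean the system is swapping now.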

3. lsof

Every open file, socket, and network connection on your system, along with which process owns it.

lsof -i :8080

Use cases:

  • Port conflicts: Find what's already using a port before your app starts
  • File descriptor leaks: Watch a process's open files grow with lsof -p <PID>
  • Network debugging: See all network connections with lsof -i

Find the biggest file handle users:

lsof | awk 'NR > 1 {print $2}' | sort | uniq -c | sort -nr | head -10

This one-liner ranks process IDs by the number of entries lsof reports for them, which is crucial for debugging file descriptor exhaustion.
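
lsof also lists entries that are not file descriptors, such as the working directory, the program text, and memory-mapped files, so the counts above are a proxy. On Linux, a closer count comes from /proc/<PID>/fd. Here is a small sketch of that idea; the helper name `fd_counts` and its optional directory argument (which exists mainly so it can be pointed at a test tree) are assumptions for this example.

```shell
# fd_counts [PROC_ROOT]: print "COUNT PID" for the ten processes with the
# most open file descriptors under PROC_ROOT (default /proc).
fd_counts() {
    root="${1:-/proc}"
    for dir in "$root"/[0-9]*; do
        # Skip entries without an fd directory (for example, a process
        # that exited while we were scanning).
        [ -d "$dir/fd" ] || continue
        # Unreadable fd directories count as 0 rather than aborting.
        count=$(ls "$dir/fd" 2>/dev/null | wc -l | tr -d ' ')
        echo "$count ${dir##*/}"
    done | sort -nr | head -10
}

# Example (not run here): sudo sh -c '. ./fd_counts.sh; fd_counts'
```

Without root, other users' processes show up with a count of 0 because their fd directories cannot be read.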

4. sar

Historical system performance data that helps you understand performance patterns over time. It can also sample live: the command below reports CPU usage once per second, ten times.

sar -u 1 10

Track different metrics:

  • CPU usage: sar -u shows user, system, and idle time percentages
  • Memory: sar -r displays memory utilization trends
  • Network: sar -n DEV shows network interface statistics

Review historical data:

sar -f /var/log/sysstat/sa...

Unlike real-time tools, sar lets you see what happened during that 3 AM performance spike when nobody was watching.

5. iotop

Which specific processes are generating disk I/O, sorted by actual usage.

iotop -o

Use cases:

  • Identify I/O hogs instantly without guessing
  • Track total read/write per process in real-time
  • Find runaway processes that are thrashing your disks

The -o flag shows only processes actually doing I/O, filtering out the noise.

6. strace

Every system call your process makes - the ultimate debugging microscope.

strace -f -e trace=file <command>

Advanced use:

  • Track file access: -e trace=file shows only file-related calls
  • Monitor network: -e trace=network for socket operations
  • Time analysis: -T shows time spent in each system call

For running processes:

strace -p <PID> -f -o /tmp/trace.log

This captures the behavior of a running process and all its children, writing to a file for later analysis.
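
Trace logs get long fast, and failing calls are often the interesting part. One way to summarize them is sketched below, assuming the standard strace line format where a failed call ends in `= -1 ERRNO (description)`; `failed_syscalls` is a hypothetical helper name, not part of strace.

```shell
# failed_syscalls: read strace output on stdin and print
# "COUNT SYSCALL ERRNO" for each failing call, most frequent first.
failed_syscalls() {
    awk '
        # Calls split across "unfinished"/"resumed" lines are skipped
        # to keep the parsing simple.
        !/ resumed>/ && match($0, /= -1 E[A-Z0-9]+/) {
            # "= -1 " is 5 characters; the rest of the match is the errno.
            errno = substr($0, RSTART + 5, RLENGTH - 5)
            # The syscall name is the last word before the first "(".
            call = $0
            sub(/\(.*/, "", call)
            sub(/^.* /, "", call)
            count[call " " errno]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -k1,1nr -k2
}

# Example (not run here): failed_syscalls < /tmp/trace.log
```

A pile of ENOENT results on config paths, for instance, usually means the process is probing several locations for a file it never finds.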

7. ss (the netstat replacement)

Detailed network socket information and connection states.

ss -tulpn

Advanced use:

  • Filter by connection state: ss -o state established (the -o flag adds timer details)
  • Memory usage per socket: ss -m
  • Process information: ss -p shows which process owns each connection

Track connection problems:

ss -s

This summary shows socket statistics including how many connections are in different states.
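
For a per-state breakdown of TCP sockets, the first column of `ss -tan` can be tallied. A minimal sketch, assuming the default output with a single header row; `tcp_state_counts` is a name invented for this example.

```shell
# tcp_state_counts: read `ss -tan` output on stdin and print
# "COUNT STATE" for each TCP state, most common first.
tcp_state_counts() {
    # NR > 1 skips the header row; $1 is the State column.
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print n[s], s }' |
        sort -k1,1nr -k2
}

# Example (not run here): ss -tan | tcp_state_counts
```

A large and growing TIME-WAIT or CLOSE-WAIT count is often the first visible hint of a connection handling problem.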

8. dstat

Combined CPU, disk, network, and memory statistics in a single, colorful display.

dstat -cdngy

Flag breakdown:

  • -c - CPU stats
  • -d - Disk stats
  • -n - Network stats
  • -g - Paging stats (pages in and out)
  • -y - System stats (interrupts and context switches)

Custom intervals:

dstat --top-cpu --top-io --top-mem 5

This shows the top processes consuming CPU, I/O, and memory every 5 seconds.

9. pidstat

Detailed resource usage for individual processes over time.

pidstat -u -r -d 1

Track specific processes:

pidstat -p <PID> 1

Why it's better than ps:

  • Shows trends over time, not just snapshots
  • Per-thread statistics with -t flag
  • Historical data when combined with sar

10. perf

Deep CPU performance analysis including cache misses, branch predictions, and instruction efficiency.

perf top

Advanced profiling:

perf record -g <command>
perf report

System-wide analysis:

perf stat -a sleep 10

This runs system-wide performance counters for 10 seconds, showing you efficiency metrics like instructions per cycle and branch-miss rate. Add -d to include cache load and miss counters.
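
Instructions per cycle is simply the instructions counter divided by the cycles counter. perf stat prints the ratio for you, so the sketch below is only meant to show what the number is; `ipc` is a hypothetical helper, and the comma stripping assumes counters copied from output that uses thousands separators.

```shell
# ipc INSTRUCTIONS CYCLES: print instructions per cycle to two decimals.
ipc() {
    awk -v ins="$1" -v cyc="$2" 'BEGIN {
        # Drop thousands separators such as 12,345,678.
        gsub(/,/, "", ins)
        gsub(/,/, "", cyc)
        if (cyc + 0 == 0) { print "n/a"; exit }
        printf "%.2f\n", ins / cyc
    }'
}

# Example: ipc 30000 20000 prints 1.50
```

What counts as a good value depends heavily on the CPU and the workload, so it is most useful for comparing the same service before and after a change.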

Thanks for reading.
