Getting Started with Linux for DevOps: Monitoring CPU, Memory, and More
I keep hearing that Linux remains the dominant operating system for servers, containers, cloud instances, Kubernetes nodes, CI/CD runners, monitoring agents, and virtually every production environment that DevOps engineers touch in 2026 and beyond. Because of this, strong practical Linux skills are expected in almost every serious DevOps, SRE, or Platform Engineering role.
In this post, we won’t tackle complex production issues just yet, like a containerized application not responding or nginx underperforming. Instead, we’ll focus on the preliminary checks that any realistic troubleshooting scenario starts with: system uptime, load, CPU usage, and memory usage.
Prerequisites
A Linux distribution (examples here use Rocky Linux on VirtualBox)
SSH access to the virtual machine
1. Installing the required tools
sudo dnf install -y sysstat strace perf iotop
- sysstat → provides sar (System Activity Reporter), a tool used to collect, report, and save system performance metrics over time.
Live example of sar in action: CPU usage sampled at 1-second intervals, 5 times (sar -u 1 5)
Let's break it down
%user → CPU used by user processes (your applications) → very low (0%)
%nice → CPU used by nice/low-priority processes → 0%
%system → CPU used by kernel/system processes → small (0.5–5%)
%iowait → CPU waiting for disk I/O → 0% → disk is not a bottleneck
%steal → CPU time taken by hypervisor for other VMs → 0%
%idle → CPU idle → very high (94–99%) → CPU is mostly free
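If you'd rather have one number than a whole table, here is a minimal sketch that turns sar's summary into a "busy" percentage (100 minus %idle). The function name cpu_busy_from_sar is my own invention, and it assumes the English/C-locale layout, where the summary row starts with "Average:" and %idle is the last column.

```shell
# Hypothetical helper: read `sar -u` output on stdin and print CPU busy %,
# computed as 100 - %idle from the "Average:" summary row.
# Assumes the C-locale column layout where %idle is the last field.
cpu_busy_from_sar() {
    awk '$1 == "Average:" { printf "%.1f\n", 100 - $NF }'
}
```

On a machine with sysstat installed, it would be used as: LC_ALL=C sar -u 1 5 | cpu_busy_from_sar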
2. Check System Load Safely
uptime
top -b -n 1
vmstat 1 5
iostat -xz 1 5
- top -b -n 1 → batch mode prevents cluttering the terminal.
- vmstat and iostat give quick snapshots.
Let's look at the output of the first two commands, one at a time
Let's carefully decode my uptime output
10:26:36 up 57 min, 2 users, load average: 0.08, 0.02, 0.01
- 10:26:36 → the current system time when the command was run.
- up 57 min → the system has been running continuously for 57 minutes.
- 2 users → there are 2 active sessions currently logged into the system.
- load average: 0.08, 0.02, 0.01 → three numbers representing the average system load over the last 1, 5, and 15 minutes (I'm ignoring this part for now)
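Since the three load numbers always sit at the end of the line, they are easy to pull out in a script. The sketch below is just one way to do it; load_averages is a name I made up, and it assumes the English/C-locale format with a dot as the decimal separator.

```shell
# Hypothetical helper: read a line of `uptime` output on stdin and print
# the 1, 5, and 15 minute load averages separated by spaces.
load_averages() {
    awk -F'load average: ' 'NF > 1 { gsub(/,/, "", $2); print $2 }'
}

# Fed with the uptime line decoded above:
echo '10:26:36 up 57 min, 2 users, load average: 0.08, 0.02, 0.01' | load_averages
# prints: 0.08 0.02 0.01
```

On a live system the equivalent would be: LC_ALL=C uptime | load_averages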
Decoding the top command: top -b -n 1
The top command shows a live, interactive view of system processes, CPU, memory, swap, and load averages. I won't try to learn the entire output. I will ignore the table for now.
Here we are running top in batch mode (the -b switch) with just one iteration (the -n 1 switch).
Let's start with the header line
top - 10:27:26 up 58 min, 2 users, load average: 0.03, 0.01, 0.00
10:27:26 → current system time.
up 58 min → the system has been running continuously for 58 minutes.
2 users → 2 logged-in users/sessions.
load average: 0.03, 0.01, 0.00 → the system load averages over the last 1, 5, and 15 minutes.
- Very low numbers → CPU is mostly idle.
- As a rule of thumb, a load at or below the number of CPUs means tasks are not queueing up. For 2 CPUs, load ≤ 2 is normal. Here, 0.03 is negligible.
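The "load versus number of CPUs" rule of thumb fits in a tiny function. This is only a sketch of that rule, not an official threshold; load_status and its two labels ("ok" and "high") are names I picked for illustration.

```shell
# Hypothetical helper: compare a load average against the CPU count.
# $1 = load average, $2 = number of CPUs
load_status() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (load + 0 <= cpus + 0) print "ok"; else print "high"
    }'
}

# The 1-minute load from the header above, on a 2-CPU machine:
load_status 0.03 2
# prints: ok
```

On a live Linux box the inputs can come from the system itself: load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"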
2. Tasks
Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie
System is healthy; almost all processes are sleeping (waiting for work), and there are no stopped or zombie processes.
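The zombie count is the number on this line I'd want to watch in a script. Here is a rough sketch that extracts it; zombie_count is a hypothetical name, and it assumes the Tasks line looks like the one above.

```shell
# Hypothetical helper: read top's summary on stdin and print the number
# that appears just before the word "zombie" on the Tasks line.
zombie_count() {
    awk '$1 == "Tasks:" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^zombie/) print $(i - 1)
    }'
}

# Fed with the Tasks line shown above:
echo 'Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie' | zombie_count
# prints: 0
```

Live usage would be: top -b -n 1 | zombie_count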
3. CPU Usage
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni, 95.2 id, 0.0 wa, 4.8 hi, 0.0 si, 0.0 st
CPU is almost completely idle (95.2 id); the only activity is 4.8 hi, which is time spent servicing hardware interrupts. No performance pressure.
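To grab just the idle figure from that line, something like the sketch below should work. cpu_idle is a name I made up; the function swaps commas for spaces before splitting, because top can print a value directly after a comma with no space in between. It assumes a C locale (dot as decimal separator).

```shell
# Hypothetical helper: read top's summary on stdin and print the idle
# percentage (the number before "id") from the %Cpu(s) line.
cpu_idle() {
    awk 'index($0, "%Cpu(s):") == 1 {
        gsub(/,/, " ")                      # commas become spaces, fields are re-split
        for (i = 2; i <= NF; i++)
            if ($i == "id") print $(i - 1)
    }'
}

# Fed with the CPU line shown above:
echo '%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni, 95.2 id, 0.0 wa, 4.8 hi, 0.0 si, 0.0 st' | cpu_idle
# prints: 95.2
```

Live usage would be: LC_ALL=C top -b -n 1 | cpu_idle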
4. Memory Usage
MiB Mem : 3653.4 total, 2960.2 free, 439.7 used, 470.5 buff/cache
Memory is abundant; system is far from pressure!
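"Abundant" is easier to judge as a percentage. The sketch below divides used by total from the Mem line; mem_used_percent is a hypothetical name, and note that top's "used" figure does not include buff/cache, so this is only a rough indicator.

```shell
# Hypothetical helper: read top's summary on stdin and print used memory
# as a percentage of total, taken from the "Mem" line.
mem_used_percent() {
    awk '$2 == "Mem" {
        gsub(/,/, " ")                      # commas become spaces, fields are re-split
        for (i = 2; i <= NF; i++) {
            if ($i == "total") total = $(i - 1)
            if ($i == "used")  used  = $(i - 1)
        }
        if (total + 0 > 0) printf "%.1f\n", 100 * used / total
    }'
}

# Fed with the memory line shown above:
echo 'MiB Mem : 3653.4 total, 2960.2 free, 439.7 used, 470.5 buff/cache' | mem_used_percent
# prints: 12.0
```

Live usage would be: LC_ALL=C top -b -n 1 | mem_used_percent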
5. Swap Usage
I’m still clarifying my understanding of swap and will write a follow-up post once I’ve learned more.
That wraps things up for now—more to come soon.

