"The terminal isn’t magic — it’s just muscle memory you haven’t built yet."
I remember the time I spent two hours debugging a CI/CD pipeline because a script failed with “permission denied.” No stack trace, no logs — just red text and silence. Turned out, I’d forgotten chmod +x on deploy.sh. Yeah, I learned this the hard way.
Look — we all pretend we’ve got it together. Kubernetes manifests? Check. Terraform modules? Got ‘em. Helm charts? Polished. But then the alert hits at 1:47 AM. Pod crash-looping. And suddenly you’re typing man ps like you’ve never seen a process in your life.
Not gonna lie — I’ve been there. More times than I’d like to admit.
That night, after the fifth kubectl exec, I went back. Not to docs. Not to YouTube. Back to the shell. The raw, unglamorous Linux terminal. Because here’s the thing: every cloud system, every container, every fancy orchestration tool — they’re all just running on a Linux box with a heartbeat and a grudge.
So I rebuilt my muscle memory. Started with 20 core commands. The essential Linux commands for DevOps engineers that show up 90% of the time when shit hits the fan.
Spoiler: it changed everything.
You don’t need to know 200 commands. You need the ones that make you dangerous. Fast. Confident. Let’s go through them — not like a textbook, but like a senior dev after a chai and a long week. With scars. And opinions.
🔍 Navigating the System — Know Where You Are
You SSH in. Screen’s blank. No prompt. No clue where the app even lives.
And you’re already behind.
Linux treats everything like a file. That means if you can’t navigate, you’re blind. Period.
Start simple:
- pwd – Where the hell are you? Run it. Always. Like checking your GPS in a dark alley.
- ls – List files. But use ls -la. I missed a broken .env.production once because I didn’t see the dot-file. Cost me 40 minutes and a sprint review.
- cd – Obvious? Sure. But cd - is gold. Jumps back to last dir. Lifesaver when you’re bouncing between /var/log and /opt/app.
Need to find something fast? Say, logs from the last 24 hours?
find /var/log -name "*.log" -mtime -1
Used this during an audit. Found a cron job dumping 2GB of debug logs every Sunday at 3 AM. Owner? A dev who “meant to clean it up.” (Spoiler: he didn’t.)
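Here's a minimal sketch of that kind of hunt, run against a throwaway directory so it's safe to paste. It assumes GNU find and touch; the file names are made up for the demo, and the size threshold is tiny on purpose (on a real box you'd reach for something like +100M).

```shell
# Throwaway directory standing in for /var/log (file names are hypothetical)
logs=$(mktemp -d)
printf 'fresh\n' > "$logs/app.log"
printf 'stale\n' > "$logs/old.log"
head -c 4096 /dev/zero > "$logs/big.log"
touch -d '3 days ago' "$logs/old.log"   # GNU touch: backdate the file

# Modified in the last 24 hours: old.log should not show up
recent=$(find "$logs" -name '*.log' -mtime -1)
echo "$recent"

# Bigger than 1 KiB, listed largest first
big=$(find "$logs" -name '*.log' -size +1k -exec ls -S {} +)
echo "$big"
```

Combining -mtime and -size in one find call narrows things down to "recent and huge", which is usually the file you're after.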
📁 Where Logs Live — And How to Get Them Fast
Logs are your crime scene. Treat ‘em like evidence.
Know the usual spots:
- /var/log/syslog – Ubuntu/Debian. Everything dumps here.
- /var/log/messages – RHEL/CentOS. Same idea.
- /var/log/auth.log – Failed SSH attempts on Ubuntu/Debian (RHEL/CentOS uses /var/log/secure). If you’re seeing repeated IP tries from Russia, yeah, it’s a bot. Block it.
But real-time monitoring?
tail -f is your BFF.
tail -f /var/log/nginx/access.log
I caught a 502 avalanche within seconds during a deploy. Turned out the upstream wasn’t ready. Rolled back before users even noticed. Magic? No. Just tail -f.
So don’t just run it. Live with it during deployments. Make it ritual.
💾 Disk Space Panic — Who Ate the GBs?
Alert: “Disk usage >95%.” You panic. Logs? Nope. Not the logs.
Run df -h. Fast. Human-readable. Shows you the big picture.
But — and this matters — df won’t tell you where the bloat lives.
That’s du.
du -sh /var/* | sort -hr | head -5
Top 5 space hogs in /var. Once, it showed docker/overlay2 at 40GB. A dev had pulled every tag of a base image. Every tag.
A quick docker system prune -af later, green alert. (The -a matters: without it, prune only drops dangling images, and every tagged one stays put.) Weekend saved.
Yeah, I’ve done this twice. (Third time, I added pruning to the CI.)
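If you want to try the du pipeline without waiting for a real disk alert, here's a small sketch on a fake tree. The directory names are invented to mirror the story, and --max-depth and -x assume GNU du.

```shell
# Fake /var with three directories of very different sizes (names are hypothetical)
root=$(mktemp -d)
mkdir -p "$root/docker" "$root/log" "$root/cache"
head -c 3000000 /dev/urandom > "$root/docker/layer.bin"
head -c 300000  /dev/urandom > "$root/log/app.log"
head -c 3000    /dev/urandom > "$root/cache/index"

# Same pipeline as above, pointed at the demo tree
du -sh "$root"/* | sort -hr | head -n 3

# One level deep, staying on a single filesystem (-x).
# Line 1 is the total for $root itself, so line 2 is the biggest child.
biggest=$(du -xh --max-depth=1 "$root" | sort -hr | sed -n '2p')
echo "$biggest"
```

The -x flag is worth remembering when you run this on /, since otherwise mounted volumes get counted and muddy the numbers.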
⚙️ Process Management — Who’s Running What?
CPU at 98%. No idea why.
Servers aren’t haunted. But damn, sometimes they feel like it.
ps gives you a snapshot. But ps aux? That’s the full inventory.
- a – Processes from all users, not just yours
- u – User-oriented output (owner, CPU, memory)
- x – Even the ones without a terminal
But snapshots aren’t enough.
You need live data. That’s top. Or — better — htop. Install it. Do it now. (Seriously. sudo apt install htop.)
Sort by CPU. Memory. Runtime. See what’s spiking.
Kill a bad process? Sure. Use kill PID first — sends SIGTERM. Graceful.
But if it’s zombie-stubborn?
kill -9 PID. Hard kill. No cleanup. Use sparingly. Like nuclear codes.
I once killed PostgreSQL mid-write. Recovery took hours. Learned my lesson.
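One way to make "SIGTERM first, SIGKILL only if you must" a habit is to wrap it in a function. This is a rough sketch, not a standard tool: stop_gracefully is a name I made up, and the grace period is an arbitrary default.

```shell
# stop_gracefully is a hypothetical helper name, not a standard command.
stop_gracefully() {
  pid=$1
  grace=${2:-5}                           # seconds to wait before escalating
  kill "$pid" 2>/dev/null || return 0     # SIGTERM; if it's already gone, we're done
  i=0
  while [ "$i" -lt "$grace" ]; do
    kill -0 "$pid" 2>/dev/null || return 0   # signal 0 only checks the PID still exists
    sleep 1
    i=$((i + 1))
  done
  kill -9 "$pid" 2>/dev/null || true      # SIGKILL: last resort, no cleanup
}

# Demo on a harmless stand-in process
sleep 300 &
victim=$!
stop_gracefully "$victim" 3
```

For a database, the grace period should be generous. Better still, stop it through its service manager and let it flush on its own terms.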
🚀 Keeping Services Alive — Intro to systemd
Most distros use systemd now. If you don’t know it, you’re flying blind.
Memorize these:
systemctl status nginx
Is it running? Failed? Masked? This tells you.
Stopped? Start it:
sudo systemctl start nginx
Want it back after reboot?
sudo systemctl enable nginx
And — this is critical — start ≠ enable.
I learned this the hard way during a patching cycle. Restarted the staging server. Redis was down. Why? Because I’d started it, but never enabled it.
Two hours of downtime. Boss wasn’t happy.
So now I double-check: systemctl is-enabled nginx. Habit.
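That double-check is easy to script. The sketch below assumes a systemd host; check_service is a made-up helper name, and it only reads state, it changes nothing.

```shell
# check_service is a hypothetical helper; it reports both "running now" and "starts on boot".
check_service() {
  svc=$1
  active=$(systemctl is-active "$svc" 2>/dev/null || true)
  enabled=$(systemctl is-enabled "$svc" 2>/dev/null || true)
  echo "$svc: active=${active:-unknown} enabled=${enabled:-unknown}"
  # Succeed only when both are true, so it can gate a reboot or a patch run
  [ "$active" = "active" ] && [ "$enabled" = "enabled" ]
}

# Usage (service name is just an example):
#   check_service redis || echo "fix this before you reboot"
```

If you want start and enable in one move, sudo systemctl enable --now nginx does both.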
🔐 File Permissions — Don’t Break Security
“Permission denied.” Feels like a slap.
But it’s not arbitrary. Linux is strict for a reason.
Three permissions:
- Read (r) – See contents
- Write (w) – Edit
- Execute (x) – Run as script
Check with ls -l:
-rwxr-xr-- 1 ubuntu ubuntu 2048 May 10 10:30 deploy.sh
That means:
- Owner: rwx
- Group: r-x
- Others: r--
Fix a script?
chmod +x deploy.sh
Simple. Safe.
But — and this is big — never, ever chmod 777.
It’s the “open the door and leave the keys” of the Linux world.
I once saw a production API key leaked because a config file was 777. (Yes, it was on GitHub. No, we didn’t laugh.)
So use 644 for config files (600 if there’s a secret inside). 755 for scripts. Be sane.
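To see those modes side by side, here's a small sketch in a scratch directory. File names and contents are invented, and stat -c assumes GNU coreutils. The find at the end is one way to audit for world-writable files, which is what 777 leaves behind.

```shell
work=$(mktemp -d)
printf '#!/bin/sh\necho deploying\n' > "$work/deploy.sh"
printf 'API_KEY=not-a-real-key\n'    > "$work/app.env"
printf 'worker_processes auto;\n'    > "$work/nginx.conf"

chmod 755 "$work/deploy.sh"    # rwxr-xr-x: owner edits, everyone can run
chmod 644 "$work/nginx.conf"   # rw-r--r--: readable config, no secrets inside
chmod 600 "$work/app.env"      # rw-------: secrets stay with the owner

# Print octal mode and name for each file
stat -c '%a %n' "$work"/*

# Audit: anything writable by "others"? Should be empty here.
loose=$(find "$work" -type f -perm -0002)
echo "world-writable: ${loose:-none}"
```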
🤝 Switching Users — The Right Way
Need to run as postgres? Or jenkins? Use sudo.
But don’t jump into their shell unless you have to.
Prefer: sudo -u jenkins command. One-off. Clean.
If you must: sudo su - jenkins (or the shorter sudo -iu jenkins). Full login. Environment and all.
But — here’s the thing — always run:
whoami
Before doing anything destructive.
I once deleted /tmp on prod — as root — thinking I was on a sandbox VM. Logs were messy. Recovery was worse.
Now? whoami is ritual, with hostname right behind it, because the user alone won’t tell you which box you’re on. Like checking mirrors before reversing.
(And yes, that’s a bit paranoid. But I sleep better.)
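If you'd rather have the shell be paranoid for you, a guard function is one option. This is only a sketch: guard_target is a name I invented, and the expected user and host are whatever you pass in.

```shell
# guard_target is a hypothetical helper: refuse to continue on the wrong user or host.
guard_target() {
  want_user=$1
  want_host=$2
  have_user=$(id -un)      # same answer as whoami
  have_host=$(uname -n)    # same answer as hostname
  if [ "$have_user" != "$want_user" ] || [ "$have_host" != "$want_host" ]; then
    echo "refusing: you are $have_user@$have_host, expected $want_user@$want_host" >&2
    return 1
  fi
}

# Usage (user, host and path are placeholders):
#   guard_target deploy sandbox-01 && rm -rf ./scratch
```

Chaining the destructive command after && means it never runs when the guard fails.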
📦 Network Debugging — Is It Talking?
Service not responding? Could be code. Could be config.
Or — 60% of the time — it’s networking.
Start with the basics:
ping google.com
If that fails? DNS. Or routing. Or someone unplugged a cable. (Happens more than you think.)
Then check ports:
ss -tulnp | grep ':80 '
ss is faster than netstat. Shows:
- TCP/UDP
- Ports
- PID and process name
I used this once. Node app on port 3000? Not responding. ss showed nothing listening. Why? App crashed on startup. A missing .env var.
Found in 90 seconds. Logs confirmed.
So — always check if it’s even listening.
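A plain grep on a port number also matches 8080, or a PID that happens to contain 80. Here's a sketch of a stricter check that compares the port field exactly. It assumes ss from iproute2 and TCP only; port_listening is a made-up name.

```shell
# port_listening is a hypothetical helper: exit 0 if something listens on the given TCP port.
port_listening() {
  port=$1
  # -t TCP, -l listening sockets only, -n numeric ports.
  # Field 4 is "Local Address:Port"; take whatever follows the last colon.
  ss -tln | awk -v port="$port" '
    { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
    END { exit !found }'
}

# Usage:
#   port_listening 3000 || echo "nothing on 3000, check whether the app even started"
```

Splitting on the last colon keeps it working for IPv6 entries like [::]:22.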
And for HTTP?
curl -I http://localhost:8080
Headers only. No download. Fast. Useful for health checks.
I’ve debugged TLS redirects, load balancer timeouts, even broken CORS with this.
It’s small. But sharp.
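For deploy-time health checks, the same curl -I can sit inside a retry loop. A rough sketch: wait_healthy is a hypothetical name, the URL is whatever you pass in, and treating only 200 as healthy is an assumption you may want to loosen.

```shell
# wait_healthy is a hypothetical helper: poll a URL until it answers 200 or we give up.
wait_healthy() {
  url=$1
  tries=${2:-5}
  i=1
  while [ "$i" -le "$tries" ]; do
    # -I headers only, -s quiet, -o discard output, -w print just the status code
    code=$(curl -s -I -o /dev/null -w '%{http_code}' --max-time 2 "$url" || true)
    [ "$code" = "200" ] && return 0
    echo "attempt $i: got ${code:-no response}" >&2
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage (port is just an example):
#   wait_healthy http://localhost:8080 10 || echo "roll it back"
```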
🧩 Pipes and Redirection — Chain Like a Pro
Linux philosophy: small tools. Big power. When chained.
Pipes (|) pass output forward. Like an assembly line.
And redirection?
- > – Overwrite
- >> – Append
- 2> – Errors only
Example:
ping -c 4 google.com > success.log 2> error.log
Nice for cron jobs. Logs success and failure separately. No noise.
A junior I was mentoring asked me how I debugged a spammy script. This was my answer. He’s using it in production now.
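Here's a self-contained sketch of all three operators, plus 2>&1 for the times you want everything in one file. The noisy function is a stand-in for a real job, and its messages are invented.

```shell
work=$(mktemp -d)

# Stand-in for a cron job: one line to stdout, one to stderr
noisy() {
  echo "ok: job finished"
  echo "warn: disk at 91%" >&2
}

noisy >  "$work/success.log" 2>  "$work/error.log"   # overwrite both
noisy >> "$work/success.log" 2>> "$work/error.log"   # append a second run
noisy >  "$work/all.log" 2>&1                        # stderr follows stdout into one file

wc -l "$work"/*.log
```

Order matters with 2>&1: it has to come after the > redirect, or stderr still lands on the terminal.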
Oh — and grep, awk, sed?
They’re not optional.
Need all 500 errors from Nginx?
grep " 500 " /var/log/nginx/access.log
Now — who’s causing them?
grep " 500 " /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr
Top IPs by error count. Found a misconfigured scraper last month. Blocked it fast.
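One caveat with grep " 500 ": it also catches a 200 response whose body happens to be 500 bytes. A sketch of a stricter version follows, matching on the status field instead. It assumes Nginx's default combined log format, where the status is field 9, and the log lines are fabricated sample data.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.7 - - [10/May/2024:10:30:01 +0000] "GET /api/orders HTTP/1.1" 500 1532 "-" "scraper/1.0"
203.0.113.7 - - [10/May/2024:10:30:02 +0000] "GET /api/orders HTTP/1.1" 500 1532 "-" "scraper/1.0"
198.51.100.4 - - [10/May/2024:10:30:03 +0000] "GET /logo.png HTTP/1.1" 200 500 "-" "Mozilla/5.0"
192.0.2.9 - - [10/May/2024:10:30:04 +0000] "POST /login HTTP/1.1" 500 87 "-" "Mozilla/5.0"
EOF

# grep counts the 200 line too, because its byte count is 500
grep -c ' 500 ' "$log"

# awk looks at the status field only, then the usual count-and-rank pipeline
top=$(awk '$9 == 500 { print $1 }' "$log" | sort | uniq -c | sort -nr | head -n 5)
echo "$top"
```

If your log_format is customized, the field number will differ, so check one line by eye first.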
And sed?
I fixed 50 .conf files with one line:
sed -i 's/old-domain\.com/new-domain.com/g' *.conf
-i edits in place. Scary? Yeah. But when you need it, you really need it.
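A less scary way to run it is to preview first and keep backups. This sketch works in a scratch directory with invented file contents. The dot is escaped so it matches a literal dot instead of any character.

```shell
work=$(mktemp -d)
printf 'server_name old-domain.com;\n'      > "$work/site.conf"
printf 'server_name old-domainXcom.test;\n' > "$work/other.conf"

# Dry run: -n plus the p flag prints only the lines that would change
sed -n 's/old-domain\.com/new-domain.com/gp' "$work"/*.conf

# Real run: -i.bak edits in place and leaves a .bak copy next to each file
sed -i.bak 's/old-domain\.com/new-domain.com/g' "$work"/*.conf

cat "$work/site.conf" "$work/other.conf"
```

With an unescaped dot, other.conf would have been rewritten too. Once you've checked the result, the .bak files can go.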
🟩 Final Thoughts
You don’t need to be a Linux wizard. But you do need fluency.
The essential Linux commands for DevOps engineers? They’re not tools. They’re reflexes.
Like driving. You don’t think about the gears. You just drive.
From what I’ve seen on real projects — the best engineers aren’t the ones with the most tools. They’re the ones who know a few deeply. They grep like poets. They ss like surgeons.
That fluency buys time. Reduces panic. Builds trust.
So go break things. In a VM. Spin up an Ubuntu box. Break it. Fix it. Break it again.
And when you finally solve it with a two-word command?
Yeah.
You’ll smile.
Because you’ve earned it.
❓ Frequently Asked Questions
What’s the difference between kill and kill -9?
kill sends a SIGTERM signal, asking the process to terminate gracefully. kill -9 sends SIGKILL, forcing immediate termination without cleanup. Always try SIGTERM first.
Is netstat obsolete?
Yes, mostly. netstat is deprecated in favor of ss, which is faster and more efficient. Use ss -tulnp as your go-to for port checks.
How do I search for a file by name?
Use find /path -name "filename". For faster results, use locate, but run sudo updatedb first to refresh the index.
