TL;DR: These 7 skills separate DevOps engineers who grow FAST from those stuck troubleshooting for years. Each one looks simple but has hidden layers that only show up in production. Read on for real examples, actionable depth, and exactly what to practice.
1️⃣ Linux Fundamentals
❗ Why it matters
Every server, container, Kubernetes node, CI pipeline, and cloud instance sits on Linux.
💡 What this REALLY means
Not "run ls and cd."
You need to understand:
- Permissions: chmod, chown, umask
- Systemd basics: systemctl status, service management
- Logs: Navigate /var/log like a detective
- Processes: ps aux, top, htop, killing zombies
- Networking tools: curl, ss (faster than netstat), ping, traceroute
- File system hierarchy: Why /etc holds configs, /usr/bin holds binaries
🧠 Real-world example
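Your app won't start → turns out permissions on /opt/app/config are wrong (644 instead of 600), and the app refuses to load a config that other users can read, much like ssh rejects a private key with loose permissions.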
A DevOps engineer solves this in seconds.
🔧 30-Second Troubleshooting Flow
# App won't start?
1. Check logs: tail -f /var/log/app/error.log
2. Check process: ps aux | grep app
3. Check permissions: ls -la /opt/app/
4. Check ports: ss -tulpn | grep 8080
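Step 3 of the flow can also be scripted. Here is a minimal Python sketch, assuming the rule you care about is "no access for group or others" (the 600 expectation from the example). The function name is_too_open is hypothetical.

```python
import os
import stat


def is_too_open(path: str) -> bool:
    """Return True if group or others have any access to the file at path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # S_IRWXG and S_IRWXO mask the group and "other" permission bits
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))
```

A file at 644 returns True (others can read it), while the same file at 600 returns False.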
🎯 Skill depth to aim for
Be able to troubleshoot WITHOUT opening StackOverflow. Know the difference between systemctl restart and systemctl reload (restart stops and starts the service; reload asks it to re-read its config without stopping).
2️⃣ Networking Basics
❗ Why it matters
A large share of production issues turn out to be networking issues.
💡 What most people don't know
DevOps requires understanding:
- DNS: A records, CNAME, TTL (Time To Live)
- Load balancers: Round-robin, least connections, health checks
- Ports + Firewalls: TCP/UDP, security groups
- VPCs: Virtual Private Clouds, isolation
- Subnets + CIDR: 10.0.0.0/24 notation
- Reverse proxies: Nginx, Traefik, HAProxy
- API gateways: Rate limiting, authentication
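CIDR notation from the list is easy to explore with Python's standard ipaddress module. This is a small sketch; the helper name in_subnet and the sample addresses are made up for illustration.

```python
import ipaddress

# The /24 from the list above: the first 24 bits are the network, the last 8 are hosts
subnet = ipaddress.ip_network("10.0.0.0/24")


def in_subnet(ip: str, network=subnet) -> bool:
    """Return True if the address falls inside the given network."""
    return ipaddress.ip_address(ip) in network


print(subnet.num_addresses)    # 256
print(subnet.netmask)          # 255.255.255.0
print(in_subnet("10.0.0.42"))  # True
print(in_subnet("10.0.1.42"))  # False
```

Once a /24 reads as "256 addresses sharing the first three octets", subnet planning and security group rules become much easier to reason about.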
🧠 Real-world example
App is running fine, container is healthy, but DNS was pointing to an old IP address.
The whole team panics — the DevOps engineer fixes it in 5 minutes by updating the A record.
📊 The Request Journey
Browser
→ DNS lookup (example.com → 192.168.1.10, DNS uses port 53)
→ Load Balancer (distributes traffic)
→ Backend Service (processes request)
→ Database (queries data)
→ Response back through the chain
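The load balancer hop is where round-robin and least connections come in. Here is a toy Python sketch of both strategies, not how Nginx or HAProxy implement them. The class names and backend labels are hypothetical.

```python
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Send each request to the backend with the fewest active connections."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self):
        # Ties go to the backend that was listed first
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call this when the request finishes
        self.active[backend] -= 1
```

Round-robin ignores how busy each backend is. Least connections adapts when some requests are slow, at the cost of tracking state.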
⚠️ Common mistake
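Forgetting that localhost in a container ≠ localhost on the host
Inside a container: localhost:3000 = port 3000 in that same container, not on the host
From another container on the same Docker network: use the container name (e.g. api:3000)
From the host: use the published port (e.g. run with -p 3000:3000, then hit localhost:3000)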
🎯 Skill depth to aim for
Explain in simple terms how a request travels from browser to database and back. If you can explain the journey, you can automate it.
3️⃣ Version Control Discipline
❗ Why it matters
DevOps isn't Git commands — it's Git thinking.
💡 What you must learn
- Branching strategies: GitFlow vs Trunk-Based Development
- Commit hygiene: Atomic commits, clear messages
- PR hygiene: Small PRs, meaningful descriptions
- Code reviews: What to look for, how to give feedback
- Reverts + cherry-picks: Fixing mistakes safely
🧠 Real-world example
A junior merges a feature into main without testing and breaks production.
A disciplined Git workflow with protected branches and CI checks prevents disasters.
🏆 Modern best practice: Trunk-Based Development
Why most teams are moving away from GitFlow:
- Faster deployments (no long-lived branches)
- Fewer merge conflicts
- Better for CI/CD automation
- Feature flags handle incomplete features
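To make the last point concrete, here is a minimal Python sketch of a feature flag guarding unfinished code on main. The FLAGS dict, the new_checkout flag, and the checkout function are all hypothetical; real setups usually load flags from config or a flag service.

```python
# Hypothetical in-memory flag store; unknown flags default to off
FLAGS = {"new_checkout": False}


def is_enabled(flag: str, flags: dict = FLAGS) -> bool:
    return flags.get(flag, False)


def checkout(cart_total: float, flags: dict = FLAGS) -> str:
    # The unfinished path ships to production but stays dark until the flag flips
    if is_enabled("new_checkout", flags):
        return f"new flow: {cart_total:.2f}"
    return f"legacy flow: {cart_total:.2f}"
```

The code merges to main every day, and the release becomes a flag change instead of a big-bang merge.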
📝 Commit Message Template That Passes Every Review
feat: add Redis caching for user sessions
- Reduces DB load by 60%
- Adds 15-minute TTL for session keys
- Includes integration tests
Closes #234
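A template like this can be enforced in a commit hook or CI step. Here is a rough Python sketch. The allowed type list and the 72-character subject limit are assumptions, not rules from this post.

```python
import re

# Assumed convention: "type: summary" or "type(scope): summary"
SUBJECT_RE = re.compile(r"^(feat|fix|docs|refactor|test|chore)(\([\w-]+\))?: \S.*$")


def check_commit_message(message: str, max_subject_len: int = 72) -> list:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if not SUBJECT_RE.match(subject):
        problems.append("subject should look like 'type: summary'")
    if len(subject) > max_subject_len:
        problems.append(f"subject is longer than {max_subject_len} characters")
    return problems
```

The template message passes with no problems, while something like "fixed stuff" gets flagged.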
🎯 Skill depth to aim for
You should be able to teach another person why merges conflict — and how to prevent them with rebasing and small, frequent commits.
4️⃣ Container Mindset
❗ Why it matters
Companies want engineers who understand why containers exist, not just Dockerfiles.
💡 What to internalize
- Immutable infrastructure: Never modify running containers
- Dependency isolation: Each app gets its own environment
- Image layers: How caching works
- Entrypoints vs CMD: When to use which
- Multi-stage builds: Keep images small
- Container networking: Bridge, host, overlay modes
🧠 Real-world example
A container won't start because the entrypoint script has Windows line endings (\r\n instead of \n).
A DevOps engineer runs dos2unix entrypoint.sh and fixes it instantly.
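Why does this break? With CRLF endings the shebang line becomes #!/bin/sh\r, so the kernel looks for an interpreter whose name ends in a carriage return. Here is a small Python sketch of the detect-and-fix step, which is roughly what dos2unix does for this case. The function names are hypothetical.

```python
def has_crlf(data: bytes) -> bool:
    """Return True if the file content contains Windows line endings."""
    return b"\r\n" in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Replace every CRLF pair with a plain LF."""
    return data.replace(b"\r\n", b"\n")


script = b"#!/bin/sh\r\necho hi\r\n"
print(has_crlf(script))              # True
print(to_unix_line_endings(script))  # b'#!/bin/sh\necho hi\n'
```

A check like this in CI catches the problem before the image is ever built.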
🚀 How to Shrink Images: 1.2GB → 120MB
❌ Bad Dockerfile (1.2GB):
FROM ubuntu:latest
RUN apt-get update && apt-get install -y python3 python3-pip
WORKDIR /app
COPY . .
RUN pip3 install -r requirements.txt
CMD ["python3", "app.py"]
✅ Optimized Dockerfile (120MB):
# Build stage
FROM python:3.11-slim as builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
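# Runtime stage (same slim base as the builder: packages compiled against
# Debian's glibc can break when copied onto Alpine's musl)
FROM python:3.11-slim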
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY . .
USER nobody
CMD ["python", "app.py"]
Key techniques:
- Use Alpine or slim base images
- Multi-stage builds (separate build from runtime)
- .dockerignore file (exclude node_modules, .git)
- Run as non-root user (USER nobody)
🎯 Skill depth to aim for
Be able to reduce a Docker image from 1.2GB → 120MB and explain each optimization.
5️⃣ CI/CD Thinking
❗ Why it matters
Deploying to production is not "push button and pray."
💡 What beginners must understand
- Pipeline stages: build → test → scan → deploy
- Artifacts: Build once, deploy many times
- Caching: Can cut build times dramatically (the example below goes from 25 minutes to 4)
- Rollback strategies: Deploy should be reversible in 30 seconds
- GitOps: Git as single source of truth
- Feature flags: Deploy code without releasing features
- Approvals: Manual gates for production
🧠 Real-world example
A pipeline takes 25 minutes → you optimize caching → now it runs in 4 minutes.
Your senior engineers will love you.
⚡ 3 Caching Strategies That Actually Work
1. Docker Layer Caching
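# .gitlab-ci.yml
build:
  image: docker:latest
  services:
    - docker:dind
  before_script:
    # Log in so the pull and push can reach the project registry
    - echo "$CI_REGISTRY_PASSWORD" | docker login "$CI_REGISTRY" -u "$CI_REGISTRY_USER" --password-stdin
    # Pull the previous image to seed the cache; ignore failure on the first run
    - docker pull $CI_REGISTRY_IMAGE:latest || true
  script:
    # BUILDKIT_INLINE_CACHE=1 embeds cache metadata so --cache-from works with BuildKit
    - docker build --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from $CI_REGISTRY_IMAGE:latest -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA -t $CI_REGISTRY_IMAGE:latest .
    # Push both tags so the next pipeline has a fresh cache source
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker push $CI_REGISTRY_IMAGE:latest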
2. Dependency Caching
# GitHub Actions
- uses: actions/cache@v3
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
3. Build Artifact Caching
Store compiled binaries, don't rebuild them every time.
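The dependency cache key in strategy 2 works because it is derived from the lockfile: same lockfile, same key, cache hit. Here is a Python sketch of that idea. It shows the principle, not the exact algorithm behind hashFiles, and the cache_key function is hypothetical.

```python
import hashlib


def cache_key(lockfile_bytes: bytes, os_name: str, prefix: str = "node") -> str:
    """Build a cache key that changes only when the lockfile content changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{os_name}-{prefix}-{digest}"


old = cache_key(b'{"lodash": "4.17.20"}', "Linux")
same = cache_key(b'{"lodash": "4.17.20"}', "Linux")
bumped = cache_key(b'{"lodash": "4.17.21"}', "Linux")
print(old == same)    # True: unchanged lockfile reuses the cache
print(old == bumped)  # False: a dependency bump invalidates it
```

Keying on content instead of branch name or date is what keeps the cache both safe and fast.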
🔄 30-Second Rollback Design
deploy:
  script:
    - kubectl set image deployment/app app=$IMAGE_TAG
    - kubectl rollout status deployment/app
rollback:
  when: manual
  script:
    - kubectl rollout undo deployment/app
🎯 Skill depth to aim for
Be able to explain how code travels: laptop → Git push → CI build → staging → production. And how to roll back without panic.
6️⃣ Secrets Management
❗ Why it matters
Leaked secrets destroy companies. It's the MOST ignored DevOps skill.
💡 What to understand
- Environment variables are NOT secrets (they appear in logs, process lists)
- Vault / AWS Secrets Manager: Centralized secret storage
- Key rotation: Secrets should change automatically
- Scoped IAM policies: Principle of least privilege
- Encrypted storage: At rest and in transit
🧠 Real-world example
In 2016, attackers used AWS keys found in a private GitHub repository belonging to Uber engineers to download data on 57 million riders and drivers.
Cost: a $148 million settlement, plus reputation damage.
🔐 Never Use .env Files in Production
❌ Bad practice:
# .env file in repo
DATABASE_PASSWORD=SuperSecret123
AWS_ACCESS_KEY=AKIAIOSFODNN7EXAMPLE
✅ Right way with Vault:
# App reads from Vault at runtime
vault kv get -field=password secret/database/prod
✅ Right way with AWS Secrets Manager:
import boto3
client = boto3.client('secretsmanager')
response = client.get_secret_value(SecretId='prod/db/password')
password = response['SecretString']
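One gotcha: secrets stored as key/value pairs come back as a JSON string in SecretString, and binary secrets arrive under SecretBinary instead. Here is a hedged Python sketch of a helper that handles all three shapes. The extract_secret name is hypothetical, and it takes the response dict directly so it runs without AWS access.

```python
import json


def extract_secret(response: dict):
    """Return a dict for JSON secrets, a str for plain ones, bytes for binary."""
    if "SecretString" in response:
        raw = response["SecretString"]
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            return raw
        # A bare number or quoted string is valid JSON too; keep those as plain text
        return parsed if isinstance(parsed, dict) else raw
    return response["SecretBinary"]
```

With a key/value secret you would then read extract_secret(response)["password"] instead of treating the whole JSON blob as the password.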
🔍 Audit Trail: Who Accessed What Secret When
# Vault audit log
vault audit enable file file_path=/var/log/vault-audit.log
# Query who accessed the DB password
# (on a KV v2 mount the logged path is secret/data/database/prod)
jq 'select(.request.path == "secret/database/prod")' /var/log/vault-audit.log
🎯 Skill depth to aim for
Build an app where secrets rotate automatically every 30 days and no human ever sees them in plaintext.
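The "every 30 days" rule boils down to a date comparison that a scheduled job can run. Here is a minimal Python sketch. The needs_rotation function is hypothetical, and in practice Vault or AWS Secrets Manager rotation would do the actual work.

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(
    last_rotated: datetime,
    now: datetime,
    max_age: timedelta = timedelta(days=30),
) -> bool:
    """Return True once the secret is at least max_age old."""
    return now - last_rotated >= max_age


rotated = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(needs_rotation(rotated, datetime(2024, 1, 15, tzinfo=timezone.utc)))  # False
print(needs_rotation(rotated, datetime(2024, 2, 1, tzinfo=timezone.utc)))   # True
```

Passing now in as an argument, rather than calling datetime.now() inside, keeps the check easy to test.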
7️⃣ Observability
❗ Why it matters
You can't fix what you can't see.
💡 Observability ≠ Logging
It combines:
- Logs: Text events (errors, warnings, info)
- Metrics: Numbers over time (CPU, memory, request count)
- Traces: Journey of a single request through microservices
- Alerts: Notifications when something breaks
🧠 Real-world example
Backend slow? You check traces → bottleneck found at Redis layer → add connection pooling → solved.
🚨 The 3 Questions Every Alert Should Answer
Bad alert:
🔴 Service Down
Good alert:
🔴 User API Response Time > 2s
📊 Affected: 15% of requests
🔍 Likely cause: Database connection pool exhausted
✅ Runbook: https://wiki.company.com/runbooks/db-pool
📈 How to Avoid Alert Fatigue
- Fewer alerts, better context (not 50 alerts for 1 incident)
- Alert on symptoms, not causes (users can't log in ≠ disk space low)
- Auto-resolve when fixed (don't spam on-call engineers)
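The first point is mostly a grouping problem. Here is a toy Python sketch that folds many alerts into one notification per service. The alert shape (dicts with service and summary keys) is an assumption for illustration; Alertmanager's group_by setting does the production version of this.

```python
from collections import defaultdict


def group_alerts(alerts: list) -> dict:
    """Bucket alerts by service so one incident becomes one notification."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[alert["service"]].append(alert)
    return dict(grouped)


def summarize(grouped: dict) -> list:
    """One line per service: how many alerts fired and what the first one said."""
    return [
        f"{service}: {len(items)} alerts, first: {items[0]['summary']}"
        for service, items in grouped.items()
    ]
```

Fifty alerts from one failing service turn into a single line for the on-call engineer instead of fifty pages.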
🛠️ Quick Setup: Prometheus + Grafana for 1 Microservice
# docker-compose.yml
version: '3'
services:
  app:
    image: my-app:latest
    ports:
      - "8080:8080"
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"

# prometheus.yml
scrape_configs:
  - job_name: 'my-app'
    static_configs:
      - targets: ['app:8080']
🎯 Skill depth to aim for
Set up Prometheus + Grafana for your own project. Create 3 dashboards: request rate, error rate, duration (RED metrics).
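Before building the dashboards, it helps to see what the three RED numbers are computed from. Here is a plain-Python sketch over a list of request records. The Request dataclass and the nearest-rank p95 are illustrative choices; in Prometheus you would get the same numbers from rate() and histogram_quantile().

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    status: int
    duration_s: float


def red_metrics(requests: list, window_s: float) -> dict:
    """Compute Rate, Errors, and Duration (p95) for one time window."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_s for r in requests)
    # Nearest-rank percentile: the value at position ceil(0.95 * n), 1-indexed
    p95 = durations[max(0, math.ceil(0.95 * total) - 1)] if total else 0.0
    return {
        "rate_per_s": total / window_s,
        "error_ratio": errors / total if total else 0.0,
        "p95_duration_s": p95,
    }
```

Each dashboard panel then maps to one key: request rate, error ratio, and tail latency.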
🎯 The Meta-Skill: How to Learn All 7 in 90 Days
🏗️ Build 1 Project That Forces You to Use All 7 Skills
Project idea: Deploy a real app with full DevOps pipeline
- Week 1-2: Set up a simple web app (Node.js, Python, Go)
- Week 3-4: Dockerize it, optimize image size
- Week 5-6: Set up CI/CD (GitHub Actions, GitLab CI)
- Week 7-8: Add secrets management (Vault or cloud provider)
- Week 9-10: Deploy to Kubernetes (Minikube locally, then EKS/GKE)
- Week 11-12: Add monitoring (Prometheus, Grafana, Loki for logs)
- Week 13: Set up alerts, write runbooks, test rollbacks
📚 Free Resources for Each Skill
Linux: Linux Journey (linuxjourney.com)
Networking: Practical Networking (practicalnetworking.net)
Git: Oh My Git! (game-based learning)
Docker: Play with Docker (labs.play-with-docker.com)
CI/CD: GitLab CI tutorials (docs.gitlab.com)
Secrets: HashiCorp Vault tutorials (learn.hashicorp.com)
Observability: Grafana tutorials (grafana.com/tutorials)
🚀 Your Challenge: Start TODAY
Pick 1 skill you're weakest at and spend 1 hour on it TODAY.
Not tomorrow. Not next week. Today.
Comment below: Which skill are you starting with? 👇
Found this helpful? Follow for more DevOps deep dives. Next up: Kubernetes patterns that scale to millions of requests.
