There's a moment every engineer hits. You've deployed containers manually, you've docker run -d'd your way through development, and then someone says: "We need this in production. With auto-scaling. And zero downtime. And it needs to self-heal. Oh, and make it secure."
That's the moment you need Kubernetes.
But here's the problem — Kubernetes is enormous. The official documentation alone is thousands of pages. YouTube tutorials teach you pieces but never the whole picture. Blog posts cover single topics but miss how everything connects.
So I built something different: a 53-module, 130+ file repository that takes you from kubectl get pods to running production-grade, multi-cluster, GitOps-driven infrastructure — with every concept explained, every YAML annotated, and every decision justified.
This blog is a condensed tour of what's inside, what I learned building it, and the mental models that actually make Kubernetes click.
🧠 The Mental Model That Changes Everything
Before diving into commands and YAML, here's the single most important thing to understand about Kubernetes:
Kubernetes is a declarative state reconciliation engine.
You don't tell Kubernetes what to do. You tell it what you want, and it figures out how to get there. This is fundamentally different from writing shell scripts or Ansible playbooks.
| Traditional (Imperative) | Kubernetes (Declarative) |
|---|---|
| "Start 3 nginx containers" | "I want 3 nginx pods running" |
| "If one dies, start another" | (K8s does this automatically) |
| "Put them behind a load balancer" | "I want a Service of type LB" |
| "Update to v2 one at a time" | "Change image tag to v2" (K8s rolls out gradually) |
Every controller in Kubernetes runs an infinite loop:
while true:
    desired = read_from_etcd()    # What the user declared
    actual  = observe_cluster()   # What's actually running
    if actual != desired:
        take_action(actual, desired)  # Reconcile the difference
    sleep(interval)
Once this clicks, everything else — Deployments, StatefulSets, Operators, GitOps — is just variations of the same pattern.
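To make the pattern concrete, here's a toy reconciler in Python. It's a sketch, not real controller code: `reconcile` and `run_to_convergence` are invented names, and real controllers watch the API server for changes rather than polling in a loop.

```python
# Toy reconciler: converge an "actual" replica count toward a "desired" one.
# Real controllers react to API-server watch events, but the shape is the same.

def reconcile(desired: int, actual: int) -> int:
    """One reconciliation step: move actual one unit toward desired."""
    if actual < desired:
        return actual + 1   # start one pod
    if actual > desired:
        return actual - 1   # stop one pod
    return actual           # nothing to do

def run_to_convergence(desired: int, actual: int) -> list:
    """Run reconcile steps until actual == desired; return the history."""
    history = [actual]
    while actual != desired:
        actual = reconcile(desired, actual)
        history.append(actual)
    return history

print(run_to_convergence(desired=3, actual=0))  # [0, 1, 2, 3]
```

Scaling down is the same loop in reverse: declare 2 replicas while 5 are running, and successive steps walk the cluster down to 2.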
🏗️ Architecture: The 10,000-Foot View
Every Kubernetes cluster has two planes:
┌─────────────────────────────────────────────────────────────────┐
│ KUBERNETES CLUSTER │
│ │
│ ┌────────────────────────────┐ ┌───────────────────────────┐ │
│ │ CONTROL PLANE │ │ WORKER NODE │ │
│ │ │ │ │ │
│ │ API Server ← the ONLY │ │ kubelet ← talks to API │ │
│ │ entry point for ALL │ │ Server, manages pods │ │
│ │ cluster operations │ │ │ │
│ │ │ │ kube-proxy ← manages │ │
│ │ etcd ← the "brain" │ │ network rules for │ │
│ │ stores ALL cluster │ │ Service routing │ │
│ │ state as key-value │ │ │ │
│ │ pairs │ │ Container Runtime │ │
│ │ │ │ (containerd) ← runs │ │
│ │ Scheduler ← decides │ │ actual containers │ │
│ │ WHERE pods run │ │ │ │
│ │ │ │ ┌──────┐ ┌──────┐ │ │
│ │ Controller Manager ← │ │ │ Pod │ │ Pod │ │ │
│ │ runs reconciliation │ │ │ A │ │ B │ │ │
│ │ loops │ │ └──────┘ └──────┘ │ │
│ └────────────────────────────┘ └───────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
The one thing people get wrong: They think kubectl talks directly to nodes. It doesn't. Every single command goes through the API Server, which authenticates you, authorizes the request, runs admission controllers, and then stores the result in etcd. The kubelet on each node watches etcd (via the API Server) and makes reality match the desired state.
etcd: The Most Important Component Nobody Talks About
If your API Server goes down, you can't make changes — but existing workloads keep running. If etcd goes down and you have no backup, your entire cluster state is gone. Every pod definition, every secret, every service configuration — everything.
# etcd backup — do this or regret it later
etcdctl snapshot save /backup/etcd-$(date +%Y%m%d).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
Production rule: Back up etcd every hour. Test the restore process quarterly. Store backups off-cluster in encrypted object storage.
📦 Pods: The Atomic Unit (But You'll Rarely Create Them Directly)
A Pod is the smallest deployable unit — but here's what beginners miss: you almost never create Pods directly. You create Deployments (which create ReplicaSets, which create Pods). Direct Pod creation means no self-healing, no scaling, and no rolling updates.
That said, understanding Pods is critical because everything builds on them:
apiVersion: v1
kind: Pod
metadata:
  name: production-pod
  labels:
    app: api-server
spec:
  # Init containers run FIRST, in order, one at a time.
  # The main containers don't start until ALL init containers succeed.
  initContainers:
  - name: wait-for-database
    image: busybox:1.36
    command: ['sh', '-c',
      'until nc -z db-service 5432; do echo waiting for db; sleep 2; done']
    # WHY: Prevents the app from starting before its database is ready.
    # Without this, you'd get connection errors on startup.
  containers:
  - name: api
    image: myapp:v2.1
    ports:
    - containerPort: 8080
    # Resource requests = SCHEDULER uses these to pick a node
    # Resource limits = KUBELET enforces these at runtime
    resources:
      requests:        # "I need at least this much"
        cpu: 100m      # 100 millicores = 0.1 CPU core
        memory: 128Mi  # 128 mebibytes
      limits:          # "Never let me use more than this"
        cpu: 500m
        memory: 512Mi  # Exceed this → OOMKilled
    # Probes tell Kubernetes about your app's health
    livenessProbe:     # "Is my app alive?" — fails → container restart
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
    readinessProbe:    # "Can my app serve traffic?" — fails → removed from Service
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
    startupProbe:      # "Has my app finished starting?" — fails → container restart
      httpGet:         # Disables liveness/readiness until it passes
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10  # Gives up to 300s (30×10) for slow-starting apps
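Those probe numbers compose into timing budgets you can compute up front. A quick sanity check in plain Python, using the values from the manifest above:

```python
def startup_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Max time a startupProbe allows before the kubelet restarts the container."""
    return failure_threshold * period_seconds

def liveness_kill_after_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Roughly how long a hung app survives after its first liveness failure."""
    return failure_threshold * period_seconds

# startupProbe above: failureThreshold=30, periodSeconds=10
print(startup_budget_seconds(30, 10))        # 300 — a slow app gets 5 minutes to boot

# livenessProbe above: periodSeconds=10, failureThreshold defaults to 3
print(liveness_kill_after_seconds(3, 10))    # 30 — a hung app is restarted within ~30s
```

If your app can take longer than the startup budget to boot, raise `failureThreshold` rather than `initialDelaySeconds`, or the kubelet will restart it mid-startup forever.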
The Multi-Container Patterns You'll Actually Use
Sidecar Pattern: Ambassador Pattern:
┌──────────────────────┐ ┌──────────────────────┐
│ Pod │ │ Pod │
│ ┌──────┐ ┌────────┐ │ │ ┌──────┐ ┌────────┐ │
│ │ App │ │Log │ │ │ │ App │ │ Proxy │ │
│ │ │→ │Shipper │ │ │ │ │→ │(to DB) │ │
│ └──────┘ └────────┘ │ │ └──────┘ └────────┘ │
│ writes reads │ │ localhost handles │
│ to shared volume │ │ :5432 auth/TLS │
└──────────────────────┘ └──────────────────────┘
Adapter Pattern:
┌───────────────────────┐
│ Pod                   │
│ ┌────────┐ ┌────────┐ │
│ │ App    │ │Adapter │ │  → Prometheus
│ │(custom │→│(format │ │    scrapes
│ │metrics)│ │convert)│ │    /metrics
│ └────────┘ └────────┘ │
└───────────────────────┘
🔄 Deployments: Where the Magic Happens
Deployments are the workhorse of Kubernetes. Here's what happens when you update an image:
kubectl set image deployment/web nginx=nginx:1.26
Timeline:
─────────────────────────────────────────────────────────
t=0s [v1] [v1] [v1] 3 old pods
t=5s [v1] [v1] [v1] [v2] 1 new pod starting
t=15s [v1] [v1] [v2✓] ← ready new pod passes readiness
t=16s [v1] [v1] [v2✓] 1 old pod terminating
t=30s [v1] [v2✓] [v2✓] 2nd new pod ready
t=45s [v2✓] [v2✓] [v2✓] rollout complete!
And if something goes wrong:
# Instant rollback — Kubernetes keeps revision history
kubectl rollout undo deployment/web
# Check rollout status
kubectl rollout status deployment/web
# See revision history
kubectl rollout history deployment/web
The two deployment strategies you need to know:
| Strategy | How it Works | When to Use |
|---|---|---|
| RollingUpdate | Gradually replaces old pods with new ones | Default. Works for 95% of cases |
| Recreate | Kills ALL old pods, then starts new ones | When old & new versions can't coexist (DB schema changes) |
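You can sketch that rollout timeline yourself. The simulator below is a hypothetical simplification (the real Deployment controller also waits on readiness probes and tracks revision history), but it steps through a RollingUpdate under the two knobs that actually control it, `maxSurge` and `maxUnavailable`:

```python
# Toy simulation of a RollingUpdate. Note: maxSurge and maxUnavailable
# must not both be 0 — Kubernetes rejects that combination too.

def rolling_update_steps(replicas, max_surge, max_unavailable):
    """Return (old_ready, new_ready) pod counts per step of the rollout."""
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # 1. Surge: create new pods, up to replicas + max_surge total pods
        new += min(replicas + max_surge - (old + new), replicas - new)
        # 2. Retire: kill old pods, keeping >= replicas - max_unavailable available
        old -= max(0, min(old, (old + new) - (replicas - max_unavailable)))
        steps.append((old, new))
    return steps

# 3 replicas, maxSurge=1, maxUnavailable=0 — matches the timeline above
print(rolling_update_steps(3, 1, 0))  # [(3, 0), (2, 1), (1, 2), (0, 3)]
```

With `maxUnavailable: 0` the cluster never drops below full capacity, which is why the timeline above always shows three ready pods or more.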
🌐 Networking: The Part Everyone Struggles With
Kubernetes networking has three fundamental rules:
- Every pod gets its own IP address
- Pods can communicate with any other pod without NAT
- Agents on a node can communicate with all pods on that node
Here's what actually happens when Pod A talks to a Service:
Pod A (10.244.1.5)
│
│ DNS lookup: "api-service" → 10.96.45.12 (ClusterIP)
▼
kube-proxy (iptables/IPVS rules on the node)
│
│ Load balances to one of the endpoint pods
▼
Pod B (10.244.2.8) ← one of the pods behind the Service
Services Demystified
# ClusterIP (default) — internal only
apiVersion: v1
kind: Service
metadata:
  name: api-internal
spec:
  type: ClusterIP      # Only reachable inside the cluster
  selector:
    app: api
  ports:
  - port: 80           # Service port (what clients use)
    targetPort: 8080   # Container port (where your app listens)
---
# NodePort — exposes on every node's IP
apiVersion: v1
kind: Service
metadata:
  name: api-nodeport
spec:
  type: NodePort       # Accessible at <NodeIP>:30080
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080    # Range: 30000-32767
---
# LoadBalancer — cloud provider creates an actual LB
apiVersion: v1
kind: Service
metadata:
  name: api-public
spec:
  type: LoadBalancer   # Creates AWS ELB/NLB / Azure LB / GCP LB
  selector:
    app: api
  ports:
  - port: 443
    targetPort: 8080
Network Policies: The Firewall You're Probably Not Using
By default, every pod can talk to every other pod. This is terrifying in production. Network Policies fix this:
# Step 1: Default deny ALL traffic in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: production
spec:
  podSelector: {}   # {} = applies to ALL pods
  policyTypes:
  - Ingress
  - Egress
---
# Step 2: Allow only what's needed
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api            # This policy protects the API pods
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend   # ONLY frontend pods can reach API
    ports:
    - port: 8080
      protocol: TCP
⚠️ Important: Network Policies require a CNI that supports them. Calico and Cilium work. Flannel does not.
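The way those two policies compose is worth internalizing: once any policy selects a pod, traffic to it is denied unless some policy explicitly allows it. Here's a toy evaluator of that additive-allowlist logic (heavily simplified; real NetworkPolicy semantics also cover namespace selectors, IP blocks, and egress):

```python
def is_allowed(dst_labels, src_labels, port, policies):
    """Toy NetworkPolicy ingress check. If any policy selects the destination
    pod, traffic is denied unless some rule allows this source and port."""
    selecting = [p for p in policies
                 if all(dst_labels.get(k) == v for k, v in p["podSelector"].items())]
    if not selecting:
        return True   # no policy selects the pod → default allow
    for p in selecting:
        for rule in p.get("ingress", []):
            src_ok = all(src_labels.get(k) == v for k, v in rule["from"].items())
            if src_ok and port in rule["ports"]:
                return True   # at least one policy allows it
    return False              # selected but never allowed → denied

policies = [
    {"podSelector": {}, "ingress": []},   # deny-all: selects everything, allows nothing
    {"podSelector": {"app": "api"},       # allow frontend → api on 8080
     "ingress": [{"from": {"app": "frontend"}, "ports": [8080]}]},
]

print(is_allowed({"app": "api"}, {"app": "frontend"}, 8080, policies))  # True
print(is_allowed({"app": "api"}, {"app": "billing"}, 8080, policies))   # False
```

Note the asymmetry with RBAC: with no policies at all, everything is allowed; the deny-all policy is what flips the namespace to default-deny.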
🔒 Security: The Chapter You Can't Skip
RBAC (Role-Based Access Control) is the gatekeeper of Kubernetes. Every API request goes through:
Request → Authentication → Authorization (RBAC) → Admission Control → etcd
            "Who are you?"   "Can you do this?"    "Should you do this?"
Here's RBAC in practice:
# Step 1: Define what actions are allowed
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: development
rules:
- apiGroups: [""]                 # "" = core API group (pods, services, etc.)
  resources: ["pods"]
  verbs: ["get", "list", "watch"] # Read-only access
---
# Step 2: Bind the role to a user/group/service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-binding
  namespace: development
subjects:
- kind: User
  name: jane@company.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
The golden rule: Start with zero permissions and add what's needed. Never use cluster-admin for workloads.
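The matching logic behind a Role is simple enough to sketch. This toy checker ignores wildcards, `resourceNames`, and ClusterRoles, but it shows the deny-by-default shape: a request succeeds only if some rule matches all three fields.

```python
def allowed(role_rules, api_group, resource, verb):
    """Toy RBAC check: a request is allowed only if some rule matches
    the API group, the resource, AND the verb. Otherwise: denied."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in role_rules
    )

# The pod-reader Role from the manifest above
pod_reader = [{"apiGroups": [""], "resources": ["pods"],
               "verbs": ["get", "list", "watch"]}]

print(allowed(pod_reader, "", "pods", "list"))    # True  — read access granted
print(allowed(pod_reader, "", "pods", "delete"))  # False — RBAC is deny-by-default
```

Nothing in RBAC ever subtracts permissions; there are no deny rules. That's why least-privilege works: whatever you don't grant simply stays denied.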
Pod Security: Defense in Depth
# A hardened container — this passes the "restricted" Pod Security Standard
securityContext:
  runAsNonRoot: true             # Don't run as root
  runAsUser: 1000                # Specific non-root user
  readOnlyRootFilesystem: true   # Prevent writing to container filesystem
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]                # Drop ALL Linux capabilities
  seccompProfile:
    type: RuntimeDefault         # Use default seccomp profile
📊 Observability: You Can't Fix What You Can't See
Production Kubernetes needs three pillars:
┌─────────────────────────────────────────────────────────┐
│ THE THREE PILLARS │
│ │
│ 📈 METRICS 📝 LOGS 🔗 TRACES │
│ "What's happening" "What happened" "Why is it slow"│
│ │
│ Prometheus Loki / EFK Jaeger / │
│ + Grafana Fluentd OpenTelemetry │
│ │
│ CPU, memory, Application Request flow │
│ request rates, errors, across │
│ error ratios, audit trails, microservices │
│ latency P99 debug output with timing │
└─────────────────────────────────────────────────────────┘
The most important Kubernetes metrics to alert on:
| Metric | Alert Threshold | Why |
|---|---|---|
| `kube_pod_container_status_restarts_total` | > 5 in 1h | CrashLoopBackOff — something is broken |
| `node_memory_MemAvailable_bytes` | < 10% | Node is about to OOM-kill pods |
| `kube_deployment_status_replicas_unavailable` | > 0 for 5m | Deployment health issue |
| `kubelet_volume_stats_available_bytes` | < 15% | PVC running out of disk space |
| `etcd_server_has_leader` | == 0 | Critical — cluster may be headless |
⚡ Autoscaling: The Three Dimensions
Kubernetes can scale in three ways, and they work best together:
┌─────────────────┐
│ Cluster │
│ Autoscaler │
│ (more NODES) │
└────────┬────────┘
│
┌──────────────┼──────────────┐
▼ │ ▼
┌─────────────┐ │ ┌──────────────┐
│ HPA │ │ │ VPA │
│ (more PODS) │ │ │ (bigger PODS)│
└─────────────┘ │ └──────────────┘
│
┌────────┴────────┐
│ KEDA │
│ Event-driven │ ← Kafka lag,
│ (custom │ SQS depth,
│ triggers) │ cron schedules
└─────────────────┘
# HPA: Scale based on CPU + custom metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:                         # to prevent flapping
      - type: Percent
        value: 25                       # Remove max 25% of pods per period
        periodSeconds: 120
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70          # Target 70% CPU usage
Pro tip: Never use HPA and VPA on the same metric (e.g., both on CPU). They'll fight each other. Use HPA for CPU/memory and VPA for right-sizing requests on separate workloads.
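Under the hood, the HPA uses one formula: desired replicas = ceil(current replicas × current metric / target metric), clamped to the min/max bounds. A sketch with the numbers from the HPA above:

```python
import math

def hpa_desired_replicas(current, current_util, target_util,
                         min_replicas, max_replicas):
    """The HPA core formula: desired = ceil(current * currentMetric / targetMetric),
    clamped to [minReplicas, maxReplicas]."""
    desired = math.ceil(current * current_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

# Target 70% CPU, bounds 3–20 (as in the manifest above)
print(hpa_desired_replicas(5, 90, 70, 3, 20))   # 7  — scale up under load
print(hpa_desired_replicas(5, 30, 70, 3, 20))   # 3  — scale down, floored at min
```

This is also why `stabilizationWindowSeconds` matters: the formula reacts instantly to metric spikes, and the window is what smooths out flapping.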
🚀 GitOps: The Deployment Model That Changed Everything
GitOps is the idea that your Git repository is the single source of truth for your entire infrastructure. No more kubectl apply from laptops. No more "who changed that in production?"
Traditional CI/CD (push model):
1. CI builds image
2. CI pushes to registry
3. CI runs kubectl apply from the pipeline

GitOps (pull model):
1. CI builds image
2. CI pushes to registry
3. CI updates the Git manifest (via PR + review)
4. ArgoCD detects the change and syncs it to the cluster
5. ArgoCD continuously detects drift & self-heals
With ArgoCD, deploying to Kubernetes looks like this:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/k8s-manifests.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # Delete resources removed from Git
      selfHeal: true   # Revert manual changes to match Git
    syncOptions:
    - CreateNamespace=true
Every change goes through a Pull Request. Every PR gets reviewed. Every merge is an audit log entry. If something breaks, git revert is your rollback.
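The `prune` and `selfHeal` flags are easiest to understand as one more reconciliation loop, this time with Git as the desired state. A toy version (invented function and resource names that illustrate the behavior, not ArgoCD's actual code):

```python
def sync(git_manifests, live, prune, self_heal):
    """Toy GitOps sync: Git is desired state, `live` is the observed cluster."""
    result = dict(live)
    for name, manifest in git_manifests.items():
        if name not in result or (self_heal and result[name] != manifest):
            result[name] = manifest   # create missing / revert drifted resources
    if prune:
        for name in list(result):
            if name not in git_manifests:
                del result[name]      # delete resources removed from Git
    return result

git = {"deploy/web": {"image": "web:v2"}}
live = {"deploy/web": {"image": "web:v1-hotfix"},   # someone kubectl-edited prod
        "deploy/old": {"image": "old:v9"}}          # removed from Git last week

print(sync(git, live, prune=True, self_heal=True))
# {'deploy/web': {'image': 'web:v2'}}
```

The hotfix gets reverted and the orphaned deployment gets pruned, which is exactly why "who changed that in production?" stops being a question: the cluster can only look like Git.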
🔥 Service Mesh: When Microservices Get Serious
Imagine 20 microservices. Each needs encrypted communication, retries, circuit breaking, and observability. Without a service mesh, you'd bake all that logic into every service in every language.
A service mesh moves that logic to the infrastructure:
WITHOUT Mesh:                        WITH Mesh (Istio):

┌───────────┐                        ┌──────────────────┐
│ Service A │                        │ Pod A            │
│ (has TLS  │  HTTP (unencrypted)    │ ┌──────┐┌──────┐ │   mTLS
│  code,    │ ──────────────────→    │ │ App  ││Envoy │ │  ════════→
│  retry    │                        │ │(just ││proxy │ │  automatic
│  logic)   │                        │ │ biz  ││      │ │  encryption,
└───────────┘                        │ │logic)││      │ │  retries,
                                     │ └──────┘└──────┘ │  metrics
                                     └──────────────────┘
Istio gives you traffic management superpowers:
# Canary deployment: Send 5% of traffic to v2
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 95
    - destination:
        host: reviews
        subset: v2
      weight: 5
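The 95/5 split is just weighted random routing applied per request. A quick simulation (a sketch, seeded for reproducibility):

```python
import random

def pick_subset(weights, rng):
    """Weighted routing, VirtualService-style: weights should sum to 100."""
    return rng.choices(list(weights), weights=list(weights.values()))[0]

rng = random.Random(1)
hits = {"v1": 0, "v2": 0}
for _ in range(10_000):
    hits[pick_subset({"v1": 95, "v2": 5}, rng)] += 1
print(hits)  # roughly 95% v1, 5% v2
```

Bump the v2 weight in steps (5 → 25 → 50 → 100) while watching error rates, and you have a canary rollout without touching a single Deployment.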
When to use a service mesh:
- ✅ 10+ microservices that need mTLS
- ✅ Complex traffic routing (canary, A/B, fault injection)
- ✅ Need L7 observability without code changes
- ❌ Monolith or < 5 services (overhead not justified)
- ❌ Performance-critical workloads with sub-millisecond latency requirements
💰 Cost Optimization: The Chapter That Pays for Itself
Most Kubernetes clusters waste 30-50% of their compute budget. Here's why and how to fix it:
Typical Cluster Resource Usage:
Requested:     ████████████████████████████░░░░░░░░░░░░  70%
Actually used: ██████████████░░░░░░░░░░░░░░░░░░░░░░░░░░  35%
                             ↑ 35% WASTE (you're paying for this)
The Quick Wins
1. Right-size your requests:
# See actual vs requested resources
kubectl top pods -n production --sort-by=cpu
# What you'll find:
# NAME CPU(cores) MEMORY(bytes)
# api-server 50m 120Mi ← requests: 500m/512Mi = 10× over-provisioned!
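Two numbers worth computing for your own cluster, using the figures from the chart and the `kubectl top` output above:

```python
def overprovision_ratio(requested_m, used_m):
    """How many times more CPU (in millicores) is requested than actually used."""
    return requested_m / used_m

def wasted_pct(requested_pct, used_pct):
    """Cluster-level: capacity reserved by requests but sitting idle."""
    return requested_pct - used_pct

print(overprovision_ratio(500, 50))  # 10.0 — the api-server pod above
print(wasted_pct(70, 35))            # 35 — the chart above: paid for, never used
```

The ratio tells you which workloads to right-size first; the cluster-level gap tells you roughly how many nodes you could drop once you do.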
2. Use Spot/Preemptible instances for fault-tolerant workloads:
# Node affinity for spot instances
# (the exact label key varies by cloud/provisioner — e.g. EKS managed
# node groups use eks.amazonaws.com/capacityType)
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/lifecycle
          operator: In
          values: ["spot"]
3. Set LimitRanges so no developer accidentally requests 32Gi of memory:
apiVersion: v1
kind: LimitRange
metadata:
  name: sensible-defaults
  namespace: development
spec:
  limits:
  - type: Container
    default:          # Applied if no limits specified
      cpu: 500m
      memory: 256Mi
    defaultRequest:   # Applied if no requests specified
      cpu: 100m
      memory: 128Mi
    max:              # Hard ceiling
      cpu: "2"
      memory: 2Gi
4. Schedule idle workloads to zero with KEDA:
- Dev environments that scale to 0 pods outside business hours
- Queue processors that scale to 0 when no messages exist
- Staging clusters that shut down overnight
Real savings: Teams implementing these four practices typically see 25-40% cost reduction within the first month.
🔧 Troubleshooting: The Skill That Separates Juniors from Seniors
When something breaks — and it will — follow this framework:
OBSERVE → DESCRIBE → LOGS → EVENTS → EXEC → NETWORK → RESOURCES → CONTROL PLANE
Here's the debugging cheat sheet I keep open at all times:
# Pod stuck in Pending?
kubectl describe pod <name> -n <ns>
# Look at Events → "Insufficient cpu/memory" = need bigger nodes or lower requests
# Look at Events → "no nodes match" = check nodeSelector/affinity/taints
# CrashLoopBackOff?
kubectl logs <name> -n <ns> --previous # Logs from the CRASHED container
# Common causes: missing env vars, wrong command, config file not found
# ImagePullBackOff?
kubectl describe pod <name> -n <ns>
# Check: image name typo? Private registry auth? Image tag exists?
# Service not working?
kubectl get endpoints <service-name> -n <ns>
# Empty endpoints? → Labels don't match between Service selector and Pod labels
# DNS not resolving?
kubectl exec -it debug-pod -- nslookup my-service.my-namespace.svc.cluster.local
# If this fails → check CoreDNS pods: kubectl get pods -n kube-system -l k8s-app=kube-dns
# OOMKilled?
kubectl describe pod <name> -n <ns> | grep -A5 "Last State"
# Solution: Increase memory limits or fix the memory leak in your app
The Most Common Mistake Table
| Symptom | Rookie Move | What to Actually Do |
|---|---|---|
| Pod CrashLooping | Increase `restartPolicy` | Read the logs with the `--previous` flag |
| Service has no endpoints | Delete and recreate | Check that selector labels match pod labels exactly |
| Node NotReady | Panic and drain | Check kubelet logs: `journalctl -u kubelet -f` |
| PVC stuck in Pending | Delete and retry | Check if StorageClass exists and has a provisioner |
| Deployment rollout stuck | Force rollout restart | Check pod events with `kubectl describe` first |
📋 The Production Readiness Checklist (Condensed)
Before going to production, verify every category:
Cluster Architecture
- [ ] 3+ control plane nodes (HA)
- [ ] Nodes spread across availability zones
- [ ] etcd backup automated (hourly) and tested (quarterly)
- [ ] CNI deployed (Calico or Cilium, not Flannel for production)
Security
- [ ] RBAC enabled with least-privilege roles
- [ ] Pod Security Standards enforced (`restricted` profile)
- [ ] Network Policies deny-all + whitelist
- [ ] Secrets encrypted at rest (`--encryption-provider-config`)
- [ ] Container images scanned in CI pipeline
- [ ] No containers running as root
Reliability
- [ ] Resource requests AND limits set on every container
- [ ] Liveness, readiness, and startup probes configured
- [ ] PodDisruptionBudgets for critical workloads
- [ ] Pod topology spread constraints for HA
- [ ] Graceful shutdown handled (`preStop` hooks, `SIGTERM`)
Observability
- [ ] Prometheus + Grafana for metrics
- [ ] Centralized logging (Loki or EFK)
- [ ] Alerting rules with escalation paths
- [ ] SLOs defined for critical services
Operational
- [ ] GitOps workflow (ArgoCD or Flux)
- [ ] Cluster upgrade runbook documented and tested
- [ ] Disaster recovery plan with tested RTO/RPO
- [ ] Cost monitoring with ResourceQuotas per namespace
🛠️ What's in the Repository
Everything above barely scratches the surface. The full repository is organized into 13 sections, 53 modules, and 3 capstone projects:
| Section | Modules | What You'll Learn |
|---|---|---|
| Foundations | 01-05 | What K8s is, architecture, installation, kubectl, YAML |
| Core Concepts | 06-10 | Pods, Deployments, Services, ConfigMaps, Namespaces |
| Workloads | 11-15 | DaemonSets, StatefulSets, Jobs, Scheduling, Resources |
| Networking | 16-20 | CNI, Ingress, Network Policies, Gateway API, CoreDNS |
| Storage | 21-24 | Volumes, PV/PVC, CSI Drivers, Backup with Velero |
| Security | 25-29 | RBAC, Pod Security, Secrets Management, Supply Chain, Audit |
| Observability | 30-33 | Prometheus, Logging, Tracing, Alerting & SLOs |
| Advanced | 34-40 | Helm, Kustomize, Service Mesh, Autoscaling, Operators, GitOps, Multi-Cluster |
| Cluster Mgmt | 41-44 | kOps, Rancher, kubeadm, Managed K8s (EKS/AKS/GKE) |
| CI/CD | 45-47 | Pipelines, Container Best Practices, Progressive Delivery |
| Production | 48-50 | Production Checklist, Cost Optimization, Disaster Recovery |
| Troubleshooting | 51-53 | Debugging Guide, Cheatsheet, CKA/CKAD/CKS Exam Prep |
| Projects | P1-P3 | E-Commerce Microservices, Monitoring Stack, Multi-Tenant SaaS |
Every module includes:
- README with concepts explained from first principles
- ASCII diagrams showing architecture and data flow
- Annotated YAML files you can apply directly
- Troubleshooting tables for common issues
- Hands-on exercises to cement understanding
🎯 Four Learning Paths
Not everyone starts from the same place. Pick your path:
| Path | Duration | Modules | You'll Be Able To |
|---|---|---|---|
| Beginner | 2-3 weeks | 01-10 | Deploy apps, understand core K8s concepts |
| Intermediate | 3-4 weeks | 11-13, 16-17, 21-22, 25, 30-31, 34-35, 44 | Handle real workloads with monitoring & storage |
| Advanced | 4-6 weeks | 14-15, 18-20, 23, 26-27, 36-39, 41-42 | Build production clusters with service mesh & GitOps |
| Expert / CKA+CKS | 2-3 weeks | 28-29, 40, 43, 47-50, 51, 53 + Projects | Enterprise multi-cluster architecture, pass certifications |
🧪 Try It Right Now
# 1. Install Kind (takes 30 seconds)
go install sigs.k8s.io/kind@latest
# OR: brew install kind
# 2. Create a cluster
kind create cluster --name learn-k8s
# 3. Verify
kubectl cluster-info
kubectl get nodes
# 4. Deploy your first app
kubectl create deployment hello --image=nginx:1.25
kubectl expose deployment hello --port=80 --type=NodePort
kubectl get svc hello
# 5. You're running Kubernetes. Now go deeper. 🚀
The One Thing I'd Tell My Past Self
If I could go back and tell myself one thing before starting this Kubernetes journey, it would be:
Stop trying to learn Kubernetes by memorizing YAML.
Instead, understand the why behind every resource:
- A Deployment exists because you need rollbacks and scaling — a naked Pod gives you neither.
- A Service exists because Pod IPs are ephemeral — they change on every restart.
- A PersistentVolumeClaim exists because containers are ephemeral — their filesystem dies with them.
- RBAC exists because "everyone is admin" doesn't survive the first security audit.
- Network Policies exist because "all pods can talk to everything" is a lateral movement dream for attackers.
Every Kubernetes concept solves a specific problem. Learn the problem first, and the YAML writes itself.
If this guide helped you, the full repository has 130+ files of this depth. Star it, fork it, and start building.
🔗 GitHub Repository: Kubernetes Mastery — From Zero to Production Hero
What Kubernetes concept gave you the most trouble? Drop it in the comments — I'll explain it like you're five. 👇