Most DevOps tutorials have a problem.
They explain things like this:
“Here is what CrashLoopBackOff means.”
“Here is how to fix it.”
But real DevOps work doesn’t look like that.
Real incidents look like this:
kubectl get pods
kubectl logs api
kubectl describe pod api
kubectl get services
You investigate.
You read logs.
You try commands.
You guess.
You debug.
So I built a small DevOps Learning Simulator where you practice debugging Kubernetes incidents the way you would in a real environment.
The Idea
Instead of reading solutions, you investigate problems interactively using real kubectl commands.
You run commands such as:
kubectl get pods
kubectl logs <pod>
kubectl describe pod <pod>
kubectl get services
kubectl describe service <service>
kubectl get endpoints
Then you try to find the root cause of the incident.
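To make the idea concrete, here is a minimal sketch of how a simulator like this could map typed commands to canned output. It is not taken from the repository: `SCENARIO_OUTPUT`, `run_command`, and the sample log line are hypothetical names and placeholder data used only for illustration.

```python
# Hypothetical sketch: canned responses keyed by the command string.
# These names and the log text are assumptions, not the simulator's real code.
SCENARIO_OUTPUT = {
    "kubectl get pods": (
        "NAME                      READY   STATUS             RESTARTS   AGE\n"
        "api-deployment-7d4f8b9c   0/1     CrashLoopBackOff   8          36m"
    ),
    # Placeholder log line, invented for this example.
    "kubectl logs api-deployment": "(placeholder) config file not found",
}


def run_command(command: str) -> str:
    """Return the canned output for a command, or a hint if it is unknown."""
    normalized = " ".join(command.split())  # collapse repeated whitespace
    return SCENARIO_OUTPUT.get(
        normalized, f"command not recognized in this scenario: {normalized}"
    )


print(run_command("kubectl   get pods"))
```

A lookup table keeps each scenario self-contained, so adding an incident would only mean adding another dictionary of commands and outputs.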
Example Incident
You start the simulator and run:
kubectl get pods --show-labels
Output:
NAME                        READY   STATUS             RESTARTS   AGE   LABELS
api-deployment-7d4f8b9c     0/1     CrashLoopBackOff   8          36m   -
nginx-deployment-5f6g7h8i   1/1     Running            0          5m    app=nginx
You investigate logs:
kubectl logs api-deployment
Then describe the pod:
kubectl describe pod api-deployment
Eventually, you discover the issue:
A missing ConfigMap caused the container to crash.
The simulator then checks your answer and gives feedback.
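One simple way such an answer check could work is keyword matching. The sketch below is an assumption about the approach, not the simulator's actual logic; `check_answer` and the keyword set are hypothetical.

```python
from __future__ import annotations

import re


def check_answer(answer: str, required_keywords: set[str]) -> tuple[bool, str]:
    """Accept the answer only if every required keyword appears in it."""
    # Lowercase and split on anything that is not a letter or digit.
    words = set(re.findall(r"[a-z0-9]+", answer.lower()))
    missing = required_keywords - words
    if not missing:
        return True, "Correct: you identified the root cause."
    return False, "Not quite. Your answer does not mention: " + ", ".join(sorted(missing))


# Hypothetical keywords for the missing-ConfigMap incident.
ok, feedback = check_answer(
    "A missing ConfigMap caused the container to crash.",
    {"missing", "configmap"},
)
print(ok, feedback)
```

Keyword matching is forgiving about phrasing but can be fooled by answers that mention the right words for the wrong reason, so a real checker might need something stricter.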
Current Scenarios
Version 1 includes three common production incidents:
• CrashLoopBackOff caused by missing configuration
• OOMKilled due to incorrect memory limits
• DNS / service selector mismatch
These are problems engineers regularly see when working with Kubernetes.
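The selector mismatch case is easy to reason about in code. A Service routes traffic only to pods whose labels contain every key/value pair in its selector, so one missing or misspelled label leaves the Service with no endpoints. The sketch below models that rule with plain dictionaries; the pod names are from the example output above, while `pods_selected` and the `app=api` selector are hypothetical.

```python
from __future__ import annotations

# Pod labels as shown by `kubectl get pods --show-labels` in the example:
# the api pod has no labels, the nginx pod has app=nginx.
PODS = {
    "api-deployment-7d4f8b9c": {},
    "nginx-deployment-5f6g7h8i": {"app": "nginx"},
}


def pods_selected(selector: dict[str, str], pods: dict[str, dict[str, str]]) -> list[str]:
    """Return the names of pods whose labels satisfy every selector pair."""
    if not selector:
        # A Service without a selector does not select any pods automatically.
        return []
    return sorted(
        name
        for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


print(pods_selected({"app": "nginx"}, PODS))  # ['nginx-deployment-5f6g7h8i']
print(pods_selected({"app": "api"}, PODS))    # [] (hypothetical selector, no match)
```

An empty result here corresponds to an empty list from `kubectl get endpoints`, which is usually the first clue in this kind of incident.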
Why I Built This
Many developers learn DevOps tools but never practice debugging real incidents.
They know commands but haven’t used them in a realistic investigation.
This project aims to address that by providing a safe environment for troubleshooting practice.
Think of it like a flight simulator for DevOps engineers.
Try It
You can try it here:
GitHub Repository
https://github.com/FarooqShabbir/devops_simulator
Run it locally:
git clone https://github.com/FarooqShabbir/devops_simulator.git
cd devops_simulator
python devops_simulator.py
Then start investigating incidents.
Future Plans
Some ideas for future versions:
• CI/CD pipeline failures
• Infrastructure drift debugging
• Network policy issues
• Multi-service production incidents
If you have ideas or want to contribute, feel free to open an issue or pull request.
Feedback
I would love to hear from other DevOps engineers:
What incidents would you add to a DevOps debugging simulator?