Farooq Shabbir

Posted on Mar 11

I Built a DevOps Simulator to Practice Kubernetes Debugging

#devops #kubernetes #learning #opensource

Most DevOps tutorials have a problem.

They explain things like this:

“Here is what CrashLoopBackOff means.”
“Here is how to fix it.”

But real DevOps work doesn’t look like that.

Real incidents look like this:

kubectl get pods
kubectl logs api
kubectl describe pod api
kubectl get services

You investigate.
You read logs.
You try commands.
You guess.
You debug.

So I built a small DevOps Learning Simulator where you practice debugging Kubernetes incidents like you would in a real environment.

The Idea

Instead of reading solutions, you investigate problems interactively, using the same kubectl commands you would run against a real Kubernetes cluster.

You run commands such as:

kubectl get pods
kubectl logs <pod>
kubectl describe pod <pod>
kubectl get services
kubectl describe service <service>
kubectl get endpoints

Then you try to find the root cause of the incident.
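Because the simulator runs as a Python script, the core loop can be as simple as a table of canned outputs keyed by command. The sketch below is an assumption about how such a loop could work, not the repository's actual code. `Scenario`, `normalize`, `run_command`, and the sample outputs are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical scenario: canned outputs for the commands a learner may run."""
    name: str
    outputs: dict[str, str] = field(default_factory=dict)


def normalize(command: str) -> str:
    # Collapse repeated whitespace so "kubectl  get pods" matches "kubectl get pods".
    return " ".join(command.split())


def run_command(scenario: Scenario, command: str) -> str:
    # Return the canned output for a known command, or an error message otherwise.
    key = normalize(command)
    if key in scenario.outputs:
        return scenario.outputs[key]
    return f"error: command not supported in this scenario: {key}"


# Hypothetical scenario data; the log line is invented for illustration.
crash = Scenario(
    name="crashloop-missing-config",
    outputs={
        "kubectl get pods": "api-deployment-7d4f8b9c   0/1   CrashLoopBackOff   8   36m",
        "kubectl logs api-deployment": "hypothetical log line: config file not found",
    },
)

if __name__ == "__main__":
    print(run_command(crash, "kubectl  get pods"))
    print(run_command(crash, "kubectl get nodes"))
```

A dictionary lookup keeps each scenario pure data, so adding a new incident means writing new outputs rather than new logic.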

Example Incident

You start the simulator and run:

kubectl get pods --show-labels

Output:

NAME                        READY   STATUS             RESTARTS   AGE   LABELS
api-deployment-7d4f8b9c     0/1     CrashLoopBackOff   8          36m   -
nginx-deployment-5f6g7h8i   1/1     Running            0          5m    app=nginx
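The first triage step is spotting which pod is unhealthy: READY shows fewer containers ready than expected, or STATUS is something other than Running. The Python sketch below applies that rule to the sample output above. It is a naive, illustrative parser (`unhealthy_pods` is a hypothetical helper): it splits on whitespace, which breaks when a column contains spaces, such as the `8 (2m ago)` RESTARTS format printed by newer kubectl versions. Against a real cluster, `kubectl get pods -o json` is the sturdier input.

```python
# Sample output copied from the example above.
SAMPLE = """\
NAME                        READY   STATUS             RESTARTS   AGE   LABELS
api-deployment-7d4f8b9c     0/1     CrashLoopBackOff   8          36m   -
nginx-deployment-5f6g7h8i   1/1     Running            0          5m    app=nginx
"""


def unhealthy_pods(table: str) -> list[tuple[str, str, int]]:
    """Return (name, status, restarts) for pods that are not fully ready or not Running."""
    lines = table.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Naive split: assumes no column value contains spaces.
        fields = dict(zip(header, line.split()))
        ready, total = fields["READY"].split("/")
        if ready != total or fields["STATUS"] != "Running":
            rows.append((fields["NAME"], fields["STATUS"], int(fields["RESTARTS"])))
    return rows


if __name__ == "__main__":
    print(unhealthy_pods(SAMPLE))
```

Here only `api-deployment-7d4f8b9c` is flagged, and its restart count of 8 suggests the container starts and then fails repeatedly, so the logs are the natural next stop.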

Next, you check the logs:

kubectl logs api-deployment

Then describe the pod:

kubectl describe pod api-deployment

Eventually, you discover the issue:

A missing ConfigMap caused the container to crash.

The simulator then checks your answer and gives feedback.
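The feedback step could be as simple as a keyword check against the learner's diagnosis. The sketch below is hypothetical: `EXPECTED`, `check_answer`, and the keyword groups are illustrative assumptions, and the real simulator may grade answers differently.

```python
# Hypothetical grading rules: the answer must match at least one term from every group.
# Each group lists acceptable synonyms for one part of the root cause.
EXPECTED = {
    "crashloop-missing-config": [
        {"configmap", "config map", "configuration"},
        {"missing", "not found", "absent"},
    ],
}


def check_answer(scenario: str, answer: str) -> tuple[bool, list[str]]:
    """Return (correct, hints); each hint lists the terms of a group the answer missed."""
    text = answer.lower()
    missed = []
    for group in EXPECTED[scenario]:
        if not any(term in text for term in group):
            # Sort for stable, readable hint text.
            missed.append(" / ".join(sorted(group)))
    return (not missed, missed)


if __name__ == "__main__":
    print(check_answer("crashloop-missing-config", "The ConfigMap was missing"))
    print(check_answer("crashloop-missing-config", "Out of memory"))
```

Grouping synonyms lets a learner phrase the root cause in their own words while still requiring both parts of it: what broke and how.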

Current Scenarios

Version 1 includes three common production incidents:

• CrashLoopBackOff caused by missing configuration
• OOMKilled due to incorrect memory limits
• DNS / service selector mismatch
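The service selector mismatch comes down to how a Service picks its pods. Its `spec.selector` is a set of key/value pairs, and a pod is selected only if its labels contain every one of those pairs. When nothing matches, `kubectl get endpoints` shows no addresses for that Service, so traffic has nowhere to go. A minimal Python model of that matching rule follows. The pod names and labels are made up for illustration, and the model ignores pod readiness, which real Endpoints also take into account.

```python
def selects(selector: dict[str, str], labels: dict[str, str]) -> bool:
    # Every selector key/value pair must appear in the pod's labels; extra labels are fine.
    # A Service without a selector gets no automatically managed endpoints,
    # so an empty selector is treated as matching nothing.
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def endpoints(selector: dict[str, str], pods: dict[str, dict[str, str]]) -> list[str]:
    """Return the names of pods a Service with this selector would route to."""
    return [name for name, labels in pods.items() if selects(selector, labels)]


# Hypothetical pods and labels.
pods = {
    "nginx-deployment-5f6g7h8i": {"app": "nginx"},
    "web-6c9d": {"app": "web", "tier": "frontend"},
}

if __name__ == "__main__":
    # Hypothetical mismatch: the Service selects app=nginx-web, but the pod is labelled app=nginx.
    print(endpoints({"app": "nginx-web"}, pods))  # []
    print(endpoints({"app": "nginx"}, pods))      # ['nginx-deployment-5f6g7h8i']
```

This is why comparing `kubectl describe service <service>` (the Selector field) with `kubectl get pods --show-labels` usually exposes this class of bug quickly.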

These are problems engineers regularly see when working with Kubernetes.

Why I Built This

Many developers learn DevOps tools but never practice debugging real incidents.

They know commands but haven’t used them in a realistic investigation.

This project aims to address that by providing a safe environment for troubleshooting practice.

Think of it like a flight simulator for DevOps engineers.

Try It

You can try it here:

GitHub Repository

https://github.com/FarooqShabbir/devops_simulator

Run it locally:

git clone https://github.com/FarooqShabbir/devops_simulator.git
cd devops_simulator
python devops_simulator.py

Then start investigating incidents.

Future Plans

Some ideas for future versions:

• CI/CD pipeline failures
• Infrastructure drift debugging
• Network policy issues
• Multi-service production incidents

If you have ideas or want to contribute, feel free to open an issue or pull request.

Feedback

I would love to hear from other DevOps engineers:

What incidents would you add to a DevOps debugging simulator?

Top comments (4)

Marcin Parśniak • Mar 14

good job 😃

Incident Copilot • Mar 11

This is a strong teaching format. Most DevOps content teaches commands in isolation, but incidents are really about narrowing uncertainty under pressure, and your simulator gets much closer to that reality.

The scenario idea I would add next is a multi-signal case where logs point one way, but the real issue is a service selector or config mismatch introduced by a recent deploy. That kind of layered debugging is where people usually level up fast.

Farooq Shabbir • Mar 13

I will definitely try to add it in the next versions.

klement Gunndu • Mar 12

The CrashLoopBackOff from a missing ConfigMap is such a realistic scenario — that's probably the most common first production incident for anyone new to K8s. Having a safe sandbox to practice the investigation flow beats reading docs about it by a mile.