DEV Community

Mohammed Nasser

Kubernetes Troubleshooting 2025

Kubernetes Troubleshooting Guide for Application Developers

1. Inspecting Resources 🛠️

General Information 📋

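Get an overview of the most common resource types (pods, services, deployments, and so on) across all namespaces. Note that all does not cover every kind: ConfigMaps, Secrets, and Ingresses, for example, are left out: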

kubectl get all -A

Checking Deployment Details 🔍

Get Full YAML Configuration:

kubectl get -n uat deployments.apps uat-deployment -o yaml

Check Replica Count:

kubectl get -n uat deployments.apps uat-deployment -o yaml | grep replicas

Search for Specific Deployments:

kubectl get deployments --all-namespaces | grep frontend

View Labels:

kubectl get -n uat deployments.apps uat-deployment -o yaml | grep labels -A5

Get Replica Count with JSONPath:

kubectl get -n uat deployments.apps uat-deployment -o=jsonpath='{.spec.replicas}'

Check Containers:

kubectl get -n uat deployments.apps uat-deployment -o=jsonpath='{.spec.template.spec.containers}'

Get Pods on Specific Node:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node01
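
The JSONPath query for containers returns the raw JSON of the whole array. As a minimal sketch, a range expression can print one name/image pair per line instead. The helper name list_container_images is made up for illustration, and uat / uat-deployment are the placeholder names used in this guide.

```shell
# Hypothetical helper: print "<container-name><TAB><image>" for each container
# in a Deployment's pod template.
list_container_images() {
  local namespace="$1" deployment="$2"
  kubectl get deployment "$deployment" -n "$namespace" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}

# Usage (needs access to a cluster):
#   list_container_images uat uat-deployment
```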

2. Describing Nodes and Pods 🏗️

Get Node Details:

kubectl describe node node01

Describe a Specific Pod:

kubectl describe -n uat pod/uat-pod

3. Viewing Events 📅

Events provide crucial information about what's happening in your cluster:

kubectl events -n uat
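
In a busy namespace the event list can be noisy. One possible way to narrow it down, sketched here with a hypothetical helper named warning_events, is to keep only Warning events and sort them by time.

```shell
# Hypothetical helper: show only Warning events in a namespace,
# sorted by last occurrence (most recent at the bottom).
warning_events() {
  local namespace="$1"
  kubectl get events -n "$namespace" \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}

# Usage (needs access to a cluster):
#   warning_events uat
```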

4. Checking Logs 📜

Basic Log Commands

Get Logs for a Deployment:

kubectl logs -n uat deployments/uat-deployment

Logs for All Containers in a Deployment:

kubectl logs -n uat deployments/uat-deployment --all-containers

Save Logs to File:

kubectl logs -n app deployments/frontend >> logs.txt

Logs for a Specific Container:

kubectl logs -n uat deployments/uat-deployment -c uat-container01

Advanced Log Options

Logs Based on Label:

kubectl logs -n uat -l app=uat-app

Logs with Timestamps:

kubectl logs -n uat uat-pod --timestamps

Save Timestamped Logs to File:

kubectl logs -n app myapp --timestamps >> timestamps.txt

Time-Based Log Filtering:

kubectl logs nginx --since=10s    # Last 10 seconds
kubectl logs nginx --since=1h     # Last hour

Follow Logs in Real-Time:

kubectl logs nginx -f
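
The time and timestamp options combine well with ordinary text filters. The sketch below assumes the application writes the word "error" in its log lines, which will not hold for every app; recent_errors is a hypothetical helper name.

```shell
# Hypothetical helper: print timestamped log lines from the last hour
# that mention "error" (case-insensitive).
recent_errors() {
  local namespace="$1" pod="$2"
  # grep exits 1 when nothing matches; "|| true" treats that as "no errors found".
  kubectl logs -n "$namespace" "$pod" --since=1h --timestamps \
    | grep -i 'error' || true
}

# Usage (needs access to a cluster):
#   recent_errors uat uat-pod
```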

5. Executing Commands Inside Containers 🖥️

List Files in Container:

kubectl exec -n uat nginx -- ls

Read a File:

kubectl exec -n uat nginx -- cat /usr/share/nginx/html/index.html

Open Interactive Bash Shell:

kubectl exec -it -n uat nginx -- /bin/bash

6. Port Forwarding 🔀

Forward local port to service port for testing:

kubectl port-forward -n uat svc/uat-svc 8000:80

This forwards local port 8000 to service port 80.

7. Authentication and Authorization 🔑

Check Current User

kubectl auth whoami

Check Permissions

Check Your Own Permissions:

kubectl auth can-i list pods -n uat
kubectl auth can-i get pods -n uat
kubectl auth can-i update pods -n uat
kubectl auth can-i patch pods -n uat
kubectl auth can-i delete pods -n uat

Check Permissions as Another User:

kubectl auth can-i get pods --as=jane --v=10

Check Service Account Permissions:

kubectl auth can-i delete pods --as=system:serviceaccount:default:default
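
Checking several verbs one by one gets repetitive. A small loop, sketched here under the assumption of a bash-like shell, prints one answer per verb; check_verbs is a hypothetical helper name.

```shell
# Hypothetical helper: report whether the current user may perform
# each given verb on a resource in a namespace.
check_verbs() {
  local namespace="$1" resource="$2"
  shift 2
  local verb answer
  for verb in "$@"; do
    # `kubectl auth can-i` prints "yes" or "no"; its exit status is ignored here.
    answer=$(kubectl auth can-i "$verb" "$resource" -n "$namespace" 2>/dev/null) || true
    printf '%-8s %s\n' "$verb" "${answer:-unknown}"
  done
}

# Usage (needs access to a cluster):
#   check_verbs uat pods list get update patch delete
```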

8. Resource Utilization 📊

Node Resources

Get Node Details:

kubectl get nodes -o wide

View Node Resource Usage:

kubectl top nodes

Pod Resources

Get Pods in Namespace:

kubectl get pods -n uat

View Pod Resource Usage:

kubectl top pods -n uat
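
kubectl top relies on the Metrics API, so it only works when something like metrics-server is installed in the cluster. Assuming it is, the sketch below lists the pods using the most memory; top_memory_pods is a hypothetical helper name and the default of 5 rows is arbitrary.

```shell
# Hypothetical helper: list the N pods with the highest memory usage in a namespace.
top_memory_pods() {
  local namespace="$1" count="${2:-5}"
  kubectl top pods -n "$namespace" --sort-by=memory --no-headers | head -n "$count"
}

# Usage (needs access to a cluster with the Metrics API available):
#   top_memory_pods uat 3
```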

9. Explaining Kubernetes Objects 📖

The explain command provides documentation about Kubernetes resources:

Explain Pod Resource:

kubectl explain pods

Explain Pod Specifications:

kubectl explain pods.spec

Explain Security Settings (Recursive):

kubectl explain pods.spec.securityContext --recursive

10. Debugging 🛠️

Compare Configuration Changes

kubectl diff -f nginx.yaml

Debug a Running Pod

kubectl debug -it nginx-pod --image=busybox --target=nginx

Copy and Debug a Pod

kubectl debug nginx-pod --image=busybox -it --copy-to=debugging-pod --share-processes

11. Common Issues and Fixes 🚨

ImagePullBackOff Error ❗

Description: Pod cannot pull the container image from the registry.

Diagnosis:

  • Describe the pod and check the events section to find the reason

Possible Causes:

  1. Incorrect image name: Verify the image name in your deployment YAML
  2. 🔑 Missing imagePullSecrets: Results in 401 authentication error
  3. 🏷️ Incorrect image tag: Check if the specified tag exists
  4. 🌐 Cluster cannot resolve registry hostname: Check DNS and network connectivity

Fix:

kubectl describe pod <pod-name> -n <namespace>
# Check Events section for detailed error
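
When the describe output is long, the image reference and the waiting reason can be pulled out directly from the pod status. This is a sketch only; waiting_reasons is a hypothetical helper name.

```shell
# Hypothetical helper: for each container in a pod, print its name, image,
# and the reason it is waiting (for example ImagePullBackOff or ErrImagePull).
# The reason column is empty for containers that are not in a waiting state.
waiting_reasons() {
  local namespace="$1" pod="$2"
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.image}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}

# Usage (needs access to a cluster):
#   waiting_reasons <namespace> <pod-name>
```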

CrashLoopBackOff Error 🔄

Description: Container keeps crashing and Kubernetes restarts it repeatedly.

Key Indicators:

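  • The pod's restartPolicy is Always (the default) or OnFailure, so the kubelet restarts the container after every crash with an increasing back-off delay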

Exit Code Analysis:

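  • Exit Code 1: Application error (check application logs)
  • Exit Code 137: Container was killed with SIGKILL (128 + 9), typically an OOM kill or a liveness probe failure where the container did not stop within its grace period
  • Exit Code 127: Command not found (the container's command or entrypoint refers to a file that does not exist in the image)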
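
The exit codes listed above follow the usual shell convention that values above 128 mean "terminated by signal (code - 128)". A rough sketch of that arithmetic, plus one way to read the code of the previous container run, is shown here. Both helper names are hypothetical, and the messages are simplified interpretations rather than an exhaustive mapping.

```shell
# Hypothetical helper: give a rough interpretation of a container exit code.
explain_exit_code() {
  local code="$1"
  if [ "$code" -eq 0 ]; then
    echo "exit 0: container exited normally"
  elif [ "$code" -eq 127 ]; then
    echo "exit 127: command not found in the image"
  elif [ "$code" -gt 128 ]; then
    # Codes above 128 mean the process was terminated by signal (code - 128).
    echo "exit $code: killed by signal $((code - 128))"
  else
    echo "exit $code: set by the application, check its logs"
  fi
}

# Hypothetical helper: read the exit code of the first container's previous run.
last_exit_code() {
  local namespace="$1" pod="$2"
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Usage (the second line needs access to a cluster):
#   explain_exit_code 137
#   explain_exit_code "$(last_exit_code <namespace> <pod-name>)"
```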

Other Causes:

  • 📂 Volume mount issues: Check if volumes are properly mounted

Fix:

kubectl logs <pod-name> -n <namespace> --previous
kubectl describe pod <pod-name> -n <namespace>

Pending Pods ⏳

Description: Pods are stuck in the Pending state and not being scheduled.

Common Causes:

  1. Insufficient resources on nodes: Not enough CPU/memory available
  2. 🔍 Node selector mismatch: Pod's nodeSelector doesn't match any node labels
  3. 🚫 Taints and tolerations: Nodes are tainted and pod lacks required tolerations

Fix:

kubectl describe pod <pod-name> -n <namespace>
# Check Events section for scheduling failures

# Add label to node if needed
kubectl label nodes <node-name> <label-key>=<label-value>

# Check node capacity
kubectl describe nodes
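
Two quick views often help with scheduling problems: which pods are Pending, and which taints each node carries. The following is a sketch; the helper names pending_pods and node_taints are made up for illustration.

```shell
# Hypothetical helper: list every Pending pod in the cluster.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}

# Hypothetical helper: show the taint keys set on each node.
node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Usage (needs access to a cluster):
#   pending_pods
#   node_taints
```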

Missing Pods ❓

Description: Expected pods are not running.

Possible Causes:

  1. 🚧 Pod quota exceeded: Namespace has reached its resource quota
  2. 🔑 Service account missing in deployment: Required service account doesn't exist

Fix:

# Check events for quota issues
kubectl get events -n uat

# Create missing service account
kubectl create sa service-account-uat -n uat
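
When pods are never created, the failure is usually recorded against the ReplicaSet rather than a pod, as a FailedCreate event. A sketch that narrows the events down and shows current quota usage follows; why_no_pods is a hypothetical helper name.

```shell
# Hypothetical helper: show pod-creation failures and quota usage in a namespace.
why_no_pods() {
  local namespace="$1"
  # Failed pod creation (quota exceeded, missing service account, ...) is
  # reported with reason=FailedCreate.
  kubectl get events -n "$namespace" --field-selector reason=FailedCreate
  # Compare "Used" against "Hard" for each quota in the namespace.
  kubectl describe resourcequota -n "$namespace"
}

# Usage (needs access to a cluster):
#   why_no_pods uat
```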

Schrodinger's Deployment 🐱

Description: A selector matches pods from more than one Deployment, so a Service (or a Deployment) ends up handling pods it was never meant to manage.

Problem: Using common selectors like version=1 across multiple deployments.

Fix:

# Check affected pods
kubectl get pods -l version=1

# Verify endpoints
kubectl get endpoints

# Use unique selectors for each deployment
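
To see whether a Service is picking up pods from the wrong Deployment, it can help to put its selector, the pod labels, and the resulting endpoints side by side. This is a sketch with a hypothetical helper name, inspect_service_selector.

```shell
# Hypothetical helper: print a Service's selector, the labels of all pods in
# the namespace, and the endpoints the Service currently resolves to.
inspect_service_selector() {
  local namespace="$1" service="$2"
  kubectl get service "$service" -n "$namespace" -o jsonpath='{.spec.selector}{"\n"}'
  kubectl get pods -n "$namespace" --show-labels
  kubectl get endpoints "$service" -n "$namespace"
}

# Usage (needs access to a cluster):
#   inspect_service_selector uat uat-svc
```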

CreateContainerError / CreateContainerConfigError ⚙️

CreateContainerConfigError:

  • 🔍 Missing Secret
  • 🔍 Missing ConfigMap
  • 🔍 Missing environment variable

CreateContainerError:

  • ❌ Missing entrypoint or command
  • ❌ Invalid container configuration

Fix:

kubectl describe pod <pod-name> -n <namespace>
# Check Events section for specific error

# Verify ConfigMap exists
kubectl get configmap -n <namespace>

# Verify Secret exists
kubectl get secret -n <namespace>

Config Out of Date 🔄

Description: ConfigMap or Secret changes not reflected in running pods.

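Cause: Values injected as environment variables are read once when the container starts and are never refreshed. Volume-mounted ConfigMaps and Secrets are eventually updated on disk (unless mounted with subPath), but many applications only read them at startup.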

Fix:

# Option 1: Rollout restart
kubectl rollout restart deployment/<deployment-name> -n <namespace>

# Option 2: Use reloader controller
# Install and configure reloader to automatically restart pods on config changes
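
A rollout restart returns immediately, so it is worth waiting for the new pods to become ready before assuming the new configuration is live. A minimal sketch follows; restart_and_wait is a hypothetical helper name and the 120s timeout is an arbitrary choice.

```shell
# Hypothetical helper: restart a Deployment and block until the rollout
# finishes or the timeout expires.
restart_and_wait() {
  local namespace="$1" deployment="$2"
  kubectl rollout restart "deployment/$deployment" -n "$namespace" &&
    kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout=120s
}

# Usage (needs access to a cluster):
#   restart_and_wait <namespace> <deployment-name>
```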

Endless Terminating State ♾️

Description: Pod stuck in Terminating state.

Possible Causes:

  • Finalizer preventing deletion
  • Node where pod was running is unavailable

Fix:

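# Check for finalizers first; a force delete does not clear them
kubectl get pod <pod-name> -n <namespace> -o yaml | grep finalizers -A5

# Last resort: force delete the pod without waiting for the kubelet to confirm shutdown
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0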
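
If a finalizer is what keeps the pod around, it can be inspected and, as a last resort, cleared with a merge patch. Treat this as a sketch: removing a finalizer skips whatever cleanup its controller intended to do, and both helper names are hypothetical.

```shell
# Hypothetical helper: print the finalizers currently set on a pod.
show_finalizers() {
  local namespace="$1" pod="$2"
  kubectl get pod "$pod" -n "$namespace" -o jsonpath='{.metadata.finalizers}{"\n"}'
}

# Hypothetical helper: remove all finalizers so the pending deletion can complete.
# Use with care: the owning controller's cleanup will not run.
clear_finalizers() {
  local namespace="$1" pod="$2"
  kubectl patch pod "$pod" -n "$namespace" --type=merge \
    -p '{"metadata":{"finalizers":null}}'
}

# Usage (needs access to a cluster):
#   show_finalizers <namespace> <pod-name>
#   clear_finalizers <namespace> <pod-name>
```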

Field Immutability 🔒

Description: Cannot update certain fields after resource creation.

Problem: Some spec fields, such as a Deployment's spec.selector.matchLabels, are immutable and cannot be changed after creation.

Fix:

# Delete and re-create the deployment (this causes downtime while the pods are replaced)
kubectl delete deployment <deployment-name> -n <namespace>
kubectl apply -f <deployment-file>.yaml

EnableServiceLinks Issue 🔄

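Description: Pods receive a large number of injected environment variables, one set per Service.

Problem: By default, Kubernetes injects environment variables (such as <SERVICE_NAME>_SERVICE_HOST) for every Service in the pod's namespace that exists when the pod starts.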

Fix:

spec:
  template:
    spec:
      enableServiceLinks: false
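
To get a feel for how many Service variables a pod actually carries, the injected environment can be counted from inside the container. The sketch assumes the image ships an env binary; count_service_env is a hypothetical helper name.

```shell
# Hypothetical helper: count the *_SERVICE_HOST variables visible in a pod.
count_service_env() {
  local namespace="$1" pod="$2"
  # grep -c prints the count; it exits 1 when the count is 0, hence "|| true".
  kubectl exec -n "$namespace" "$pod" -- env | grep -c '_SERVICE_HOST=' || true
}

# Usage (needs access to a cluster):
#   count_service_env uat nginx
```

Even with enableServiceLinks set to false, expect the count to stay at 1 or more, because the variables for the kubernetes API Service are still injected.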

Network Policy Issues 🌐

Description: Pods cannot communicate due to network policies.

Diagnosis:

# Check network policies
kubectl get netpol -n uat

# Describe network policy
kubectl describe netpol <policy-name> -n uat

Verify:

  • Ingress rules (incoming traffic)
  • Egress rules (outgoing traffic)
  • Pod selectors
  • Namespace selectors
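
After reviewing the rules listed above, a direct connectivity test from the source pod usually settles whether traffic is blocked. The sketch below assumes the container image includes wget (as busybox-based images do); probe_from_pod is a hypothetical helper name and the 3-second timeout is arbitrary.

```shell
# Hypothetical helper: try an HTTP request from inside a pod and report the result.
probe_from_pod() {
  local namespace="$1" pod="$2" url="$3"
  if kubectl exec -n "$namespace" "$pod" -- wget -q -O /dev/null -T 3 "$url"; then
    echo "reachable: $url"
  else
    echo "blocked or unreachable: $url"
  fi
}

# Usage (needs access to a cluster):
#   probe_from_pod uat nginx http://uat-svc:80
```

A failure here does not prove a network policy is the cause; DNS problems or a target that is down produce the same result.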

Multi-Attach Volume Error 💾

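Description: A volume with the ReadWriteOnce access mode cannot be attached to pods on different nodes at the same time. This typically shows up during a rolling update, when the new pod is scheduled on another node while the old pod still holds the volume.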

Quick Fix:

# Scale down to 0
kubectl scale deployment/<deployment-name> --replicas=0 -n <namespace>

# Scale back to 1
kubectl scale deployment/<deployment-name> --replicas=1 -n <namespace>

Recommended Fix: Use Recreate strategy in deployment:

spec:
  strategy:
    type: Recreate

Persistent Volume Access Modes:

  • RWO (ReadWriteOnce): Volume can be mounted as read-write by a single node
  • RWX (ReadWriteMany): Volume can be mounted as read-write by multiple nodes
  • ROX (ReadOnlyMany): Volume can be mounted as read-only by multiple nodes

Check PV Access Mode:

kubectl get pv
kubectl describe pv <pv-name>
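
Application developers often only have access to the claims in their own namespace rather than to cluster-wide PVs. Under that assumption, the requested access modes can be read from the PVCs instead; pvc_access_modes is a hypothetical helper name.

```shell
# Hypothetical helper: print each PVC in a namespace with its access modes
# and the name of the volume it is bound to.
pvc_access_modes() {
  local namespace="$1"
  kubectl get pvc -n "$namespace" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.accessModes}{"\t"}{.spec.volumeName}{"\n"}{end}'
}

# Usage (needs access to a cluster):
#   pvc_access_modes <namespace>
```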

Quick Reference Cheat Sheet

Most Used Commands

# Get resources
kubectl get pods -n <namespace>
kubectl get all -A

# Describe resources
kubectl describe pod <pod-name> -n <namespace>
kubectl describe node <node-name>

# View logs
kubectl logs <pod-name> -n <namespace>
kubectl logs -f <pod-name> -n <namespace>

# Execute commands
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash

# Events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Resource usage
kubectl top nodes
kubectl top pods -n <namespace>

Debugging Workflow

  1. Check pod status: kubectl get pods
  2. Describe pod: kubectl describe pod <pod-name>
  3. Check events: kubectl get events
  4. View logs: kubectl logs <pod-name>
  5. Check resource usage: kubectl top pod <pod-name>
  6. Exec into container: kubectl exec -it <pod-name> -- /bin/bash
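
The non-interactive steps of the workflow above can be strung together into one pass. This is only a sketch: triage_pod is a hypothetical helper name, the 50-line log tail is arbitrary, and the final step needs the Metrics API.

```shell
# Hypothetical helper: run the read-only triage steps for a single pod.
# Each step runs even if an earlier one fails.
triage_pod() {
  local namespace="$1" pod="$2"
  echo "== status =="
  kubectl get pod "$pod" -n "$namespace" -o wide
  echo "== describe =="
  kubectl describe pod "$pod" -n "$namespace"
  echo "== events =="
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.name=$pod" --sort-by=.lastTimestamp
  echo "== logs =="
  kubectl logs "$pod" -n "$namespace" --tail=50
  echo "== usage =="
  kubectl top pod "$pod" -n "$namespace"
}

# Usage (needs access to a cluster):
#   triage_pod uat uat-pod > triage.txt
```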

Document Version: 1.0

Last Updated: October 2025
