DEV Community

Mohammed Nasser

Kubernetes Troubleshooting 2025

Kubernetes Troubleshooting Guide for Application Developers

1. Inspecting Resources 🛠️

General Information 📋

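Get an overview of the most common resource types (pods, services, deployments, replica sets, and similar) across all namespaces. Note that "all" does not cover every kind: ConfigMaps, Secrets, and Ingresses, for example, are not included.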

kubectl get all -A

Checking Deployment Details 🔍

Get Full YAML Configuration:

kubectl get -n uat deployments.apps uat-deployment -o yaml

Check Replica Count:

kubectl get -n uat deployments.apps uat-deployment -o yaml | grep replicas

Search for Specific Deployments:

kubectl get deployments --all-namespaces | grep frontend

View Labels:

kubectl get -n uat deployments.apps uat-deployment -o yaml | grep labels -A5

Get Replica Count with JSONPath:

kubectl get -n uat deployments.apps uat-deployment -o=jsonpath='{.spec.replicas}'

Check Containers:

kubectl get -n uat deployments.apps uat-deployment -o=jsonpath='{.spec.template.spec.containers}'
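
The raw container list comes back as one dense JSON array. As a rough sketch, a JSONPath range expression can print only each container's name and image, one per line. The helper name list_containers is made up for illustration; the namespace and deployment are this guide's example values.

```shell
# Hypothetical helper: print "name<TAB>image" for every container in a deployment.
# Usage: list_containers <namespace> <deployment>
list_containers() {
  kubectl get deployment "$2" -n "$1" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}
```

For example: list_containers uat uat-deployment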

Get Pods on Specific Node:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node01

2. Describing Nodes and Pods 🏗️

Get Node Details:

kubectl describe node node01

Describe a Specific Pod:

kubectl describe -n uat pod/uat-pod

3. Viewing Events 📅

Events provide crucial information about what's happening in your cluster:

kubectl events -n uat
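
In a busy namespace the event list is mostly Normal entries. One possible way to narrow it down, sketched here with a hypothetical helper name (warning_events), is to filter on the event type and sort by timestamp so the most recent warning ends up at the bottom.

```shell
# Hypothetical helper: list only Warning events in a namespace, oldest first.
# Usage: warning_events <namespace>
warning_events() {
  kubectl get events -n "$1" --field-selector type=Warning --sort-by=.lastTimestamp
}
```

For example: warning_events uat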

4. Checking Logs 📜

Basic Log Commands

Get Logs for a Deployment (kubectl picks one of the deployment's pods):

kubectl logs -n uat deployments/uat-deployment

Logs for All Containers in a Deployment:

kubectl logs -n uat deployments/uat-deployment --all-containers

Save Logs to File:

kubectl logs -n app deployments/frontend >> logs.txt

Logs for a Specific Container:

kubectl logs -n uat deployments/uat-deployment -c uat-container01

Advanced Log Options

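Logs Based on Label (with a selector, only the last 10 lines per pod are shown by default; add --tail=-1 for the full log):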

kubectl logs -n uat -l app=uat-app

Logs with Timestamps:

kubectl logs -n uat uat-pod --timestamps

Save Timestamped Logs to File:

kubectl logs -n app myapp --timestamps >> timestamps.txt

Time-Based Log Filtering:

kubectl logs nginx --since=10s    # Last 10 seconds
kubectl logs nginx --since=1h     # Last hour

Follow Logs in Real-Time:

kubectl logs nginx -f
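
The log options above can be combined. The sketch below assumes you want recent output from every pod behind a label, with each line prefixed by its pod and container name so the sources can be told apart. The helper name recent_logs and the one-hour and 100-line limits are arbitrary choices, not values from this guide.

```shell
# Hypothetical helper: recent logs from every pod matching a label selector.
# --all-containers includes sidecars, --prefix tags each line with pod/container,
# --since and --tail bound how much history is returned per container.
# Usage: recent_logs <namespace> <label-selector>
recent_logs() {
  kubectl logs -n "$1" -l "$2" \
    --all-containers \
    --prefix \
    --since=1h \
    --tail=100
}
```

For example: recent_logs uat app=uat-app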

5. Executing Commands Inside Containers 🖥️

List Files in Container:

kubectl exec -n uat nginx -- ls

Read a File:

kubectl exec -n uat nginx -- cat /usr/share/nginx/html/index.html

Open Interactive Bash Shell:

kubectl exec -it -n uat nginx -- /bin/bash

6. Port Forwarding 🔀

Forward a local port to a service port for testing:

kubectl port-forward -n uat svc/uat-svc 8000:80

This forwards local port 8000 to service port 80.
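
Because port-forward blocks the terminal, a quick check usually needs a second shell. A minimal sketch of doing both in one step follows. It assumes curl is installed locally and that the service speaks HTTP; the helper name probe_service and the fixed two-second wait are illustrative simplifications.

```shell
# Hypothetical helper: forward a local port to a Service, send one HTTP request,
# print the HTTP status code, then stop the port-forward.
# Usage: probe_service <namespace> <service> <local-port> <service-port>
probe_service() {
  kubectl port-forward -n "$1" "svc/$2" "$3:$4" >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2   # crude wait for the tunnel; a retry loop would be more robust
  curl -sS -o /dev/null -w '%{http_code}\n' "http://localhost:$3/"
  status=$?
  kill "$pf_pid" 2>/dev/null || true
  return "$status"
}
```

For example: probe_service uat uat-svc 8000 80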

7. Authentication and Authorization 🔑

Check Current User

kubectl auth whoami

Check Permissions

Check Your Own Permissions:

kubectl auth can-i list pods -n uat
kubectl auth can-i get pods -n uat
kubectl auth can-i update pods -n uat
kubectl auth can-i patch pods -n uat
kubectl auth can-i delete pods -n uat

Check Permissions as Another User:

kubectl auth can-i get pods --as=jane --v=10

Check Service Account Permissions:

kubectl auth can-i delete pods --as=system:serviceaccount:default:default
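
Checking verbs one at a time gets repetitive. The loop below is one possible way to test several verbs against a resource in a single call; check_verbs is a hypothetical name, and the output formatting is just a convenience.

```shell
# Hypothetical helper: print yes/no for several verbs on one resource.
# Usage: check_verbs <namespace> <resource> <verb>...
check_verbs() {
  ns=$1
  resource=$2
  shift 2
  for verb in "$@"; do
    # can-i prints "yes" or "no" and exits non-zero on "no", hence "|| true"
    answer=$(kubectl auth can-i "$verb" "$resource" -n "$ns" 2>/dev/null || true)
    printf '%-8s %s\n' "$verb" "$answer"
  done
}
```

For example: check_verbs uat pods list get update patch delete. To see everything the current user may do in a namespace, kubectl auth can-i --list -n uat prints the full set of allowed resources and verbs.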

8. Resource Utilization 📊

Node Resources

Get Node Details:

kubectl get nodes -o wide

View Node Resource Usage (kubectl top requires the Metrics Server to be installed in the cluster):

kubectl top nodes

Pod Resources

Get Pods in Namespace:

kubectl get pods -n uat

View Pod Resource Usage:

kubectl top pods -n uat

9. Explaining Kubernetes Objects 📖

The explain command provides documentation about Kubernetes resources:

Explain Pod Resource:

kubectl explain pods

Explain Pod Specifications:

kubectl explain pods.spec

Explain Security Settings (Recursive):

kubectl explain pods.spec.securityContext --recursive

10. Debugging 🛠️

Compare Configuration Changes

kubectl diff -f nginx.yaml

Debug a Running Pod

kubectl debug -it nginx-pod --image=busybox --target=nginx

Copy and Debug a Pod

kubectl debug nginx-pod --image=busybox -it --copy-to=debugging-pod --share-processes

11. Common Issues and Fixes 🚨

ImagePullBackOff Error ❗

Description: Pod cannot pull the container image from the registry.

Diagnosis:

  • Describe the pod and check the events section to find the reason

Possible Causes:

  1. ❌ Incorrect image name: Verify the image name in your deployment YAML
  2. 🔑 Missing imagePullSecrets: Results in 401 authentication error
  3. 🏷️ Incorrect image tag: Check if the specified tag exists
  4. 🌐 Cluster cannot resolve registry hostname: Check DNS and network connectivity

Fix:

kubectl describe pod <pod-name> -n <namespace>
# Check Events section for detailed error
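
Most of the causes listed above come down to two fields in the pod spec: the image reference and the pull secrets. As a sketch, both can be printed directly so typos in the image name or tag, or an empty secrets list, stand out. The helper name pod_images is hypothetical.

```shell
# Hypothetical helper: show the images and imagePullSecrets a pod is configured with.
# An empty "secrets:" line means no imagePullSecrets are set on the pod.
# Usage: pod_images <namespace> <pod>
pod_images() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{"images:  "}{.spec.containers[*].image}{"\n"}{"secrets: "}{.spec.imagePullSecrets[*].name}{"\n"}'
}
```

For example: pod_images uat uat-pod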

CrashLoopBackOff Error πŸ”„

Description: Container keeps crashing and Kubernetes restarts it repeatedly.

Key Indicators:

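  • restartPolicy in the pod YAML is Always (the default, and the only value a Deployment allows) or OnFailure, so the kubelet restarts the container after each crash with an increasing back-off delay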

Exit Code Analysis:

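  • Exit Code 1: Application error (check application logs)
  • Exit Code 137: Container was killed with SIGKILL (128 + 9), typically an OOM kill, or a kill after a failed liveness probe when the container does not stop within its grace period
  • Exit Code 127: Command not found, for example the entrypoint or command does not exist in the image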

Other Causes:

  • 📂 Volume mount issues: Check if volumes are properly mounted

Fix:

kubectl logs <pod-name> -n <namespace> --previous
kubectl describe pod <pod-name> -n <namespace>
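
The exit code of the previous run can also be read straight from the pod status instead of scanning the describe output. The sketch below pairs that lookup with a small decoder based on the common convention that codes above 128 mean "killed by signal (code minus 128)". Both helper names are hypothetical, and the lookup only inspects the pod's first container.

```shell
# Hypothetical helper: read the exit code of the previous run of a pod's first container.
# Usage: last_exit_code <namespace> <pod>
last_exit_code() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Hypothetical helper: turn an exit code into a short hint.
# Usage: explain_exit_code <code>
explain_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo "exit $code: terminated by signal $((code - 128))"
  elif [ "$code" -eq 127 ]; then
    echo "exit $code: command not found"
  elif [ "$code" -eq 0 ]; then
    echo "exit $code: success"
  else
    echo "exit $code: non-zero exit, check the application logs"
  fi
}
```

For example: explain_exit_code "$(last_exit_code uat uat-pod)"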

Pending Pods ⏳

Description: Pods are stuck in the Pending state and not being scheduled.

Common Causes:

  1. ⚡ Insufficient resources on nodes: Not enough CPU/memory available
  2. 🔍 Node selector mismatch: Pod's nodeSelector doesn't match any node labels
  3. 🚫 Taints and tolerations: Nodes are tainted and pod lacks required tolerations

Fix:

kubectl describe pod <pod-name> -n <namespace>
# Check Events section for scheduling failures

# Add label to node if needed
kubectl label nodes <node-name> <label-key>=<label-value>

# Check node capacity
kubectl describe nodes
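
The output of describe nodes is long. As a rough alternative, the two helpers below (hypothetical names) list every Pending pod in the cluster and summarize each node's taint keys and allocatable CPU and memory in one table, which covers the taint and capacity causes listed above.

```shell
# Hypothetical helper: list all pods currently in the Pending phase.
pending_pods() {
  kubectl get pods -A --field-selector=status.phase=Pending -o wide
}

# Hypothetical helper: one row per node with taint keys and allocatable resources.
node_scheduling_summary() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory'
}
```

Allocatable is the node's schedulable capacity, not what is currently free, so it still needs to be compared with the requests of the pods already running there.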

Missing Pods ❓

Description: Expected pods are not running.

Possible Causes:

  1. 🚧 Pod quota exceeded: Namespace has reached its resource quota
  2. 🔑 Missing service account: The serviceAccountName referenced by the deployment doesn't exist in the namespace

Fix:

# Check events for quota issues
kubectl get events -n uat

# Create missing service account
kubectl create sa service-account-uat -n uat

Schrodinger's Deployment 🐱

Description: Multiple deployments use the same label selector, so their pods become indistinguishable to anything that selects on those labels, which causes pod management issues.

Problem: A generic selector such as version=1 is shared across multiple deployments.

Fix:

# Check affected pods
kubectl get pods -l version=1

# Verify endpoints
kubectl get endpoints

# Use unique selectors for each deployment
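
To spot the overlap, it can help to put the selectors side by side and to see which ReplicaSet owns each pod. A possible sketch, with hypothetical helper names:

```shell
# Hypothetical helper: show every deployment's selector in a namespace.
# Usage: deployment_selectors <namespace>
deployment_selectors() {
  kubectl get deployments -n "$1" \
    -o custom-columns='NAME:.metadata.name,SELECTOR:.spec.selector.matchLabels'
}

# Hypothetical helper: show pods matching a label together with their owner.
# Usage: pod_owners <namespace> <label-selector>
pod_owners() {
  kubectl get pods -n "$1" -l "$2" \
    -o custom-columns='POD:.metadata.name,OWNER:.metadata.ownerReferences[0].name'
}
```

For example: deployment_selectors uat, then pod_owners uat version=1. Two deployments printing the same selector is the pattern to look for.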

CreateContainerError / CreateContainerConfigError ⚙️

CreateContainerConfigError:

  • 🔍 Missing Secret
  • 🔍 Missing ConfigMap
  • 🔍 Missing key in a Secret or ConfigMap that an environment variable references

CreateContainerError:

  • ❌ Missing entrypoint or command
  • ❌ Invalid container configuration

Fix:

kubectl describe pod <pod-name> -n <namespace>
# Check Events section for specific error

# Verify ConfigMap exists
kubectl get configmap -n <namespace>

# Verify Secret exists
kubectl get secret -n <namespace>

Config Out of Date πŸ”„

Description: ConfigMap or Secret changes not reflected in running pods.

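Cause: Values injected as environment variables are read only when the container starts. Mounted ConfigMap and Secret volumes are refreshed by the kubelet after a delay (but not when mounted with subPath), and the application still has to re-read the files.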

Fix:

# Option 1: Rollout restart
kubectl rollout restart deployment/<deployment-name> -n <namespace>

# Option 2: Use reloader controller
# Install and configure reloader to automatically restart pods on config changes
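
A rollout restart returns immediately, before the new pods are ready. A minimal sketch that also waits for the rollout to finish is shown below; the helper name restart_and_wait and the 120-second timeout are assumptions, not values from this guide.

```shell
# Hypothetical helper: restart a deployment and block until the rollout completes
# (or the timeout expires, in which case rollout status exits non-zero).
# Usage: restart_and_wait <namespace> <deployment>
restart_and_wait() {
  kubectl rollout restart "deployment/$2" -n "$1" &&
    kubectl rollout status "deployment/$2" -n "$1" --timeout=120s
}
```

For example: restart_and_wait uat uat-deployment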

Endless Terminating State ♾️

Description: Pod stuck in Terminating state.

Possible Causes:

  • Finalizer preventing deletion
  • Node where pod was running is unavailable

Fix:

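# Force delete the pod (last resort: removes the API object without waiting for confirmation that the containers have stopped)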
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0

# Check for finalizers
kubectl get pod <pod-name> -n <namespace> -o yaml | grep finalizers -A5
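
If the finalizer check shows an entry that will never be cleared, for example because the controller that owns it is gone, the finalizers can be removed with a merge patch. This is a sketch with a hypothetical helper name. It skips whatever cleanup the finalizer was meant to guarantee, so treat it as a deliberate override rather than a routine step.

```shell
# Hypothetical helper: clear all finalizers on a pod so deletion can proceed.
# Usage: clear_pod_finalizers <namespace> <pod>
clear_pod_finalizers() {
  kubectl patch pod "$2" -n "$1" --type=merge -p '{"metadata":{"finalizers":null}}'
}
```

For example: clear_pod_finalizers uat uat-pod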

Field Immutability πŸ”’

Description: Cannot update certain fields after resource creation.

Problem: Some fields are immutable once the resource exists, for example a Deployment's spec.selector (matchLabels).

Fix:

# Delete and re-create the deployment (its pods are removed, so expect downtime)
kubectl delete deployment <deployment-name> -n <namespace>
kubectl apply -f <deployment-file>.yaml

EnableServiceLinks Issue πŸ”„

Description: Too many environment variables created for services.

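Problem: By default, Kubernetes injects environment variables (such as <SERVICE>_SERVICE_HOST and <SERVICE>_SERVICE_PORT) for every Service in the pod's namespace that exists when the pod starts.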

Fix:

spec:
  template:
    spec:
      enableServiceLinks: false

Network Policy Issues 🌐

Description: Pods cannot communicate due to network policies.

Diagnosis:

# Check network policies
kubectl get netpol -n uat

# Describe network policy
kubectl describe netpol <policy-name> -n uat

Verify:

  • Ingress rules (incoming traffic)
  • Egress rules (outgoing traffic)
  • Pod selectors
  • Namespace selectors
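
Reading the policy only tells you what should happen. One way to test what actually happens is to start a throwaway pod that carries the same labels as the affected workload and try the connection from there. The sketch below assumes the busybox image can be pulled in your cluster and that the target speaks HTTP; the helper name netcheck and the pod name are hypothetical.

```shell
# Hypothetical helper: run a temporary busybox pod with the given labels and
# fetch a URL from inside the namespace. The pod is deleted when the command ends.
# Usage: netcheck <namespace> <labels> <url>
netcheck() {
  kubectl run netcheck --rm -i --restart=Never --image=busybox \
    -n "$1" --labels="$2" -- wget -qO- -T 3 "$3"
}
```

For example: netcheck uat app=uat-app http://uat-svc. A timeout here while the same request works from an unlabelled pod suggests a policy selecting on those labels.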

Multi-Attach Volume Error πŸ’Ύ

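Description: A ReadWriteOnce volume can be attached to only one node at a time, so a pod scheduled onto a different node (for example during a rolling update) fails with a Multi-Attach error while the old pod still holds the volume.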

Quick Fix:

# Scale down to 0
kubectl scale deployment/<deployment-name> --replicas=0 -n <namespace>

# Scale back to 1
kubectl scale deployment/<deployment-name> --replicas=1 -n <namespace>

Recommended Fix: Use Recreate strategy in deployment:

spec:
  strategy:
    type: Recreate

Persistent Volume Access Modes:

  • ✅ RWO (ReadWriteOnce): Volume can be mounted as read-write by a single node
  • ✅ RWX (ReadWriteMany): Volume can be mounted as read-write by multiple nodes
  • ✅ ROX (ReadOnlyMany): Volume can be mounted as read-only by multiple nodes

Check PV Access Mode:

kubectl get pv
kubectl describe pv <pv-name>
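
Pods reference claims rather than volumes, so it is often quicker to look at the access modes requested by the PersistentVolumeClaims in the namespace. A small sketch, with a hypothetical helper name:

```shell
# Hypothetical helper: list each PVC with its requested access modes and bound volume.
# Usage: pvc_access_modes <namespace>
pvc_access_modes() {
  kubectl get pvc -n "$1" \
    -o custom-columns='NAME:.metadata.name,MODES:.spec.accessModes[*],VOLUME:.spec.volumeName'
}
```

For example: pvc_access_modes uat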

Quick Reference Cheat Sheet

Most Used Commands

# Get resources
kubectl get pods -n <namespace>
kubectl get all -A

# Describe resources
kubectl describe pod <pod-name> -n <namespace>
kubectl describe node <node-name>

# View logs
kubectl logs <pod-name> -n <namespace>
kubectl logs -f <pod-name> -n <namespace>

# Execute commands
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash

# Events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Resource usage
kubectl top nodes
kubectl top pods -n <namespace>

Debugging Workflow

  1. Check pod status: kubectl get pods
  2. Describe pod: kubectl describe pod <pod-name>
  3. Check events: kubectl get events
  4. View logs: kubectl logs <pod-name>
  5. Check resource usage: kubectl top pod <pod-name>
  6. Exec into container: kubectl exec -it <pod-name> -- /bin/bash
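
Steps 1 to 5 of the workflow above can be sketched as a single helper that collects everything for one pod. The name triage_pod, the section headings, and the 50-line log limit are illustrative choices. Step 6 is left out because an interactive shell does not fit a report.

```shell
# Hypothetical helper: gather status, description, events, logs and usage for one pod.
# Usage: triage_pod <namespace> <pod>
triage_pod() {
  ns=$1
  pod=$2
  echo "== status =="
  kubectl get pod "$pod" -n "$ns" -o wide
  echo "== describe =="
  kubectl describe pod "$pod" -n "$ns"
  echo "== events =="
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod" --sort-by=.lastTimestamp
  echo "== logs (current) =="
  kubectl logs "$pod" -n "$ns" --all-containers --tail=50
  echo "== logs (previous) =="
  # Fails harmlessly when the containers have never restarted
  kubectl logs "$pod" -n "$ns" --all-containers --tail=50 --previous || true
  echo "== usage =="
  # Needs the Metrics Server; ignore the error if it is not installed
  kubectl top pod "$pod" -n "$ns" || true
}
```

For example: triage_pod uat uat-pod > triage.txt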

Document Version: 1.0

Last Updated: October 2025
