Mohammed Nasser

Kubernetes Troubleshooting 2025

Kubernetes Troubleshooting Guide for Application Developers

1. Inspecting Resources 🛠️

General Information 📋

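Get an overview of the most common resources (pods, services, deployments, and similar) across all namespaces. Note that "all" does not include ConfigMaps, Secrets, or Ingresses: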

kubectl get all -A

Checking Deployment Details 🔍

Get Full YAML Configuration:

kubectl get -n uat deployments.apps uat-deployment -o yaml

Check Replica Count:

kubectl get -n uat deployments.apps uat-deployment -o yaml | grep replicas

Search for Specific Deployments:

kubectl get deployments --all-namespaces | grep frontend

View Labels:

kubectl get -n uat deployments.apps uat-deployment -o yaml | grep labels -A5

Get Replica Count with JSONPath:

kubectl get -n uat deployments.apps uat-deployment -o=jsonpath='{.spec.replicas}'
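
The same JSONPath approach can compare the desired replica count with the number of replicas that are actually ready. The sketch below is only an illustration: the function name replica_status is made up for this example, and it assumes kubectl is already pointed at the right cluster.

```shell
# Print ready vs. desired replicas for one deployment.
# Usage: replica_status <namespace> <deployment>
# (replica_status is a hypothetical helper name, not a kubectl command.)
replica_status() {
  local ns="$1" deploy="$2" out desired ready
  # One API call; the two values come back separated by a space.
  out="$(kubectl get -n "$ns" deployments.apps "$deploy" \
    -o jsonpath='{.spec.replicas} {.status.readyReplicas}')" || return 1
  desired="${out%% *}"
  ready="${out#* }"
  # .status.readyReplicas is omitted while no replica is ready, so default to 0.
  echo "$deploy: ${ready:-0}/${desired} replicas ready"
}
```

Calling replica_status uat uat-deployment prints a single line in the form name: ready/desired, which makes a stalled rollout easy to spot.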

Check Containers:

kubectl get -n uat deployments.apps uat-deployment -o=jsonpath='{.spec.template.spec.containers}'
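
Dumping the whole containers array can be hard to read. As a possible refinement, a JSONPath range expression prints just the name and image of each container. The helper name container_images is an assumption made for this sketch.

```shell
# List "name<TAB>image" for every container in a deployment's pod template.
# Usage: container_images <namespace> <deployment>
# (container_images is a hypothetical helper name.)
container_images() {
  local ns="$1" deploy="$2"
  kubectl get -n "$ns" deployments.apps "$deploy" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}
```

This is handy for confirming which image tag a deployment is really configured to run.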

Get Pods on Specific Node:

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node01

2. Describing Nodes and Pods 🏗️

Get Node Details:

kubectl describe node node01

Describe a Specific Pod:

kubectl describe -n uat pod/uat-pod

3. Viewing Events 📅

Events provide crucial information about what's happening in your cluster:

kubectl events -n uat
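
In a busy namespace the event list gets long. One way to narrow it down, sketched here with the older kubectl get events form, is to filter on the event type and sort by time. The function name warning_events is hypothetical.

```shell
# Show only Warning events in a namespace, oldest first.
# Usage: warning_events <namespace>
# (warning_events is a hypothetical helper name.)
warning_events() {
  local ns="$1"
  kubectl get events -n "$ns" \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}
```

Warning events usually carry the scheduling, image pull, and probe failures that matter most during troubleshooting.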

4. Checking Logs 📜

Basic Log Commands

Get Logs for a Deployment (kubectl picks one of its pods):

kubectl logs -n uat deployments/uat-deployment

Logs for All Containers in a Deployment:

kubectl logs -n uat deployments/uat-deployment --all-containers

Append Logs to a File:

kubectl logs -n app deployments/frontend >> logs.txt

Logs for a Specific Container:

kubectl logs -n uat deployments/uat-deployment -c uat-container01

Advanced Log Options

Logs Based on Label (only the last 10 lines per pod unless --tail is set):

kubectl logs -n uat -l app=uat-app
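
When a selector matches several pods, it helps to know which pod produced each line and to ask for more than the default tail. A minimal sketch, assuming a helper named logs_by_label (a made-up name) and a default of 100 lines chosen only for illustration:

```shell
# Fetch recent logs from every pod matching a label selector.
# Usage: logs_by_label <namespace> <selector> [lines]
# (logs_by_label is a hypothetical helper name; 100 is an arbitrary default.)
logs_by_label() {
  local ns="$1" selector="$2" lines="${3:-100}"
  # --prefix tags each line with its pod and container name.
  kubectl logs -n "$ns" -l "$selector" \
    --all-containers \
    --prefix \
    --tail="$lines"
}
```

For example, logs_by_label uat app=uat-app 200 would return the last 200 lines from each matching container.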

Logs with Timestamps:

kubectl logs -n uat uat-pod --timestamps

Save Timestamped Logs to File:

kubectl logs -n app myapp --timestamps >> timestamps.txt

Time-Based Log Filtering:

kubectl logs nginx --since=10s    # Last 10 seconds
kubectl logs nginx --since=1h     # Last hour

Follow Logs in Real-Time:

kubectl logs nginx -f

5. Executing Commands Inside Containers 🖥️

List Files in Container:

kubectl exec -n uat nginx -- ls

Read a File:

kubectl exec -n uat nginx -- cat /usr/share/nginx/html/index.html

Open Interactive Bash Shell:

kubectl exec -it -n uat nginx -- /bin/bash

6. Port Forwarding 🔀

Forward a local port to a Service port for testing:

kubectl port-forward -n uat svc/uat-svc 8000:80

This forwards local port 8000 to service port 80.

7. Authentication and Authorization 🔑

Check Current User

kubectl auth whoami

Check Permissions

Check Your Own Permissions:

kubectl auth can-i list pods -n uat
kubectl auth can-i get pods -n uat
kubectl auth can-i update pods -n uat
kubectl auth can-i patch pods -n uat
kubectl auth can-i delete pods -n uat
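
Running the same check once per verb is repetitive, so the commands above could be wrapped in a small loop. This is a sketch rather than a standard tool: the name check_verbs and the fixed list of verbs are choices made for this example.

```shell
# Report yes/no for a fixed set of verbs on one resource type.
# Usage: check_verbs <namespace> <resource>
# (check_verbs is a hypothetical helper name.)
check_verbs() {
  local ns="$1" resource="$2" verb answer
  for verb in list get update patch delete; do
    # can-i prints yes/no and exits non-zero on "no", so the status is ignored.
    answer="$(kubectl auth can-i "$verb" "$resource" -n "$ns" 2>/dev/null)" || true
    printf '%-7s %s\n' "$verb" "${answer:-unknown}"
  done
}
```

For a full picture without naming verbs, kubectl auth can-i --list -n uat prints everything the current user is allowed to do in the namespace.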

Check Permissions as Another User:

kubectl auth can-i get pods --as=jane --v=10

Check Service Account Permissions:

kubectl auth can-i delete pods --as=system:serviceaccount:default:default

8. Resource Utilization 📊

Node Resources

Get Node Details:

kubectl get nodes -o wide

View Node Resource Usage (requires the Metrics Server):

kubectl top nodes

Pod Resources

Get Pods in Namespace:

kubectl get pods -n uat

View Pod Resource Usage (also requires the Metrics Server):

kubectl top pods -n uat
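
To find the heaviest consumers quickly, the pod usage output can be sorted and trimmed. The sketch below assumes a helper called top_memory_pods (a made-up name) and defaults to five rows purely as an example.

```shell
# Show the N pods using the most memory in a namespace.
# Usage: top_memory_pods <namespace> [count]
# (top_memory_pods is a hypothetical helper name; 5 is an arbitrary default.)
top_memory_pods() {
  local ns="$1" count="${2:-5}"
  # --sort-by accepts cpu or memory; --no-headers keeps only data rows.
  kubectl top pods -n "$ns" --sort-by=memory --no-headers | head -n "$count"
}
```

Swapping memory for cpu in --sort-by gives the equivalent CPU ranking.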

9. Explaining Kubernetes Objects 📖

The explain command provides documentation about Kubernetes resources:

Explain Pod Resource:

kubectl explain pods

Explain Pod Specifications:

kubectl explain pods.spec

Explain Security Settings (Recursive):

kubectl explain pods.spec.securityContext --recursive

10. Debugging 🛠️

Compare Configuration Changes

kubectl diff -f nginx.yaml

Debug a Running Pod

kubectl debug -it nginx-pod --image=busybox --target=nginx

Copy and Debug a Pod

kubectl debug nginx-pod --image=busybox -it --copy-to=debugging-pod --share-processes

11. Common Issues and Fixes 🚨

ImagePullBackOff Error ❗

Description: Pod cannot pull the container image from the registry.

Diagnosis:

  • Describe the pod and check the events section to find the reason

Possible Causes:

  1. ❌ Incorrect image name: Verify the image name in your deployment YAML
  2. 🔑 Missing imagePullSecrets: Typically shows up as a 401 Unauthorized error
  3. 🏷️ Incorrect image tag: Check that the specified tag exists
  4. 🌐 Cluster cannot resolve registry hostname: Check DNS and network connectivity

Fix:

kubectl describe pod <pod-name> -n <namespace>
# Check Events section for detailed error

CrashLoopBackOff Error 🔄

Description: Container keeps crashing and Kubernetes restarts it repeatedly.

Key Indicators:

  • restartPolicy in pod YAML is set to Always

Exit Code Analysis:

  • Exit Code 1: Application error (check application logs)
  • Exit Code 137: Container received SIGKILL (128 + 9), possibly from an OOM kill or a failed liveness probe
  • Exit Code 127: Command not found, for example an entrypoint or command that does not exist in the image

Other Causes:

  • 📂 Volume mount issues: Check if volumes are properly mounted

Fix:

kubectl logs <pod-name> -n <namespace> --previous
kubectl describe pod <pod-name> -n <namespace>
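
The exit codes listed in this section can also be read straight from the pod status instead of scanning the describe output. The sketch below is one possible way to do that. Both function names are invented for this example, and the explanations for codes 126 and 143 and for other codes above 128 follow the usual shell convention (128 plus the signal number) rather than anything specific to this guide.

```shell
# Translate a container exit code into a likely cause.
# Usage: explain_exit_code <code>
# (explain_exit_code and last_exit_codes are hypothetical helper names.)
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "0: exited cleanly" ;;
    1)   echo "1: application error, check the application logs" ;;
    126) echo "126: command found but not executable" ;;
    127) echo "127: command or file not found" ;;
    137) echo "137: killed by SIGKILL (OOM kill or failed liveness probe)" ;;
    143) echo "143: terminated by SIGTERM" ;;
    *)
      # Codes above 128 conventionally mean "killed by signal (code - 128)".
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "$code: killed by signal $((code - 128))"
      else
        echo "$code: application-specific exit code"
      fi ;;
  esac
}

# Print "container<TAB>exit code" for the previous run of each container.
# Usage: last_exit_codes <namespace> <pod>
last_exit_codes() {
  local ns="$1" pod="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

A typical use would be to run last_exit_codes for the crashing pod and pass the number it reports to explain_exit_code.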

Pending Pods ⏳

Description: Pods are stuck in the Pending state and not being scheduled.

Common Causes:

  1. ⚡ Insufficient resources on nodes: Not enough CPU/memory available
  2. 🔍 Node selector mismatch: Pod's nodeSelector doesn't match any node labels
  3. 🚫 Taints and tolerations: Nodes are tainted and pod lacks required tolerations

Fix:

kubectl describe pod <pod-name> -n <namespace>
# Check Events section for scheduling failures

# Add label to node if needed
kubectl label nodes <node-name> <label-key>=<label-value>

# Check node capacity
kubectl describe nodes
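
Before describing pods one by one, it can help to list every Pending pod and pull out the scheduler's own failure events. A minimal sketch using field selectors follows. The function names pending_pods and scheduling_failures are hypothetical.

```shell
# List Pending pods across all namespaces.
# (pending_pods and scheduling_failures are hypothetical helper names.)
pending_pods() {
  kubectl get pods --all-namespaces --field-selector status.phase=Pending
}

# Show the scheduler's FailedScheduling events for one namespace.
# Usage: scheduling_failures <namespace>
scheduling_failures() {
  local ns="$1"
  kubectl get events -n "$ns" --field-selector reason=FailedScheduling
}
```

The message column of a FailedScheduling event normally states which constraint could not be satisfied, such as insufficient memory or an untolerated taint.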

Missing Pods ❓

Description: Expected pods are not running.

Possible Causes:

  1. 🚧 Pod quota exceeded: Namespace has reached its resource quota
  2. 🔑 Service account missing in deployment: Required service account doesn't exist

Fix:

# Check events for quota issues
kubectl get events -n uat

# Create missing service account
kubectl create sa service-account-uat -n uat

Schrödinger's Deployment 🐱

Description: Several deployments use the same label selector, so Services and controllers match pods from more than one deployment.

Problem: Generic selectors such as version=1 are reused across multiple deployments.

Fix:

# Check affected pods
kubectl get pods -l version=1

# Verify endpoints
kubectl get endpoints

# Use unique selectors for each deployment
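
To see where selectors overlap, it may help to put the selectors of deployments and Services next to the labels the pods actually carry. The helper below is a sketch and its name, selector_overview, is made up for this example.

```shell
# Show selectors and pod labels for one namespace to spot overlaps.
# Usage: selector_overview <namespace>
# (selector_overview is a hypothetical helper name.)
selector_overview() {
  local ns="$1"
  # -o wide adds a SELECTOR column for deployments and services.
  kubectl get deployments -n "$ns" -o wide
  kubectl get services -n "$ns" -o wide
  # Labels actually carried by the pods.
  kubectl get pods -n "$ns" --show-labels
}
```

If two deployments show the same SELECTOR value, or a Service selector matches pods from both, that is the overlap to remove.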

CreateContainerError / CreateContainerConfigError ⚙️

CreateContainerConfigError:

  • 🔍 Missing Secret
  • 🔍 Missing ConfigMap
  • 🔍 Missing environment variable

CreateContainerError:

  • ❌ Missing entrypoint or command
  • ❌ Invalid container configuration

Fix:

kubectl describe pod <pod-name> -n <namespace>
# Check Events section for specific error

# Verify ConfigMap exists
kubectl get configmap -n <namespace>

# Verify Secret exists
kubectl get secret -n <namespace>

Config Out of Date 🔄

Description: ConfigMap or Secret changes not reflected in running pods.

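Cause: Environment variables sourced from ConfigMaps and Secrets are set once, when the container starts. Volume-mounted ConfigMaps and Secrets are refreshed after a delay (never for subPath mounts), and the application still has to re-read the files.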

Fix:

# Option 1: Rollout restart
kubectl rollout restart deployment/<deployment-name> -n <namespace>

# Option 2: Use reloader controller
# Install and configure reloader to automatically restart pods on config changes
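
A rollout restart returns immediately, so it is easy to test too early and still hit old pods. One option, sketched below, is to wait for the rollout to finish before checking the new configuration. The name restart_and_wait and the 120s default timeout are assumptions made for this example.

```shell
# Restart a deployment and block until the new pods are ready.
# Usage: restart_and_wait <namespace> <deployment> [timeout]
# (restart_and_wait is a hypothetical helper name; 120s is an arbitrary default.)
restart_and_wait() {
  local ns="$1" deploy="$2" timeout="${3:-120s}"
  kubectl rollout restart "deployment/$deploy" -n "$ns" &&
    kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout"
}
```

The function returns a non-zero status if the restart fails or the rollout does not complete within the timeout.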

Endless Terminating State ♾️

Description: Pod stuck in Terminating state.

Possible Causes:

  • Finalizer preventing deletion
  • Node where pod was running is unavailable

Fix:

# Check for finalizers first
kubectl get pod <pod-name> -n <namespace> -o yaml | grep finalizers -A5

# Force delete the pod as a last resort (the container may keep running on an unreachable node)
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0
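
When a finalizer is what blocks deletion, a force delete alone may not help. The sketch below shows one way to inspect the finalizers and, only as a last resort, clear them with a merge patch. Both function names are hypothetical, and clearing finalizers skips whatever cleanup the owning controller intended, so treat it as an assumption-laden escape hatch rather than a routine fix.

```shell
# Show the deletion timestamp and finalizers of a pod.
# Usage: show_finalizers <namespace> <pod>
# (show_finalizers and clear_finalizers are hypothetical helper names.)
show_finalizers() {
  local ns="$1" pod="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\t"}{.metadata.finalizers}{"\n"}'
}

# Last resort: remove all finalizers so the API server can delete the pod.
# This skips the cleanup the finalizer's controller was meant to perform.
# Usage: clear_finalizers <namespace> <pod>
clear_finalizers() {
  local ns="$1" pod="$2"
  kubectl patch pod "$pod" -n "$ns" --type=merge \
    -p '{"metadata":{"finalizers":null}}'
}
```

It is usually worth finding out which controller owns the finalizer and why it is not completing before reaching for clear_finalizers.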

Field Immutability 🔒

Description: Cannot update certain fields after resource creation.

Problem: Some fields, such as a Deployment's spec.selector.matchLabels, are immutable and cannot be changed after creation.

Fix:

# Delete and re-create the deployment (this causes downtime)
kubectl delete deployment <deployment-name> -n <namespace>
kubectl apply -f <deployment-file>.yaml

EnableServiceLinks Issue 🔄

Description: Too many environment variables created for services.

Problem: By default, Kubernetes injects environment variables for every Service in the pod's namespace into each container.

Fix:

spec:
  template:
    spec:
      enableServiceLinks: false
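
To judge whether this setting is worth changing, one rough check is to count the Service-related variables inside a running container. The sketch assumes a helper called count_service_env (a made-up name) and an image that ships the env command.

```shell
# Count injected *_SERVICE_HOST variables inside a pod's container.
# Usage: count_service_env <namespace> <pod>
# (count_service_env is a hypothetical helper name.)
count_service_env() {
  local ns="$1" pod="$2"
  # grep -c exits 1 when the count is 0, so that status is ignored.
  kubectl exec -n "$ns" "$pod" -- env | grep -c '_SERVICE_HOST=' || true
}
```

Each Service contributes one _SERVICE_HOST variable plus several related ones, and KUBERNETES_SERVICE_HOST is still present after enableServiceLinks is set to false, so a result of 1 is the expected floor.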

Network Policy Issues 🌐

Description: Pods cannot communicate due to network policies.

Diagnosis:

# Check network policies
kubectl get netpol -n uat

# Describe network policy
kubectl describe netpol <policy-name> -n uat

Verify:

  • Ingress rules (incoming traffic)
  • Egress rules (outgoing traffic)
  • Pod selectors
  • Namespace selectors
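
After reviewing the rules listed above, a quick way to confirm whether traffic is really blocked is to send a request from a throwaway pod in the same namespace. This is a sketch under assumptions: the helper name probe_service and the pod name prefix netpol-probe are made up, and the target is expected to speak HTTP.

```shell
# Try an HTTP request from a temporary busybox pod inside the namespace.
# Usage: probe_service <namespace> <url>
# (probe_service and the netpol-probe pod name are hypothetical.)
probe_service() {
  local ns="$1" url="$2"
  # --rm deletes the pod afterwards; -T 5 makes wget give up after 5 seconds.
  kubectl run "netpol-probe-$$" -n "$ns" \
    --image=busybox --restart=Never --rm -i --quiet \
    -- wget -qO- -T 5 "$url"
}
```

For example, probe_service uat http://uat-svc:80 should print the response body when traffic is allowed and time out when a policy blocks it. The probe pod does not carry the application's labels, so policies that select on pod labels may treat it differently from the real client.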

Multi-Attach Volume Error 💾

Description: A ReadWriteOnce volume cannot be attached to pods on different nodes at the same time, which typically surfaces during a rolling update.

Quick Fix:

# Scale down to 0
kubectl scale deployment/<deployment-name> --replicas=0 -n <namespace>

# Scale back to 1
kubectl scale deployment/<deployment-name> --replicas=1 -n <namespace>

Recommended Fix: Use Recreate strategy in deployment:

spec:
  strategy:
    type: Recreate

Persistent Volume Access Modes:

  • ✅ RWO (ReadWriteOnce): Volume can be mounted as read-write by a single node
  • ✅ RWX (ReadWriteMany): Volume can be mounted as read-write by multiple nodes
  • ✅ ROX (ReadOnlyMany): Volume can be mounted as read-only by multiple nodes

Check PV Access Mode:

kubectl get pv
kubectl describe pv <pv-name>
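
When there are many volumes, a custom-columns view can show the access modes next to the claim that uses each volume. The sketch below is one possible layout. The function name pv_access_modes and the column titles are choices made for this example.

```shell
# List every PersistentVolume with its access modes, bound claim, and phase.
# (pv_access_modes is a hypothetical helper name.)
pv_access_modes() {
  kubectl get pv -o custom-columns='NAME:.metadata.name,MODES:.spec.accessModes,CLAIM:.spec.claimRef.name,STATUS:.status.phase'
}
```

A volume that lists only ReadWriteOnce and is bound to a claim shared by a multi-replica deployment is a likely source of the Multi-Attach error.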

Quick Reference Cheat Sheet

Most Used Commands

# Get resources
kubectl get pods -n <namespace>
kubectl get all -A

# Describe resources
kubectl describe pod <pod-name> -n <namespace>
kubectl describe node <node-name>

# View logs
kubectl logs <pod-name> -n <namespace>
kubectl logs -f <pod-name> -n <namespace>

# Execute commands
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash

# Events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Resource usage
kubectl top nodes
kubectl top pods -n <namespace>

Debugging Workflow

  1. Check pod status: kubectl get pods
  2. Describe pod: kubectl describe pod <pod-name>
  3. Check events: kubectl get events
  4. View logs: kubectl logs <pod-name>
  5. Check resource usage: kubectl top pod <pod-name>
  6. Exec into container: kubectl exec -it <pod-name> -- /bin/bash
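
The non-interactive steps of the workflow above can be chained into a single helper that prints everything for one pod in order. This is only a sketch: the name triage_pod, the section banners, and the 50-line log tail are assumptions made for this example, and the final step needs the Metrics Server.

```shell
# Run the first five workflow steps for one pod and print labelled sections.
# Usage: triage_pod <namespace> <pod>
# (triage_pod is a hypothetical helper name.)
triage_pod() {
  local ns="$1" pod="$2"
  echo "== status =="
  kubectl get pod "$pod" -n "$ns" -o wide
  echo "== describe =="
  kubectl describe pod "$pod" -n "$ns"
  echo "== events =="
  kubectl get events -n "$ns" \
    --field-selector involvedObject.name="$pod" --sort-by=.lastTimestamp
  echo "== logs (last 50 lines) =="
  kubectl logs "$pod" -n "$ns" --tail=50
  echo "== usage =="
  kubectl top pod "$pod" -n "$ns"
}
```

Redirecting the output, for example triage_pod uat uat-pod > triage.txt, gives a snapshot that can be attached to a ticket. The interactive exec step is left out on purpose because it cannot run unattended.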

Document Version: 1.0

Last Updated: October 2025
