Kubernetes Troubleshooting Guide for Application Developers
1. Inspecting Resources
General Information
Get an overview of all resources across namespaces:
kubectl get all -A
Checking Deployment Details
Get Full YAML Configuration:
kubectl get -n uat deployments.apps uat-deployment -o yaml
Check Replica Count:
kubectl get -n uat deployments.apps uat-deployment -o yaml | grep replicas
Search for Specific Deployments:
kubectl get deployments --all-namespaces | grep frontend
View Labels:
kubectl get -n uat deployments.apps uat-deployment -o yaml | grep labels -A5
Get Replica Count in JSON Format:
kubectl get -n uat deployments.apps uat-deployment -o=jsonpath='{.spec.replicas}'
Check Containers:
kubectl get -n uat deployments.apps uat-deployment -o=jsonpath='{.spec.template.spec.containers}'
Get Pods on Specific Node:
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node01
2. Describing Nodes and Pods
Get Node Details:
kubectl describe node node01
Describe a Specific Pod:
kubectl describe -n uat pod/uat-pod
3. Viewing Events
Events provide crucial information about what's happening in your cluster:
kubectl events -n uat
4. Checking Logs
Basic Log Commands
Get Logs for a Deployment:
kubectl logs -n uat deployments/uat-deployment
Logs for All Containers in a Deployment:
kubectl logs -n uat deployments/uat-deployment --all-containers
Save Logs to File:
kubectl logs -n app deployments/frontend >> logs.txt
Logs for a Specific Container:
kubectl logs -n uat deployments/uat-deployment -c uat-container01
Advanced Log Options
Logs Based on Label:
kubectl logs -n uat -l app=uat-app
Logs with Timestamps:
kubectl logs -n uat uat-pod --timestamps
Save Timestamped Logs to File:
kubectl logs -n app myapp --timestamps >> timestamps.txt
Time-Based Log Filtering:
kubectl logs nginx --since=10s # Last 10 seconds
kubectl logs nginx --since=1h # Last hour
Follow Logs in Real-Time:
kubectl logs nginx -f
5. Executing Commands Inside Containers
List Files in Container:
kubectl exec -n uat nginx -- ls
Read a File:
kubectl exec -n uat nginx -- cat /usr/share/nginx/html/index.html
Open Interactive Bash Shell:
kubectl exec -it -n uat nginx -- /bin/bash
6. Port Forwarding
Forward local port to service port for testing:
kubectl port-forward -n uat svc/uat-svc 8000:80
This forwards local port 8000 to service port 80.
7. Authentication and Authorization
Check Current User
kubectl auth whoami
Check Permissions
Check Your Own Permissions:
kubectl auth can-i list pods -n uat
kubectl auth can-i get pods -n uat
kubectl auth can-i update pods -n uat
kubectl auth can-i patch pods -n uat
kubectl auth can-i delete pods -n uat
Check Permissions as Another User:
kubectl auth can-i get pods --as=jane --v=10
Check Service Account Permissions:
kubectl auth can-i delete pods --as=system:serviceaccount:default:default
8. Resource Utilization
Node Resources
Get Node Details:
kubectl get nodes -o wide
View Node Resource Usage:
kubectl top nodes
Pod Resources
Get Pods in Namespace:
kubectl get pods -n uat
View Pod Resource Usage:
kubectl top pods -n uat
9. Explaining Kubernetes Objects
The explain command provides built-in documentation for Kubernetes resources and their fields:
Explain Pod Resource:
kubectl explain pods
Explain Pod Specifications:
kubectl explain pods.spec
Explain Security Settings (Recursive):
kubectl explain pods.spec.securityContext --recursive
10. Debugging
Compare Configuration Changes
kubectl diff -f nginx.yaml
Debug a Running Pod
kubectl debug -it nginx-pod --image=busybox --target=nginx
Copy and Debug a Pod
kubectl debug nginx-pod --image=busybox -it --copy-to=debugging-pod --share-processes
11. Common Issues and Fixes
ImagePullBackOff Error
Description: Pod cannot pull the container image from the registry.
Diagnosis:
- Describe the pod and check the events section to find the reason
Possible Causes:
- Incorrect image name: Verify the image name in your deployment YAML
- Missing imagePullSecrets: Results in a 401 authentication error
- Incorrect image tag: Check if the specified tag exists
- Cluster cannot resolve registry hostname: Check DNS and network connectivity
Fix:
kubectl describe pod <pod-name> -n <namespace>
# Check Events section for detailed error
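If the cause is a missing registry credential, referencing an imagePullSecret in the pod template is the usual fix. A minimal sketch, assuming a secret named regcred has already been created with kubectl create secret docker-registry:

```yaml
# Sketch: referencing a private-registry credential in a deployment's
# pod template. "regcred" and the image path are illustrative examples.
spec:
  template:
    spec:
      imagePullSecrets:
        - name: regcred          # must exist in the same namespace
      containers:
        - name: app
          image: registry.example.com/app:1.0
```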
CrashLoopBackOff Error
Description: Container keeps crashing and Kubernetes restarts it repeatedly.
Key Indicators:
- restartPolicy in the pod YAML is set to Always, so the kubelet keeps restarting the failing container
Exit Code Analysis:
- Exit Code 1: Application error (check application logs)
- Exit Code 137: Container killed by SIGKILL, typically an OOM kill or a failed liveness probe
- Exit Code 127: Trying to access a non-existent file or command
Other Causes:
- Volume mount issues: Check if volumes are properly mounted
Fix:
kubectl logs <pod-name> -n <namespace> --previous
kubectl describe pod <pod-name> -n <namespace>
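Since exit code 137 usually points at memory limits or probe timing, these are the two settings worth reviewing first. A sketch with illustrative values, not a recommended configuration:

```yaml
# Sketch: container settings relevant to exit code 137.
# A too-low memory limit triggers OOM kills; an aggressive livenessProbe
# can kill a slow-starting app. All names and numbers are examples.
containers:
  - name: app
    image: myapp:1.0
    resources:
      limits:
        memory: "256Mi"          # OOM kill if the app exceeds this
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30    # give the app time to start
      periodSeconds: 10
```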
Pending Pods
Description: Pods are stuck in the Pending state and not being scheduled.
Common Causes:
- Insufficient resources on nodes: Not enough CPU/memory available
- Node selector mismatch: Pod's nodeSelector doesn't match any node labels
- Taints and tolerations: Nodes are tainted and pod lacks required tolerations
Fix:
kubectl describe pod <pod-name> -n <namespace>
# Check Events section for scheduling failures
# Add label to node if needed
kubectl label nodes <node-name> <label-key>=<label-value>
# Check node capacity
kubectl describe nodes
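The two scheduling constraints above live in the pod spec. A sketch showing where each goes, with illustrative label and taint values:

```yaml
# Sketch: scheduling constraints that commonly cause Pending pods.
# The label (disktype=ssd) and taint (dedicated=uat) are examples.
spec:
  nodeSelector:
    disktype: ssd              # must match a label on at least one node
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "uat"
      effect: "NoSchedule"     # lets the pod land on tainted nodes
```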
Missing Pods
Description: Expected pods are not running.
Possible Causes:
- Pod quota exceeded: Namespace has reached its resource quota
- Service account missing in deployment: Required service account doesn't exist
Fix:
# Check events for quota issues
kubectl get events -n uat
# Create missing service account
kubectl create sa service-account-uat -n uat
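For the quota case, it helps to know what a ResourceQuota looks like; once its limits are exhausted, new pods are silently rejected. A sketch with an illustrative name and limit (inspect real quotas with kubectl describe quota -n uat):

```yaml
# Sketch: a namespace quota that blocks new pods once exhausted.
# The name and pod cap are illustrative examples.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: uat-quota
  namespace: uat
spec:
  hard:
    pods: "10"                 # hard cap on pod count in the namespace
```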
Schrödinger's Deployment
Description: Multiple deployments sharing a common selector, so their ReplicaSets compete for the same pods.
Problem: Using a shared selector such as version=1 across multiple deployments.
Fix:
# Check affected pods
kubectl get pods -l version=1
# Verify endpoints
kubectl get endpoints
# Use unique selectors for each deployment
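A unique selector means the matchLabels (and the matching pod template labels) identify exactly one deployment. A sketch, with illustrative label values:

```yaml
# Sketch: a selector unique to this deployment, instead of a label
# like version=1 shared across several deployments. Values are examples.
spec:
  selector:
    matchLabels:
      app: frontend            # unique per deployment
      version: "1"
  template:
    metadata:
      labels:
        app: frontend          # must match the selector above
        version: "1"
```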
CreateContainerError / CreateContainerConfigError
CreateContainerConfigError:
- Missing Secret
- Missing ConfigMap
- Missing environment variable
CreateContainerError:
- Missing entrypoint or command
- Invalid container configuration
Fix:
kubectl describe pod <pod-name> -n <namespace>
# Check Events section for specific error
# Verify ConfigMap exists
kubectl get configmap -n <namespace>
# Verify Secret exists
kubectl get secret -n <namespace>
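The references that trigger CreateContainerConfigError live in the container spec. A sketch of the two most common ones, with illustrative names; if either object is missing, the pod fails with exactly this error:

```yaml
# Sketch: container config that fails with CreateContainerConfigError
# when the referenced ConfigMap or Secret does not exist.
# "app-config" and "app-secrets" are example names.
containers:
  - name: app
    image: myapp:1.0
    envFrom:
      - configMapRef:
          name: app-config     # must exist in the same namespace
      - secretRef:
          name: app-secrets    # must exist in the same namespace
```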
Config Out of Date
Description: ConfigMap or Secret changes not reflected in running pods.
Cause: ConfigMap and Secret values injected as environment variables are read only at pod creation; mounted files do update eventually, but most applications read them only at startup.
Fix:
# Option 1: Rollout restart
kubectl rollout restart deployment/<deployment-name> -n <namespace>
# Option 2: Use reloader controller
# Install and configure reloader to automatically restart pods on config changes
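With the Stakater Reloader controller installed, option 2 is driven by an annotation on the Deployment. A sketch of that annotation (assuming Reloader is running in the cluster):

```yaml
# Sketch: Deployment metadata annotation for the Stakater Reloader
# controller; it triggers a rolling restart whenever a ConfigMap or
# Secret referenced by this deployment changes.
metadata:
  annotations:
    reloader.stakater.com/auto: "true"
```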
Endless Terminating State
Description: Pod stuck in Terminating state.
Possible Causes:
- Finalizer preventing deletion
- Node where pod was running is unavailable
Fix:
# Check for finalizers first
kubectl get pod <pod-name> -n <namespace> -o yaml | grep finalizers -A5
# Remove finalizers if one is blocking deletion
kubectl patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'
# As a last resort, force delete the pod
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0
Field Immutability
Description: Cannot update certain fields after resource creation.
Problem: Selector fields such as spec.selector.matchLabels cannot be changed after a Deployment is created.
Fix:
# Delete and re-create the deployment (note: this causes downtime)
kubectl delete deployment <deployment-name> -n <namespace>
kubectl apply -f <deployment-file>.yaml
EnableServiceLinks Issue
Description: Too many environment variables created for services.
Problem: By default, Kubernetes creates environment variables for all services.
Fix:
spec:
template:
spec:
enableServiceLinks: false
Network Policy Issues
Description: Pods cannot communicate due to network policies.
Diagnosis:
# Check network policies
kubectl get netpol -n uat
# Describe network policy
kubectl describe netpol <policy-name> -n uat
Verify:
- Ingress rules (incoming traffic)
- Egress rules (outgoing traffic)
- Pod selectors
- Namespace selectors
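To make the selectors concrete, here is a sketch of a policy that allows ingress to the uat app only from pods labelled role=frontend; all label values are illustrative:

```yaml
# Sketch: NetworkPolicy allowing ingress to app=uat-app pods only
# from pods labelled role=frontend in the same namespace.
# Names and labels are example values.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
  namespace: uat
spec:
  podSelector:
    matchLabels:
      app: uat-app             # pods this policy applies to
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend   # only these pods may connect
```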
Multi-Attach Volume Error
Description: Volume cannot be attached to multiple pods on different nodes.
Quick Fix:
# Scale down to 0
kubectl scale deployment/<deployment-name> --replicas=0 -n <namespace>
# Scale back to 1
kubectl scale deployment/<deployment-name> --replicas=1 -n <namespace>
Recommended Fix: Use the Recreate strategy in the deployment:
spec:
strategy:
type: Recreate
Persistent Volume Access Modes:
- RWO (ReadWriteOnce): Volume can be mounted read-write by a single node
- RWX (ReadWriteMany): Volume can be mounted read-write by multiple nodes
- ROX (ReadOnlyMany): Volume can be mounted read-only by multiple nodes
Check PV Access Mode:
kubectl get pv
kubectl describe pv <pv-name>
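The access mode is requested on the PVC, so that is also where the Multi-Attach error originates. A sketch of a ReadWriteOnce claim (name and size are illustrative); with RWO, a second replica scheduled to a different node cannot attach the volume:

```yaml
# Sketch: a PVC requesting ReadWriteOnce. Replicas on other nodes
# will hit the Multi-Attach error. Name and size are examples.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce            # single-node read-write
  resources:
    requests:
      storage: 1Gi
```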
Quick Reference Cheat Sheet
Most Used Commands
# Get resources
kubectl get pods -n <namespace>
kubectl get all -A
# Describe resources
kubectl describe pod <pod-name> -n <namespace>
kubectl describe node <node-name>
# View logs
kubectl logs <pod-name> -n <namespace>
kubectl logs -f <pod-name> -n <namespace>
# Execute commands
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash
# Events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# Resource usage
kubectl top nodes
kubectl top pods -n <namespace>
Debugging Workflow
- Check pod status:
kubectl get pods
- Describe pod:
kubectl describe pod <pod-name>
- Check events:
kubectl get events
- View logs:
kubectl logs <pod-name>
- Check resource usage:
kubectl top pod <pod-name>
- Exec into container:
kubectl exec -it <pod-name> -- /bin/bash
Document Version: 1.0
Last Updated: October 2025