Sadeek M

Posted on Dec 5, 2024 • Edited on Dec 8, 2024

Debugging Kubernetes cluster part 2

#kubernetes #linux #debug #debugging

Debugging a Kubernetes cluster requires a solid understanding of its components and their interdependencies. This second part focuses on advanced debugging techniques for common cluster issues:

Node Issues

A. Node Not Ready

Check Node Status:

kubectl get nodes
kubectl describe node <node-name>
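On larger clusters it can help to narrow the list to the nodes that actually need attention. A minimal sketch, assuming the default column layout of `kubectl get nodes` (NAME first, STATUS second); the function names are hypothetical helpers, not part of kubectl:

```shell
# Print the names of nodes whose STATUS does not start with "Ready".
# A cordoned node ("Ready,SchedulingDisabled") still counts as Ready here.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}

# Describe each such node so its Conditions and Events can be compared.
describe_not_ready_nodes() {
  for node in $(not_ready_nodes); do
    echo "=== $node ==="
    kubectl describe node "$node"
  done
}

# Usage: describe_not_ready_nodes | less
```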

Inspect Kubelet Logs:
SSH into the node and review logs for errors:

journalctl -u kubelet -l

Possible Causes:
Resource exhaustion (e.g., CPU, memory, disk).
Misconfigured networking (e.g., unable to reach the API server).
Issues with container runtime (Docker, containerd).

B. Node Disk Pressure or Memory Pressure

Check Allocations:

kubectl describe node <node-name> | grep -A 8 "Allocated resources"

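Clean Up Disk Space:
Remove unused images and stopped containers. On Docker-based nodes:

docker system prune

On nodes that use containerd or CRI-O, crictl rmi --prune removes unused images instead.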

Reconfigure Resource Limits:
Adjust resource requests and limits for pods.
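Before adjusting limits, it helps to confirm which nodes actually report pressure. The sketch below adapts the nested-range JSONPath pattern from the kubectl quick reference; `nodes_under_pressure` and `NODE_CONDITIONS` are hypothetical names chosen for illustration:

```shell
# JSONPath template: one line per node, "name:Type=Status;Type=Status;..."
NODE_CONDITIONS='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{"\n"}{end}'

# Print the names of nodes that report MemoryPressure, DiskPressure,
# or PIDPressure with status True.
nodes_under_pressure() {
  kubectl get nodes -o jsonpath="$NODE_CONDITIONS" |
    awk -F: '/Pressure=True/ { print $1 }'
}

# Usage: nodes_under_pressure
```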

Pod Issues

A. Pod Stuck in Pending

Inspect Events:

kubectl describe pod <pod-name>
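The Events section usually contains a FailedScheduling message that names the reason. A rough sketch for pulling out and sorting those messages follows; the helper names are hypothetical, and the match patterns are assumptions, since the scheduler's wording varies across Kubernetes versions:

```shell
# Print FailedScheduling event messages for one pod.
scheduling_failures() {
  ns=$1; pod=$2
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod,reason=FailedScheduling" \
    -o jsonpath='{range .items[*]}{.message}{"\n"}{end}'
}

# Read messages on stdin and prefix each with a coarse category.
# A message may list several reasons; only the first match is reported.
classify_scheduling_message() {
  while IFS= read -r msg; do
    case "$msg" in
      *Insufficient*)        echo "resources: $msg" ;;
      *taint*)               echo "taints: $msg" ;;
      *selector*|*affinity*) echo "placement: $msg" ;;
      *)                     echo "other: $msg" ;;
    esac
  done
}

# Usage: scheduling_failures <namespace> <pod-name> | classify_scheduling_message
```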

Possible Causes:
Insufficient resources: Check node capacity and pod requests.
Scheduling constraints: Inspect nodeSelector, taints, and tolerations.
Networking issues: Ensure the CNI plugin is functioning correctly.

B. CrashLoopBackOff

View Logs:

kubectl logs <pod-name> --previous

Check Events:

kubectl describe pod <pod-name>
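The exit code of the previous run often narrows the cause faster than the logs do. A small sketch, assuming the pod has a single container (index 0) and that the namespace defaults to `default`; both helper names are hypothetical, and the explanations are general shell and signal conventions rather than anything Kubernetes-specific:

```shell
# Exit code of the previous (crashed) run of the first container in a pod.
last_exit_code() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Map common exit codes to a likely cause.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; the process may not be long-running" ;;
    1)   echo "application error; check the previous logs" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found; check the entrypoint and image contents" ;;
    137) echo "killed by SIGKILL (128+9); often OOMKilled, compare memory limits" ;;
    139) echo "segmentation fault (128+11)" ;;
    143) echo "terminated by SIGTERM (128+15)" ;;
    *)   echo "exit code $1; consult the application's documentation" ;;
  esac
}

# Usage: explain_exit_code "$(last_exit_code <pod-name> <namespace>)"
```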

Debugging Steps:
Ensure the container's entrypoint is correct.
Verify environment variables and mounted volumes.
Test locally using the same image.

C. Container Image Pull Issues

Inspect Events:

kubectl describe pod <pod-name>

Common Errors:
Unauthorized: Verify image pull secrets.
Image not found: Confirm the image exists in the registry.
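To find every pod affected by a pull failure, and the exact image reference each one requests, something like the following may help. It assumes the default column layout of `kubectl get pods --all-namespaces` (STATUS is the fourth column); the function names are hypothetical:

```shell
# List namespace/name of pods stuck pulling an image.
# The unanchored match also catches init containers ("Init:ImagePullBackOff").
image_pull_failures() {
  kubectl get pods --all-namespaces --no-headers |
    awk '$4 ~ /ImagePullBackOff|ErrImagePull/ { print $1 "/" $2 }'
}

# Print the image reference of each container in a pod, one per line,
# so typos in the registry, repository, or tag are easy to spot.
pod_images() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .spec.containers[*]}{.image}{"\n"}{end}'
}

# Usage: image_pull_failures
```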

Networking Issues

A. Pods Can't Communicate

Ping Other Pods:

kubectl exec -it <pod-name> -- ping <pod-ip>

Check Network Policies:

kubectl get networkpolicy -n <namespace>

Debugging CNI Plugins:
Inspect CNI logs:

cat /var/log/containers/<cni-plugin-name>*.log

B. Service Not Accessible

Check Service Description:

kubectl describe svc <service-name>

Inspect Endpoints:

kubectl get endpoints <service-name>
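An empty endpoints list is the most common reason a Service appears dead: either its selector matches no pods, or the matching pods are not Ready. A minimal sketch of that check, with a hypothetical helper name:

```shell
# Report whether a Service has any ready endpoint addresses.
# Returns 0 if at least one address exists, 1 otherwise.
service_has_endpoints() {
  ns=$1; svc=$2
  ips=$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -n "$ips" ]; then
    echo "ready endpoints: $ips"
  else
    echo "no ready endpoints: compare the Service selector with pod labels and check pod readiness"
    return 1
  fi
}

# Usage: service_has_endpoints <namespace> <service-name>
```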

Test Connectivity:
From within a pod:

curl http://<service-name>.<namespace>:<port>

API Server Issues

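Inspect Logs:
If the API server runs as a systemd unit:

journalctl -u kube-apiserver

On kubeadm clusters the API server runs as a static pod, so use kubectl logs -n kube-system kube-apiserver-<node-name> or, if the API is unreachable, crictl logs on the control-plane node.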

Test API Server Availability:

kubectl get --raw /healthz
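The /healthz endpoint has been deprecated since Kubernetes v1.16 in favor of /livez and /readyz, which also accept a verbose parameter that lists each individual check. The sketch below extracts the names of failing checks; `failing_readyz_checks` is a hypothetical helper, and because kubectl may surface a failing response inside an error message, the function merges stderr and matches anywhere in the output rather than only at line starts:

```shell
# Print the failing readiness checks reported by the API server,
# one per line, in the form "[-]<check-name>".
failing_readyz_checks() {
  # '\[-]' matches the literal "[-]" marker; '[^ ]*' takes the check name.
  kubectl get --raw '/readyz?verbose' 2>&1 | grep -o '\[-][^ ]*'
}

# Usage: failing_readyz_checks || echo "no failing checks reported"
```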

Common Causes:
SSL/TLS issues: Check certificates and CA bundle.
Resource bottlenecks: Monitor CPU/memory usage.

Persistent Volume Issues

A. PVC Pending

Inspect Events:

kubectl describe pvc <pvc-name>
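A quick way to rule out a StorageClass mismatch is to check whether the class named in the claim exists at all. A minimal sketch with a hypothetical helper name:

```shell
# Check that the StorageClass referenced by a PVC exists.
pvc_storageclass_check() {
  ns=$1; pvc=$2
  sc=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.storageClassName}')
  if [ -z "$sc" ]; then
    echo "no storageClassName set: binding depends on a default StorageClass or a pre-provisioned PV"
  elif kubectl get storageclass "$sc" >/dev/null 2>&1; then
    echo "StorageClass $sc exists"
  else
    echo "StorageClass $sc not found"
  fi
}

# Usage: pvc_storageclass_check <namespace> <pvc-name>
```

If the class exists, the provisioner's own events and logs are the next place to look.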

Common Causes:
No matching StorageClass.
Insufficient storage on nodes.

B. PV Bound But Pod Can't Mount

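Inspect Events:

kubectl describe pod <pod-name>

A pod that cannot mount its volume usually stays in ContainerCreating, so kubectl logs has nothing to show; look for FailedMount or FailedAttachVolume events instead.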

Debugging Steps:
Verify volume permissions.
Test mounting the volume manually on a node.

Cluster DNS Issues

Test DNS Resolution:

kubectl exec -it <pod-name> -- nslookup <service-name>

Inspect CoreDNS Logs:

kubectl logs -n kube-system <coredns-pod-name>
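If the CoreDNS logs look clean, the pod itself may be pointing at the wrong resolver. The sketch below compares the first nameserver in the pod's /etc/resolv.conf with the ClusterIP of the kube-dns Service (the Service keeps that name even when CoreDNS is the backend). `dns_config_check` is a hypothetical helper, and a mismatch is not always an error: setups such as NodeLocal DNSCache or a custom dnsPolicy use a different address by design.

```shell
# Compare a pod's configured nameserver with the cluster DNS Service IP.
dns_config_check() {
  ns=$1; pod=$2
  cluster_dns=$(kubectl get svc kube-dns -n kube-system \
    -o jsonpath='{.spec.clusterIP}')
  pod_dns=$(kubectl exec "$pod" -n "$ns" -- cat /etc/resolv.conf |
    awk '$1 == "nameserver" { print $2; exit }')
  if [ "$cluster_dns" = "$pod_dns" ]; then
    echo "pod uses cluster DNS ($cluster_dns)"
  else
    echo "pod nameserver $pod_dns differs from kube-dns ClusterIP $cluster_dns"
  fi
}

# Usage: dns_config_check <namespace> <pod-name>
```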

Common Fixes:
Restart CoreDNS pods if unresponsive.
Validate ConfigMap for CoreDNS (kubectl get cm -n kube-system coredns -o yaml).

Troubleshooting Tools

A. kubectl Debugging Tools

Debug running pods:

kubectl exec -it <pod-name> -- /bin/sh

Debug containers with ephemeral containers (enabled by default since Kubernetes v1.23, stable in v1.25):

kubectl debug -it <pod-name> --image=busybox

B. Third-Party Tools

Lens: GUI for Kubernetes cluster monitoring.
K9s: Terminal-based cluster management.
kubectl-trace: System-level tracing for Kubernetes.

C. Logs Aggregation

Use tools like Fluentd, ELK Stack, or Loki for centralized logging.

Proactive Cluster Monitoring

Implement monitoring systems like Prometheus, Grafana, or Datadog.
Set up alerting for critical metrics (e.g., node health, pod restarts).
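Until alerting is in place, an ad-hoc check for restart-heavy pods can serve as a rough stand-in. This sketch assumes the default column layout of `kubectl get pods --all-namespaces` (RESTARTS is the fifth column); the helper name and the default threshold of 5 are arbitrary choices for illustration:

```shell
# List pods whose restart count is at or above a threshold (default 5).
# Output: "<namespace>/<pod-name> <restarts>"
high_restart_pods() {
  threshold=${1:-5}
  kubectl get pods --all-namespaces --no-headers |
    awk -v t="$threshold" '$5 + 0 >= t { print $1 "/" $2, $5 }'
}

# Usage: high_restart_pods 10
```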

Example: Debugging Workflow for a Non-Responsive Service

Check Pod Status:

kubectl get pods -n <namespace>

Describe the Service:

kubectl describe svc <service-name> -n <namespace>

Inspect Logs:

kubectl logs <pod-name> -n <namespace>

Test Connectivity:
From within the cluster:

curl http://<service-name>.<namespace>:<port>

From outside:

curl http://<external-ip>:<port>

This deeper dive should equip you to troubleshoot and resolve complex Kubernetes issues effectively.
