DEV Community

ANKUSH CHOUDHARY JOHAL

Posted on • Originally published at johal.in

How to Debug Kubernetes 1.32 CrashLoopBackOff with crictl 1.30 and journalctl

Kubernetes 1.32 CrashLoopBackOff errors occur when a container repeatedly crashes and the kubelet restarts it with exponentially increasing backoff delays. Debugging these issues requires visibility into both container runtime activity and node-level system logs. This guide walks through using crictl 1.30 (compatible with Kubernetes 1.32’s CRI requirements) and journalctl to isolate root causes quickly.
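The backoff behavior can be sketched in shell. The delay values below follow the kubelet's documented defaults (10s base, doubling, capped at five minutes, reset after 10 minutes of clean running):

```shell
# Sketch of the kubelet's CrashLoopBackOff schedule: the restart delay
# starts at 10s, doubles after each crash, and is capped at 300s (5 min).
delay=10
for attempt in 1 2 3 4 5 6; do
  echo "restart ${attempt}: waiting ${delay}s"
  delay=$(( delay * 2 ))
  [ "$delay" -gt 300 ] && delay=300
done
```

This is why a pod can sit in CrashLoopBackOff for minutes between restart attempts even though each crash happens instantly.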

Prerequisites

  • A running Kubernetes 1.32 cluster
  • crictl 1.30 installed on all worker nodes (matches Kubernetes 1.32 CRI version compatibility)
  • SSH access to cluster worker nodes
  • kubectl configured to access the cluster
  • A systemd-based node OS (needed for journalctl; adjust if your nodes use a different init system)

Step 1: Identify the CrashLooping Pod

First, list pods in the affected namespace to find CrashLoopBackOff statuses:

kubectl get pods -n <namespace>
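In a busy namespace it helps to filter for crash-looping pods only. A minimal sketch, run here against hypothetical `kubectl get pods` output (the pod names are made up):

```shell
# Hypothetical 'kubectl get pods' output; the awk filter keeps only rows
# whose STATUS column reads CrashLoopBackOff and prints name + restarts.
pods='NAME       READY   STATUS             RESTARTS   AGE
web-7f9c   1/1     Running            0          2d
api-5d4b   0/1     CrashLoopBackOff   12         35m'
echo "$pods" | awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1, "restarts:", $4 }'
```

On a live cluster you would pipe real output through the same filter: kubectl get pods -n <namespace> | awk '...'.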

Note the pod name and namespace, then retrieve detailed pod metadata to confirm the target node and container names:

kubectl describe pod <pod-name> -n <namespace>

Look for the Node: field to identify which worker node the pod is scheduled on, and note the container name(s) under the Containers section.

Step 2: Access the Target Node

SSH into the worker node identified in Step 1, as crictl and journalctl are node-level tools that cannot be run remotely via kubectl:

ssh user@<node-ip>

Step 3: Inspect Containers with crictl 1.30

crictl interacts directly with the CRI-compatible container runtime (e.g., containerd, CRI-O) to retrieve low-level container data. First, list all containers (including stopped/crashed ones) to find the target container:

crictl ps -a

Match the pod name or container name from Step 1 to the output. For precision, you can also filter by pod sandbox ID: list pod sandboxes with crictl pods, then pass the sandbox ID to crictl ps:

crictl pods | grep <pod-name>
crictl ps -a --pod=<sandbox-id>

Once you have the container ID, retrieve container logs (stdout/stderr from the application):

crictl logs <container-id>

For deeper inspection, view the container’s configuration and exit status:

crictl inspect <container-id> | grep -A 10 "status"

Note the exitCode and reason fields: common exit codes include 1 (application error), 137 (killed by SIGKILL, usually the OOM killer), 126 (command found but not executable, often a permissions problem), and 127 (command not found).
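These mappings are worth keeping at hand. A small helper function (hypothetical, not part of crictl) that translates an exit code into a likely cause:

```shell
# Hypothetical helper: translate a container exit code into a likely cause.
# By convention, 128 + N means the process was killed by signal N.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit" ;;
    1)   echo "application error" ;;
    126) echo "command found but not executable (permissions?)" ;;
    127) echo "command not found (bad entrypoint?)" ;;
    137) echo "killed by SIGKILL (137 = 128 + 9, often the OOM killer)" ;;
    143) echo "terminated by SIGTERM (143 = 128 + 15, graceful shutdown)" ;;
    *)   echo "exit code $1" ;;
  esac
}
explain_exit_code 137
```

Feed it the exitCode value from crictl inspect to get a first hypothesis before digging into logs.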

Step 4: Check Runtime and Kubelet Logs with journalctl

journalctl provides access to systemd service logs, including the container runtime and kubelet (the Kubernetes node agent responsible for pod lifecycle management).

First, check kubelet logs for pod-related errors, filtering by time since the crashes started:

journalctl -u kubelet --since "15 minutes ago" | grep -i <pod-name>

Next, check the container runtime logs (replace containerd with cri-o if using CRI-O):

journalctl -u containerd --since "15 minutes ago" | grep -i <container-id>

Look for errors such as image pull failures, volume mount issues, or runtime crashes. To check for OOM events across the node:

journalctl --since "15 minutes ago" | grep -i oom
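When an OOM kill shows up, the useful bits are the PID and process name. A sketch that extracts them from a hypothetical kernel log line (the timestamp, host, and process here are made up, but the "Killed process" phrasing matches typical kernel OOM output):

```shell
# Hypothetical kernel OOM line as journalctl might show it; sed extracts
# the killed PID and process name for correlation with the container.
oom_line='Jan 10 12:00:01 node1 kernel: Memory cgroup out of memory: Killed process 4321 (myapp) total-vm:204800kB'
echo "$oom_line" | sed -E 's/.*Killed process ([0-9]+) \(([^)]+)\).*/pid=\1 comm=\2/'
```

If the extracted process name matches your container's main process, the 137 exit code from Step 3 is confirmed as an OOM kill.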

Step 5: Correlate and Resolve Issues

Combine findings from crictl and journalctl to identify the root cause:

  • Application errors (exit code 1): Fix bugs in the containerized application, rebuild the image, and update the pod spec.
  • OOM kills (exit code 137): Increase the container’s memory limits in the pod spec.
  • Image pull errors: Verify the image name/tag, confirm registry authentication, and check network connectivity to the registry.
  • Volume mount failures: Check PVC status, node filesystem permissions, and volume plugin health.
  • Runtime bugs: Restart the container runtime (systemctl restart containerd) or upgrade to a patched version.
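For the OOM case, the fix usually lands in the container's resources block in the pod spec. A sketch of the relevant fragment (the values are placeholders; size them from the application's observed usage):

```yaml
# Hypothetical container spec fragment: raise the memory limit so the
# workload is no longer killed at its previous ceiling.
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"
```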

Step 6: Verify the Fix

After applying the fix, trigger a fresh rollout so the deployment recreates its pods:

kubectl rollout restart deployment <deployment-name> -n <namespace>

Or manually delete the crashed pod to let the deployment recreate it:

kubectl delete pod <pod-name> -n <namespace>

Confirm the pod is running without errors:

kubectl get pods -n <namespace>

Conclusion

Debugging Kubernetes 1.32 CrashLoopBackOff errors requires visibility into both container-level runtime activity and node-level system events. crictl 1.30 provides direct access to CRI runtime data, while journalctl surfaces kubelet, runtime, and system errors. Using these tools together covers the full stack of potential failure points, reducing mean time to resolution for crash loop issues.
