Debug a CrashLoopBackOff
Why this matters
CrashLoopBackOff is one of the most common Kubernetes incidents: a container starts, exits, and is restarted by the kubelet with an increasing back-off delay.
What you really want is not “what does CrashLoopBackOff mean?” but “why is this pod crashing and what should I try next?”
KubeGraf helps you go from a red pod to a plausible root cause and fix plan by combining logs, events, and an incident timeline.
Scenario: payments API keeps crashing in prod-cluster
kubectl config use-context prod-cluster
kubectl get pods -n payments
NAME                            READY   STATUS             RESTARTS   AGE
payments-api-66cbd9d4dc-7xg9n   0/1     CrashLoopBackOff   5          2m31s
payments-api-66cbd9d4dc-87zc2   1/1     Running            0          5m12s
redis-payments-0                1/1     Running            0          10m
Step‑by‑step flow
1. Confirm the problem with kubectl
kubectl get pods -n payments
kubectl describe pod payments-api-66cbd9d4dc-7xg9n -n payments | sed -n '1,40p'
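When a namespace holds many pods, it can help to narrow the listing to the unhealthy ones before running describe. The sketch below is one way to do that with plain kubectl output; `unhealthy_pods` is a hypothetical helper name, not part of kubectl or KubeGraf, and it assumes the default table layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Print name, status, and restart count for pods that are not Running or Completed.
# Reads the table printed by `kubectl get pods` on stdin; the header row is skipped.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3, $4 }'
}

# Example (needs access to the cluster):
#   kubectl get pods -n payments | unhealthy_pods
```

This is a rough filter: a pod that is Running but not yet ready (for example 0/1) is not flagged, so treat the result as a starting point rather than a health check.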
2. Open KubeGraf on the right cluster and namespace
kubegraf
- Press c and select prod-cluster if needed.
- Press n and select the payments namespace.
- Switch to the Pods view and filter with /payments-api.
Tip: Use status filters (if available) to quickly highlight only unhealthy workloads (CrashLoopBackOff, Error, ImagePullBackOff).
3. Inspect logs and events through KubeGraf
2025-03-22T12:01:03Z ERROR payments-api Failed to start HTTP server: DB_CONNECTION_STRING not set
2025-03-22T12:01:03Z ERROR payments-api Exiting with code 1
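To cross-check the same evidence from the command line, the logs of the last crashed container and the pod's events can be pulled with kubectl. The sketch below assumes the log layout shown above (timestamp, level, component, message); `error_lines` is a hypothetical helper name used only for illustration.

```shell
# Keep only ERROR-level lines from a log stream on stdin.
# Assumes the "<timestamp> <LEVEL> <component> <message>" layout shown above.
error_lines() {
  awk '$2 == "ERROR"'
}

# Example (needs access to the cluster):
#   kubectl logs payments-api-66cbd9d4dc-7xg9n -n payments --previous | error_lines
#   kubectl get events -n payments \
#     --field-selector involvedObject.name=payments-api-66cbd9d4dc-7xg9n \
#     --sort-by=.lastTimestamp
```

The --previous flag matters here: in a crash loop the current container may not have logged anything yet, while the previous instance holds the error that caused the exit.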
4. Use the Incident Timeline and Brain Panel
- Incident Timeline shows a new deployment of payments-api, a config map update, and failing probes.
- Brain Panel summarizes: “payments-api started crashing after a new rollout. Logs show DB_CONNECTION_STRING not set. Check the associated config or secret.”
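The suggestion to check the associated config can be followed up with a quick look at the ConfigMap itself. The sketch below is one possible check, assuming the setting is stored in payments-api-config under the key DB_CONNECTION_STRING (the source does not confirm the key name, and the value may instead live in a Secret); `has_key` is a hypothetical helper name.

```shell
# Succeed only if KEY=VALUE lines on stdin contain the given key with a non-empty value.
has_key() {
  grep -q "^$1=."
}

# Example (needs access to the cluster; the key name is an assumption):
#   kubectl get configmap payments-api-config -n payments \
#     -o go-template='{{range $k, $v := .data}}{{$k}}={{$v}}{{"\n"}}{{end}}' \
#     | has_key DB_CONNECTION_STRING && echo "key present" || echo "key missing"
```

If the key is present in the ConfigMap, the next place to look is the Deployment's pod template, since the container only sees the value when the env entry or envFrom reference is wired up there.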
5. Fix the underlying issue
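# Roll back to the previous revision to stop the crash loop.
kubectl rollout undo deployment/payments-api -n payments
# Fix the config the logs point to (here, the missing DB_CONNECTION_STRING).
kubectl edit configmap payments-api-config -n payments
# Watch the rollout finish, then confirm the pods are Running.
kubectl rollout status deployment/payments-api -n payments
kubectl get pods -n payments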
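Two details are easy to miss at this point. If the pods read the value as an environment variable, they only pick up a ConfigMap change once they are recreated, for example with kubectl rollout restart deployment/payments-api -n payments. And "fixed" should mean every replica is Running and fully ready, not just that the status column changed. The sketch below checks the second point from kubectl output; `all_ready` is a hypothetical helper name and assumes the default table layout.

```shell
# Succeed only if at least one pod name starts with the given prefix and every
# such pod is Running with all containers ready (READY column like 1/1).
# $1 = pod name prefix; stdin = table printed by `kubectl get pods`.
all_ready() {
  awk -v p="$1" '
    NR > 1 && index($1, p) == 1 {
      seen = 1
      split($2, r, "/")
      if ($3 != "Running" || r[1] != r[2]) bad = 1
    }
    END { exit !(seen && !bad) }
  '
}

# Example (needs access to the cluster):
#   kubectl get pods -n payments | all_ready payments-api && echo "payments-api healthy"
```

Requiring at least one matching pod avoids reporting success when the prefix is mistyped or the Deployment has been scaled to zero.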
Expected outcome
After following this workflow you should be able to:
- Take a CrashLoopBackOff from a red pod to a concrete, likely root cause.
- Use KubeGraf’s logs, events, and Incident Timeline instead of guessing from raw kubectl output alone.
- Apply and validate a fix confidently, knowing what changed and why.