CrashLoopBackOff
Why this matters
CrashLoopBackOff means a container is repeatedly starting and crashing. It usually points to a real bug or a bad
configuration. Every restart wastes resources and delays user traffic; in production this often shows up as 5xx errors,
timeouts, or failing background jobs.
Tip: Always confirm kubectl config current-context and namespace before diving into logs.
Symptoms
- kubectl get pods shows STATUS CrashLoopBackOff for one or more pods.
- Service or ingress in front of the workload is returning 5xx or timeouts.
- Containers start, run for a few seconds, then exit with a non-zero code.
- Logs show the same error pattern every time the pod restarts.
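If you want a quick CLI confirmation of these symptoms first, a listing typically looks like this (illustrative output; the pod name is a placeholder):
kubectl get pods -n <namespace>
NAME                    READY   STATUS             RESTARTS   AGE
api-7d9c5b6f4d-xk2lp    0/1     CrashLoopBackOff   7          12m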
Common root causes
- Application startup failures (missing env vars, invalid config, missing migrations).
- Crash on dependency connection (DB not reachable, message broker auth failure).
- Mismatch between image and config (a new image expects different env vars or flags).
- Misconfigured liveness/readiness probes (failing on a new path or port).
- Secrets or ConfigMaps missing, wrong key names, or wrong mount paths.
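Two quick checks that cover the most common of these causes (a sketch; resource names are placeholders):
# Env vars the Deployment actually passes to the container (including ConfigMap/Secret refs)
kubectl set env deployment/<name> --list -n <namespace>
# ConfigMaps and Secrets that actually exist in the namespace, to compare against what the pod references
kubectl get configmap,secret -n <namespace>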
How KubeGraf helps
- Highlights crash-looping pods in the namespace so you don't hunt through raw kubectl output.
- Shows restart counts, last state (e.g. Error and its exit code), and recent events next to the pod.
- Lets you pivot quickly between pod logs, events, Deployment, ConfigMap, and Secret.
- Incident timeline view helps correlate: image deployed → config changed → probes failing → CrashLoopBackOff.
Step-by-step using KubeGraf UI
1. Confirm the problem in the right cluster/namespace
kubectl config current-context
kubectl get pods -n <namespace>
- Start the KubeGraf Terminal UI by running kubegraf.
- Ensure the context and namespace in KubeGraf match what you just checked.
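To also confirm which namespace the current context defaults to (an optional extra check):
kubectl config view --minify | grep namespace: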
2. Locate CrashLooping pods
- Open the Pods view for the affected namespace.
- Use filters to show only unhealthy pods (status CrashLoopBackOff / Error).
- Note restart count, age, and container name (if multi-container pod).
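If you want the same filtering from the CLI, one way (a sketch, not the only one) is:
# Show only pods whose STATUS column reports a crash or error
kubectl get pods -n <namespace> | grep -E 'CrashLoopBackOff|Error'
# Sort pods by restart count to surface the noisiest ones
kubectl get pods -n <namespace> --sort-by='.status.containerStatuses[0].restartCount'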
3. Inspect recent events and reasons
- From the pod details, open Events.
- Look for messages such as Back-off restarting failed container, probe failures, or image pull errors.
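The kubectl equivalents, if you need the raw events (names are placeholders):
# Events and last container state for a single pod
kubectl describe pod <pod-name> -n <namespace>
# All recent events in the namespace, sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'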
4. Inspect logs around the crash
- From the same pod, open Logs in KubeGraf and scroll to the last lines before exit.
- Capture the exact error message and exit code, for example:
2025-03-22T12:01:03Z ERROR app Failed to start HTTP server: DB_CONNECTION_STRING not set
2025-03-22T12:01:03Z ERROR app Exiting with code 1
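Because the container keeps restarting, the crash output usually lives in the previous instance; from the CLI you can fetch it like this (pod name is a placeholder):
# Logs from the container instance that just crashed
kubectl logs <pod-name> -n <namespace> --previous
# Exit code of the last terminated container (first container in the pod)
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'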
5. Check configuration linked to the pod
- From pod details, jump to its Deployment (or StatefulSet/Job).
- Review container image tag, env vars, and probes.
- Follow links to ConfigMaps and Secrets referenced by the pod and compare with what the app expects.
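The same comparison can be done from the CLI (a sketch; resource names are placeholders):
# Deployed spec: image tag, env vars, probes, volume mounts
kubectl get deployment <name> -n <namespace> -o yaml
# Keys (and base64 values) in a referenced ConfigMap or Secret
kubectl get configmap <config-name> -n <namespace> -o jsonpath='{.data}'
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data}'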
6. Use Incident Timeline / change history
- Open the Incident Timeline for this workload/namespace.
- Look for deploys, config updates, or probe changes just before the CrashLoopBackOff started.
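If the workload is a Deployment, its rollout history is a useful cross-check against the timeline:
# Revisions recorded for the Deployment; compare timestamps with the incident start
kubectl rollout history deployment/<name> -n <namespace>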
7. Apply fix and watch recovery
- Typical fixes include reverting a bad config/Secret, fixing missing env vars, or correcting probe path/port.
kubectl rollout undo deployment/<name> -n <namespace>
kubectl edit configmap <config-name> -n <namespace>
- Use KubeGraf to watch new pods transition from CrashLoopBackOff to Running.
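From the CLI, recovery can be confirmed with (names are placeholders):
# Block until the rollout completes or fails
kubectl rollout status deployment/<name> -n <namespace>
# Watch pods transition back to Running
kubectl get pods -n <namespace> -w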
What to check next
- Are other pods in the same Deployment also impacted, or only one replica?
- Does the issue correlate with a specific node (node-local problem)?
- Is the CrashLoop only in one namespace or across multiple environments (dev/staging/prod)?
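Quick ways to answer these from kubectl (a sketch):
# Which node each replica is scheduled on (node-local problems show up here)
kubectl get pods -n <namespace> -o wide
# Whether the same workload is crashing in other namespaces
kubectl get pods -A | grep CrashLoopBackOff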
Common mistakes
- Debugging the wrong cluster/namespace because the kubeconfig context was not checked.
- Only looking at logs and ignoring Events (probe misconfig is often obvious there).
- Fixing a single pod manually instead of changing the Deployment/ConfigMap/Secret.
- Rolling back an image without rolling back the config that was changed at the same time.
Related issues
Expected outcome
After following this playbook you should:
- Identify whether the CrashLoopBackOff is due to configuration, code, or environment.
- Know which change introduced the failure and either roll back or fix forward safely.
- See pods return to Running and external symptoms (5xx, latency) disappear.
[ TODO: screenshot showing KubeGraf with a CrashLoopBackOff pod selected, logs + events visible. ]