KubeGraf Documentation

KubeGraf is a local-first Kubernetes tool for detecting incidents, diagnosing root causes from evidence, and safely responding to failures, all without SaaS lock-in.

What KubeGraf Does

Incident Detection

Automatically monitors for common Kubernetes failures:

  • CrashLoopBackOff - Containers that crash repeatedly and are restarted with increasing back-off
  • OOMKilled - Containers terminated for exceeding their memory limits
  • ImagePullBackOff - Failed image pulls
  • Probe failures - Liveness and readiness check failures
  • Restart storms - Excessive pod restarts
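
As a rough illustration of what detecting these failures involves, the sketch below classifies a single pod object, in the JSON shape returned by kubectl get pod -o json, using standard Kubernetes container status fields. The function name classify_pod, the RestartStorm label, and the restart threshold are assumptions made for this example; this is not KubeGraf's actual detection logic. Probe failures surface through cluster events rather than container status, so they are left out here.

```python
# Illustrative sketch only: not KubeGraf's implementation.
RESTART_STORM_THRESHOLD = 5  # assumed threshold, chosen for the example


def classify_pod(pod: dict) -> set:
    """Return the failure labels found in one pod's container statuses."""
    findings = set()
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for cs in statuses:
        state = cs.get("state") or {}
        last_state = cs.get("lastState") or {}
        waiting = state.get("waiting") or {}
        terminated_reasons = (
            (state.get("terminated") or {}).get("reason"),
            (last_state.get("terminated") or {}).get("reason"),
        )
        if waiting.get("reason") == "CrashLoopBackOff":
            findings.add("CrashLoopBackOff")
        # ErrImagePull is the first failed attempt; ImagePullBackOff follows it.
        if waiting.get("reason") in ("ImagePullBackOff", "ErrImagePull"):
            findings.add("ImagePullBackOff")
        if "OOMKilled" in terminated_reasons:
            findings.add("OOMKilled")
        if cs.get("restartCount", 0) >= RESTART_STORM_THRESHOLD:
            findings.add("RestartStorm")
    return findings


# Made-up pod status for demonstration.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "api",
                "restartCount": 7,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            }
        ]
    }
}

print(sorted(classify_pod(sample_pod)))
```

A single container can match several categories at once, which is why the function returns a set rather than one label.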

Evidence-Based Diagnosis

Correlates multiple data sources to explain failures:

  • Event timeline showing when failures occurred
  • Recent changes (deployments, configs, secrets)
  • Container logs with error extraction
  • Resource usage patterns
  • Related failures across the cluster
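
One way to picture the correlation step is as a merge of records from several sources into a single time-ordered view. The sketch below assumes each source yields (timestamp, source, message) tuples with ISO 8601 timestamps; the record shape, the build_timeline name, and all sample values are hypothetical and do not describe KubeGraf's internal data model.

```python
from datetime import datetime

# Hypothetical records: (ISO 8601 timestamp, source, message). All values are made up.
events = [
    ("2024-05-01T10:03:10+00:00", "event", "Back-off restarting failed container api"),
    ("2024-05-01T10:02:40+00:00", "event", "Container api terminated: OOMKilled"),
]
changes = [
    ("2024-05-01T10:01:00+00:00", "change", "Deployment api updated: memory limit 256Mi -> 128Mi"),
]
logs = [
    ("2024-05-01T10:02:35+00:00", "log", "fatal: cannot allocate memory"),
]


def build_timeline(*sources):
    """Merge records from all sources and return them oldest first."""
    merged = [record for source in sources for record in source]
    return sorted(merged, key=lambda record: datetime.fromisoformat(record[0]))


timeline = build_timeline(events, changes, logs)
for timestamp, source, message in timeline:
    print(f"{timestamp}  {source:<6}  {message}")
```

Read top to bottom, the merged view puts the configuration change ahead of the first error, which is the kind of ordering evidence a diagnosis relies on.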

Safe Fix Previews

Preview changes before applying them:

  • Dry-run validation before any changes
  • Diff view showing exactly what will change
  • Rollback suggestions for failed deployments
  • Read-only mode by default
  • All actions require explicit confirmation
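
The preview-then-confirm flow can be sketched with a plain text diff and an explicit confirmation flag. kubectl itself offers kubectl diff and kubectl apply --dry-run=server for the same purpose; the example below does not call them and does not show how KubeGraf builds its previews. The function names preview_diff and apply_fix, and the manifest fragments, are assumptions for illustration.

```python
import difflib


def preview_diff(current: str, proposed: str, name: str = "manifest") -> str:
    """Return a unified diff between the live and the proposed manifest text."""
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"{name} (live)",
        tofile=f"{name} (proposed)",
    )
    return "".join(diff)


def apply_fix(current: str, proposed: str, confirm: bool = False) -> str:
    """Return the manifest that would be live after the call.

    Nothing changes unless the caller passes confirm=True, mirroring a
    read-only default.
    """
    return proposed if confirm else current


# Made-up manifest fragments for demonstration.
current = "resources:\n  limits:\n    memory: 128Mi\n"
proposed = "resources:\n  limits:\n    memory: 256Mi\n"

diff_text = preview_diff(current, proposed, name="deployment/api")
print(diff_text)
```

Calling apply_fix(current, proposed) without confirm=True returns the live manifest unchanged, so a change cannot be applied by accident.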

Knowledge Bank

Local incident storage for learning and reporting:

  • SQLite database stores all incident history
  • Search by pod, namespace, error type, or fix
  • Export reports for postmortems
  • Track recurring patterns over time
  • No data leaves your machine
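
To make the storage idea concrete, here is a minimal sketch of a local incident store using Python's built-in sqlite3 module. The table layout, column names, and the helper functions search and recurring are assumptions for illustration; KubeGraf's real schema is not documented in this section and may differ. The rows are made-up sample data.

```python
import sqlite3

# In-memory database for the example; a real store would be a file on local disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE incidents (
        id          INTEGER PRIMARY KEY,
        detected_at TEXT NOT NULL,
        namespace   TEXT NOT NULL,
        pod         TEXT NOT NULL,
        error_type  TEXT NOT NULL,
        fix         TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO incidents (detected_at, namespace, pod, error_type, fix) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-05-01T10:03:10", "prod", "api-7d9f", "OOMKilled", "raise memory limit"),
        ("2024-05-03T08:15:42", "prod", "api-5c2b", "OOMKilled", "raise memory limit"),
        ("2024-05-04T21:40:05", "staging", "worker-1a2b", "ImagePullBackOff", None),
    ],
)

SEARCHABLE = {"pod", "namespace", "error_type", "fix"}


def search(conn, **filters):
    """Return incidents matching every given column=value filter, oldest first."""
    unknown = set(filters) - SEARCHABLE
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    sql = "SELECT detected_at, namespace, pod, error_type, fix FROM incidents"
    if filters:
        # Column names come from the SEARCHABLE allow-list; values are bound parameters.
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in filters)
    sql += " ORDER BY detected_at"
    return conn.execute(sql, tuple(filters.values())).fetchall()


def recurring(conn, min_count=2):
    """Return (namespace, error_type, count) groups seen at least min_count times."""
    return conn.execute(
        "SELECT namespace, error_type, COUNT(*) FROM incidents "
        "GROUP BY namespace, error_type HAVING COUNT(*) >= ? "
        "ORDER BY COUNT(*) DESC",
        (min_count,),
    ).fetchall()


print(search(conn, namespace="prod", error_type="OOMKilled"))
print(recurring(conn))
```

Because the database is an ordinary local file in practice, searching and reporting need no network access.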

Three Ways to Use KubeGraf

⌨️

Terminal UI

Keyboard-driven interface for SSH sessions and power users. Works over slow connections.

🌐

Web Dashboard

Browser-based UI with resource map visualization and real-time updates. Run locally with kubegraf web.

🚀

Modern SPA

Single-page app interface for teams. All three interfaces work with the same local backend.

Requirements

  • A working kubectl configuration
  • Access to a Kubernetes cluster (local or remote)
  • macOS, Linux, or Windows

Quick Install

curl -sSL https://kubegraf.io/install.sh | bash

See the Installation Guide for more options including Homebrew, Scoop, and manual installation.