<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: KubeGraf</title>
    <description>The latest articles on DEV Community by KubeGraf (@kubegraf_io).</description>
    <link>https://dev.to/kubegraf_io</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3691480%2F369101a2-4e44-4877-bdf3-1d6a7f9b0a28.jpg</url>
      <title>DEV Community: KubeGraf</title>
      <link>https://dev.to/kubegraf_io</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kubegraf_io"/>
    <language>en</language>
    <item>
      <title>🚀 KubeGraf — Smarter Kubernetes Incident Response, Today</title>
      <dc:creator>KubeGraf</dc:creator>
      <pubDate>Wed, 14 Jan 2026 01:41:13 +0000</pubDate>
      <link>https://dev.to/kubegraf_io/kubegraf-smarter-kubernetes-incident-response-today-5b5a</link>
      <guid>https://dev.to/kubegraf_io/kubegraf-smarter-kubernetes-incident-response-today-5b5a</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i7tvqxntl9tdfyj1ars.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i7tvqxntl9tdfyj1ars.jpeg" alt="KubeGraf — Smarter Kubernetes Incident Response, Today" width="800" height="1356"&gt;&lt;/a&gt;🚀&lt;/p&gt;

&lt;p&gt;KubeGraf is the first product in the Kontrolity platform — the AI control layer for autonomous infrastructure intelligence. It’s designed for engineers who face real production incidents and need fast, evidence-backed decisions without relying on SaaS, telemetry, or blind automation.&lt;/p&gt;

&lt;p&gt;Why KubeGraf matters:&lt;br&gt;
• Detect incidents instantly — CrashLoopBackOff, OOMKilled, probe failures, deployment issues&lt;br&gt;
• Diagnose with evidence — correlates logs, events, metrics, and configs automatically&lt;br&gt;
• Preview fixes safely — dry-run validation keeps humans in control&lt;br&gt;
• Run locally — all data stays on your machine, no vendor lock-in&lt;/p&gt;
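The failure signatures named above surface in well-known fields of a pod's status. As a rough illustration (the pod dict is a hand-written example shaped like `kubectl get pod -o json` output, not KubeGraf's actual data model), detection can be sketched like this:

```python
# Sketch: flag common failure signatures from a pod's status, as they
# appear in `kubectl get pod -o json` output. Illustrative only.

def failure_signatures(pod: dict) -> list[str]:
    """Return human-readable failure signatures found in a pod status."""
    found = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        # CrashLoopBackOff shows up as the current waiting reason.
        waiting = cs.get("state", {}).get("waiting", {})
        if waiting.get("reason") == "CrashLoopBackOff":
            found.append(f"{cs['name']}: CrashLoopBackOff")
        # OOMKilled shows up as the last terminated reason.
        terminated = cs.get("lastState", {}).get("terminated", {})
        if terminated.get("reason") == "OOMKilled":
            found.append(f"{cs['name']}: OOMKilled (last restart)")
    for cond in pod.get("status", {}).get("conditions", []):
        # A failing readiness probe flips the Ready condition to False.
        if cond.get("type") == "Ready" and cond.get("status") == "False":
            found.append(f"pod not Ready: {cond.get('reason', 'unknown')}")
    return found

pod = {
    "status": {
        "containerStatuses": [{
            "name": "payments",
            "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            "lastState": {"terminated": {"reason": "OOMKilled"}},
        }],
        "conditions": [{"type": "Ready", "status": "False",
                        "reason": "ContainersNotReady"}],
    }
}
print(failure_signatures(pod))
```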

&lt;p&gt;Early adopters are already seeing results:&lt;br&gt;
• 80% reduction in MTTR&lt;br&gt;
• 40% fewer escalations to senior engineers&lt;br&gt;
• 60% time recovered for building new features&lt;/p&gt;

&lt;p&gt;KubeGraf is just the beginning. Kontrolity is building the future of autonomous infrastructure, where AI systems manage incidents, predict failures, and take action — so teams can focus on innovation.&lt;/p&gt;

&lt;p&gt;Learn more: kubegraf.io | Platform vision: kontrolity.com&lt;/p&gt;

&lt;h1&gt;KubeGraf #Kubernetes #SRE #DevOps #PlatformEngineering #IncidentResponse #AutonomousInfrastructure #DeveloperTools #InfraAI #Kontrolity&lt;/h1&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>incident</category>
      <category>cicd</category>
    </item>
    <item>
      <title>A Local-First Way to Debug Kubernetes Incidents: KubeGraf</title>
      <dc:creator>KubeGraf</dc:creator>
      <pubDate>Sat, 03 Jan 2026 16:11:41 +0000</pubDate>
      <link>https://dev.to/kubegraf_io/a-local-first-way-to-debug-kubernetes-incidents-kubegraf-3ihd</link>
      <guid>https://dev.to/kubegraf_io/a-local-first-way-to-debug-kubernetes-incidents-kubegraf-3ihd</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Detect incidents, understand root causes with evidence analysis, and safely preview fixes—all running locally on your machine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwvfsnutcfzkl9hlz7wf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwvfsnutcfzkl9hlz7wf.png" alt=" " width="800" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes becomes real during incidents—not during tutorials.&lt;/p&gt;

&lt;p&gt;When production is down, alerts are firing, and users are impacted, the hardest part isn’t running kubectl. The hardest part is that the truth is scattered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pods and objects in kubectl&lt;/li&gt;
&lt;li&gt;events in another place&lt;/li&gt;
&lt;li&gt;logs somewhere else&lt;/li&gt;
&lt;li&gt;rollout history buried in tooling&lt;/li&gt;
&lt;li&gt;“what changed?” living in people’s memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;And during an incident, everyone needs the same answer fast:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What’s going on in this cluster right now?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Recently, while exploring tools that make incident response less chaotic, I came across KubeGraf — a local-first Kubernetes incident response control plane designed to reduce cognitive load when time matters most.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;KubeGraf’s promise is simple:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Detect incidents, understand root causes with evidence analysis, and safely preview fixes — all running locally on your machine.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Why this matters during real incidents&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;On-call engineers usually care about a few core questions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What changed recently?&lt;/li&gt;
&lt;li&gt;Is the issue isolated or system-wide?&lt;/li&gt;
&lt;li&gt;Is it config/secrets, resources, rollout, or an external dependency?&lt;/li&gt;
&lt;li&gt;What’s the safest next step to restore service?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;But answering those questions often turns into tab-switching across:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;kubectl&lt;/li&gt;
&lt;li&gt;logs and events&lt;/li&gt;
&lt;li&gt;metrics dashboards&lt;/li&gt;
&lt;li&gt;deployment history / GitOps trails&lt;/li&gt;
&lt;li&gt;Slack threads and guesswork&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;KubeGraf tries to unify these signals into an incident-focused view, so you don’t have to manually stitch together a narrative under pressure.&lt;/p&gt;

&lt;p&gt;Importantly, KubeGraf isn’t trying to replace kubectl.&lt;br&gt;
It uses your existing &lt;strong&gt;~/.kube/config&lt;/strong&gt; and respects Kubernetes RBAC — the same access model teams already trust.&lt;/p&gt;

&lt;h2&gt;What KubeGraf is (the mental model)&lt;/h2&gt;

&lt;p&gt;KubeGraf is built around one idea:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Incidents are first-class objects.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Instead of showing raw error spam, it aims to structure what’s happening into:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an incident summary (human-readable)&lt;/li&gt;
&lt;li&gt;a timeline (what happened before/during/after)&lt;/li&gt;
&lt;li&gt;an evidence pack (events, logs, and object state supporting conclusions)&lt;/li&gt;
&lt;li&gt;recommendations that stay grounded in evidence&lt;/li&gt;
&lt;li&gt;safe fix previews (diff-first, apply only if approved)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That &lt;strong&gt;“incident-first”&lt;/strong&gt; framing matters because it matches how engineers actually work when production is failing: stabilize, understand, act safely, and document what happened.&lt;/p&gt;

&lt;h2&gt;How you interact with it&lt;/h2&gt;

&lt;p&gt;KubeGraf supports different workflows while keeping the same mental model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Terminal UI (TUI):&lt;/strong&gt; a fast, keyboard-driven interface (the kubegraf CLI) inspired by tools like k9s, but centered around incidents, context, and topology.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local Web Dashboard:&lt;/strong&gt; a browser-based dashboard focused on topology graphs, incident timelines, live event streams, and evidence views.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The point is consistency:&lt;/strong&gt; whether you’re in CLI mode or UI mode, you should still be looking at the same “incident picture.”&lt;/p&gt;

&lt;h2&gt;A realistic incident scenario&lt;/h2&gt;

&lt;p&gt;Imagine you’re on call. Someone messages:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Payments API is returning 500s in prod.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The typical response looks like this:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;switch context&lt;/li&gt;
&lt;li&gt;list pods&lt;/li&gt;
&lt;li&gt;spot CrashLoopBackOff&lt;/li&gt;
&lt;li&gt;open logs&lt;/li&gt;
&lt;li&gt;inspect events&lt;/li&gt;
&lt;li&gt;check rollout history&lt;/li&gt;
&lt;li&gt;compare ConfigMaps/Secrets&lt;/li&gt;
&lt;li&gt;guess what changed&lt;/li&gt;
&lt;li&gt;try a fix and hope it’s safe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;KubeGraf’s incident-focused workflow is meant to be more direct:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Failing pods/workloads are highlighted immediately&lt;/li&gt;
&lt;li&gt;A unified incident timeline correlates deploy/rollout updates, config/secret changes, failure events and restarts, and (where available) resource pressure signals&lt;/li&gt;
&lt;li&gt;An analysis panel can summarize likely root causes based on evidence (not vibes)&lt;/li&gt;
&lt;li&gt;Fix suggestions are shown as a preview first (diff + impact), not auto-applied&lt;/li&gt;
&lt;/ul&gt;
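At its core, a unified incident timeline is just per-source event streams merged into one time-ordered view. A toy sketch of that idea (the event tuples and timestamps below are made up for illustration):

```python
# Toy sketch: merge deploy, config, and pod-failure events from separate
# sources into one time-ordered incident timeline. Event shapes and
# timestamps are illustrative only.
from datetime import datetime

def unified_timeline(*streams):
    """Merge (timestamp, source, message) tuples from any number of streams."""
    merged = [event for stream in streams for event in stream]
    return sorted(merged, key=lambda event: event[0])

deploys  = [(datetime(2026, 1, 3, 12, 0), "rollout", "payments-api v2 deployed")]
configs  = [(datetime(2026, 1, 3, 11, 58), "config", "ConfigMap payments-env updated")]
failures = [(datetime(2026, 1, 3, 12, 3), "events", "CrashLoopBackOff on payments-api-7d9")]

for ts, source, msg in unified_timeline(deploys, configs, failures):
    print(ts.time(), f"[{source}]", msg)
```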

&lt;p&gt;The theme is consistent: faster understanding with evidence and safer next steps.&lt;/p&gt;

&lt;h2&gt;Evidence over “AI magic”&lt;/h2&gt;

&lt;p&gt;One thing that stood out is the intent to keep diagnosis reproducible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead of opaque answers, the system aims to show:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what signal triggered the incident&lt;/li&gt;
&lt;li&gt;what evidence supports the conclusion (events/log snippets/object diffs)&lt;/li&gt;
&lt;li&gt;confidence scores&lt;/li&gt;
&lt;li&gt;command transparency (what it ran or would run)&lt;/li&gt;
&lt;/ul&gt;
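In the simplest possible form, a confidence score could come from how many independent evidence sources agree with a hypothesis. This is a purely illustrative heuristic, not how KubeGraf actually computes confidence:

```python
# Illustrative heuristic only: score a root-cause hypothesis by the
# fraction of independent evidence sources that support it.
def confidence(supporting: set[str], all_sources: set[str]) -> float:
    if not all_sources:
        return 0.0
    return round(len(supporting) / len(all_sources), 2)

sources = {"events", "logs", "object-diff", "metrics"}
oom_hypothesis = {"events", "logs", "metrics"}  # 3 of 4 sources agree
print(confidence(oom_hypothesis, sources))      # 0.75
```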

&lt;p&gt;This matters in real operations. During incidents, teams don’t want a black box. They want a tool that can say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Here’s what I saw, here’s why I think this is happening, and here’s the exact change I would apply.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Safe-by-default: preview fixes, don’t auto-remediate&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;KubeGraf’s posture is intentionally conservative:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;preview first&lt;/li&gt;
&lt;li&gt;you approve or reject&lt;/li&gt;
&lt;li&gt;rollback is explicit&lt;/li&gt;
&lt;li&gt;no blind automation&lt;/li&gt;
&lt;/ul&gt;
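The "preview first" posture can be approximated with a plain unified diff of the proposed manifest change, with the apply step gated behind explicit approval. A minimal sketch (the manifest snippets are invented for illustration):

```python
# Sketch of a diff-first fix preview: render the proposed manifest change
# as a unified diff, and gate the apply step behind explicit approval.
# The manifest snippets here are illustrative.
import difflib

current = """resources:
  limits:
    memory: 256Mi
""".splitlines(keepends=True)

proposed = """resources:
  limits:
    memory: 512Mi
""".splitlines(keepends=True)

diff = "".join(difflib.unified_diff(current, proposed,
                                    fromfile="deployment (live)",
                                    tofile="deployment (proposed)"))
print(diff)

def apply_fix(diff: str, approved: bool) -> str:
    # No blind automation: rejected unless a human approves the preview.
    return "applied" if approved else "rejected (preview only)"

print(apply_fix(diff, approved=False))
```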

&lt;p&gt;That safety model fits how real teams operate, where accidental changes can be worse than the incident itself.&lt;/p&gt;

&lt;h2&gt;Where this could go next&lt;/h2&gt;

&lt;p&gt;The direction around KubeGraf is especially interesting because it’s not just “make dashboards nicer.” It’s about building an incident intelligence layer on top of Kubernetes workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Post-launch, it could expand into capabilities like:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Incident replay/time-travel debugging for stronger postmortems&lt;/li&gt;
&lt;li&gt;Change attribution (“this started 7 minutes after X changed”) across deployments, config updates, image tags, and scaling events&lt;/li&gt;
&lt;li&gt;Exportable incident and fix history to build a reusable knowledge bank&lt;/li&gt;
&lt;li&gt;A separate Security &amp;amp; Diagnostics module (health checks, posture, attack/vulnerability summaries) without mixing it into incident workflows&lt;/li&gt;
&lt;li&gt;Deeper log-based intelligence for common production failure patterns (5xx spikes, upstream errors, misconfigurations)&lt;/li&gt;
&lt;/ul&gt;
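Of those, change attribution ("this started 7 minutes after X changed") essentially reduces to finding the most recent change event that preceded the first failure. A sketch under that assumption, with invented change events:

```python
# Sketch of change attribution: given a list of change events and the
# time of the first failure, find the most recent change that preceded
# the failure. Events and timestamps are invented for illustration.
from datetime import datetime

changes = [
    (datetime(2026, 1, 3, 11, 40), "image tag bumped to v1.9.2"),
    (datetime(2026, 1, 3, 11, 56), "ConfigMap payments-env updated"),
]
first_failure = datetime(2026, 1, 3, 12, 3)

def attribute(changes, failure_time):
    # Keep only changes at or before the failure, then take the latest.
    prior = [(ts, what) for ts, what in changes if failure_time >= ts]
    if not prior:
        return None
    ts, what = max(prior, key=lambda c: c[0])
    minutes = int((failure_time - ts).total_seconds() // 60)
    return f"started {minutes} minutes after: {what}"

print(attribute(changes, first_failure))
```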

&lt;p&gt;This split—Incident Intelligence vs. Security &amp;amp; Diagnostics—is a strong product framing because the user intent is different in each mode.&lt;/p&gt;

&lt;h2&gt;Learn more&lt;/h2&gt;

&lt;p&gt;Website: &lt;a href="https://kubegraf.io" rel="noopener noreferrer"&gt;https://kubegraf.io&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Documentation: &lt;a href="https://kubegraf.io/docs/" rel="noopener noreferrer"&gt;https://kubegraf.io/docs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvzgmugdf1nnjxdp7zo9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvzgmugdf1nnjxdp7zo9.png" alt=" " width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Closing thought&lt;/h2&gt;

&lt;p&gt;Kubernetes has plenty of tools that are great for day-to-day operations.&lt;br&gt;
But incident response is a different mindset: speed, context, and safety matter more than feature count.&lt;/p&gt;

&lt;p&gt;KubeGraf’s combination of local-first operation, evidence-backed diagnosis, and safe fix previews feels aligned with what on-call engineers actually need when things go wrong.&lt;/p&gt;

&lt;p&gt;If you’re an SRE/DevOps engineer, I’d love a reality check:&lt;/p&gt;

&lt;p&gt;What’s the slowest part of your Kubernetes incident workflow today?&lt;br&gt;
Change attribution? Context switching? Noisy alerts? Unsafe fixes?&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>kubegraf</category>
      <category>sre</category>
    </item>
  </channel>
</rss>
