<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: JOE</title>
    <description>The latest articles on DEV Community by JOE (@solojoe).</description>
    <link>https://dev.to/solojoe</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3984391%2F6dc215fd-ba97-4dc5-b88b-de3290759ee1.png</url>
      <title>DEV Community: JOE</title>
      <link>https://dev.to/solojoe</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/solojoe"/>
    <language>en</language>
    <item>
      <title>K8s Necromancer — a Black Box Flight Recorder for dead Kubernetes pods.</title>
      <dc:creator>JOE</dc:creator>
      <pubDate>Sun, 14 Jun 2026 20:58:57 +0000</pubDate>
      <link>https://dev.to/solojoe/k8s-necromancer-a-black-box-flight-recorder-for-dead-kubernetes-pods-4pa8</link>
      <guid>https://dev.to/solojoe/k8s-necromancer-a-black-box-flight-recorder-for-dead-kubernetes-pods-4pa8</guid>
      <description>&lt;p&gt;Every K8s operator knows the feeling: a pod dies, the kubelet garbage-collects it in 30 seconds, and you're left staring at a CrashLoopBackOff with no context. The logs have rotated. The events are gone. You end up switching between kubectl describe, Loki, Grafana, and deploy histories, trying to reconstruct what happened.&lt;/p&gt;

&lt;p&gt;I got tired of this, so I built K8s Necromancer — a controller that intercepts pod deaths before GC and freezes the entire forensic state.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What it does&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a pod crashes, it captures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Container logs (previous container + fallback)&lt;/li&gt;
&lt;li&gt;K8s events timeline&lt;/li&gt;
&lt;li&gt;Resolved ENV vars (reads ConfigMaps and Secrets via K8s API)&lt;/li&gt;
&lt;li&gt;Full pod spec snapshot&lt;/li&gt;
&lt;li&gt;ConfigMap volume snapshots&lt;/li&gt;
&lt;li&gt;CPU/Memory sparklines from Prometheus (optional)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of this goes into a Tomb custom resource (lightweight metadata in etcd) plus a PersistentVolume (heavy data). SHA-256 deduplication prevents duplicate tombs.&lt;/p&gt;
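
&lt;p&gt;To make the dedup idea concrete, here is a minimal sketch in Go. The fields being hashed (namespace, pod, container, restart count) and the names &lt;code&gt;crashKey&lt;/code&gt;, &lt;code&gt;fingerprint&lt;/code&gt;, and &lt;code&gt;tombIndex&lt;/code&gt; are assumptions made for illustration; the project may hash different inputs and store the digests elsewhere.&lt;/p&gt;

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// crashKey is a hypothetical set of fields that identify one pod death.
// The real project may hash different inputs.
type crashKey struct {
	Namespace    string
	Pod          string
	Container    string
	RestartCount int
}

// fingerprint returns the hex-encoded SHA-256 digest of the key fields.
func fingerprint(k crashKey) string {
	// A NUL separator keeps "a"+"bc" and "ab"+"c" from producing the same input.
	parts := []string{k.Namespace, k.Pod, k.Container, fmt.Sprint(k.RestartCount)}
	sum := sha256.Sum256([]byte(strings.Join(parts, "\x00")))
	return hex.EncodeToString(sum[:])
}

// tombIndex remembers the fingerprints that already have a tomb.
type tombIndex map[string]bool

// shouldCapture reports whether this crash is new, and records it if so.
func (idx tombIndex) shouldCapture(k crashKey) bool {
	fp := fingerprint(k)
	if idx[fp] {
		return false
	}
	idx[fp] = true
	return true
}

func main() {
	idx := tombIndex{}
	k := crashKey{Namespace: "default", Pod: "api-7d9f", Container: "app", RestartCount: 3}
	fmt.Println(idx.shouldCapture(k)) // first sighting of this crash
	fmt.Println(idx.shouldCapture(k)) // same crash observed again
}
```

&lt;p&gt;In this sketch the second call is rejected because the digest is already in the index, while a later restart of the same pod yields a different digest and would be captured.&lt;/p&gt;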

&lt;ul&gt;
&lt;li&gt;6 capture triggers&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Restart count increase&lt;/li&gt;
&lt;li&gt;Pod phase = Failed&lt;/li&gt;
&lt;li&gt;Image pull error&lt;/li&gt;
&lt;li&gt;Non-zero exit code&lt;/li&gt;
&lt;li&gt;First restart detected&lt;/li&gt;
&lt;li&gt;Pending timeout (&amp;gt;10m, configurable)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The controller skips kube-system, the necromancer namespaces, and clean job exits (exit code 0).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;necromancer autopsy &amp;lt;id&amp;gt;&lt;/code&gt; - generates a coroner's report with a forensic timeline, resource sparklines, and an ENV diff against the last healthy pod. Outputs to the terminal or to Markdown.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;necromancer resurrect &amp;lt;id&amp;gt;&lt;/code&gt; - spins up a ghost pod in a sandboxed namespace so you can inspect the dead pod's filesystem and config. The point is to investigate, not to run the app.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;necromancer list / inspect / bury&lt;/code&gt; - browse, query, and clean up old tombs, with dry-run support.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;Safety (this was important to me)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ghost pods run in a locked-down namespace:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deny-all-egress NetworkPolicy (DNS allowed only)&lt;/li&gt;
&lt;li&gt;Secrets stripped by default (opt-in with --include-secret-volumes)&lt;/li&gt;
&lt;li&gt;No SA tokens, no host namespaces, probes removed&lt;/li&gt;
&lt;li&gt;Entrypoint overridden to sleep infinity&lt;/li&gt;
&lt;li&gt;LimitRange (500m/512Mi) + ResourceQuota (10 pods) enforced&lt;/li&gt;
&lt;li&gt;Controller runs as non-root (UID 1000, drops ALL capabilities)&lt;/li&gt;
&lt;li&gt;SSRF protection on Prometheus URL (blocks loopback, cloud metadata)&lt;/li&gt;
&lt;li&gt;Path traversal prevention on all API inputs&lt;/li&gt;
&lt;/ul&gt;

&lt;ul&gt;
&lt;li&gt;HA&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The default deployment runs 2 replicas with leader election enabled and a ReadWriteMany PVC (EFS, Filestore, CephFS, etc.). The dev overlay for Kind/Minikube runs 1 replica with a ReadWriteOnce PVC.&lt;/p&gt;
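
&lt;p&gt;Going back to the safety list for a moment, the SSRF check on the Prometheus URL is easy to illustrate. The sketch below is only one plausible shape for it: &lt;code&gt;checkMetricsURL&lt;/code&gt; is a hypothetical name, and it only inspects literal IPs and the name "localhost". A fuller check would also resolve DNS names and test every resolved address.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// checkMetricsURL is a hypothetical validator for a user-supplied Prometheus URL.
// It rejects loopback targets and link-local addresses, the range that holds
// the usual cloud metadata endpoint 169.254.169.254.
func checkMetricsURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	switch u.Scheme {
	case "http", "https":
	default:
		return errors.New("only http and https are allowed")
	}
	host := u.Hostname()
	if host == "" {
		return errors.New("missing host")
	}
	if host == "localhost" {
		return errors.New("loopback host is blocked")
	}
	ip := net.ParseIP(host)
	if ip == nil {
		// A DNS name: this sketch lets it through without resolving it.
		return nil
	}
	if ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
		return errors.New("loopback or link-local address is blocked")
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"http://prometheus.monitoring.svc:9090",
		"http://127.0.0.1:9090",
		"http://169.254.169.254/latest/meta-data",
	} {
		// Prints each URL with whether the check accepted it.
		fmt.Println(raw, checkMetricsURL(raw) == nil)
	}
}
```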

&lt;ul&gt;
&lt;li&gt;Stack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Go 1.26, controller-runtime, Cobra CLI, Prometheus integration, Kustomize overlays, Kind-based e2e tests.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/privjoesrepos/k8s-necromancer" rel="noopener noreferrer"&gt;https://github.com/privjoesrepos/k8s-necromancer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Docker: &lt;a href="https://hub.docker.com/r/privjoesrepos/k8s-necromancer" rel="noopener noreferrer"&gt;https://hub.docker.com/r/privjoesrepos/k8s-necromancer&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT licensed.&lt;/p&gt;

&lt;p&gt;Happy to answer questions or take feedback.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>go</category>
      <category>docker</category>
      <category>github</category>
    </item>
  </channel>
</rss>
