<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Trevor Chikambure</title>
    <description>The latest articles on DEV Community by Trevor Chikambure (@trevorchiks).</description>
    <link>https://dev.to/trevorchiks</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3739909%2F63ae9b7c-8b16-4cef-bae3-c75471f7c1de.jpg</url>
      <title>DEV Community: Trevor Chikambure</title>
      <link>https://dev.to/trevorchiks</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/trevorchiks"/>
    <language>en</language>
    <item>
      <title>Debugging Missing Kubernetes Events: A Deep Dive into the Event Spam Filter</title>
      <dc:creator>Trevor Chikambure</dc:creator>
      <pubDate>Thu, 29 Jan 2026 15:43:32 +0000</pubDate>
      <link>https://dev.to/trevorchiks/debugging-missing-kubernetes-events-a-deep-dive-into-the-event-spam-filter-37kj</link>
      <guid>https://dev.to/trevorchiks/debugging-missing-kubernetes-events-a-deep-dive-into-the-event-spam-filter-37kj</guid>
      <description>&lt;h5&gt;
  
  
  How I traced 13 init containers down to a hardcoded rate limit buried in client-go
&lt;/h5&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;While debugging a Kubernetes cluster, I noticed something odd: pods with many init containers were missing events. Specifically, pods with more than 8 init containers only showed events for the first 8-9 containers. The remaining containers ran successfully, but Kubernetes had no record of their lifecycle events.&lt;/p&gt;

&lt;p&gt;In production this is more than a curiosity: missing events are blind spots in observability, and when things go wrong you need those events to debug.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Investigation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;First clue:&lt;/strong&gt; The events weren't failing randomly. The cutoff was consistent: always around 24-25 events, then nothing.&lt;br&gt;
I spun up my &lt;code&gt;kind&lt;/code&gt; cluster and enabled audit logging on the API server to trace event creation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In kube-apiserver manifest&lt;/span&gt;
- &lt;span class="nt"&gt;--audit-policy-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/etc/kubernetes/audit-policy.yaml
- &lt;span class="nt"&gt;--audit-log-path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/var/log/kubernetes/audit.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
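&lt;p&gt;For reference, a minimal &lt;code&gt;audit-policy.yaml&lt;/code&gt; that captures just Event writes (an illustrative sketch, not the exact policy used here) might look like:&lt;/p&gt;

```yaml
# Illustrative audit policy: record metadata for writes to Event objects
# and ignore everything else, to keep the log readable.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    verbs: ["create", "update", "patch"]
    resources:
      - group: ""              # core API group
        resources: ["events"]
  - level: None                # drop all other requests
```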



&lt;p&gt;&lt;strong&gt;Second clue:&lt;/strong&gt; The audit logs showed events for only the first 8 containers reaching the API server. So either the remaining events were never being emitted, or they were failing to reach the API server in transit. The first option sounded easier to rule out, so I started there.&lt;/p&gt;
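&lt;p&gt;To sanity-check the count, I like tallying Event creations straight out of the audit log. A quick Go sketch (naive substring matching on the compact audit JSON; the field names follow the &lt;code&gt;audit.k8s.io&lt;/code&gt; Event schema):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// countEventCreates tallies audit-log lines that record the creation of
// an Event object. Naive substring matching on compact audit JSON --
// good enough for eyeballing; use jq for anything serious.
func countEventCreates(log string) int {
	count := 0
	for _, line := range strings.Split(log, "\n") {
		if strings.Contains(line, `"verb":"create"`) {
			if strings.Contains(line, `"resource":"events"`) {
				count++
			}
		}
	}
	return count
}

func main() {
	// Reads the audit log from stdin and prints the number of
	// Event-creation entries it saw.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(countEventCreates(string(data)))
}
```

&lt;p&gt;Feed it the log on stdin, e.g. &lt;code&gt;cat /var/log/kubernetes/audit.log | go run count_events.go&lt;/code&gt;.&lt;/p&gt;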

&lt;p&gt;&lt;strong&gt;Third clue:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In /var/lib/kubelet/config.yaml&lt;/span&gt;
logging:
  verbosity: 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then&lt;br&gt;
&lt;code&gt;journalctl -u kubelet -n 200 | grep -A2 -B2 -i "event-storm-pod4"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I quickly had to turn that verbosity back down to 0 afterwards; at v4 the kubelet logs are a firehose.&lt;/p&gt;

&lt;p&gt;The kubelet logs showed it was broadcasting events for all 13 containers, but they never arrived at the API server.&lt;/p&gt;

&lt;p&gt;This was the confusing part, and it took me the best part of a night to figure out: broadcast != sent. The kubelet's broadcaster had handed the events off internally, but something between it and the API server was dropping them.&lt;/p&gt;
&lt;h3&gt;
  
  
  Whodunnit??
&lt;/h3&gt;

&lt;p&gt;Digging through the kubelet source code, I found the event spam filter in &lt;code&gt;staging/src/k8s.io/client-go/tools/record/events_cache.go&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;    &lt;span class="n"&gt;defaultSpamBurst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;25&lt;/span&gt;              &lt;span class="c"&gt;// ← The limit!&lt;/span&gt;
    &lt;span class="n"&gt;defaultSpamQPS&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="m"&gt;300.0&lt;/span&gt;    &lt;span class="c"&gt;// 1 event per 5 minutes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The spam filter groups events by (Source, InvolvedObject, Type) - notably not by container name. This means:&lt;/p&gt;

&lt;p&gt;Init container 1's &lt;code&gt;Started&lt;/code&gt; event&lt;br&gt;
Init container 10's &lt;code&gt;Started&lt;/code&gt; event&lt;br&gt;
...share the same spam key&lt;/p&gt;
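&lt;p&gt;To make that grouping concrete, here's a deliberately simplified, hypothetical sketch of such a key function (my own illustration with made-up struct fields, not the real &lt;code&gt;getSpamKey&lt;/code&gt; from &lt;code&gt;events_cache.go&lt;/code&gt;):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// event is a stripped-down stand-in for a Kubernetes Event: just the
// fields relevant to spam-key grouping.
type event struct {
	SourceComponent string // e.g. "kubelet"
	SourceHost      string
	ObjectKind      string
	ObjectNamespace string
	ObjectName      string
	FieldPath       string // e.g. "spec.initContainers{init-9}"
}

// spamKey builds the grouping key from the source and the involved
// object. Note what is missing: FieldPath, the container identity.
func spamKey(e event) string {
	return strings.Join([]string{
		e.SourceComponent,
		e.SourceHost,
		e.ObjectKind,
		e.ObjectNamespace,
		e.ObjectName,
		// e.FieldPath deliberately omitted: this is the whole bug
	}, "/")
}

func main() {
	a := event{"kubelet", "node-1", "Pod", "default", "event-storm-pod4", "spec.initContainers{init-1}"}
	b := event{"kubelet", "node-1", "Pod", "default", "event-storm-pod4", "spec.initContainers{init-10}"}
	fmt.Println(spamKey(a) == spamKey(b)) // prints true: one shared budget
}
```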

&lt;p&gt;After 25 similar events (3 per container × 8 containers = 24 events, so the budget runs out partway through the ninth container), the spam filter kicks in and silently drops subsequent events.&lt;/p&gt;
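&lt;p&gt;Under the hood this is a token bucket per spam key: a burst of 25, refilling at 1/300 tokens per second. A toy model of that arithmetic (my own simplification, not client-go's actual rate limiter) shows exactly where event 26 disappears:&lt;/p&gt;

```go
package main

import "fmt"

// bucket is a toy token bucket mirroring the spam filter's parameters:
// it starts full at `burst` tokens and refills at `qps` tokens/second.
type bucket struct {
	tokens float64
	burst  float64
	qps    float64
}

// allow consumes one token if available; dropped events get false.
func (b *bucket) allow() bool {
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// advance simulates `seconds` of wall-clock time passing.
func (b *bucket) advance(seconds float64) {
	b.tokens += b.qps * seconds
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
}

func main() {
	// defaultSpamBurst = 25, defaultSpamQPS = 1/300 (1 event per 5 min)
	b := bucket{tokens: 25, burst: 25, qps: 1.0 / 300.0}

	sent := 0
	for i := 0; i != 39; i++ { // 13 init containers x 3 events, back-to-back
		if b.allow() {
			sent++
		}
	}
	fmt.Println("sent:", sent) // prints "sent: 25"; the other 14 are dropped

	b.advance(300)         // five minutes later, one token has refilled
	fmt.Println(b.allow()) // prints true: exactly one more event gets through
}
```

&lt;p&gt;The shape matches the symptom: a hard cutoff around 25 events, then roughly one stray event every five minutes.&lt;/p&gt;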

&lt;p&gt;&lt;strong&gt;The Proof&lt;/strong&gt;&lt;br&gt;
I modified &lt;code&gt;defaultSpamBurst = 30&lt;/code&gt; and rebuilt kubelet. Suddenly, all events appeared:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;kubectl get events | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"Pulled&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;Created&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;Started"&lt;/span&gt;
33  &lt;span class="c"&gt;# Success! (11 containers × 3 events each)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why This Matters
&lt;/h3&gt;

&lt;p&gt;These days Kubernetes pods commonly have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple init containers (data prep, migrations, etc.)&lt;/li&gt;
&lt;li&gt;Sidecar containers (service mesh, logging, monitoring)&lt;/li&gt;
&lt;li&gt;The main application container&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A pod with 10-15 containers isn't unusual, and each container generates at least 3 lifecycle events (&lt;code&gt;Pulled&lt;/code&gt;, &lt;code&gt;Created&lt;/code&gt;, &lt;code&gt;Started&lt;/code&gt;). You hit the 25-event spam limit easily.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix (Maybe)
&lt;/h3&gt;

&lt;p&gt;There's &lt;a href="https://github.com/kubernetes/kubernetes/pull/122942" rel="noopener noreferrer"&gt;a PR&lt;/a&gt; to include &lt;code&gt;fieldPath&lt;/code&gt; (the container name) in the spam key, so each container gets independent rate limiting. However, it stalled over concerns about event volume: too many per-container events could lead to throttling of other pod events.&lt;/p&gt;

&lt;p&gt;Possible solutions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Make the limit configurable&lt;/strong&gt; - Add &lt;code&gt;--event-spam-burst&lt;/code&gt; flag to kubelet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Include container name in spam key&lt;/strong&gt; - Treat each container independently&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-tier rate limiting&lt;/strong&gt; - Per-container + per-pod limits&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I've documented this in &lt;a href="https://github.com/kubernetes/kubernetes/issues/122904#issuecomment-3817813146" rel="noopener noreferrer"&gt;the original issue&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes has spam protection for events&lt;/strong&gt; - by design, to prevent event storms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The default limit (25) is hardcoded&lt;/strong&gt; and not configurable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-container pods can hit this limit&lt;/strong&gt; during normal operation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit logs are invaluable&lt;/strong&gt; for tracing API server behavior&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're debugging missing events, check if you have many containers generating similar events rapidly. You might be hitting the spam filter.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tools I used:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kind&lt;/code&gt; (local Kubernetes cluster)&lt;/li&gt;
&lt;li&gt;Audit logging (&lt;code&gt;--audit-policy-file&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;journalctl&lt;/code&gt; (kubelet logs)&lt;/li&gt;
&lt;li&gt;Kubernetes source code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Further reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/kubernetes/issues/122904" rel="noopener noreferrer"&gt;Original issue #122904&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/kubernetes/pull/122942" rel="noopener noreferrer"&gt;Stalled PR #122942&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/record/events_cache.go" rel="noopener noreferrer"&gt;Event spam filter code&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>kubernetes</category>
      <category>debugging</category>
      <category>cloudnative</category>
      <category>sre</category>
    </item>
  </channel>
</rss>
