Trevor Chikambure

Debugging Missing Kubernetes Events: A Deep Dive into the Event Spam Filter

How I traced 13 init containers down to a hardcoded rate limit buried in client-go

The Problem

While debugging a Kubernetes cluster, I noticed something odd: pods with many init containers were missing events. Specifically, pods with more than 8 init containers only showed events for the first 8-9 containers. The remaining containers ran successfully, but Kubernetes had no record of their lifecycle events.
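
A pod along these lines reproduces it (a minimal sketch - I've shortened the init container list, and the name matches the pod grepped for later):

apiVersion: v1
kind: Pod
metadata:
  name: event-storm-pod4
spec:
  restartPolicy: Never
  initContainers:
    - name: init-1
      image: busybox
      command: ["true"]
    - name: init-2
      image: busybox
      command: ["true"]
    # ...same pattern through init-13
  containers:
    - name: main
      image: busybox
      command: ["sleep", "3600"]

Each init container exits immediately, so the kubelet emits its Pulled/Created/Started events in quick succession.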

In production this is a recipe for confusion and panic: missing events are blind spots in observability, and when things go wrong, you need those events to debug.

The Investigation

First clue: The events weren't failing randomly. The cutoff was consistent - always around 24-25 events, then nothing.
I whipped up a kind cluster and enabled audit logging on the API server to trace event creation:

# In kube-apiserver manifest
- --audit-policy-file=/etc/kubernetes/audit-policy.yaml
- --audit-log-path=/var/log/kubernetes/audit.log
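
For reference, a minimal audit-policy.yaml that records event writes with full request bodies and drops everything else looks roughly like this (a sketch of a sensible policy, not necessarily the exact file):

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Capture full request bodies for event writes so we can see
  # which container each event refers to
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["events"]
  # Ignore everything else to keep the log manageable
  - level: None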

Second clue: The audit logs showed events for only the first 8 containers reaching the API server. So either the remaining events were never being emitted, or they were failing to reach the API server. Option 1 sounded easier to debug.
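
With the RequestResponse level above, each audit line carries the full event body, so you can count what actually arrived per container straight from the log (assuming jq is installed):

grep '"resource":"events"' /var/log/kubernetes/audit.log \
  | jq -r 'select(.verb == "create") | .requestObject.involvedObject.fieldPath' \
  | sort | uniq -c

Kubernetes records which container an event refers to in involvedObject.fieldPath, e.g. spec.initContainers{init-3}.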

Third clue: I cranked up the kubelet's log verbosity:

# In /var/lib/kubelet/config.yaml
logging:
  verbosity: 4

then restarted the kubelet and grepped its logs for the test pod:

journalctl -u kubelet -n 200 | grep -A2 -B2 -i "event-storm-pod4"

I quickly had to turn that verbosity back down to 0 because wow, the volume!

The kubelet logs showed it was broadcasting events for all 13 containers, but they never arrived at the API server.

This was really confusing and took me the best part of a night to figure out, but it turns out broadcast != sent!
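
To see why, here's roughly how event recording gets wired up with client-go (a simplified sketch in the spirit of the kubelet's setup, not its actual code):

import (
    v1 "k8s.io/api/core/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/kubernetes/scheme"
    typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
    "k8s.io/client-go/tools/record"
)

func makeRecorder(client kubernetes.Interface) record.EventRecorder {
    broadcaster := record.NewBroadcaster()
    // The sink goroutine runs the EventCorrelator - aggregation plus
    // the spam filter - before anything is POSTed to the API server.
    broadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{
        Interface: client.CoreV1().Events(""),
    })
    return broadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "kubelet"})
}

recorder.Event(...) just broadcasts to watchers and returns, so a kubelet log line proves an event was broadcast - not that it survived the correlator and reached the API server.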

Whodunnit??

Digging through the kubelet source code, I found the event spam filter in

staging/src/k8s.io/client-go/tools/record/events_cache.go:

    defaultSpamBurst = 25              // ← The limit!
    defaultSpamQPS   = 1.0 / 300.0    // 1 event per 5 minutes

The spam filter groups events by a key built from the event Source and the involved object's kind, namespace, name, UID, and API version - notably not the container name. This means:

  • Init container 1's Started event
  • Init container 10's Started event

...all share the same spam key.
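
Here is the default spam key, paraphrased from events_cache.go (exact code varies by client-go version) - note that InvolvedObject.FieldPath never makes it in:

import (
    "strings"

    v1 "k8s.io/api/core/v1"
)

func getSpamKey(event *v1.Event) string {
    // Every field identifies the pod, not the container, so all
    // 13 init containers collapse onto a single rate limiter.
    return strings.Join([]string{
        event.Source.Component,
        event.Source.Host,
        event.InvolvedObject.Kind,
        event.InvolvedObject.Namespace,
        event.InvolvedObject.Name,
        string(event.InvolvedObject.UID),
        event.InvolvedObject.APIVersion,
    }, "")
}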

Each container produces three lifecycle events (Pulled, Created, Started). Eight containers burn 24 of the 25 burst tokens, the ninth container's first event takes the last one, and the spam filter kicks in and silently drops everything after that - exactly the 8-9 containers' worth of events I was seeing.
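
You can watch the math play out with the same token-bucket type client-go uses (a standalone sketch - in the real filter, each spam key gets its own limiter from an LRU cache):

package main

import (
    "fmt"

    "k8s.io/client-go/util/flowcontrol"
)

func main() {
    // The spam filter's parameters: burst of 25, refill 1 token / 5 min.
    limiter := flowcontrol.NewTokenBucketRateLimiter(1.0/300.0, 25)

    accepted := 0
    total := 39 // 13 containers × 3 lifecycle events
    for i := 0; i < total; i++ {
        if limiter.TryAccept() {
            accepted++
        }
    }
    fmt.Printf("accepted %d of %d\n", accepted, total) // accepted 25 of 39
}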

The Proof

I modified defaultSpamBurst = 30 and rebuilt the kubelet. Suddenly, all events appeared:

$ kubectl get events | grep -c "Pulled\|Created\|Started"
33  # Success! (11 containers × 3 events each)
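
For anyone reproducing this on kind, the rebuild-and-swap loop is roughly this (the node name and binary path are assumptions for a default single-node kind cluster):

# in a kubernetes/kubernetes checkout, with the constant edited
make WHAT=cmd/kubelet

# swap the binary into the kind node and restart the kubelet
docker cp _output/bin/kubelet kind-control-plane:/usr/bin/kubelet
docker exec kind-control-plane systemctl restart kubelet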

Why This Matters

These days Kubernetes pods commonly have:

  • Multiple init containers (data prep, migrations, etc.)
  • Sidecar containers (service mesh, logging, monitoring)
  • The main application container

A pod with 10-15 containers isn't unusual, and each generates 3 lifecycle events. You hit the 25-event spam limit easily.

The Fix (Maybe)

There's a PR to include fieldPath (the container name) in the spam key, so each container gets independent rate limiting. However, it stalled over concerns about event volume: too many container events could lead to throttling of other events for the pod.

Possible solutions:

  1. Make the limit configurable - Add --event-spam-burst flag to kubelet
  2. Include container name in spam key - Treat each container independently (see the sketch below)
  3. Two-tier rate limiting - Per-container + per-pod limits
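
Option 2 would be a one-line change to the key function (again a sketch with the same imports as the earlier getSpamKey paraphrase, not the upstream patch):

func getSpamKeyPerContainer(event *v1.Event) string {
    return strings.Join([]string{
        event.Source.Component,
        event.Source.Host,
        event.InvolvedObject.Kind,
        event.InvolvedObject.Namespace,
        event.InvolvedObject.Name,
        string(event.InvolvedObject.UID),
        event.InvolvedObject.APIVersion,
        // The one addition, e.g. "spec.initContainers{init-3}",
        // giving each container its own 25-event budget.
        event.InvolvedObject.FieldPath,
    }, "")
}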

I've documented this in the original issue.

Key Takeaways

  1. Kubernetes has spam protection for events - by design, to prevent event storms
  2. The default limit (25) is hardcoded and not configurable
  3. Multi-container pods can hit this limit during normal operation
  4. Audit logs are invaluable for tracing API server behavior

If you're debugging missing events, check if you have many containers generating similar events rapidly. You might be hitting the spam filter.
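
If you suspect the filter, a quick check (jq assumed) is to count the pod's events and compare against roughly three lifecycle events per container:

kubectl get events --field-selector involvedObject.name=<pod-name> -o json \
  | jq '.items | length'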


Tools I used:

  • kind (local Kubernetes cluster)
  • Audit logging (--audit-policy-file)
  • journalctl (kubelet logs)
  • The Kubernetes source code
