In the previous article we saw how Indexer stores and indexes resource objects for fast local queries. Now we look at the other half of the controller pattern: WorkQueue — the component that bridges event callbacks and controller reconciliation workers.
Source: k8s.io/client-go/util/workqueue/
1. Where WorkQueue Fits
informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
	AddFunc:    func(obj interface{}) { queue.Add(keyOf(obj)) },
	UpdateFunc: func(oldObj, newObj interface{}) { queue.Add(keyOf(newObj)) },
	DeleteFunc: func(obj interface{}) { queue.Add(keyOf(obj)) },
})
│
│ event fires → key enqueued
▼
┌─────────────────────────────────────────────────┐
│ WorkQueue │
│ - deduplicates keys │
│ - rate limits re-enqueues on error │
│ - supports multiple concurrent workers │
└─────────────────────────────────────────────────┘
│
│ worker goroutine calls queue.Get()
▼
Controller reconciliation logic
├── success → queue.Forget(key), then queue.Done(key)
└── error → queue.AddRateLimited(key), then queue.Done(key) ← retry with backoff
Key pattern: Event handlers only enqueue the key (e.g. "default/nginx"), never the full object. The worker fetches the latest object from the Indexer at processing time, so it always works with the most current cached state.
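A minimal sketch of the key-only pattern, using only the standard library. The map `store` stands in for the Indexer, and `pod` and `reconcile` are hypothetical names chosen for illustration; none of this is client-go API.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for a cached resource object.
type pod struct {
	Name     string
	Replicas int
}

// store stands in for the Indexer: a local cache keyed by "namespace/name".
var store = map[string]pod{}

// reconcile receives only the key and reads the latest state at processing time.
// The second return value reports whether the object still exists.
func reconcile(key string) (pod, bool) {
	p, ok := store[key]
	return p, ok
}

func main() {
	key := "default/nginx"

	// The event handler saw Replicas=1 and enqueued only the key.
	store[key] = pod{Name: "nginx", Replicas: 1}

	// Before a worker picks the key up, the object changes again.
	store[key] = pod{Name: "nginx", Replicas: 3}

	p, ok := reconcile(key)
	fmt.Println(p.Replicas, ok) // the worker sees the newest state, not the enqueued one

	// A deleted object is simply not found, which the worker must handle.
	delete(store, key)
	_, ok = reconcile(key)
	fmt.Println(ok)
}
```

Because the queue carries no object state, a stale event can never overwrite a newer one: whatever the worker reads is what the cache holds at that moment.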
2. WorkQueue vs Plain FIFO
WorkQueue is not a simple queue. It adds critical properties that make it production-safe for controller development:
| Property | Plain FIFO | WorkQueue |
|---|---|---|
| Ordered (FIFO) | ✅ | ✅ |
| Concurrent producers & consumers | ❌ | ✅ |
| Deduplication | ❌ | ✅ |
| Rate limiting on retry | ❌ | ✅ |
| Prometheus metrics | ❌ | ✅ |
| Graceful shutdown (ShutDown / ShutDownWithDrain) | ❌ | ✅ |
Deduplication in detail
Event burst: Pod "nginx" Updated 5 times in 100ms
Plain FIFO: [nginx, nginx, nginx, nginx, nginx] ← processed 5 times
WorkQueue: [nginx] ← processed once ✅
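Rule: if a key is already waiting in the queue (not yet handed to a worker),
subsequent Add() calls for the same key are silently dropped.
If the key is added again while a worker is processing it, it is marked dirty
and re-queued once Done() is called, so the same key is never processed by two workers at once.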
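The deduplication rule can be modelled with two sets and a slice. The sketch below is a simplified, single-goroutine model written for this article (no locking, non-blocking `Get`); the type `dedupQueue` and its fields are illustrative names, not the client-go source.

```go
package main

import "fmt"

// dedupQueue is a simplified, single-goroutine model of the base queue.
type dedupQueue struct {
	queue      []string            // keys waiting for a worker, in FIFO order
	dirty      map[string]struct{} // keys that still need processing
	processing map[string]struct{} // keys currently held by a worker
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{
		dirty:      map[string]struct{}{},
		processing: map[string]struct{}{},
	}
}

// Add enqueues key unless it is already marked dirty.
func (q *dedupQueue) Add(key string) {
	if _, ok := q.dirty[key]; ok {
		return // already pending: drop the duplicate
	}
	q.dirty[key] = struct{}{}
	if _, ok := q.processing[key]; ok {
		return // in flight: remember it and re-queue on Done
	}
	q.queue = append(q.queue, key)
}

// Get hands the oldest key to a worker; ok is false when nothing is waiting.
func (q *dedupQueue) Get() (key string, ok bool) {
	if len(q.queue) == 0 {
		return "", false
	}
	key, q.queue = q.queue[0], q.queue[1:]
	q.processing[key] = struct{}{}
	delete(q.dirty, key)
	return key, true
}

// Done releases key and re-queues it if it was re-added while in flight.
func (q *dedupQueue) Done(key string) {
	delete(q.processing, key)
	if _, ok := q.dirty[key]; ok {
		q.queue = append(q.queue, key)
	}
}

// Len reports how many keys are waiting for a worker.
func (q *dedupQueue) Len() int { return len(q.queue) }

func main() {
	q := newDedupQueue()
	for i := 0; i < 5; i++ {
		q.Add("default/nginx")
	}
	fmt.Println(q.Len()) // five Adds collapse into one queued key

	key, _ := q.Get()
	q.Add(key)           // an event arrives while the worker is busy
	fmt.Println(q.Len()) // not queued yet
	q.Done(key)
	fmt.Println(q.Len()) // queued again after Done
}
```

Deferring the re-queue until `Done` is what guarantees that two workers never reconcile the same key concurrently.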
3. Three Queue Types
client-go ships three queue implementations, each building on the previous:
┌─────────────────────────────────────────────────────────┐
│ Type 1: FIFO Queue │
│ k8s.io/client-go/util/workqueue/queue.go │
│ Basic ordered queue with deduplication │
└─────────────────────────────┬───────────────────────────┘
│ extends
┌─────────────────────────────▼───────────────────────────┐
│ Type 2: Delaying Queue │
│ k8s.io/client-go/util/workqueue/delaying_queue.go │
│ Adds: AddAfter(item, duration) — insert after a delay │
└─────────────────────────────┬───────────────────────────┘
│ extends
┌─────────────────────────────▼───────────────────────────┐
│ Type 3: RateLimiting Queue ← used in custom controllers│
│ k8s.io/client-go/util/workqueue/rate_limiting_queue.go │
│ Adds: AddRateLimited, Forget, NumRequeues │
│ Uses a RateLimiter to compute delay before re-insert │
└─────────────────────────────────────────────────────────┘
In practice, custom controllers almost always use the RateLimiting Queue.
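The piece that the Delaying Queue adds, AddAfter, can be sketched as a min-heap of keys ordered by the time they become ready. This is a deterministic model with an explicit clock, written under the assumption that timers and goroutines can be left out for clarity; `delayingModel`, `Advance` and the constants are hypothetical names, not client-go API.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// waitingItem is a key plus the instant at which it becomes ready.
type waitingItem struct {
	key     string
	readyAt time.Time
}

// waitHeap is a min-heap ordered by readyAt.
type waitHeap []waitingItem

func (h waitHeap) Len() int           { return len(h) }
func (h waitHeap) Less(i, j int) bool { return h[i].readyAt.Before(h[j].readyAt) }
func (h waitHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *waitHeap) Push(x any)        { *h = append(*h, x.(waitingItem)) }
func (h *waitHeap) Pop() any {
	old := *h
	it := old[len(old)-1]
	*h = old[:len(old)-1]
	return it
}

// delayingModel keeps delayed keys in a heap and ready keys in a slice.
type delayingModel struct {
	waiting waitHeap
	ready   []string
}

// AddAfter makes key ready once delay has elapsed, measured from now.
func (d *delayingModel) AddAfter(key string, delay time.Duration, now time.Time) {
	if delay <= 0 {
		d.ready = append(d.ready, key) // no delay: ready immediately
		return
	}
	heap.Push(&d.waiting, waitingItem{key: key, readyAt: now.Add(delay)})
}

// Advance moves every key whose delay has expired into the ready list.
func (d *delayingModel) Advance(now time.Time) {
	for d.waiting.Len() > 0 && !d.waiting[0].readyAt.After(now) {
		it := heap.Pop(&d.waiting).(waitingItem)
		d.ready = append(d.ready, it.key)
	}
}

// start is an arbitrary fixed instant used as the model's clock origin.
var start = time.Unix(0, 0)

const (
	shortDelay = 10 * time.Millisecond
	longDelay  = 50 * time.Millisecond
)

func main() {
	d := &delayingModel{}
	d.AddAfter("default/slow", longDelay, start)
	d.AddAfter("default/fast", shortDelay, start)

	d.Advance(start.Add(shortDelay))
	fmt.Println(d.ready) // only the key with the short delay is ready

	d.Advance(start.Add(longDelay))
	fmt.Println(d.ready) // both keys are ready, in expiry order
}
```

The RateLimiting Queue then needs only one extra step on top of this: ask a RateLimiter for the delay and pass it to AddAfter.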
4. The RateLimiting Queue Interface
// The rate limiter: decides HOW LONG to delay a re-enqueue
type RateLimiter interface {
	When(item interface{}) time.Duration // return delay for this item
	Forget(item interface{})             // reset the item's failure history
	NumRequeues(item interface{}) int    // how many times has this item failed
}

// The queue interface: wraps DelayingInterface + rate limiting methods
type RateLimitingInterface interface {
	DelayingInterface                 // inherits Add, Get, Done, Len, etc.
	AddRateLimited(item interface{})  // enqueue with rate-limited delay
	Forget(item interface{})          // clear failure history for this item
	NumRequeues(item interface{}) int // query failure count
}
Typical controller worker pattern
func (c *Controller) runWorker() {
	for c.processNextItem() {
	}
}

func (c *Controller) processNextItem() bool {
	// Block until an item is available
	key, quit := c.queue.Get()
	if quit {
		return false
	}
	defer c.queue.Done(key) // MUST call Done() when finished

	err := c.reconcile(key.(string))
	if err == nil {
		// Success: clear the failure history
		c.queue.Forget(key)
		return true
	}

	if c.queue.NumRequeues(key) < maxRetries {
		// Transient error: re-enqueue with rate-limited delay
		c.queue.AddRateLimited(key)
		return true
	}

	// Too many retries: give up and clear history
	c.queue.Forget(key)
	utilruntime.HandleError(err)
	return true
}
5. Three Rate-Limiting Algorithms
Algorithm 1: Exponential Backoff (ItemExponentialFailureRateLimiter)
When the same item fails repeatedly, the delay grows exponentially with each failure.
Failure count → delay:
1st failure: base delay × 2^0 = 5ms
2nd failure: base delay × 2^1 = 10ms
3rd failure: base delay × 2^2 = 20ms
4th failure: base delay × 2^3 = 40ms
...
nth failure: min(base × 2^(n-1), maxDelay) ← capped at maxDelay
// Default values used in most controllers
workqueue.NewItemExponentialFailureRateLimiter(
	5*time.Millisecond, // base delay
	1000*time.Second,   // max delay cap
)
Rate-limiting period: starts at AddRateLimited(), ends at Forget().
Timeline:
t=0ms AddRateLimited("nginx") → delay 5ms → enqueued at t=5ms
t=5ms Get("nginx") → reconcile → fails
t=5ms AddRateLimited("nginx") → delay 10ms → enqueued at t=15ms
t=15ms Get("nginx") → reconcile → fails
t=15ms AddRateLimited("nginx") → delay 20ms → enqueued at t=35ms
...
t=Xms Get("nginx") → reconcile → SUCCESS
Forget("nginx") → failure count reset to 0
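The formula and timeline above can be reproduced with a small per-item limiter. This is a simplified model for illustration (no locking); `expLimiter` and the two constants are names assumed here, with the values taken from the example configuration.

```go
package main

import (
	"fmt"
	"time"
)

// expLimiter is a simplified per-item exponential backoff limiter.
type expLimiter struct {
	base, maxDelay time.Duration
	failures       map[string]int
}

func newExpLimiter(base, maxDelay time.Duration) *expLimiter {
	return &expLimiter{base: base, maxDelay: maxDelay, failures: map[string]int{}}
}

// When records one more failure for key and returns
// min(base * 2^(failures-1), maxDelay).
func (l *expLimiter) When(key string) time.Duration {
	n := l.failures[key] // failures seen before this call
	l.failures[key] = n + 1

	delay := l.base
	for i := 0; i < n; i++ {
		if delay > l.maxDelay/2 {
			return l.maxDelay // doubling again would exceed the cap
		}
		delay *= 2
	}
	if delay > l.maxDelay {
		return l.maxDelay
	}
	return delay
}

// Forget resets the failure history for key.
func (l *expLimiter) Forget(key string) { delete(l.failures, key) }

// NumRequeues reports how many failures are recorded for key.
func (l *expLimiter) NumRequeues(key string) int { return l.failures[key] }

const (
	baseDelay = 5 * time.Millisecond
	maxDelay  = 1000 * time.Second
)

func main() {
	l := newExpLimiter(baseDelay, maxDelay)
	key := "default/nginx"

	for i := 0; i < 3; i++ {
		fmt.Println("retry after", l.When(key)) // 5ms, then 10ms, then 20ms
	}

	l.Forget(key) // reconcile finally succeeded
	fmt.Println("after Forget:", l.When(key)) // back to the base delay
}
```

Checking against the cap before doubling keeps the arithmetic from overflowing when an item has failed many times.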
Real-world example: the Kubernetes Pod restart policy Always uses exponential backoff, which is why a crash-looping Pod shows increasing restart intervals (the familiar CrashLoopBackOff state). That backoff is applied by the kubelet rather than by WorkQueue, but the idea is the same.
Algorithm 2: Token Bucket (BucketRateLimiter)
The token bucket algorithm controls the overall throughput of the queue, regardless of which specific items are being enqueued.
┌─────────────────────────────────────────────────────────┐
│ Token Bucket │
│ │
│ ┌──┬──┬──┬──┬──┐ │
│ │🪙│🪙│🪙│🪙│🪙│ ← bucket (fixed capacity) │
│ └──┴──┴──┴──┴──┘ │
│ ▲ ▲ │
│ tokens refilled bucket full: │
│ at fixed rate r/s stop refilling │
│ │
│ Each AddRateLimited() consumes one token │
│ No token available → item must wait │
└─────────────────────────────────────────────────────────┘
// Uses golang.org/x/time/rate under the hood
workqueue.NewMaxOfRateLimiter(
	workqueue.NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
	&workqueue.BucketRateLimiter{
		Limiter: rate.NewLimiter(rate.Limit(10), 100),
		// 10 tokens/second refill rate, bucket capacity 100
	},
)
How it works:
- Tokens are added to the bucket at a fixed rate (e.g. 10/sec)
- Bucket has a maximum capacity — once full, new tokens are discarded
- Each AddRateLimited() call requires one token
- If no token is available, the item is delayed until a token is refilled
Used in: Envoy Ratelimit, Sentinel, and, in the closely related leaky-bucket form, Nginx rate limiting. Token bucket is one of the most widely adopted rate-limiting algorithms in distributed systems.
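To make the token accounting concrete, here is a simplified, reservation-style token bucket driven by an explicit clock. It is a model written for this article using only the standard library, not the golang.org/x/time/rate implementation; `tokenBucket`, `Reserve` and `start` are illustrative names, and whole-token integer arithmetic is an assumption made to keep the results exact.

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket is a simplified, reservation-style token bucket.
// The caller passes the current time, so the model is deterministic.
type tokenBucket struct {
	interval time.Duration // time needed to refill one token
	burst    int           // bucket capacity
	tokens   int           // negative values are tokens owed to waiting callers
	last     time.Time     // instant up to which refills have been counted
}

func newTokenBucket(perSecond, burst int, now time.Time) *tokenBucket {
	return &tokenBucket{
		interval: time.Second / time.Duration(perSecond),
		burst:    burst,
		tokens:   burst, // start with a full bucket
		last:     now,
	}
}

// Reserve takes one token and returns how long the caller must wait for it.
func (b *tokenBucket) Reserve(now time.Time) time.Duration {
	// Refill whole tokens for the elapsed time; a full bucket discards the rest.
	if refill := int(now.Sub(b.last) / b.interval); refill > 0 {
		b.last = b.last.Add(time.Duration(refill) * b.interval)
		b.tokens += refill
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
	}
	b.tokens--
	if b.tokens >= 0 {
		return 0 // a token was available immediately
	}
	// The bucket is in debt: wait until the owed tokens have been refilled.
	return time.Duration(-b.tokens)*b.interval - now.Sub(b.last)
}

// start is an arbitrary fixed instant used as the clock origin.
var start = time.Unix(0, 0)

func main() {
	b := newTokenBucket(10, 100, start) // 10 tokens/second, capacity 100

	var wait time.Duration
	for i := 0; i < 100; i++ {
		wait = b.Reserve(start)
	}
	fmt.Println("100th call waits:", wait) // still inside the burst

	fmt.Println("101st call waits:", b.Reserve(start)) // one refill interval
	fmt.Println("102nd call waits:", b.Reserve(start)) // two refill intervals
}
```

Note that the delay depends only on how many calls arrived, not on which item was enqueued, which is why this limiter caps overall throughput rather than per-item retries.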
Algorithm 3: Counter with Fast/Slow Lanes (ItemFastSlowRateLimiter)
The counter algorithm in its basic form limits how many items can be enqueued within a time window. WorkQueue's variant, ItemFastSlowRateLimiter, instead counts failures per item and switches between two retry delays: a fast lane for the first few attempts and a slow lane afterwards.
Configuration:
fastDelay = 5ms ← retry delay while the item is in the fast lane
slowDelay = 1000ms ← retry delay once the item is in the slow lane
maxFastAttempts = 3 ← how many failures use the fast lane before switching
Failure timeline for one item:
┌──────────────────────────────────────────────────────────┐
│ Attempt 1 → delay 5ms (fast lane, attempt 1/3) │
│ Attempt 2 → delay 5ms (fast lane, attempt 2/3) │
│ Attempt 3 → delay 5ms (fast lane, attempt 3/3) │
│ Attempt 4 → delay 1000ms (slow lane — maxFastAttempts │
│ exceeded) │
│ Attempt 5 → delay 1000ms (slow lane) │
│ ... │
└──────────────────────────────────────────────────────────┘
workqueue.NewItemFastSlowRateLimiter(
	5*time.Millisecond,    // fastDelay
	1000*time.Millisecond, // slowDelay
	3,                     // maxFastAttempts
)
Use case: Suitable for scenarios where a few quick retries are acceptable (transient network blips), but sustained failures should back off aggressively to avoid resource exhaustion.
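The lane switch comes down to one comparison against a per-item failure counter. A minimal sketch, assuming the configuration shown above; `fastSlowLimiter` and `newFastSlowLimiter` are names chosen for this model and it omits the locking a concurrent limiter would need.

```go
package main

import (
	"fmt"
	"time"
)

// fastSlowLimiter is a simplified per-item fast/slow limiter.
type fastSlowLimiter struct {
	fastDelay, slowDelay time.Duration
	maxFastAttempts      int
	failures             map[string]int
}

func newFastSlowLimiter(fast, slow time.Duration, maxFast int) *fastSlowLimiter {
	return &fastSlowLimiter{
		fastDelay:       fast,
		slowDelay:       slow,
		maxFastAttempts: maxFast,
		failures:        map[string]int{},
	}
}

// When records one more failure for key and picks the lane from the count.
func (l *fastSlowLimiter) When(key string) time.Duration {
	l.failures[key]++
	if l.failures[key] <= l.maxFastAttempts {
		return l.fastDelay
	}
	return l.slowDelay
}

// Forget resets the failure history, returning key to the fast lane.
func (l *fastSlowLimiter) Forget(key string) { delete(l.failures, key) }

const (
	fastDelay = 5 * time.Millisecond
	slowDelay = 1000 * time.Millisecond
)

func main() {
	l := newFastSlowLimiter(fastDelay, slowDelay, 3)
	for attempt := 1; attempt <= 5; attempt++ {
		// Attempts 1 to 3 use fastDelay, attempts 4 and 5 use slowDelay.
		fmt.Println("attempt", attempt, "delay", l.When("default/nginx"))
	}
}
```

Unlike exponential backoff, the slow delay never grows: once an item is in the slow lane, every further retry waits the same fixed interval until Forget is called.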
6. Choosing a Rate Limiter
| Algorithm | Best for | Behavior |
|---|---|---|
| Exponential Backoff | Per-item retry (most common) | Delay doubles with each failure; resets on success |
| Token Bucket | Global throughput cap | Smooth rate limit across all items; burst-friendly |
| Fast/Slow Counter | Tiered retry strategy | Quick retries first, then a fixed slow delay |
In practice, the default rate limiter used by controller-runtime and most Kubernetes controllers combines exponential backoff and token bucket via NewMaxOfRateLimiter — taking the maximum delay from both algorithms:
// Default: use whichever algorithm gives the LONGER delay
// This ensures both per-item backoff AND global rate cap are respected
workqueue.DefaultControllerRateLimiter()
// = NewMaxOfRateLimiter(
// NewItemExponentialFailureRateLimiter(5ms, 1000s),
// &BucketRateLimiter{rate.NewLimiter(rate.Limit(10), 100)},
// )
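How "take the maximum" plays out is easiest to see with two toy limiters. The sketch below is a stdlib-only model: `doublingLimiter` and `fixedLimiter` are hypothetical stand-ins for the per-item and global limiters, and the 15ms floor is an arbitrary value chosen to make the crossover visible.

```go
package main

import (
	"fmt"
	"time"
)

// limiter is the one method this sketch needs from a rate limiter.
type limiter interface {
	When(key string) time.Duration
}

// doublingLimiter is a hypothetical per-item limiter: base, 2*base, 4*base, ...
type doublingLimiter struct {
	base     time.Duration
	failures map[string]int
}

func (l *doublingLimiter) When(key string) time.Duration {
	delay := l.base << l.failures[key]
	l.failures[key]++
	return delay
}

// fixedLimiter is a hypothetical global limiter that always asks for the same delay.
type fixedLimiter time.Duration

func (f fixedLimiter) When(string) time.Duration { return time.Duration(f) }

// maxOf returns the longest delay requested by any of its limiters.
type maxOf []limiter

func (m maxOf) When(key string) time.Duration {
	var longest time.Duration
	for _, l := range m {
		// Every limiter is consulted, so each one records the attempt.
		if d := l.When(key); d > longest {
			longest = d
		}
	}
	return longest
}

const (
	perItemBase = 5 * time.Millisecond
	globalFloor = 15 * time.Millisecond
)

func newCombined() maxOf {
	return maxOf{
		&doublingLimiter{base: perItemBase, failures: map[string]int{}},
		fixedLimiter(globalFloor),
	}
}

func main() {
	rl := newCombined()
	for attempt := 1; attempt <= 4; attempt++ {
		// The floor wins while the backoff is short; the backoff wins once it grows.
		fmt.Println("attempt", attempt, "delay", rl.When("default/nginx"))
	}
}
```

With these values the first two retries are held at the 15ms floor, and from the third retry on the per-item backoff (20ms, 40ms) takes over.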
7. Summary
WorkQueue lifecycle in a controller:
Event fires (Add/Update/Delete)
│
▼
queue.Add(key) ← deduplicated: same key added once
│
▼
worker: key, quit := queue.Get() ← blocks until item available
│
├── reconcile OK → queue.Forget(key) + queue.Done(key)
│
└── reconcile ERR → queue.AddRateLimited(key) + queue.Done(key)
│
└── RateLimiter.When(key) → delay
AddAfter(key, delay) → DelayingQueue
→ re-enters queue after delay
| Component | Role |
|---|---|
| FIFO Queue | Base: ordered, deduplicated, concurrent-safe |
| Delaying Queue | Adds AddAfter(item, duration) for time-delayed insertion |
| RateLimiting Queue | Adds AddRateLimited — computes delay via RateLimiter before calling AddAfter |
| Exponential Backoff | Per-item delay that doubles on each failure — prevents hammering a broken resource |
| Token Bucket | Global throughput cap — prevents queue from overwhelming the API Server |
| Fast/Slow Counter | Tiered retry — fast initial retries, slow sustained retries |
WorkQueue is the safety net of every Kubernetes controller. Its combination of deduplication, rate limiting, and retry semantics ensures that controllers converge to desired state reliably — even under API Server failures, network partitions, or event storms.
Next in this series: EventBroadcaster: How Kubernetes Records and Distributes Cluster Events (Part 6)