In the previous article we defined the Canary CRD. Now we build the Operator: the controller that watches Canary resources and drives the actual release process. This is the final article in the series — everything we've covered comes together here.
1. What Is an Operator?
Operator = CRD (schema) + Controller (reconciliation logic)
┌──────────────────────────────────────────────────────┐
│ Kubernetes Controller Manager │
│ manages built-in controllers: │
│ DeploymentController, ReplicaSetController, ... │
└──────────────────────────────────────────────────────┘
+
┌──────────────────────────────────────────────────────┐
│ Your Operator │
│ = CRD (Canary) + CanaryController │
│ encodes domain knowledge: │
│ "how to safely roll out a new version in batches" │
└──────────────────────────────────────────────────────┘
Operator = a Kubernetes extension that solves domain-specific problems the platform itself doesn't know how to solve.
2. Project Structure
canary-controller/
├── main.go ← controller entrypoint
├── go.mod
├── manifests/
│ └── crd-canary.yaml ← CRD definition (Part 8)
└── pkg/
├── apis/
│ └── canarycontroller/
│ ├── register.go ← GroupName constant
│ └── v1alpha1/
│ ├── doc.go ← code-gen annotations
│ ├── types.go ← Canary/CanarySpec/CanaryStatus
│ └── register.go ← register types with runtime.Scheme
├── controllers/
│ └── canary_controller.go ← all reconciliation logic
├── generated/ ← auto-generated by code-generator
│ ├── clientset/ ← typed Canary clientset
│ ├── informers/ ← typed Canary informer
│ └── listers/ ← typed Canary lister
└── signals/ ← OS signal handling (SIGTERM/SIGINT)
3. Generating the Typed Client Code
The built-in kubernetes.Clientset only supports built-in resources. For your CRD, you need a generated typed client:
git clone https://github.com/kubernetes/code-generator $GOPATH/src/k8s.io/code-generator
cd $GOPATH/src/k8s.io/code-generator && ./generate-groups.sh all \
canary-controller/pkg/client \ ← output
canary-controller/pkg/apis \ ← input: your types
canarycontroller:v1alpha1
| Generated output | Purpose |
|---|---|
zz_generated.deepcopy.go |
DeepCopy() for all types |
generated/clientset/ |
CanarycontrollerV1alpha1().Canaries(ns) — typed CRUD |
generated/informers/ |
Typed CanaryInformer with Lister() and HasSynced()
|
generated/listers/ |
CanaryLister — fast local cache queries |
4. Controller Struct & Wiring
type Controller struct {
kubeclientset kubernetes.Interface // for built-in resources (Deployment)
canaryclientset clientset.Interface // for Canary resources (generated)
deploymentsLister appslisters.DeploymentLister
deploymentsSynced cache.InformerSynced
canariesLister listers.CanaryLister
canariesSynced cache.InformerSynced
workqueue workqueue.RateLimitingInterface
recorder record.EventRecorder
}
In NewController, two event subscriptions are registered:
① Canary Add/Update → enqueueCanary() → workqueue.Add(key)
Direct: user changed Canary spec → reconcile immediately
② Deployment Add/Update/Delete → handleObject()
→ find owning Canary by name prefix → enqueueCanary()
Indirect: managed Deployment changed state → re-check Canary
5. Run → Worker → syncHandler
controller.Run(threadiness=2, stopCh)
│
├── cache.WaitForCacheSync() ← block until Indexer ready
│
└── 2× goroutine: runWorker()
└── processNextWorkItem()
├── workqueue.Get(key)
├── syncHandler(key)
│ ├── OK → workqueue.Forget(key)
│ └── ERR → workqueue.AddRateLimited(key) ← backoff retry
└── workqueue.Done(key)
syncHandler is the reconciliation core — it reads the Canary spec and routes to the correct release strategy:
func (c *Controller) syncHandler(key string) error {
// ① Fetch from local cache — no API Server call
canary, err := c.canariesLister.Canaries(namespace).Get(name)
// ② Route to release strategy
switch canary.Spec.Info["type"] {
case "NormalDeploy": c.normalDeploy(...)
case "CanaryDeploy": c.firstCanaryDeploy(...) or c.notFirstCanaryDeploy(...)
case "CanaryRollback": c.canaryRollback(...)
}
// ③ Update status + emit Event
c.updateCanaryStatus(canary, deployment)
c.recorder.Event(canary, corev1.EventTypeNormal, SuccessSynced, ...)
}
6. The Three Release Strategies
NormalDeploy — Rolling Update
Deployment exists?
├── No → Create
└── Yes → image / replicas / CPU / memory drifted? → Update
Standard Kubernetes rolling update. No batching, no pause. The controller simply ensures the Deployment matches what's declared in the Canary spec.
CanaryDeploy — Batched Release
Splits the rollout into N batches. Batch 1 always pauses for human approval; subsequent batches auto-advance (when pauseType=First).
Example: 4 replicas, 2 batches, pauseType=First
Start: [v1][v1][v1][v1] old=4, new=0
Batch 1: [v2][v1][v1][v1] old=3, new=1
│
└── PAUSE — wait for: kubectl patch canary ... currentBatch=2
Batch 2: [v2][v2][v2][v2] old=0, new=4 → old Deployment deleted ✅
Replica calculation per batch:
everyAddReplicas = round(totalReplicas / totalBatches)
batch N target replicas = 1 + (N-1) × everyAddReplicas
final batch = totalReplicas (full rollout)
old replicas = totalReplicas - new.AvailableReplicas
The controller continuously reconciles both Deployments — if anything drifts (pod crash, manual edit), the next reconcile loop corrects it automatically.
CanaryRollback — Reverse the Canary
Rollback type?
├── NormalDeploy rollback:
│ old.Name == new.Name → just update image back, no second Deployment
│
└── CanaryDeploy rollback:
Restore old version replicas while draining new version
new.AvailableReplicas → 0 → delete new Deployment ✅
7. Status Reporting: updateCanaryStatus
After every reconcile, the controller updates the Canary's status subresource to reflect real-time progress:
CanaryDeploy:
batch N running → status.info["batchNStatus"] = "Ing"
batch N complete → status.info["batchNStatus"] = "Finished"
availableReplicas updated on every loop
CanaryRollback:
old Deployment still exists → status.info["rollbackStatus"] = "Ing"
old Deployment deleted → status.info["rollbackStatus"] = "Finished"
NormalDeploy:
no status update needed
Always
DeepCopy()before mutating — the object from Lister is a read-only reference to the Indexer cache. Modifying it directly corrupts the local cache.
canaryCopy := canary.DeepCopy() // ✅ safe to mutate
canaryCopy.Status.Info[...] = "Finished"
c.canaryclientset.CanarycontrollerV1alpha1().
Canaries(ns).UpdateStatus(canaryCopy, ...)
8. Complete Execution Flow
main.go
└── NewController(kubeclientset, canaryclientset, deployInformer, canaryInformer)
├── AddEventHandler(Canary) → enqueueCanary
└── AddEventHandler(Deployment) → handleObject → enqueueCanary
factory.Start(stopCh) → Reflector List/Watch (Deployment + Canary)
cache.WaitForCacheSync(...) → wait for Indexer fully populated
controller.Run(2, stopCh)
└── syncHandler("tech-prod/go-hello-canary")
│
├── canariesLister.Get() ← Indexer (no API call)
├── switch type:
│ NormalDeploy → Create or Update Deployment
│ CanaryDeploy → Batch 1: new=1, old=N-1, PAUSE
│ Batch N: new=total, old=0, DELETE old
│ CanaryRollback → drain new, restore old
│
├── updateCanaryStatus() ← PATCH /status
└── recorder.Event(SuccessSynced)
9. Key Design Principles
| Principle | How |
|---|---|
| Never mutate cache objects | Always DeepCopy() before modifying |
| Read cache, write API |
Lister.Get() for reads; clientset.Update() for writes |
| Idempotent reconciliation | Every loop checks actual vs desired — safe to re-run |
| Ownership |
OwnerReferences ensures Deployments are GC'd with their Canary |
| Error → retry |
workqueue.AddRateLimited(key) — exponential backoff on failure |
Series Complete 🎉
| # | Article | Key concept |
|---|---|---|
| 1 | Informer | List/Watch, SharedInformerFactory |
| 2 | Reflector | ListerWatcher, ListAndWatch, ResourceVersion |
| 3 | DeltaFIFO | queue+items, produce/consume, HandleDeltas |
| 4 | Indexer | ThreadSafeMap, IndexFunc, ByIndex |
| 5 | WorkQueue | Deduplication, Exponential Backoff, Token Bucket |
| 6 | EventBroadcaster | Event resource, Recorder, Sink |
| 7 | Controller (internal) | Run, processLoop, WaitForCacheSync |
| 8 | CRD | Schema, Go types, spec/status contract |
| 9 | Operator in Practice | Full canary release controller end-to-end |
Thank you for following the series. If you found it useful, share it with your team!
Top comments (0)