DEV Community

James Lee
James Lee

Posted on

client-go Deep Dive: Operator in Practice — Building a Canary Release Controller

In the previous article we defined the Canary CRD. Now we build the Operator: the controller that watches Canary resources and drives the actual release process. This is the final article in the series — everything we've covered comes together here.


1. What Is an Operator?

Operator = CRD (schema) + Controller (reconciliation logic)

┌──────────────────────────────────────────────────────┐
│  Kubernetes Controller Manager                       │
│  manages built-in controllers:                       │
│  DeploymentController, ReplicaSetController, ...     │
└──────────────────────────────────────────────────────┘
                    +
┌──────────────────────────────────────────────────────┐
│  Your Operator                                       │
│  = CRD (Canary) + CanaryController                  │
│  encodes domain knowledge:                           │
│  "how to safely roll out a new version in batches"   │
└──────────────────────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

Operator = a Kubernetes extension that solves domain-specific problems the platform itself doesn't know how to solve.


2. Project Structure

canary-controller/
├── main.go                          ← controller entrypoint
├── go.mod
├── manifests/
│   └── crd-canary.yaml              ← CRD definition (Part 8)
└── pkg/
    ├── apis/
    │   └── canarycontroller/
    │       ├── register.go          ← GroupName constant
    │       └── v1alpha1/
    │           ├── doc.go           ← code-gen annotations
    │           ├── types.go         ← Canary/CanarySpec/CanaryStatus
    │           └── register.go      ← register types with runtime.Scheme
    ├── controllers/
    │   └── canary_controller.go     ← all reconciliation logic
    ├── generated/                   ← auto-generated by code-generator
    │   ├── clientset/               ← typed Canary clientset
    │   ├── informers/               ← typed Canary informer
    │   └── listers/                 ← typed Canary lister
    └── signals/                     ← OS signal handling (SIGTERM/SIGINT)
Enter fullscreen mode Exit fullscreen mode

3. Generating the Typed Client Code

The built-in kubernetes.Clientset only supports built-in resources. For your CRD, you need a generated typed client:

git clone https://github.com/kubernetes/code-generator $GOPATH/src/k8s.io/code-generator

cd $GOPATH/src/k8s.io/code-generator && ./generate-groups.sh all \
  canary-controller/pkg/client \      ← output
  canary-controller/pkg/apis \        ← input: your types
  canarycontroller:v1alpha1
Enter fullscreen mode Exit fullscreen mode
Generated output Purpose
zz_generated.deepcopy.go DeepCopy() for all types
generated/clientset/ CanarycontrollerV1alpha1().Canaries(ns) — typed CRUD
generated/informers/ Typed CanaryInformer with Lister() and HasSynced()
generated/listers/ CanaryLister — fast local cache queries

4. Controller Struct & Wiring

type Controller struct {
    kubeclientset   kubernetes.Interface   // for built-in resources (Deployment)
    canaryclientset clientset.Interface    // for Canary resources (generated)

    deploymentsLister appslisters.DeploymentLister
    deploymentsSynced cache.InformerSynced
    canariesLister    listers.CanaryLister
    canariesSynced    cache.InformerSynced

    workqueue workqueue.RateLimitingInterface
    recorder  record.EventRecorder
}
Enter fullscreen mode Exit fullscreen mode

In NewController, two event subscriptions are registered:

① Canary Add/Update  → enqueueCanary()  → workqueue.Add(key)
  Direct: user changed Canary spec → reconcile immediately

② Deployment Add/Update/Delete → handleObject()
  → find owning Canary by name prefix → enqueueCanary()
  Indirect: managed Deployment changed state → re-check Canary
Enter fullscreen mode Exit fullscreen mode

5. Run → Worker → syncHandler

controller.Run(threadiness=2, stopCh)
     │
     ├── cache.WaitForCacheSync()     ← block until Indexer ready
     │
     └── 2× goroutine: runWorker()
             └── processNextWorkItem()
                   ├── workqueue.Get(key)
                   ├── syncHandler(key)
                   │     ├── OK  → workqueue.Forget(key)
                   │     └── ERR → workqueue.AddRateLimited(key)  ← backoff retry
                   └── workqueue.Done(key)
Enter fullscreen mode Exit fullscreen mode

syncHandler is the reconciliation core — it reads the Canary spec and routes to the correct release strategy:

func (c *Controller) syncHandler(key string) error {
    // ① Fetch from local cache — no API Server call
    canary, err := c.canariesLister.Canaries(namespace).Get(name)

    // ② Route to release strategy
    switch canary.Spec.Info["type"] {
    case "NormalDeploy":   c.normalDeploy(...)
    case "CanaryDeploy":   c.firstCanaryDeploy(...) or c.notFirstCanaryDeploy(...)
    case "CanaryRollback": c.canaryRollback(...)
    }

    // ③ Update status + emit Event
    c.updateCanaryStatus(canary, deployment)
    c.recorder.Event(canary, corev1.EventTypeNormal, SuccessSynced, ...)
}
Enter fullscreen mode Exit fullscreen mode

6. The Three Release Strategies

NormalDeploy — Rolling Update

Deployment exists?
├── No  → Create
└── Yes → image / replicas / CPU / memory drifted? → Update
Enter fullscreen mode Exit fullscreen mode

Standard Kubernetes rolling update. No batching, no pause. The controller simply ensures the Deployment matches what's declared in the Canary spec.


CanaryDeploy — Batched Release

Splits the rollout into N batches. Batch 1 always pauses for human approval; subsequent batches auto-advance (when pauseType=First).

Example: 4 replicas, 2 batches, pauseType=First

Start:    [v1][v1][v1][v1]   old=4, new=0

Batch 1:  [v2][v1][v1][v1]   old=3, new=1
               │
               └── PAUSE — wait for: kubectl patch canary ... currentBatch=2

Batch 2:  [v2][v2][v2][v2]   old=0, new=4  → old Deployment deleted ✅
Enter fullscreen mode Exit fullscreen mode

Replica calculation per batch:

everyAddReplicas = round(totalReplicas / totalBatches)

batch N target replicas = 1 + (N-1) × everyAddReplicas
final batch             = totalReplicas (full rollout)

old replicas = totalReplicas - new.AvailableReplicas
Enter fullscreen mode Exit fullscreen mode

The controller continuously reconciles both Deployments — if anything drifts (pod crash, manual edit), the next reconcile loop corrects it automatically.


CanaryRollback — Reverse the Canary

Rollback type?
├── NormalDeploy rollback:
│     old.Name == new.Name → just update image back, no second Deployment
│
└── CanaryDeploy rollback:
      Restore old version replicas while draining new version
      new.AvailableReplicas → 0 → delete new Deployment ✅
Enter fullscreen mode Exit fullscreen mode

7. Status Reporting: updateCanaryStatus

After every reconcile, the controller updates the Canary's status subresource to reflect real-time progress:

CanaryDeploy:
  batch N running  → status.info["batchNStatus"] = "Ing"
  batch N complete → status.info["batchNStatus"] = "Finished"
  availableReplicas updated on every loop

CanaryRollback:
  old Deployment still exists → status.info["rollbackStatus"] = "Ing"
  old Deployment deleted      → status.info["rollbackStatus"] = "Finished"

NormalDeploy:
  no status update needed
Enter fullscreen mode Exit fullscreen mode

Always DeepCopy() before mutating — the object from Lister is a read-only reference to the Indexer cache. Modifying it directly corrupts the local cache.

canaryCopy := canary.DeepCopy()   // ✅ safe to mutate
canaryCopy.Status.Info[...] = "Finished"
c.canaryclientset.CanarycontrollerV1alpha1().
    Canaries(ns).UpdateStatus(canaryCopy, ...)
Enter fullscreen mode Exit fullscreen mode

8. Complete Execution Flow

main.go
  └── NewController(kubeclientset, canaryclientset, deployInformer, canaryInformer)
        ├── AddEventHandler(Canary)      enqueueCanary
        └── AddEventHandler(Deployment)  handleObject  enqueueCanary

factory.Start(stopCh)           Reflector List/Watch (Deployment + Canary)
cache.WaitForCacheSync(...)     wait for Indexer fully populated

controller.Run(2, stopCh)
  └── syncHandler("tech-prod/go-hello-canary")
        
        ├── canariesLister.Get()         Indexer (no API call)
        ├── switch type:
             NormalDeploy    Create or Update Deployment
             CanaryDeploy    Batch 1: new=1, old=N-1, PAUSE
                              Batch N: new=total, old=0, DELETE old
             CanaryRollback  drain new, restore old
        
        ├── updateCanaryStatus()         PATCH /status
        └── recorder.Event(SuccessSynced)
Enter fullscreen mode Exit fullscreen mode

9. Key Design Principles

Principle How
Never mutate cache objects Always DeepCopy() before modifying
Read cache, write API Lister.Get() for reads; clientset.Update() for writes
Idempotent reconciliation Every loop checks actual vs desired — safe to re-run
Ownership OwnerReferences ensures Deployments are GC'd with their Canary
Error → retry workqueue.AddRateLimited(key) — exponential backoff on failure

Series Complete 🎉

# Article Key concept
1 Informer List/Watch, SharedInformerFactory
2 Reflector ListerWatcher, ListAndWatch, ResourceVersion
3 DeltaFIFO queue+items, produce/consume, HandleDeltas
4 Indexer ThreadSafeMap, IndexFunc, ByIndex
5 WorkQueue Deduplication, Exponential Backoff, Token Bucket
6 EventBroadcaster Event resource, Recorder, Sink
7 Controller (internal) Run, processLoop, WaitForCacheSync
8 CRD Schema, Go types, spec/status contract
9 Operator in Practice Full canary release controller end-to-end

Thank you for following the series. If you found it useful, share it with your team!

Top comments (0)