DEV Community

ANKUSH CHOUDHARY JOHAL

Posted on • Originally published at johal.in

Under the Hood: How Argo Rollouts 1.8 Implements Canary Deployments with Kubernetes 1.33 and Prometheus 3.1

Canary deployments remain a gold standard for low-risk application rollouts, letting teams shift a small percentage of traffic to a new version before full cutover. Argo Rollouts 1.8, released alongside Kubernetes 1.33 and Prometheus 3.1, introduces under-the-hood optimizations that streamline this workflow. This article breaks down the integration, architecture, and implementation details of the stack.

Prerequisites and Stack Compatibility

Argo Rollouts 1.8 is purpose-built to leverage Kubernetes 1.33’s enhanced workload APIs, including stable support for Deployment and ReplicaSet lifecycle hooks, plus Prometheus 3.1’s native histogram metrics for low-latency canary analysis. Key compatibility notes:

  • Kubernetes 1.33+ is required for Argo Rollouts’ new Rollout controller admission webhooks, which validate canary configuration at creation time.
  • Prometheus 3.1’s prometheus-operator v0.70+ integration enables automatic metric scraping for canary analysis rules.
  • Argo Rollouts 1.8 drops support for Kubernetes versions below 1.28, aligning with upstream Kubernetes deprecation policies.

Argo Rollouts 1.8 Canary Architecture

The core Argo Rollouts 1.8 canary workflow relies on three components, updated for K8s 1.33 and Prometheus 3.1:

  1. Rollout Controller: Watches Rollout custom resources (CRs), manages canary ReplicaSet creation, and updates Kubernetes Service and Ingress objects to split traffic.
  2. Analysis Controller: Queries Prometheus 3.1 for canary health metrics, evaluates analysis templates, and signals the Rollout Controller to progress or abort the canary.
  3. Metrics Server: Aggregates real-time traffic and error rate metrics from K8s 1.33’s kube-proxy and Prometheus 3.1 exporters.
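
The hand-off between the Rollout Controller and the Analysis Controller is wired up in the Rollout spec itself: a canary step references an AnalysisTemplate by name. A minimal sketch, using the prometheus-canary-analysis template defined later in this article:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        # The Analysis Controller runs the referenced template against
        # Prometheus and reports Successful or Failed back to the
        # Rollout Controller, which then progresses or aborts the canary.
        - analysis:
            templates:
              - templateName: prometheus-canary-analysis
```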

Under-the-Hood Traffic Splitting with Kubernetes 1.33

Kubernetes 1.33 introduces stable support for Service traffic policy enhancements, which Argo Rollouts 1.8 uses to implement canary traffic splitting without third-party service meshes (though mesh integration is still supported). The workflow:

When a Rollout CR is updated with a new container image, the Rollout Controller:

  1. Creates a canary ReplicaSet with the new image, scaled to 0 replicas initially.
  2. Labels canary pods with rollout.argoproj.io/canary: "true" and stable pods with rollout.argoproj.io/stable: "true", then updates the Service selectors to target the matching label.
  3. Uses K8s 1.33’s EndpointSlice API to split traffic between stable and canary EndpointSlice objects based on the canary percentage defined in the Rollout spec.
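
Step 2 can be pictured as a Service whose selector targets the injected canary label. This is an illustrative sketch only (the port numbers are assumptions, and in practice the controller manages these labels for you):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: canary-demo-svc
spec:
  selector:
    app: canary-demo
    # Injected by the Rollout Controller; which pods carry this label
    # determines which ReplicaSet receives the Service's traffic.
    rollout.argoproj.io/canary: "true"
  ports:
    - port: 80          # assumed service port
      targetPort: 8080  # matches the containerPort in the Rollout spec
```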

Example Rollout traffic splitting snippet:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 5m}
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100
      trafficRouting:
        kubernetes:
          service: canary-demo-svc
          ingress:
            name: canary-demo-ingress
  selector:
    matchLabels:
      app: canary-demo
  template:
    metadata:
      labels:
        app: canary-demo
    spec:
      containers:
        - name: demo-app
          image: demo-app:v2.0.0
          ports:
            - containerPort: 8080
```
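
When no mesh or ingress is splitting traffic, the canary weight is approximated by pod counts: with replicas: 10, setWeight: 10 yields roughly 1 canary pod. A rough sketch of that arithmetic, assuming ceiling rounding (the exact rounding behavior is an implementation detail):

```python
import math

def canary_replica_count(total_replicas: int, set_weight: int) -> int:
    """Approximate a canary traffic weight by scaling the canary ReplicaSet.

    With total_replicas=10: setWeight 10 -> 1 canary pod, 50 -> 5, 100 -> 10.
    """
    return math.ceil(total_replicas * set_weight / 100)

print(canary_replica_count(10, 10))  # 1
```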

Prometheus 3.1 Integration for Canary Analysis

Argo Rollouts 1.8 leverages Prometheus 3.1’s native histogram and exponential bucket metrics to evaluate canary health with lower query latency than previous versions. The Analysis Controller polls Prometheus 3.1 at configurable intervals via the Prometheus HTTP query API, then compares results against user-defined success thresholds.

Example AnalysisTemplate for Prometheus 3.1:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: prometheus-canary-analysis
spec:
  metrics:
    - name: error-rate
      successCondition: result[0] < 0.01
      failureCondition: result[0] > 0.05
      provider:
        prometheus:
          address: http://prometheus.istio-system.svc:9090
          query: |
            sum(rate(http_requests_total{app="canary-demo", status=~"5.."}[5m])) /
            sum(rate(http_requests_total{app="canary-demo"}[5m]))
```
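
The condition semantics above can be modeled in a few lines. This is an illustrative sketch of how the two conditions interact, not the controller's actual code; note the gap between the thresholds, where a measurement is neither a pass nor a fail:

```python
def evaluate_measurement(result: list[float],
                         success_threshold: float = 0.01,
                         failure_threshold: float = 0.05) -> str:
    """Model the successCondition/failureCondition pair above.

    result[0] < 0.01 -> Successful (error rate under 1%)
    result[0] > 0.05 -> Failed     (error rate over 5%)
    anything between -> Inconclusive, which pauses the rollout for review
    """
    value = result[0]
    if value < success_threshold:
        return "Successful"
    if value > failure_threshold:
        return "Failed"
    return "Inconclusive"

print(evaluate_measurement([0.002]))  # Successful
```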

Prometheus 3.1’s new remote_write optimizations reduce metric lag to under 1 second, ensuring Argo Rollouts 1.8 can make canary progression decisions in near real-time.

Key Optimizations in Argo Rollouts 1.8

Beyond K8s 1.33 and Prometheus 3.1 integration, Argo Rollouts 1.8 includes under-the-hood improvements:

  • Reduced Rollout Controller memory usage by 30% via K8s 1.33’s shared informer cache optimizations.
  • Native support for Prometheus 3.1’s exemplar metrics, enabling trace-to-metric correlation for canary debugging.
  • Improved canary abort logic: if Prometheus 3.1 reports a threshold breach, the Rollout Controller automatically scales down the canary ReplicaSet and restores 100% traffic to the stable version within 2 seconds.

Conclusion

Argo Rollouts 1.8, paired with Kubernetes 1.33 and Prometheus 3.1, delivers a robust, low-latency canary deployment workflow without relying on complex service mesh configurations. The tight integration with K8s 1.33’s traffic routing APIs and Prometheus 3.1’s high-performance metrics engine makes it an ideal choice for teams running production Kubernetes workloads. For full release notes, refer to the Argo Rollouts 1.8 changelog.
