Under the Hood: How Argo Rollouts 1.8 Implements Canary Deployments with Kubernetes 1.33 and Prometheus 3.1
Canary deployments remain a gold standard for low-risk application rollouts, allowing teams to shift a small percentage of traffic to a new version before full cutover. Argo Rollouts 1.8, released alongside Kubernetes 1.33 and Prometheus 3.1, introduces under-the-hood optimizations that streamline this workflow. This article breaks down the integration, architecture, and technical implementation details of this stack.
Prerequisites and Stack Compatibility
Argo Rollouts 1.8 is purpose-built to leverage Kubernetes 1.33’s enhanced workload APIs, including stable support for Deployment and ReplicaSet lifecycle hooks, plus Prometheus 3.1’s native histogram metrics for low-latency canary analysis. Key compatibility notes:
- Kubernetes 1.33+ is required for Argo Rollouts' new Rollout controller admission webhooks, which validate canary configuration at creation time.
- Prometheus 3.1's prometheus-operator v0.70+ integration enables automatic metric scraping for canary analysis rules.
- Argo Rollouts 1.8 drops support for Kubernetes versions below 1.28, aligning with upstream Kubernetes deprecation policies.
Argo Rollouts 1.8 Canary Architecture
The core Argo Rollouts 1.8 canary workflow relies on three components, updated for K8s 1.33 and Prometheus 3.1:
- Rollout Controller: Watches Rollout custom resources (CRs), manages canary ReplicaSet creation, and updates Kubernetes Service and Ingress objects to split traffic.
- Analysis Controller: Queries Prometheus 3.1 for canary health metrics, evaluates analysis templates, and signals the Rollout Controller to progress or abort the canary.
- Metrics Server: Aggregates real-time traffic and error-rate metrics from K8s 1.33's kube-proxy and Prometheus 3.1 exporters.
Under-the-Hood Traffic Splitting with Kubernetes 1.33
Kubernetes 1.33 introduces stable support for Service traffic policy enhancements, which Argo Rollouts 1.8 uses to implement canary traffic splitting without third-party service meshes (though mesh integration is still supported). The workflow:
When a Rollout CR is updated with a new container image, the Rollout Controller:
- Creates a canary ReplicaSet with the new image, scaled to 0 replicas initially.
- Updates the primary Service selector to include a rollout.argoproj.io/canary: "true" label for canary pods, and rollout.argoproj.io/stable: "true" for stable pods.
- Uses K8s 1.33's EndpointSlice API to split traffic between stable and canary EndpointSlice objects based on the canary percentage defined in the Rollout spec.
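The replica math behind each weight step can be sketched as follows. This is a simplified illustration of the controller's behavior, not its actual code; the `split_replicas` helper is hypothetical, and it assumes the canary count is rounded up so that even a small nonzero weight yields at least one canary pod:

```python
import math

def split_replicas(total: int, canary_weight: int) -> tuple[int, int]:
    """Split a replica count between canary and stable pods for a given
    canary weight (percent). Simplified sketch: the canary count is rounded
    up so a small nonzero weight still produces at least one canary pod."""
    canary = math.ceil(total * canary_weight / 100)
    stable = total - canary
    return canary, stable

# With the example Rollout below (replicas: 10) stepping 10% -> 50% -> 100%:
for weight in (10, 50, 100):
    canary, stable = split_replicas(10, weight)
    print(f"setWeight: {weight:3d} -> canary={canary}, stable={stable}")
```

At 10% of 10 replicas this yields one canary pod and nine stable pods, which matches the first setWeight step in the example manifest.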
Example Rollout traffic splitting snippet:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: canary-demo
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: {duration: 5m}
        - setWeight: 50
        - pause: {duration: 10m}
        - setWeight: 100
      trafficRouting:
        kubernetes:
          service: canary-demo-svc
          ingress:
            name: canary-demo-ingress
  selector:
    matchLabels:
      app: canary-demo
  template:
    metadata:
      labels:
        app: canary-demo
    spec:
      containers:
        - name: demo-app
          image: demo-app:v2.0.0
          ports:
            - containerPort: 8080
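Once the manifest is applied, the rollout can be driven and observed with the kubectl-argo-rollouts plugin. The commands below assume the plugin is installed, the manifest above is saved as rollout.yaml, and the current kubectl context points at the target cluster:

```shell
# Apply the Rollout and watch it progress through the canary steps
kubectl apply -f rollout.yaml
kubectl argo rollouts get rollout canary-demo --watch

# Manually skip the current pause step...
kubectl argo rollouts promote canary-demo

# ...or roll all traffic back to the stable ReplicaSet
kubectl argo rollouts abort canary-demo
```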
Prometheus 3.1 Integration for Canary Analysis
Argo Rollouts 1.8 leverages Prometheus 3.1's native histogram and exponential bucket metrics to evaluate canary health with lower query latency than previous versions. The Analysis Controller polls Prometheus 3.1 at configurable intervals via the Prometheus HTTP query API, then compares results against user-defined success thresholds.
Example AnalysisTemplate for Prometheus 3.1:
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: prometheus-canary-analysis
spec:
  metrics:
    - name: error-rate
      successCondition: result[0] < 0.01
      failureCondition: result[0] > 0.05
      provider:
        prometheus:
          address: http://prometheus.istio-system.svc:9090
          query: |
            sum(rate(http_requests_total{app="canary-demo", status=~"5.."}[5m])) /
            sum(rate(http_requests_total{app="canary-demo"}[5m]))
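The successCondition/failureCondition logic above can be sketched as follows. This is a simplified stand-in for the Analysis Controller's expression evaluation, not its real implementation; the function name and thresholds mirror the AnalysisTemplate, where result[0] is the error-rate ratio returned by the Prometheus query:

```python
def evaluate_measurement(result: list[float],
                         success_threshold: float = 0.01,
                         failure_threshold: float = 0.05) -> str:
    """Mimic the AnalysisTemplate above: classify one Prometheus measurement.
    result[0] holds the 5xx error-rate ratio from the query."""
    error_rate = result[0]
    if error_rate < success_threshold:   # successCondition: result[0] < 0.01
        return "Successful"
    if error_rate > failure_threshold:   # failureCondition: result[0] > 0.05
        return "Failed"
    return "Inconclusive"                # between the two thresholds

print(evaluate_measurement([0.004]))  # healthy canary
print(evaluate_measurement([0.09]))   # error budget breached -> abort
print(evaluate_measurement([0.03]))   # neither condition met
```

When neither condition fires, the measurement is inconclusive and the analysis run does not advance or abort the rollout on its own.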
Prometheus 3.1’s new remote_write optimizations reduce metric lag to under 1 second, ensuring Argo Rollouts 1.8 can make canary progression decisions in near real-time.
Key Optimizations in Argo Rollouts 1.8
Beyond K8s 1.33 and Prometheus 3.1 integration, Argo Rollouts 1.8 includes under-the-hood improvements:
- Reduced Rollout Controller memory usage by 30% via K8s 1.33’s shared informer cache optimizations.
- Native support for Prometheus 3.1's exemplar metrics, enabling trace-to-metric correlation for canary debugging.
- Improved canary abort logic: if Prometheus 3.1 reports a threshold breach, the Rollout Controller automatically scales down the canary ReplicaSet and restores 100% of traffic to the stable version within 2 seconds.
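The abort behavior described above can be modeled as a small state transition. This is a hypothetical sketch with invented names (RolloutState, abort_canary), not the controller's actual API: on a threshold breach, the canary ReplicaSet is scaled to zero and all traffic weight returns to the stable ReplicaSet.

```python
from dataclasses import dataclass

@dataclass
class RolloutState:
    canary_replicas: int
    stable_replicas: int
    canary_weight: int  # percent of traffic routed to the canary

def abort_canary(state: RolloutState, desired_replicas: int) -> RolloutState:
    """Sketch of the abort path: scale the canary ReplicaSet down and
    shift 100% of traffic back to the stable ReplicaSet."""
    return RolloutState(
        canary_replicas=0,                 # scale down the canary ReplicaSet
        stable_replicas=desired_replicas,  # stable serves the full replica count
        canary_weight=0,                   # traffic weight back to stable
    )

mid_rollout = RolloutState(canary_replicas=5, stable_replicas=5, canary_weight=50)
print(abort_canary(mid_rollout, desired_replicas=10))
```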
Conclusion
Argo Rollouts 1.8, paired with Kubernetes 1.33 and Prometheus 3.1, delivers a robust, low-latency canary deployment workflow without relying on complex service mesh configurations. The tight integration with K8s 1.33’s traffic routing APIs and Prometheus 3.1’s high-performance metrics engine makes it an ideal choice for teams running production Kubernetes workloads. For full release notes, refer to the Argo Rollouts 1.8 changelog.