Service Mesh in 2026: The Landscape Has Changed (Istio Ambient Mode Update)
A Confession: My Previous Post Was Already Outdated
Last week, I published an article about why service mesh never took off, based on my experiences from years ago. The challenges I described were real:
- 30-90% infrastructure cost increase with sidecar-based architectures
- Per-pod sidecar overhead (50 pods = 50 extra containers)
- Complex upgrades and troubleshooting that killed adoption
Days after publishing, I discovered the landscape had fundamentally changed. The service mesh story I told was based on outdated knowledge from 2017-2023. I considered updating that post, but decided to leave it as-is (a historical perspective) and write this follow-up instead.
The truth: While I was away from the service mesh world, Istio evolved dramatically. What I experienced years ago is no longer the reality in 2026.
What Changed: AWS Gave Up, Istio Evolved
1. AWS App Mesh: Deprecated
In 2024, AWS announced the deprecation of App Mesh, their managed service mesh offering. This validates exactly what I wrote in my previous post—the economics didn't work.
AWS's reasoning:
- High operational overhead for customers
- Better alternatives emerged (AWS observability services, application-level instrumentation)
- Limited adoption outside large enterprises
Key insight: Even AWS, with infinite resources, couldn't make the traditional sidecar model economically viable for most customers.
2. Istio Ambient Mode: The Game Changer
While AWS retreated, Istio made a bold architectural shift. Ambient mode (GA since Istio 1.24 in late 2024) eliminates per-pod sidecars entirely, replacing them with:
- Node-level proxies (ztunnel): 1 DaemonSet pod per node instead of N sidecars
- Optional service-level proxies (Waypoint): Deployed only for services needing advanced L7 features
This is Kubernetes-native innovation—only viable in K8s environments (EKS, GKE, self-managed). It fundamentally changes the cost equation.
The Ambient Architecture: Node-Level vs Pod-Level
Before: Traditional Sidecar Mode
┌─────────────────────────────────────────┐
│ Your Application Pod │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ App │ │ Envoy Sidecar│ │
│ │ Container │◄───┤ Proxy │ │
│ │ │ │ │ │
│ │ 500m CPU │ │ 100m CPU │ │ ← 20% overhead PER POD
│ │ 512Mi RAM │ │ 128Mi RAM │ │
│ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────┘
50 pods × 100m CPU = 5 vCPU overhead
50 pods × 128Mi = 6.4GB overhead
Problems:
- Every pod needs sidecar = 100 containers for 50 apps
- Sidecar upgrades = restart all application pods
- Debug complexity = app logic + sidecar config
After: Ambient Mode (ztunnel + Waypoint)
Layer 1: ztunnel (L4 - TCP/Connection Level)
┌─────────────────────────────────────────────────────────────┐
│ Kubernetes Node │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ App Pod 1 │ │ App Pod 2 │ │ App Pod 3 │ │
│ │ (No sidecar!)│ │ (No sidecar!)│ │ (No sidecar!)│ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └──────────────────┼──────────────────┘ │
│ ▼ │
│ ┌────────────────┐ │
│ │ ztunnel │ ◄─ DaemonSet │
│ │ (L4 Proxy) │ (1 per node) │
│ │ │ │
│ │ 100m CPU │ │
│ │ 128Mi RAM │ │
│ └────────────────┘ │
└─────────────────────────────────────────────────────────────┘
3 nodes × 100m CPU = 0.3 vCPU overhead
3 nodes × 128Mi = 0.4GB overhead
Savings: 94% reduction in proxy overhead!
ztunnel provides:
- ✅ mTLS encryption between all services (zero-trust networking)
- ✅ L4 connection metrics (bytes, connections, TCP stats)
- ✅ Service authentication and authorization
- ✅ Kiali service graph visualization
- ❌ No L7 features (circuit breakers, retries require Waypoint)
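For example, the authentication/authorization piece needs nothing beyond ztunnel. Here's a minimal sketch of an L4-only policy; the google-boutique namespace matches the rest of this post, while the app label, service-account name, and port are illustrative assumptions:
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: payment-allow-frontend     # hypothetical policy name
  namespace: google-boutique
spec:
  selector:
    matchLabels:
      app: payment-service         # assumed workload label
  action: ALLOW
  rules:
  - from:
    - source:
        # SPIFFE identity that ztunnel verifies via mTLS
        principals: ["cluster.local/ns/google-boutique/sa/frontend"]
    to:
    - operation:
        ports: ["8080"]            # assumed service port
EOF
Because the rule only matches identities and ports, ztunnel can enforce it without any Waypoint in the path.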
Layer 2: Waypoint (L7 - HTTP/gRPC Level) - Optional
┌─────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Frontend │──────►│ Waypoint │──────►Backend │
│ │ Service │ │ Proxy │ Service │
│ │ (10 pods) │ │ │ (5 pods) │
│ └──────────────┘ │ 200m CPU │ └──────────┘│
│ │ 256Mi RAM │ │
│ │ │ │
│ │ • Circuit │ │
│ Deploy only for │ breakers │ │
│ services needing │ • Retries │ │
│ advanced L7 features │ • Timeouts │ │
│ │ • Tracing │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
1 Waypoint serves 10 frontend pods (10:1 ratio)
Not 10 sidecars serving 10 pods (1:1 ratio)
Savings: 90% reduction even with L7 features!
Waypoint adds:
- ✅ Circuit breakers (prevent cascading failures)
- ✅ HTTP-level retries, timeouts, traffic splitting
- ✅ Request-level metrics (latency, status codes, throughput)
- ✅ Distributed tracing (Jaeger integration)
- ✅ Canary deployments, A/B testing
Key strategy: Deploy Waypoints only for critical services (20% of apps), keep ztunnel-only for the rest (80% of apps).
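To make those L7 features concrete, here's a sketch of the kind of routing policy a Waypoint enforces. It uses the standard Istio VirtualService API; the checkout-service host, retry counts, and timeouts are illustrative:
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: checkout-retries            # hypothetical name
  namespace: google-boutique
spec:
  hosts:
  - checkout-service                # assumed service name
  http:
  - route:
    - destination:
        host: checkout-service
    retries:
      attempts: 3                   # retry failed requests up to 3 times
      perTryTimeout: 2s
      retryOn: 5xx,connect-failure
    timeout: 10s                    # overall per-request deadline
EOF
With sidecars, every pod would evaluate this policy; in ambient mode, only the single Waypoint for the service does.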
The Complete Observability Stack (All Pods in Your Cluster)
One major clarification: All Istio components run as pods in your GKE/EKS cluster. Nothing is external.
Deployment Overview
kubectl get pods --all-namespaces
NAMESPACE NAME TYPE
─────────────────────────────────────────────────────────────
istio-system ztunnel-abc123 DaemonSet
istio-system ztunnel-def456 DaemonSet
istio-system ztunnel-ghi789 DaemonSet
istio-system istiod-7d4b9c8f9-abc123 Deployment
google-boutique waypoint-frontend-abc123 Deployment
google-boutique waypoint-checkout-def456 Deployment
monitoring prometheus-0 StatefulSet
monitoring prometheus-1 StatefulSet
monitoring grafana-5b7d8c9f4-abc123 Deployment
observability jaeger-all-in-one-abc123 Deployment
istio-system kiali-abc123 Deployment
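The listing above is abridged and annotated; plain kubectl get pods has no TYPE column. You can get a comparable view from each pod's owner kind (Deployments show up as ReplicaSet here):
kubectl get pods --all-namespaces \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind'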
Resource Requirements
| Component | Deployment Type | Replicas | CPU | Memory | Purpose |
|---|---|---|---|---|---|
| ztunnel | DaemonSet | 3 (1 per node) | 0.3 vCPU | 0.4GB | L4 proxy, mTLS |
| istiod | Deployment | 1 | 0.5 vCPU | 2GB | Control plane |
| Waypoints | Deployment | 2-5 | 0.4 vCPU | 0.5GB | L7 features (selective) |
| Prometheus | StatefulSet | 2 (HA) | 1 vCPU | 2GB | Metrics storage |
| Grafana | Deployment | 1 | 0.1 vCPU | 0.25GB | Dashboards UI |
| Jaeger | Deployment | 1 | 0.5 vCPU | 1GB | Distributed tracing |
| Kiali | Deployment | 1 | 0.1 vCPU | 0.25GB | Service graph UI |
| Total | — | 10-15 pods | ~3 vCPU | ~6.4GB | Full stack |
Storage (GCP Persistent Disks):
- Prometheus: 2 × 50Gi = 100Gi ($4/month)
- Jaeger: 20Gi ($0.80/month)
- Total: $4.80/month
Total monthly cost:
- Compute: $165/month (3 nodes + mesh overhead)
- Storage: $5/month (persistent disks)
- Total: $170/month for 50 pods with full observability
Compare to AWS X-Ray: $1,400+/month for distributed tracing alone!
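These figures are estimates for my setup; you can sanity-check the requests and live usage on your own cluster (kubectl top needs a metrics API such as metrics-server, which GKE enables by default):
# What the mesh components actually request
kubectl get pods -n istio-system \
  -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'

# Live usage across the stack
kubectl top pods -n istio-system
kubectl top pods -n monitoring
kubectl top pods -n observability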
What You Get (and What You Give Up)
✅ With ztunnel Only (L4) - 80% Use Case
Benefits:
- mTLS encryption between all services (zero-trust)
- L4 connection metrics (TCP stats, bytes)
- Service authentication/authorization
- Kiali service graph
- Cost: +3% infrastructure
Limitations:
- No circuit breakers (need Waypoint)
- No HTTP-level metrics (only TCP)
- No distributed tracing
- No request retries/timeouts
Best for: Internal microservices, background workers, databases, caches
✅ With ztunnel + Selective Waypoints (L4 + L7) - 20% Use Case
Benefits (all of ztunnel, plus):
- Circuit breakers (prevent cascading failures)
- HTTP retries, timeouts, traffic splitting
- Request-level metrics (latency, status codes)
- Distributed tracing (Jaeger)
- Canary deployments, A/B testing
- Cost: +10-15% infrastructure
Best for: User-facing APIs, payment services, checkout flows, critical paths
⚠️ Trade-off: Granularity
Sidecar mode: Per-pod metrics
frontend-pod-1: 100 req/s, 50ms p95
frontend-pod-2: 120 req/s, 45ms p95
frontend-pod-3: 80 req/s, 60ms p95 ← Can identify slow pod
Ambient mode: Service-level metrics
frontend service: 300 req/s, 52ms p95 ← Aggregated
Cannot see individual pod performance
Solution:
- Add application-level Prometheus instrumentation (expose a /metrics endpoint)
- Use Kubernetes-native metrics (kubelet, cAdvisor) for pod-level CPU/memory, as sketched below
- Most teams don't need per-pod HTTP metrics; service-level is usually sufficient
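Here's a quick sketch of where that pod-level signal comes from instead. The namespace matches this post's examples, and the PromQL relies on the cAdvisor metrics that kube-prometheus-stack scrapes out of the box:
# Pod-level CPU/memory straight from the kubelet (no mesh involved)
kubectl top pods -n google-boutique

# Per-pod CPU over 5 minutes, via cAdvisor metrics in Prometheus:
#   sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="google-boutique"}[5m]))
# Per-pod working-set memory:
#   max by (pod) (container_memory_working_set_bytes{namespace="google-boutique", container!=""})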
When Service Mesh Makes Sense Now (2026)
✅ Ambient Mode Opens Doors For:
Mid-size teams (15-30 services)
- Cost overhead: 10-15% (vs 66% sidecar mode)
- Gradual adoption: Start with L4, add L7 selectively
- mTLS security without code changes
Cost-conscious organizations
- 3-5% overhead for zero-trust networking
- Self-hosted observability: $170/month (vs $1,400+ cloud tracing)
- Works perfectly with spot instances
Compliance requirements
- mTLS encryption by default (HIPAA, PCI-DSS, SOC2)
- Zero-trust networking (mutual authentication)
- No application code changes needed
Complex microservices (20+ services)
- Distributed tracing: $46/month Jaeger vs $1,400/month X-Ray
- Circuit breakers for failure isolation
- Real-time service dependency graphs
❌ Still Not Worth It For:
Small teams (<10 services)
- Operational overhead not justified
- Application-level instrumentation simpler (Prometheus client libraries)
Monoliths
- No inter-service communication complexity
- Traditional monitoring sufficient
Teams without Kubernetes expertise
- Requires K8s debugging skills
- Mesh troubleshooting adds complexity
Installation: Quick Start
Step 1: Install Istio with Ambient Mode
# Install istioctl
curl -L https://istio.io/downloadIstio | sh -
cd istio-*
export PATH=$PWD/bin:$PATH
# Install Ambient profile
istioctl install --set profile=ambient -y
# Verify installation
kubectl get pods -n istio-system
# Output:
# istiod-xxx 1/1 Running
# ztunnel-xxx 1/1 Running (DaemonSet, 1 per node)
Step 2: Enable Ambient for Your Namespace
# Enable Ambient mode (ztunnel L4 only)
kubectl label namespace google-boutique istio.io/dataplane-mode=ambient
# Deploy your application
kubectl apply -f your-app.yaml -n google-boutique
# All pods now have:
# ✅ mTLS encryption (via ztunnel)
# ✅ Zero-trust authentication
# ❌ No L7 features yet (no circuit breakers)
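Labeling the namespace gives you mTLS opportunistically. If compliance requires refusing any plaintext fallback, you can optionally add a standard Istio PeerAuthentication policy (this is the regular Istio security API; ambient support for it depends on your Istio version):
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: strict-mtls
  namespace: google-boutique
spec:
  mtls:
    mode: STRICT   # refuse non-mTLS traffic to workloads in this namespace
EOF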
Step 3: Add Waypoint for Critical Services (L7)
# Deploy Waypoint for frontend service only
istioctl waypoint apply \
--service-account frontend \
--namespace google-boutique
# Verify Waypoint deployment
kubectl get pods -n google-boutique | grep waypoint
# waypoint-frontend-abc123 1/1 Running
# Now configure circuit breaker for frontend → payment calls
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-circuit-breaker
  namespace: google-boutique
spec:
  host: payment-service
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5     # replaces the deprecated consecutiveErrors field
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
EOF
Step 4: Install Observability Stack
# Install Prometheus + Grafana
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace \
--set prometheus.prometheusSpec.retention=7d \
--set prometheus.prometheusSpec.resources.requests.cpu=500m \
--set prometheus.prometheusSpec.resources.requests.memory=2Gi
# Install Jaeger
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm install jaeger jaegertracing/jaeger \
--namespace observability --create-namespace \
--set allInOne.enabled=true \
--set allInOne.resources.requests.cpu=500m \
--set allInOne.resources.requests.memory=1Gi
# Install Kiali (included with Istio)
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.21/samples/addons/kiali.yaml
# Access UIs via port-forward
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
kubectl port-forward -n observability svc/jaeger-query 16686:16686
kubectl port-forward -n istio-system svc/kiali 20001:20001
# Open in browser:
# Grafana: http://localhost:3000 (user: admin, password: prom-operator)
# Jaeger: http://localhost:16686
# Kiali: http://localhost:20001
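Once Prometheus is scraping the mesh, a few example queries tie the layers together. These are PromQL snippets to paste into Grafana or the Prometheus UI; the metric names are Istio's standard telemetry names (istio_tcp_* from ztunnel at L4, istio_requests_total and the duration histogram from Waypoints at L7), though exact labels can vary by version:
# L4: bytes sent per destination service, as reported by ztunnel
sum by (destination_service_name) (rate(istio_tcp_sent_bytes_total[5m]))

# L7: request rate per service, as reported by a Waypoint
sum by (destination_service_name) (rate(istio_requests_total[5m]))

# L7: p95 latency per service
histogram_quantile(0.95,
  sum by (le, destination_service_name)
    (rate(istio_request_duration_milliseconds_bucket[5m])))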
The Bottom Line: From Luxury to Practical
2017-2023: Service mesh was prohibitively expensive (30-90% cost increase). AWS gave up on App Mesh.
2024-2025: Istio Ambient mode makes service mesh affordable and practical:
- 3-15% overhead vs 66% sidecar mode
- Node-level proxies (DaemonSet) vs per-pod sidecars
- Graduated adoption: L4 first (cheap), L7 where needed (selective)
- Kubernetes-native: Only works in K8s (EKS, GKE)
Who should reconsider?
- Mid-size teams (15-30 services) previously priced out
- Cost-conscious orgs needing mTLS compliance
- Teams wanting circuit breakers without code changes
- Anyone paying $1,400+/month for AWS X-Ray
The shift: Service mesh is no longer a luxury for large enterprises—it's a viable option for mid-size teams building on Kubernetes.
If you ruled out service mesh before due to cost, revisit it now. The economics have fundamentally changed.
Appendix: Complete Deployment Manifests
A. ztunnel (DaemonSet) - Installed by Istio
# Managed by: istioctl install --set profile=ambient
# You don't create this manually, but here's what it looks like:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: ztunnel
namespace: istio-system
spec:
selector:
matchLabels:
app: ztunnel
template:
metadata:
labels:
app: ztunnel
spec:
serviceAccountName: ztunnel
hostNetwork: true
containers:
- name: istio-proxy
image: gcr.io/istio-release/ztunnel:1.21.0
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
securityContext:
privileged: true # Needed for iptables manipulation
volumeMounts:
- name: cni-bin
mountPath: /host/opt/cni/bin
volumes:
- name: cni-bin
hostPath:
path: /opt/cni/bin
B. Waypoint Proxy (Deployment) - Created by istioctl
# Created by: istioctl waypoint apply --service-account frontend
# This is what gets deployed:
apiVersion: apps/v1
kind: Deployment
metadata:
name: waypoint-frontend
namespace: google-boutique
labels:
istio.io/gateway-name: waypoint-frontend
spec:
replicas: 1 # Scale up for HA
selector:
matchLabels:
istio.io/gateway-name: waypoint-frontend
template:
metadata:
labels:
istio.io/gateway-name: waypoint-frontend
spec:
serviceAccountName: frontend
containers:
- name: istio-proxy
image: gcr.io/istio-release/proxyv2:1.21.0
resources:
requests:
cpu: 200m
memory: 256Mi
limits:
cpu: 1000m
memory: 1Gi
ports:
- containerPort: 15021 # Health check
- containerPort: 15090 # Metrics
---
apiVersion: v1
kind: Service
metadata:
name: waypoint-frontend
namespace: google-boutique
spec:
selector:
istio.io/gateway-name: waypoint-frontend
ports:
- port: 15021
name: status-port
C. Prometheus (StatefulSet)
# Helm values for prometheus-community/kube-prometheus-stack
# Save as: prometheus-values.yaml
# Install: helm install prometheus prometheus-community/kube-prometheus-stack -f prometheus-values.yaml
prometheus:
prometheusSpec:
replicas: 2 # High availability
retention: 7d # Keep metrics for 7 days
resources:
requests:
cpu: 500m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
storageSpec:
volumeClaimTemplate:
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 50Gi
storageClassName: standard-rwo # GKE persistent disk
# Scrape Istio metrics
additionalScrapeConfigs:
- job_name: 'istio-mesh'
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- istio-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: istio-telemetry;prometheus
grafana:
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
# Pre-load Istio dashboards
dashboardProviders:
dashboardproviders.yaml:
apiVersion: 1
providers:
- name: 'istio'
orgId: 1
folder: 'Istio'
type: file
disableDeletion: false
options:
path: /var/lib/grafana/dashboards/istio
dashboardsConfigMaps:
istio: "istio-grafana-dashboards"
alertmanager:
enabled: true
resources:
requests:
cpu: 100m
memory: 128Mi
D. Jaeger (All-in-One Deployment)
# For production, use separate collector/query/storage
# This is simplified all-in-one for learning
apiVersion: apps/v1
kind: Deployment
metadata:
name: jaeger-all-in-one
namespace: observability
spec:
replicas: 1
selector:
matchLabels:
app: jaeger
template:
metadata:
labels:
app: jaeger
spec:
containers:
- name: jaeger
image: jaegertracing/all-in-one:1.52
resources:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 2000m
memory: 2Gi
env:
- name: COLLECTOR_ZIPKIN_HOST_PORT
value: ":9411"
- name: SPAN_STORAGE_TYPE
value: badger
- name: BADGER_EPHEMERAL
value: "false"
- name: BADGER_DIRECTORY_VALUE
value: /badger/data
- name: BADGER_DIRECTORY_KEY
value: /badger/key
ports:
- containerPort: 5775 # UDP Zipkin
protocol: UDP
- containerPort: 6831 # UDP Jaeger
protocol: UDP
- containerPort: 6832 # UDP Jaeger
protocol: UDP
- containerPort: 5778 # HTTP config
- containerPort: 16686 # HTTP UI
- containerPort: 14268 # HTTP collector
- containerPort: 14250 # gRPC collector
- containerPort: 9411 # HTTP Zipkin
volumeMounts:
- name: jaeger-storage
mountPath: /badger
volumes:
- name: jaeger-storage
persistentVolumeClaim:
claimName: jaeger-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: jaeger-pvc
namespace: observability
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
storageClassName: standard-rwo
---
apiVersion: v1
kind: Service
metadata:
name: jaeger-query
namespace: observability
spec:
selector:
app: jaeger
ports:
- name: query-http
port: 16686
targetPort: 16686
---
apiVersion: v1
kind: Service
metadata:
name: jaeger-collector
namespace: observability
spec:
selector:
app: jaeger
ports:
- name: grpc
port: 14250
targetPort: 14250
- name: http
port: 14268
targetPort: 14268
E. Kiali (Service Graph UI)
# Simplified Kiali deployment
# Full version: kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.21/samples/addons/kiali.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: kiali
namespace: istio-system
spec:
replicas: 1
selector:
matchLabels:
app: kiali
template:
metadata:
labels:
app: kiali
spec:
serviceAccountName: kiali
containers:
- name: kiali
image: quay.io/kiali/kiali:v1.79
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
env:
- name: PROMETHEUS_URL
value: "http://prometheus-kube-prometheus-prometheus.monitoring:9090"
- name: GRAFANA_URL
value: "http://prometheus-grafana.monitoring:80"
- name: JAEGER_URL
value: "http://jaeger-query.observability:16686"
ports:
- containerPort: 20001
- containerPort: 9090 # Metrics
---
apiVersion: v1
kind: Service
metadata:
name: kiali
namespace: istio-system
spec:
selector:
app: kiali
ports:
- name: http
port: 20001
targetPort: 20001
- name: metrics
port: 9090
targetPort: 9090
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: kiali
namespace: istio-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kiali
rules:
- apiGroups: [""]
resources:
- configmaps
- endpoints
- namespaces
- nodes
- pods
- services
verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
resources:
- deployments
- replicasets
- statefulsets
verbs: ["get", "list", "watch"]
- apiGroups: ["networking.istio.io"]
resources:
- "*"
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kiali
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kiali
subjects:
- kind: ServiceAccount
name: kiali
namespace: istio-system
F. Complete Installation Script
#!/bin/bash
# install-ambient-stack.sh - Complete Istio Ambient + Observability setup
set -e
echo "=== Installing Istio Ambient Mode ==="
curl -L https://istio.io/downloadIstio | sh -
cd istio-*
export PATH=$PWD/bin:$PATH
istioctl install --set profile=ambient -y
echo "=== Installing Prometheus + Grafana ==="
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace \
--set prometheus.prometheusSpec.retention=7d \
--set prometheus.prometheusSpec.replicas=2 \
--set prometheus.prometheusSpec.resources.requests.cpu=500m \
--set prometheus.prometheusSpec.resources.requests.memory=2Gi \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
--set grafana.resources.requests.cpu=100m \
--set grafana.resources.requests.memory=256Mi
echo "=== Installing Jaeger ==="
kubectl create namespace observability || true
kubectl apply -f jaeger-all-in-one.yaml
echo "=== Installing Kiali ==="
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.21/samples/addons/kiali.yaml
echo "=== Enabling Ambient for google-boutique namespace ==="
kubectl create namespace google-boutique || true
kubectl label namespace google-boutique istio.io/dataplane-mode=ambient
echo "=== Installation Complete! ==="
echo ""
echo "Access dashboards:"
echo " Grafana: kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80"
echo " Prometheus: kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090"
echo " Jaeger: kubectl port-forward -n observability svc/jaeger-query 16686:16686"
echo " Kiali: kubectl port-forward -n istio-system svc/kiali 20001:20001"
echo ""
echo "Deploy Waypoint for a service:"
echo " istioctl waypoint apply --service-account <sa-name> --namespace google-boutique"
Have you tried Ambient mode? What's your experience with the new architecture? Share your thoughts!