Welcome back to Podo Stack. This week we're looking at how Istio finally killed the sidecar tax, a tool that turns SLO monitoring into a one-liner, and a policy that'll save your platform team from label chaos.
Here's what's good this week.
🚀 Sandbox Watch: Istio Ambient Mesh
What it is
Service mesh without sidecars. Istio Ambient hit GA in version 1.24, and it's not just a minor tweak — it's a completely different architecture.
Here's the problem with sidecars. Every pod gets an Envoy proxy injected. Run 100 pods, you're running 100 Envoys. Each one eats 50-100MB of RAM. Each one adds startup latency — your app waits for the sidecar to be ready before it can receive traffic. Scale to thousands of pods and you're burning serious resources on proxy overhead.
Ambient flips this model.
How it works
Two layers instead of one:
ztunnel — A lightweight L4 proxy that runs as a DaemonSet, one per node. It handles mTLS, basic routing, and telemetry. Most traffic never needs more than this.
Waypoint proxy — An L7 proxy that only spins up when you need HTTP-level features like header routing, retries, or traffic mirroring. It's on-demand. Don't need L7? Don't pay for it.
Think of it as "service mesh à la carte." You get the security baseline everywhere (ztunnel), and you add the fancy features only where they matter.
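Getting started is mostly a labeling exercise. A minimal sketch, assuming a hypothetical payments namespace (exact waypoint enrollment flags vary a bit by Istio version):

# Enroll the namespace in ambient mode: ztunnel now handles mTLS and L4 for its pods
kubectl label namespace payments istio.io/dataplane-mode=ambient

# Only when you need L7 features, attach a waypoint proxy to the namespace
istioctl waypoint apply -n payments --enroll-namespace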
Why I like it
- Memory overhead drops from ~100MB per pod to ~20MB per node
- No more sidecar injection drama — pods start faster
- Incremental migration: some namespaces on sidecar, some on ambient, same control plane
- mTLS everywhere by default, no config needed
When to use it
You're running a large cluster. You're tired of the sidecar tax. You want mesh security without mesh complexity. You're okay with a newer (but now GA) approach.
Links
⚔️ The Showdown: Ambient vs Sidecar
When should you stick with sidecars? When should you go ambient?
Sidecar mode:
- Memory: ~50-100MB per pod
- L7 features always available
- Sidecar must init before your app starts
- All-or-nothing migration per namespace
- 5+ years in production, familiar debugging
Ambient mode:
- Memory: ~20MB per node (not per pod!)
- L7 features on-demand via waypoint
- No injection delay — pods start faster
- Gradual migration, per-workload
- GA since late 2024, newer tooling
The verdict:
Choose sidecar when you need fine-grained L7 control on every pod, you're already running it successfully, or your team knows the debugging patterns cold.
Choose ambient when memory is tight, you want mesh security without the overhead, you're starting fresh, or you want to migrate gradually without downtime.
Honestly? For new deployments in 2025+, ambient is the default choice. The sidecar tax was always the biggest complaint about service mesh — and now it's optional.
💎 The Hidden Gem: sloth
What it is
SLO monitoring without the PromQL PhD.
You want error budgets. You want burn rate alerts. You want dashboards that show if you're meeting your 99.9% availability target. The standard approach: spend three days writing Prometheus recording rules, debug the math, hope you got the multi-window burn rate calculation right.
sloth: write a YAML file, run one command, get everything.
How it works
version: "prometheus/v1"  # the spec format sloth expects
service: payment-api
slos:
  - name: requests-availability
    objective: 99.9
    sli:
      events:
        error_query: sum(rate(http_requests_total{status=~"5.."}[{{.window}}]))
        total_query: sum(rate(http_requests_total[{{.window}}]))
    alerting:
      page_alert:
        labels:
          severity: critical
      ticket_alert:
        labels:
          severity: warning
Run sloth generate and you get:
- Prometheus recording rules
- Prometheus alert rules (multi-window burn rates)
- Grafana dashboard JSON
- Proper error budget calculation
The math is correct. The windows are correct. You focus on "what's my SLO" instead of "how do I calculate burn rates."
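The command itself is the boring part, which is the point. A sketch, with made-up file names:

# Reads the SLO spec and writes the Prometheus recording and alerting rules
sloth generate -i payment-api-slos.yaml -o payment-api-rules.yaml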
Why I like it
- One YAML → complete SLO monitoring stack
- Follows Google SRE book patterns exactly
- Works with any Prometheus setup
- The generated rules are readable — you can audit them
Links
👮 The Policy: Require Labels
Copy this, apply it, and watch your platform governance improve overnight.
Why this matters
Labels aren't documentation — they're contracts. Without enforced labels, you get:
- Cost allocation that's impossible ("which team owns this $50K/month workload?")
- Access control that's broken (RBAC by label doesn't work if labels are missing)
- Incident response that's slow ("who do I page for this failing deployment?")
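Once the labels exist and are enforced, those questions turn into one-liners. A sketch (the team name is made up):

# Show owner, cost-center, and environment as columns across the whole cluster
kubectl get pods -A -L team,cost-center,environment

# Everything the payments team runs: the page target and the cost-allocation bucket
kubectl get deploy,statefulset -A -l team=payments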
The policy
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
  annotations:
    policies.kyverno.io/title: Require Labels
    policies.kyverno.io/category: Best Practices
    policies.kyverno.io/severity: medium
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-team-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Labels 'team', 'cost-center', and 'environment' are required."
        pattern:
          metadata:
            labels:
              team: "?*"
              cost-center: "?*"
              environment: "?*"
How to roll it out
- Start with validationFailureAction: Audit to see what would be blocked
- Fix your existing deployments
- Switch to Enforce (see the sketch below)
- Watch your platform team breathe easier
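A rough sketch of that rollout, assuming the policy is saved as require-labels.yaml and Kyverno's policy reports are enabled:

# 1. Apply with validationFailureAction set to Audit, then review what would have been blocked
kubectl apply -f require-labels.yaml
kubectl get policyreport -A

# 2. Once existing workloads carry the labels, flip the policy to Enforce
kubectl patch clusterpolicy require-labels --type merge \
  -p '{"spec":{"validationFailureAction":"Enforce"}}'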
Links
🛠️ The One-Liner: kubectl debug
kubectl debug -it my-pod --image=busybox --target=my-container
Your pod runs a distroless image. No shell. No curl. No nothing. How do you debug it?
This command injects an ephemeral container into the running pod. Same network namespace. Same process namespace as the target container (that's what --target buys you), which also gets you to its files via /proc/<pid>/root. Full debugging power.
Use cases
- Check DNS resolution from inside the pod (example below)
- Inspect files in a distroless image
- Run tcpdump without rebuilding
- Test connectivity to other services
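The DNS check, for example, is just a command passed after the image. All names here are placeholders:

# Resolve another service from inside the pod's network namespace
kubectl debug -it my-pod --image=busybox --target=my-container -- nslookup payment-api.default.svc.cluster.local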
Works on Kubernetes 1.25+. No pod restart required.
Pro tip
With --target (as in the one-liner above), the debug container already shares the target container's process namespace, so ps shows the app's process tree. Great for debugging stuck applications. The --share-processes flag applies when you debug a copy of the pod via --copy-to: it enables process namespace sharing across the copy's containers.
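If you do go the pod-copy route, a sketch (the copy name is arbitrary):

# Disposable copy of the pod with a busybox shell and a shared process namespace
kubectl debug my-pod -it --image=busybox --copy-to=my-pod-debug --share-processes -- sh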
Links
🍇 Podo Stack — Ripe for Prod.
