
Sidecar-Free Mesh, SLO from YAML, and Labels as Contracts

Welcome back to Podo Stack. This week we're looking at how Istio finally killed the sidecar tax, a tool that turns SLO monitoring into a one-liner, and a policy that'll save your platform team from label chaos.

Here's what's good this week.

This post was originally published on Podo Stack (podostack.substack.com)


🚀 Sandbox Watch: Istio Ambient Mesh

*(Figure: Istio Ambient Mesh flow)*

What it is

Service mesh without sidecars. Istio Ambient hit GA in version 1.24, and it's not just a minor tweak — it's a completely different architecture.

Here's the problem with sidecars. Every pod gets an Envoy proxy injected. Run 100 pods, you're running 100 Envoys. Each one eats 50-100MB of RAM. Each one adds startup latency — your app waits for the sidecar to be ready before it can receive traffic. Scale to thousands of pods and you're burning serious resources on proxy overhead.
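
Quick math with those numbers: 100 sidecars at a ~75MB midpoint is roughly 7.5GB of RAM spent purely on proxies, and at 1,000 pods it's ~75GB. Ambient's per-node model (covered below) makes that cost scale with node count instead of pod count.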

Ambient flips this model.

How it works

Two layers instead of one:

ztunnel — A lightweight L4 proxy that runs as a DaemonSet, one per node. It handles mTLS, basic routing, and telemetry. Most traffic never needs more than this.

Waypoint proxy — An L7 proxy that only spins up when you need HTTP-level features like header routing, retries, or traffic mirroring. It's on-demand. Don't need L7? Don't pay for it.

Think of it as "service mesh à la carte." You get the security baseline everywhere (ztunnel), and you add the fancy features only where they matter.
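
Opting a workload in is mostly a label. Here's a minimal sketch, assuming Istio 1.24+ installed with the ambient profile and a hypothetical namespace named demo:

```bash
# Enroll the namespace in ambient mode: ztunnel now carries its L4 traffic and mTLS
kubectl label namespace demo istio.io/dataplane-mode=ambient

# Only when this namespace needs L7 features (retries, header routing),
# deploy a waypoint proxy for it
istioctl waypoint apply -n demo --enroll-namespace
```

Removing the label opts the namespace back out, again without pod restarts.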

Why I like it

  • Memory overhead drops from ~100MB per pod to ~20MB per node
  • No more sidecar injection drama — pods start faster
  • Incremental migration: some namespaces on sidecar, some on ambient, same control plane
  • mTLS everywhere by default, no config needed

When to use it

You're running a large cluster. You're tired of the sidecar tax. You want mesh security without mesh complexity. You're okay with a newer (but now GA) approach.

Links

  • Istio ambient mode docs: https://istio.io/latest/docs/ambient/

⚔️ The Showdown: Ambient vs Sidecar

When should you stick with sidecars? When should you go ambient?

Sidecar mode:

  • Memory: ~50-100MB per pod
  • L7 features always available
  • Sidecar must init before your app starts
  • All-or-nothing migration per namespace
  • 5+ years in production, familiar debugging

Ambient mode:

  • Memory: ~20MB per node (not per pod!)
  • L7 features on-demand via waypoint
  • No injection delay — pods start faster
  • Gradual migration, per-workload
  • GA since late 2024, newer tooling

The verdict:

Choose sidecar when you need fine-grained L7 control on every pod, you're already running it successfully, or your team knows the debugging patterns cold.

Choose ambient when memory is tight, you want mesh security without the overhead, you're starting fresh, or you want to migrate gradually without downtime.

Honestly? For new deployments in 2025+, ambient is the default choice. The sidecar tax was always the biggest complaint about service mesh — and now it's optional.


💎 The Hidden Gem: sloth

What it is

SLO monitoring without the PromQL PhD.

You want error budgets. You want burn rate alerts. You want dashboards that show if you're meeting your 99.9% availability target. The standard approach: spend three days writing Prometheus recording rules, debug the math, hope you got the multi-window burn rate calculation right.

sloth: write a YAML file, run one command, get everything.

How it works

```yaml
version: "prometheus/v1"  # sloth spec version, required
service: payment-api
slos:
  - name: requests-availability
    objective: 99.9
    sli:
      events:
        error_query: sum(rate(http_requests_total{status=~"5.."}[{{.window}}]))
        total_query: sum(rate(http_requests_total[{{.window}}]))
    alerting:
      name: PaymentApiAvailability  # alert name, required when alerting is configured
      page_alert:
        labels:
          severity: critical
      ticket_alert:
        labels:
          severity: warning
```

Run `sloth generate` and you get:

  • Prometheus recording rules
  • Prometheus alert rules (multi-window, multi-burn-rate)
  • Proper error budget calculation
  • Metrics that plug straight into sloth's ready-made Grafana dashboards (the project ships the dashboards separately; generate itself emits Prometheus rules)

The math is correct. The windows are correct. You focus on "what's my SLO" instead of "how do I calculate burn rates."
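
A minimal run, assuming the spec above is saved as payment-api-slo.yaml (hypothetical filename):

```bash
# Turn the SLO spec into Prometheus recording and multi-window burn rate alert rules
sloth generate -i payment-api-slo.yaml -o payment-api-rules.yaml
```

Point Prometheus at the output like any other rules file.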

Why I like it

  • One YAML → complete SLO monitoring stack
  • Follows the Google SRE Workbook's multi-window, multi-burn-rate alerting patterns
  • Works with any Prometheus setup
  • The generated rules are readable — you can audit them

Links

  • sloth on GitHub: https://github.com/slok/sloth

👮 The Policy: Require Labels

Copy this, apply it, and watch your platform governance improve overnight.

Why this matters

Labels aren't documentation — they're contracts. Without enforced labels, you get:

  • Cost allocation that's impossible ("which team owns this $50K/month workload?")
  • Access control that's broken (RBAC by label doesn't work if labels are missing)
  • Incident response that's slow ("who do I page for this failing deployment?")

The policy

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
  annotations:
    policies.kyverno.io/title: Require Labels
    policies.kyverno.io/category: Best Practices
    policies.kyverno.io/severity: medium
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: require-team-label
      match:
        any:
        - resources:
            kinds:
              - Pod
      validate:
        message: "Labels 'team', 'cost-center', and 'environment' are required."
        pattern:
          metadata:
            labels:
              team: "?*"
              cost-center: "?*"
              environment: "?*"
```
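
A quick smoke test once the policy is in Enforce mode (pod names and label values below are hypothetical):

```bash
# Should be rejected at admission: none of the required labels are set
kubectl run unlabeled --image=nginx

# Should be admitted: all three contract labels are present
kubectl run payments --image=nginx \
  --labels=team=payments,cost-center=cc-1234,environment=prod
```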

How to roll it out

  1. Start with validationFailureAction: Audit — see what would be blocked
  2. Fix your existing deployments
  3. Switch to Enforce
  4. Watch your platform team breathe easier

Links

  • Kyverno policy library: https://kyverno.io/policies/

🛠️ The One-Liner: kubectl debug

```bash
kubectl debug -it my-pod --image=busybox --target=my-container
```

Your pod runs a distroless image. No shell. No curl. No nothing. How do you debug it?

This command injects an ephemeral container into the running pod. Same network namespace. And since --target joins the target container's process namespace, its filesystem is reachable through /proc/<pid>/root. Full debugging power.

Use cases

  • Check DNS resolution from inside the pod
  • Inspect files in a distroless image
  • Run tcpdump without rebuilding
  • Test connectivity to other services

Works on Kubernetes 1.25+. No pod restart required.
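
For example, a one-shot DNS check from inside the pod (the service name is hypothetical):

```bash
# Resolve a service from the pod's own network namespace
kubectl debug -it my-pod --image=busybox --target=my-container \
  -- nslookup payment-api.default.svc.cluster.local
```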

Pro tip

The --target flag already shares the target container's process namespace, so ps from the debug container shows the app's process tree. If you debug via a pod copy (--copy-to), add --share-processes to keep that shared view. Great for debugging stuck applications.

Links

  • Debugging running pods: https://kubernetes.io/docs/tasks/debug/debug-application/debug-running-pod/

🍇 Podo Stack — Ripe for Prod.

Top comments (0)