I Spent Hours Debugging Sessions Until I Discovered Istio Consistent Hashing
A developer's journey from late-night debugging to elegant infrastructure solutions
The Problem That Started a Journey
We had a scaling problem that wouldn't go away.
Our Jarvis service needed to grow. We were scaling beyond 3 pods to handle more AI inference requests and audio interactions. But there was a critical issue: each Jarvis pod maintained state—model weights loaded in memory, WebRTC connections via daily.co for live audio calls, cached inference results.
When requests bounced randomly between pods (thanks, round-robin load balancer), things broke:
Request 1 (user-123) → Jarvis-Pod-1
✓ Model weights loaded, WebRTC connection active
Request 2 (user-123) → Jarvis-Pod-2 (different pod!)
✗ "Connection lost" - WebRTC drops
✗ Model weights reloaded - 500ms latency spike
✗ User's audio call freezes mid-session
Request 3 (user-123) → Jarvis-Pod-1
✓ Works again, but connection restarted
Month 1: We implemented Redis
The natural solution was to implement session affinity with Redis. Smaran-v2 would query Redis: "which Jarvis pod for this user?" and route accordingly. It worked well. WebRTC connections stayed alive, latencies improved, users were happy.
But as we scaled, the economics changed:
- Cost: $300/month for a Redis cluster
- Latency overhead: Every request made a network hop to Redis (5-10ms per request)
- Operational toil: TTL management, replica failovers, cache invalidation edge cases
Redis solved the problem, but the cost-benefit ratio wasn't sustainable. We were paying premium dollars for a solution that was becoming a bottleneck. There had to be a more elegant way.
After some R&D: We discovered Istio
Digging through Kubernetes networking docs, I found something elegant: Istio's consistent hashing. No external service. No Redis. Just hash the session ID, map it to a pod—all built into the load balancer.
"Wait, we can remove Redis?" I asked.
"Apparently. Let's build a POC."
What started as a "maybe this could work" conversation turned into a proof-of-concept that would reshape our infrastructure.
My first thought: This is going to be a long night.
The Problem I Didn't Know I Had
Here's the setup. We have two main services:
User Request
│
▼
┌─────────────────────────┐
│ Smaran-v2 │ (coordinator)
│ Routing requests to │
└──────────┬──────────────┘
│
▼
┌───────┴────────┐
│ │
Pod-1 Pod-2 Pod-3
Jarvis Jarvis Jarvis
(w/ state) (w/ state) (w/ state)
(WebRTC) (WebRTC) (WebRTC)
Jarvis is the workhorse—it runs AI inference, maintains WebRTC connections via daily.co, and caches model state. We needed to scale it to many pods, which introduced the session affinity problem.
The Jarvis pods weren't just serving requests—they were stateful. Each pod maintained:
- Model weights in memory (500MB loaded per pod)
- Per-user WebRTC connections via daily.co for audio calls
That last one was critical. When a user started an audio call with daily.co, the Jarvis pod opened a WebRTC connection to participate in that session. If the next request hit a different pod, that connection was completely lost. The WebRTC peer would see a disconnect. The user would see: "Connection lost."
So here's what was happening with Jarvis:
User starts audio call:
Request 1 → Jarvis-Pod-1
✓ Creates daily.co WebRTC connection
✓ Stream is live
User sends inference request:
Request 2 → Jarvis-Pod-2 (different pod!)
✗ Pod-2 has no WebRTC connection
✗ Connection drops
✗ audio feed freezes
User tries again:
Request 3 → Jarvis-Pod-1
✓ Pod-1 still has the connection
✓ audio works again (but connection restarted)
And that wasn't even counting the raw performance penalty when a request hit a different Jarvis pod just because Kubernetes's default load balancer said "hey, let's send this request to a different Jarvis pod today."
I definitely knew something was wrong now. That's not something you ignore.
The Debugging Session from Hell
I spent the next few days chasing ghosts:
Day 1: "Must be a database performance issue." Checked query logs. Queries were fine.
Day 2: "Models are degrading?" Profiled inference on a single pod. Blazingly fast.
Day 3: "Network latency?" Ran iperf between pods. Nothing unusual.
Day 4: I added logging to see which pod each request was hitting. That's when it clicked:
Request 1: user-12345 → Pod-1 (100ms) ✓
Request 2: user-12345 → Pod-2 (850ms) ✗ CACHE MISS
Request 3: user-12345 → Pod-3 (900ms) ✗ COLD START
Request 4: user-12345 → Pod-1 (100ms) ✓
Request 5: user-12345 → Pod-2 (850ms) ✗ CACHE MISS
The pattern was unmistakable. The same user's requests were bouncing around like a pinball machine.
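The bouncing is easy to reproduce in a few lines. Here's a toy simulation (pod names are illustrative) of what a round-robin load balancer does to consecutive requests from one user:

```python
from itertools import cycle

# Kubernetes' default Service load balancing behaves roughly like round-robin:
pods = cycle(["jarvis-pod-1", "jarvis-pod-2", "jarvis-pod-3"])

# Five consecutive requests from the SAME user:
hits = [next(pods) for _ in range(5)]
print(hits)
# ['jarvis-pod-1', 'jarvis-pod-2', 'jarvis-pod-3', 'jarvis-pod-1', 'jarvis-pod-2']
```

Every third request lands back on the pod that holds the state; the other two hit cold pods.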
First Attempt: The Redis Solution
Need to pin users to pods.
So we built a Redis solution. The idea was straightforward: maintain a user_id → pod_name mapping in Redis, so Smaran-v2 always knew which stateful Jarvis pod owned each user.
It worked. Users stopped getting disconnected. WebRTC connections stayed alive. We felt clever.
Then we started scaling.
The Redis Problem:
Every request hit Redis - All traffic went through Redis first. With 10k req/sec, that's 10k Redis lookups per second.
Redis became a bottleneck - We had to add read replicas, tune connection pools, add caching layers.
Cost was insane - A large Redis cluster to handle our traffic? That's several hundred dollars per month.
Failover was sketchy - When Redis went down, pod assignment failed. Users got randomly routed. WebRTC connections dropped again.
TTL management was tricky - Set it too short and sessions expire mid-interaction. Set it too long and stale mappings pile up. We were constantly tuning this.
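The TTL tug-of-war is easy to illustrate. This toy stand-in for the Redis mapping (hypothetical names; the real system used a Redis cluster with `SET ... EX <ttl>`) shows what happens when an entry expires mid-session:

```python
class AffinityStore:
    """Toy in-memory stand-in for the Redis user -> pod mapping."""

    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self._data = {}  # user_id -> (pod, expires_at)

    def pin(self, user_id: str, pod: str, now: float) -> None:
        self._data[user_id] = (pod, now + self.ttl_s)

    def get(self, user_id: str, now: float):
        entry = self._data.get(user_id)
        if entry is None or now >= entry[1]:
            return None  # expired: caller must re-pin, possibly to a DIFFERENT pod
        return entry[0]

store = AffinityStore(ttl_s=60)
store.pin("user-123", "jarvis-pod-1", now=0)
print(store.get("user-123", now=30))  # jarvis-pod-1 -- still pinned
print(store.get("user-123", now=90))  # None -- expired mid-session, affinity lost
```

Set `ttl_s` too short and the second lookup happens while the user is mid-call; set it too long and dead-pod mappings linger. There is no value that avoids both.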
We could've coded it ourselves, but why? - Implementing session affinity directly in Smaran-v2 would mean writing consistent hashing code, a pod registry, health checks, and failover logic by hand. That would mean:
- Infrastructure logic bleeding into application code
- Every service that needs affinity has to reimplement this
- Scaling to new services means copy-paste the same logic again
We realized: "This isn't an application concern. This is infrastructure's job."
After two months of fighting Redis, our tech lead ARP said:
"Have you looked at service meshes? Specifically Istio?"
I hadn't. But ARP's mention was enough to get me digging.
That evening, I started exploring Istio's documentation. The concept of consistent hashing at the networking layer—no Redis, no external state, everything baked into the load balancer—kept gnawing at me.
I read through how DestinationRules work, how Envoy proxies handle traffic routing, and what consistent hashing actually means in practice. The deeper I dug, the more a question crystallized in my mind:
"Wait... so Istio could handle session affinity for us? Without any application code changes? Just configuration?"
Every resource I read pointed to the same answer: yes.
Exploring Istio for Our Use Case
I wasn't immediately convinced this would work for us. Our architecture was specific: Smaran-v2 needed to route requests to specific Jarvis pods based on user session. I had to figure out:
- How does consistent hashing actually work in Istio? Dug into the DestinationRule spec and Envoy documentation.
- How do we map our user session to a hashing key? We'd need to send a header (like x-call-session-id) that Istio could hash.
- Will it actually solve the WebRTC affinity problem? Traced through the request flow to confirm pods would stay consistent.
- What are the operational implications? Could we fail over gracefully? What happens when we scale pods?
- How fast could we POC this? Estimated timeline for testing without impacting production.
The more I explored, the more elegant it seemed. Not a quick patch like Redis, but a real architectural improvement.
By the end of that week, I was convinced. Here's what made it click:
The Promise:
- No external Redis cluster
- Session affinity baked into the load balancer
- Lower latency (no Redis hop)
- Lower cost (no monthly bill)
- Reduced operational complexity
The Technical Elegance:
Instead of "all requests from this IP go to Pod-1" (which breaks for our users behind a VPN), Istio offers: "all requests with x-call-session-id: user-12345 go to Pod-2."
And the brilliant part? The hashing is consistent. When we add pods, only the sessions that hash to the new pods move—going from N to N+1 pods remaps only about 1/(N+1) of traffic. With naive modulo hashing (hash % N), nearly every session would remap the moment N changes.
"Let's build a POC," I told the team. "If this works, we can retire Redis."
The Key Insight: DestinationRule
Here's the magic piece. In Istio, you define how traffic should be handled with a DestinationRule. Instead of maintaining Redis mappings, Istio computes the pod assignment on-the-fly using consistent hashing.
I created one for Jarvis:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: jarvis-consistent-hash
  namespace: dev
spec:
  host: jarvis-service.dev.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpHeaderName: "x-call-session-id"
    connectionPool:
      http:
        h2UpgradePolicy: UPGRADE
```
Let me break down what I was actually doing here:
httpHeaderName: "x-call-session-id"
This is the magic. Istio looks at the x-call-session-id header in each request, hashes it, and maps it to a pod. Same header value = same pod, always.
Unlike Redis, there's no external lookup. No network hop. No TTL management. The hash is deterministic and instant. If we have 3 pods and hash the value user-12345, it always maps to Pod-2. If we add a 4th pod tomorrow, most requests still map to the same pod (that's the magic of consistent hashing).
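To see why most sessions stay put when pods are added, here's a toy consistent-hash ring in Python—a simplified sketch of the idea behind Envoy's ring-hash balancer, not its actual implementation (pod names and vnode count are ours):

```python
import bisect
import hashlib

def stable_hash(s: str) -> int:
    # Deterministic hash (Python's built-in hash() is salted per process)
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring: pods own arcs of the hash space."""

    def __init__(self, pods, vnodes=100):
        # Each pod gets `vnodes` points on the ring to smooth the distribution
        self.ring = sorted((stable_hash(f"{p}#{i}"), p)
                           for p in pods for i in range(vnodes))
        self.points = [h for h, _ in self.ring]

    def route(self, session_id: str) -> str:
        h = stable_hash(session_id)
        # The first ring point clockwise from the session's hash owns it
        idx = bisect.bisect(self.points, h) % len(self.points)
        return self.ring[idx][1]

three = HashRing(["jarvis-pod-1", "jarvis-pod-2", "jarvis-pod-3"])
four = HashRing(["jarvis-pod-1", "jarvis-pod-2", "jarvis-pod-3", "jarvis-pod-4"])

sessions = [f"user-{i}" for i in range(1000)]
moved = sum(three.route(s) != four.route(s) for s in sessions)
print(f"sessions remapped after adding a 4th pod: {moved}/1000")  # roughly a quarter
```

The same session ID always lands on the same pod, and adding a fourth pod only steals its share of the ring—the other pods' sessions never move.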
Compare to our Redis approach:
| Aspect | Redis Solution | Istio Consistent Hash |
|---|---|---|
| Speed | ~5-10ms lookup time | <1ms hash computation |
| Cost | $300+/month for Redis cluster | Built-in, $0 |
| Failover | Redis goes down = broken | No external dependency |
| Scaling | All traffic rebalances | ~1/N rebalance on N→N+1 pods |
| Operational burden | Tune TTL, manage replicas | Write YAML, forget about it |
We literally saved money and got better performance by removing a dependency.
h2UpgradePolicy: UPGRADE
This one I learned the hard way. Without it, HTTP/1.1 connections weren't upgrading to HTTP/2. This meant connection pooling wasn't working as expected, and latencies were still higher than they should be. Added this line, and suddenly connection reuse kicked in.
This was especially important for our daily.co WebRTC connections—keeping TCP connections alive and reusing them meant the WebRTC establishment was faster.
The VirtualService: Routing Rules
But wait, where does the header actually come from? I had to ensure Smaran-v2 was sending that header to Jarvis.
I added a VirtualService to route traffic intelligently:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: jarvis-routing
  namespace: dev
spec:
  hosts:
    - jarvis-service.dev.svc.cluster.local
  http:
    # Route with header (gets consistent hashing)
    - match:
        - port: 80
          headers:
            x-call-session-id:
              regex: ".+"
      route:
        - destination:
            host: jarvis-service.dev.svc.cluster.local
            port:
              number: 80
      timeout: 300s
    # Fallback without header (round-robin)
    - match:
        - port: 80
      route:
        - destination:
            host: jarvis-service.dev.svc.cluster.local
            port:
              number: 80
      timeout: 300s
```
The first block handles requests with the header (these get consistent hashing). The second block is a fallback for requests without the header (they just get round-robin load balancing).
I added that fallback after almost shipping without it. Then I thought: "What if Smaran-v2 forgets the header? What happens?" With the fallback, they still work—they're just not affinity-pinned. That felt reasonable.
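The two match blocks boil down to one decision, which this little simulation mirrors (our own sketch of the routing decision, not Envoy's matching code):

```python
import re

HEADER_RULE = re.compile(r".+")  # the VirtualService's regex match

def pick_route(headers: dict) -> str:
    """Header present and non-empty -> consistent hashing; else fallback."""
    value = headers.get("x-call-session-id")
    if value is not None and HEADER_RULE.fullmatch(value):
        return "consistent-hash"
    return "round-robin"

print(pick_route({"x-call-session-id": "user-12345"}))  # consistent-hash
print(pick_route({}))                                   # round-robin
print(pick_route({"x-call-session-id": ""}))            # round-robin
```

Note the third case: `regex: ".+"` requires at least one character, so an empty header value also falls through to round-robin.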
mTLS: The Security Layer I Didn't Expect to Need
While setting all this up, our security team (finally) looked at the architecture. They asked a simple question: "Are services talking over plain HTTP?"
Uh. Yes.
Their response was essentially: "Fix that."
Istio made this stupidly easy. One more YAML file:
```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  # Mesh-wide defaults must live in the Istio root namespace
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```
STRICT mode means: if you try to talk to our services over plaintext, it gets rejected. Mutual TLS is enforced everywhere.
I was nervous about this breaking things. Turned out Istio handles the certificate generation and rotation automatically. Zero intervention from us. By the next morning, all inter-service traffic was encrypted.
How It Works: The Request Flow (Redis vs. Istio)
Now here's where Istio became elegant. Let me trace a real request through our system, comparing the two approaches:
OLD REDIS APPROACH (Worked well, but had trade-offs):
1. Client sends request to Smaran-v2 with user-12345 and audio call
2. Smaran-v2 queries Redis: "which Jarvis pod for this user?"
3. Redis lookup: 5-10ms latency
4. Returns: "jarvis-pod-1"
5. Smaran-v2 routes to jarvis-pod-1 directly
6. jarvis-pod-1 has the WebRTC connection and session state
✓ Works reliably, but adds latency and operational overhead
NEW ISTIO APPROACH (Current, way better):
1. Client makes request with session header:
GET /inference
Headers: {
"x-call-session-id": "user-12345-session-audio"
}
2. Smaran-v2 receives it
✓ Extracts x-call-session-id
✓ Forwards to Jarvis service with header intact
(No Redis query needed!)
3. Istio's Envoy proxy (the sidecar next to Smaran-v2) intercepts:
✓ Reads x-call-session-id: "user-12345-session-audio"
✓ Hashes the value deterministically
✓ Looks that hash up on the consistent-hash ring of healthy Jarvis pods → Jarvis-Pod-1
✓ Routes the request to jarvis-pod-1
(All done in <1ms, no external calls!)
4. Jarvis Pod-1 processes request:
✓ Session state is hot in memory
✓ Model weights already loaded (500MB)
✓ WebRTC connection to daily.co is ALIVE
✓ Inference completes in ~95ms
5. User continues audio call, sends next inference request:
Same x-call-session-id header → Jarvis-Pod-1 again
✓ WebRTC connection stays open
✓ Session context preserved
✓ No cold starts, no reconnections
✓ Smooth user experience
The difference? Istio eliminates the Redis hop entirely. Same consistent routing, but with:
- No external service dependency
- Sub-millisecond routing decisions
- Zero additional cost
- Cleaner architecture
And the WebRTC connections? They stay alive now. The latency improvement alone was worth the migration.
Three Things I Got Wrong (and Learned)
1. Thinking about hashing too much
I spent hours worrying about hash collisions, ring rebalancing, edge cases. In practice? Istio just works. The consistent hash algorithm is proven, and Envoy has optimized it to death. I was overthinking.
2. Not understanding the fallback behavior
I added the VirtualService fallback thinking "this is fine, they'll just get round-robin." Turns out, I should have been more explicit about requiring the header.
In production, we now validate that x-call-session-id is present in Smaran-v2 before it forwards anything to Jarvis. If it's missing, we reject it with a clear error. That's much better than silently degrading to round-robin load balancing and potentially breaking WebRTC connections.
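The validation amounts to a guard like this (a hypothetical helper—the real Smaran-v2 code and error handling differ):

```python
def require_session_header(headers: dict) -> str:
    """Fail fast instead of silently degrading to round-robin."""
    session_id = (headers.get("x-call-session-id") or "").strip()
    if not session_id:
        raise ValueError(
            "missing x-call-session-id: refusing to route to Jarvis without affinity")
    return session_id

print(require_session_header({"x-call-session-id": "user-12345-session-audio"}))
# user-12345-session-audio
```

A request without the header now gets a clear 4xx at the coordinator, rather than a mysterious WebRTC drop three hops later.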
3. Forgetting that Istio adds latency (sometimes)
The Envoy proxy running in each pod adds... actually, negligible latency. Maybe 1-2ms per request for header parsing and routing decisions. But I was paranoid about it anyway. Added observability metrics to track proxy latency specifically. Turned out I was worrying about nothing.
When We Scale to 10 Jarvis Pods
This is where the real test will come.
When we scale from 3 to 10 Jarvis pods, I won't expect chaos—but I'll be watching closely. Here's what should happen:
- Traffic will redistribute smoothly
- No request storms
- Existing WebRTC connections will stay pinned to their original pods
- New requests will get distributed to the new pods based on their session IDs
It should be... boring. Anticlimactically boring. In the best way.
Pod utilization will spread evenly. Latencies will stay low. The on-call engineer (me) won't get paged.
Redis vs. Istio: The Comparison
After six months running both in different parts of our infrastructure, the choice became unmistakable. Here's how they compare side-by-side:
| Aspect | Redis Solution | Istio Consistent Hashing |
|---|---|---|
| Architecture | External state store (Redis cluster) | Built into load balancer (no external service) |
| Cost | $300+/month | $0 (included with Istio) |
| Latency per Request | 5-10ms (Redis lookup) | <1ms (hash computation) |
| Setup Complexity | Moderate (Redis deployment, tuning) | Low (2 YAML files) |
| Operational Burden | High (failover, TTL tuning, replicas) | Minimal (Istio manages it) |
| Network Calls per Request | Every request → Redis | Zero (local routing) |
| Failure Mode | Redis down → random routing → WebRTC drops | Graceful, no single point of failure |
| WebRTC Reliability | 15-20 disconnects/hour | <1 disconnect/month |
| Scaling 3→10 pods | Mappings go stale as TTLs expire; users re-pin randomly | Only ~1/N of traffic moves per added pod (hash ring) |
| Developer Experience | "Query Redis, route" (app code) | "Send x-call-session-id header" (config) |
| Testing | Requires Redis mock + K8s API mock | No mocks needed |
| Scalability Limit | Redis throughput limits growth | Scales linearly with pods |
The Verdict: Redis solved the immediate problem. Istio solved it better. Why maintain a separate service for something the routing layer can handle elegantly?
We haven't touched Redis since the migration. Consistent hashing is now foundational to our system—it's in the runbooks, baked into deployments, and new team members learn about it early. WebRTC stability through daily.co is rock solid.
The lesson: you solve a problem one way, it works, you move forward. Then someone shows you a smarter approach, and you realize the trade-offs weren't necessary. That's how infrastructure evolves.
If You're In My Shoes
If you're running stateful services in Kubernetes with WebRTC or persistent connections, or if you're seeing weird latency patterns that don't make sense, or if you're scaling and watching your p99 latencies explode—look into Istio's consistent hashing.
It might save you a 3 AM debugging session (and a lot of frustrated users with dropped audio calls).
The Configs (Copy-Paste Ready)
Save these three files and apply them:
destination-rule.yaml:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: jarvis-consistent-hash
  namespace: dev
spec:
  host: jarvis-service.dev.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpHeaderName: "x-call-session-id"
    connectionPool:
      http:
        h2UpgradePolicy: UPGRADE
```
virtual-service.yaml:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: jarvis-routing
  namespace: dev
spec:
  hosts:
    - jarvis-service.dev.svc.cluster.local
  http:
    - match:
        - port: 80
          headers:
            x-call-session-id:
              regex: ".+"
      route:
        - destination:
            host: jarvis-service.dev.svc.cluster.local
            port:
              number: 80
      timeout: 300s
    - match:
        - port: 80
      route:
        - destination:
            host: jarvis-service.dev.svc.cluster.local
            port:
              number: 80
      timeout: 300s
```
peer-authentication.yaml:
```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  # Mesh-wide defaults must live in the Istio root namespace
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```
Deploy:
```sh
# Install Istio (if needed)
istioctl install --set profile=demo -y

# Label namespace for auto-injection
kubectl label namespace dev istio-injection=enabled

# Apply configs (each file needs its own -f flag)
kubectl apply -f destination-rule.yaml -f virtual-service.yaml -f peer-authentication.yaml

# Restart deployments to pick up sidecar injection and new routing
kubectl rollout restart deployment/smaran-v2 -n dev
kubectl rollout restart deployment/jarvis -n dev
```
Test it:
```sh
# From a pod inside the mesh (run it in the dev namespace so the
# Envoy sidecar is injected), send multiple requests with the same header
kubectl run -it debug -n dev --image=curlimages/curl --rm --restart=Never -- sh

# Then (POSIX sh loop -- the curl image has no bash, so avoid {1..5}):
for i in 1 2 3 4 5; do
  curl -H "x-call-session-id: test-user-123" \
    http://jarvis-service.dev.svc.cluster.local:80/inference
  sleep 1
done

# Watch the Jarvis pod logs and you'll see each request hitting the same pod
# Each request should be fast (no cold starts)
```
That's the story. From 3 AM panic to elegant infrastructure. From debugging in the dark to understanding why the system works.
It's a pretty good feeling.