When sticky sessions killed our payment platform performance
Ever wonder how a "performance optimization" can make your system 240% slower? Let me tell you about a European fintech platform that learned this lesson the hard way.
The problem: uneven load distribution
This payment processor handled 50,000+ daily transactions across 12 EU markets. Their setup looked reasonable: 6 application servers behind a load balancer with session affinity enabled. The theory was sound - keep users on the same server for better performance.
Reality hit during peak hours (8-10 AM). While some users breezed through transactions, others waited forever. The culprit? Their "optimization" was creating bottlenecks.
What the data revealed
When we audited their infrastructure, the numbers were shocking:
- Server utilization: ranged from 23% to 94% across the cluster
- Traffic distribution: 3 servers handling 67% of all requests
- Memory usage: 3.2GB on hot servers vs 1.1GB on idle ones
- Response times: P99 times exceeded 8 seconds
The root cause was IP hash-based routing combined with customers from shared corporate networks. Session data lived in server memory, creating hot spots that couldn't be redistributed.
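A quick way to see the failure mode: with hash-based affinity, every request from behind a corporate NAT carries the same source IP, so an entire office lands on one server. This is a minimal sketch (not the platform's actual code; the IPs and counts are made up):

```python
# Sketch: why IP-hash routing creates hot spots behind shared NAT addresses.
import hashlib
from collections import Counter

SERVERS = 6

def ip_hash(ip: str) -> int:
    """Map a client IP to a server index, mimicking hash-based affinity."""
    digest = hashlib.md5(ip.encode()).hexdigest()
    return int(digest, 16) % SERVERS

# Three corporate networks, each NAT-ing 1,000 employees behind one IP,
# plus 300 home users with distinct IPs.
requests = []
for nat_ip in ["203.0.113.10", "198.51.100.20", "192.0.2.30"]:
    requests += [nat_ip] * 1000          # every employee shares one IP
requests += [f"10.0.{i // 256}.{i % 256}" for i in range(300)]

load = Counter(ip_hash(ip) for ip in requests)
print(dict(load))  # the NAT'ed traffic piles onto a handful of servers
```

Each shared IP hashes to exactly one backend, so at least one server absorbs a full thousand-request block no matter how the other traffic spreads out.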
The solution: go stateless
Instead of fixing sticky sessions, we eliminated them entirely. Here's how:
1. External session storage with Redis
```shell
redis-server --port 7000 --cluster-enabled yes \
    --cluster-config-file nodes-7000.conf \
    --appendonly yes
```
Session structure optimized for speed:
```json
{
  "user_id": 12345,
  "auth_token": "...",
  "last_activity": 1640995200,
  "fraud_score": 0.23,
  "recent_transactions": [...]
}
```
2. True load balancing
Replaced IP hash with least connections in Nginx:
```nginx
upstream payment_backend {
    least_conn;
    server app1.internal:8080 max_fails=3 fail_timeout=30s;
    server app2.internal:8080 max_fails=3 fail_timeout=30s;
    server app3.internal:8080 max_fails=3 fail_timeout=30s;
    # ... remaining servers
}
```
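The behavior behind `least_conn` is simple: each new request goes to whichever backend currently has the fewest in-flight connections. A toy model (backend names are illustrative, not Nginx internals):

```python
# Toy model of least-connections routing: pick the backend with the
# fewest active connections for every new request.
active = {"app1": 0, "app2": 0, "app3": 0}

def pick_backend() -> str:
    return min(active, key=active.get)

# Nine requests arrive before any complete:
for _ in range(9):
    active[pick_backend()] += 1

print(active)  # load stays even: {'app1': 3, 'app2': 3, 'app3': 3}
```

Unlike IP hash, the decision depends on live server state, so a burst from one network can't pin itself to a single backend.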
3. Stateless application design
Minimized session dependencies by caching user preferences in Redis with 1-hour TTL instead of keeping them in server memory for entire sessions.
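The preference cache follows a read-through pattern: serve from Redis when the entry is fresh, fall back to the database on a miss, and let the 1-hour TTL bound staleness. A sketch (a dict stands in for Redis, and the preference values are made up):

```python
# Sketch of a read-through preference cache with a 1-hour TTL.
import time

CACHE_TTL = 3600
cache = {}      # user_id -> (prefs, expiry_timestamp)
db_reads = 0    # counts round trips to the "database"

def fetch_prefs_from_db(user_id: int) -> dict:
    global db_reads
    db_reads += 1
    return {"locale": "de-DE", "currency": "EUR"}  # illustrative values

def get_prefs(user_id: int) -> dict:
    entry = cache.get(user_id)
    if entry and time.time() < entry[1]:
        return entry[0]                 # cache hit: no DB round trip
    prefs = fetch_prefs_from_db(user_id)
    cache[user_id] = (prefs, time.time() + CACHE_TTL)
    return prefs

get_prefs(12345)
get_prefs(12345)
print(db_reads)  # 1 — the second call was served from cache
```

The trade-off is bounded staleness (up to an hour) in exchange for servers that hold no per-user state between requests.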
The results
Performance improvements were immediate:
- P50 response times: 420ms → 280ms (33% faster)
- P95 response times: 3.4s → 1.0s (71% faster)
- P99 response times: 8s+ → 1.8s (78% faster)
- Server utilization: Now balanced at 45-52% across all servers
- Customer complaints: Down 89%
Key takeaways for your architecture
- Session affinity hides problems until they become critical
- External session storage is worth the added complexity
- Monitor per-server metrics, not just averages
- Gradual migration reduces risk (we cut over everything at once and got away with it, but a phased rollout would have been safer)
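On the monitoring point: the cluster-wide average looked fine while individual servers were saturated. A quick check makes the difference concrete (only the 23% and 94% endpoints come from the audit; the intermediate values are an illustrative split):

```python
# Why per-server metrics matter: a healthy-looking average can hide
# a saturated hot server. Endpoints match the audit; the rest is illustrative.
utilization = [23, 31, 38, 67, 81, 94]  # percent, per server

mean = sum(utilization) / len(utilization)
spread = max(utilization) - min(utilization)

print(f"average {mean:.0f}%, spread {spread}%")
# An average near 56% hides a 94% hot server; alert on max and spread too.
```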
The platform now saves €240/month while handling traffic spikes smoothly. Sometimes the best optimization is removing the previous "optimization."
Originally published on binadit.com