binadit

Posted on • Originally published at binadit.com

How session affinity increased response times by 240% at a fintech platform

When sticky sessions killed our payment platform performance

Ever wonder how a "performance optimization" can make your system 240% slower? Let me tell you about a European fintech platform that learned this lesson the hard way.

The problem: uneven load distribution

This payment processor handled 50,000+ daily transactions across 12 EU markets. Their setup looked reasonable: 6 application servers behind a load balancer with session affinity enabled. The theory seemed sound: keep each user on the same server so their in-memory session data stays local.

Reality hit during peak hours (8-10 AM). While some users breezed through transactions, others waited forever. The culprit? Their "optimization" was creating bottlenecks.

What the data revealed

When we audited their infrastructure, the numbers were shocking:

  • Server utilization: ranged from 23% to 94% across the cluster
  • Traffic distribution: 3 servers handling 67% of all requests
  • Memory usage: 3.2GB on hot servers vs 1.1GB on idle ones
  • Response times: P99 times exceeded 8 seconds

The root cause was IP hash-based routing combined with customers from shared corporate networks. Session data lived in server memory, creating hot spots that couldn't be redistributed.
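To see why shared corporate networks defeat IP-hash routing, here's a small Python sketch. The server count matches the setup above, but the IPs, request counts, and hash choice are illustrative assumptions, not the platform's actual traffic:

```python
import hashlib
from collections import Counter

SERVERS = 6

def ip_hash(ip: str) -> int:
    """Route a client IP to a server index, as IP-hash balancing does."""
    return int(hashlib.md5(ip.encode()).hexdigest(), 16) % SERVERS

# Many corporate users egress through a handful of shared NAT IPs,
# so entire offices hash to the same backend.
corporate_nats = ["203.0.113.10", "203.0.113.11", "198.51.100.7"]
requests = corporate_nats * 1000 + [f"192.0.2.{i}" for i in range(100)]

load = Counter(ip_hash(ip) for ip in requests)
print(load)  # a few servers absorb all the shared-NAT traffic
```

With 3,000 of 3,100 requests coming from just three source IPs, at most three servers do almost all the work, no matter how many sit behind the balancer.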

The solution: go stateless

Instead of fixing sticky sessions, we eliminated them entirely. Here's how:

1. External session storage with Redis

redis-server --port 7000 --cluster-enabled yes \
  --cluster-config-file nodes-7000.conf \
  --appendonly yes

Session structure optimized for speed:

{
  "user_id": 12345,
  "auth_token": "...",
  "last_activity": 1640995200,
  "fraud_score": 0.23,
  "recent_transactions": [...]
}
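The pattern on the application side looks roughly like this. This is a minimal sketch: the `SessionStore` class below is an in-memory stand-in mimicking Redis's SETEX/GET semantics so the example is self-contained, and the key name is assumed. In production the same two calls would go to a Redis client instead:

```python
import json
import time

class SessionStore:
    """In-memory stand-in for the Redis session store (SETEX/GET semantics)."""
    def __init__(self):
        self._data = {}

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        # Store the value with an absolute expiry time, like Redis SETEX.
        self._data[key] = (time.time() + ttl_seconds, value)

    def get(self, key: str):
        # Return the value if present and unexpired, else None.
        entry = self._data.get(key)
        if entry is None or time.time() > entry[0]:
            self._data.pop(key, None)
            return None
        return entry[1]

store = SessionStore()
session = {
    "user_id": 12345,
    "auth_token": "...",
    "last_activity": 1640995200,
    "fraud_score": 0.23,
}
# Any of the six app servers can now read this session by key.
store.setex("session:12345", 3600, json.dumps(session))
print(json.loads(store.get("session:12345"))["fraud_score"])
```

Because the session lives behind a key rather than in one server's memory, the load balancer is free to send the next request anywhere.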

2. True load balancing

Replaced IP hash with least connections in Nginx:

upstream payment_backend {
  least_conn;
  server app1.internal:8080 max_fails=3 fail_timeout=30s;
  server app2.internal:8080 max_fails=3 fail_timeout=30s;
  server app3.internal:8080 max_fails=3 fail_timeout=30s;
  # ... remaining servers
}
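The `least_conn` policy is simple enough to sketch in a few lines of Python: each new request goes to whichever backend currently has the fewest in-flight connections. The counters here are illustrative, not nginx's internal implementation:

```python
# Track in-flight connections per backend, mirroring the upstream block above.
backends = {
    "app1.internal:8080": 0,
    "app2.internal:8080": 0,
    "app3.internal:8080": 0,
}

def pick_backend() -> str:
    """Least-connections policy: choose the backend with the fewest open connections."""
    return min(backends, key=backends.get)

def handle_request() -> str:
    server = pick_backend()
    backends[server] += 1  # connection opened; decrement on completion in real life
    return server

first = handle_request()
second = handle_request()
print(first, second)  # consecutive requests spread across different servers
```

Unlike IP hash, this policy reacts to actual load: a server bogged down with slow requests keeps its connection count high and stops receiving new traffic until it drains.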

3. Stateless application design

Minimized session dependencies by caching user preferences in Redis with a 1-hour TTL instead of keeping them in server memory for the entire session.

The results

Performance improvements were immediate:

  • P50 response times: 420ms → 280ms (33% faster)
  • P95 response times: 3.4s → 1.0s (71% faster)
  • P99 response times: 8s+ → 1.8s (78% faster)
  • Server utilization: Now balanced at 45-52% across all servers
  • Customer complaints: Down 89%

Key takeaways for your architecture

  1. Session affinity hides problems until they become critical
  2. External session storage is worth the added complexity
  3. Monitor per-server metrics, not just averages
  4. Gradual migration reduces risk (we switched everything at once, which worked out but was riskier than necessary)

The platform now saves €240/month while handling traffic spikes smoothly. Sometimes the best optimization is removing the previous "optimization."

