Load balancers are the silent infrastructure. You don't think about them until they start dropping connections at 2 AM. Here are the settings that have bitten me in production.
Connection timeouts
The default idle timeout on an AWS ALB is 60 seconds. Your app might have requests that legitimately take 90 seconds. Result: the ALB gives up on the connection mid-request, your user sees a 504, and your app logs show nothing wrong because the app was still processing.
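Rule: set the LB idle timeout higher than your longest legitimate request. For most APIs, 120-300 seconds. Then check it against your app's own timeouts: the app's keep-alive timeout should be longer than the LB idle timeout, otherwise the app closes connections the LB still considers open and you get intermittent 502s.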
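The rule above can be sketched as a small sanity check. This is a minimal illustration, not a real AWS API: the `TimeoutConfig` class and its field names are hypothetical, and the keep-alive check assumes the common guidance that the app's keep-alive timeout should exceed the LB idle timeout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimeoutConfig:
    # Hypothetical names for illustration; all values are in seconds.
    lb_idle_timeout: float
    longest_request: float
    app_keepalive_timeout: float


def timeout_problems(cfg: TimeoutConfig) -> List[str]:
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    # The LB must wait longer than the slowest legitimate request.
    if cfg.lb_idle_timeout <= cfg.longest_request:
        problems.append("LB idle timeout does not exceed the longest legitimate request")
    # Assumption: the app should hold idle connections longer than the LB does.
    if cfg.app_keepalive_timeout <= cfg.lb_idle_timeout:
        problems.append("app keep-alive timeout does not exceed the LB idle timeout")
    return problems
```

Calling `timeout_problems(TimeoutConfig(60, 90, 75))` flags the scenario described above (60-second idle timeout, 90-second request), while `TimeoutConfig(120, 90, 125)` comes back clean.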
Health check intervals
The default health check interval is usually 30 seconds, with 2 consecutive failures required before an instance is marked unhealthy. That's up to 60 seconds of traffic sent to a dying instance before it's removed.
Rule: 10-second intervals, 2 failures. 20 seconds to remove a bad instance is much better. Yes, slightly more load on your service. Worth it.
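The arithmetic behind those numbers is just interval times failure threshold. A rough sketch, assuming the instance starts failing right after a passing check and ignoring the per-check timeout (the function name is made up for illustration):

```python
def seconds_until_removed(interval_s: float, unhealthy_threshold: int) -> float:
    """Approximate upper bound on how long a failing instance keeps receiving traffic.

    Assumes the failure begins just after a passing check and ignores the
    per-check timeout, so the real figure can be slightly higher.
    """
    return interval_s * unhealthy_threshold
```

With the usual defaults, `seconds_until_removed(30, 2)` gives 60; with the tuned values, `seconds_until_removed(10, 2)` gives 20.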
Unhealthy threshold
Marking an instance unhealthy after 2 consecutive failures is usually right. Marking it healthy again after only 2 consecutive successes is usually wrong: it lets half-broken instances come back prematurely and start flapping in and out of rotation.
Rule: require 3-5 consecutive successes to mark healthy. Slower recovery, more stable routing.
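To see why asymmetric thresholds stabilize routing, here is a minimal simulation of the state machine. It is a sketch of the general idea, not how any particular load balancer implements it; `HealthTracker` and its parameters are hypothetical names.

```python
class HealthTracker:
    """Tracks health with separate thresholds for going down and coming back."""

    def __init__(self, unhealthy_threshold: int = 2, healthy_threshold: int = 3,
                 healthy: bool = True) -> None:
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = healthy
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Feed one health check result and return the resulting state."""
        if check_passed == self.healthy:
            # Result agrees with the current state, so any streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy
```

A half-broken instance that alternates between passing and failing never builds the streak of successes it needs, so it stays out of rotation until it is genuinely stable.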
Slow start
When a new instance comes online, it's often cold — empty caches, no warmed connections. If the LB sends it full traffic immediately, it performs badly and might get marked unhealthy.
Rule: enable slow start (AWS calls it 'slow start mode' on ALB). Ramp traffic over 30-60 seconds. Worth it for any service that needs warming.
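The ramp itself is easy to picture. A sketch assuming a linear ramp from zero to a full share over the slow start window (the function name is illustrative, and real load balancers may shape the ramp differently):

```python
def ramp_fraction(elapsed_s: float, duration_s: float) -> float:
    """Fraction of its full traffic share a new instance receives during slow start.

    Assumes a linear ramp; a non-positive duration means slow start is off.
    """
    if duration_s <= 0:
        return 1.0
    # Clamp to [0, 1] so values before start or after the window stay sane.
    return max(0.0, min(1.0, elapsed_s / duration_s))
```

With a 60-second window, an instance that has been up for 30 seconds gets half of its eventual share, which gives caches and connection pools time to fill before full load arrives.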
Sticky sessions
Sticky sessions feel like a solution. They're usually a problem.
Rule: avoid them. If you need them, your app has shared state that should be in a database or cache, not in-process. The one exception: WebSocket connections, where stickiness is unavoidable.
Cross-zone load balancing
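Disabled by default on some LBs. With cross-zone disabled, each load balancer node only routes to instances in its own AZ, so an instance in AZ-A only serves traffic that arrived in AZ-A. Incoming traffic is typically split evenly across AZs, so if AZ-A has fewer instances, each of them carries a larger share of the load than the instances in other AZs.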
Rule: enable it unless you have a specific reason not to. Costs a small amount of cross-AZ traffic but gives you even load distribution.
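The imbalance is easy to quantify. A small sketch, assuming incoming traffic is split evenly across AZs before it reaches the instances (the function and the AZ names are made up for illustration):

```python
from typing import Dict


def per_instance_share(instances_per_az: Dict[str, int], cross_zone: bool) -> Dict[str, float]:
    """Fraction of total traffic each instance in a given AZ receives.

    Assumes traffic is divided evenly across AZs that have instances.
    """
    active = {az: n for az, n in instances_per_az.items() if n > 0}
    total = sum(active.values())
    if cross_zone:
        # Every instance is in one shared pool.
        return {az: 1.0 / total for az in active}
    # Each AZ gets an equal slice, divided among only its own instances.
    az_slice = 1.0 / len(active)
    return {az: az_slice / n for az, n in active.items()}
```

With 2 instances in one AZ and 8 in another, each instance in the small AZ handles 25% of total traffic with cross-zone off, against 6.25% for each instance in the large AZ. With cross-zone on, every instance handles 10%.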
The meta-lesson
Load balancer defaults are designed for 'will work for everybody, badly.' Any given workload needs tuning.
Spend one afternoon reviewing your LB configs. Ask: 'is this default right for my app?' At least half the defaults will be wrong. Tune them. Future you will thank you during the next incident.
Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com