binadit

Posted on May 22 • Originally published at binadit.com

Benchmarking API reliability under load: when zero downtime migration becomes critical


When APIs break: load testing reveals the truth about infrastructure limits

Here's a reality check: most teams discover their API's breaking point when users are already hitting errors, not during careful testing. By then, you're fighting fires instead of preventing them.

We decided to get real data. How much concurrent load can different infrastructure setups actually handle before things fall apart? The results surprised us.

The experiment: same API, different infrastructure

We built a straightforward e-commerce API with three endpoints:

GET /products (product browsing)
POST /auth/login (authentication)
POST /orders (order placement)

Test stack:

Runtime: Node.js 18.17.0 + Express 4.18.2
Database: PostgreSQL 15.3 (2GB RAM)
Hardware: 4 cores, 8GB RAM, NVMe storage
Cache: Redis 7.0.11
Load testing: Artillery.io

We tested four infrastructure patterns:

Single server: everything on one machine
Database separation: dedicated DB server
Load balanced: 2 app servers + shared database + Redis cluster
Auto-scaling: 2-6 servers with horizontal scaling

The load profile ramped from 10 to 2,000 concurrent users over 40 minutes, mimicking real e-commerce traffic patterns.
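Artillery's load phases are usually expressed as arrival rates (new virtual users per second), not as a concurrent-user count, so a target like "2,000 concurrent users" has to be translated. Little's law (L = λW) gives the conversion: concurrent users equal the arrival rate times the average session length. The sketch below is only an illustration. The article doesn't publish its phase configuration, so the linear ramp and the fixed 30-second session are both hypothetical.

```javascript
// Sketch: translate a concurrent-user target into an arrival rate using
// Little's law (L = lambda * W). Assumes a linear ramp and a fixed session
// length; both are hypothetical, not the article's actual test config.

// New virtual users per second needed to sustain `concurrentUsers` when each
// session lasts `sessionSeconds` on average.
function arrivalRateFor(concurrentUsers, sessionSeconds) {
  if (sessionSeconds <= 0) {
    throw new RangeError('sessionSeconds must be positive');
  }
  return concurrentUsers / sessionSeconds;
}

// Target concurrency at a given minute of a linear 10 -> 2,000 user ramp
// over 40 minutes; times outside the ramp are clamped to its end points.
function concurrencyAt(minute, { from = 10, to = 2000, minutes = 40 } = {}) {
  const progress = Math.min(Math.max(minute / minutes, 0), 1);
  return from + (to - from) * progress;
}

// Assumed 30-second browse -> login -> order session (hypothetical).
for (const minute of [0, 10, 20, 30, 40]) {
  const users = concurrencyAt(minute);
  const rate = arrivalRateFor(users, 30);
  console.log(`minute ${minute}: ${Math.round(users)} users, ~${rate.toFixed(1)} arrivals/s`);
}
```

At the 2,000-user peak, a 30-second session works out to roughly 67 new users per second. Halve the session length and the arrival rate doubles for the same concurrency.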

Results: reliability doesn't decline gracefully

Here's what we found:

Single server configuration

Concurrent Users	P50 Response	P95 Response	Error Rate
100	178ms	456ms	1.2%
250	456ms	1,234ms	8.7%
500	1,234ms	4,567ms	23.4%

Breaking point: 500 concurrent users

Load balanced configuration

Concurrent Users	P50 Response	P95 Response	Error Rate
500	234ms	678ms	1.1%
1,000	456ms	1,234ms	5.7%
1,500	890ms	2,456ms	15.3%

Breaking point: 1,500 concurrent users

Auto-scaling configuration

Concurrent Users	P50 Response	P95 Response	Error Rate	Active Servers
1,000	189ms	457ms	0.8%	4
1,500	234ms	567ms	2.1%	5
2,000	289ms	678ms	3.9%	6

Breaking point: not reached at 2,000 concurrent users, the peak of the test
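The results don't state the rule used to call a breaking point. One plausible reading, offered here as an assumption, is "the first tested load level whose error rate exceeds 10%". The sketch below encodes that rule using the error rates from the three tables above, and it reproduces the stated breaking points.

```javascript
// Hypothetical criterion: the breaking point is the first load level whose
// error rate exceeds a chosen threshold. The 10% default is an assumption,
// not a rule stated in the results.
function breakingPoint(rows, maxErrorRate = 0.10) {
  const sorted = [...rows].sort((a, b) => a.users - b.users);
  const broken = sorted.find((row) => row.errorRate > maxErrorRate);
  return broken ? broken.users : null; // null: no breaking point in the tested range
}

// Error rates copied from the result tables (as fractions).
const singleServer = [
  { users: 100, errorRate: 0.012 },
  { users: 250, errorRate: 0.087 },
  { users: 500, errorRate: 0.234 },
];
const loadBalanced = [
  { users: 500, errorRate: 0.011 },
  { users: 1000, errorRate: 0.057 },
  { users: 1500, errorRate: 0.153 },
];
const autoScaling = [
  { users: 1000, errorRate: 0.008 },
  { users: 1500, errorRate: 0.021 },
  { users: 2000, errorRate: 0.039 },
];

console.log(breakingPoint(singleServer)); // 500
console.log(breakingPoint(loadBalanced)); // 1500
console.log(breakingPoint(autoScaling)); // null
```

The threshold matters. With a stricter 5% limit, the same data puts the breaking points at 250 users for the single server and 1,000 for the load-balanced setup. That is why it helps to write the criterion down before running the test.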

The database bottleneck pattern

In every configuration, the database connection pool became the limiting factor, and it did so while CPU utilization was still below 60%.

Why? PostgreSQL's default connection settings aren't tuned for high concurrency. Adding application servers doesn't add database capacity: each server opens its own pool, and every pool competes for the same limited set of database connections.

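// Typical connection pool config that fails under load (node-postgres)
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // node-postgres defaults to 10; 20 can still be too low for high concurrency
  idleTimeoutMillis: 30000, // close clients that sit idle for 30 s
  // connectionTimeoutMillis is left unset, so it defaults to 0 (no timeout):
  // once all 20 clients are busy, new requests queue and wait indefinitely
});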
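Two back-of-envelope checks make this bottleneck concrete. First, every app server keeps its own pool, so the database sees servers × pool size clients. PostgreSQL's default max_connections is 100, and 3 of those slots are reserved for superusers by default. Second, by Little's law a pool can't complete more queries per second than its size divided by the average query time. The server counts, pool sizes, and 50 ms query time in this sketch are illustrative assumptions, not figures from the benchmark.

```javascript
// Back-of-envelope database connection checks. All inputs in the examples are
// illustrative assumptions, not measurements from the benchmark.

// Each app server keeps its own pool, so the database sees servers x poolMax clients.
function totalPoolConnections(appServers, poolMax) {
  return appServers * poolMax;
}

// PostgreSQL defaults: max_connections = 100, superuser_reserved_connections = 3.
function fitsConnectionLimit(appServers, poolMax, maxConnections = 100, reserved = 3) {
  return totalPoolConnections(appServers, poolMax) <= maxConnections - reserved;
}

// Little's law: N connections, each busy for avgQueryMs per query, can finish
// at most N * 1000 / avgQueryMs queries per second, however idle the CPU is.
function poolThroughputCeiling(poolMax, avgQueryMs) {
  return (poolMax * 1000) / avgQueryMs;
}

console.log(fitsConnectionLimit(2, 20)); // true  (40 <= 97)
console.log(fitsConnectionLimit(6, 20)); // false (120 > 97)
console.log(poolThroughputCeiling(20, 50)); // 400 queries/s per pool
```

With the pool above (max: 20), two app servers need 40 connections, but six auto-scaled servers would need 120, which is more than a default PostgreSQL 15 instance accepts. At 50 ms per query, each 20-connection pool also tops out at 400 queries per second, no matter how much CPU headroom is left.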

Key insights for production systems

Cliff-edge failures are real. Systems work fine until they completely don't. There's usually a narrow band between "acceptable performance" and "total failure."
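Basic queueing theory shows why the cliff appears. In the textbook M/M/1 model (random arrivals, a single server), mean response time is S / (1 - ρ), where S is the service time and ρ is utilization. This model is a simplification of a real API stack, and the 10 ms service time is an assumption. The overall shape still holds: mean response time is 20 ms at 50% utilization, 100 ms at 90%, and 1,000 ms at 99%.

```javascript
// Textbook M/M/1 queue (random arrivals, exponential service times, one
// server): mean time in system W = S / (1 - rho), where S is the mean service
// time and rho = arrival rate * S is utilization. A simplification, not a
// model of the benchmarked stack.
function mm1ResponseTimeMs(serviceMs, utilization) {
  if (utilization >= 1) return Infinity; // arrivals outpace service; the queue grows without bound
  return serviceMs / (1 - utilization);
}

// Assumed 10 ms service time (hypothetical).
for (const rho of [0.5, 0.8, 0.9, 0.95, 0.99]) {
  const ms = mm1ResponseTimeMs(10, rho);
  console.log(`${Math.round(rho * 100)}% utilization: ${ms.toFixed(0)} ms mean response time`);
}
```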

Database scaling matters more than app scaling. Adding application servers won't help if they can't get database connections. Plan for connection pooling, read replicas, and database optimization from day one.
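Read replicas only help if the application actually sends reads to them. Below is a minimal sketch of one way to do that. ReadWriteRouter is a hypothetical helper that works with any objects exposing a query() method (node-postgres Pool instances fit). Writes go to the primary, and reads rotate across replicas. Replication lag is the main caveat: a replica may not yet see a write that just committed on the primary, so read-after-write paths such as "show the order I just placed" should stay on the primary.

```javascript
// Hypothetical read/write splitter (not part of node-postgres): writes go to
// the primary, reads rotate round-robin across replicas. Pools are any
// objects with a query(text, params) method.
class ReadWriteRouter {
  constructor(primary, replicas = []) {
    this.primary = primary;
    this.replicas = replicas;
    this.next = 0; // index of the replica that serves the next read
  }

  poolFor(kind) {
    if (kind === 'write' || this.replicas.length === 0) return this.primary;
    const replica = this.replicas[this.next % this.replicas.length];
    this.next += 1;
    return replica;
  }

  query(kind, text, params) {
    return this.poolFor(kind).query(text, params);
  }
}

// Usage with stand-in pools (real code would pass pg Pool instances).
const fake = (name) => ({ name, query: async (text) => `${name}: ${text}` });
const router = new ReadWriteRouter(fake('primary'), [fake('replica-1'), fake('replica-2')]);
console.log(router.poolFor('read').name); // replica-1
console.log(router.poolFor('read').name); // replica-2
console.log(router.poolFor('write').name); // primary
```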

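Infrastructure changes under load are dangerous. The gap between a single server and a distributed setup is large: at 500 concurrent users, the single server's P95 was 4,567 ms with a 23.4% error rate, while the load-balanced setup's P95 was 678 ms with a 1.1% error rate. Plan your migration strategy during quiet periods, not during outages.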
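One common way to avoid a big-bang cutover is to shift traffic gradually and roll back automatically if the new setup misbehaves. The sketch below covers only the control logic. The step sizes (5%, 25%, 50%, 100%) and the 1% error-rate limit are hypothetical values. In practice, the routing itself would usually live in a load balancer or proxy rather than in application code.

```javascript
// Hypothetical control logic for a gradual cutover from old to new
// infrastructure. Step sizes and the error-rate limit are assumptions.

// Route one request: a `newShare` fraction goes to the new setup.
function chooseBackend(newShare, random = Math.random) {
  return random() < newShare ? 'new' : 'old';
}

// Advance to the next traffic step while the new setup stays healthy;
// roll all traffic back to the old setup if its error rate exceeds the limit.
function nextShare(currentShare, newErrorRate, { steps = [0.05, 0.25, 0.5, 1], maxErrorRate = 0.01 } = {}) {
  if (newErrorRate > maxErrorRate) return 0;
  const next = steps.find((step) => step > currentShare);
  return next === undefined ? 1 : next;
}

// Example: four healthy observation windows move all traffic to the new setup.
let share = 0;
for (const observedErrorRate of [0.002, 0.004, 0.003, 0.001]) {
  share = nextShare(share, observedErrorRate);
}
console.log(share); // 1
```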

Testing limitations

Our setup was simplified compared to production systems:

Used default PostgreSQL settings (real systems are usually optimized)
Synthetic load patterns (real traffic is more unpredictable)
Single region testing (global users add complexity)
Basic CRUD operations (real apps have more complex logic)

Your mileage will vary, but we expect the overall patterns to hold.

Bottom line

Know your infrastructure limits before your users do. Load testing during development is cheaper than debugging during peak traffic.

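The numbers show that architectural decisions have large performance implications: in our tests, the single server broke at 500 concurrent users, while the auto-scaling setup was still serving 2,000 with a 3.9% error rate. A single server might handle your current load fine, but what about Black Friday?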

Plan your zero downtime migration strategy when you don't need it yet. Future you will thank present you.
