When APIs break: load testing reveals the truth about infrastructure limits
Here's a reality check: most teams discover their API's breaking point when users are already hitting errors, not during careful testing. By then, you're fighting fires instead of preventing them.
We decided to get real data. How much concurrent load can different infrastructure setups actually handle before things fall apart? The results surprised us.
The experiment: same API, different infrastructure
We built a straightforward e-commerce API with three endpoints:
- GET /products (product browsing)
- POST /auth/login (authentication)
- POST /orders (order placement)
Test stack:
- Runtime: Node.js 18.17.0 + Express 4.18.2
- Database: PostgreSQL 15.3 (2GB RAM)
- Hardware: 4 cores, 8GB RAM, NVMe storage
- Cache: Redis 7.0.11
- Load testing: Artillery.io
We tested four infrastructure patterns:
- Single server: everything on one machine
- Database separation: dedicated DB server
- Load balanced: 2 app servers + shared database + Redis cluster
- Auto-scaling: 2-6 servers with horizontal scaling
The load profile ramped from 10 to 2,000 concurrent users over 40 minutes, mimicking real e-commerce traffic patterns.
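To make the ramp concrete, here is a minimal sketch of the load profile as a function of time. It assumes the ramp was linear, which the write-up does not state; only the endpoints (10 and 2,000 users over 40 minutes) come from the test. The names `RAMP`, `targetUsersAt`, and `minuteForUsers` are illustrative.

```javascript
// Assumption: a linear ramp. Only the start, end, and duration come from the article.
const RAMP = { startUsers: 10, endUsers: 2000, durationMinutes: 40 };

// Target concurrent users at a given minute of the test (clamped to the ramp window).
function targetUsersAt(minute, ramp = RAMP) {
  const t = Math.min(Math.max(minute, 0), ramp.durationMinutes);
  const fraction = t / ramp.durationMinutes;
  return Math.round(ramp.startUsers + fraction * (ramp.endUsers - ramp.startUsers));
}

// Minute at which the ramp first reaches a given number of concurrent users.
function minuteForUsers(users, ramp = RAMP) {
  const fraction = (users - ramp.startUsers) / (ramp.endUsers - ramp.startUsers);
  return fraction * ramp.durationMinutes;
}

console.log(targetUsersAt(20)); // 1005
console.log(minuteForUsers(500).toFixed(1)); // 9.8
```

Under that assumption the load grows by roughly 50 users per minute, so a setup that fails at 500 concurrent users would hit its limit about ten minutes into the run.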
Results: reliability doesn't decline gracefully
Here's what we found:
Single server configuration
| Concurrent Users | P50 Response | P95 Response | Error Rate |
|---|---|---|---|
| 100 | 178ms | 456ms | 1.2% |
| 250 | 456ms | 1,234ms | 8.7% |
| 500 | 1,234ms | 4,567ms | 23.4% |
Breaking point: 500 concurrent users
Load balanced configuration
| Concurrent Users | P50 Response | P95 Response | Error Rate |
|---|---|---|---|
| 500 | 234ms | 678ms | 1.1% |
| 1,000 | 456ms | 1,234ms | 5.7% |
| 1,500 | 890ms | 2,456ms | 15.3% |
Breaking point: 1,500 concurrent users
Auto-scaling configuration
| Concurrent Users | P50 Response | P95 Response | Error Rate | Active Servers |
|---|---|---|---|---|
| 1,000 | 189ms | 457ms | 0.8% | 4 |
| 1,500 | 234ms | 567ms | 2.1% | 5 |
| 2,000 | 289ms | 678ms | 3.9% | 6 |
Breaking point: not reached within the tested range (2,000 concurrent users at a 3.9% error rate)
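One way to read "breaking point" off these tables is to pick an error-rate threshold and report the first load level that exceeds it. The sketch below does that with the measured values. The 10% threshold is an assumption chosen for illustration (the article does not state the criterion it used), and `findBreakingPoint` is a hypothetical helper.

```javascript
// Measurements copied from the tables above: [concurrent users, error rate in %].
const results = {
  singleServer: [[100, 1.2], [250, 8.7], [500, 23.4]],
  loadBalanced: [[500, 1.1], [1000, 5.7], [1500, 15.3]],
  autoScaling: [[1000, 0.8], [1500, 2.1], [2000, 3.9]],
};

// Assumption: 10% errors counts as "broken". The article does not give a threshold.
const MAX_ERROR_RATE_PERCENT = 10;

// Returns the first tested load level whose error rate exceeds the threshold,
// or null if no tested level does.
function findBreakingPoint(rows, maxErrorRate = MAX_ERROR_RATE_PERCENT) {
  const firstFailing = rows.find(([, errorRate]) => errorRate > maxErrorRate);
  return firstFailing ? firstFailing[0] : null;
}

for (const [name, rows] of Object.entries(results)) {
  console.log(name, findBreakingPoint(rows));
}
// singleServer 500
// loadBalanced 1500
// autoScaling null
```

With this threshold the helper reproduces the breaking points listed for the single-server and load-balanced setups, and returns `null` for auto-scaling because no tested level crossed 10%.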
The database bottleneck pattern
In every configuration, the database connection pool became the limiting factor. This happened before CPU hit 60% utilization.
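Why? PostgreSQL's default connection settings are not tuned for high concurrency: max_connections defaults to 100, and every connection is served by its own backend process. Adding application servers doesn't help, because all of them compete for the same limited set of database connections.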
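// Typical connection pool config that fails under load (assuming node-postgres)
const { Pool } = require('pg');
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // too low for high concurrency (node-postgres defaults to 10)
  idleTimeoutMillis: 30000, // close clients that sit idle for 30 seconds
});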
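The arithmetic behind that competition can be sketched as a connection budget. This is an illustration and not the configuration used in the test: it assumes PostgreSQL's stock defaults (`max_connections = 100`, `superuser_reserved_connections = 3`) and that every app server runs one pool. The helper names and the 2-second `connectionTimeoutMillis` value are assumptions.

```javascript
// PostgreSQL stock defaults: max_connections = 100, superuser_reserved_connections = 3.
const PG_DEFAULTS = { maxConnections: 100, reservedConnections: 3 };

// Connections the whole app tier can try to open if every pool fills up.
function totalPoolDemand(appServers, poolMax) {
  return appServers * poolMax;
}

// Largest per-server pool size that keeps the fleet under the database limit.
function poolMaxPerServer(appServers, db = PG_DEFAULTS) {
  const usable = db.maxConnections - db.reservedConnections;
  return Math.floor(usable / appServers);
}

// Hypothetical node-postgres pool options derived from that budget.
function poolOptions(appServers) {
  return {
    max: poolMaxPerServer(appServers),
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 2000, // assumption: fail fast rather than queue indefinitely
  };
}

console.log(totalPoolDemand(6, 20)); // 120
console.log(poolMaxPerServer(6)); // 16
```

Six servers with `max: 20` each could request 120 connections against a default limit of 100, so scaling out the app tier can make contention worse. Past a certain fleet size, a connection pooler such as PgBouncer in front of the database is a common alternative to shrinking each pool further.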
Key insights for production systems
Cliff-edge failures are real. Systems work fine until they completely don't. There's usually a narrow band between "acceptable performance" and "total failure."
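A rough way to see the cliff in the single-server numbers is to compare how fast errors grow against how fast load grows at each step. The sketch below uses only the values from the single-server table; `stepGrowth` is an illustrative helper, and three data points are too few to treat the factors as more than indicative.

```javascript
// Single-server rows from the results table: concurrent users and error rate in %.
const singleServer = [
  { users: 100, errorRate: 1.2 },
  { users: 250, errorRate: 8.7 },
  { users: 500, errorRate: 23.4 },
];

// For each step between measurements, compare load growth with error growth.
function stepGrowth(rows) {
  return rows.slice(1).map((row, i) => {
    const prev = rows[i];
    return {
      from: prev.users,
      to: row.users,
      loadFactor: row.users / prev.users,
      errorFactor: row.errorRate / prev.errorRate,
    };
  });
}

for (const step of stepGrowth(singleServer)) {
  console.log(
    `${step.from} -> ${step.to} users: ` +
      `load x${step.loadFactor.toFixed(2)}, errors x${step.errorFactor.toFixed(2)}`
  );
}
// 100 -> 250 users: load x2.50, errors x7.25
// 250 -> 500 users: load x2.00, errors x2.69
```

In both steps the error rate grows faster than the load, which is the non-linear behavior the cliff-edge description refers to.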
Database scaling matters more than app scaling. Adding application servers won't help if they can't get database connections. Plan for connection pooling, read replicas, and database optimization from day one.
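As a minimal sketch of the read-replica idea, the router below sends plain `SELECT` statements to a replica and everything else to the primary. It is deliberately naive and not how the test setup worked: it assumes no `SELECT` calls a function that writes, it ignores replication lag, and the `products` and `orders` table names are hypothetical. `primary` and `replica` can be any objects with a `query(sql, params)` method, such as two node-postgres `Pool` instances.

```javascript
// Naive classifier: plain SELECT statements count as reads.
// Assumption: SELECT ... FOR UPDATE must go to the primary, and no SELECT has write side effects.
function isReadQuery(sql) {
  return /^\s*select\b/i.test(sql) && !/\bfor\s+update\b/i.test(sql);
}

// Returns a query function that routes reads to the replica and all other statements to the primary.
function createRouter(primary, replica) {
  return function query(sql, params = []) {
    const target = isReadQuery(sql) ? replica : primary;
    return target.query(sql, params);
  };
}

// Stand-in pools so the sketch runs without a database.
const fakePool = (name) => ({ query: async (sql) => `${name}: ${sql}` });
const query = createRouter(fakePool('primary'), fakePool('replica'));

query('SELECT * FROM products').then(console.log);
// replica: SELECT * FROM products
query('INSERT INTO orders (id) VALUES ($1)', [1]).then(console.log);
// primary: INSERT INTO orders (id) VALUES ($1)
```

Routing by SQL text keeps the example short. In a real codebase it is usually safer to choose the pool explicitly at the call site, since reads that must see a just-committed write still belong on the primary.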
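Infrastructure changes under load are dangerous. The gap between the single-server and distributed setups is large (breaking points of 500 versus 1,500 and 2,000+ concurrent users), so moving from one to the other is a major architectural change. Plan your migration strategy during quiet periods, not during outages.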
Testing limitations
Our setup was simplified compared to production systems:
- Used default PostgreSQL settings (real systems are usually optimized)
- Synthetic load patterns (real traffic is more unpredictable)
- Single region testing (global users add complexity)
- Basic CRUD operations (real apps have more complex logic)
Your mileage will vary, but the patterns remain consistent.
Bottom line
Know your infrastructure limits before your users do. Load testing during development is cheaper than debugging during peak traffic.
The numbers show that architectural decisions have massive performance implications. A single server might handle your current load fine, but what about Black Friday?
Plan your zero downtime migration strategy when you don't need it yet. Future you will thank present you.
Originally published on binadit.com