The Mystery Error
You're running a Node.js Lambda function with MySQL2. Everything works great in testing, but production logs show intermittent failures:
Error: Connection lost: The server closed the connection.
  code: "PROTOCOL_CONNECTION_LOST"
  fatal: true
No pattern. No warning. Just random failures that make you question your life choices.
Understanding the Real Problem: How Lambda Actually Works
Lambda doesn't spin up a fresh container for every request. AWS keeps containers "warm" for reuse:
Request 1 → Lambda runs → Response
↓
[FREEZE] ← Container paused (not terminated)
↓
(5-15 min pass)
↓
[THAW] ← Container resumed
↓
Request 2 → Lambda runs → 💥 PROTOCOL_CONNECTION_LOST
During freeze, your Lambda is literally paused. The JavaScript event loop stops. Timers stop. Everything stops.
But here's the catch: the outside world doesn't stop.
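To make this concrete, here's a minimal sketch of the typical setup that hits this problem (the env-var names are illustrative): the pool lives at module scope, so a warm container reuses it, sockets and all.

```javascript
// Module scope: created once per container, reused by every warm invocation.
const mysql = require("mysql2/promise");

const pool = mysql.createPool({
  host: process.env.DB_HOST, // illustrative env vars
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
});

exports.handler = async (event) => {
  // On a warm invocation, this pool (and its TCP sockets) may have just
  // spent 5-30 minutes frozen along with the container.
  const [rows] = await pool.query("SELECT NOW() AS now");
  return rows[0];
};
```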
What Happens to Your Database Connection
1. Lambda creates a MySQL connection pool
2. Connections sit idle in the pool
3. Lambda freezes (container paused)
4. Real-world time passes (5-30 minutes)
5. Network timeouts occur, NAT gateways clear state, RDS Proxy cleans up
6. The TCP socket dies, but your pool doesn't know
7. Lambda thaws and tries to use the dead connection
8. 💥 PROTOCOL_CONNECTION_LOST
Why idleTimeout Doesn't Help
You might think: "I'll set idleTimeout: 60000 to clean up idle connections!"
Here's why it doesn't work:
Timer starts (60s countdown)
↓
Lambda FREEZES at 1s elapsed
↓
████████████████████████████████
█ 15 minutes pass in REAL WORLD █
█ Timer is PAUSED at 1s █
████████████████████████████████
↓
Lambda THAWS - timer resumes at 1s
↓
Connection still in pool (timer thinks 2s passed)
↓
Connection is DEAD but pool doesn't know
The timer doesn't run during freeze. Your 60-second timeout is useless against a 15-minute freeze.
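For illustration, this is the kind of configuration that looks like it should help but can't; a sketch with illustrative values, not recommendations:

```javascript
const pool = mysql.createPool({
  ...config,
  idleTimeout: 60000, // 60s countdown, but the timer only ticks while the event loop runs
  maxIdle: 2,
});
// A 15-minute freeze looks like roughly one second to this timer, so the
// dead socket is still sitting in the pool when the next request arrives.
```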
The Solution: Detect and Retry
Since we can't prevent stale connections, we detect them and retry transparently.
Step 1: Enable TCP Keep-Alive
const mysql = require("mysql2/promise");

const pool = mysql.createPool({
  ...config,
  enableKeepAlive: true,        // send TCP keep-alive probes on idle sockets
  keepAliveInitialDelay: 10000, // wait 10s of idleness before the first probe
});
Keep-alive probes help dead sockets surface clear error codes (ECONNRESET, PROTOCOL_CONNECTION_LOST) instead of hanging indefinitely.
Step 2: Implement Retry Logic
async executeQuery(sql, params) {
  const maxRetries = 1;
  let lastError = null;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    let connection = null;
    try {
      connection = await this.getConnectionFromPool();
      const result = await this.executeQueryWithConnection(connection, sql, params);
      connection.release();
      return result;
    } catch (error) {
      // Non-recoverable error - throw immediately
      if (!this.isConnectionLostError(error)) {
        if (connection) connection.release();
        throw error;
      }

      // Connection lost - destroy stale connection
      if (connection) connection.destroy();

      // Retry if attempts left
      lastError = error;
      if (attempt < maxRetries) {
        console.warn("Connection lost, retrying...", { attempt: attempt + 1 });
        continue;
      }
    }
  }

  throw lastError;
}

isConnectionLostError(error) {
  const recoverableCodes = [
    "PROTOCOL_CONNECTION_LOST", // Server closed connection
    "ECONNRESET",               // TCP reset
    "EPIPE",                    // Broken pipe
    "ETIMEDOUT",                // Connection timeout
    "ECONNREFUSED",             // Connection refused
  ];
  return recoverableCodes.includes(error?.code);
}
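`getConnectionFromPool()` and `executeQueryWithConnection()` aren't shown above; here's a minimal sketch of what they could look like, assuming the class keeps a mysql2/promise pool on `this.pool` (the structure is an assumption, not the exact original):

```javascript
async getConnectionFromPool() {
  // Create the pool lazily so a cold start pays the cost only once.
  if (!this.pool) {
    this.pool = mysql.createPool({
      ...this.config,
      enableKeepAlive: true,
      keepAliveInitialDelay: 10000,
    });
  }
  return this.pool.getConnection();
}

async executeQueryWithConnection(connection, sql, params) {
  // mysql2/promise: execute() runs a prepared statement and resolves to [rows, fields]
  const [rows] = await connection.execute(sql, params);
  return rows;
}
```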
Key Points:
- `connection.destroy()` removes the stale connection from the pool (don't reuse it!)
- `connection.release()` returns a healthy connection to the pool
- One retry is usually enough: the second attempt gets a fresh connection
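Wired into a handler, usage looks something like this (the `Database` wrapper name is made up; it just needs to expose the `executeQuery()` above):

```javascript
const db = new Database(); // module scope: reused across warm invocations

exports.handler = async (event) => {
  // A stale connection on the first attempt is destroyed and retried
  // inside executeQuery(), so the caller never sees it.
  const users = await db.executeQuery(
    "SELECT id, name FROM users WHERE id = ?",
    [event.userId]
  );
  return { statusCode: 200, body: JSON.stringify(users) };
};
```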
What About Pool Settings?
Do I Need to Tune connectionLimit, maxIdle, etc.?
Short answer: Not really.
| Setting | Helps with freeze/thaw? | Why? |
|---|---|---|
| `idleTimeout` | No | Timer is paused during freeze |
| `maxIdle` | Marginally | Fewer connections = fewer stale ones, but adds reconnection overhead |
| `connectionLimit` | No | Doesn't affect stale connections |
| Retry logic | Yes | Handles stale connections at runtime |
If you're using RDS Proxy, it handles connection pooling at the infrastructure level. Keep your Lambda pool settings simple and let the retry logic do the heavy lifting.
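As a rough example of "keep it simple" (illustrative values, not tuned recommendations):

```javascript
const pool = mysql.createPool({
  ...config,
  connectionLimit: 2,  // a single invocation rarely needs more
  maxIdle: 1,          // fewer idle sockets means fewer stale ones
  enableKeepAlive: true,
  keepAliveInitialDelay: 10000,
  // No aggressive idleTimeout: it can't fire during a freeze anyway.
});
```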
Using RDS Proxy?
RDS Proxy's "Idle client connection timeout" (default: 30 minutes) is separate from MySQL's wait_timeout. The proxy manages Lambda→Proxy connections independently.
But even with RDS Proxy, connections can die due to:
- NAT gateway timeouts (typically 5-15 minutes for idle TCP)
- Network state table cleanup
- Proxy internal connection recycling
The retry logic is still your safety net.
The Final Architecture
┌─────────────────────────────────────────────────────┐
│ Your Lambda │
├─────────────────────────────────────────────────────┤
│ │
│ executeQuery() │
│ ↓ │
│ for (attempt = 0; attempt <= 1; attempt++) │
│ ↓ │
│ getConnection() → Try query │
│ ↓ │
│ Success? → return result │
│ ↓ │
│ Connection lost? → destroy() → retry │
│ ↓ │
│ Other error? → release() → throw │
│ │
└─────────────────────────────────────────────────────┘
↓
[RDS Proxy] (optional)
↓
[MySQL/Aurora]
Summary
| Problem | Solution |
|---|---|
| Lambda freezes, connections go stale | Retry logic detects and recovers |
| Pool doesn't know connections are dead | `enableKeepAlive` for faster detection |
| `idleTimeout` doesn't work during freeze | Accept it, rely on retry instead |
| Random `PROTOCOL_CONNECTION_LOST` | Transparent retry = users don't notice |
The key insight: You can't prevent stale connections in a serverless environment. But you can detect them instantly and retry transparently.