---
title: "HikariCP Tuning That Survived Our 10x Traffic Spike"
published: true
description: "Fix HikariCP pool exhaustion, tune PgBouncer transaction mode, and apply the correct pool sizing formula for mobile backends under heavy PostgreSQL load."
tags: postgresql, kotlin, performance, architecture
canonical_url: https://blog.mvpfactory.co/hikaricp-tuning-that-survived-our-10x-traffic-spike
---
## What You Will Learn
By the end of this tutorial, you will know how to configure HikariCP so your mobile backend survives a sudden traffic spike without dying at the database connection layer. We will walk through the exact pool settings, PgBouncer transaction mode workarounds, and the sizing formula that took our KMP backend from crashing at 50K concurrent users to handling them cleanly. Real numbers included.
## Prerequisites
- A Kotlin (or JVM) backend using HikariCP for connection pooling
- PostgreSQL as your database
- Basic familiarity with JDBC connection strings
- Optional: PgBouncer in your deployment topology
## Step 1: Fix the maxLifetime Race Condition
Here is the gotcha that will save you hours. Most teams set HikariCP's `maxLifetime` to match PostgreSQL's `idle_in_transaction_session_timeout`. This creates a race — HikariCP hands out a connection milliseconds before PostgreSQL kills it server-side.
The fix: set `maxLifetime` at least 30 seconds shorter than any server-side timeout.
```yaml
# HikariCP config (CORRECT)
maximumPoolSize: 10
maxLifetime: 570000       # 9.5 min — PostgreSQL timeout is 10 min
connectionTimeout: 5000   # fail fast, don't let mobile clients stare at spinners
idleTimeout: 300000
```
That `connectionTimeout` at 5 seconds instead of the default 30 means your app fails fast and can retry or degrade gracefully. During long pool waits, I'd stand up from my desk — which, incidentally, is how I got into using [HealthyDesk](https://play.google.com/store/apps/details?id=com.healthydesk) for break reminders during those long debugging sessions.
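To keep that 30-second margin honest across environments, I prefer deriving `maxLifetime` from the server-side timeout rather than hardcoding it. A minimal sketch — `safeMaxLifetimeMs` is my own helper name, not a HikariCP API:

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.minutes
import kotlin.time.Duration.Companion.seconds

// Derive a safe maxLifetime: stay at least `margin` below the shortest
// server-side timeout, so HikariCP retires connections before PostgreSQL
// can kill them out from under a request.
fun safeMaxLifetimeMs(
    serverTimeouts: List<Duration>,
    margin: Duration = 30.seconds
): Long {
    val shortest = serverTimeouts.min()
    require(shortest > margin) { "Server timeout too short for a $margin safety margin" }
    return (shortest - margin).inWholeMilliseconds
}

fun main() {
    // e.g. PostgreSQL idle_in_transaction_session_timeout = 10 min
    println(safeMaxLifetimeMs(listOf(10.minutes)))  // 570000 — matches the YAML above
}
```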
## Step 2: Handle PgBouncer Transaction Mode
PgBouncer in transaction mode gives you roughly 4x connection throughput. But there is a catch the docs do not mention prominently: PostgreSQL prepared statements are session-scoped. In transaction mode, your next query can land on a different server connection where your prepared statement does not exist.
The result: `prepared statement "S_1" does not exist`.
Let me show you the pattern I use in every project that runs PgBouncer in transaction mode:
```kotlin
HikariConfig().apply {
    jdbcUrl = "jdbc:postgresql://pgbouncer:6432/mydb" +
        "?prepareThreshold=0" +
        "&preparedStatementCacheQueries=0"
    maximumPoolSize = poolSize
    maxLifetime = 570_000
    connectionTimeout = 5_000
}
```
Disabling prepared statement caching at the JDBC driver level eliminates the phantom errors entirely.
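If you build JDBC URLs in more than one place, it helps to centralize those two parameters. A small sketch — `pgbouncerSafe` is a hypothetical helper of mine, not part of the pgjdbc driver:

```kotlin
// Append the two pgjdbc parameters that disable server-side prepared
// statements (prepareThreshold=0) and driver-side statement caching
// (preparedStatementCacheQueries=0), preserving any existing params.
fun pgbouncerSafe(jdbcUrl: String): String {
    val sep = if ('?' in jdbcUrl) '&' else '?'
    return "$jdbcUrl${sep}prepareThreshold=0&preparedStatementCacheQueries=0"
}

fun main() {
    println(pgbouncerSafe("jdbc:postgresql://pgbouncer:6432/mydb"))
    // jdbc:postgresql://pgbouncer:6432/mydb?prepareThreshold=0&preparedStatementCacheQueries=0
}
```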
## Step 3: Right-Size Your Pool
The well-known formula from the PostgreSQL wiki:

```
connections = (core_count * 2) + effective_spindle_count
```

Most teams read "core count" as their database server cores. Wrong. This is per app instance, and on SSDs the effective spindle count is zero:

```
connections = (4 cores * 2) + 0 = 8–10 per app instance
```
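The formula is trivial to encode, which makes it easy to log at startup alongside your actual pool config. A sketch — `poolSize` is my name for it, not a HikariCP setting:

```kotlin
// Pool sizing per the PostgreSQL wiki formula. Pass the *app instance*
// core count, not the database server's. spindleCount is effectively 0 on SSDs.
fun poolSize(appInstanceCores: Int, spindleCount: Int = 0): Int =
    (appInstanceCores * 2) + spindleCount

fun main() {
    println(poolSize(appInstanceCores = 4))  // 8
}
```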
Here is what that change looked like in practice across our three app nodes:
| Config | Pool per Instance | Total DB Connections | p99 Latency | Errors at 50K Users |
|---|---|---|---|---|
| Before | 40 | 120 | 2,400 ms | 12% timeout |
| After | 10 | 30 | 85 ms | 0.01% |
Fewer connections, dramatically better performance. In our tests, PostgreSQL spent more time context switching than executing queries once we pushed past roughly 50 total connections.
## Step 4: Add Connection Leak Detection
Here is a Ktor interceptor that flags any connection held longer than 5 seconds:
```kotlin
// Key your data layer sets when it checks a connection out of the pool
val connAcquiredAt = AttributeKey<Long>("connAcquiredAt")

install(createApplicationPlugin("LeakDetector") {
    onCallRespond { call, _ ->
        val acquiredAt = call.attributes.getOrNull(connAcquiredAt)
        // Flag anything held longer than 5 seconds (5e9 ns)
        if (acquiredAt != null && (System.nanoTime() - acquiredAt) > 5_000_000_000L) {
            call.application.log.warn("Possible connection leak: ${call.request.uri}")
        }
    }
})
```
This caught two N+1 queries and a missing `connection.close()` in a rarely-hit endpoint — exactly the kind of bugs that sleep until traffic spikes wake them up.
## Gotchas
- **maxLifetime = server timeout** is a race condition, not a match. Always go 30+ seconds shorter.
- **PgBouncer transaction mode breaks prepared statements silently.** You will only see errors under load, which makes debugging painful. Disable caching upfront.
- **Pool sizing formula uses app instance cores, not database cores.** A 64-core DB server does not mean you need 128 connections. You probably need 8–10 per node.
- **connectionTimeout at 30 seconds is too generous.** Your mobile clients have already left. Use 5 seconds, fail fast, and handle retries at the application layer.
- **Connection leaks hide in low-traffic endpoints.** Add detection early, not after your first spike.
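The fail-fast-plus-retry advice above can be sketched as a small generic helper. This is an illustrative pattern of my own, not a HikariCP or Ktor API, and a real version would catch only transient exceptions such as `SQLTransientConnectionException`:

```kotlin
// Hypothetical retry helper: with connectionTimeout at 5 s, pool exhaustion
// surfaces quickly, and the application layer retries with linear backoff
// instead of leaving mobile clients staring at spinners.
fun <T> withRetries(attempts: Int = 3, backoffMs: Long = 200, block: () -> T): T {
    var lastError: Exception? = null
    repeat(attempts) { attempt ->
        try {
            return block()
        } catch (e: Exception) {
            lastError = e
            Thread.sleep(backoffMs * (attempt + 1))  // back off a bit more each attempt
        }
    }
    throw lastError ?: IllegalStateException("withRetries called with attempts <= 0")
}
```

Usage is a one-liner around the query: `withRetries { dataSource.connection.use { /* query */ } }`. Past the final attempt the original exception propagates, so your error handling still sees the real cause.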
## Conclusion
The connection pool is the narrowest pipe in your mobile backend. We spent two weeks tuning application code that turned out not to matter. The pool config took an afternoon and fixed everything.
Start with `maxLifetime` below your server-side timeouts, disable prepared statement caching when using PgBouncer transaction mode, and size your pool per app instance — not per database. These three changes took our p99 from 2,400 ms to 85 ms under 10x load.
**Further reading:**
- [HikariCP Configuration Guide](https://github.com/brettwooldridge/HikariCP#frequently-used)
- [PostgreSQL Wiki: Number of Connections](https://wiki.postgresql.org/wiki/Number_Of_Database_Connections)
- [PgBouncer Feature Matrix](https://www.pgbouncer.org/features.html)