Demystifying the AWS Advanced JDBC Driver: Pools, Plugins, and the Traps I Hit
Date: 2026-04-29
Status: Published
TL;DR
The AWS Advanced JDBC Driver wraps your database driver with a plugin chain that handles failover, read/write splitting, and connection monitoring. The critical gotcha: it can create internal connection pools separate from your application's HikariCP. If you're on v2.x with the F0 profile, you're hitting a hardcoded 30-connection ceiling regardless of your external pool config. The fix: upgrade to v3.x and use connectionPoolType=hikari with cp-MaximumPoolSize properties, or drop profiles entirely and configure plugins manually.
Key invariant: cp-MaximumPoolSize >= external maximumPoolSize to avoid the internal pool becoming your bottleneck.
Quick wins:
- Check your driver version: v3.3.0+ recommended
- If using F0 profile on v2.x, upgrade immediately
- Set `exception-override-class-name: software.amazon.jdbc.util.HikariCPSQLException`
- Keep `socketTimeout=0` and let `efm2` handle liveness detection
- Mark read-only transactions with `@Transactional(readOnly=true)` to benefit from read/write splitting
Why I'm writing this
I spent a few hours chasing a performance regression that had no business existing. The service had a HikariCP pool configured for 50 connections per pod. I'd checked the Spring Boot YAML. The property names were right. The values were right. The configuration was loading at startup — I'd watched Hikari log it.
And yet, under load, the pool count plateaued at exactly 30. Not 50. Not 45. Thirty. Every time. Across every pod. Tomcat threads piled up behind a 10-second wait, connection creation time sat at 10,000 ms, and our p99 latency went vertical.
The answer, when I found it, was about two layers below where I'd been looking — inside a hardcoded lambda in a specific version of the AWS JDBC driver. I'd been tuning the wrong pool.
This post is what I wish I'd had at the start of that investigation. If you're running Spring Boot against Aurora PostgreSQL or MySQL through software.amazon.jdbc.Driver, there are a handful of things about how this driver actually works that aren't obvious from the README. Get them wrong and you get slow requests, or failed failovers, or both. Let me save you the trouble.
What the AWS Advanced JDBC Driver actually is
The docs call it a "wrapper," and that's literal — it's a thin java.sql.Driver that sits between your app and the underlying org.postgresql.Driver (or MySQL equivalent). Your URL ends up looking like this:
jdbc:aws-wrapper:postgresql://<endpoint>:5432/<db>?wrapperProfileName=F0
Everything after jdbc:aws-wrapper: is a conventional JDBC URL the wrapper passes down. What the wrapper adds is a plugin chain:
```
your application
  -> HikariCP (external, app-managed)
    -> aws-advanced-jdbc-wrapper
      -> [plugin 1] -> [plugin 2] -> ... -> [terminal plugin]
        -> org.postgresql.Driver
          -> Aurora instance (writer or reader)
```
Each plugin intercepts JDBC calls — getConnection, prepareStatement, execute — and can rewrite, retry, monitor, or split them. The plugins are why you're using this driver in the first place. They're what give you fast failover, read/write splitting, and enhanced failure monitoring. Everything else about driver configuration exists to serve the plugin chain.
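To make the chain concrete, here's a minimal plain-JDBC sketch: no Spring, no external pool, just the wrapper and its plugins. The endpoint, database, and credentials are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class WrapperSmokeTest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "app_user");         // placeholder credentials
        props.setProperty("password", "app_password");
        // The plugin chain, outside-in; each plugin intercepts the JDBC calls below.
        props.setProperty("wrapperPlugins",
                "readWriteSplitting,auroraConnectionTracker,failover,efm2");

        // Everything after jdbc:aws-wrapper: is a conventional PostgreSQL URL.
        String url = "jdbc:aws-wrapper:postgresql://my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com:5432/mydb";

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1")) {
            rs.next();
            System.out.println("connected through the wrapper: " + rs.getInt(1));
        }
    }
}
```

The wrapper registers `software.amazon.jdbc.Driver` with `DriverManager` through the standard service-loader mechanism, so no explicit `Class.forName` is needed on a modern JVM.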
Configuration profiles: convenience with teeth
The driver ships with named configuration profiles — presets that bundle a plugin list and a set of timeouts. The best-known is F0, which you turn on with wrapperProfileName=F0. F0 bundles "fast failover" — the recommended plugin set for Aurora.
Profiles are handy because they let an app team ship one URL parameter instead of a dozen properties. They're also the single biggest source of "how is this even possible?" incidents I've seen, because a profile can silently set properties you can't override from outside.
The F0 gotcha: a few hours I won't get back
Before v3.1.0, the F0 profile eagerly constructed a second, internal HikariCP pool — separate from your application's — with properties baked into a lambda at profile-load time. I didn't find this in the docs. I found it by decompiling the JAR:
```java
// From DriverConfigurationProfiles.class in aws-advanced-jdbc-wrapper-2.6.8.jar
// (I verified this via bytecode decompilation after running out of other theories)
config.setMaximumPoolSize(30);                      // HARD CEILING
config.setConnectionTimeout(SECONDS.toMillis(10));  // 10-second wait on exhaustion
```
There is no property you can set to override these. The external pool config is ignored by the internal pool. The cp- property prefix (I'll get to it below) doesn't exist in v2.6.8 at all — the string "cp-" literally doesn't appear anywhere in the JAR.
Here's what was actually happening in the service at runtime:
- App borrowed a logical connection from the external HikariCP (configured max = 50).
- External HikariCP asked the wrapper for a physical connection.
- The wrapper routed through its internal HikariCP (hardcoded max = 30).
- Under load, the internal pool saturated at 30. Attempts 31–50 waited up to 10 seconds and then failed.
- From my dashboards: external `hikaricp.connections` capped at 30, `connections.pending` climbed to about 170, and `connections.creation.avg` sat at 10,000 ms.
From the outside, this looks like a pool-sizing bug. I lost a few hours to it before the pieces clicked. The fix is a driver version bump.
v3.x: cp- properties and connectionPoolType=hikari
In v3.1.0 the driver added (PR #1658) a new URL parameter (documented under the read/write splitting plugin's internal connection pooling section):
?connectionPoolType=hikari
When that's set, the internal pool is built via HikariPooledConnectionProvider's no-arg constructor, which reads properties prefixed with cp- and forwards them to the internal Hikari config:
```yaml
data-source-properties:
  cp-MaximumPoolSize: "50"
  cp-MinimumIdle: "5"
  cp-ConnectionTimeout: "30000"
```
The catch I hit next: cp- properties are silently ignored when wrapperProfileName=F0 is also active. The F0 preset supplies its own HikariPoolConfigurator lambda that takes precedence and still hardcodes maxPoolSize=30. F0 and cp-MaximumPoolSize cannot coexist. Pick one.
For Aurora with read/write splitting and proper pool sizing on v3.x, I dropped the profile and assembled the plugin list by hand:
```yaml
spring:
  datasource:
    url: jdbc:aws-wrapper:postgresql://${database_endpoint}:5432/${db}?connectionPoolType=hikari&readerHostSelectorStrategy=roundRobin
    driver-class-name: software.amazon.jdbc.Driver
    hikari:
      connection-timeout: 60000
      maximum-pool-size: 50
      minimum-idle: 10
      exception-override-class-name: software.amazon.jdbc.util.HikariCPSQLException
      data-source-properties:
        wrapperPlugins: readWriteSplitting,auroraConnectionTracker,failover,efm2
        cp-MaximumPoolSize: "50"
        cp-MinimumIdle: "5"
        cp-ConnectionTimeout: "30000"
        connectTimeout: "10000"
        loginTimeout: "10000"
        socketTimeout: "0"
        failureDetectionTime: "60000"
        failureDetectionCount: "5"
        failureDetectionInterval: "15000"
        monitoring-connectTimeout: "10000"
        monitoring-socketTimeout: "5000"
        monitoring-loginTimeout: "10000"
```
This replaces what F0 was giving me (the plugin set and timeouts) while keeping cp-* effective.
When to use presets vs manual configuration
This is a gap in the official docs — there's no guidance on when presets are the right choice vs when you should go manual. Having dug through the source code and the preset definitions, here's how I think about it.
The preset families:
| Family | Pool type | Presets | What they're for |
|---|---|---|---|
| A / B / C | No pool | A0, A1, A2, B, C0, C1 | Failover + monitoring only. No internal connection pooling. You bring your own (external) pool or don't pool at all. |
| D / E / F | Internal pool | D0, D1, E, F0, F1 | Failover + monitoring + internal HikariCP pool (managed by the wrapper). F0 is the most commonly referenced. |
| G / H / I | External pool | G0, G1, H, I0, I1 | Designed for apps that manage their own pool externally. The wrapper does not create internal pools. |
| SF_ prefix | (matches base) | SF_D0, SF_D1, SF_E, SF_F0, SF_F1 | Spring Framework variants — same as their base preset but with readWriteSplitting disabled (Spring handles routing via separate DataSource beans). |
The number suffix indicates failure-detection sensitivity: 0 = normal, 1 = easy/less sensitive (or aggressive, depending on the family), 2 = aggressive.
The problem with pool presets (D/E/F families): every preset that creates an internal pool hardcodes the same HikariCP values in a lambda with no override mechanism:
| Property | Hardcoded value | Overridable via `cp-*`? |
|---|---|---|
| `maxPoolSize` | 30 | No — preset lambda takes precedence |
| `connectionTimeout` | 10 seconds | No |
| `minimumIdle` | 2 | No |
| `idleTimeout` | 15 minutes | No |
| `keepaliveTime` | 3 minutes | No |
| `validationTimeout` | 1 second | No |
| `maxLifetime` | 1 day | No |
| `initializationFailTimeout` | -1 | No |
This applies to D0, D1, E, F0, F1 and their SF_ variants — all of them hardcode maxPoolSize=30. The cp-* properties (like cp-MaximumPoolSize) are silently ignored when any of these presets are active, because the preset's HikariPoolConfigurator lambda overrides the HikariPooledConnectionProvider's property-reading path.
When to use a preset:
- You're prototyping, running a small service, or don't have specific pool-sizing requirements.
- `maxPoolSize=30` and `connectionTimeout=10s` are acceptable for your workload.
- You want a known-good plugin + timeout combination without thinking about individual settings.
- You're using a no-pool preset (A/B/C family) and bringing your own external pool — these have no hardcoded pool values to collide with.
When to go manual (drop the preset):
- You need to control `maxPoolSize`, `connectionTimeout`, or any other pool property — which is most production deployments. This is what I had to do.
- You're running at non-trivial throughput where 30 connections per internal pool is a ceiling (this was my exact situation).
- You want `cp-*` properties to actually take effect.
- You're combining `readWriteSplitting` with `@Transactional(readOnly=true)` in Spring and need internal pools with custom sizing.
The manual approach means specifying connectionPoolType=hikari + wrapperPlugins=... + cp-* properties explicitly, instead of wrapperProfileName=F0. You lose the convenience of a single preset name, but you gain control over every property. For reference, the Configuration Presets docs list what each preset bundles, so you can replicate the plugin list and timeouts manually while overriding only the pool properties you need.
External pooling vs internal pooling — what each layer is actually doing
This is the part most people skim past: the two layers are not redundant. They do different jobs.
External pool (my application's HikariCP, managed by Spring Boot)
- Scope: one pool per Spring `DataSource` bean, typically one per pod.
- Holds: logical connections — the `java.sql.Connection` objects my code calls `prepareStatement` on.
- Gates: how many threads can hold a connection concurrently. If this is 50, request #51 waits or times out.
- Maps to: how many Tomcat threads can simultaneously sit inside a DB-touching request.
Internal pool (managed by the wrapper, one per Aurora instance)
- Scope: with `readWriteSplitting` + `connectionPoolType=hikari`, one internal pool per Aurora instance — a writer pool, and one pool per reader. The wrapper routes logical connections to the right instance based on read-only hints (`setReadOnly(true)` or `@Transactional(readOnly=true)` in Spring).
- Holds: physical connections — TCP/TLS sessions to a specific Aurora node.
- Gates: how many physical sockets stay open to each instance.
- Maps to: Aurora's per-instance `max_connections`. The default formula is `LEAST({DBInstanceClassMemory/9531392}, 5000)`; for a memory-rich instance like `db.r7i.4xlarge` (128 GiB ≈ 137,438,953,472 bytes, so the memory term works out to roughly 14,400) the `LEAST` clamps at the 5,000 hard cap rather than scaling further.
Why both are needed — and the official caveat
The external pool's logical connections are cheap — Java objects wrapping references into the internal pool. The internal pool's physical connections are expensive — TLS handshake, auth, wire protocol. The wrapper hands out a single logical connection from the external pool while keeping the physical session pinned to the correct instance (writer for writes, reader-N for reads).
Without the internal pool layer, every getConnection() from the external pool would open a fresh physical connection to some instance. That undoes HikariCP's entire point.
Important caveat from the AWS docs: the ReadWriteSplitting plugin documentation explicitly states:
"Using internal and external pools at the same time has not been tested and may result in problematic behaviour."
The docs go further and recommend disabling external connection pools entirely when using internal pooling:
"If you want to use the driver's internal connection pooling, we recommend that you explicitly disable external connection pools (provided by Spring). You need to check the
spring.datasource.typeproperty to ensure that any external connection pooling is disabled."
Here's the thing that's easy to miss: if your Spring Boot app has spring.datasource.hikari.* properties and connectionPoolType=hikari in the JDBC URL, you're running double pools whether you intended to or not. connectionPoolType=hikari only controls the wrapper's internal pool — it doesn't replace or disable the external one. Spring Boot independently auto-detects HikariCP on the classpath and creates the external HikariDataSource bean. Unless you explicitly set spring.datasource.type=org.springframework.jdbc.datasource.SimpleDriverDataSource, both pools are active. This is almost certainly the configuration most Spring Boot teams end up with.
In practice, I've run both pools together under sustained load without issues — but that's my workload, not a guarantee. The double-pool architecture works when you treat the external pool as a concurrency gate and the internal pools as physical-session caches, and keep cp-MaximumPoolSize >= maximumPoolSize so the internal layer never becomes the bottleneck. But if you're hitting edge cases — connections leaking, intermittent stale-connection errors after failover, or pool metrics that don't add up — this officially-untested interaction is the first thing to suspect.
So how do you actually disable the external pool?
This is the part I want to make crystal clear, because it's easy to think you've solved double-pooling when you haven't.
Why you're probably running double pools right now: Spring Boot auto-detects HikariCP on your classpath (it's pulled in by spring-boot-starter-data-jpa or spring-boot-starter-jdbc) and creates a HikariDataSource bean automatically. Setting connectionPoolType=hikari in the wrapper URL does not turn this off — that only tells the wrapper to create its own internal pools. These are two independent systems that don't know about each other.
If your application.yaml looks like this, you have two pools:
```yaml
# THIS IS DOUBLE-POOLING — both pools are active
spring:
  datasource:
    url: jdbc:aws-wrapper:postgresql://...?connectionPoolType=hikari&readerHostSelectorStrategy=roundRobin
    driver-class-name: software.amazon.jdbc.Driver
    hikari:                        # ← Spring Boot sees this and creates external HikariCP
      maximum-pool-size: 50
      minimum-idle: 10
      data-source-properties:
        cp-MaximumPoolSize: "50"   # ← wrapper sees this and creates internal HikariCP
        cp-MinimumIdle: "5"
```
To run single-pool (internal only), set spring.datasource.type to a non-pooling DataSource implementation. This tells Spring Boot to skip HikariCP auto-detection. The catch: without the hikari: section, there's no data-source-properties: block to put your cp-* and wrapper properties in. You have two options.
Option A — pass everything as URL parameters. Reliable but the URL gets long:
```yaml
# SINGLE-POOL (internal only) — cp-* and plugin config in the URL
spring:
  datasource:
    type: org.springframework.jdbc.datasource.SimpleDriverDataSource  # ← disables external HikariCP
    url: jdbc:aws-wrapper:postgresql://${database_endpoint}:5432/${database_name}?connectionPoolType=hikari&readerHostSelectorStrategy=roundRobin&wrapperPlugins=readWriteSplitting,auroraConnectionTracker,failover,efm2&cp-MaximumPoolSize=50&cp-MinimumIdle=5&cp-ConnectionTimeout=30000&connectTimeout=10000&loginTimeout=10000&socketTimeout=0&failureDetectionTime=60000&failureDetectionCount=5&failureDetectionInterval=15000&monitoring-connectTimeout=10000&monitoring-socketTimeout=5000&monitoring-loginTimeout=10000
    driver-class-name: software.amazon.jdbc.Driver
    # No hikari: section — Spring won't create an external pool
```

(Note the URL has to stay on one physical line: YAML's folded scalars join continuation lines with spaces, which would break the JDBC URL.)
Option B — use the wrapper's own DataSource class. The wrapper provides AwsWrapperDataSource which accepts properties directly, keeping the YAML clean:
```yaml
# SINGLE-POOL (internal only) — using AwsWrapperDataSource
spring:
  datasource:
    type: software.amazon.jdbc.ds.AwsWrapperDataSource
    url: jdbc:postgresql://${database_endpoint}:5432/${database_name}  # ← note: no aws-wrapper: prefix
    driver-class-name: org.postgresql.Driver  # ← the underlying driver, not the wrapper
    connection-properties:
      wrapperPlugins: readWriteSplitting,auroraConnectionTracker,failover,efm2
      connectionPoolType: hikari
      readerHostSelectorStrategy: roundRobin
      cp-MaximumPoolSize: "50"
      cp-MinimumIdle: "5"
      cp-ConnectionTimeout: "30000"
      connectTimeout: "10000"
      loginTimeout: "10000"
      socketTimeout: "0"
      failureDetectionTime: "60000"
      failureDetectionCount: "5"
      failureDetectionInterval: "15000"
      monitoring-connectTimeout: "10000"
      monitoring-socketTimeout: "5000"
      monitoring-loginTimeout: "10000"
```
Note the differences with AwsWrapperDataSource: the URL drops the jdbc:aws-wrapper: prefix (it's a plain jdbc:postgresql: URL since the wrapper IS the DataSource), and driver-class-name points to the underlying driver, not the wrapper. See the DataSource configuration docs for details.
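If you'd rather wire Option B programmatically than through Spring's property binding, a sketch based on the DataSource configuration docs looks roughly like this; treat the setter names as assumptions to verify against the docs for your driver version:

```java
import java.sql.Connection;
import java.util.Properties;
import software.amazon.jdbc.ds.AwsWrapperDataSource;

public class WrapperDataSourceExample {
    public static void main(String[] args) throws Exception {
        // Properties forwarded to the wrapper and the underlying driver.
        Properties props = new Properties();
        props.setProperty("wrapperPlugins",
                "readWriteSplitting,auroraConnectionTracker,failover,efm2");
        props.setProperty("connectionPoolType", "hikari");
        props.setProperty("cp-MaximumPoolSize", "50");

        AwsWrapperDataSource ds = new AwsWrapperDataSource();
        ds.setJdbcProtocol("jdbc:postgresql:"); // plain protocol; the wrapper IS the DataSource
        ds.setServerName("my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com"); // placeholder
        ds.setDatabase("mydb");
        ds.setTargetDataSourceClassName("org.postgresql.ds.PGSimpleDataSource");
        ds.setTargetDataSourceProperties(props);

        try (Connection conn = ds.getConnection("app_user", "app_password")) {
            System.out.println("connected, readOnly=" + conn.isReadOnly());
        }
    }
}
```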
To run single-pool (external only), remove connectionPoolType=hikari from the URL. The wrapper won't create internal pools, and every getConnection() from the external HikariCP opens a physical connection through the wrapper on-demand:
```yaml
# SINGLE-POOL — only the external HikariCP is active
spring:
  datasource:
    url: jdbc:aws-wrapper:postgresql://...?readerHostSelectorStrategy=roundRobin
    driver-class-name: software.amazon.jdbc.Driver
    hikari:
      maximum-pool-size: 50
      minimum-idle: 10
      # No cp-* properties needed — no internal pool exists
```
Trade-offs at a glance
| Configuration | External pool | Internal pool | What you get | What you lose |
|---|---|---|---|---|
| Double pool (most Spring Boot apps) | Spring HikariCP (`hikari:` section) | Wrapper HikariCP (`connectionPoolType=hikari` + `cp-*`) | Full Spring metrics, health checks, familiar config surface. Physical connections cached per Aurora instance. | Running an officially-untested combination. Two pools to reason about. Higher DB connection count than expected. |
| Internal only via `SimpleDriverDataSource` (`spring.datasource.type=SimpleDriverDataSource`) | Disabled | Wrapper HikariCP | The configuration AWS actually tests against. Clean single-pool model. | No `hikaricp.*` Micrometer metrics from Spring. No HikariCP health indicator in `/actuator/health`. `cp-*` properties must go in the URL — gets unwieldy with many parameters. |
| Internal only via `AwsWrapperDataSource` (`spring.datasource.type=software.amazon.jdbc.ds.AwsWrapperDataSource`) | Disabled | Wrapper HikariCP | AWS-tested single-pool model. Clean YAML via `connection-properties` block — no URL stuffing. | Same observability trade-offs as `SimpleDriverDataSource` (no Spring Hikari metrics/health). Different URL format (`jdbc:postgresql:` not `jdbc:aws-wrapper:postgresql:`) and `driver-class-name` points to the underlying driver. See DataSource docs. |
| External only (no `connectionPoolType` in URL) | Spring HikariCP | None | Familiar Spring config. Full metrics. | No per-instance physical connection caching. `@Transactional(readOnly=true)` with `readWriteSplitting` triggers a full connection switch per call (see Spring Boot limitation below). |
Where I am with this
I've been experimenting with the double-pool setup and so far it's been working without problems under sustained load across multiple pods. The external pool gives you the Micrometer metrics that make diagnosing issues possible — the hikaricp.connections.pending signal is how I caught the F0 ceiling issue — and the internal pool gives you efficient physical-connection reuse across reader/writer instances. The key invariant is cp-MaximumPoolSize >= maximumPoolSize so the internal layer never becomes the bottleneck.
The one tangible downside I've observed: you use more database connections than you'd expect. The external pool holds logical connections while the internal pools independently hold physical connections per Aurora instance. In practice the connection count on Aurora ends up higher than what the external pool size alone would suggest, because the internal pools maintain their own minimum-idle and maximum-size independently. For a fleet of pods, this adds up — make sure your Aurora instance's max_connections has headroom for pods × cp-MaximumPoolSize × (1 + number_of_readers), not just pods × maximumPoolSize.
If you're hitting unexplained edge cases — connections leaking, intermittent stale-connection errors after failover, or pool metrics that don't add up — the officially-untested double-pool interaction is the first thing to suspect. Switching to internal-only (spring.datasource.type=SimpleDriverDataSource or AwsWrapperDataSource) is the cleanest way to eliminate it as a variable.
It's also worth noting that you can use the wrapper without HikariCP entirely — the internal pool with connectionPoolType=hikari is a self-contained HikariCP instance managed by the wrapper. If you're building a non-Spring app or a lightweight service, running only the internal pool is the cleaner architecture and avoids the double-pool question altogether.
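A minimal sketch of that internal-pool-only shape in a plain Java service (endpoint and pool sizes are placeholders, not recommendations):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class InternalPoolOnly {
    // One URL; the wrapper manages one internal HikariCP pool per Aurora instance.
    private static final String URL =
            "jdbc:aws-wrapper:postgresql://my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com:5432/mydb"
            + "?connectionPoolType=hikari"   // wrapper-managed internal pools
            + "&cp-MaximumPoolSize=20"       // per-instance physical ceiling (placeholder)
            + "&cp-MinimumIdle=2"
            + "&wrapperPlugins=readWriteSplitting,auroraConnectionTracker,failover,efm2";

    // Each call returns a cheap logical handle; the physical session comes
    // from the wrapper's internal pool for whichever instance it routes to.
    public Connection borrow(Properties credentials) throws Exception {
        return DriverManager.getConnection(URL, credentials);
    }
}
```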
F0 vs SF_F0: should Spring Boot apps use readWriteSplitting?
This is one of the more confusing areas in the docs, and it matters because it determines your entire read/write routing architecture.
From the source code:
| Preset | Plugins | Internal pool |
|---|---|---|
| `F0` | `auroraInitialConnectionStrategy`, `auroraConnectionTracker`, `readWriteSplitting`, `failover`, `efm2` | Yes (`maxPoolSize=30`) |
| `SF_F0` | `auroraInitialConnectionStrategy`, `auroraConnectionTracker`, `failover`, `efm2` | Yes (`maxPoolSize=30`) |
The only difference: SF_F0 drops readWriteSplitting. Both have the same internal pool. The SF_ prefix stands for "Spring Framework" — these variants are meant for Spring apps.
Why does the Spring variant disable read/write splitting?
The Spring Boot limitations section of the ReadWriteSplitting plugin docs explains:
"The use of read/write splitting with the annotation `@Transactional(readOnly=true)` is *only* recommended for configurations using an internal connection pool."
When Spring encounters @Transactional(readOnly=true), it calls conn.setReadOnly(true) before the method and conn.setReadOnly(false) after. The readWriteSplitting plugin responds by switching from writer→reader→writer on every annotated method call. Without an internal pool, each switch is a full TCP/TLS reconnect — the docs call this "substantial performance degradation." The SF_ presets sidestep this by disabling the plugin entirely and recommending two separate Spring DataSource beans instead (one for the writer cluster endpoint, one for the reader endpoint), letting Spring handle routing.
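Stripped of Spring, the toggle looks like this: a hand-rolled sketch of roughly what the transaction manager does on the wrapper connection for a read-only method.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class ReadOnlyToggle {
    // Roughly what @Transactional(readOnly=true) amounts to at the JDBC layer.
    public String findName(DataSource ds, long id) throws Exception {
        try (Connection conn = ds.getConnection()) {
            conn.setReadOnly(true);         // readWriteSplitting switches to a reader here
            conn.setAutoCommit(false);
            try (PreparedStatement ps =
                         conn.prepareStatement("SELECT name FROM users WHERE id = ?")) {
                ps.setLong(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    String name = rs.next() ? rs.getString(1) : null;
                    conn.commit();
                    return name;
                }
            } finally {
                conn.setReadOnly(false);    // back to the writer; cheap with internal pools
            }
        }
    }
}
```

With internal pools, both `setReadOnly` calls are pool lookups; without them, each one is a fresh physical connection.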
The contradiction: SF_F0 has internal pools — exactly the prerequisite the docs say makes readWriteSplitting safe. With internal pools, the setReadOnly toggle reuses cached physical connections from the per-instance pools (writer pool, reader pool), making the switch a cheap object swap rather than a TCP reconnect. So SF_F0 disables a plugin that should work fine with the internal pools it already provides.
My read: the SF_ presets were likely created before connectionPoolType=hikari made the internal-pool + readWriteSplitting combination clean and testable. The docs haven't fully reconciled this — they warn about the overhead, correctly note that internal pools mitigate it, but then the SF_ presets still disable it out of caution.
Three paths for Spring Boot read/write splitting:
| Approach | `readWriteSplitting` plugin | How reads route to readers | Trade-off |
|---|---|---|---|
| Plugin with internal pools (what we use) | Enabled | `@Transactional(readOnly=true)` triggers `setReadOnly(true)` → plugin routes to reader via cached internal pool | Single DataSource bean. Clean. Requires internal pools for acceptable switching overhead. |
| Two DataSource beans (what SF_ presets assume) | Disabled | Spring's `AbstractRoutingDataSource` or `@Qualifier` annotations route to a writer or reader DataSource at the service layer (sketch below) | No plugin overhead. More application-level wiring. Each DataSource can independently use the wrapper for failover/monitoring. |
| Plugin without internal pools (don't do this) | Enabled | `setReadOnly` triggers a full physical connection switch per call | Substantial overhead. The docs explicitly warn against this. |
If you're already on manual config with connectionPoolType=hikari and cp-* properties (which you need anyway for pool sizing), enabling readWriteSplitting works — the internal pools handle the switching cost. If you prefer the two-DataSource approach, use a no-readWriteSplitting configuration (like SF_F0's plugin list, but with manual pool sizing since the preset hardcodes maxPoolSize=30).
Either way, don't mix the two: having readWriteSplitting enabled while also routing via separate DataSources would result in double routing logic that's hard to reason about.
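For the two-DataSource path, the usual Spring wiring is an `AbstractRoutingDataSource` keyed off the transaction's read-only flag. A sketch, with hypothetical `writerDs`/`readerDs` beans (one per cluster endpoint):

```java
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;
import org.springframework.transaction.support.TransactionSynchronizationManager;

// Routes each transaction to the writer or reader DataSource.
public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {

    public ReadWriteRoutingDataSource(DataSource writerDs, DataSource readerDs) {
        Map<Object, Object> targets = new HashMap<>();
        targets.put("writer", writerDs);
        targets.put("reader", readerDs);
        setTargetDataSources(targets);
        setDefaultTargetDataSource(writerDs);
    }

    @Override
    protected Object determineCurrentLookupKey() {
        // True inside @Transactional(readOnly = true) methods.
        return TransactionSynchronizationManager.isCurrentTransactionReadOnly()
                ? "reader" : "writer";
    }
}
```

In practice you'd wrap this in a `LazyConnectionDataSourceProxy` so the physical connection, and therefore the routing decision, is deferred until the first statement, after Spring has set the read-only flag.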
HikariCP and virtual threads: a known compatibility issue
If you're running on JDK 21+ and considering Spring Boot's spring.threads.virtual.enabled=true, there is an open HikariCP bug (#2398) to be aware of. The issue is filed against HikariCP 7.0.2: the ConcurrentBag.requite() method uses a yield-spin loop (Thread.yield() 255 times for every parkNanos) that saturates all carrier threads under virtual-thread load. The result is CPU throttling at the pod level and potential liveness-probe failures — the exact kind of silent performance regression that's hard to diagnose without knowing about this issue.
As of this writing, the proposed fix in PR #2399 has not been merged. Spring Boot 3.5.7's BOM pins HikariCP 6.3.3 by default rather than 7.x, and the bug report doesn't reproduce against the 6.x line — so check your effective HikariCP version before assuming you're affected. The workaround if you are is to disable virtual threads (-Dspring.threads.virtual.enabled=false). If you're running the AWS JDBC wrapper with HikariCP as your external pool and enabling virtual threads on a 7.x version, this is the interaction to watch — it's not a wrapper bug, but it surfaces at the same layer (connection pool) and looks similar in dashboards to the internal-pool ceiling problem I described earlier.
Sizing rule
For P pods, external pool size E, and R readers in the Aurora cluster, the physical connection footprint is:
```
Writer instance: up to P * cp-MaximumPoolSize physical connections
Per reader:      up to P * cp-MaximumPoolSize physical connections
Total:           P * cp-MaximumPoolSize * (1 + R)
```
If cp-MaximumPoolSize is the bottleneck, logical getConnection() calls sit in the internal pool's wait queue — which is exactly the v2.6.8 failure mode, just on a newer version where you technically can fix it. The invariant to hold: cp-MaximumPoolSize >= external pool size so the internal layer never becomes the bottleneck. Going higher is fine as long as the total stays under Aurora's max_connections per instance with ~20% headroom.
Life of a single SELECT
When I was first onboarding someone to this, the thing that actually landed was walking through one request end-to-end:
1. Tomcat thread calls `userRepository.findById(42)`.
2. Spring Data borrows a logical connection from external HikariCP (external pool count goes up by 1).
3. Transaction manager begins a tx. Say it's `@Transactional(readOnly=true)` — the read-only hint is set on the logical connection.
4. First real statement flows through the plugin chain. `readWriteSplitting` sees the read-only flag, picks reader-1 (round-robin), and routes to reader-1's internal pool.
5. Reader-1's internal pool hands over a physical session; the wrapper binds it to the logical connection for the rest of the tx.
6. Query executes on reader-1.
7. Tx commits. Physical session returns to reader-1's internal pool; logical connection returns to external Hikari.
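The application code that drives that sequence is deliberately boring: a stock Spring Data repository behind a read-only transactional service method (entity and names are hypothetical).

```java
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Entity
class User {
    @Id
    Long id;
    String name;
}

interface UserRepository extends JpaRepository<User, Long> {}

@Service
class UserService {
    private final UserRepository userRepository;

    UserService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    // readOnly=true sets the hint on the logical connection (step 3),
    // which readWriteSplitting uses to pick a reader (step 4).
    @Transactional(readOnly = true)
    public User findUser(long id) {
        return userRepository.findById(id).orElseThrow();
    }
}
```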
The plugin catalog, and when I use which
Plugins are a comma-separated list on wrapperPlugins. Order matters. The driver applies them outside-in.
What I always run for Aurora
- `failover` — detects Aurora writer/reader failover events via topology awareness, invalidates broken connections, reroutes to the current writer. Without this, a writer failover leaves the driver holding a dead TCP session until OS-level timeouts fire (minutes). (There's also a newer `failover2` plugin worth evaluating for new deployments.)
- `auroraConnectionTracker` — maintains the map of live connections per instance. `failover` needs it to know which connections to invalidate.
- `efm2` — Enhanced Failure Monitor v2. A background thread per connection probes the socket at `failureDetectionInterval`; if `failureDetectionCount` consecutive probes fail within `failureDetectionTime`, the connection is marked bad and `failover` kicks in. v2 is current; v1 (`efm`) is deprecated and should not be used in new configs.
What I enable conditionally
- `readWriteSplitting` — routes read-only transactions to readers, writes to the writer. Enable when you have one or more readers and your code marks read transactions properly (`@Transactional(readOnly=true)`). Without the hint, the plugin sends everything to the writer and you get no benefit. I've seen more than one team enable it and then wonder why their readers sit idle.
- `iamAuth` — IAM-based auth instead of password. Enable if you're doing IAM to Aurora; otherwise skip.
- `awsSecretsManager` — pulls creds from Secrets Manager at connection time. Overlaps with external secret-rotation workflows; I enable it only if I'm not rotating through Kubernetes secrets.
- `federatedAuth` / `okta` — SSO-style auth; niche in my experience.
- `dev` / `logQueryPlansWhenNeeded` — debugging only, never prod.
My default stack for Aurora PG + HikariCP
wrapperPlugins: readWriteSplitting,auroraConnectionTracker,failover,efm2
I put readWriteSplitting first so routing happens before failover/topology logic — that way failover can reroute a connection to the "current" writer regardless of who it was bound to. efm2 is last because it's terminal: it wraps the underlying connection with monitoring.
Aurora with multiple readers: the configuration I'm shipping
This is what I'm running now against a 1 writer + 2 reader Aurora cluster. It's not the only sensible config, but I've run it in anger through a few load tests and it's the one I trust.
```yaml
url: jdbc:aws-wrapper:postgresql://${endpoint}:5432/${db}?connectionPoolType=hikari&readerHostSelectorStrategy=roundRobin
hikari:
  connection-timeout: 60000
  maximum-pool-size: 50
  minimum-idle: 10
  exception-override-class-name: software.amazon.jdbc.util.HikariCPSQLException
  data-source-properties:
    wrapperPlugins: readWriteSplitting,auroraConnectionTracker,failover,efm2
    cp-MaximumPoolSize: "50"
    cp-MinimumIdle: "5"
    cp-ConnectionTimeout: "30000"
    # I let efm2 handle liveness. TCP timeout is intentionally 0.
    connectTimeout: "10000"
    loginTimeout: "10000"
    socketTimeout: "0"
    # efm2 tuning — see "failover budget" below
    failureDetectionTime: "60000"       # grace period before monitoring starts
    failureDetectionInterval: "15000"   # 15s between probes
    failureDetectionCount: "5"          # 5 failed probes = dead
    monitoring-connectTimeout: "10000"
    monitoring-socketTimeout: "5000"
    monitoring-loginTimeout: "10000"
```
Reader host selection
readerHostSelectorStrategy controls how readWriteSplitting picks a reader:
- `roundRobin` — distributes reads evenly. My default.
- `random` — statistically even but variable in any given second.
- `leastConnections` — picks the reader with the fewest active physical connections. Worth it when readers have meaningfully different workloads, but adds a small lookup cost per acquisition.
- `fastestResponse` — picks the reader with the lowest observed response latency. Useful when readers have asymmetric hardware or load.
For a homogeneous reader fleet, roundRobin is the cleanest and cheapest. I've only ever needed leastConnections once, for an asymmetric deployment.
The exception-translation line I almost missed
exception-override-class-name: software.amazon.jdbc.util.HikariCPSQLException is easy to skip over (see the Spring Boot + HikariCP example where it's buried at the bottom of the YAML). Without it, HikariCP sees failover-triggered SQLExceptions as "normal" and tries to hand out connections the wrapper has already invalidated. Pool stays confused, latency stays bad, and the ordinary failover recovery path never fully completes. Not optional if you're on HikariCP + failover. Set it once and never think about it again.
Performance aspects
Where time actually goes
Under steady load, the wrapper's overhead breaks down into three categories:
- Plugin chain traversal — every JDBC call walks through the chain. For N plugins and M statements per transaction, you pay N×M method-dispatch overhead. On v3.x it's low single-digit microseconds — not zero, but invisible unless you're chasing the last 1% of p99. The rule I follow: don't enable plugins you aren't using.
- Physical connection creation — TLS handshake + auth + wire setup. One-time per internal pool slot; amortized, it's invisible unless the pool is cold or under-sized and the driver is creating sessions continuously.
- Monitoring traffic — `efm2` sends lightweight probes per connection. At `failureDetectionInterval=15000` the volume is tiny.
Metrics I always watch
| Metric | What it tells me |
|---|---|
| `hikaricp.connections` (total) | External pool size. Should grow to `maximumPoolSize` under load. If it plateaus below the configured max, I'm hitting the internal pool ceiling — that's exactly how I finally caught the v2.6.8 F0 issue. |
| `hikaricp.connections.active` | Currently in-use logical connections. Near the max = contention. |
| `hikaricp.connections.pending` | Threads waiting to borrow. Steady-state non-zero = bottleneck. I alert on this. |
| `hikaricp.connections.creation` (ms) | Time to acquire a physical connection through the wrapper. Single-digit ms is normal; 10,000 ms means an internal-pool wait timed out. This is the specific signal that said "the problem isn't the external pool." |
| `hikaricp.connections.timeout` | Borrow timeouts. Always zero when healthy. |
| Aurora `DatabaseConnections` | Physical conns per instance. Should roughly equal the sum over pods of active internal-pool conns to this role. Cross-reference with `cp-MaximumPoolSize`. |
| Aurora `Deadlocks`, `CommitLatency` | Independent of the driver but often regress together if pool sizing forces serialization at the app layer. |
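If you want to assert on these programmatically (in a load-test harness, say) rather than only watch dashboards, a minimal Micrometer lookup works; this assumes Spring Boot's standard HikariCP metric names:

```java
import io.micrometer.core.instrument.MeterRegistry;

public class PoolHealthCheck {
    // Non-zero steady-state pending is the "internal ceiling" smell described above.
    static double pendingBorrowers(MeterRegistry registry) {
        return registry.get("hikaricp.connections.pending").gauge().value();
    }
}
```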
My sizing calculator
For P pods, external pool size E, R readers, and Aurora `max_connections` M per instance with 20% headroom:

```
cp-MaximumPoolSize = E                                 # invariant; no internal-pool wait
Writer physical at peak     = P * cp-MaximumPoolSize
Per-reader physical at peak = P * cp-MaximumPoolSize   # round-robin balances across readers
Sanity: P * cp-MaximumPoolSize <= 0.8 * M
```
Plug in your own numbers: P * cp-MaximumPoolSize per role. Check this against the max_connections for your Aurora instance class and leave ~20% headroom for maintenance connections and other clients.
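If you prefer the arithmetic executable rather than mental, here's a trivial helper; the numbers in `main` are illustrative, not recommendations:

```java
public class PoolSizing {
    // Physical-connection footprint per instance role, plus the headroom check.
    static void check(int pods, int cpMaxPoolSize, int readers, int maxConnections) {
        int perInstancePeak = pods * cpMaxPoolSize;             // writer, and each reader
        int clusterTotal = perInstancePeak * (1 + readers);     // whole cluster
        boolean fits = perInstancePeak <= 0.8 * maxConnections; // keep ~20% headroom
        System.out.printf("per-instance peak=%d, cluster total=%d, fits=%b%n",
                perInstancePeak, clusterTotal, fits);
    }

    public static void main(String[] args) {
        // 10 pods, cp-MaximumPoolSize=50, 2 readers, max_connections=5000 (the hard cap):
        check(10, 50, 2, 5000); // per-instance peak=500, cluster total=1500, fits=true
    }
}
```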
Failover — what happens under the hood
Aurora failover — writer restart, reader promotion, or AZ failover — is the specific scenario the wrapper's plugins were built to survive. The first time I watched a failover in production with this stack, I actually wanted to know what was happening step by step. Here's what I worked out.
Sequence during a writer failover
1. Writer instance goes unresponsive. TCP sockets from my pods to that writer stop returning packets.
2. `efm2`'s monitor thread hits `failureDetectionCount` consecutive probe failures within `failureDetectionTime`. The underlying connection is marked bad.
3. My app's next statement on that connection throws a `SQLException` tagged with a failover-relevant SQLState.
4. `failover` catches it, queries Aurora topology (via the RDS DNS or the cluster's topology endpoint), identifies the new writer, and reconnects transparently.
5. If configured (`failoverMode=reader-or-writer`), the reconnect can fall back to a reader for the brief window where no writer is available. Default is writer.
6. `auroraConnectionTracker` walks its table of open connections to the dead instance and invalidates them.
7. External HikariCP sees the invalidation through `HikariCPSQLException` (this is the moment `exception-override-class-name` matters) and evicts the bad logical connections.
8. New logical connections open against fresh internal-pool slots bound to the new writer.
End-to-end with default timers: detection ~75 seconds of probing (`failureDetectionCount=5` × `failureDetectionInterval=15000`; the `failureDetectionTime=60000` grace window has usually already elapsed on a busy connection), reconnect ~5-15 seconds (Aurora DNS propagation + fresh handshake). My app's p99 takes a visible bump during that window; business recovers within ~90 seconds.
Tuning the detection budget
- Aggressive (~15-30 s to detect): `failureDetectionTime=15000`, `failureDetectionInterval=5000`, `failureDetectionCount=3`. More probe traffic; more false positives on transient network blips.
- Default (~75 s, what's in the YAML above): good for most apps.
- Lax (~3+ min): raise `failureDetectionTime` past 120000. Only use this if you have independent health-signal paths and don't want efm2 to chatter.
One thing I stopped doing: don't set socketTimeout small on the main connection (socketTimeout=5000 and friends) hoping to catch failures faster. That fires on every slow query — including legitimate long-running reports — and turns every transient spike into connection churn. Let efm2 own liveness detection. Keep socketTimeout=0. I learned this the hard way after a 12-minute query triggered a pool-wide connection churn event.
Resilience patterns worth knowing
failoverMode
Controls fallback when no writer is reachable:
- `strict-writer` — only reconnect to a writer. Default when connecting via the cluster writer endpoint. During a prolonged failover, connections stall until a new writer is up.
- `reader-or-writer` — fall back to a reader for reads if no writer is available. Default when connecting via the read-only cluster endpoint. Useful for read-heavy apps that can tolerate writes being rejected; writes still fail until the writer is back.
- `strict-reader` — never connect to the writer. Dedicated read-replica deployments only.
My default is strict-writer (which matches the implicit default for cluster-writer-endpoint connections). I've only ever overridden it for a reporting workload where read availability mattered more than write availability.
Connection churn during failover (don't panic)
The immediate aftermath of a failover event looks rough on dashboards: connections.creation spikes to seconds (new TLS handshakes), connections.timeout briefly non-zero, p99 climbs. All expected. The key is the spike ends, typically within ~30 seconds of the new writer being healthy. If you see a sustained elevated connections.creation after the event, check whether exception-override-class-name is configured — without it, HikariCP keeps handing out invalidated connections and the churn doesn't stop on its own.
Read-only traffic during failover
Readers are unaffected by writer failover. readWriteSplitting + correctly-marked read-only transactions means read traffic keeps flowing while writes pause for ~30-60 seconds. For read-heavy apps, marking transactions readOnly=true turns out to be both a performance win and an availability one. Do it for both reasons.
Blue/green deployments
If you're doing Aurora blue/green (RDS Blue/Green), the switchover is a writer-failover-like event from the driver's perspective. The plugins cover it with no extra config, but the same detection-budget trade-offs apply: faster detection = faster cutover = more false-positive risk during normal ops.
RDS Proxy: when, and how it interacts with this driver
If you've read this far, you're either using or considering RDS Proxy. The two layers — RDS Proxy in front of Aurora, the AWS JDBC driver inside your app — solve overlapping but not identical problems, and the AWS guidance you'd want to read together is scattered across the proxy planning page, the wrapper README, and a plugin doc most people miss.
When AWS recommends RDS Proxy
The planning page lists the canonical cases: "too many connections" pressure, T2/T3 instances where connection-setup CPU is significant, Lambda / serverless workloads, apps without a built-in pool, centralized IAM auth or Secrets Manager rotation, failover speedup (advertised at "up to 66%", typically <35 s for Multi-AZ Aurora), and Blue/Green deployments. For a long-lived Spring Boot pod with a well-tuned HikariCP, only the last three are particularly compelling — the multiplexing benefit is mostly theoretical when your external pool is sized correctly.
How RDS Proxy actually routes
The thing that catches teams out is the assumption that the proxy "splits reads and writes intelligently." It doesn't. From the endpoints docs, the proxy exposes two endpoints — a read/write endpoint that sends every request to the current writer, and a read-only endpoint that sends every request to some reader (with proxy-level rebalancing if a reader fails). There's no SQL inspection. The proxy routes where you point it, not what you send through it. SQL-aware splitting still requires application-side logic — either two DataSource beans in your app or the srw plugin described below.
Plugin compatibility behind RDS Proxy
The wrapper README's RDS Proxy section is unambiguous:
"Functionality like Failover, Enhanced Host Monitoring, and Read/Write Splitting is not compatible since the driver relies on cluster topology and RDS Proxy handles this automatically. The driver remains useful with RDS Proxy for authentication workflows, such as IAM authentication and AWS Secrets Manager integration."
Translated:
| Plugin | Behind RDS Proxy |
|---|---|
| `failover`, `failover2` | Drop. Proxy handles writer failover; topology lookups conflict with the hidden pool. |
| `efm2` | Drop. Per-connection probes don't see the underlying Aurora node. |
| `readWriteSplitting` | Drop. Relies on topology that's invisible behind the proxy. |
| `iamAuth` | Keep if you want JDBC-layer IAM (alternative to configuring it on the proxy). |
| `awsSecretsManager` | Optional — overlaps with proxy auth. Usually skip. |
| `srw` (Simple R/W Splitting) | Keep — purpose-built for this combination. |
The srw plugin — SQL-aware splitting through RDS Proxy
Available since v3.0.0 and documented here. Unlike readWriteSplitting, srw doesn't query the cluster for topology. You give it two explicit endpoints — srwWriteEndpoint (your read/write proxy endpoint) and srwReadEndpoint (your read-only proxy endpoint) — and it switches between them on Connection#setReadOnly(true/false). With Spring's @Transactional(readOnly=true), you keep the same single-DataSource ergonomics you'd have with readWriteSplitting against direct Aurora.
Two gotchas. Role verification (verifyNewSrwConnections=true by default) runs SELECT pg_catalog.pg_is_in_recovery() after switching, with up to a 60-second retry budget, to defend against DNS-cache staleness right after failover. Useful on paper; it conflicts with autocommit=false because the verification query opens a transaction. Either set setReadOnly before disabling autocommit, or set verifyNewSrwConnections=false. Mutual exclusion: don't combine srw with readWriteSplitting or gdbReadWriteSplitting on the same connection. They're alternatives, not layers.
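In code, the first gotcha reduces to an ordering rule. A sketch of the safe sequence, assuming `verifyNewSrwConnections` is left at its default of `true`:

```java
import java.sql.Connection;
import javax.sql.DataSource;

public class SrwOrdering {
    // Safe: switch endpoints while the connection is still in autocommit mode,
    // so srw's role-verification SELECT doesn't open a transaction.
    void readPath(DataSource ds) throws Exception {
        try (Connection conn = ds.getConnection()) {
            conn.setReadOnly(true);     // srw switches to srwReadEndpoint and verifies the role
            conn.setAutoCommit(false);  // only now begin transactional work
            // ... run read queries ...
            conn.commit();
        }
    }
    // Unsafe variant: calling setAutoCommit(false) first means the verification
    // query issued by setReadOnly(true) runs inside an open transaction and conflicts.
}
```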
Decision tree
| Setup | Plugins | Read/write split mechanism |
|---|---|---|
| Direct to Aurora, no proxy | `readWriteSplitting`, `auroraConnectionTracker`, `failover`, `efm2` (+ `cp-*`) | Wrapper plugin, one DataSource, `@Transactional(readOnly=true)` routes via topology. |
| RDS Proxy + wrapper, SQL-aware split | `srw` (+ `iamAuth` if needed) | `srw` switches between two proxy endpoints on `setReadOnly`. One DataSource. |
| RDS Proxy + plain `org.postgresql.Driver` | n/a | Two DataSource beans (one per proxy endpoint). App routes manually. |
| Lambda / serverless | n/a | RDS Proxy + plain driver. The wrapper's value is amortized warm-pool benefits — irrelevant for cold invocations. |
Pinning — the multiplexing trap
RDS Proxy multiplexes by handing one backend session to multiple client connections, but only when the session is resettable. The pinning rules for Aurora PostgreSQL disable multiplexing on SET, PREPARE/DEALLOCATE/EXECUTE, temporary tables, declared cursors, LISTEN, advisory locks, and any statement >16 KB. Hibernate with server-side prepared statements pins on every session. There are real teams (Aggarwal's 12-hour revert is the most-cited public postmortem) that hit ~100% pinning under load and pulled the proxy out the same day. The diagnostic is the DatabaseConnections.PinnedConnections CloudWatch metric — if pinned connections approach total, you're paying for a proxy that isn't actually multiplexing.
My take
RDS Proxy and the AWS JDBC driver aren't usually a "pick one" decision — they solve different concerns and can layer cleanly if you pick the right plugins. Three rules I'd hold:
1. Failover ownership belongs to one layer. Don't run `failover` + `efm2` behind a proxy. The proxy already does it; you're paying twice and risking conflicting reactions to transient errors.
2. Read/write splitting needs an explicit choice. Two DataSource beans, or `srw`, or `readWriteSplitting` (no proxy). Pick one — never two.
3. The wrapper still earns its keep behind a proxy if you're using IAM auth or `srw`. Otherwise plain `org.postgresql.Driver` is simpler and the wrapper's plugin chain is mostly cosmetic.
If your motivation for either layer is "make the app faster," neither is the answer — that's a query / index / cache problem.
The checklist I run through before shipping
Before I put the wrapper in front of production traffic, I go through this list. Nothing on it is optional.
- Driver version ≥ 3.3.0. `cp-*` properties landed in v3.1.0 and `efm2` has been available since v2.4.0, so 3.3.0 isn't the hard floor for those features. The reason I draw the line there: 3.3.0 includes the readWriteSplitting + failover plugin-ordering fix and removes a 5-second sleep from the failover recovery path. If you're below 3.1.0, `cp-*` won't work at all.
- F0 profile not in use unless version-aware — on v2.x, F0 hardcodes `maxPoolSize=30`. I've been burned.
- `cp-MaximumPoolSize` ≥ `maximumPoolSize` on the external pool.
- `exception-override-class-name` set to `software.amazon.jdbc.util.HikariCPSQLException`.
- `socketTimeout=0` — liveness belongs to efm2.
- Read-only transactions annotated — otherwise `readWriteSplitting` is decorative.
- Aurora `max_connections` supports `pods × cp-max × (1 + readers)` with 20% headroom.
- Topology endpoint reachable from every pod (cluster and per-instance DNS resolve via VPC DNS).
- Plugin list ordered: `readWriteSplitting,auroraConnectionTracker,failover,efm2`.
- Observability wired — alert on non-zero steady-state `hikaricp.connections.pending`.
Where this leaves me
The AWS JDBC Driver is one of those libraries where the defaults are opinionated but not obvious, the configuration surface is large, and the version-to-version behavior has shifted in ways that invalidate older docs you'll find on the internet. The cases where I've seen teams get into trouble all look the same: they adopted a profile without reading what was inside it, or they moved from v2.x to v3.x without re-checking whether the properties they'd set still did anything.
If I could boil this post down to one practical habit: don't trust the external pool metrics alone. The wrapper adds a whole second layer of pooling between your hikaricp.connections count and the actual network. When the external pool metrics look fine but your requests are slow, look inside. And if you're still on v2.x with F0, upgrade — there is no property you can set to make it behave.
I lost a few hours to this. You shouldn't have to lose any.
References
AWS Advanced JDBC Wrapper — driver docs
- Using the JDBC Driver — full parameter reference including `wrapperPlugins`, `wrapperProfileName`, `wrapperDialect`, and all connection properties
- Configuration Presets — what F0, F1, SF_F0, etc. actually configure (plugins, pool settings, timeouts)
- Host Selection Strategies — `roundRobin`, `random`, `leastConnections`, `highestWeight`
- Failover Configuration Guide — `failoverMode`, detection tuning, transactional behavior during failover
- Framework Integration — notes on Spring Boot, Hibernate, and other framework specifics
- DataSource Configuration — alternative to driver-mode configuration via `AwsWrapperDataSource`
- Compatibility — supported databases, JDBC versions, known limitations
AWS Advanced JDBC Wrapper — plugin docs
- `readWriteSplitting` — reader routing, internal connection pooling with `connectionPoolType=hikari`, `cp-*` properties, `readerHostSelectorStrategy`
- `failover` — classic failover plugin; topology detection, connection invalidation
- `failover2` — newer failover implementation (v2); recommended for new deployments
- `efm2` (Host Monitoring) — `failureDetectionTime`, `failureDetectionInterval`, `failureDetectionCount`, monitoring timeouts
- `auroraConnectionTracker` — connection-to-instance mapping for failover invalidation
AWS Advanced JDBC Wrapper — examples and changelog
- Spring Boot + HikariCP example — working YAML with `exception-override-class-name` and HikariCP data-source properties
- Spring + Hibernate example — Hibernate-specific session factory integration
- Spring Transaction Failover example — handling transactional rollback during failover
- PR #1658 — configurable internal pool — the change (v3.1.0) that made `cp-*` properties work outside of profiles
- Changelog — version-to-version migration notes
RDS Proxy
- Planning where to use Amazon RDS Proxy — the canonical use-case list (Lambda, T2/T3, IAM auth, Blue/Green, failover speedup)
- RDS Proxy endpoints — read/write vs read-only endpoints; "the proxy routes where you point it" semantics
- Avoiding pinning — full list of session-state operations that disable multiplexing per engine
- Wrapper README — RDS Proxy section — the official statement that `failover`, `efm2`, and `readWriteSplitting` are incompatible with RDS Proxy
- Simple Read/Write Splitting Plugin (`srw`) — the topology-agnostic plugin purpose-built for use behind RDS Proxy (since v3.0.0)
- RDS Proxy pricing — per vCPU-hour for provisioned, per ACU-hour for Serverless
External
- HikariCP — About Pool Sizing
- Aurora PostgreSQL — Performance and scaling for Amazon Aurora PostgreSQL (source for the `max_connections` default formula and the 5,000-connection cap)
- Aggarwal — "Experience with AWS RDS Proxy in production, and why we had to revert it in 12 hours" (cited in the pinning section)