Slow joins in PostgreSQL often come from a nested loop join that the query planner chose while estimating a few rows, but that ends up processing hundreds of thousands of rows at execution time. System metrics such as the buffer cache hit ratio all look green, yet the query reads far more data than necessary and takes longer than it should. This can happen suddenly because the join method decision is sensitive: a difference of a single estimated row can flip the plan from a hash join to a nested loop.
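A quick way to spot this pattern is to compare the planner's estimates with the actual counts in the execution plan; the table names below are only placeholders for your own query:
explain (analyze, buffers)
SELECT *
FROM orders o
JOIN order_items i USING (order_id)
WHERE o.status = 'pending';
-- Compare "rows=" (the estimate) with "actual ... rows=" on each node, and check
-- "loops=" on the inner side of a Nested Loop: a large gap between estimated and
-- actual rows on the outer side is the typical signature of this problem.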
Although cost-based optimizers are common, alternative strategies exist. For example, MongoDB's multi-planner postpones index selection until execution, testing all options and switching to the best after a short trial. Likewise, Oracle Database can delay selecting the join method or parallel query distribution by buffering rows before determining the plan for the remaining data. Amazon Aurora implements a similar approach called adaptive join, which defers the decision between nested loop and hash join until execution.
Here is an example of Amazon Aurora adaptive plans. I used the same tables as in the previous post with two additional indexes:
CREATE INDEX ON outer_table (a,b,id);
CREATE INDEX ON inner_table (id,b);
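If you don't have the tables from the previous post, a base setup roughly like the following (created before the two indexes above) gives a similar shape of data. The row counts match what appears later in this post, but the value distributions are my own guesses rather than the original setup:
CREATE TABLE outer_table (id bigint, a int, b int);
CREATE TABLE inner_table (id bigint, a int, b int);
-- 50,000 outer rows and 10,000 inner rows with random values in a and b
INSERT INTO outer_table SELECT n, (random()*200)::int, (random()*1000)::int FROM generate_series(1,50000) n;
INSERT INTO inner_table SELECT n, (random()*200)::int, (random()*1000)::int FROM generate_series(1,10000) n;
-- vacuum and analyze so that index-only scans can show "Heap Fetches: 0"
VACUUM ANALYZE outer_table;
VACUUM ANALYZE inner_table;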
I executed the following query:
explain (analyze)
SELECT o.b,i.b
FROM outer_table o
JOIN inner_table i USING(id)
WHERE o.a<10 AND o.b<10
;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.57..124.76 rows=4 width=8) (actual time=0.020..0.119 rows=9 loops=1)
-> Index Only Scan using outer_table_a_b_id_idx on outer_table o (cost=0.29..58.71 rows=20 width=8) (actual time=0.011..0.061 rows=28 loops=1)
Index Cond: ((a < 10) AND (b < 10))
Heap Fetches: 0
-> Index Only Scan using inner_table_id_b_idx on inner_table i (cost=0.29..3.30 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=28)
Index Cond: (id = o.id)
Heap Fetches: 0
Planning Time: 0.321 ms
Execution Time: 0.152 ms
(9 rows)
Because I had optimal indexes and not too many rows in the outer table (estimated rows=20), the query planner chose a nested loop join. During execution, there were more rows than estimated (actual rows=28). It might still be an effective join method with not too many inner loops (loops=28). However, what happens if the actual number of rows is much higher?
For instance, increasing the range on "o.b" causes the query planner to swap the join inputs and select a hash join.
explain (analyze)
SELECT o.b,i.b
FROM outer_table o
JOIN inner_table i USING(id)
WHERE o.a<10 AND o.b<1000
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=108.48..289.73 rows=444 width=8) (actual time=0.656..2.426 rows=900 loops=1)
Hash Cond: (i.id = o.id)
-> Seq Scan on inner_table i (cost=0.00..155.00 rows=10000 width=8) (actual time=0.006..0.697 rows=10000 loops=1)
-> Hash (cost=80.72..80.72 rows=2221 width=8) (actual time=0.624..0.624 rows=2250 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 120kB
-> Index Only Scan using outer_table_a_b_id_idx on outer_table o (cost=0.29..80.72 rows=2221 width=8) (actual time=0.019..0.315 rows=2250 loops=1)
Index Cond: ((a < 10) AND (b < 1000))
Heap Fetches: 0
Planning Time: 0.901 ms
Execution Time: 2.522 ms
(10 rows)
Instead of starting a nested loop from "outer_table," this plan loads the rows from "outer_table" that satisfy the predicate into a hash table (the build side) and probes it while scanning "inner_table." Building the hash table takes longer up front, but it avoids running 2,250 inner index lookups, as I can verify by disabling the other join methods.
set enable_hashjoin to off;
set enable_mergejoin to off;
explain (analyze)
SELECT o.b,i.b
FROM outer_table o
JOIN inner_table i USING(id)
WHERE o.a<10 AND o.b<1000
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.57..872.57 rows=444 width=8) (actual time=0.610..4.871 rows=900 loops=1)
-> Index Only Scan using outer_table_a_b_id_idx on outer_table o (cost=0.29..80.72 rows=2221 width=8) (actual time=0.039..0.334 rows=2250 loops=1)
Index Cond: ((a < 10) AND (b < 1000))
Heap Fetches: 0
-> Index Only Scan using inner_table_id_b_idx on inner_table i (cost=0.29..0.36 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=2250)
Index Cond: (id = o.id)
Heap Fetches: 0
Planning Time: 4.991 ms
Execution Time: 5.670 ms
(9 rows)
When cardinality estimates are accurate, the query planner can select the optimal join method, but the actual row counts at execution time can differ significantly from those estimates. This is where an adaptive plan helps: not necessarily to find the perfect plan, but to prevent the worst-case scenarios.
I re-enable all join methods, activate the adaptive plan, and rerun my initial query, which retrieves 28 rows from the outer table.
set enable_hashjoin to on;
set enable_mergejoin to on;
set apg_adaptive_join_crossover_multiplier to 1;
set apg_enable_parameterized_adaptive_join to on;
explain (analyze)
SELECT o.b,i.b
FROM outer_table o
JOIN inner_table i USING(id)
WHERE o.a<10 AND o.b<10
;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (Adaptive) (cost=0.57..124.76 rows=4 width=8) (actual time=4.559..4.609 rows=9 loops=1)
Adaptive Crossover: rows=74
-> Index Only Scan using outer_table_a_b_id_idx on outer_table o (cost=0.29..58.71 rows=20 width=8) (actual time=1.999..3.261 rows=28 loops=1)
Index Cond: ((a < 10) AND (b < 10))
Heap Fetches: 0
-> Index Only Scan using inner_table_id_b_idx on inner_table i (cost=0.29..3.30 rows=1 width=8) (actual time=0.047..0.047 rows=0 loops=28)
Index Cond: (id = o.id)
Heap Fetches: 0
Planning Time: 2.107 ms
Execution Time: 4.648 ms
(10 rows)
It's still a nested loop with the same cost estimate as before, but it is now flagged as (Adaptive) and shows an extra detail: Adaptive Crossover: rows=74.
This indicates that the query planner found a nested loop to be cheaper than a hash join for the originally estimated number of iterations (rows=20). At planning time, it also computed the cost for higher row counts and identified a crossover point at rows=74, beyond which a hash join would have been cheaper and therefore selected. In other words, the planner pre-calculated an inflection point at which it would prefer a hash join and deferred the final choice to execution time.
At runtime, the rows read from outer_table are counted and buffered. Because the row count never reached the crossover/inflection point, the plan continued using the nested loop.
To see how the plan changes with more qualifying rows, I updated my data so that more rows satisfy the predicate a < 10 AND b < 10:
UPDATE outer_table SET b=0 WHERE a<10 AND b BETWEEN 10 AND 40;
UPDATE 47
I ran my query again. It’s still (Adaptive) with Adaptive Crossover: rows=74, but now it shows a Hash Join:
explain (analyze)
SELECT o.b,i.b
FROM outer_table o
JOIN inner_table i USING(id)
WHERE o.a<10 AND o.b<10
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (Adaptive) (cost=58.96..240.21 rows=4 width=8) (actual time=2.531..3.801 rows=28 loops=1)
Output: o.b, i.b
Inner Unique: true
Adaptive Crossover: rows=74
Hash Cond: (i.id = o.id)
-> Seq Scan on public.inner_table i (cost=0.00..155.00 rows=10000 width=8) (actual time=0.007..0.628 rows=10000 loops=1)
Output: i.id, i.a, i.b
-> Hash (cost=58.71..58.71 rows=20 width=8) (actual time=2.470..2.470 rows=75 loops=1)
Output: o.b, o.id
-> Index Only Scan using outer_table_a_b_id_idx on public.outer_table o (cost=0.29..58.71 rows=20 width=8) (actual time=1.103..1.280 rows=75 loops=1)
Output: o.b, o.id
Index Cond: ((o.a < 10) AND (o.b < 10))
Heap Fetches: 57
Query Identifier: 8990309245261094611
Planning Time: 1.674 ms
Execution Time: 3.861 ms
(16 rows)
At planning time, the decision remained the same as before because the statistics had not changed (the estimate was still cost=0.29..58.71 rows=20). In reality, though, more than 74 rows were read from outer_table (actual rows=75), and instead of being used for a nested loop, the buffered rows were used as the build table of a hash join.
I then analyzed the table to see what would happen with fresh statistics, and was surprised to find the plan reverted to a nested loop:
analyze outer_table ;
explain (analyze)
SELECT o.b,i.b
FROM outer_table o
JOIN inner_table i USING(id)
WHERE o.a<10 AND o.b<10
;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (Adaptive) (cost=0.57..145.25 rows=5 width=8) (actual time=0.124..0.248 rows=28 loops=1)
Adaptive Crossover: rows=78
-> Index Only Scan using outer_table_a_b_id_idx on outer_table o (cost=0.29..70.29 rows=23 width=8) (actual time=0.026..0.104 rows=75 loops=1)
Index Cond: ((a < 10) AND (b < 10))
Heap Fetches: 57
-> Index Only Scan using inner_table_id_b_idx on inner_table i (cost=0.29..3.26 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=75)
Index Cond: (id = o.id)
Heap Fetches: 0
Planning Time: 1.025 ms
Execution Time: 0.287 ms
(10 rows)
The reason is that even with a freshly analyzed table, the estimate is still far off: the planner predicts rows=23 when 75 rows actually qualify. A predicate on two columns such as a < 10 AND b < 10 is hard for the cost-based optimizer because, without extended statistics, it assumes the columns are independent, and my UPDATE made them correlated. With this estimate, the crossover point came out slightly higher (rows=78), and since only 75 rows were actually read, below the crossover, the execution stayed with the nested loop.
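The standard PostgreSQL answer to this kind of correlated, multi-column predicate is extended statistics. Whether it fixes the estimate here depends on the data, but it is a cheap thing to try:
-- multivariate statistics on (a,b) let the planner account for their correlation
CREATE STATISTICS outer_table_a_b (dependencies, mcv) ON a, b FROM outer_table;
ANALYZE outer_table;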
What I like about this feature, which I have known since it was implemented in Oracle Database years ago, is that it doesn't try to find the absolute best plan. Instead, it focuses on avoiding the worst plans (for example, nested loops over tens or hundreds of thousands of rows) by switching to a plan that is simply good enough. Amazon Aurora is a black box with limited tracing, so it's difficult to know exactly how it works, but it probably behaves similarly to Oracle adaptive plans, and I wrote an older blog post about how Oracle determines the inflection point.
As this requires more work at execution time, Aurora triggers it only when the cost of the first join method is higher than a threshold, by default 100:
postgres=> \dconfig apg*adaptive_join*
List of configuration parameters
Parameter | Value
----------------------------------------+-------
apg_adaptive_join_cost_threshold | 100
apg_adaptive_join_crossover_multiplier | 1
apg_enable_parameterized_adaptive_join | on
(3 rows)
In my examples, the costs were higher than apg_adaptive_join_cost_threshold (Nested Loop (cost=0.57..124.76)). I had to enable apg_enable_parameterized_adaptive_join because the join predicate is pushed down as a parameter (Index Cond: (id = o.id)), which is the main advantage of a nested loop join: it enables index access to the inner table. I set apg_adaptive_join_crossover_multiplier to 1 to enable the feature; a higher value multiplies the crossover value, raising the inflection point and making a runtime switch to the hash join less likely.
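For example, if you only want the switch to happen when the misestimate is really large, you can raise the multiplier (shown here at session level; like other GUCs it should also be settable per database or through the cluster parameter group):
-- multiply the computed crossover by 10, so only gross misestimates
-- trigger a switch away from the planned nested loop
set apg_adaptive_join_crossover_multiplier to 10;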
To test it further, I modified my data so that the outer table returns 50,000 rows and ran the query again:
postgres=> UPDATE outer_table SET a=0 , b=0;
UPDATE 50000
postgres=> explain (analyze)
SELECT o.b,i.b
FROM outer_table o
JOIN inner_table i USING(id)
WHERE o.a<10 AND o.b<10
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (Adaptive) (cost=180.21..361.46 rows=5 width=8) (actual time=231.245..234.250 rows=10000 loops=1)
Adaptive Crossover: rows=139
Hash Cond: (i.id = o.id)
-> Seq Scan on inner_table i (cost=0.00..155.00 rows=10000 width=8) (actual time=0.014..0.633 rows=10000 loops=1)
-> Hash (cost=179.65..179.65 rows=45 width=8) (actual time=231.240..231.240 rows=50000 loops=1)
-> Index Only Scan using outer_table_a_b_id_idx on outer_table o (cost=0.42..179.65 rows=45 width=8) (actual time=1.641..90.167 rows=50000 loops=1)
Index Cond: ((a < 10) AND (b < 10))
Heap Fetches: 50075
Planning Time: 1.041 ms
Execution Time: 298.893 ms
(10 rows)
The adaptive plan avoided a nested loop join that would have required 50,000 iterations. With accurate statistics, the optimizer would probably have selected a merge join instead, since the query has to read both tables in full and I have indexes on the join keys; in that case, a merge join would have been faster than a hash join. That said, even with stale statistics, the hash join was still much better, or at least less risky, than a nested loop join.
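To check that intuition about the merge join (an experiment sketch, not something shown in this post), you could refresh the statistics and temporarily disable the other join methods to see how the planner costs it:
analyze outer_table;
analyze inner_table;
set enable_nestloop to off;
set enable_hashjoin to off;
explain
SELECT o.b,i.b
FROM outer_table o
JOIN inner_table i USING(id)
WHERE o.a<10 AND o.b<10
;
reset enable_nestloop;
reset enable_hashjoin;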
Without this feature in PostgreSQL, you still have options: ensure statistics are accurate (this remains true even with adaptive plans, which still rely on estimates) and use extended statistics where they help. Make sure you have the right indexes so that cost differences are clear and the planner does not hesitate between two bad plans. You can use pg_hint_plan to force a specific join method, though it often needs more hints than expected (see Predictable plans with pg_hint_plan full hinting). Some people tweak random_page_cost, which affects index costs and therefore join choices, but I have my own ideas about that. Because joins are central to SQL databases due to relational normalization, a poor join method can make the whole database seem slow and unpredictable, so it is crucial to understand join strategies and review execution plans carefully. This feature in Aurora helps prevent some runaway queries, so I think it is a good idea to enable it by default, especially since you can set a crossover multiplier so that it kicks in only to avoid the worst cases.
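For completeness, here is what the pg_hint_plan approach mentioned above looks like for this query. This assumes the extension is installed and loadable (on Aurora and RDS it is typically enabled through the parameter group), and as noted, forcing a full plan often needs more hints than just the join method:
LOAD 'pg_hint_plan';  -- load it for the session if it is not in shared_preload_libraries
/*+ HashJoin(o i) */
explain (analyze)
SELECT o.b,i.b
FROM outer_table o
JOIN inner_table i USING(id)
WHERE o.a<10 AND o.b<10
;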