Adaptive Query Execution (AQE) Tuning Guide
Datanest Digital — Spark Optimization Playbook
AQE is Spark's runtime query re-optimization engine. It observes actual data statistics during execution and adjusts the query plan on the fly. This guide covers every AQE feature, when it helps, and how to tune it.
What AQE Does
AQE re-optimizes the query plan at stage boundaries (after each shuffle). It can:
- Coalesce shuffle partitions — Merge small post-shuffle partitions into larger ones
- Switch join strategies — Convert sort-merge join to broadcast join at runtime
- Handle skewed joins — Split skewed partitions and replicate the other side
- Optimize skewed aggregations — Split skewed groups across multiple tasks
Enabling AQE
# Master switch (enabled by default since Spark 3.2 and on recent Databricks Runtime releases)
spark.conf.set("spark.sql.adaptive.enabled", "true")
AQE is generally safe to enable and should be on for all workloads. The only reason to disable it is for benchmarking to isolate its impact.
Feature 1: Partition Coalescing
What It Does
After a shuffle, some partitions may be tiny (a few KB). AQE merges adjacent small partitions into larger ones to reduce task overhead.
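The merge logic can be sketched in plain Python. This is a simplified model of AQE's behavior, not Spark's actual implementation: adjacent post-shuffle partitions are packed together until each group reaches the advisory target size.

```python
def coalesce_partitions(sizes_bytes, advisory_target_bytes):
    """Greedily merge adjacent shuffle partitions until each group
    reaches the advisory target size (simplified AQE model)."""
    groups, current, current_size = [], [], 0
    for i, size in enumerate(sizes_bytes):
        current.append(i)
        current_size += size
        if current_size >= advisory_target_bytes:
            groups.append(current)
            current, current_size = [], 0
    if current:  # leftover small partitions form the final group
        groups.append(current)
    return groups

MB = 1024 * 1024
# Six uneven partitions merged toward a 64 MB target
print(coalesce_partitions([10*MB, 20*MB, 40*MB, 5*MB, 70*MB, 8*MB], 64*MB))
# → [[0, 1, 2], [3, 4], [5]]
```

Note that only *adjacent* partitions are merged, which is why a high `initialPartitionNum` gives AQE finer granularity to work with.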
Key Configs
| Config | Default | Recommended | Notes |
|---|---|---|---|
| `spark.sql.adaptive.coalescePartitions.enabled` | `true` | `true` | Master switch for coalescing |
| `spark.sql.adaptive.coalescePartitions.initialPartitionNum` | (none) | 4096 | Starting shuffle partition count before coalescing; higher gives AQE more granularity to work with |
| `spark.sql.adaptive.coalescePartitions.minPartitionSize` | 1 MB | 4-16 MB | Minimum size per coalesced partition |
| `spark.sql.adaptive.advisoryPartitionSizeInBytes` | 64 MB | 128 MB | Target partition size after coalescing |
Tuning Strategy
# For large datasets (> 100 GB after shuffle):
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.initialPartitionNum", "4096")
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "134217728") # 128 MB
spark.conf.set("spark.sql.adaptive.coalescePartitions.minPartitionSize", "4194304") # 4 MB
# For small datasets (< 10 GB after shuffle):
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "67108864") # 64 MB
spark.conf.set("spark.sql.adaptive.coalescePartitions.minPartitionSize", "1048576") # 1 MB
When to Increase `initialPartitionNum`
- Set it higher than the expected partition count to give AQE room to coalesce down.
- Rule of thumb: `initialPartitionNum = data_size_bytes / advisoryPartitionSizeInBytes * 3`
- Excess partitions have negligible cost; AQE will merge them.
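The rule of thumb is easy to wrap in a small helper. A sketch only; the function name is ours, not a Spark API:

```python
def suggest_initial_partition_num(shuffle_size_bytes, advisory_bytes=128 * 1024 * 1024):
    """Rule of thumb: ~3x the partition count implied by the advisory
    size, so AQE has room to coalesce down to the target."""
    return max(1, shuffle_size_bytes // advisory_bytes * 3)

GB = 1024 ** 3
# 200 GB shuffle with a 128 MB advisory partition size
print(suggest_initial_partition_num(200 * GB))  # → 4800
```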
Feature 2: Dynamic Join Strategy Switching
What It Does
When AQE discovers that one side of a sort-merge join is small enough to broadcast (below the broadcast threshold), it dynamically converts the join to a broadcast hash join — eliminating the shuffle on the large side.
Key Configs
| Config | Default | Recommended | Notes |
|---|---|---|---|
| `spark.sql.adaptive.autoBroadcastJoinThreshold` | (value of `spark.sql.autoBroadcastJoinThreshold`) | 100-256 MB | Runtime threshold for converting to broadcast; can be higher than the static threshold because AQE uses actual sizes |
| `spark.sql.autoBroadcastJoinThreshold` | 10 MB | 100 MB | Static planning threshold (pre-AQE); AQE can override this at runtime |
Tuning Strategy
# Set static threshold conservatively
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "104857600") # 100 MB
# AQE will additionally convert joins at runtime when it discovers
# actual table sizes — no additional config needed beyond enabling AQE
Important Notes
- AQE's dynamic switching uses actual runtime sizes (post-filter, post-aggregation), so it catches cases the static planner misses
- If a table has stale statistics, AQE compensates automatically
- Monitor the SQL tab in Spark UI — look for "BroadcastHashJoin" replacing "SortMergeJoin"
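Checking for the switch can also be scripted by scanning the text that `df.explain()` produces. A sketch; the helper name and the sample plan string below are illustrative, not captured Spark output:

```python
def join_strategy(plan_text):
    """Report which join operator appears first in an explain() plan string."""
    for node in ("BroadcastHashJoin", "SortMergeJoin", "ShuffledHashJoin"):
        if node in plan_text:
            return node
    return None

# With AQE's dynamic switching, you want BroadcastHashJoin to show up
# where the static plan had SortMergeJoin.
sample_plan = """
AdaptiveSparkPlan isFinalPlan=true
+- BroadcastHashJoin [order_id], [order_id], Inner
"""
print(join_strategy(sample_plan))  # → BroadcastHashJoin
```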
Feature 3: Skew Join Optimization
What It Does
Detects partitions that are significantly larger than the median. Splits the skewed partition into sub-partitions and replicates the corresponding partition from the other side of the join.
Key Configs
| Config | Default | Recommended | Notes |
|---|---|---|---|
| `spark.sql.adaptive.skewJoin.enabled` | `true` | `true` | Master switch |
| `spark.sql.adaptive.skewJoin.skewedPartitionFactor` | 5 | 5-10 | A partition is skewed if size > median * factor (and also exceeds the byte threshold below) |
| `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes` | 256 MB | 256 MB | Minimum absolute size to be considered skewed |
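The detection rule combines both configs: a partition counts as skewed only if it exceeds the factor-times-median size *and* the absolute byte threshold. As a sketch in plain Python (a model of the documented rule, not Spark's code):

```python
import statistics

def skewed_partitions(sizes_bytes, factor=5, threshold_bytes=256 * 1024 * 1024):
    """Return indices of partitions flagged as skewed:
    size > median * factor AND size > absolute threshold."""
    median = statistics.median(sizes_bytes)
    return [i for i, s in enumerate(sizes_bytes)
            if s > median * factor and s > threshold_bytes]

MB = 1024 * 1024
sizes = [50 * MB] * 9 + [2048 * MB]  # one 2 GB outlier among 50 MB partitions
print(skewed_partitions(sizes))  # → [9]
```

This is why lowering `skewedPartitionThresholdInBytes` catches skew earlier: a partition can be many times the median yet still slip under the absolute threshold.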
Tuning Strategy
# Default settings work well for most cases
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
# For highly skewed data (e.g., 100:1 skew ratio), lower the byte threshold
# so large outlier partitions are caught earlier:
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "134217728") # 128 MB
# For moderate skew (10:1 ratio), tighten the detection:
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "3")
Limitations
- Only works for sort-merge joins (not broadcast joins — those don't have the skew problem)
- Only handles skew on one side of the join at a time
- If both sides are skewed on the same key, consider salting instead (see optimization_patterns.md)
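The salting technique referenced above can be modeled in plain Python. This is a simplified sketch of the key transformation only (function names are ours): the large side appends a random suffix to the hot key, and the small side is replicated once per suffix so every salted variant still finds its join partner.

```python
import random

def salt_key(key, num_salts):
    """Large side: append a random salt so one hot key hashes to
    num_salts different shuffle partitions."""
    return f"{key}#{random.randrange(num_salts)}"

def explode_key(key, num_salts):
    """Small side: replicate each key once per salt so the join
    still matches every salted variant."""
    return [f"{key}#{i}" for i in range(num_salts)]

# The hot key "user_42" spreads across up to 4 salted variants,
# and the small side carries all 4 so no matches are lost.
salted = {salt_key("user_42", 4) for _ in range(1000)}
print(sorted(salted))
print(explode_key("user_42", 4))  # ['user_42#0', 'user_42#1', 'user_42#2', 'user_42#3']
```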
Feature 4: Skewed Aggregation Optimization
What It Does
For GROUP BY operations with skewed groups, AQE can split the aggregation into two phases: a partial aggregation on sub-partitions, then a final merge.
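The two-phase idea can be modeled in plain Python. A simplified illustration of partial-then-final aggregation, not Spark internals: each sub-partition of a skewed group computes partial counts, then the partials are merged.

```python
from collections import Counter

def partial_counts(rows):
    """Phase 1: aggregate within one sub-partition of a skewed group."""
    return Counter(rows)

def merge_partials(partials):
    """Phase 2: merge the partial results into the final counts."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total

# A skewed key "a" split across two sub-partitions
part1 = partial_counts(["a", "a", "b"])
part2 = partial_counts(["a", "a", "a"])
print(merge_partials([part1, part2]))  # → Counter({'a': 5, 'b': 1})
```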
Key Configs
| Config | Default | Notes |
|---|---|---|
| `spark.sql.adaptive.optimizeSkewedJoin.enabled` | `true` | Databricks-specific extension |
This is generally handled automatically when AQE is enabled.
AQE Configuration Template
Copy-paste this block as a starting point for your cluster init scripts or notebook headers:
# --- AQE Configuration — Datanest Digital Recommended ---
# Master switch
spark.conf.set("spark.sql.adaptive.enabled", "true")
# Partition coalescing
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.initialPartitionNum", "4096")
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "134217728") # 128 MB
spark.conf.set("spark.sql.adaptive.coalescePartitions.minPartitionSize", "4194304") # 4 MB
# Join optimization
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "104857600") # 100 MB
# Skew handling
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "268435456") # 256 MB
Verifying AQE Is Working
In the Spark UI SQL Tab
- Open the query's SQL plan in the Spark UI
- Look for `AdaptiveSparkPlan` as the root node
- Look for annotations:
  - `coalesced partitions: X -> Y` (coalescing happened)
  - `BroadcastHashJoin` where you expected `SortMergeJoin` (dynamic switching)
  - `skewed` annotations on join nodes (skew handling)
Programmatically
# Check if AQE is active
print(spark.conf.get("spark.sql.adaptive.enabled"))
# After running a query, check the plan
df = spark.table("my_table").groupBy("key").count()
df.explain(mode="formatted")
# Look for "AdaptiveSparkPlan" in the output
Common Pitfalls
- Setting `spark.sql.shuffle.partitions` too low with AQE: AQE can only coalesce (merge) partitions, not split them. Set `initialPartitionNum` high.
- Disabling coalescing but keeping AQE: you lose the most impactful feature.
- Expecting AQE to fix all skew: AQE skew handling is limited to sort-merge joins. Use salting for complex skew patterns.
- Ignoring AQE overhead: AQE adds milliseconds of planning overhead per stage. For sub-second queries this can be significant, so consider disabling it for very fast queries.