
Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Spark Optimization Playbook: Adaptive Query Execution (AQE) Tuning Guide

Adaptive Query Execution (AQE) Tuning Guide

Datanest Digital — Spark Optimization Playbook

AQE is Spark's runtime query re-optimization engine. It observes actual data statistics during execution and adjusts the query plan on the fly. This guide covers every AQE feature, when it helps, and how to tune it.


What AQE Does

AQE re-optimizes the query plan at stage boundaries (after each shuffle). It can:

  1. Coalesce shuffle partitions — Merge small post-shuffle partitions into larger ones
  2. Switch join strategies — Convert sort-merge join to broadcast join at runtime
  3. Handle skewed joins — Split skewed partitions and replicate the other side
  4. Optimize skewed aggregations — Split skewed groups across multiple tasks

Enabling AQE

# Master switch (enabled by default since Spark 3.2 and on recent Databricks runtimes)
spark.conf.set("spark.sql.adaptive.enabled", "true")

AQE is generally safe to enable and should be on for all workloads. The only reason to disable it is for benchmarking to isolate its impact.


Feature 1: Partition Coalescing

What It Does

After a shuffle, some partitions may be tiny (a few KB). AQE merges adjacent small partitions into larger ones to reduce task overhead.

Key Configs

| Config | Default | Recommended | Notes |
|---|---|---|---|
| `spark.sql.adaptive.coalescePartitions.enabled` | `true` | `true` | Master switch for coalescing |
| `spark.sql.adaptive.coalescePartitions.initialPartitionNum` | (none) | 4096 | Starting shuffle partition count before coalescing; higher gives AQE more granularity to work with |
| `spark.sql.adaptive.coalescePartitions.minPartitionSize` | 1 MB | 4-16 MB | Minimum size per coalesced partition |
| `spark.sql.adaptive.advisoryPartitionSizeInBytes` | 64 MB | 128 MB | Target partition size after coalescing |
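The merging behavior is easy to picture without a cluster. Here is a minimal plain-Python sketch of the idea, assuming a simple greedy merge of adjacent partitions; Spark's actual algorithm is more sophisticated, but the shape is the same:

```python
def coalesce_partitions(sizes, advisory_size, min_size):
    """Greedily merge adjacent post-shuffle partition sizes until each
    merged group reaches roughly the advisory target (simplified sketch
    of what AQE coalescing does, not Spark's internal code)."""
    merged, current = [], 0
    for size in sizes:
        current += size
        if current >= advisory_size:
            merged.append(current)
            current = 0
    if current:
        # Fold an undersized tail into the previous partition
        if merged and current < min_size:
            merged[-1] += current
        else:
            merged.append(current)
    return merged

# 1000 tiny 1 MB partitions collapse into ~8 partitions near the 128 MB target
MB = 1024 * 1024
out = coalesce_partitions([1 * MB] * 1000, advisory_size=128 * MB, min_size=4 * MB)
print(len(out))  # -> 8
```

Fewer, larger tasks mean less scheduler and task-launch overhead, which is exactly why this is usually AQE's biggest win.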

Tuning Strategy

# For large datasets (> 100 GB after shuffle):
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.initialPartitionNum", "4096")
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "134217728")  # 128 MB
spark.conf.set("spark.sql.adaptive.coalescePartitions.minPartitionSize", "4194304")  # 4 MB

# For small datasets (< 10 GB after shuffle):
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "67108864")  # 64 MB
spark.conf.set("spark.sql.adaptive.coalescePartitions.minPartitionSize", "1048576")  # 1 MB

When to Increase initialPartitionNum

  • Set it higher than the expected final partition count so AQE has room to coalesce down
  • Rule of thumb: data_size_bytes / advisoryPartitionSizeInBytes * 3
  • Excess partitions have negligible cost — AQE will merge them
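The rule of thumb above can be wrapped in a small helper (the function name and defaults here are illustrative, not a Spark API):

```python
def recommended_initial_partition_num(shuffle_size_bytes,
                                      advisory_size_bytes=128 * 1024 * 1024,
                                      headroom=3):
    """Rule of thumb: shuffle size divided by the advisory target,
    times a headroom factor so AQE has room to coalesce down."""
    return max(1, shuffle_size_bytes // advisory_size_bytes * headroom)

# A 500 GiB shuffle with a 128 MB target -> 4000 * 3 = 12000 initial partitions
n = recommended_initial_partition_num(500 * 1024**3)
print(n)  # -> 12000
```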

Feature 2: Dynamic Join Strategy Switching

What It Does

When AQE discovers that one side of a sort-merge join is small enough to broadcast (below the broadcast threshold), it dynamically converts the join to a broadcast hash join — eliminating the shuffle on the large side.

Key Configs

| Config | Default | Recommended | Notes |
|---|---|---|---|
| `spark.sql.adaptive.autoBroadcastJoinThreshold` | (none; falls back to `spark.sql.autoBroadcastJoinThreshold`) | 100-256 MB | Runtime threshold for converting to broadcast; can be higher than the static threshold because AQE uses actual sizes |
| `spark.sql.autoBroadcastJoinThreshold` | 10 MB | 100 MB | Static planning threshold (pre-AQE); AQE can override it at runtime |

Tuning Strategy

# Set static threshold conservatively
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "104857600")  # 100 MB

# AQE will additionally convert joins at runtime when it discovers
# actual table sizes — no additional config needed beyond enabling AQE

Important Notes

  • AQE's dynamic switching uses actual runtime sizes (post-filter, post-aggregation), so it catches cases the static planner misses
  • Stale table statistics don't mislead AQE: it relies on actual runtime shuffle statistics, not catalog estimates
  • Monitor the SQL tab in Spark UI — look for "BroadcastHashJoin" replacing "SortMergeJoin"

Feature 3: Skew Join Optimization

What It Does

Detects partitions that are significantly larger than the median. Splits the skewed partition into sub-partitions and replicates the corresponding partition from the other side of the join.

Key Configs

| Config | Default | Recommended | Notes |
|---|---|---|---|
| `spark.sql.adaptive.skewJoin.enabled` | `true` | `true` | Master switch |
| `spark.sql.adaptive.skewJoin.skewedPartitionFactor` | 5 | 5-10 | A partition is skewed if its size exceeds median * factor |
| `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes` | 256 MB | 256 MB | Minimum absolute size to be considered skewed |
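The two configs combine into a single detection rule: a partition must be large relative to the median *and* large in absolute terms. A plain-Python sketch of that predicate (simplified, not Spark's internal code):

```python
def is_skewed(partition_size, median_size, factor=5,
              threshold_bytes=256 * 1024 * 1024):
    """A partition counts as skewed only when it exceeds both the
    relative bound (median * factor) and the absolute threshold,
    mirroring skewedPartitionFactor and skewedPartitionThresholdInBytes."""
    return (partition_size > median_size * factor
            and partition_size > threshold_bytes)

MB = 1024 * 1024
print(is_skewed(2000 * MB, median_size=100 * MB))  # 20x median and > 256 MB -> True
print(is_skewed(400 * MB, median_size=100 * MB))   # only 4x median -> False
print(is_skewed(100 * MB, median_size=10 * MB))    # 10x median but under 256 MB -> False
```

The absolute threshold is what keeps AQE from "fixing" skew on small partitions where splitting would cost more than it saves.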

Tuning Strategy

# Default settings work well for most cases
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# For highly skewed data (e.g., 100:1 skew ratio):
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "134217728")  # 128 MB

# For moderate skew (10:1 ratio), tighten the detection:
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "3")

Limitations

  • Only works for sort-merge joins (not broadcast joins — those don't have the skew problem)
  • Only handles skew on one side of the join at a time
  • If both sides are skewed on the same key, consider salting instead (see optimization_patterns.md)
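The core mechanics of salting are independent of Spark and can be sketched in plain Python (all names here are illustrative): the skewed side appends a random salt to the hot key, and the other side replicates each key once per salt value so every salted row still finds its match.

```python
import random

SALT_BUCKETS = 8

def salt_key(key):
    """Skewed side: append a random salt so one hot key fans out
    across SALT_BUCKETS shuffle partitions."""
    return f"{key}#{random.randrange(SALT_BUCKETS)}"

def explode_key(key):
    """Other side: replicate each key once per salt value so every
    salted row on the skewed side still joins correctly."""
    return [f"{key}#{i}" for i in range(SALT_BUCKETS)]

# One hot key now hashes to up to 8 distinct join keys
salted = {salt_key("hot_customer") for _ in range(10_000)}
print(len(salted))  # at most 8 distinct salted keys
```

The trade-off: the replicated side grows by a factor of `SALT_BUCKETS`, so salting pays off only when the skew it removes outweighs that duplication.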

Feature 4: Skewed Aggregation Optimization

What It Does

For GROUP BY operations with skewed groups, AQE can split the aggregation into two phases: a partial aggregation on sub-partitions, then a final merge.

Key Configs

| Config | Default | Notes |
|---|---|---|
| `spark.sql.adaptive.optimizeSkewedJoin.enabled` | `true` | Databricks-specific extension |

This is generally handled automatically when AQE is enabled.
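The two-phase idea is the same one you could express by hand: aggregate each sub-partition independently, then merge the small partial results. A plain-Python sketch of a skewed count (Spark runs phase 1 in parallel across tasks):

```python
from collections import Counter

def partial_then_merge(partitions):
    """Phase 1: partial aggregation per sub-partition (parallel in Spark).
    Phase 2: merge the small partial maps into the final counts."""
    partials = [Counter(p) for p in partitions]
    final = Counter()
    for partial in partials:
        final.update(partial)  # cheap merge of already-reduced results
    return dict(final)

# A skewed group ("a") split across sub-partitions still sums correctly
parts = [["a"] * 5 + ["b"], ["a"] * 7, ["a"] * 4 + ["c"]]
print(partial_then_merge(parts))  # -> {'a': 16, 'b': 1, 'c': 1}
```

Because the partial results are small, the final merge is cheap even when one group dominated the input.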


AQE Configuration Template

Copy-paste this block as a starting point for your cluster init scripts or notebook headers:

# --- AQE Configuration — Datanest Digital Recommended ---

# Master switch
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Partition coalescing
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.initialPartitionNum", "4096")
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "134217728")  # 128 MB
spark.conf.set("spark.sql.adaptive.coalescePartitions.minPartitionSize", "4194304")  # 4 MB

# Join optimization
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "104857600")  # 100 MB

# Skew handling
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5")
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "268435456")  # 256 MB

Verifying AQE Is Working

In the Spark UI SQL Tab

  1. Open the query's SQL plan in the Spark UI
  2. Look for AdaptiveSparkPlan as the root node
  3. Look for annotations:
    • coalesced partitions: X -> Y (coalescing happened)
    • BroadcastHashJoin where you expected SortMergeJoin (dynamic switching)
    • skewed annotations on join nodes (skew handling)

Programmatically

# Check if AQE is active
print(spark.conf.get("spark.sql.adaptive.enabled"))

# After running a query, check the plan
df = spark.table("my_table").groupBy("key").count()
df.explain(mode="formatted")
# Look for "AdaptiveSparkPlan" in the output

Common Pitfalls

  1. Setting spark.sql.shuffle.partitions too low with AQE — AQE can only coalesce (merge) partitions, not split them. Set initialPartitionNum high.
  2. Disabling coalescing but keeping AQE — You lose the most impactful feature.
  3. Expecting AQE to fix all skew — AQE skew handling is limited to sort-merge joins. Use salting for complex skew patterns.
  4. Ignoring AQE overhead — AQE adds milliseconds of planning overhead per stage. For sub-second queries, this can be significant. Disable for very fast queries.

AQE Tuning Guide — Datanest Digital Spark Optimization Playbook
https://datanest.dev


This is 1 of 20 resources in the Datanest Platform Pro toolkit. Get the complete [Spark Optimization Playbook] with all files, templates, and documentation for $69.

Get the Full Kit →

Or grab the entire Datanest Platform Pro bundle (20 products) for $199 — save 30%.

Get the Complete Bundle →

