Delta Lake Optimization Cheatsheet
Quick reference for the most important Delta Lake performance optimization techniques.
Covers OPTIMIZE, VACUUM, table properties, MERGE patterns, data skipping,
partitioning strategies, liquid clustering, and file sizing.
Table of Contents
- OPTIMIZE
- VACUUM
- Z-ORDER
- Liquid Clustering
- Table Properties
- MERGE Optimization
- Data Skipping
- Partitioning Strategy
- File Size Guidelines
- Schema Evolution
- Time Travel and Restore
- Maintenance Schedule
OPTIMIZE
Compacts small files into larger ones for better read performance.
Basic Usage
-- Optimize entire table
OPTIMIZE my_database.my_table;
-- Optimize specific partitions only (much faster)
OPTIMIZE my_database.my_table
WHERE date >= '2025-01-01' AND date < '2025-02-01';
What OPTIMIZE Does
Before OPTIMIZE:
partition=2025-01-15/
├── part-00000.parquet (2 MiB)
├── part-00001.parquet (1 MiB)
├── part-00002.parquet (500 KiB)
├── part-00003.parquet (3 MiB)
├── ...
└── part-00099.parquet (800 KiB)
Total: 100 files, 150 MiB
After OPTIMIZE:
partition=2025-01-15/
├── part-00000.parquet (128 MiB)
└── part-00001.parquet (22 MiB)
Total: 2 files, 150 MiB
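The file-count math generalizes: OPTIMIZE writes roughly `ceil(partition size / target file size)` files per partition. A minimal sketch in plain Python (sizes in MiB; 128 MiB is the target file size used in the example above):

```python
import math

def compacted_file_count(total_size_mib: float, target_file_size_mib: int = 128) -> int:
    """Files remaining after OPTIMIZE compacts a partition to the target size."""
    return math.ceil(total_size_mib / target_file_size_mib)

# The example above: 100 small files totaling 150 MiB compact into 2 files
# (one full 128 MiB file plus one 22 MiB remainder).
print(compacted_file_count(150))  # 2
```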
OPTIMIZE with Z-ORDER
-- Optimize and co-locate data by column values
OPTIMIZE my_database.my_table
ZORDER BY (customer_id, product_id);
-- Z-ORDER on specific partitions
OPTIMIZE my_database.my_table
WHERE date = '2025-01-15'
ZORDER BY (customer_id);
OPTIMIZE Metrics
OPTIMIZE returns these metrics:
numFilesAdded: 2 (new compacted files)
numFilesRemoved: 100 (old small files)
filesAdded.min: 22 MiB
filesAdded.max: 128 MiB
filesAdded.avg: 75 MiB
numBatches: 1
totalConsideredFiles: 100
totalFilesSkipped: 0
Key Settings
| Setting | Default | Recommended | Notes |
|---|---|---|---|
| Target file size | 1 GiB (OSS) / varies (Databricks) | 128-256 MiB | Set via table property |
| spark.databricks.delta.optimizeWrite.enabled | false | true | Auto-size files at write |
| spark.databricks.delta.autoCompact.enabled | false | true | Auto-compact after writes |
-- Set target file size per table
ALTER TABLE my_table SET TBLPROPERTIES ('delta.targetFileSize' = '128mb');
VACUUM
Removes files no longer referenced by the Delta log. Essential for storage cost control.
Basic Usage
-- Remove files older than default retention (7 days)
VACUUM my_database.my_table;
-- Remove files older than 30 days
VACUUM my_database.my_table RETAIN 720 HOURS;
-- Dry run — see what would be deleted without deleting
VACUUM my_database.my_table DRY RUN;
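RETAIN takes hours, so a day-based retention policy needs the 30 days × 24 = 720 conversion shown above. A tiny helper to build the statement from a day count — `vacuum_statement` is an illustrative name, not part of any Delta API:

```python
def vacuum_statement(table: str, retention_days: int, dry_run: bool = False) -> str:
    """Build a VACUUM statement, converting day-based retention to RETAIN hours."""
    sql = f"VACUUM {table} RETAIN {retention_days * 24} HOURS"
    return sql + " DRY RUN" if dry_run else sql

print(vacuum_statement("my_database.my_table", 30))
# VACUUM my_database.my_table RETAIN 720 HOURS
```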
Safety Rules
| Rule | Details |
|---|---|
| Minimum retention | 7 days by default (controlled by delta.deletedFileRetentionDuration) |
| Never vacuum below retention | Active readers/writers could fail if their referenced files are deleted |
| VACUUM is idempotent | Safe to run multiple times |
| VACUUM only deletes unreferenced files | Never deletes current table data |
Override Minimum Retention (Dangerous!)
-- Only if you're CERTAIN no jobs are reading old versions
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM my_table RETAIN 0 HOURS; -- Deletes everything not in current version
Storage Savings Estimate
# Check table size before/after VACUUM
from delta.tables import DeltaTable
dt = DeltaTable.forName(spark, "my_database.my_table")
detail = dt.detail().collect()[0]
print(f"Files: {detail['numFiles']}")
print(f"Size: {detail['sizeInBytes'] / (1024**3):.2f} GiB")
# Run VACUUM DRY RUN to see what would be cleaned
spark.sql("VACUUM my_database.my_table DRY RUN")
Z-ORDER
Multi-dimensional clustering that co-locates related data in the same files
for faster data skipping.
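Z-ordering maps multiple columns onto a single sort key by interleaving their bits (a Morton / Z-order curve), so rows close in any clustered column land in the same files. A minimal sketch of the interleaving idea for two integer keys — illustrative only; Delta's actual implementation also range-partitions values before interleaving:

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into one Morton (Z-order) code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x occupies even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y occupies odd bit positions
    return z

# Sorting (x, y) pairs by their Z-order key keeps points that are close in
# BOTH dimensions near each other, which tightens per-file min/max ranges.
points = [(0, 0), (7, 7), (1, 0), (0, 1), (6, 7)]
print(sorted(points, key=lambda p: z_order_key(*p)))
# [(0, 0), (1, 0), (0, 1), (6, 7), (7, 7)]
```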
When to Z-ORDER
| Scenario | Z-ORDER? | Why |
|---|---|---|
| Column frequently in WHERE clauses | Yes | Dramatically reduces files scanned |
| Column used in JOIN keys | Yes | Improves join-side scan performance |
| High-cardinality column (e.g., user_id) | Yes | Great data skipping potential |
| Low-cardinality column (e.g., status) | No | Partitioning is better |
| Column never filtered on | No | Z-ORDER won't help queries that don't filter |
Best Practices
-- Z-ORDER on 1-3 columns (more columns = less effective per column)
OPTIMIZE my_table ZORDER BY (customer_id); -- Best for single-column filters
OPTIMIZE my_table ZORDER BY (customer_id, product_id); -- Good for two-column filters
OPTIMIZE my_table ZORDER BY (customer_id, product_id, date); -- Diminishing returns on 3rd column
Z-ORDER vs Partitioning
| | Low Cardinality (< 1000 values) | High Cardinality (> 1000 values) |
|---|---|---|
| Frequently filtered | PARTITION BY | Z-ORDER BY |
| Rarely filtered | Neither | Neither |
Measuring Z-ORDER Effectiveness
# Before Z-ORDER: check files scanned
spark.sql("SELECT * FROM my_table WHERE customer_id = 'C12345'").explain()
# Look at: "number of files read" in FileScan metrics
# After Z-ORDER: same query should read far fewer files
# Typical improvement: 10-100× fewer files scanned
Liquid Clustering
Delta Lake's next-generation clustering (Databricks, Delta Lake 3.0+).
Replaces both partitioning and Z-ORDER with incremental, automatic clustering.
Key Differences from Partitioning + Z-ORDER
| Feature | Partitioning + Z-ORDER | Liquid Clustering |
|---|---|---|
| Clustering changes | Requires full rewrite | Incremental, change cluster keys anytime |
| Maintenance | Manual OPTIMIZE + ZORDER | Automatic with writes |
| Key changes | ALTER TABLE + full rewrite | ALTER TABLE, applied incrementally |
| Small file handling | Separate concern | Built-in compaction |
| Partition evolution | Very expensive | Seamless |
Usage
-- Create table with liquid clustering
CREATE TABLE my_table (
id BIGINT,
customer_id STRING,
date DATE,
amount DECIMAL(18, 2)
) USING DELTA
CLUSTER BY (customer_id, date);
-- Change cluster keys (no rewrite needed!)
ALTER TABLE my_table CLUSTER BY (date, region);
-- Remove clustering
ALTER TABLE my_table CLUSTER BY NONE;
-- Trigger clustering (runs automatically with optimizeWrite, but can be manual)
OPTIMIZE my_table;
When to Use Liquid Clustering
| Use Case | Recommendation |
|---|---|
| New tables on Databricks | Prefer liquid clustering over partitioning |
| Tables with evolving query patterns | Liquid clustering (can change keys) |
| Existing well-partitioned tables | Keep current partitioning unless issues |
| Tables with many small updates | Liquid clustering handles compaction |
| OSS Delta (< 3.0) | Not available — use partitioning + Z-ORDER |
Table Properties
Performance-Related Properties
ALTER TABLE my_table SET TBLPROPERTIES (
-- File sizing
'delta.targetFileSize' = '128mb', -- Target file size for OPTIMIZE
'delta.tuneFileSizesForRewrites' = 'true', -- Auto-tune for MERGE/UPDATE
-- Auto-optimization
'delta.autoOptimize.optimizeWrite' = 'true', -- Optimize file sizes on write
'delta.autoOptimize.autoCompact' = 'true', -- Auto-compact small files
-- Data skipping
'delta.dataSkippingNumIndexedCols' = '32', -- Columns with min/max stats (default 32)
'delta.dataSkippingStatsColumns' = 'customer_id,date,amount', -- Explicit stats columns
-- Retention
'delta.deletedFileRetentionDuration' = 'interval 7 days', -- VACUUM retention
'delta.logRetentionDuration' = 'interval 30 days', -- Transaction log retention
-- Schema
'delta.columnMapping.mode' = 'name', -- Enable column rename/drop
'delta.minReaderVersion' = '2',
'delta.minWriterVersion' = '5'
);
Viewing Current Properties
DESCRIBE DETAIL my_database.my_table;
SHOW TBLPROPERTIES my_database.my_table;
MERGE Optimization
MERGE (upsert) is the most expensive Delta operation. These techniques
minimize its cost.
Basic MERGE Pattern
MERGE INTO target AS t
USING source AS s
ON t.id = s.id
WHEN MATCHED AND s.updated_at > t.updated_at THEN
UPDATE SET *
WHEN NOT MATCHED THEN
INSERT *;
Optimization Techniques
1. Partition-Aligned MERGE
-- Add partition filter to reduce scan scope
MERGE INTO target AS t
USING source AS s
ON t.id = s.id AND t.date = s.date -- date is partition column
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
Without the partition column in the ON clause, MERGE scans ALL files in the
target table. With it, only matching partitions are scanned.
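The effect can be sketched with a toy model of file counts per partition (illustrative numbers, not Delta internals): pruning limits the scan to the partitions that actually appear in the source batch.

```python
# Files per target partition (hypothetical table layout)
target_partitions = {"2025-01-14": 10, "2025-01-15": 12, "2025-01-16": 9}

# Dates present in the incoming MERGE source batch
source_dates = {"2025-01-15"}

# ON t.id = s.id → no pruning: every target file is a merge candidate
files_without_pruning = sum(target_partitions.values())

# ON t.id = s.id AND t.date = s.date → only matching partitions are scanned
files_with_pruning = sum(n for date, n in target_partitions.items()
                         if date in source_dates)

print(files_without_pruning, files_with_pruning)  # 31 12
```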
2. Reduce Source Size
# De-duplicate source before MERGE (keep latest per key)
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number, desc
window = Window.partitionBy("id").orderBy(desc("updated_at"))
source_deduped = source.withColumn("rn", row_number().over(window)) \
.filter("rn = 1").drop("rn")
3. Z-ORDER on MERGE Key
-- Z-ORDER target table on the join key used in MERGE
OPTIMIZE target_table ZORDER BY (id);
This ensures data skipping works effectively during the MERGE scan.
4. Low-Shuffle MERGE
# Enable low-shuffle merge (Databricks)
spark.conf.set("spark.databricks.delta.merge.enableLowShuffle", "true")
Low-shuffle merge rewrites only the files that contain matching rows,
rather than shuffling the entire target table.
5. Tune File Size for MERGE
-- Smaller files = less rewrite per MERGE operation
ALTER TABLE target_table SET TBLPROPERTIES ('delta.tuneFileSizesForRewrites' = 'true');
MERGE Performance Checklist
□ MERGE key includes partition column(s)
□ Source is de-duplicated on MERGE key
□ Target is Z-ORDERed on MERGE key
□ Low-shuffle merge is enabled
□ Source DataFrame is small enough to broadcast (if possible)
□ Only update columns that actually changed (avoid UPDATE SET *)
□ Run OPTIMIZE after large MERGE batches
Data Skipping
Delta Lake stores min/max statistics per file for the first N columns.
Queries with filters on these columns skip files whose range doesn't match.
How It Works
File 1: customer_id min="A001" max="A500" amount min=10 max=500
File 2: customer_id min="A501" max="B200" amount min=5 max=1200
File 3: customer_id min="B201" max="C100" amount min=50 max=800
Query: WHERE customer_id = 'B100' AND amount > 1000
→ Skip File 1 (B100 not in A001-A500)
→ Read File 2 (B100 in A501-B200, 1000 in 5-1200) ← Might match
→ Skip File 3 (B100 not in B201-C100)
Result: Only 1 of 3 files read (67% skipped)
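The skipping decision above can be sketched directly: a file is pruned when the predicate cannot possibly match its min/max range. A toy model of the three files (plain Python, not Delta internals; string comparison is lexicographic, as with the customer IDs above):

```python
# Per-file column statistics, as in the example above
files = [
    {"name": "File 1", "cid_min": "A001", "cid_max": "A500", "amt_min": 10, "amt_max": 500},
    {"name": "File 2", "cid_min": "A501", "cid_max": "B200", "amt_min": 5,  "amt_max": 1200},
    {"name": "File 3", "cid_min": "B201", "cid_max": "C100", "amt_min": 50, "amt_max": 800},
]

def might_match(f: dict, customer_id: str, min_amount: float) -> bool:
    """True if the file's min/max ranges could contain a matching row."""
    in_cid_range = f["cid_min"] <= customer_id <= f["cid_max"]
    amount_possible = f["amt_max"] > min_amount
    return in_cid_range and amount_possible

# WHERE customer_id = 'B100' AND amount > 1000
to_read = [f["name"] for f in files if might_match(f, "B100", 1000)]
print(to_read)  # ['File 2']
```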
Maximizing Data Skipping
| Technique | Impact | Notes |
|---|---|---|
| Z-ORDER on filter columns | High | Co-locates similar values → tighter min/max ranges |
| Partition by low-cardinality columns | High | Eliminates entire partitions |
| Put frequently filtered columns first | Medium | Default: stats on first 32 columns |
| Increase indexed columns | Low-Medium | delta.dataSkippingNumIndexedCols |
| Avoid wide tables with filtered columns beyond column 32 | Medium | Or set explicit stats columns |
Checking Data Skipping Effectiveness
# Enable detailed Spark UI metrics, then run your query
# In SQL tab, check FileScan metrics:
# "number of files read" vs total files in table
# "size of files read" vs total table size
# Programmatically:
df = spark.sql("SELECT * FROM my_table WHERE customer_id = 'C12345'")
df.explain() # Check PartitionFilters and PushedFilters
Partitioning Strategy
Decision Matrix
| Cardinality | High Write Volume | Low Write Volume |
|---|---|---|
| Low (< 100 values) | PARTITION BY (e.g., date, region) | PARTITION BY (e.g., status) |
| Medium (100-10,000 values) | Z-ORDER or Liquid Clustering | Z-ORDER |
| High (> 10,000 values) | Z-ORDER or Liquid Clustering | Z-ORDER |
Partitioning Best Practices
-- GOOD: Partition by date (365 values/year, each partition has many rows)
CREATE TABLE events (...) USING DELTA PARTITIONED BY (date);
-- GOOD: Partition by region (< 50 values)
CREATE TABLE sales (...) USING DELTA PARTITIONED BY (region);
-- BAD: Partition by user_id (millions of values → millions of tiny directories)
CREATE TABLE events (...) USING DELTA PARTITIONED BY (user_id); -- DON'T DO THIS
-- BAD: Partition by timestamp (near-unique → one file per partition)
CREATE TABLE events (...) USING DELTA PARTITIONED BY (event_timestamp); -- DON'T DO THIS
Partition Size Guidelines
| Metric | Target | Warning |
|---|---|---|
| Files per partition | 1-100 | > 1000 = too many small files |
| Data per partition | 1-100 GiB | < 1 MiB = over-partitioned |
| Total partitions | < 10,000 | > 100,000 = partition explosion |
| File size in partition | 64-256 MiB | < 1 MiB = small file problem |
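These thresholds are easy to check programmatically against per-partition numbers like those returned by `DESCRIBE DETAIL`. A minimal sketch flagging problem partitions — thresholds come from the table above; the input dict and function name are illustrative:

```python
MIB = 1024 ** 2

def partition_warnings(partitions: dict) -> dict:
    """Flag partitions breaching the size guidelines.

    `partitions` maps partition name -> (file_count, total_bytes).
    """
    warnings = {}
    for name, (file_count, total_bytes) in partitions.items():
        issues = []
        if file_count > 1000:
            issues.append("too many small files")
        if total_bytes < 1 * MIB:
            issues.append("over-partitioned")
        if file_count and total_bytes / file_count < 1 * MIB:
            issues.append("small file problem")
        if issues:
            warnings[name] = issues
    return warnings

stats = {
    "date=2025-01-15": (2, 150 * MIB),     # healthy: 2 files, 75 MiB average
    "date=2025-01-16": (1500, 900 * MIB),  # 1500 files, 0.6 MiB average
}
print(partition_warnings(stats))
```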
Changing Partition Strategy
# You CANNOT change partitioning of an existing Delta table in place
# Instead: rewrite the table
# Read existing data
df = spark.read.table("my_database.old_table")
# Write with new partitioning
df.write \
.format("delta") \
.partitionBy("date") \
.mode("overwrite") \
.option("overwriteSchema", "true") \
.saveAsTable("my_database.new_table")
File Size Guidelines
Target File Sizes
| Workload Type | Target File Size | Why |
|---|---|---|
| General purpose | 128 MiB | Good balance of read speed and metadata |
| Frequently merged | 32-64 MiB | Smaller files = less rewrite per MERGE |
| Append-only analytical | 256 MiB | Fewer files = faster metadata |
| Streaming sink | 64-128 MiB | Depends on trigger interval and throughput |
Calculating Optimal File Count
... [content trimmed for length — full version in the complete kit]
This is 1 of 6 resources in the DataStack Pro toolkit. Get the complete [Spark Performance Masterclass] with all files, templates, and documentation for $59.
Or grab the entire DataStack Pro bundle (6 products) for $164 — save 30%.