
Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Spark Performance Masterclass: Delta Lake Optimization Cheatsheet


Quick reference for every Delta Lake performance optimization technique.
Covers OPTIMIZE, VACUUM, table properties, MERGE patterns, data skipping,
partitioning strategies, liquid clustering, and file sizing.


Table of Contents

  1. OPTIMIZE
  2. VACUUM
  3. Z-ORDER
  4. Liquid Clustering
  5. Table Properties
  6. MERGE Optimization
  7. Data Skipping
  8. Partitioning Strategy
  9. File Size Guidelines
  10. Schema Evolution
  11. Time Travel and Restore
  12. Maintenance Schedule

OPTIMIZE

Compacts small files into larger ones for better read performance.

Basic Usage

-- Optimize entire table
OPTIMIZE my_database.my_table;

-- Optimize specific partitions only (much faster)
OPTIMIZE my_database.my_table
WHERE date >= '2025-01-01' AND date < '2025-02-01';

What OPTIMIZE Does

Before OPTIMIZE:
  partition=2025-01-15/
    ├── part-00000.parquet   (2 MiB)
    ├── part-00001.parquet   (1 MiB)
    ├── part-00002.parquet   (500 KiB)
    ├── part-00003.parquet   (3 MiB)
    ├── ...
    └── part-00099.parquet   (800 KiB)
    Total: 100 files, 150 MiB

After OPTIMIZE:
  partition=2025-01-15/
    ├── part-00000.parquet   (128 MiB)
    └── part-00001.parquet   (22 MiB)
    Total: 2 files, 150 MiB
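The before/after file counts above follow from simple arithmetic: total bytes divided into bins of the target file size. A minimal plain-Python sketch of that bin-packing (illustrative only; `compacted_file_sizes` is a hypothetical helper, not a Delta API):

```python
import math

MiB = 1024 ** 2

def compacted_file_sizes(total_bytes: int, target_bytes: int) -> list[int]:
    """Split total_bytes into files of at most target_bytes,
    roughly as OPTIMIZE's bin-packing compaction does."""
    full_files, remainder = divmod(total_bytes, target_bytes)
    sizes = [target_bytes] * full_files
    if remainder:
        sizes.append(remainder)
    return sizes

# 100 small files totalling 150 MiB, compacted at a 128 MiB target:
sizes = compacted_file_sizes(150 * MiB, 128 * MiB)
print([s // MiB for s in sizes])  # → [128, 22]
```

This matches the diagram: 150 MiB at a 128 MiB target yields one full file and one 22 MiB remainder.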

OPTIMIZE with Z-ORDER

-- Optimize and co-locate data by column values
OPTIMIZE my_database.my_table
ZORDER BY (customer_id, product_id);

-- Z-ORDER on specific partitions
OPTIMIZE my_database.my_table
WHERE date = '2025-01-15'
ZORDER BY (customer_id);

OPTIMIZE Metrics

OPTIMIZE returns these metrics:
  numFilesAdded:     2         (new compacted files)
  numFilesRemoved:   100       (old small files)
  filesAdded.min:    22 MiB
  filesAdded.max:    128 MiB
  filesAdded.avg:    75 MiB
  numBatches:        1
  totalConsideredFiles: 100
  totalFilesSkipped: 0

Key Settings

| Setting | Default | Recommended | Notes |
| --- | --- | --- | --- |
| Target file size | 1 GiB (OSS) / varies (Databricks) | 128-256 MiB | Set via table property |
| spark.databricks.delta.optimizeWrite.enabled | false | true | Auto-size files at write |
| spark.databricks.delta.autoCompact.enabled | false | true | Auto-compact after writes |
-- Set target file size per table
ALTER TABLE my_table SET TBLPROPERTIES ('delta.targetFileSize' = '128m');

VACUUM

Removes files no longer referenced by the Delta log. Essential for storage cost control.

Basic Usage

-- Remove files older than default retention (7 days)
VACUUM my_database.my_table;

-- Remove files older than 30 days
VACUUM my_database.my_table RETAIN 720 HOURS;

-- Dry run — see what would be deleted without deleting
VACUUM my_database.my_table DRY RUN;
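`RETAIN` takes hours, so a retention window expressed in days has to be converted (30 days = 720 hours). A small sketch that builds the statement from a day count (`vacuum_statement` is a hypothetical helper for illustration; it assumes the table name is already trusted and does no quoting):

```python
def vacuum_statement(table: str, retention_days: int, dry_run: bool = False) -> str:
    """Build a VACUUM statement with the retention window given in days."""
    hours = retention_days * 24
    stmt = f"VACUUM {table} RETAIN {hours} HOURS"
    if dry_run:
        stmt += " DRY RUN"
    return stmt

print(vacuum_statement("my_database.my_table", 30))
# → VACUUM my_database.my_table RETAIN 720 HOURS
```

The string would then be passed to `spark.sql(...)` as in the examples above.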

Safety Rules

| Rule | Details |
| --- | --- |
| Minimum retention | 7 days by default (controlled by delta.deletedFileRetentionDuration) |
| Never vacuum below retention | Active readers/writers could fail if their referenced files are deleted |
| VACUUM is idempotent | Safe to run multiple times |
| VACUUM only deletes unreferenced files | Never deletes current table data |

Override Minimum Retention (Dangerous!)

-- Only if you're CERTAIN no jobs are reading old versions
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM my_table RETAIN 0 HOURS;  -- Deletes everything not in current version

Storage Savings Estimate

# Check table size before/after VACUUM
from delta.tables import DeltaTable

dt = DeltaTable.forName(spark, "my_database.my_table")
detail = dt.detail().collect()[0]
print(f"Files: {detail['numFiles']}")
print(f"Size: {detail['sizeInBytes'] / (1024**3):.2f} GiB")

# Run VACUUM DRY RUN to see what would be cleaned
spark.sql("VACUUM my_database.my_table DRY RUN")

Z-ORDER

Multi-dimensional clustering that co-locates related data in the same files
for faster data skipping.

When to Z-ORDER

| Scenario | Z-ORDER? | Why |
| --- | --- | --- |
| Column frequently in WHERE clauses | Yes | Dramatically reduces files scanned |
| Column used in JOIN keys | Yes | Improves join-side scan performance |
| High-cardinality column (e.g., user_id) | Yes | Great data skipping potential |
| Low-cardinality column (e.g., status) | No | Partitioning is better |
| Column never filtered on | No | Z-ORDER won't help queries that don't filter |

Best Practices

-- Z-ORDER on 1-3 columns (more columns = less effective per column)
OPTIMIZE my_table ZORDER BY (customer_id);                    -- Best for single-column filters
OPTIMIZE my_table ZORDER BY (customer_id, product_id);        -- Good for two-column filters
OPTIMIZE my_table ZORDER BY (customer_id, product_id, date);  -- Diminishing returns on 3rd column

Z-ORDER vs Partitioning

                    Low Cardinality        High Cardinality
                    (< 1000 values)        (> 1000 values)
                    ─────────────────      ─────────────────
Frequently filtered  PARTITION BY           Z-ORDER BY
Rarely filtered      Neither                Neither
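Under the hood, Z-ordering maps each row to a point on a space-filling Z-curve by interleaving the bits of its clustering column values, then lays out rows by that curve so values that are close in every dimension land in the same files, giving each file tight min/max ranges. A toy sketch of two-column bit interleaving (illustrative only; Delta's real implementation range-partitions on these interleaved values rather than sorting exactly):

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single Z-curve index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z

# Rows close in BOTH dimensions get close Z-values, so they cluster
# into the same files and produce tight per-file min/max statistics:
points = [(200, 9000), (3, 5), (201, 9001), (3, 6)]
print(sorted(points, key=lambda p: z_value(*p)))
```

This is also why effectiveness drops as columns are added: each extra column dilutes how many bits of any one column influence the ordering.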

Measuring Z-ORDER Effectiveness

# Before Z-ORDER: check files scanned
spark.sql("SELECT * FROM my_table WHERE customer_id = 'C12345'").explain()
# Look at: "number of files read" in FileScan metrics

# After Z-ORDER: same query should read far fewer files
# Typical improvement: 10-100× fewer files scanned

Liquid Clustering

Delta Lake's next-generation clustering (Databricks, Delta Lake 3.0+).
Replaces both partitioning and Z-ORDER with incremental, automatic clustering.

Key Differences from Partitioning + Z-ORDER

| Feature | Partitioning + Z-ORDER | Liquid Clustering |
| --- | --- | --- |
| Clustering changes | Requires full rewrite | Incremental; change cluster keys anytime |
| Maintenance | Manual OPTIMIZE + ZORDER | Automatic with writes |
| Key changes | ALTER TABLE + full rewrite | ALTER TABLE, applied incrementally |
| Small file handling | Separate concern | Built-in compaction |
| Partition evolution | Very expensive | Seamless |

Usage

-- Create table with liquid clustering
CREATE TABLE my_table (
    id BIGINT,
    customer_id STRING,
    date DATE,
    amount DECIMAL(18, 2)
) USING DELTA
CLUSTER BY (customer_id, date);

-- Change cluster keys (no rewrite needed!)
ALTER TABLE my_table CLUSTER BY (date, region);

-- Remove clustering
ALTER TABLE my_table CLUSTER BY NONE;

-- Trigger clustering (runs automatically with optimizeWrite, but can be manual)
OPTIMIZE my_table;

When to Use Liquid Clustering

| Use Case | Recommendation |
| --- | --- |
| New tables on Databricks | Prefer liquid clustering over partitioning |
| Tables with evolving query patterns | Liquid clustering (can change keys) |
| Existing well-partitioned tables | Keep current partitioning unless issues arise |
| Tables with many small updates | Liquid clustering handles compaction |
| OSS Delta (< 3.0) | Not available — use partitioning + Z-ORDER |

Table Properties

Performance-Related Properties

ALTER TABLE my_table SET TBLPROPERTIES (
    -- File sizing
    'delta.targetFileSize' = '128m',                    -- Target file size for OPTIMIZE
    'delta.tuneFileSizesForRewrites' = 'true',          -- Auto-tune for MERGE/UPDATE

    -- Auto-optimization
    'delta.autoOptimize.optimizeWrite' = 'true',        -- Optimize file sizes on write
    'delta.autoOptimize.autoCompact' = 'true',          -- Auto-compact small files

    -- Data skipping
    'delta.dataSkippingNumIndexedCols' = '32',          -- Columns with min/max stats (default 32)
    'delta.dataSkippingStatsColumns' = 'customer_id,date,amount',  -- Explicit stats columns

    -- Retention
    'delta.deletedFileRetentionDuration' = 'interval 7 days',  -- VACUUM retention
    'delta.logRetentionDuration' = 'interval 30 days',          -- Transaction log retention

    -- Schema
    'delta.columnMapping.mode' = 'name',                -- Enable column rename/drop
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5'
);

Viewing Current Properties

DESCRIBE DETAIL my_database.my_table;
SHOW TBLPROPERTIES my_database.my_table;

MERGE Optimization

MERGE (upsert) is typically the most expensive Delta operation, because it must scan the target for matches and rewrite every file that contains one. These techniques minimize its cost.

Basic MERGE Pattern

MERGE INTO target AS t
USING source AS s
ON t.id = s.id
WHEN MATCHED AND s.updated_at > t.updated_at THEN
    UPDATE SET *
WHEN NOT MATCHED THEN
    INSERT *;

Optimization Techniques

1. Partition-Aligned MERGE

-- Add partition filter to reduce scan scope
MERGE INTO target AS t
USING source AS s
ON t.id = s.id AND t.date = s.date  -- date is partition column
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

Without the partition column in the ON clause, MERGE scans ALL files in the
target table. With it, only matching partitions are scanned.

2. Reduce Source Size

# De-duplicate source before MERGE (keep latest per key)
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number, desc

window = Window.partitionBy("id").orderBy(desc("updated_at"))
source_deduped = source.withColumn("rn", row_number().over(window)) \
    .filter("rn = 1").drop("rn")

3. Z-ORDER on MERGE Key

-- Z-ORDER target table on the join key used in MERGE
OPTIMIZE target_table ZORDER BY (id);

This ensures data skipping works effectively during the MERGE scan.

4. Low-Shuffle MERGE

# Enable low-shuffle merge (Databricks)
spark.conf.set("spark.databricks.delta.merge.enableLowShuffle", "true")

Low-shuffle merge rewrites only the files that contain matching rows,
rather than shuffling the entire target table.

5. Tune File Size for MERGE

-- Smaller files = less rewrite per MERGE operation
ALTER TABLE target_table SET TBLPROPERTIES ('delta.tuneFileSizesForRewrites' = 'true');

MERGE Performance Checklist

□ MERGE key includes partition column(s)
□ Source is de-duplicated on MERGE key
□ Target is Z-ORDERed on MERGE key
□ Low-shuffle merge is enabled
□ Source DataFrame is small enough to broadcast (if possible)
□ Only update columns that actually changed (avoid UPDATE SET *)
□ Run OPTIMIZE after large MERGE batches

Data Skipping

Delta Lake stores min/max statistics per file for the first 32 columns by default (configurable via delta.dataSkippingNumIndexedCols). Queries with filters on these columns skip files whose min/max range cannot contain a match.

How It Works

File 1: customer_id min="A001" max="A500"  amount min=10 max=500
File 2: customer_id min="A501" max="B200"  amount min=5 max=1200
File 3: customer_id min="B201" max="C100"  amount min=50 max=800

Query: WHERE customer_id = 'B100' AND amount > 1000
  → Skip File 1 (B100 not in A001-A500)
  → Read File 2 (B100 in A501-B200, 1000 in 5-1200)  ← Might match
  → Skip File 3 (B100 not in B201-C100)

Result: Only 1 of 3 files read (67% skipped)
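The walk-through above can be run as a small simulation: a file can be skipped when the filter values fall outside its min/max ranges. A plain-Python sketch of the per-file stats check (the stats dicts mirror the three files above; `can_skip` is an illustrative helper, not Delta's internal API):

```python
def can_skip(stats: dict, customer_id: str, min_amount: float) -> bool:
    """A file can be skipped if no row in it can possibly match
    WHERE customer_id = <customer_id> AND amount > <min_amount>."""
    if not (stats["cid_min"] <= customer_id <= stats["cid_max"]):
        return True  # customer_id outside this file's range
    if stats["amount_max"] <= min_amount:
        return True  # no amount in this file exceeds the threshold
    return False

files = [
    {"name": "File 1", "cid_min": "A001", "cid_max": "A500", "amount_max": 500},
    {"name": "File 2", "cid_min": "A501", "cid_max": "B200", "amount_max": 1200},
    {"name": "File 3", "cid_min": "B201", "cid_max": "C100", "amount_max": 800},
]

# Query: WHERE customer_id = 'B100' AND amount > 1000
to_read = [f["name"] for f in files if not can_skip(f, "B100", 1000)]
print(to_read)  # → ['File 2']
```

Note the stats only prove a file *might* match; File 2 is still read and filtered row by row, which is why tighter ranges (via Z-ORDER) translate directly into fewer files read.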

Maximizing Data Skipping

| Technique | Impact | Notes |
| --- | --- | --- |
| Z-ORDER on filter columns | High | Co-locates similar values → tighter min/max ranges |
| Partition by low-cardinality columns | High | Eliminates entire partitions |
| Put frequently filtered columns first | Medium | Default: stats on first 32 columns |
| Increase indexed columns | Low-Medium | delta.dataSkippingNumIndexedCols |
| Avoid filtered columns beyond column 32 in wide tables | Medium | Or set explicit stats columns |

Checking Data Skipping Effectiveness

# Enable detailed Spark UI metrics, then run your query
# In SQL tab, check FileScan metrics:
#   "number of files read" vs total files in table
#   "size of files read" vs total table size

# Programmatically:
df = spark.sql("SELECT * FROM my_table WHERE customer_id = 'C12345'")
df.explain()  # Check PartitionFilters and PushedFilters

Partitioning Strategy

Decision Matrix

                        High Write Volume    Low Write Volume
                        ─────────────────    ────────────────
Low Cardinality         PARTITION BY         PARTITION BY
(< 100 values)          (e.g., date, region) (e.g., status)

Medium Cardinality      Z-ORDER or           Z-ORDER
(100-10000 values)      Liquid Clustering

High Cardinality        Z-ORDER or           Z-ORDER
(> 10000 values)        Liquid Clustering

Partitioning Best Practices

-- GOOD: Partition by date (365 values/year, each partition has many rows)
CREATE TABLE events (...) USING DELTA PARTITIONED BY (date);

-- GOOD: Partition by region (< 50 values)
CREATE TABLE sales (...) USING DELTA PARTITIONED BY (region);

-- BAD: Partition by user_id (millions of values → millions of tiny directories)
CREATE TABLE events (...) USING DELTA PARTITIONED BY (user_id);  -- DON'T DO THIS

-- BAD: Partition by timestamp (near-unique → one file per partition)
CREATE TABLE events (...) USING DELTA PARTITIONED BY (event_timestamp);  -- DON'T DO THIS

Partition Size Guidelines

| Metric | Target | Warning |
| --- | --- | --- |
| Files per partition | 1-100 | > 1000 = too many small files |
| Data per partition | 1-100 GiB | < 1 MiB = over-partitioned |
| Total partitions | < 10,000 | > 100,000 = partition explosion |
| File size in partition | 64-256 MiB | < 1 MiB = small file problem |
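These thresholds can be checked mechanically against a table's file layout. A hedged sketch (`partition_warnings` is a hypothetical helper using the thresholds from the guidelines above; in practice the per-partition file listing would come from DESCRIBE DETAIL or a storage listing):

```python
MiB = 1024 ** 2

def partition_warnings(partition_files: dict) -> list:
    """Flag partitions that violate the guideline thresholds.
    partition_files maps partition name -> list of file sizes in bytes."""
    warnings = []
    if len(partition_files) > 100_000:
        warnings.append("partition explosion: > 100,000 partitions")
    for name, sizes in partition_files.items():
        if len(sizes) > 1000:
            warnings.append(f"{name}: too many small files ({len(sizes)})")
        if sum(sizes) < 1 * MiB:
            warnings.append(f"{name}: over-partitioned (< 1 MiB of data)")
        if sizes and min(sizes) < 1 * MiB:
            warnings.append(f"{name}: small file problem (files < 1 MiB)")
    return warnings

layout = {"date=2025-01-15": [128 * MiB, 96 * MiB],
          "date=2025-01-16": [200 * 1024]}  # one 200 KiB file
for w in partition_warnings(layout):
    print(w)
```

Running something like this on a schedule catches over-partitioning before it turns into a metadata problem.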

Changing Partition Strategy

# You CANNOT change partitioning of an existing Delta table in place
# Instead: rewrite the table

# Read existing data
df = spark.read.table("my_database.old_table")

# Write with new partitioning
df.write \
    .format("delta") \
    .partitionBy("date") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable("my_database.new_table")

File Size Guidelines

Target File Sizes

| Workload Type | Target File Size | Why |
| --- | --- | --- |
| General purpose | 128 MiB | Good balance of read speed and metadata |
| Frequently merged | 32-64 MiB | Smaller files = less rewrite per MERGE |
| Append-only analytical | 256 MiB | Fewer files = faster metadata |
| Streaming sink | 64-128 MiB | Depends on trigger interval and throughput |

Calculating Optimal File Count

... [content trimmed for length — full version in the complete kit]


This is 1 of 6 resources in the DataStack Pro toolkit. Get the complete [Spark Performance Masterclass] with all files, templates, and documentation for $59.

Get the Full Kit →

Or grab the entire DataStack Pro bundle (6 products) for $164 — save 30%.

Get the Complete Bundle →
