DEV Community

Cover image for Why you should stop manual Z-ordering today
Aniket Abhishek Soni
Aniket Abhishek Soni

Posted on

Why you should stop manual Z-ordering today

If you are still manually running OPTIMIZE table ZORDER BY (col) in your production pipelines, you are wasting engineering cycles on a legacy pattern that Delta Lake has already solved.

I see teams treating Z-ordering like a permanent architectural requirement rather than a stopgap for inefficient storage layouts. Most engineers I talk to are terrified to switch to Liquid Clustering because they don't trust the black box. They prefer the illusion of control that comes with writing custom Spark jobs to re-cluster data every night.

It’s time to stop babysitting your partitions.

The contenders

Z-ordering is the manual transmission of the data world. You designate specific columns, you run a command, and you hope your data distribution remains somewhat balanced. It’s effective, but it’s brittle. If your query patterns shift—say, from filtering by user_id to filtering by event_timestamp—your Z-order becomes a liability, actively slowing down queries that don't match your original index.

Liquid clustering, introduced in Databricks Runtime 13.3 LTS, is the automatic transmission. It uses multi-dimensional clustering to adapt your data layout based on the actual usage patterns detected by the Delta engine. You define the clustering columns once, and the table optimizes itself as data flows in. No more OPTIMIZE jobs. No more guessing which columns matter most.

Photo by Logan Voss on Unsplash
Photo by Logan Voss on Unsplash

The operational tax

When you run Z-ordering, you are paying a massive "Ops Tax." You need a dedicated cluster to run the OPTIMIZE job. You need to manage the frequency of that job. If you trigger it too often, you’re burning compute costs; trigger it too little, and your read performance degrades as the table accumulates small files and data skew.

I’ve seen teams with hundreds of tables spend upwards of 20% of their total DBU budget just on maintenance tasks like OPTIMIZE and VACUUM. In one healthcare project, we spent three days a month just tuning the Z-order columns for a 50TB fact table because the business changed their primary reporting dimensions.

Liquid clustering changes the math. Because it is incremental and integrated into the write path (via CLUSTER BY), the "maintenance" is baked into the ingestion. You pay a slight overhead during the write, but you stop paying the "maintenance tax" of running huge, batch-heavy OPTIMIZE jobs. If your write throughput is the bottleneck, this is a trade-off. If your compute costs are the bottleneck, this is a massive win.

The failure modes

Z-ordering is prone to "job explosion." If you have a large table and your OPTIMIZE job fails, you are often left with a partial state. If you aren't using Delta’s transaction logs effectively, or if your job fails mid-rewrite, you have to rollback or perform a full re-process. I’ve spent many early mornings debugging ConcurrentModificationException errors triggered by an OPTIMIZE job hitting a streaming ingestion window.

Liquid clustering is significantly more resilient. Because it handles clustering at the write layer, it is transactional by design. You don't have to worry about the "maintenance window" because there isn't one. The failure mode shifts from "my maintenance job crashed" to "my write job is slightly slower."

However, don't let the marketing fool you: Liquid Clustering isn't magic. It has a limit on the number of clustering columns—typically 4. If you have a table where you need to filter by 10 different dimensions, you’re going to hit a wall. In my experience, if you need more than 4 columns for clustering, your data model is the problem, not the engine.

Photo by bert b on Unsplash
Photo by bert b on Unsplash

Measuring the delta

How do you know if you should switch? Don't guess. Pull your DESCRIBE DETAIL stats and look at the numRecords and sizeInBytes. More importantly, dig into the delta.history and delta.operationMetrics.

If you are currently Z-ordering, look at the numRemovedFiles vs numAddedFiles in your OPTIMIZE metrics. If those numbers are consistently massive, you are churning your storage layer unnecessarily.

To measure the efficacy of Liquid Clustering, you need to use the EXPLAIN plan. Run a query before the switch and after the switch using EXPLAIN EXTENDED. Look for DataSkipping. If Liquid Clustering is working, you should see fewer PartitionFilters (because we’re moving away from hard partitions) and more DataSkipping stats.

Specifically, look at the min and max values in your file statistics. If Liquid Clustering is doing its job, the range of these values in individual files should be tighter. If you see wide, overlapping ranges for your high-cardinality columns, the engine isn't clustering effectively, and you might need to adjust your CLUSTER BY configuration.

What I'd pick, and why

If you are on Databricks Runtime 13.3 or higher, Liquid Clustering is the default choice. There is almost no scenario in a modern cloud-native stack where I would recommend starting a new project with manual Z-ordering.

The only caveat is cost: Liquid Clustering performs more work during the write phase. If you are doing extreme high-frequency, low-latency streaming (e.g., sub-second ingestion into a real-time dashboard), the added overhead of clustering on the write might impact your P99 latency. In those cases, you might want to ingest into a raw, non-clustered table and then use a background process to move it to a clustered table.

But for 95% of the financial services and healthcare data I’ve worked with—where we deal with massive batch loads and complex analytical queries—Liquid Clustering wins. It removes the human error element. Engineers are terrible at predicting query patterns three months in advance, and Z-ordering requires you to be a psychic.

Liquid clustering acknowledges that we don't know exactly how the data will be queried in the future. It’s an admission that the storage layout should be dynamic.

Stop managing your data like it's 2018. Move to Liquid Clustering, stop the manual OPTIMIZE jobs, and spend your time building features instead of tuning file sizes. You’ll sleep better, and your cloud bill will thank you.

Cover photo by Winston Chen on Unsplash.

Top comments (0)