Z-Ordering is a technique for co-locating related information in the same set of files.
It can speed up reads dramatically, because a query can fetch related data from a small set of files instead of scanning most of the table.
For example:
BEFORE Z-Ordering -
Data files are not organized by customer_id or order_date
Spark has no idea where the relevant rows live
So it:
- Scans ~2,500 out of 2,700 files
- Reads a huge amount of data
- Causes high disk I/O
- Takes a long time
Result:
- Large scan
- Slow query
- Wasted resources
AFTER Z-Ordering -
What happens now:
- Spark knows which files are likely to contain customer_id = 101
- It skips irrelevant files (data skipping)
- It reads only ~120 files instead of ~2,500
Result:
- Small scan
- Low I/O
- Much faster query
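
The before/after difference can be reproduced with a small simulation. This is only a sketch of the general idea, not how Spark or any storage engine implements it: it assumes Z-ordering means sorting rows by a value built by interleaving the bits of the two columns, and that data skipping means checking each file's min/max customer_id. The row counts, the file size, and the helper names (z_value, split_into_files, files_to_scan) are all made up for illustration, and order_date is modelled as a day number.

```python
import random

def z_value(x, y, bits=10):
    """Interleave the low `bits` bits of x and y into a single sort key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x fills the even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y fills the odd bit positions
    return z

def split_into_files(rows, rows_per_file):
    """Cut the row list into consecutive fixed-size chunks ("files")."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]

def files_to_scan(files, customer_id):
    """Count files whose [min, max] customer_id range could contain the value."""
    count = 0
    for f in files:
        ids = [row[0] for row in f]
        if min(ids) <= customer_id <= max(ids):
            count += 1
    return count

random.seed(42)
# Hypothetical data: (customer_id, order_day) pairs.
rows = [(random.randrange(1000), random.randrange(365)) for _ in range(10_000)]

# Layout 1: rows land in files in arrival order.
unordered = split_into_files(rows, 100)
# Layout 2: rows are sorted by their interleaved key before being written.
z_ordered = split_into_files(sorted(rows, key=lambda r: z_value(r[0], r[1])), 100)

before = files_to_scan(unordered, 101)
after = files_to_scan(z_ordered, 101)
print(f"files scanned without Z-ordering: {before} of {len(unordered)}")
print(f"files scanned with Z-ordering:    {after} of {len(z_ordered)}")
```

In the unordered layout nearly every file has a customer_id range wide enough to include 101, so almost nothing can be skipped. After sorting by the interleaved key, most files cover a narrow range and are ruled out from their min/max values alone.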
Refer to the Microsoft documentation for configuration details.
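
As a rough sketch of what applying it can look like, assuming the table is stored as a Delta Lake table: Delta Lake exposes Z-ordering through the OPTIMIZE ... ZORDER BY SQL statement. The table name orders and the helper build_zorder_statement below are hypothetical, and the columns are the ones from the example above.

```python
def build_zorder_statement(table: str, columns: list[str]) -> str:
    """Build a Delta Lake OPTIMIZE statement that Z-orders by the given columns."""
    if not columns:
        raise ValueError("at least one column is required for ZORDER BY")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(columns)})"

# "orders" is a hypothetical table name used only for illustration.
statement = build_zorder_statement("orders", ["customer_id", "order_date"])
print(statement)  # OPTIMIZE orders ZORDER BY (customer_id, order_date)

# In a Spark session where Delta Lake is available, the statement
# could then be run with: spark.sql(statement)
```

Z-ordering works best on columns that appear often in query filters, which is why customer_id and order_date are used here.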
