Spark performs very well on large, standard-size files, but problems start to occur when it has to deal with many small files at the same time.
OPTIMIZE coalesces many small files into larger ones to maintain a balanced, standard file size.
It dynamically optimizes partitions by generating files with a default size of 128 MB (the default can be changed as required).
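To make the idea of coalescing concrete, here is a toy sketch in plain Python. It is not how Delta Lake implements OPTIMIZE internally; it only assumes a simple greedy first-fit packing of file sizes into bins of at most 128 MB. The function name `plan_compaction` and the sample file sizes are hypothetical.

```python
from typing import List

MB = 1024 * 1024
TARGET_BYTES = 128 * MB  # the 128 MB default target size mentioned above


def plan_compaction(file_sizes: List[int], target: int = TARGET_BYTES) -> List[List[int]]:
    """Group file sizes into bins whose totals stay at or under `target`.

    Illustrative greedy first-fit over sizes sorted largest first.
    A file larger than `target` simply ends up alone in its own bin.
    """
    bins: List[List[int]] = []
    totals: List[int] = []
    for size in sorted(file_sizes, reverse=True):
        for i, total in enumerate(totals):
            if total + size <= target:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            # No existing bin has room, so start a new output file.
            bins.append([size])
            totals.append(size)
    return bins


small_files = [8 * MB] * 40  # forty hypothetical 8 MB files, 320 MB in total
plan = plan_compaction(small_files)
print(len(small_files), "files ->", len(plan), "files")  # 40 files -> 3 files
```

The point of the sketch is only the shape of the result: the same data ends up in far fewer files, each close to the target size, so Spark has fewer files to open and list.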
Advantages:
- Preserves the ability to use V-Order and Z-Order
- Coalesces small files into larger, evenly sized files (regardless of how many rows each file holds)
- Automatically compacts Delta tables and their files
- Reading the Delta table is unaffected before and after OPTIMIZE
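As a rough sketch of how the command might be issued from a notebook, the helper below only builds the SQL text. It assumes the `OPTIMIZE ... ZORDER BY (...) VORDER` form as I understand it from the Microsoft Fabric documentation; the helper name `build_optimize_sql`, the table name `sales`, and the column `customer_id` are hypothetical, so check the exact syntax against the official docs for your runtime.

```python
from typing import Optional, Sequence


def build_optimize_sql(
    table: str,
    zorder_by: Optional[Sequence[str]] = None,
    vorder: bool = False,
) -> str:
    """Build an OPTIMIZE statement as a string (hypothetical helper)."""
    parts = [f"OPTIMIZE {table}"]
    if zorder_by:
        # Z-Order clusters related values of these columns into the same files.
        parts.append("ZORDER BY (" + ", ".join(zorder_by) + ")")
    if vorder:
        # VORDER is assumed here to be the Fabric-specific keyword for V-Order.
        parts.append("VORDER")
    return " ".join(parts)


stmt = build_optimize_sql("sales", zorder_by=["customer_id"], vorder=True)
print(stmt)

# In a notebook with an active Spark session, the statement could then be run with:
# spark.sql(stmt)
```

Keeping the statement as a plain string makes it easy to review before running it against a real table.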
Refer to the Microsoft URL below for configuration.