
DEV Community

Sankalp

Posted on Jan 15

Narrow and wide transformations in Spark

Narrow transformation
Operates within a single partition

No data movement across the cluster
No shuffle
Fast and cheap

e.g. select, map, filter, withColumn, union

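No shuffle is required because each output partition is computed from exactly one input partition; operations like map, filter, and withColumn simply process rows one at a time.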
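To make "operates within a single partition" concrete, here is a toy model in plain Python, not the Spark API. It assumes a dataset is a list of partitions, each partition a plain list of rows; the helper names `narrow_map`, `narrow_filter`, and `union` are hypothetical.

```python
from typing import Any, Callable, List

# Toy model (assumption, not Spark's API): a partition is a plain list of rows.
Partition = List[Any]


def narrow_map(partitions: List[Partition], fn: Callable[[Any], Any]) -> List[Partition]:
    # Output partition i is built only from input partition i,
    # so no row moves between partitions (no shuffle).
    return [[fn(row) for row in part] for part in partitions]


def narrow_filter(partitions: List[Partition], pred: Callable[[Any], bool]) -> List[Partition]:
    # Same shape as map: each partition is filtered on its own.
    return [[row for row in part if pred(row)] for part in partitions]


def union(a: List[Partition], b: List[Partition]) -> List[Partition]:
    # Union concatenates the two partition lists; rows stay where they are.
    return a + b


data = [[1, 2, 3], [4, 5], [6, 7, 8]]  # 3 partitions
result = narrow_filter(narrow_map(data, lambda x: x * 2), lambda x: x > 5)
print(result)  # [[6], [8, 10], [12, 14, 16]]
print(len(union(data, result)))  # 6
```

Because every output partition depends on a single input partition, Spark can pipeline a chain of narrow transformations like this inside one stage without exchanging data between executors.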

Wide transformation
Requires data to be redistributed across partitions

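Data is shuffled between executors over the network
Creates a new stage (the shuffle is a stage boundary)
Expensive (shuffle files are written to disk, then serialized and sent over the network)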

e.g. join (unless one side is broadcast), groupBy, orderBy, distinct, reduceByKey

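Each output partition needs rows from many input partitions (for example, all rows with the same key must end up together before they can be aggregated), so a shuffle is required.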
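A similar toy sketch (plain Python, not the Spark API) shows why a groupBy-style aggregation needs a shuffle: rows for key "a" start out in all three input partitions and must be brought into one place before they can be summed. The names `target_partition`, `shuffle_by_key`, and `group_by_sum` are hypothetical; CRC32 is an assumed, deterministic stand-in for Spark's hash partitioning, and the map-side pre-aggregation that `reduceByKey` performs is not modeled.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (key, value); toy model, not Spark's API


def target_partition(key: str, num_out: int) -> int:
    # Deterministic hash partitioning (assumed stand-in for Spark's partitioner):
    # the same key always maps to the same output partition.
    return zlib.crc32(key.encode("utf-8")) % num_out


def shuffle_by_key(partitions: List[List[Row]], num_out: int) -> List[List[Row]]:
    # Every input partition may send rows to every output partition:
    # this all-to-all exchange is the shuffle.
    out: List[List[Row]] = [[] for _ in range(num_out)]
    for part in partitions:
        for key, value in part:
            out[target_partition(key, num_out)].append((key, value))
    return out


def group_by_sum(partitions: List[List[Row]], num_out: int) -> List[Dict[str, int]]:
    # After the shuffle, all rows for a key sit in one partition,
    # so each output partition can be aggregated independently.
    result: List[Dict[str, int]] = []
    for part in shuffle_by_key(partitions, num_out):
        totals: Dict[str, int] = defaultdict(int)
        for key, value in part:
            totals[key] += value
        result.append(dict(totals))
    return result


data = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5), ("a", 6)]]
per_partition = group_by_sum(data, num_out=2)
merged = {k: v for part in per_partition for k, v in part.items()}
print(sorted(merged.items()))  # [('a', 10), ('b', 7), ('c', 4)]
```

The work after `shuffle_by_key` runs on the redistributed partitions, which is why Spark starts a new stage at this point.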
