DEV Community

Sankalp

Narrow and wide transformations

Narrow transformation
An operation that works within a single partition: each output partition is computed from one input partition.

  1. no data movement across the cluster
  2. no shuffle
  3. faster and cheaper

e.g. select, map, filter, withColumn, union

No shuffle is required because these are row-based transformations: each output row depends only on data already in the same partition.
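To make the idea concrete, here is a minimal sketch in plain Python. It is not Spark code: the partitions are modelled as a list of lists, and `narrow_map` and `narrow_filter` are hypothetical helper names made up for this illustration. The point is only that each output partition is built from exactly one input partition.

```python
# Plain-Python model of partitions; this is NOT Spark code.
# Each inner list stands in for one partition on one executor.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8]]


def narrow_map(parts, fn):
    # Hypothetical helper: applies fn row by row, one partition at a time.
    # No row ever leaves the partition it started in.
    return [[fn(x) for x in part] for part in parts]


def narrow_filter(parts, pred):
    # Hypothetical helper: keeps matching rows, again partition by partition.
    return [[x for x in part if pred(x)] for part in parts]


doubled = narrow_map(partitions, lambda x: x * 2)
evens = narrow_filter(partitions, lambda x: x % 2 == 0)

print(doubled)  # [[2, 4, 6], [8, 10], [12, 14, 16]]
print(evens)    # [[2], [4], [6, 8]]
```

The number of partitions and the placement of rows stay the same, which is why nothing has to move across the cluster.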

Wide transformation
An operation that requires data to be redistributed across partitions.

  1. triggers a data shuffle
  2. creates a new stage
  3. expensive (network and disk I/O)

e.g. join, groupBy, orderBy, distinct, reduceByKey

The result for a key or group depends on rows that may sit in different partitions, so a shuffle is required to bring them together first.
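A rough sketch of why this needs a shuffle, again in plain Python rather than Spark. The names `partition_for` and `reduce_by_key` are hypothetical, and the partitioner (character code modulo the partition count) is an assumption chosen only to keep the example deterministic. Real engines also apply optimizations that this model skips.

```python
# Plain-Python model of a shuffle; this is NOT Spark code.
from collections import defaultdict

# Three partitions of (key, value) rows; the same key appears in several.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]


def partition_for(key, num_partitions):
    # Hypothetical partitioner: decides which partition owns a key.
    return ord(key[0]) % num_partitions


def reduce_by_key(parts, fn, num_partitions):
    # Shuffle step: route every row to the partition that owns its key,
    # counting rows that have to leave their original partition.
    shuffled = [defaultdict(list) for _ in range(num_partitions)]
    moved = 0
    for source, part in enumerate(parts):
        for key, value in part:
            target = partition_for(key, num_partitions)
            if target != source:
                moved += 1
            shuffled[target][key].append(value)

    # Reduce step: all values for a key are now together, so combine them.
    result = []
    for bucket in shuffled:
        reduced = {}
        for key, values in bucket.items():
            acc = values[0]
            for v in values[1:]:
                acc = fn(acc, v)
            reduced[key] = acc
        result.append(reduced)
    return result, moved


totals, moved = reduce_by_key(partitions, lambda x, y: x + y, 3)

print(totals)  # [{'c': 10}, {'a': 4}, {'b': 7}]
print(moved)   # 4
```

Four of the six rows had to change partition before any sum could be computed. That movement is the shuffle, and it is what makes wide transformations expensive.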
