loading...

# Discussion on: Explain MapReduce like I'm five

tbodt

You take data, you map it, and you reduce it. AT SCALE!™

Ben Halpern

Can you go more into the AT SCALE part?

tbodt

Map at scale: you cut up the data into pieces, map each piece on a different computer, and then send it to the reducers.
Reduce at scale: you take the data from the map part, reduce each piece on a different computer, and then you're done.

Thread
Ben Halpern

What was keeping MapReduce from being a thing before it became a thing, like is this something people were aware of as an approach but computing/networking wasn't there yet, or was this an approach people hadn't thought to take beforehand? Or are there underlying implementation details that allow the simple part you're describing to be that simple?

Thread
rhymes

I think it became a thing when massive scalability was required.

The idea of dividing computation in smaller parts and then combining it to return a result is as old as modern programming I think. Sorting (quicksort for example) is an example of a divide and conquer algorithm. Generic tail recursion is another example.

I would say MapReduce is a specialization of a divide and conquer type of algorithm but I could be wrong about this.

The primitives "map" and "reduce" are part of functional computer languages since way before MapReduce was created so I think people were aware they could something do like that but could not do so at scale.

MapReduce IIRC was implemented at large by Google in their efforts to index the WWW. Here you can browse the slides describing the algorithm: research.google.com/archive/mapred...

Note that they used a distributed file system (GFS) to share data.

Thread
rhymes

sorry I forgot the 5 year old part :-D

Thread
Nicholas Shanks

Not so Genius.jpg : c ≠ a² + b² (right-angled triangle, left hand side of board)