Discussion on: Explain MapReduce like I'm five

View post

tbodt • Aug 23 '17

You take data, you map it, and you reduce it. AT SCALE!™

Ben Halpern • Aug 23 '17

Can you go more into the AT SCALE part?

tbodt • Aug 23 '17

Map at scale: you cut up the data into pieces, map each piece on a different computer, and then send it to the reducers.
Reduce at scale: you take the data from the map part, reduce each piece on a different computer, and then you're done.

Ben Halpern • Aug 23 '17

What was keeping MapReduce from being a thing before it became a thing, like is this something people were aware of as an approach but computing/networking wasn't there yet, or was this an approach people hadn't thought to take beforehand? Or are there underlying implementation details that allow the simple part you're describing to be that simple?

rhymes • Aug 23 '17 • Edited

I think it became a thing when massive scalability was required.

The idea of dividing computation in smaller parts and then combining it to return a result is as old as modern programming I think. Sorting (quicksort for example) is an example of a divide and conquer algorithm. Generic tail recursion is another example.

I would say MapReduce is a specialization of a divide and conquer type of algorithm but I could be wrong about this.

The primitives "map" and "reduce" are part of functional computer languages since way before MapReduce was created so I think people were aware they could something do like that but could not do so at scale.

MapReduce IIRC was implemented at large by Google in their efforts to index the WWW. Here you can browse the slides describing the algorithm: research.google.com/archive/mapred...

Note that they used a distributed file system (GFS) to share data.