Explain MapReduce like I'm five

ben profile image Ben Halpern Aug 23, 2017

markdown cheatsheet

For a five year old: you have to count how many lego pieces you have in your 5 boxes so you give each of your 4 friends one box and you count the last one.

After a few days, aided by their own parents, they come back to you with a piece of paper with the number of legos in the box (hopefully they didn't keep the box :D):

  • Alice has 120 lego pieces
  • Bob has 65 lego pieces
  • Carla has 99 lego pieces
  • Dany has 87 lego pieces
  • Ben has 77 lego pieces

You ask your parents, because you're five and you don't know how to sum all those numbers, and they tell you that the total of lego pieces you have is 448

Map: "Friend: count all the pieces in your box"
Reduce: "Sum all the counts of all friends"

MapReduce at scale :D

You take data, you map it, and you reduce it. AT SCALE!™

Can you go more into the AT SCALE part?

Map at scale: you cut up the data into pieces, map each piece on a different computer, and then send it to the reducers.
Reduce at scale: you take the data from the map part, reduce each piece on a different computer, and then you're done.

What was keeping MapReduce from being a thing before it became a thing, like is this something people were aware of as an approach but computing/networking wasn't there yet, or was this an approach people hadn't thought to take beforehand? Or are there underlying implementation details that allow the simple part you're describing to be that simple?

I think it became a thing when massive scalability was required.

The idea of dividing computation in smaller parts and then combining it to return a result is as old as modern programming I think. Sorting (quicksort for example) is an example of a divide and conquer algorithm. Generic tail recursion is another example.

I would say MapReduce is a specialization of a divide and conquer type of algorithm but I could be wrong about this.

The primitives "map" and "reduce" are part of functional computer languages since way before MapReduce was created so I think people were aware they could something do like that but could not do so at scale.

MapReduce IIRC was implemented at large by Google in their efforts to index the WWW. Here you can browse the slides describing the algorithm: research.google.com/archive/mapred...

Note that they used a distributed file system (GFS) to share data.

sorry I forgot the 5 year old part :-D

Not so Genius.jpg : c ≠ a² + b² (right-angled triangle, left hand side of board)

.map looks at everything you have in your hands, does the same thing to everything and gives you a new version of everything you have. Everything will have the same thing done to it! If you want everything to be twice as big as it is now, it can do that. Just remember that it gives you a whole new set of things back at the end.

.reduce looks at everything you have in your hands, does the same thing to everything and gives you something back. What it gives you back depends on what it does to each thing. Sometimes you want it to smush all your things together into one thing. Sometimes you want to pick out your favorites from the group. Just remember that it gives you back something in the end. It might be a little or a lot of things depending on what you asked it to do.

Santa needs some help with organizing his christmas presents. He has a stack of presents, each of which contains one or more toys. The toys can be dolls, model trains or Speak 'n Spells. For his bookkeeping, Santa needs to know how many of each kind of toy he's giving away. He knows he should have told the elves to keep count of the toys while packing the presents, but he totally forgot.

So now, all the presents need to be opened, toys inside the presents need to be counted, and tallied up so he knows the total number of each toy he's giving to kids. In order to make this as efficient as possible, he divides his elves into groups.

First, there's the Mapper elves. They'll each get a bunch of presents, and open them. For each type of toy in a present, the elf will write the type, and the number of toys of that type in the present, on a slip of paper.

Let's have a look at Franky, he's one of the mappers. He's just opened a box containing a speak 'n spell, and 2 dolls. He puts 2 slips of paper in a box:

Speak 'n spell: 1

Doll: 2

Then, there's a group of elves called the Shufflers. They shuffle pieces of paper around, into neat stacks. Each stack has all the paper slips corresponding to a certain toy type.

Then, there's the Reducer elves. They reduce the amount of paper that Santa has to look through to get the results he needs. Each elf takes a stack of paper, and counts up all the numbers. Then, they put a paper with the toy type they tallied up, and the total number, on Santa's desk.

You've been given homework that consists of solving 2 riddles and solving 3 math problems. Spongebob show is gonna be on TV in 2 hours. You know you can complete the 3 math problems in 2 hours but you aren't good at solving riddles and trying so would mean you wouldn't get to watch your favorite show.

Luckily your best bud Tod is nearby and he's good with riddles. So you ask him to solve the riddles. He manages to give you answers to the two questions easily and you manage to solve the math problems just in time thereby completing the homework.

Map is when you decide what you and your best bud ought to do.
Reducing is when you obtain the answers to your own tasks which contribute to your homework completion

Best explanation was already made with emojis by Steven Luscher: twitter.com/steveluscher/status/74...

Map is doing something to a collection of things, which changes each thing in the collection.

So if you had 50 TVs and you want to turn them all off at once, you could use 50 different remote controls one at a time, or you could have a single remote that works on all the TVs at once.

The single remote is like map.

Or if you had a list of things to buy and your friend saw the list and asked you to buy all the same things for them, you could write x2 next to each thing on the list or you could write "buy two of each" at the top.

Writing "buy two of each" is like map.

Like map, reduce is doing something to a collection of things, but instead of ending up with a changed collection, you end up with a single thing.

So if you have a 100 piece jigsaw puzzle, you pick up each piece look at the picture on the box and put the piece on the table in the right place, you go from having 100 separate pieces to having a single picture.

Or if you spill a plate of crumbs on the floor and you want to clean them up you take a vacuum cleaner and suck them up.

MapReduce is mapping a collection, then reducing it.

So, if you wanted to know your favourite flavour of ice cream, you would take a taste from each one (map) and write down the one you liked best (reduce).

Write a name for each of your fingers on separate lines in a list.

Measure the length of each finger, placing the number next to the finger's name (be sure to use the same units each time) in the list.

Add all of those numbers together to get your total finger-length.

Fingers .map'd to length; lengths .reduce'd to a total sum.