loading...

ELI5 - "Map-filter-reduce"

__namc profile image Namc ・4 min read

Many developers that I have lately interacted with associate map-reduce-filter with functional programming, and they are actually dreaded by the combination. It’s one of the reasons they avoid using functional languages, because god forbid they would have to learn how to use map, reduce and filter — 3 functions present in Ruby, Python, Javascript, PHP, Erlang, Scala and many others languages.

So what’s the big deal with map-reduce-filter anyway?

They are just higher-order functions used in functional programming. And they work on anything that is iterable (list, tuple, sets, dicts, and even strings).

Let’s consider a problem statement to be able to follow more clearly. (We’re using Python for the ease of purpose, but I’ll add a gist of how the same thing can be done in various other languages)

Problem statement : 
We need the sum of all even squares of integers in the range [0, 9]

Map (Transform)

This function takes a unary function and a list. Unary function is a function that takes one variable. The resultant for map function is a same-sized list of transformed values. The transformation of value is subject to the definition of unary function with the parameter. 

Use when : You want to transform all elements in a set to another set of values.

Parameters : unary transformation function & array to be transformed

In order to solve the problem statement, let’s say, our first step is to map the function of calculating squares over the range of numbers [0, 9]

>>> nums = map(lambda x : x*x, [i for i in range(10)])
>>> nums
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Lambdas are often to define map functions. You can use regular functions as well.

Filter 

This function takes a predicate, (a unary function which returns a bool) and the set of values; and gives a resultant set with only those set of values for which the predicate returns true. The resultant set may or may not be of the same length. 

Use when : You want to remove elements from a set based on a condition. 

Parameters : unary predicate (should only return a bool) & array to be filtered. 

In context to our problem above, let’s try to remove all the odd squares from the list. For that, let our predicate return true if the element is even.

 

>>> def is_even(x):
        return x % 2 == 0
>>> evens = filter(is_even, nums)
>>> evens
[0, 4, 16, 36, 64]

Comprehensions often make it easier to write filters. Those can be used in place of functions as well.

Reduce (Accumulate)

This function takes a binary function and a set of values, and it produces a single value. It does so by running the binary function over the set of values, and accumulating them into a single value. 

Use when : You want an accumulated or concatenated value using given set of values.

Parameters : binary function and set of values.

In context to our problem above, we shall add up all the even-valued squares.

>>> reduce(lambda x,y : x+y, evens)
120

If we look at the defintion of reduce in functools module,

def reduce(f,alist)
    if alist == []:
        return None

    answer = alist[0]
    for v in alist[1::
        answer = f(answer,v)
    return v

we see that reduce function starts by setting answer as the first element of array. Then it traverses the array from left to right, invoking a callback function on each element. The cumulative value is passed from callback to callback. After all elements are traversed, cumulative value is returned. 

Combining map-filter-reduce

We saw a lot of small functions being written above, that was just for understanding. The true power of higher-order-functions lie in combination. So our problem statement can be solved in one line of code 😅

>>> reduce(lambda a, b: a + b, map(lambda x: x * x, filter(lambda y: not y % 2, range(10))))
120

Don’t Panic! ! We can break it down for simplicity.

 

f_add = lambda a, b: a + b
f_square = lambda x: x * x
f_is_even = lambda y: not y % 2

Now the complex expression can be rewritten as

 

>>> reduce(f_add , map(f_square , filter(f_is_even, range[0, 10]))

As a sidenote, there is no restriction on using map , filter & reduce together. You can make your own combinations, and order your functions as per the need.

Other languages 

Keeping the problem statement same as above, check out how map-reduce-filter can be used in other programming languages, and is not a rocket science at all 🚀

Ruby

def main()

    nums = [*0..9]

    ans = nums.map{|x| x * x}              # map operation (x*x) over nums
              .select{|x| x % 2 == 0}      # select event elements
              .reduce{|a, x| a + x}        # reduce the list by `+` operation

    p nums                                 
    p ans                                  

end
main()

Clojure

(reduce (fn [a b] (+ a b)) 
        (map (fn [x] (* x x)) 
             (filter even? (range 10))))

Javascript

(I’m really bad at this 😕)

let numbers = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]

ans = numbers 
    .filter(function(num) {
        return (num%2 === 0);
    })
    .map(function(num) { 
        return num*num;
    })
    .reduce(function(total, num){
        return total += num;
    });

Try it out in other languages as well!

Discussion

pic
Editor guide
Collapse
hjfitz profile image
Harry

You could simplify your JavaScript solution!

const nums = [0,1,2,3,4,5,6,7,8,9];

const ans = nums
  .map(num => num*num)
  .filter(num => num % 2 === 0)
  .reduce((total, num) => total + num, 0);
Collapse
isaacleimgruber profile image
IsaacLeimgruber

I definitely prefer chained expressions c:

Collapse
kspeakman profile image
Kasey Speakman

Many devs are probably already familiar with map/filter/reduce, but they know it in a different form: SQL.

  • SELECT is map
  SELECT sales_volume > 10000 AS is_preferred_member ...
  • WHERE is filter
  SELECT ...
   WHERE order_status = 'open'
  • GROUP BY or aggregate functions like SUM are forms of reduce
  SELECT SUM(order_total) ...

Once I realized that, it seemed only natural that the same capabilities be available in my programming language for in-memory collections too. You could already do them in for loops of course, but adding these explicit operations to the language saves the code and mental weight from the overhead of the loop. And more importantly, it makes the commonly-performed operation obvious, whereas a for loop has to be read line-by-line to make sure of what it is doing.

Collapse
idanarye profile image
Idan Arye

When working with map and friends, you usually want to use iterators/generators/enumerators/streams/whatever-they-are-called-in-your-language-of-choice instead of lists. You are only going to iterate them once - it's wasteful to allocate memory for it when these combinators can usually read it one value at a time and "pass" the value they emit to the next combinator.

If we take your Python example:

map(lambda x : x*x, [i for i in range(10)])

This allocates a list to hold your 10 numbers - which is redundant, because map can just read them one at a time. This is better:

map(lambda x : x*x, (i for i in range(10)))

(i for i in range(10)) will not allocate a list for holding all 10 numbers. Instead, it will create an iterator that generates the next number on demand. With 10 numbers the difference is neglectable - but when you need to handle larger lists it can be quite serious, and the syntax for doing it the right way is easy.

This holds for other languages too - each with it's own syntax/api.

Collapse
tpoliaw profile image
Peter Holloway

Even

map(lambda x : x*x, (i for i in range(10)))

is excessive if you're using a comprehension anyway.

(i*i for i in range(10))
Collapse
idanarye profile image
Idan Arye

Your version doesn't do the job though...

Thread Thread
tpoliaw profile image
Peter Holloway

It doesn't? What's the difference?

Thread Thread
idanarye profile image
Idan Arye

The purpose of the original snippet was to demonstrate the map combinator. Your version doesn't do it.

Collapse
idanarye profile image
Idan Arye

It seems, though, like it doesn't improve things in all languages: stackoverflow.com/q/47653131/794380

Collapse
dmfay profile image
Dian Fay

Also important to note that both map and filter can be redundant if you're already using reduce. Your JavaScript example could be written like this:

const numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

const ans = numbers.reduce(function(total, num) {
  if (num % 2 === 0) { // replaces filter
    return total += num * num; // replaces map
  }

  return total;
});

Obviously this doesn't demonstrate what map and filter do, which is less useful from that perspective! The big deal about it is that it only makes one pass. This doesn't matter much with ten elements but with bigger collections it can save a lot of time.

Collapse
jbbn profile image
João Bueno

Totally agree Dian :)

Just to register another way to write the same, I would code like this:

const ans = numbers.reduce((total, num) => (
  num % 2 === 0 // replaces filter
    ? total + num * num // replaces map
    : total
))

Regards!

Collapse
dimven profile image
Dimitar Venkov

for the sake of completeness, if we were interested in the total sum and are not concerned about making our code follow functional programming methods, we could rewrite the python one liner as:

sum(i*i for i in range(10) if not i % 2)

:)

Collapse
idanarye profile image
Idan Arye

You don't have to ignore language constructs to "follow functional programming methods". In Haskell, for example, you can use the combinators:

Prelude> sum (map (\x -> x * x) (filter (\x -> x `mod` 2 == 0) [0..9]))
120

Or you can use list comprehension(like in Python):

Prelude> sum [x * x | x <- [0..9], x `mod` 2 == 0]
120

The second approach is also functional programming.

Collapse
alephnaught2tog profile image
M. Shemayev

That single line explained more of map-filter-reduce than the articles I've been reading has! Not directed at the author, to be clear, it was a genuinely great article--I just have been puzzling over it because something has been tripping around in my brain and BAM, this comment did it. My bachelor's is in math and so the map-filter-reduce descriptions sounded so close and yet so far to what I kept almost thinking and that particular one-line tripped the right memory circuits!

Collapse
__namc profile image
Namc Author

Is there something I missed that I could have done better?

I tried to make it as simple as possible, and I'm sorry if it did not simplify things for you.

Thread Thread
alephnaught2tog profile image
M. Shemayev

No, not at all! I tried to make that clear, and failed. Your article was wonderfully written--it's just a topic that I've been chewing on for awhile and has eluded me, and that random one-liner reminded me of past math classes and was the final link, as it were. I really loved your piece!

Collapse
evanoman profile image
Evan Oman

Java 8+

-> IntStream.range(0,10).                         
>> map(xs -> xs*xs).                              
>> filter(xs -> xs % 2 == 0).                     
>> reduce(0, (xs_1, xs_2) -> xs_1 + xs_2)         
|  Expression value is: 120                       
|    assigned to temporary variable $7 of type int

Scala

scala> (0 to 9).          
     | map(xs => xs*xs).  
     | filter(_ % 2 == 0).
     | reduce(_+_)        
res0: Int = 120

Note: Both Java and Scala have sum methods on these objects which would be equivalent the reduce I used above.

Collapse
isaacleimgruber profile image
IsaacLeimgruber

Is scala popular now?

Collapse
sebastianpalma profile image
Sebastián Palma

Ruby version can be written as:

def main
  nums = [*0..9]
  p nums 
  p nums.map { |x| x * x }
        .select(&:even?)
        .reduce(:+)
end
main
Collapse
isaacleimgruber profile image
IsaacLeimgruber

I'm more dreaded by fold_left/ fold_right than map, flatmap, reduce, reduce_right, reduce_left, filter and groupby