Kasey Speakman

Posted on Dec 6, 2018

Understanding the Fold Operation

#functional #fsharp

I recently ran across a video by Don Syme called F# Code I Love. It details some nice (and naughty 🎄) usage patterns in F#. I was very surprised to see fold considered a not-so-nice usage pattern. But I completely understand why.

🔎 What is a fold?
Fold is an operation on a collection of data. Lists, sets, hashmaps, etc all are "foldable". Fold takes care of looping through each item in the collection, and lets you define the behavior inside the loop. And in fact, most other collection operations (filter, map, reduce, etc) can be written as fold statements. So fold is a "low-level" way to process items in a list with your own custom behavior. It is for those cases when the built-in collection functions do not have exactly the behavior you need. In Javascript, Array.reduce is the same as fold when called with an initial value. In C#, the LINQ method Aggregate is the same as fold.

When I first started F#, I found folds hard to understand. So I would typically avoid them and instead write a recursive function to loop through a collection with my custom behavior. (I mean it's pretty bad when recursion is the easy way, right?) But as I refactored the recursive function to isolate the custom behavior from the looping, I noticed that the recursive looping part could be easily replaced with fold.

As time has gone on, I figured out a way to approach folds that makes them easier to start with instead of being a refactored optimization at the end. I will demonstrate with a coding challenge.

🏎️ Challenge
Given a list of integers, return the count of positive numbers and the sum of negative numbers.
When provided the list [1; 2; 3; 4; 5; 6; 7; 8; 9; 10; -11; -12; -13; -14; -15], the solution should return a count of 10 and a sum of -65.

#1 Define the kind of answer are you expecting with a starting value.

In this case, I want two values as a result.

let (count, sum) = ...

Both of them are integers. And for both pieces of the answer, we start with zero and add to it.

let (count, sum) = (0, 0)

If you are unsure what the starting value should be for your particular problem, you can just pick one for now. You can always experiment with it later.

Instead of destructuring the values back into count and sum, I would probably just use a single name for the pair of values.

let initial = (0, 0)

#2 Update the answer based on a single item.

Now that we know what our answer looks like (count and sum), we can define a way to update the answer based on a single input value. I first sketch the update function and its arguments.

let update (count, sum) value =
    ...

Then I fill in the steps to return the updated count and sum based on the value.

let update (count, sum) value =
    if value > 0 then
        (count + 1, sum)
    elif value = 0 then
        (count, sum)
    else
        (count, sum + value)

This reads very well, almost the same as how we explain it to another human. "Update count and sum when given a value. If the value is positive, increase the count but keep the same sum. If the value is zero, keep the same count and sum. Otherwise (value is negative) keep the same count, but add the value to the sum."

Note: We could save a couple of lines of code by removing the elif clause and it would work exactly the same. (Adding zero to the sum is equivalent to returning it as-is.)

#3 Now the fold is the easy part.

So the only thing left to do is to process the list itself. And since the input list is provided, we have already defined all the necessary pieces for fold. The fold step always looks similar to this.

let runChallenge inputList =
    inputList |> List.fold update initial

Here it is with everything put together.

let initial = (0, 0)

let update (count, sum) value =
    if value > 0 then
        (count + 1, sum)
    elif value = 0 then
        (count, sum)
    else
        (count, sum + value)

let runChallenge inputList =
    inputList |> List.fold update initial

And here is how you can run it.

let input = [1;2;3;4;5;6;7;8;9;10;-11;-12;-13;-14;-15]
let (count, sum) = runChallenge input
printfn "Count: %i, Sum: %i" count sum
// Outputs
// Count: 10, Sum: -65

Conclusion

Fold is a really powerful tool to process collections of data. It can allow you to focus on your behavior logic without having to write the looping code. But it can be confusing to use -- you really have to approach it with the right mindset to make the most out of it. The way I have presented here helps me use fold in a productive way. Hopefully it will help someone else too.

/∞

Top comments (5)

Jan van Brügge • Dec 7 '18

There is no need to name inputList.

f x = x |> g
    where g = List.fold update initial
-- same as
f x = g x
-- same as 
f = g
-- with your names:
runChallenge = List.fold update initial

Kasey Speakman • Dec 7 '18 • Edited

Thanks for your comment.

I am aware of the forms you demonstrated. Rest assured that I chose the naming and structure very intentionally. I avoid point-free notation in most cases. Hidden parameters do not tend to help readability. Spelling out the parameter isn't as concise, but it is far more clear. Especially when the reader might not be as familiar with FP.

Jan van Brügge • Dec 7 '18

It depends on your view on it. If you name the parameter, your view is "I will fold update over the list". Without the parameter your view is "runChallenge is a fold with update and some initial value". In my opinion this focuses on the algorithm and not the data making in easier to understand. A greater percentage of the code is thr actual algorithm as opposed to unnecessary stuff

Yawar Amin • Jan 4 '19

Thanks for this very clear explanation. Do you remember the old days of Elm when it was FRP and there was a (I think) Signal.foldp function that was explained as 'returning a new state by folding over the past state and an event'? The theory behind folding is so generally applicable, it's quite fascinating.

Kasey Speakman • Jan 4 '19

I was hoping someone would notice the similarity with MVU. :) In fact, I always found folds confusing until I realized that an Elm application (even today) is just a fold. (And the view function is a transformation run on the model after each fold step.) And so, adopting the same strategy used by MVU enables a straight-forward definition of fold.

Thanks for the comment!