DEV Community: Noel Welsh

A Case Study in Incrementally Improving Beginner Code

Noel Welsh — Mon, 11 Oct 2021 09:48:36 +0000

In this article I'm going to go through the process of improving some code. I'm mentoring a new developer who is applying for their first job. They were asked to complete some tasks on Codility as the first step of the interview process. To get used to the platform they did the first example task, and I advised them on some changes. Here I write up the progression from their code to (what I think is) better code. (Since this is the example task, not a task used to assess applicants, I think it is OK to post publicly.)

The Problem

First, the Codility problem:

Write a function:
object Solution {
  def solution(a: Array[Int]): Int
}
that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A.

For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5. Given A = [1, 2, 3], the function should return 4. Given A = [−1, −3], the function should return 1.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [1..100,000];

each element of array A is an integer within the range [−1,000,000..1,000,000].

Setup

I created the interface below so that I could run all the variations through the same test harness. It's not part of the specification from Codility or the student's original code.

trait Solution {
  def solution(a: Array[Int]): Int
}

Initial Solution

Here's the student's initial solution:

object Solution1 extends Solution {

  def solution(a: Array[Int]): Int = {

    def tolis(b: List[Int]): Int = b match {
      case x :: Nil => x + 1
      case x :: hs => if ((hs.head - x) > 1) x + 1 else tolis(hs)

    }
    var b: List[Int] = a.toList.filter(_ > 0).sorted
    //b.sorted
    if (b.isEmpty) 1
    else if (b.head != 1) 1
    else tolis(b)


  }
}

Code Cleanup

There are several issues with the initial solution. Let's start with the easiest ones:

confusing naming (what does tolis mean?)
var is not necessary (it could be a val)
messy formatting

These are fairly small points but they are easy for an interviewer to complain about. A lot of jobs, particularly entry level jobs, receive many applicants and interviewers are often looking for reasons to reject candidates. We don't want to give them an easy reason to reject us!

Here's the code after a quick cleanup.

object Solution2 extends Solution {

  def solution(a: Array[Int]): Int = {
    def findLowest(numbers: List[Int]): Int = 
      numbers match {
        case x :: Nil => x + 1
        case x :: xs => if ((xs.head - x) > 1) x + 1 else findLowest(xs)
      }

    val clean: List[Int] = a.toList.filter(_ > 0).sorted

    if (clean.isEmpty) 1
    else if (clean.head != 1) 1
    else findLowest(clean)
  }
}

Testing the Solution

Before we move on to deeper issues, I want to create a test suite so we can be sure we don't break anything during refactoring.
To test this function we could create a few hand-crafted cases, the programmer equivalent of banging together sticks to make fire, or we could generate test cases from a specification. A fairly simple way to generate test cases is:

create as many negative numbers as we like
create a sequence of positive numbers, and remove one of the numbers
join the two sets of numbers and shuffle

With this construction we know the result should be the number we removed.

Once we've set up the test suite we can proceed. I used MUnit and its ScalaCheck integration to do the above.
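A minimal sketch of this construction, using scala.util.Random rather than the MUnit and ScalaCheck setup described above, so it runs without dependencies. The name genCase and the size limits are my assumptions. It also repeats a few of the positive numbers, which the steps above don't do, because the Codility example [1, 3, 6, 4, 1, 2] contains a duplicate.

```scala
import scala.util.Random

// Hypothetical helper: build one test case and the answer we expect.
// The post uses MUnit + ScalaCheck; this sketch uses scala.util.Random instead.
def genCase(rng: Random): (Array[Int], Int) = {
  // At least one non-positive number (0 down to -1,000,000), so N >= 1.
  val negatives = Array.fill(1 + rng.nextInt(50))(-rng.nextInt(1000001))
  // The positive numbers 1..n with one number, `missing`, removed.
  val n = 1 + rng.nextInt(1000)
  val missing = 1 + rng.nextInt(n)
  val positives = (1 to n).filter(_ != missing).toArray
  // Extra step, not in the post: repeat some positives, as in [1, 3, 6, 4, 1, 2].
  val repeats =
    if (positives.isEmpty) Array.empty[Int]
    else Array.fill(rng.nextInt(10))(positives(rng.nextInt(positives.length)))
  // Join everything and shuffle so the order carries no information.
  val input = rng.shuffle((negatives ++ positives ++ repeats).toVector).toArray
  (input, missing)
}
```

Each pair can be fed to every Solution and the result compared with the expected value. The repeats matter: a solution that only advances its guess when the next sorted value equals it returns 2 for [1, 1, 2] instead of 3.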

Partial and Total Functions

Let's now move on to deeper issues. I don't like the implementation of findLowest. There is some input for which it will crash, namely the empty list. In FP jargon we'd say it is a partial function, not a total function. The empty list case is checked before findLowest is called, but it is easy for future modifications to break this. We could use, say, Cats' NonEmptyList type to express that this function only works with non-empty lists, but it's not really appropriate to add a dependency in this context. We can, instead, rewrite findLowest to be a total function.

We can make findLowest a total function by adding an extra parameter, which is the current guess for the lowest number. With this we can write findLowest as a standard structural recursion and the compiler will stop complaining about our incomplete match. Here's the code (written with Scala 3 syntax).

object Solution3 extends Solution {
  def solution(a: Array[Int]): Int = {
    def findLowest(result: Int, numbers: List[Int]): Int =
      numbers match {
        case Nil => result
        case x :: xs =>
          // numbers is sorted, so x < result only when x repeats an earlier value
          if x < result then findLowest(result, xs)
          else if x == result then findLowest(result + 1, xs)
          else result
      }

    val clean: List[Int] = a.toList.filter(_ > 0).sorted

    findLowest(1, clean)
  }
}

Performance

The requirements state they want an "efficient algorithm". I don't think they really mean that, but optimizing code can be fun and in this case there are some easy wins to be had. I'm going to look at two types of optimization:

data representation, where we change how we store data to be more efficient; and
algorithmic optimization, where we change the structure of the code to do less work.

The code mostly uses the List datatype, which is a singly linked list. This is a poor choice for performance, as it involves a lot of pointer chasing, and random memory access is slow on modern computers. List is appropriate when we want to reason about shared data, and hence use immutable data, but in this code the data is never shared outside the method so that is not a concern.

From the algorithmic perspective we are doing a lot of work:

there is an O(n) traversal of the input to convert to a List;
the filtering operation is at least O(n) and may be more depending on how the filtered result is constructed;
sorting is O(n log n); and
the final traversal to find the lowest missing number is O(n).

My first change is mostly concerned with data representation. By working purely with arrays we use a more cache-friendly data structure, and we can also sort in-place which avoids some allocation. Here's the code.

import java.util.Arrays

object Solution4 extends Solution {
  def solution(a: Array[Int]): Int = {
    def findLowest(result: Int, idx: Int, numbers: Array[Int]): Int = {
      if idx == numbers.length then result
      else if numbers(idx) < result then
        // numbers is sorted, so this value repeats one we have already seen
        findLowest(result, idx + 1, numbers)
      else if numbers(idx) == result then
        findLowest(result + 1, idx + 1, numbers)
      else result
    }

    val clean: Array[Int] = a.filter(_ > 0)

    Arrays.sort(clean)
    findLowest(1, 0, clean)
  }
}

The next step is mostly algorithmic optimization. We don't need to sort the array, or even filter it. All we need to do is construct a data structure that tells us which numbers are present. This requires just one O(n) traversal through the input. We only need a single bit to represent presence or absence for each positive integer. The specification tells us the input will not be higher than 1,000,000, hence we can use a bit set of about 1,000,000 bits, or roughly 125kB. That should easily fit into the L2 cache and might even squeeze into the L1 cache on CPUs with a large one. Once we have constructed the bit set we need a single O(n) traversal to find the lowest missing number. Here's the code. Note I used java.util.BitSet instead of scala.collection.mutable.BitSet because, at a quick glance, it was clearer which methods I wanted.

import java.util.BitSet

object Solution5 extends Solution {
  def solution(a: Array[Int]): Int = {
    def populateBitSet(
        bitSet: BitSet,
        idx: Int,
        numbers: Array[Int]
    ): BitSet = {
      if idx == numbers.length then bitSet
      else {
        val elt = numbers(idx)
        if elt < 1 then populateBitSet(bitSet, idx + 1, numbers)
        else {
          bitSet.set(elt)
          populateBitSet(bitSet, idx + 1, numbers)
        }
      }
    }

    val bitSet = populateBitSet(BitSet(1000000), 0, a)
    val result = bitSet.nextClearBit(1)
    result
  }
}
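For comparison, here is a sketch of the same idea using scala.collection.mutable.BitSet, the alternative mentioned above. It adds one refinement that isn't in the post: an array of N numbers can contain at most N of the values 1..N + 1, so the answer is never larger than N + 1 and larger values can be ignored. That bounds the set at N + 1 bits, roughly 12.5kB for N = 100,000. The function name smallestMissingPositive is hypothetical.

```scala
import scala.collection.mutable

// Sketch of the bit-set idea with scala.collection.mutable.BitSet.
// Assumption beyond the post: with n numbers the answer is at most n + 1
// (pigeonhole), so values above n + 1 never affect the result.
def smallestMissingPositive(a: Array[Int]): Int = {
  val limit = a.length + 1
  val present = mutable.BitSet.empty
  // One O(n) pass recording which of 1..limit occur.
  a.foreach { x => if (x >= 1 && x <= limit) present += x }
  // Some value in 1..limit must be absent, so find always succeeds;
  // getOrElse keeps the function total anyway.
  (1 to limit).find(i => !present.contains(i)).getOrElse(limit)
}
```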

I set up a quick JMH benchmark to compare implementations. I was only looking for big improvements, so I'm only reporting results below for the first solution, and Solution4 and Solution5 above. As you can see, the combination of data representation and algorithmic improvements yields a speed-up of a bit over ten times compared to the original. That's pretty good for some fairly simple changes!

[info] CodilityBenchmark.benchSolution1  thrpt    3  741.060 ± 32.291  ops/s

[info] CodilityBenchmark.benchSolution4  thrpt    3  1956.945 ± 62.053  ops/s

[info] CodilityBenchmark.benchSolution5  thrpt    3  8406.225 ± 751.966  ops/s
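Dividing the mean throughputs (and ignoring the error bars) gives the speed-ups relative to Solution1, and of Solution5 over Solution4:

```latex
\frac{8406.225}{741.060} \approx 11.3, \qquad
\frac{1956.945}{741.060} \approx 2.6, \qquad
\frac{8406.225}{1956.945} \approx 4.3
```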

Conclusions

The process of improving the code was reasonably straightforward. The most important improvements, in my opinion, are the ones that were done first. As an interviewer I want to see code that pays attention to clarity, as I think that's one of the most important factors in successfully growing a large code base. The optimizations I performed require some level of knowledge of data structures, computer architecture, and algorithmic complexity. All these things should be covered in a computer science course, but those who haven't studied CS can find equivalents online. My optimizations don't require deep knowledge of, for example, the x86-64 architecture. All these optimizations can be reasoned about with a fairly coarse machine model.

All the code is on GitHub if you want to go further, or just to see how I set up the tests and benchmarks. I hope it is useful!

Photo by Elena Mozhvilo on Unsplash

Techniques for Understanding Code

Noel Welsh — Tue, 07 Jul 2020 13:15:47 +0000

Building an understanding of code is one of the main tasks in software development. Whenever we want to answer a question about code---what is this doing? why doesn't it work? how can we make it faster?---this is what we're doing. I have found it valuable to consciously surface the strategy I use to answer these questions, and have categorized my approaches into three groups:

reasoning about code;
inspecting running code; and
referring to an authoritative source.

In this post I will discuss these different ways of understanding code and their benefits and drawbacks.

Three Ways to Understand Code

Let's start by describing the three methods for understanding code.

Reasoning

Reasoning means applying logic to some model of a programming language's semantics. This sounds very formal, and that can be the case using, say, an operational or denotational semantics. However, most reasoning is informal. I'm sure the majority of programmers can reason about code (quick check: what does the program 1 + 1 evaluate to?) but would be hard pressed to specify the model and inference rules they use.

Reasoning takes place at many levels. For example, when I reason about code I might work at a low level that is easily reducible to a formal semantics. More often I work at a higher level, where I'm thinking about, say, transformations on algebraic data types instead of expressions and values.

Regardless of how it is done, reasoning requires a model and rules for inference.

Observation

Another way we can understand code is by observing its behavior as it runs. There are many ways to do this. Just running the program and looking at its output is probably most common, but other methods include using a debugger, inspecting logs, or running tests.

Appeal to Authority

A final way we can understand code is by turning to a trusted source. For most programmers this means searching the Internet, perhaps using a site like Stack Overflow or a specialist forum or mailing list. It can also mean consulting a colleague or even, as a last resort, reading the fine manual.

The Advantages of Reasoning

Of the three methods my preference is to use reasoning. Reasoning can make statements that hold for all possible program runs. Let's use the following code example to discuss this further. First the code is shown in Typescript.

function shout(phrase: string): string {
    return `${phrase.toUpperCase()}!`
}

Now here it is using Scala.

def shout(phrase: String): String =
  s"${phrase.toUpperCase}!"

Finally an example of using it.

shout("Hello, reader")
// Returns HELLO, READER!

Here are some of the properties of this code:

the string returned by shout will always end with an exclamation mark!
the code cannot fail unless passed null (or undefined in the case of Typescript.)
if the input is all in upper case the output will be one character longer.

There are other properties we could define, but hopefully the above are sufficient to give you an idea of what I mean. We can tell these properties are true without running the code, and they hold for all possible executions: an infinite number of cases.

Neither observation nor appealing to authority can prove statements about programs. Observation can only tell us about the properties of the program when run with the particular input it is given. If we see that the output of shout("Hello, reader") is "HELLO, READER!" then we might guess as to what shout is doing. More observations can increase our confidence. However we cannot ever be certain that we are correct by observation alone. Appealing to authority---perhaps by reading the documentation for the shout function---may describe what the function does but that description could be incorrect. The problem with trust is that it, well, relies on trust. Particularly when trusting randos on the Internet we must be cautious.
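To make the contrast concrete, here is observation in code: a minimal sketch that checks two of the properties above on randomly generated inputs. The helper name checkShout and the choice of ASCII alphanumeric inputs are my assumptions. Passing increases confidence, but it only covers the strings actually sampled.

```scala
import scala.util.Random

def shout(phrase: String): String =
  s"${phrase.toUpperCase}!"

// Observation: test the properties on a finite sample of inputs.
def checkShout(samples: Int, rng: Random): Boolean =
  (1 to samples).forall { _ =>
    // Random ASCII letters and digits: a tiny slice of all possible strings.
    val phrase = rng.alphanumeric.take(rng.nextInt(20)).mkString
    val upper = phrase.toUpperCase
    // Property 1: the result always ends with an exclamation mark.
    val endsWithBang = shout(phrase).endsWith("!")
    // Property 3: an all upper case input gives an output one character longer.
    val oneLonger = shout(upper).length == upper.length + 1
    endsWithBang && oneLonger
  }
```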

I find reasoning more efficient than the other methods. If I have a good model of the domain I can reason my way out of most problems with just a little thinking. Observation requires that I run a program, which usually takes more time to set up, and consulting others requires that I interrupt a colleague or trawl through hundreds of Internet search results.

Reasoning is also amenable to automation. The most accessible form of automated reasoning is probably type checking, but linters and similar tools are other examples. Although there are systems that can construct tests, they are not anywhere near as widely available as type systems. We might consider code review a manual counterpart to these automated checks, but its feedback loop is much slower.

The Limit of Reasoning

There are theoretical and practical limits to the power of reasoning.

The halting problem demonstrates fundamental limits to the power of reasoning. There are some properties that we simply cannot prove for all programs. (The incompleteness theorems express the same idea from a mathematical perspective.) However this is not usually what causes problems in practice. In my (admittedly limited) experience, for the working programmer the limits more often come from the cost of reasoning, reasoning across system boundaries, and issues with assumptions built into models. Let's address each of these in turn.

The first issue is perhaps the most important: reasoning can be just too expensive. The cost is usually that of learning. Most reasoning we do is informal, and the cost here is in acquiring a reliable mental model. All of us have limits on what we have time to learn and must specialise to an extent. When reasoning formally we may already have a model, but even then most of us do not have the expertise to use a formal model efficiently. Some systems are too complex to reasonably build a model for. Attempting to build a cycle-accurate CPU model, for example, is likely to be wasted effort. It's much simpler to measure actual program performance.

System boundaries also limit our ability to reason. We can only formally reason up to the boundaries of our system; when we interact with the outside world all bets are off. I imagine many developers have had the experience of interacting with a web service that doesn't adhere to its own specification. It doesn't matter what the documentation says, and it doesn't matter how we represent remote systems with types in our code; if the real world is different we must adapt. The only way we can reliably determine a remote system's behavior is by interacting with it and seeing what happens.

Finally, all formal models are built on assumptions. For example, when we say that the program 1 + 1 always evaluates to 2 we are making assumptions such as: arithmetic won't overflow, we aren't going to run into CPU bugs, and we don't have to worry about cosmic rays. Usually this is fine, but there are occasions where it is not. It is up to us to decide when our assumptions should be challenged.
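As a small illustration of the first assumption: JVM Int addition wraps around rather than failing, and the standard java.lang.Math.addExact method is one way to make the overflow case explicit. The helper name checkedAdd is hypothetical.

```scala
// On the JVM, Int addition silently wraps around on overflow.
val wrapped: Int = Int.MaxValue + 1

// Math.addExact drops the "no overflow" assumption and reports it instead,
// by throwing ArithmeticException, which we turn into a value here.
def checkedAdd(x: Int, y: Int): Either[String, Int] =
  try Right(Math.addExact(x, y))
  catch { case _: ArithmeticException => Left("overflow") }
```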

Combining Reasoning, Observation, and Trust

I've presented reasoning, observation, and appeal to authority as alternatives, but the truth is that they are complementary. For example, we can take observations as a starting point for reasoning, and usually when debugging this is what we do. We can use reasoning to suggest optimizations that we then confirm with actual performance measurements. We implicitly rely on trust even in formal reasoning: trust that the model we work with is correct, the tools we are using are free from bugs, and those who taught us to reason did a good job. In my experience the skill is in realising which combination of techniques is appropriate in a given situation. Let me give two examples.

I don't fully understand the Scala sbt build tool. When I do have to work with sbt I know I'm going to have to rely on reading documentation and trial and error, which is to say appeal to authority and observation. If my goal is to work with a plugin I'll usually rely on that plugin's documentation. If I'm working with something core to sbt I'll go straight to the sbt documentation. I won't usually search the web as a first choice because I don't find it particularly reliable. One non-goal is developing a complete mental model of sbt. I don't mind if this happens, but I don't modify my builds often enough to make it worthwhile. Hence my reading is usually very task oriented (how do I do this?) rather than aimed at understanding why things work the way they do.

In contrast, I've recently been learning React and Typescript (hence the Typescript examples in recent posts). Here my goal is to build a mental model. To this end I have read through most of the documentation with a focus on conceptual material. When I encounter a surprise while programming I actively try to inspect my mental model to see how it should be revised. This slows me down to start with, but the time I spend learning is paid back every time I use the model I've learned.

Finally, I've noticed that some programmers have an over-reliance on a particular technique for understanding, usually searching the web. I think it is important to build reliable mental models in our core skills, so if you are the type of person who jumps onto Google whenever you encounter a problem, try instead to reason about it first, and then think about how you need to adjust your mental model so you understand the cause of the error. I believe it will make you a better programmer in the long run.

Photo by Matthew Brodeur on Unsplash

Scoring Ten-pin Bowling with Algebraic Data and Finite State Machines

Noel Welsh — Thu, 18 Jun 2020 00:00:00 +0000

I recently led a training session where we implemented the rules for scoring ten-pin bowling in Scala. It makes for a good case study. It’s small enough that you can pick up the rules in a few minutes, but the dependencies between frames make calculating the score non-trivial. I decided to implement my own solution, which turned into an interesting exercise in algebraic data and finite state machines. In this post I'll describe my implementation and my process for developing it.

For my implementation I solely focused on scoring the game. I didn’t implement any parsing code, as that part of the problem didn’t interest me.

The Data

The core of my approach is getting the data structure right. Once it’s in place the rest of the code is relatively straightforward. This approach relies on some foundational features of functional programming, namely algebraic data types and structural recursion. Let’s have a quick diversion into these topics.

Algebraic Data Types and Structural Recursion

“Algebraic data type” is a fancy phrase that functional programmers use to refer to data that is modelled in terms of logical ands and logical ors. Here are some examples:

a User is a name and an email address and a password;
a Result is a success or a failure;
a List of A is:
- the empty list (conventionally called nil) or
- a pair (conventionally called cons) containing a head of type A and a tail of type List of A.

If a language has support for algebraic data types, once we have a description of data such as the examples above we can directly translate it into code. Let’s use the example of the list as it is the most complex.

Here’s how we can define it in Scala.

sealed trait List[A]
final case class Nil[A]() extends List[A]
final case class Cons[A](head: A, tail: List[A]) extends List[A]

Here’s the same thing in Typescript.

type Nil<A> = { kind: "nil" }
type Cons<A> = { kind: "cons", head: A, tail: List<A> }
type List<A> = Nil<A> | Cons<A>

const Nil = <A>(): List<A> => 
  ({ kind: "nil" });
const Cons = <A>(head: A, tail: List<A>): List<A> => 
  ({ kind: "cons", head: head, tail: tail });

Here’s Rust.

enum List<A>{ 
    Nil, 
    Cons(A, Box<List<A>>)
}

Each language requires some language specific knowledge in the implementation. For example:

in Scala we can choose between covariance and invariance;
in Typescript we need to define constructors separately;
in Rust we must wrap recursion in a Box.

The general concept, however, applies to all these languages and we can transfer knowledge from one language to another. To avoid writing all the code three times for the rest of this post I’ll be sticking to Scala.

Given an algebraic data type we can implement any transformation on that type using structural recursion (also known as a fold or a catamorphism)¹. The rules, informally, for structural recursion are:

each case (logical or) in the data must have a case in the structural recursion; and
if the data we are processing is recursive the function we are defining must be recursive at the same place.

Structural recursion cannot solve everything for us (we must add problem-specific code to fill out the implementation), but it gives us substantial help.

Here’s one way we could write the structural recursion skeleton for a list in Scala.

def transform[A](list: List[A]): SomeResultType = 
  list match { 
    case Nil() => 
      ??? // Problem specific 
    case Cons(h, t) => 
      ??? // Problem specific but *must* include the recursion transform(t) 
  }

If we want to calculate the length of a list we can start with the skeleton

def length[A](list: List[A]): Int = 
  list match { 
    case Nil() =>
      ??? // Problem specific
    case Cons(h, t) => 
      ??? // Problem specific but *must* include the recursion length(t)
  }

and fill out the problem specific parts

def length[A](list: List[A]): Int = 
  list match { 
    case Nil() => 0 
    case Cons(h, t) => 1 + length(t)
  }

Any (yes, really, any) other method we can write that transforms a list to something else (or even another list) is going to have the same skeleton. So in summary, all we have to do is work out how to model our data using logical ands and ors and then we immediately get for free:

the representation of that data in code; and
a generic template for transforming that data into anything.
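As a further illustration of that claim, here is a sketch of two more transformations written from the same skeleton: sum, which produces an Int, and map, which produces another list. The List type is copied from above but renamed MyList, MyNil, and MyCons (hypothetical names) so it doesn't clash with Scala's built-in List.

```scala
// The post's List ADT, renamed so it does not shadow scala.List.
sealed trait MyList[A]
final case class MyNil[A]() extends MyList[A]
final case class MyCons[A](head: A, tail: MyList[A]) extends MyList[A]

// Same skeleton as length: one case per alternative, recursion on the tail.
def sum(list: MyList[Int]): Int =
  list match {
    case MyNil()      => 0
    case MyCons(h, t) => h + sum(t)
  }

// Transforming a list into another list still follows the skeleton.
def map[A, B](list: MyList[A])(f: A => B): MyList[B] =
  list match {
    case MyNil()      => MyNil()
    case MyCons(h, t) => MyCons(f(h), map(t)(f))
  }
```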

Diversion over! Let’s get back to bowling.

Bowling as an Algebraic Data Type

From reading the rules of bowling we can pull out a reasonably simple structure:

A game consists of 10 frames
Each frame can be a strike, a spare, or an open frame where
- an open frame is two rolls that sum to less than ten;
- a spare is one roll that is less than ten (the second roll is implied by the first roll); and
- a strike doesn’t need any additional information.

This is the model I started with but as I worked on it I realised the true model is more complicated because the final frame may have up to two bonus rolls. Hence I changed the model to

A game consists of 9 frames and 1 final frame
A frame can be a strike, a spare, or an open frame where
- an open frame is two rolls that sum to less than ten;
- a spare is one roll that is less than ten (the second roll is implied by the first roll); and
- a strike doesn’t need any additional information.
A final frame can be a strike, a spare, or an open frame which have the same definition as above and also
- a spare final frame has one bonus roll; and
- a strike final frame has two bonus rolls.

These definitions fit the criteria for an algebraic data type (they consist of logical ands and ors) and therefore translate to code in a straightforward way. Rather than paste a big lump of code I’ll just link to Frame, FinalFrame, and Game in the code repository. Note that not all the invariants can be expressed in the type system. For example, we cannot express the criterion that the rolls in an open frame must sum to less than 10. The problem specification says we only need to consider valid data, but I put some dynamic checks in the “smart constructors” on the companion objects. This turned out to be useful as it caught some errors in my tests (which I’ll talk about in a bit).
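The repository code isn't reproduced here, but a smart constructor of this kind might look roughly like the sketch below. The case names and the exact checks are my assumptions; the only detail taken from the post is a companion object with an open method, as used by the open-frame generator in the Testing section.

```scala
// Hypothetical sketch of a Frame type with smart constructors on its companion.
// The case names and checks are assumptions, not the post's repository code.
sealed trait Frame
object Frame {
  final case class Open(roll1: Int, roll2: Int) extends Frame
  final case class Spare(roll1: Int) extends Frame
  case object Strike extends Frame

  // The type cannot say "the rolls sum to less than 10", so check it at runtime.
  def open(roll1: Int, roll2: Int): Frame = {
    require(roll1 >= 0 && roll2 >= 0, s"negative roll: $roll1, $roll2")
    require(roll1 + roll2 < 10, s"open frame must sum to less than 10: $roll1 + $roll2")
    Open(roll1, roll2)
  }

  // A spare records only its first roll; the second is 10 - roll1.
  def spare(roll1: Int): Frame = {
    require(roll1 >= 0 && roll1 < 10, s"spare's first roll must be in 0..9: $roll1")
    Spare(roll1)
  }

  val strike: Frame = Strike
}
```

Because require throws IllegalArgumentException, invalid frames built by a buggy test generator fail loudly instead of producing a wrong score.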

Now that we have defined the data we just need to write a structural recursion over the Game type. Well, not quite. The scoring rules have dependencies between frames. For example, if a frame is a strike the next two rolls are added to the score for that frame. We need to keep around the information about pending frames—frames that have yet to be scored—while we process the data.

Reading through the rules we can extract the following.

The pending frames can be

a strike;
a spare; or
a strike and a strike .

This is another algebraic data type.

sealed trait Pending
case object Strike extends Pending
case object Spare extends Pending
case object StrikeAndStrike extends Pending

When we score a frame in a game we must calculate:

the score for this frame, if it does not depend on future frames;
the score for any pending frames that are now complete; and
the pending frames after this frame.

In this way the scoring algorithm is a finite state machine (FSM). The pending frames are the current state of the FSM, the current frame is the input, and we output the next state (the updated pending frames) in addition to a score.

It’s useful to wrap the Pending information up with the score calculated so far, which gives all the information to calculate the total score so far. I called this State. Note that Pending is wrapped in an Option; there may be no frames for which the score is pending.

final case class State(
    score: Int,
    pending: Option[Pending]
) {
  def next(additionalScore: Int, newPending: Option[Pending]): State =
    this.copy( 
      score = score + additionalScore,
      pending = newPending
    )
}

With this definition the scoring function has type (State, Frame) => State, which is exactly the type of the transition function of an FSM. We can calculate the score of a List[Frame] by passing this function as the second argument to foldLeft, with the initial state forming the first argument. In code this is

frames.foldLeft(initialState)(transitionFunction)

The transition function, the scoring algorithm, is a structural recursion over the Frame as well as the State. The code is lengthy, but it isn’t hard to write and a good deal of it is generated by the IDE (in my case, Metals with Doom Emacs.)
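Since the transition function isn't shown, here is a compact sketch of one way to write it, together with the fold. It assumes a simplified frame model (Open, Spare, and Strike frames plus a FinalFrame carrying the bonus rolls) and renames the Pending cases with a Pending prefix so they don't clash with the frame cases; none of these names come from the repository. The helper settle computes what the pending frames are owed once the next two rolls are known.

```scala
// Hypothetical frame model (not the post's code): nine Frames then a FinalFrame.
sealed trait Frame
final case class Open(roll1: Int, roll2: Int) extends Frame
final case class Spare(roll1: Int) extends Frame
case object Strike extends Frame

sealed trait FinalFrame
final case class FinalOpen(roll1: Int, roll2: Int) extends FinalFrame
final case class FinalSpare(roll1: Int, bonus: Int) extends FinalFrame
final case class FinalStrike(bonus1: Int, bonus2: Int) extends FinalFrame

// Pending frames, as in the post, with prefixed names.
sealed trait Pending
case object PendingStrike extends Pending
case object PendingSpare extends Pending
case object PendingStrikeAndStrike extends Pending

final case class State(score: Int, pending: Option[Pending]) {
  def next(additionalScore: Int, newPending: Option[Pending]): State =
    this.copy(score = score + additionalScore, pending = newPending)
}

// Points owed to pending frames once the next two rolls r1 and r2 are known.
def settle(pending: Option[Pending], r1: Int, r2: Int): Int =
  pending match {
    case None                         => 0
    case Some(PendingSpare)           => 10 + r1
    case Some(PendingStrike)          => 10 + r1 + r2
    case Some(PendingStrikeAndStrike) => (20 + r1) + (10 + r1 + r2)
  }

// Transition function of the FSM: (State, Frame) => State.
def step(state: State, frame: Frame): State =
  frame match {
    case Open(r1, r2) =>
      state.next(settle(state.pending, r1, r2) + r1 + r2, None)
    case Spare(r1) =>
      // The spare itself now waits for one more roll.
      state.next(settle(state.pending, r1, 10 - r1), Some(PendingSpare))
    case Strike =>
      // Only one roll (10) is known, so only a one-roll debt can be paid.
      state.pending match {
        case None                         => state.next(0, Some(PendingStrike))
        case Some(PendingSpare)           => state.next(20, Some(PendingStrike))
        case Some(PendingStrike)          => state.next(0, Some(PendingStrikeAndStrike))
        case Some(PendingStrikeAndStrike) => state.next(30, Some(PendingStrikeAndStrike))
      }
  }

// The final frame's first two rolls settle any pending frames.
def scoreFinal(state: State, last: FinalFrame): Int =
  last match {
    case FinalOpen(r1, r2)     => state.score + settle(state.pending, r1, r2) + r1 + r2
    case FinalSpare(r1, bonus) => state.score + settle(state.pending, r1, 10 - r1) + 10 + bonus
    case FinalStrike(b1, b2)   => state.score + settle(state.pending, 10, b1) + 10 + b1 + b2
  }

def score(frames: List[Frame], last: FinalFrame): Int =
  scoreFinal(frames.foldLeft(State(0, None))(step), last)
```

Under this sketch a perfect game, nine strikes followed by FinalStrike(10, 10), scores 300.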

Testing

Testing was important. The scoring rules aren’t amenable to much support from the type system (though now I think about it I could have expressed the rules in a different way that would have given me more compiler support) which means testing is the next best way to ensure the code is correct. This is an excellent application for property-based testing, for which I used ScalaCheck.

I defined a few different generators for the various types of frames. For example here is how I generate open frames.

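def genOpen: Gen[Frame] = {
  for {
    misses <- Gen.choose(1, 10)
    // At least one pin is left standing, so the frame is open.
    hits = 10 - misses
    // Split the hits between the two rolls.
    roll1 <- Gen.choose(0, hits)
    roll2 = hits - roll1
  } yield Frame.open(roll1, roll2)
}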

These generators enabled me to test both the examples given in the instructions and examples generated at random. I found quite a few errors with these tests, both in my scoring algorithm and in how I was generating data. Luckily they were all very easy to diagnose. As the scoring algorithm was very explicit it was easy to work out what I had done wrong (which was usually forgetting to include a roll somewhere).

Conclusions

I hope this article has given an insight into how I approached this case study. In summary, there are three important components:

the core of my approach is to model the data correctly, as I know once I have the data model in place almost all of the rest of the code follows from it;
recognising the scoring algorithm was a finite state machine was another insight I needed to model it cleanly; and
using property-based testing allowed me to achieve a high degree of confidence in my implementation without a great deal of effort.

I have presented my process as if I moved straight from problem to implementation. This was not the case. It was a highly iterative process, and I changed the data model at least three times as I came to better understand the problem. I also interleaved developing the tests with the code under test.

Of course, my approach is not the only one. There is a write-up of a TDD approach in C# which may make an interesting contrast to mine.

¹ Although this is well known in programming language theory, I haven't been able to find a reference that has a chance of being comprehensible to the average programmer. I think the first place to state this result is Data Structures and Program Transformation (Malcolm, 1990), but this uses the Bird-Meertens formalism, which I find very hard to read. A tutorial on the universality and expressiveness of fold (Hutton, 1999) only considers folds on lists, but uses Haskell and more standard mathematical notation. I imagine this is still quite obscure for most but it is an improvement!

What Functional Programming Is, What it Isn't, and Why it Matters

Noel Welsh — Wed, 10 Jun 2020 14:36:43 +0000

The programming world is moving towards functional programming (FP). More developers are using languages with an explicit bias towards FP, such as Scala and Haskell, while object-oriented (OO) languages and their communities adopt FP features and practices. (A striking example of the latter is the rise of Typescript and React in the Javascript community.) So what is FP and what does it mean to write code in a functional style? It's common to view functional programming as a collection of language features, such as first class functions, or to define it as a programming style using immutable data and pure functions. (Pure functions always return the same output given the same input.) This was my view when I started down the FP route, but I now believe the true goals of FP are enabling local reasoning and composition. Language features and programming style are in service of these goals. In this post I attempt to explain the meaning and value of local reasoning and composition.

What Functional Programming Is

I believe that functional programming is a hypothesis about software quality: that software that can be understood before it is run and is built of small reusable components is easier to write and maintain. The first property is known as local reasoning, and the second as composition. Let's address each in turn.

Local reasoning means we can understand pieces of code in isolation. When we see the expression 1 + 1 we know what it means regardless of the weather, the database, or the current status of our Kubernetes cluster. None of these external events can change it. This is a trivial and slightly silly example, but it illustrates a point. A goal of functional programming is to extend this ability across our code base.

It can help to understand local reasoning by looking at what it is not. Shared mutable state is out because relying on shared state means that other code can change what our code does without our knowledge. It means no global mutable configuration, as found in many web frameworks and graphics libraries for example, as any random code can change that configuration. Metaprogramming has to be carefully controlled. No monkey patching, for example, as again it allows other code to change our code in non-obvious ways. As we can see, adapting code to enable local reasoning can require quite sweeping changes. However, if we work in a language that embraces functional programming, this style of programming is the default.

Composition means building big things out of smaller things. Numbers are compositional. We can take any number and add one, giving us a new number. Lego is also compositional. We compose Lego by sticking it together. In the particular sense in which we're using composition, we also require that the elements we combine don't change in any way when they are composed. When we create 2 by adding 1 and 1 we get a new result, but we don't change what 1 means.

We can find compositional ways to model common programming tasks once we start looking for them. React components are one example familiar to many front-end developers: a component can consist of many components. HTTP routes can be modelled in a compositional way. A route is a function from an HTTP request to a handler function or a value indicating the route did not match. We can combine routes as a logical or: try this route or, if it doesn't match, try this other route. Processing pipelines are another example that often use sequential composition: perform this pipeline stage and then this other pipeline stage.
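To make the routing and pipeline ideas concrete, here is a minimal TypeScript sketch. The types HttpRequest, RouteHandler and Route, and the functions route, orElse and andThen, are hypothetical simplifications for illustration, not the API of any real web framework. The important property is that orElse takes two routes and gives back a route, and andThen takes two stages and gives back a stage.

```typescript
// Hypothetical, simplified types; not any real framework's API.
type HttpRequest = { method: string; path: string };
type RouteHandler = (req: HttpRequest) => string;

// A route maps a request to a handler, or to undefined if it doesn't match.
type Route = (req: HttpRequest) => RouteHandler | undefined;

// The simplest route: match one method and one exact path.
function route(method: string, path: string, handler: RouteHandler): Route {
  return (req) =>
    req.method === method && req.path === path ? handler : undefined;
}

// Logical or: try `first` and, if it doesn't match, try `second`.
// The result is itself a Route, so orElse can be applied again.
function orElse(first: Route, second: Route): Route {
  return (req) => first(req) ?? second(req);
}

// Sequential composition of pipeline stages: run `f` and then `g`.
function andThen<A, B, C>(f: (a: A) => B, g: (b: B) => C): (a: A) => C {
  return (a) => g(f(a));
}

const home = route("GET", "/", () => "home page");
const users = route("GET", "/users", () => "user list");
const about = route("GET", "/about", () => "about page");
const app: Route = orElse(home, orElse(users, about));

// A two-stage pipeline: trim whitespace, then upper-case.
const shout = andThen((s: string) => s.trim(), (s: string) => s.toUpperCase());
```

Because none of the combined routes is modified when we build app, each can still be understood, and tested, on its own.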

Types

Types are not strictly part of functional programming, but statically typed FP is the most popular form of FP and is important enough to warrant a mention. Types help compilers generate efficient code, but in FP types are as much for the programmer as they are for the compiler. Types express properties of programs, and the type checker automatically ensures that these properties hold. They can tell us, for example, what a function accepts and what it returns, or that a value is optional. We can also use types to express our beliefs about a program, and the type checker will tell us if those beliefs are incorrect. For example, we can use types to tell the compiler we do not expect an error at a particular point in our code, and the type checker will let us know if we have made an incorrect assumption. In this way types are another tool for reasoning about code.
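As a small illustration of types recording properties, here is a TypeScript sketch assuming the compiler's strictNullChecks option is enabled (it is part of strict mode). The names User, findUser, greet, Result and parseAge are hypothetical. The return type User | undefined says a value is optional, and Result puts the possibility of failure into the signature, so a function that returns a plain number is making a claim that it cannot fail.

```typescript
// Hypothetical example types; assumes strictNullChecks is enabled.
type User = { id: number; name: string };

// The return type says the user may be missing.
function findUser(users: User[], id: number): User | undefined {
  for (const u of users) {
    if (u.id === id) return u;
  }
  return undefined;
}

function greet(users: User[], id: number): string {
  const user = findUser(users, id);
  // Accessing user.name before this check would be a type error,
  // because the checker knows user may be undefined.
  return user === undefined ? "who?" : `hello ${user.name}`;
}

// A result type: the signature states that parsing can fail.
type Result<A> = { ok: true; value: A } | { ok: false; error: string };

function parseAge(input: string): Result<number> {
  if (/^\d+$/.test(input)) {
    return { ok: true, value: Number(input) };
  } else {
    return { ok: false, error: `not an age: ${input}` };
  }
}
```

Callers of parseAge must inspect ok before they can read value, so forgetting to handle the error case is caught at compile time rather than in production.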

Type systems push programs towards particular designs, as working effectively with the type checker requires designing code in a way the type checker can understand. As modern type systems come to other languages they naturally tend to shift programmers in those languages towards an FP style of coding.

What Functional Programming Isn't

In my view functional programming is not about immutability, or keeping to "the substitution model of evaluation", and so on. These are tools in service of the goals of enabling local reasoning and composition, but they are not the goals themselves. Code that is immutable always allows local reasoning, for example, but we don't have to avoid mutation to have local reasoning. Here is an example of summing a collection of numbers. First we have the code in Typescript:

function sum(numbers: Array<number>): number {
    let total = 0.0;
    numbers.forEach(x => total = total + x);
    return total;
}

Here's the same function in Scala:

def sum(numbers: List[Int]): Int = {
  // An Int accumulator; 0.0 would make total a Double and fail to type check
  var total = 0
  numbers.foreach(x => total = total + x)
  total
}

In both implementations we mutate total. This is ok though! We cannot tell from the outside that this is done, and therefore all users of sum can still use local reasoning. Inside sum we have to be careful when we reason about total but this block of code is small enough that it shouldn't cause any problems.

In this case we can reason about our code despite the mutation, but neither the Typescript nor the Scala compiler can determine that this is ok. Both languages allow mutation but it's up to us to use it appropriately. A more expressive type system, perhaps with features like Rust's, would be able to tell that sum doesn't allow mutation to be observed by other parts of the system¹. Another approach, which is the one taken by Haskell, is to disallow all mutation and thus guarantee it cannot cause problems.

Mutation also interferes with composition. For example, if a value relies on internal state then composing it may produce unexpected results. Consider generators in Javascript. They maintain internal state that is used to generate the next value. If we have two generators we might want to combine them into one generator that yields pairs of values from the two inputs. Here's the code in Typescript:

type Gen<a> = Generator<a, void, never>

function* infinite(): Gen<number> {
    let index = 0;

    while (true) {
        yield index++;
    }
}

function zip<a, b>(left: Gen<a>, right: Gen<b>): Gen<[a, b]> {
    function* zipIt () {
        // Keep pulling from both inputs until either one is exhausted.
        while (true) {
            const l = left.next();
            const r = right.next();
            if (!l.done && !r.done) {
                const result: [a, b] = [l.value, r.value];
                yield result;
            } else {
                return;
            }
        }
    }

    return zipIt();
}

This works if we pass two distinct generators to zip.

zip(infinite(), infinite()).next().value; // [0, 0]

However if we pass the same generator twice we get a surprising result.

const inf = infinite();
zip(inf, inf).next().value; // [0, 1]

The usual functional programming solution is to avoid mutable state but we can envisage other possibilities. For example, an effect tracking system would allow us to avoid combining two generators that use the same memory region. These systems are still research projects, however.
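Here is one way the immutable alternative might look: a lazily evaluated stream, where the next element is produced by calling a function rather than by advancing hidden state. This is my illustration, not a built-in type; Stream, naturals, zipStreams and take are hypothetical names. Because reading a stream never changes it, passing the same stream to zipStreams twice behaves exactly like passing two separate streams.

```typescript
// Hypothetical immutable, lazy stream: a head plus a function computing the rest.
type Stream<A> = { readonly head: A; readonly tail: () => Stream<A> };

// The natural numbers from `start`: no hidden counter, just a value.
function naturals(start: number = 0): Stream<number> {
  return { head: start, tail: () => naturals(start + 1) };
}

// Pair up two streams element by element.
function zipStreams<A, B>(left: Stream<A>, right: Stream<B>): Stream<[A, B]> {
  return {
    head: [left.head, right.head],
    tail: () => zipStreams(left.tail(), right.tail()),
  };
}

// Collect the first `n` elements. The mutation here is local and unobservable,
// in the same spirit as the sum example above.
function take<A>(stream: Stream<A>, n: number): A[] {
  const result: A[] = [];
  let current = stream;
  for (let i = 0; i < n; i++) {
    result.push(current.head);
    current = current.tail();
  }
  return result;
}

const nats = naturals();
// Passing the same stream twice is now safe: reading it changes nothing.
const pairs = take(zipStreams(nats, nats), 3); // [[0, 0], [1, 1], [2, 2]]
```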

So, in my opinion, immutability (along with purity, referential transparency, and no doubt more fancy words that I have forgotten) has become associated with functional programming because it guarantees local reasoning and composition, and until recently we didn't have the language tools to automatically distinguish safe uses of mutation from those that cause problems. Restricting ourselves to immutability is the easiest way to ensure the desirable properties of functional programming, but as languages evolve this might come to be regarded as a historical artifact.

Why It Matters

I have described local reasoning and composition but have not discussed their benefits. Why are they desirable? The answer is that they make efficient use of knowledge. Let me expand on this.

We care about local reasoning because it allows our ability to understand code to scale with the size of the code base. We can understand module A and module B in isolation, and our understanding does not change when we bring them together in the same program. By definition, if both A and B allow local reasoning there is no way that B (or any other code) can change our understanding of A, and vice versa. If we don't have local reasoning, every new line of code can force us to revisit the rest of the code base to understand what has changed. Code then becomes exponentially harder to understand as it grows, because the number of interactions (and hence possible behaviours) grows exponentially. We can say that local reasoning is compositional: our understanding of module A calling module B is just our understanding of A, our understanding of B, and whatever calls A makes to B.

We introduced numbers and Lego as examples of composition. They have an interesting property in common: the operations that we can use to combine them (for example, addition, subtraction, and so on for numbers; for Lego the operation is "sticking bricks together") give us back the same kind of thing. A number multiplied by a number is a number. Two bits of Lego stuck together is still Lego. This property is called closure: when you combine things you end up with the same kind of thing. Closure means you can apply the combining operations (sometimes called combinators) an arbitrary number of times. No matter how many times you add one to a number you still have a number and can still add or subtract or multiply or...you get the idea. If we understand module A, and the combinators that A provides are closed, we can build very complex structures using A without having to learn new concepts! This is also one reason functional programmers tend to like abstractions such as monads (beyond liking fancy words): they allow us to use one mental model in lots of different contexts.
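Closure is easy to see in code. In this TypeScript sketch the hypothetical Predicate type has three combinators, and, or and not, each of which takes predicates and returns a predicate. Since the output is the same kind of thing as the input, the combinators can be nested as deeply as we like without introducing any new concept.

```typescript
// A hypothetical predicate type: a test on values of type A.
type Predicate<A> = (a: A) => boolean;

// Each combinator returns a Predicate, so predicates are closed under them.
function and<A>(p: Predicate<A>, q: Predicate<A>): Predicate<A> {
  return (a) => p(a) && q(a);
}

function or<A>(p: Predicate<A>, q: Predicate<A>): Predicate<A> {
  return (a) => p(a) || q(a);
}

function not<A>(p: Predicate<A>): Predicate<A> {
  return (a) => !p(a);
}

const isPositive: Predicate<number> = (n) => n > 0;
const isEven: Predicate<number> = (n) => n % 2 === 0;
const isSmall: Predicate<number> = (n) => n < 100;

// Combinators applied to the output of other combinators.
// True for positive even numbers, and for numbers of 100 or more.
const interesting = or(and(isPositive, isEven), not(isSmall));
```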

In a sense local reasoning and composition are two sides of the same coin. Local reasoning is compositional; composition allows local reasoning. Both make code easier to understand.

The Evidence for Functional Programming

I've made arguments in favour of functional programming and I admit I am biased---I do believe it is a better way to develop code than imperative programming. However, is there any evidence to back up my claim? There has not been much research on the effectiveness of functional programming, but there has been a reasonable amount done on static typing. I feel static typing, particularly using modern type systems, serves as a good proxy for functional programming so let's look at the evidence there.

In the corners of the Internet I frequent, the common refrain is that static typing has a negligible effect on productivity. I decided to look into this and was surprised that the majority of the results I found support the claim that static typing increases productivity. For example, the literature review in this dissertation (section 2.3, pp. 16-19) shows a majority of results in favour of static typing, in particular the most recent studies. However, the majority of these studies are very small and use relatively inexperienced developers, which is noted in the review by Dan Luu that I linked. My belief is that functional programming comes into its own on larger systems. Furthermore, programming languages, like all tools, require proficiency to use effectively. I'm not convinced very junior developers have sufficient skill to demonstrate a significant difference between languages.

To me the most useful evidence of the effectiveness of functional programming is that industry is adopting functional programming en masse. Consider, say, the widespread and growing adoption of Typescript and React. If we are to argue that FP as embodied by Typescript or React has no value we are also arguing that the thousands of Javascript developers who have switched to using them are deluded. At some point this argument becomes untenable.

This doesn't mean we'll all be using Haskell in five years. More likely we'll see something like the shift to object-oriented programming in the nineties: Smalltalk was the paradigmatic example of OO, but it was more familiar languages like C++ and Java that brought OO to the mainstream. In the case of FP this probably means languages like Scala, Swift, and Kotlin, along with mainstream languages like Javascript and Java continuing to adopt more FP features.

Final Words

I've given my opinion on functional programming---that the real goals are local reasoning and composition, and programming practices like immutability are in service of these. Other people may disagree with this definition, and that's ok. Words are defined by the community that uses them, and meanings change over time.

Functional programming emphasises formal reasoning, and there are some implications that I want to briefly touch on. Later articles may expand on these points.

Firstly, I find that FP is most valuable in the large. For a small system it is possible to keep all the details in our head. It's when a program becomes too large for anyone to understand all of it that local reasoning really shows its value. This is not to say that FP should not be used for small projects, but rather that if you are, say, switching from an imperative style of programming you shouldn't expect to see the benefit when working on toy projects.

The formal models that underlie functional programming allow systematic construction of code. This is in some ways the reverse of reasoning: instead of taking code and deriving properties, we start from some properties and derive code. This sounds very academic but is in fact very practical, and it is how I, for example, develop most of my code.
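One well-known technique of this kind is structural recursion, sketched here in TypeScript as an illustration rather than as a full account of how I work. The List type and the functions empty, pair, sum and listLength are hypothetical. Once the data is defined as "either empty, or a head and a tail", the shape of any function over it follows mechanically: one case per alternative, and a recursive call wherever the definition refers to itself. Only the bodies of the cases are left to decide.

```typescript
// The data definition: a list is either empty, or a head followed by a tail list.
type List<A> =
  | { kind: "empty" }
  | { kind: "pair"; head: A; tail: List<A> };

function empty<A>(): List<A> {
  return { kind: "empty" };
}

function pair<A>(head: A, tail: List<A>): List<A> {
  return { kind: "pair", head, tail };
}

// Derived from the data definition: one case per variant,
// and a recursive call on the tail because the tail is itself a List.
function sum(list: List<number>): number {
  if (list.kind === "empty") {
    return 0;
  } else {
    return list.head + sum(list.tail);
  }
}

// The same template with different case bodies.
function listLength<A>(list: List<A>): number {
  if (list.kind === "empty") {
    return 0;
  } else {
    return 1 + listLength(list.tail);
  }
}
```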

Finally, reasoning is not the only way to understand code. It's valuable to understand the limitations of reasoning and the other methods for gaining understanding, and to use a variety of strategies depending on the situation.

The example I gave is fairly simple. A compiler that used escape analysis could recognize that no reference to total is possible outside sum and hence sum is pure (or referentially transparent). Escape analysis is a well-studied technique. In the general case the problem is a lot harder. We'd often like to know that a value is only referenced once at various points in our program, and hence that we can mutate that value without the changes being observable in other parts of the program. This might be used, for example, to pass an accumulator through various processing stages. To do this requires a programming language with what is called a substructural type system. Rust has such a system, with affine types. Linear types are in development for Haskell. ↩