Lift functions up into the module level

#functional

In functional programming, one of the touted benefits is that functions are first-class citizens: they can be created anywhere, so we create them in the exact scopes that use them and hide them from outer scopes. This is hailed as a kind of information hiding and thus beneficial.

I was a big believer in this as well, for a long time. Nowadays, however, I'm a big believer in lifting functions up into top-level (i.e., module) scope. Let me explain why. The rest of this post uses code examples in Scala, for no particular reason other than that it's a (somewhat) functional programming language I happen to know.

Nested functions are tightly coupled

In most functional programming languages, nested functions automatically become closures and capture their surrounding environment. This means they have an implicit dependency on that environment, one that becomes painfully apparent if we ever try to refactor our code and extract the function. Here's a simple example:

// Main.scala

object Main {
  def main(args: Array[String]): Unit = {
    val program = args(0)

    def printArg(arg: String): Unit =
      println(s"$program arg: $arg")

    args.tail.foreach(printArg)
  }
}

Now let's say we want to lift printArg out into the Main module scope. Oops, we forgot it implicitly depends on program:

error: not found: value program
      println(s"$program arg: $arg")
                 ^

This leads to a more extended refactoring session where we finally decide to pass in the program as a curried argument:

// Main.scala

object Main {
  def main(args: Array[String]): Unit =
    args.tail.foreach(printArgFor(args(0)))

  private def printArgFor(program: String)(arg: String): Unit =
    println(s"$program arg: $arg")
}

Notice how this actually simplifies the code somewhat. (I realize it's possible to embed an arbitrary expression in a string interpolation, like s"${args(0)} arg: $arg", but in my opinion it's bad practice because it's mixing business logic with rendering logic.)

We also change the method name slightly to printArgFor, to more closely reflect how it will be used now.
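To make the "curried argument" part concrete, here is a minimal sketch. The names ArgFormatting, formatArgFor, and formatAll are hypothetical; they are a variant of printArgFor that returns the line instead of printing it, so the result is easy to inspect. Supplying only the first parameter list gives back an ordinary function of the remaining argument:

```scala
object ArgFormatting {
  // Hypothetical variant of printArgFor: returns the line instead of printing it.
  def formatArgFor(program: String)(arg: String): String =
    s"$program arg: $arg"

  def formatAll(args: Array[String]): List[String] = {
    // Applying only the first parameter list yields a String => String function.
    val formatArg: String => String = formatArgFor(args(0))
    args.tail.map(formatArg).toList
  }
}
```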

This example is deliberately simple, but it's easy to extrapolate to real codebases, where logic accumulates over time in lots of nested functions and it becomes difficult to understand what their inter-dependencies are.
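As a slightly larger sketch of how those hidden inter-dependencies creep in (all names here are made up for illustration), compare a nested helper that silently reads and mutates its enclosing scope with a lifted one whose signature lists everything it needs:

```scala
object HiddenDependencies {
  // Nested version: `record` reads `prefix` and mutates `count`,
  // but nothing in its signature says so.
  def summarizeNested(items: List[String]): String = {
    val prefix = "item"
    var count = 0

    def record(item: String): String = {
      count += 1                 // hidden write to enclosing state
      s"$prefix $count: $item"   // hidden read of enclosing state
    }

    items.map(record).mkString("\n")
  }

  // Lifted version: every input is an explicit parameter.
  private def label(prefix: String)(item: String, index: Int): String =
    s"$prefix ${index + 1}: $item"

  def summarizeLifted(items: List[String]): String =
    items.zipWithIndex
      .map { case (item, i) => label("item")(item, i) }
      .mkString("\n")
}
```

Both versions produce the same text, but only the second can be moved, reused, or tested without dragging its surroundings along.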

Nested functions aren't needed for information hiding

Notice that in the refactored Main module above the printArgFor method is now private. It achieves almost exactly the same level of information hiding as when it was defined inside the main method. One could argue that it's now visible to other members of the Main module so there is some loss of hiding.

My argument here is that modules are meant to be a granular unit of information hiding. That's why almost every module system allows you to specify private members. A module, like the Main object above, is supposed to be a coherent unit of functionality, made of interconnected members, that exposes a well-thought-out API surface area. Members of a module are supposed to be able to access each other. If that's an issue, then the module itself needs to be rethought, because it's not granular enough.
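Here is a small sketch of that idea, using hypothetical Greeter and GreeterClient objects: the module exposes one public method, and its private helper is shared among its own members but invisible to everyone else.

```scala
object Greeter {
  // Public API surface of the module.
  def greetAll(names: List[String]): List[String] =
    names.map(greet("Hello"))

  // Private helper: visible to other members of Greeter, hidden outside it.
  private def greet(greeting: String)(name: String): String =
    s"$greeting, $name!"
}

object GreeterClient {
  def run(): List[String] =
    Greeter.greetAll(List("Ada", "Grace"))

  // Greeter.greet("Hi")("Ada") would not compile here,
  // because greet is private to Greeter.
}
```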

Most compilers lift functions anyway

The process by which nested functions are lifted up into module scope is called lambda lifting and is a well-known technique, especially in compilers for functional programming languages. By doing it yourself, you keep explicit control over the process and (maybe) save a little time for the compiler. Not that this is the most important point, anyway. Most importantly...

Lifting functions forces you to think about the function's interface

I can't really put this any better than Garrett Smith does in his excellent talk on writing quality code (in Erlang, but really his points are universal):

The observation is that in a case expression, you can use all the free variables around the case expression–all the context–the arguments to the function and anything that you’ve defined above the case expression, you have access to. When you convert from a case expression to a function, you lose all the free variables that no longer apply. So it forces you to identify specifically what arguments are available in that operation, that’s the point. So you give it a name and you give it a list of arguments, and now you have a tightly-defined interface to that logic, that decision that you’re making. That’s why it’s valuable. The process of doing that is programming. We do that–that’s our job.
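Translated into Scala terms, the case expression becomes a match expression. The following is only a sketch with made-up names (CaseToFunction, describeInline, statusFor, describe), not something from the talk. In the inline version the match can see every name in scope; once extracted, the decision has a name and exactly one argument:

```scala
object CaseToFunction {
  // Before: the match sits in the middle of the method and could use
  // user, retries, or verbose, although it only needs retries.
  def describeInline(user: String, retries: Int, verbose: Boolean): String = {
    val status = retries match {
      case 0          => "ok"
      case n if n < 3 => "flaky"
      case _          => "failing"
    }
    if (verbose) s"$user: $status ($retries retries)" else s"$user: $status"
  }

  // After: the decision is a named function with a tightly defined interface.
  private def statusFor(retries: Int): String = retries match {
    case 0          => "ok"
    case n if n < 3 => "flaky"
    case _          => "failing"
  }

  def describe(user: String, retries: Int, verbose: Boolean): String = {
    val status = statusFor(retries)
    if (verbose) s"$user: $status ($retries retries)" else s"$user: $status"
  }
}
```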