Andrew (he/him)

Posted on Oct 28, 2019

Bits of Syntax: Function Application

#discuss #design #healthydebate #watercooler

Parentheticals

After variable assignment, function application is probably the most fundamental aspect of any programming language's syntax.

Many, many, many programming languages use parentheses () to indicate function application:

print("this is a string")

In the above case, "this is a string" is an argument (or a parameter) that's passed to the function print(). (In fact, functions are often written in text with () after them to indicate that they are functions and not variables or some other kind of construct.) In languages that use () to delimit the list of parameters, a function with no parameters usually still needs the () after its name to indicate a function call:

cat()

In the R language, for instance, appending () to a function's name calls the function (here with no arguments), while typing the name without a trailing () doesn't call it at all. Instead, R prints the function object itself, including (sometimes) its implementation:

> cat()
> cat
function (..., file = "", sep = " ", fill = FALSE, labels = NULL, 
    append = FALSE) 
{
    if (is.character(file)) 
        if (file == "") 
            file <- stdout()
        else if (substring(file, 1L, 1L) == "|") {
            file <- pipe(substring(file, 2L), "w")
            on.exit(close(file))
        }
        else {
            file <- file(file, ifelse(append, "a", "w"))
            on.exit(close(file))
        }
    .Internal(cat(list(...), file, sep, fill, labels, append))
}
<bytecode: 0x7fc8a7d676e8>
<environment: namespace:base>
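Python draws the same line between naming a function and calling it. Here's a minimal sketch, using a made-up zero-argument function called greet (it has nothing to do with R's cat):

```python
def greet():
    """Hypothetical zero-argument function, used only for illustration."""
    return "hello"

# Without parentheses, the name evaluates to the function object itself.
reference = greet

# With parentheses, the function is actually called.
result = greet()

print(callable(reference))  # True
print(result)               # hello
```

The bare name is just another value that can be stored or passed around; only the trailing () triggers the call.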

When multiple arguments are passed to a function, the majority of languages require them to be comma-separated:

max(1, 5, 42, 19)

This pattern -- comma-separated parameters within parentheses to indicate arguments being passed to a function -- goes back to the very first programming languages and was likely inspired by mathematical notation, which looks essentially identical.

But not all languages use this syntax, particularly for functions which may have only one or two parameters.

Infix, Prefix, Postfix

Arithmetic functions, like +, -, *, and / (addition, subtraction, multiplication, and division, respectively) are applied in a significantly different way, but we're so used to it that we often don't notice.

These operators (functions with short, usually symbolic names) are infix operators, because they're fix-ed with-in the arguments:

3 + 4
5 - 5
5 * 7
6 / 2

But each of these is nothing more than a function which takes two arguments and returns a result:

     plus(3, 4)
    minus(5, 5)
    times(5, 7)
dividedby(6, 2)
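Python happens to ship these two-argument functions in its standard operator module, so the equivalence can be checked directly. A small sketch (add, sub, mul, and truediv are Python's names, not the made-up plus, minus, times, and dividedby above):

```python
import operator

# Each infix expression is paired with the equivalent function call.
pairs = [
    (3 + 4, operator.add(3, 4)),
    (5 - 5, operator.sub(5, 5)),
    (5 * 7, operator.mul(5, 7)),
    (6 / 2, operator.truediv(6, 2)),
]

for infix_result, function_result in pairs:
    print(infix_result, function_result, infix_result == function_result)
```

Every row should report that the two spellings agree.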

Most languages hide this from the user, but Scala, for example, makes it explicit. In Scala, the arithmetic operators are ordinary methods, so they can be called with dot notation like any other method:

3.+(4)
5.-(5)
5.*(7)
6./(2)

You can see that the "addition" function in Scala is actually a method named +, called on the 3 object as 3.+(4). In Scala, every value -- even a raw number -- is an object, and has methods which can be called on it. Scala adds some syntactic sugar that lets users drop the preceding . and the () which surround the argument when calling a one-argument method. The result is the familiar 3 + 4 notation.
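Python works along similar lines, although it spells the method differently. In this sketch, __add__ is Python's real hook for +, while the Metres class is a hypothetical type invented purely to show a user-defined + in action:

```python
# Infix sugar and the explicit method call give the same result.
# The parentheses around 3 keep Python from reading "3." as a float literal.
explicit = (3).__add__(4)
sugared = 3 + 4


class Metres:
    """Hypothetical wrapper type, defined only to show operator overloading."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Called when a Metres instance appears on the left of "+".
        return Metres(self.value + other.value)


total = Metres(2) + Metres(3)
print(explicit, sugared, total.value)
```

Because + is just a method lookup, any class can opt in to infix notation by defining that one method.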

Prefix and postfix operators are another kind of function which is applied in an unusual way, but these functions often only have a single argument:

-3     // "negate" function operating on argument '3'
!true  // "not" function (some languages) operating on argument 'true'
~false // "not" function (other languages) operating on argument 'false'
C++    // "increment" operator operating on argument 'C'
--ii   // "decrement" operator operating on argument 'ii'
A.'    // "transpose" function operating on argument 'A'

Each of the above functions operates on a single argument. In some cases, the operator comes immediately before the argument, like -3, which is the "negate" function applied to the argument 3 (this function simply multiplies the argument by -1). These are prefix operators.

In other cases, the operator comes immediately after the argument, like C++, where the variable C is incremented using the ++ function. These are postfix operators.

In most languages, a prefix or postfix operator is written immediately adjacent to its argument, with no whitespace or any other characters in between (many languages tolerate a space there, but few programmers write one). That only works because these operators are symbolic, and don't contain any alphabetic characters: a name made of punctuation can't run together with an identifier.

Imagine if the - operator were instead called neg: -3 would become neg3, and a variable called ate would be negated with the syntax negate -- not very intuitive. Languages like Python which use alphabetic operators (for instance, the logical not instead of ! or ~) do require whitespace between the operator and its argument when parentheses are omitted.

We can see that this notation is not unlike the Scala notation seen above, where a bit of "syntactic sugar" (even if it is not explicitly called that by the language specification) hides the nature of these operators, that they are actually single-argument functions.
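Python makes it easy to check that prefix operators really are single-argument functions in disguise, since its standard operator module exposes them under ordinary names. A short sketch (the variable names are only for illustration):

```python
import operator

# Prefix operators paired with their function equivalents.
negated = (-3, operator.neg(3))
logical = (not True, operator.not_(True))
bitwise = (~5, operator.invert(5))

# "not" is spelled with letters, so it needs a separator before its
# argument: "not flag" and "not(flag)" both work, while "notflag" would
# simply be read as a different variable name.
flag = False
spaced = not flag
wrapped = not(flag)

print(negated, logical, bitwise, spaced, wrapped)
```

Note that in Python ~ is bitwise inversion rather than logical negation, so it's shown here on an integer.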

LISP

Some languages -- notably LISP -- have a particular kind of notation, where a function's name is simply followed by its arguments, separated by whitespace, and the whole application is wrapped in a single pair of parentheses:

(+ 3 4 5)

The above code applies the + function to the arguments 3, 4, and 5. Every function in LISP operates this way, which is why LISP is often rife with parentheses:

(/ (+ (- b) (sqrt (- (* b b) (* 4 a c)))) (* 2 a))

That's the quadratic formula (the "plus" root, at least), which might look like this in Python, assuming sqrt has been imported from the math module:

(-b + sqrt(b**2 - 4*a*c)) / (2*a)

Note the "power" infix operator, **, in the Python code above.
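To make the correspondence concrete, here's a rough sketch of a tiny evaluator for LISP-style prefix expressions, written as nested Python tuples. The FUNCTIONS table, the evaluate function, and the sample coefficients are all invented for this illustration; this is not how any real LISP is implemented.

```python
import math
import operator
from functools import reduce


def minus(*args):
    """One argument negates, several arguments subtract left to right."""
    if len(args) == 1:
        return -args[0]
    return reduce(operator.sub, args)


# Hypothetical lookup table from LISP-style names to Python functions.
FUNCTIONS = {
    "+": lambda *args: reduce(operator.add, args),
    "*": lambda *args: reduce(operator.mul, args),
    "/": lambda *args: reduce(operator.truediv, args),
    "-": minus,
    "sqrt": math.sqrt,
}


def evaluate(expr, env):
    """Evaluate a prefix expression written as nested tuples."""
    if isinstance(expr, tuple):
        name, *args = expr
        return FUNCTIONS[name](*(evaluate(arg, env) for arg in args))
    if isinstance(expr, str):
        return env[expr]  # a variable name
    return expr           # a literal number


# (/ (+ (- b) (sqrt (- (* b b) (* 4 a c)))) (* 2 a))
quadratic = ("/",
             ("+", ("-", "b"),
                   ("sqrt", ("-", ("*", "b", "b"), ("*", 4, "a", "c")))),
             ("*", 2, "a"))

env = {"a": 1, "b": -3, "c": 2}
a, b, c = env["a"], env["b"], env["c"]

prefix_root = evaluate(quadratic, env)
infix_root = (-b + math.sqrt(b**2 - 4*a*c)) / (2*a)
print(prefix_root, infix_root)  # 2.0 2.0
```

The whole evaluator is one rule applied recursively: look up the first element, evaluate the rest, and apply. That uniformity is what the parentheses buy.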

Domain-Specific Function Application

Some languages allow particular syntax for particular actions, as well. For instance, many languages support array indexing with square brackets:

int arr[] = { 1, 2, 3 };
arr[1]; // == 2

...but what is this except a function applied to an array which returns an element of that array? This is a really unusual notation in which one argument (the array name) comes before the "parameter list start" delimiter [ and the others come between it and the "parameter list end" delimiter ]. Similar notation for "slicing" arrays allows extra syntax within the square brackets:

arr <- c(1, 2, 3, 4, 5)
arr[2:4]

The above R code (which uses 1-based indexing) creates a vector with five elements, then returns the three "middle" elements, 2, 3, and 4, in a new vector.
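Python lets us peel this sugar away as well. In the sketch below, operator.getitem and __getitem__ are Python's real names for the indexing function, and the list arr is just sample data. Unlike R, Python uses 0-based indexing and excludes the end of a slice, so the indices differ from the R example.

```python
import operator

arr = [1, 2, 3, 4, 5]

# Square-bracket indexing and its function-call equivalents.
second = arr[1]
second_via_function = operator.getitem(arr, 1)
second_via_method = arr.__getitem__(1)

# Slice syntax builds a slice object and passes it as the argument.
# With 0-based, end-exclusive indexing, 1:4 selects the values 2, 3, 4.
middle = arr[1:4]
middle_via_function = operator.getitem(arr, slice(1, 4))

print(second, second_via_function, second_via_method)
print(middle, middle_via_function)
```

Seen this way, arr[1:4] is a two-argument function call whose second argument happens to have its own literal syntax.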

Discussion

Function application notation using parentheses and comma-separated argument lists has existed in mathematics for hundreds of years. It is, of course, difficult to judge if this syntax is inherently intuitive or if this notation is simply so ubiquitous now that we don't question it.

We seem able to internalise -- and accept as normal -- other kinds of function application notation, particularly infix, postfix, and prefix notation. Infix notation, in particular, is taught from early grade school, and is therefore easy enough to translate from algebra to programming.

LISP notation seems foreign to beginners, though. But why should it? It's not a big leap from

a + b + c + d

to

plus(a, b, c, d)

to

(+ a b c d)
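The middle spelling is easy enough to try out. Here's a minimal sketch in Python, where plus is a hypothetical helper written for this example (not a built-in) that accepts any number of arguments, just as the LISP + does:

```python
def plus(*args):
    """Hypothetical variadic addition, mirroring (+ a b c d)."""
    total = 0
    for value in args:
        total += value
    return total


a, b, c, d = 1, 2, 3, 4

infix = a + b + c + d
prefix = plus(a, b, c, d)
print(infix, prefix)  # 10 10
```

One thing the function form gets for free is a uniform answer for any argument count, including zero, where the infix form has nothing to write at all.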

It seems like it's the nesting nature of LISP which makes it difficult. Programs with multiple functions and arguments intertwined can be difficult to parse, as we saw earlier:

(/ (+ (- b) (sqrt (- (* b b) (* 4 a c)))) (* 2 a))

...but indentation can make this a bit more intelligible:

(/
  (+
    (- b)
    (sqrt
      (- (* b b) (* 4 a c)))
  )
  (* 2 a)
)

So what's the best way to go about this? Having multiple different ways of applying functions -- with and without syntactic sugar -- feels messy. Is there a way to unify this notation in a way that doesn't feel as clunky as LISP?
