Laurent Dami

Posted on Mar 3

Beautiful Perl feature: fat commas, a device for structuring lists

#perl #programming #beautifulperl

Beautiful Perl series

This post is part of the beautiful Perl features series.
See the introduction post for general explanations about the series.

Today's topic is a Perl construct called fat comma, which is quite different from the trailing commas discussed in the last post.

Fat comma: an introduction

A fat comma in Perl is a construct that doesn't involve a typographic comma! Visually it consists of an expression followed by an arrow sign => and another expression. This is used in many contexts, the most common being for initializing hashes or for passing named parameters to subroutines:

my %rect = (x      => 12,
            y      => 34,
            width  => 20,
            height => 10);
draw_shape(kind   => 'rect', 
           coords => \%rect,
           color  => 'green');

A fat comma is semantically equivalent to a comma; the only difference from a regular comma is purely syntactic: if the left-hand side is a string that begins with a letter or underscore and is composed only of letters, digits and underscores, then that string doesn't need to be enclosed in quotes. The example above took advantage of this feature, but it could as well have been written:

my %rect = ('x'      => 12,
            'y'      => 34,
            'width'  => 20,
            'height' => 10);
draw_shape('kind'   => 'rect', 
           'coords' => \%rect,
           'color'  => 'green');

or even:

my %rect = ('x', 12, 'y', 34, 'width', 20, 'height', 10);
draw_shape('kind', 'rect', 'coords', \%rect, 'color', 'green');

This last variant has exactly the same technical meaning, but clearly it does not convey the same impression to the reader; so the fat comma is mainly a device for improving code readability.
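The examples so far only show the calling side. As a minimal sketch of what the receiving side could look like, here is a hypothetical draw_shape (its body, its return value and the default color are my own assumptions, made up for illustration) that collects the named parameters into a hash:

```perl
use strict;
use warnings;

# Hypothetical receiving side of draw_shape(); the body is an assumption.
sub draw_shape {
    # Defaults come first; pairs supplied by the caller override them,
    # because later duplicates win when a list is assigned to a hash.
    my %args = (color => 'black', @_);
    return "$args{kind} in $args{color}";
}

my %rect = (width => 20, height => 10);

# Fat commas and regular commas build exactly the same argument list.
my $fat     = draw_shape(kind => 'rect', coords => \%rect, color => 'green');
my $plain   = draw_shape('kind', 'rect', 'coords', \%rect, 'color', 'green');
my $default = draw_shape(kind => 'rect', coords => \%rect);

print "$fat\n";      # same string as $plain
print "$default\n";  # falls back to the default color
```

The subroutine cannot tell which kind of comma the caller used: in both cases it simply receives a flat list in @_.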

More general usage

Since Perl does not impose many constraints, fat commas can be used in many other ways than just initializing hashes or passing named parameters to subroutine calls:

they can appear at any place where a list is expected;
they need not be only in pairs: triplets, quadruplets, etc. are allowed;
mixtures of fat commas and regular commas are allowed (and even frequent);
the expression on the left-hand side of a fat comma need not be a string - it can be any value.
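The four points above can be tried out in a few lines. This is only a small sketch; the variable names and the sample values are made up for the occasion:

```perl
use strict;
use warnings;

# A triplet: both barewords to the left of a => are quoted automatically.
my @move = (pawn => e2 => 'e4');

# Fat commas and regular commas mixed in the same list.
my @mixed = (1, 2 => 3, 'four');

# Left-hand sides that are not strings: an array reference and an
# arithmetic expression.
my @pairs = ([1, 2] => 'first', 2 + 2 => 'second');

# Anywhere a list is expected, here in the arguments of join.
my $joined = join '-' => @move;

print scalar(@move), " elements, joined as $joined\n";
```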

Most of these points are excellently illustrated in a collection of examples designed in 2017 by Sinan Ünür. Here is an excerpt from his answer to a StackOverflow question asking when to use fat comma if not for hashes:

Any time you want to automatically quote a bareword to the left of a fat comma:
system ls => '-lh';
or
my $x = [ a => [ 1, 2 ], b => [ 3, 4 ] ];
Any time you think it makes the code easier to see
join ', ' => @data;
Any time you want to say "into":
bless { value => 5 } => $class;
In short, => is a comma, plain and simple. You can use it anywhere you can use a comma. E.g.:
my $z = (f($x) => g($y)); # invoke f($x) (for its side effects) and g($y)
                          # assign the result of g($y) to $z; the parentheses are
                          # needed because = binds more tightly than the comma

Fat commas for domain-specific languages

A number of CPAN modules took advantage of fat commas for designing domain-specific languages (DSLs), exploiting the fact that fat commas can be used liberally for other purposes than just expressing pairs.

Moose

Attribute declarations

Moose is the best-known object-oriented framework for Perl; it also influenced several competing frameworks¹. Here is a short excerpt from the synopsis, showing a class declaration:

package Point;
use Moose;

has 'x' => (isa => 'Int', is => 'rw', required => 1);
has 'y' => (isa => 'Int', is => 'rw', required => 1);

This is an example where the fat comma does not introduce a pair of values, but rather a longer list in which the first element (the attribute name) is deliberately emphasized. Technically this x attribute could have been declared as:

  has('x', 'isa', 'Int', 'is', 'rw', 'required', 1);

with exactly the same result, but much less readability. Observe that in addition to the fat comma, the recommended Moose syntax also takes advantage here of two other Perl features, namely:

the fact that a subroutine can be treated like a list operator², without parentheses around the arguments: so the call has 'x' => ... is technically equivalent to has('x' => ...).
the fact that a list within another list is flattened, so the parentheses in 'x' => (isa => 'Int', ...) are technically not necessary; they are present just for stylistic preference.

You may have noticed that the single quotes around the attribute name are technically unnecessary: the x attribute name could go unquoted in

  has x => (isa => 'Int', is => 'rw', required => 1);

Here again it's a matter of stylistic preference; in this context I suppose that the Moose authors wanted to emphasize the difference between the subroutine name has and the string x passed as first argument.
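To see that these spellings really hand over the same flat list, one can write a tiny stand-in for has. This is a sketch only: the subroutine below and the %ATTRIBUTES registry are hypothetical and have nothing to do with the way Moose is implemented, and the attribute names are invented.

```perl
use strict;
use warnings;

my %ATTRIBUTES;   # hypothetical registry: attribute name => options

# Hypothetical stand-in for a Moose-like 'has'. It is declared before its
# first use, so that it can be called without parentheses.
sub has {
    my ($name, %options) = @_;   # the inner parentheses were flattened away
    $ATTRIBUTES{$name} = \%options;
    return;
}

has 'width' => (isa => 'Int', is => 'rw', required => 1);  # recommended style
has height  =>  isa => 'Int', is => 'rw', required => 1;   # no quotes, no inner parentheses
has('depth', 'isa', 'Int', 'is', 'rw', 'required', 1);     # regular commas only

print join(', ', sort keys %ATTRIBUTES), "\n";
```

All three declarations store identical options; only the visual emphasis differs.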

Subtype declarations

Another domain-specific language in Moose is for declaring types. The cookbook has this example:

use Moose::Util::TypeConstraints;
use Locale::US;

my $STATES = Locale::US->new;

subtype 'USState'
    => as Str
    => where {
           (    exists $STATES->{code2state}{ uc($_) }
             || exists $STATES->{state2code}{ uc($_) } );
       };

Here again, fat commas and subroutine calls expressed as list operators were cleverly combined to form an expressive DSL for declaring Moose types.
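As an illustration of the mechanism, here is a minimal sketch of how such a DSL could be assembled from ordinary subroutines. The subroutines below are hypothetical re-creations written for this post, not the code of Moose::Util::TypeConstraints, and the %STATE_NAME table is sample data:

```perl
use strict;
use warnings;

my %TYPES;   # hypothetical registry: type name => { as => ..., where => ... }

# The (&) prototype lets a bare block be passed as a code reference.
sub where (&) { my ($check) = @_; return (where => $check) }

# Labels the parent type and passes the rest of the list through.
sub as { my ($parent, @rest) = @_; return (as => $parent, @rest) }

# Receives the flat list: name, 'as', parent, 'where', code reference.
sub subtype { my ($name, %spec) = @_; $TYPES{$name} = \%spec; return $name }

# Runs the stored check with $_ set to the candidate value.
sub is_valid {
    my ($name, $value) = @_;
    local $_ = $value;
    return $TYPES{$name}{where}->() ? 1 : 0;
}

my %STATE_NAME = (CA => 'California', NY => 'New York');   # sample data

subtype 'USState' => as Str => where { exists $STATE_NAME{ uc $_ } };

my $accepted = is_valid(USState => 'ca');
print "ca accepted: $accepted\n";
```

Each helper just returns a short list of pairs, and list flattening does the rest: by the time subtype runs, it sees one flat list that it can read as a hash.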

Mojo

Mojolicious is one of the major Web frameworks for Perl. It uses a domain-specific language for declaring the routes supported by the Web application; here are some excerpts from the documentation:

my $route = $r->get('/:foo');
my $route = $r->get('/:foo' => sub ($c) {...});
my $route = $r->get('/:foo' => sub ($c) {...} => 'name');
my $route = $r->get('/:foo' => {foo => 'bar'} => sub ($c) {...});
my $route = $r->get('/:foo' => [foo => qr/\w+/] => sub ($c) {...});
my $route = $r->get('/:foo' => (agent => qr/Firefox/) => sub ($c) {...});
...
my $route = $r->any(['GET', 'POST'] => '/:foo' => sub ($c) {...});

Through these many variants we see a flexible language for declaring routes, where fat commas are used to visually convey some idea of structure within the lists of arguments. Observe that the lines from the third one onward contain not pairs but lists of three or more elements, and that the last line has an arrayref (not a string!) to the left of the fat comma.
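On the receiving side, such a flexible argument list has to be taken apart somehow. One plausible approach is to look at the reference type of each argument; the sketch below assumes just that. The function describe_route and the keys of the hash it returns are hypothetical, this is not the actual Mojolicious code:

```perl
use strict;
use warnings;

# Hypothetical dispatcher: it sorts the arguments that follow the pattern
# according to their reference type.
sub describe_route {
    my ($pattern, @rest) = @_;
    my %route = (pattern => $pattern);
    for my $arg (@rest) {
        my $type = ref $arg;
        if    ($type eq 'CODE')  { $route{action}      = $arg }
        elsif ($type eq 'HASH')  { $route{defaults}    = $arg }
        elsif ($type eq 'ARRAY') { $route{constraints} = {@$arg} }
        else                     { $route{name}        = $arg }
    }
    return \%route;
}

my $route = describe_route('/:foo' => {foo => 'bar'} => sub { 'hello' } => 'greeting');

print join(', ', sort keys %$route), "\n";
```

Since the fat commas carry no meaning of their own, the function works the same whatever kind of comma the caller chooses; the arrows only guide the human reader.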

A word of caution about the quoting mechanism

Let's repeat the syntactic rule: if the left-hand side of the fat comma is a string that begins with a letter or underscore and is composed only of letters, digits and underscores, then that string doesn't need to be enclosed in quotes. We have seen numerous examples above that relied on this rule for more elegance and readability. One has to be careful, however, that builtin functions or user-defined subroutines could inadvertently be interpreted as strings instead of the intended subroutine calls. For example consider this snippet:

use constant foo => "tac";

sub build_hash {
 return {shift => 123, foo => 456, toe => 789};
}

my $h = build_hash('tic');

One could easily expect that the value of $h is {tic => 123, tac => 456, toe => 789} ... but actually the result is {foo => 456, shift => 123, toe => 789}, because both shift and foo were interpreted here as mere strings instead of subroutine calls. The ambiguity can be resolved easily, either by putting an empty argument list after the subroutine calls, or by enclosing them in parentheses:

sub build_hash {
 return {shift() => 123, foo() => 456, toe => 789};
 # or: return {(shift) => 123, (foo) => 456, toe => 789};
}

Some people would perhaps argue that the Perl interpreter should automatically detect that shift or foo are subroutine names ... but that would introduce too much fragility. The interpreter would then be dependent on the list of builtin Perl functions, and also be dependent on the list of symbols declared at that point in the code; future evolutions on either side could easily break the behaviour. So Perl's design, which blindly applies the syntactic rule formulated above, is much wiser.
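For readers who want to check the behaviour described in this section, here is a small self-contained script that puts both variants side by side. The names build_hash_naive and build_hash_fixed are mine, chosen only so that the two versions can coexist in one file:

```perl
use strict;
use warnings;
use constant foo => "tac";

# Barewords to the left of => are taken as strings: keys 'shift' and 'foo'.
sub build_hash_naive { return {shift => 123, foo => 456, toe => 789} }

# An explicit argument list forces the subroutine calls.
sub build_hash_fixed { return {shift() => 123, foo() => 456, toe => 789} }

my $naive = build_hash_naive('tic');
my $fixed = build_hash_fixed('tic');

print join(',', sort keys %$naive), "\n";   # foo,shift,toe
print join(',', sort keys %$fixed), "\n";   # tac,tic,toe
```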

Similar constructs in other languages

To my knowledge, no other programming language has a general-purpose comma operator comparable to Perl's fat comma. What is quite common, however, is to have specific syntax for hashes (or "objects" or "dictionaries" or "records", as they are called in other languages), and sometimes specific syntax for named parameters in subroutine calls or method calls. This section explores some of these directions.

JavaScript

JavaScript Objects

The equivalent of a Perl hash is called "object" in JavaScript; it is initialized as follows (example copied from the MDN documentation):

const obj = {
  property1:    value1, // property name may be an identifier
  2:            value2, // or a number
  "property n": value3, // or a string
};

Here the syntax is : instead of =>³. Like in Perl, any quoted string can be used as a property name, or a number, or an unquoted string if that string can be parsed as an identifier. What is not allowed, however, is to use a bare expression on the left-hand side: {(2+2): value} or {compute_name(): value} are syntax errors. Since ES2015 an expression can be used as a property name if it is enclosed in square brackets (a "computed property name"), as in {[2+2]: value1, [compute_name()]: value2}; another option, and the only one in older versions of the language, is to first create the object, and then assign properties to it:

const obj           = {};
obj[2+2]            = value1;
obj[compute_name()] = value2;

Named parameters

JavaScript has no direct support for passing named parameters to subroutines; however there is of course an indirect way, which is to pass an object to the function:

function show_user(u) {
  return `${u.firstname} ${u.lastname} has id ${u.id}`;
}
console.log(show_user({id: 123, firstname:"John", lastname:"Doe"}));

Recent versions of JavaScript have a more sophisticated way of exploiting the object received as parameter: rather than grabbing successive properties from the object, the receiving function could instead use object destructuring to extract the values into local lexical variables:

function show_user_v2({firstname, lastname, id}) {
  return `${firstname} ${lastname} has id ${id}`;
}
console.log(show_user_v2({id: 123, firstname:"John", lastname:"Doe"}));

This technique can go even further by supplying default values to the lexical variables - an advanced technique described in the MDN documentation.

Python

Dictionaries

In Python the closest equivalent of a Perl hash is called a "dictionary". Like in JavaScript, dictionaries are initialized with a list of keys and values separated by :, enclosed in curly braces:

point = {'x': 34, 'y': -1}

But unlike in JavaScript or Perl, keys on the left of the : separator are not quoted automatically: they are just ordinary expressions. This requires more typing from the programmer, but makes it possible to use operators or function calls, like in this example:

def double (x):
    return x * 2

obj = {
    'hello' + 'world': 11,
    234:               'foobar',
    double(3):         'doubled',
    }

print(obj) # prints : {'helloworld': 11, 234: 'foobar', 6: 'doubled'}

Keyword arguments

In Python, named parameters are called keyword arguments. The syntax is different from dictionary initializers: the symbol = is used to connect keywords to their values:

draw_line(x1=12, y1=-3, x2=55, y2=66)

Here the left-hand side does not need to be quoted; but it must obey the syntax rules for identifiers, which means for example that strings containing spaces are not eligible.

The construct of functions with keyword arguments is clearly different from the construct of dictionaries. They can be combined, however: a dictionary can be unpacked as a list of key-value pairs to be passed as arguments to a function.

points = {'x1':1, 'y1':2, 'x2':3, 'y2':4}
draw_line(**points)

But unlike in Perl or JavaScript, if the dictionary contains other keys than those expected by the function, a TypeError exception is raised ("got an unexpected keyword argument"), unless the function declares a catch-all **kwargs parameter. This is beneficial for defensive programming, where the interpreter exerts more control, but to the detriment of flexibility, because a dictionary received from an external source (for example a config file or an HTTP request) must be filtered before it can be unpacked and passed to the called function.
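For comparison, a Perl subroutine that wants the same strictness has to check the keys by itself. Here is a minimal sketch, assuming a hypothetical Perl counterpart of draw_line; the list of accepted keys and the error message are my own choices:

```perl
use strict;
use warnings;

# Hypothetical Perl counterpart of draw_line, with a hand-written check.
sub draw_line {
    my %args    = @_;
    my %allowed = map { $_ => 1 } qw(x1 y1 x2 y2);
    my @unknown = grep { !$allowed{$_} } sort keys %args;
    die "unexpected named argument(s): @unknown\n" if @unknown;
    return "($args{x1},$args{y1}) -> ($args{x2},$args{y2})";
}

my %points = (x1 => 1, y1 => 2, x2 => 3, y2 => 4);
my $line   = draw_line(%points);   # the hash is flattened into pairs

# An extra key is rejected only because draw_line checks for it explicitly.
my $ok    = eval { draw_line(%points, color => 'red'); 1 };
my $error = $@;

print "$line\n";
print "rejected: $error" unless $ok;
```

Without the grep line, the extra color pair would simply be ignored, which is the flexible but less defensive behaviour mentioned above.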

PHP

PHP uses the => notation for key-value pairs in associative arrays, like in Perl, but without the automatic quoting feature. Therefore keys must be enclosed in double quotes or single quotes, like in Python.

In addition, PHP (since version 7.4) also uses the same notation => for arrow functions, a compact form of anonymous functions, like in JavaScript, except that the fn keyword must also be present.

Here is an example where the two features are combined:

$array1 = ["foo" => "bar", 
           "fun" => fn($x) => fn($y) => $x+$y,
          ];

This is an associative array (like a Perl hash) where the key foo is associated to value "bar", and the key fun is associated with a function that returns another function. So beware when visually parsing a => in a PHP program!

Wrapping up

The Perl construct of fat commas is very simple, with coherent syntax and semantics, and applicable in a wide range of situations. It helps to write readable code by allowing the programmer to structure lists and emphasize some relations between values in the list. This capability is often used to design domain-specific sublanguages within Perl. A beautiful construct indeed!

About the cover picture

The picture shows the coupling mechanism on an old pipe organ. The French word for this is "accouplement", which in other contexts also means "mating"!

When the mechanism is activated, notes played on the lower keyboard also trigger the notes on the upper keyboard ... which bears some resemblance to the bindings in programming that were discussed in this article.

¹ Since v5.38 some object-oriented features are also implemented in Perl core; but CPAN object-oriented frameworks like Moose are still heavily used.
² See perlsyn: "Declaring a subroutine allows a subroutine name to be used as if it were a list operator from that point forward in the program".
³ The notation => is also present in JavaScript, but with a meaning totally different from Perl: it is used for arrow function expressions, a compact alternative to traditional function expressions.

Top comments (3)

Bernhard Schmalhofer • Mar 4

This is an excellent article. I'd just add that a fat comma can also be a trailing comma.

$ cat t.pl
use 5.024;
use strict;
use warnings;
use utf8;

use Data::Dx;

my @arr1 = (1,,,,);
my @arr2 = (2=> => => =>);
my @arr3 = (3=>=>=>=>);
Dx \@arr1, \@arr2, \@arr3;
$ perl t.pl
#line 12 t.pl
\@arr1, \@arr2, \@arr3 = [[1], [2], [3]]

Laurent Dami • Mar 4

aha, interesting, I never thought of that, thanks. But I can't imagine any good use case for exploiting this "feature" in a reasonable way.

Bernhard Schmalhofer • Mar 4 • Edited

Neither can I. I thought about a wall of code layout, but that is not nice at all.

$ cat t.pl
use v5.24;
use strict;
use warnings;

use Data::Dx;

my %roads = (
short =>=>=>=>=>=>=>=>=> 'alley',
very_long_and_winded => 'highway',
tough =>=>=>=>=>=>=>=>=> 'street',
);

Dx %roads;
$ perl t.pl
#line 13 t.pl
%roads = { short => "alley", tough => "street", very_long_and_winded => "highway" }