Laurent Dami

Posted on Mar 11

Beautiful Perl feature: two-sided constructs, in list or in scalar context

#perl #programming #beautifulperl

Beautiful Perl series

This post is part of the beautiful Perl features series.
See the introduction post for general explanations about the series.

Today's topic is about two-sided constructs that behave differently when used in list context or in scalar context: this is a feature unique to Perl, often disconcerting for people coming from other programming backgrounds, but very convenient once you are used to it.

The notion of context

Natural languages are full of ambiguities, as can be seen with the well-known sentences "Time flies like an arrow; fruit flies like a banana", where the same words are either nouns or verbs, depending on the context.

Programming languages cannot afford to have ambiguities because the source code must be translatable into machine instructions in a predictable way. Therefore most languages have unambiguous syntactic constructs that can be parsed bottom-up without much complication. Perl has a different approach: some core operators - and also some user-written subroutines - have two-sided meanings, without causing any ambiguity problem because the appropriate meaning is determined by the context in which these operators are used.

Technically Perl has three possible contexts: list context, scalar context and void context; but the void context is far less important than the other two and therefore will not be discussed here. An example of a list context is foreach my $val (1, 2, 3, 5, 8) {...}, where a list of values is expected within the parentheses; an example of a scalar context is if ($x < 10) {...} where a scalar boolean condition is expected within the parentheses. Some of the common Perl idioms that depend on context are:

construct	result in list context	result in scalar context
an array variable `@arr`	members of the array	number of array elements
the readline operator `<STDIN>`	list of all input lines	the next input line
the glob operator `<*.pl>`	list of all files matching the pattern	the next file that matches the pattern
regular expression with the `/g` flag for "global match"	all matched strings (or all captured groups, if the regex has capture groups)	boolean result of the next match attempt
the `localtime` function	list of numbers as seconds, minutes, hours, etc.	a string like "Fri Mar 6 23:00:12 2026"

These are just a few examples; read perlfunc and perlop for reference to many other context-sensitive constructs.
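
As a minimal sketch of the first and last rows of the table (the variable names are only illustrative), the same expression gives different results depending on whether it is assigned to an array or to a scalar:

```perl
use strict;
use warnings;

my @arr = (10, 20, 30);

my @members = @arr;          # list context: the members of the array
my $count   = @arr;          # scalar context: the number of elements

my @fields = localtime(0);   # list context: (sec, min, hour, mday, mon, year, wday, yday, isdst)
my $string = localtime(0);   # scalar context: a date string (depends on the local timezone)

print "members: @members\n";                         # members: 10 20 30
print "count: $count\n";                             # count: 3
print "localtime fields: ", scalar(@fields), "\n";   # localtime fields: 9
print "localtime string: $string\n";
```

The builtin scalar() function used in the third print statement forces scalar context at a place where Perl would otherwise supply list context.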

The advantage of having two different but related meanings for the same construct is that it reduces the number of constructs to learn. For example just remember that a regex with a /g flag is a global match, and Perl will "do the right thing" depending on where you use it; so given:

my $regex_p_word = qr( \b   # word boundary
                       p    # letter 'p'
                       \w+  # one or more word letters
                     )x;

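you can either collect all matches at once in list context (the first statement below), or iterate over them one at a time in scalar context (the while loop):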

my @words_starting_with_p = $text =~ /$regex_p_word/g;

while ($text =~ /$regex_p_word/g) {
  do_something_with($&);  # $& contains the matched string
}

Reducing the number of constructs is quite helpful in a rich language like Perl where the number of core functions and operators is large; but of course it requires that programmers are at ease with the notion of context. Perl textbooks put strong emphasis on this aspect of the Perl culture: for example the "Learning Perl" book starts the section on context by saying:

This is the most important section in this chapter. In fact, it’s the most important section in the entire book. In fact, it wouldn’t be an exaggeration to say that your entire career in using Perl will depend upon understanding this section.

Context-sensitive constructs also contribute to making the source code more concise and focused on what the programmer wanted to achieve, leaving aside the details; this is convenient when readers just need an overview of the code, for example when deciding whether to adopt a module or not, or when explaining an algorithm to a business analyst who doesn't know Perl (yes I did this repeatedly in my career, and it worked well - so don't tell me that Perl is not readable!).

This is not to say that the details can always be ignored; of course the people in charge of maintaining the code need to be aware of all the implications of context-sensitive operations.

Relationship between the list result and the scalar result

For every context-sensitive construct, the results in list context and in scalar context must somehow be related; otherwise it would be incomprehensible. But what would be a sensible relationship between the two contexts? Most Perl core constructs are built along one of these two patterns:

- the scalar result is a condensed version of the list result, like the @arr or localtime examples in the table above;
- the scalar result is an iterator on some implicit state, like the <STDIN> or glob examples in the same table.

When the scalar result is a condensed version, more detailed information may nevertheless be obtained by other means: for example, although a regular expression match in scalar context just returns a boolean result, various details about the match (the matched string, its position, etc.) can be retrieved through global variables.
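
The following sketch illustrates this on a made-up sample text: the match itself only yields true or false, and the details are then read from the special variables documented in perlvar ($& for the matched string, $1 for the first capture group, the @- and @+ arrays for the start and end offsets).

```perl
use strict;
use warnings;

my $text = "peter piper picked a peck of pickled peppers";   # sample text

my ($matched, $group, $start, $end);
if ($text =~ /\bp(\w+)/) {      # boolean (scalar) context: true or false
  $matched = $&;                # the whole matched string
  $group   = $1;                # the first capture group
  $start   = $-[0];             # offset where the match starts
  $end     = $+[0];             # offset just after the match
}

print "matched '$matched' (group '$group') at offsets $start..$end\n";
# matched 'peter' (group 'eter') at offsets 0..5
```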

When the scalar result is an iterator, it is meant to be called several times, yielding a different result at each call. Depending on the iterator, a special value is returned at the end to indicate to the caller that the iteration is finished (usually undef). This concept is quite similar to Python's generator functions or JavaScript's function* construct, except that each of the Perl core operators is specialized for one particular job (iterating on lines in a file, or on files in a directory, or on occurrences of a regex in some text). Such iterators are particularly useful for processing large data, because they operate lazily, one item at a time, without loading the whole data into memory.
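
Here is a small sketch of the iterator pattern with the readline operator. To keep it self-contained it reads from an in-memory filehandle (opened on a reference to a string) instead of STDIN; the data and variable names are just assumptions for the example.

```perl
use strict;
use warnings;

my $data = "first line\nsecond line\nthird line\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @seen;
while (defined(my $line = <$fh>)) {   # scalar context: one line per call
  chomp $line;
  push @seen, $line;
}
my $after_end = <$fh>;                # the iteration is finished: undef
close $fh;

print scalar(@seen), " lines read one at a time\n";   # 3 lines read one at a time
print "after the end: ", (defined($after_end) ? "defined" : "undef"), "\n";
# after the end: undef
```

In list context, my @all = <$fh> would instead have returned all the remaining lines in one go.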

As an aside, let us note that unlike Python or JavaScript, Perl does not have a builtin construct for general-purpose iterators; but this is not really needed because iterators can be constructed through Perl's closures, as beautifully explained in the book Higher-Order Perl - quite an ancient book, but essential and still perfectly valid. There are also several CPAN modules that apply these techniques for easier creation of custom iterators; Iterator::Simple is my preferred one.
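
To give an idea of the closure technique, here is a minimal hand-written sketch; make_counter is a hypothetical helper invented for this example, not something taken from Higher-Order Perl or from Iterator::Simple.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an iterator over the integers $from .. $to
sub make_counter {
  my ($from, $to) = @_;
  my $next = $from;            # private state captured by the closure
  return sub {
    return if $next > $to;     # exhausted: yields undef in scalar context
    return $next++;
  };
}

my $iter = make_counter(1, 3);
my @got;
while (defined(my $n = $iter->())) {
  push @got, $n;
}
print "@got\n";   # 1 2 3
```

Each call to make_counter creates a fresh $next variable, so several iterators can run independently of one another.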

I said that the two patterns just discussed cover most core constructs ... but there is an exception: the range operator .., like the documentation says, is "really two different operators depending on the context", so the meanings in list context and in scalar context are not related to one another. This will be discussed in more detail in a future article.

Writing your own context-sensitive subroutines or methods

Context-sensitive operations are not limited to core constructs: any subroutine can invoke wantarray¹ to know in which context it is called so that it can adapt its behaviour. But this is only necessary in some very specific situations; otherwise Perl will perform an implicit conversion which in most cases is perfectly appropriate and requires no intervention from the programmer - this will be described in the next section.

In my own modules the places where I used wantarray were for returning condensed information:

- in DBIx::DataModel, statement objects have an sql method that in list context returns ($sql, @bind), i.e. the generated SQL followed by the bind values. Here the default Perl conversion to scalar context would return the last bind value, which is of no use to the caller, so the method explicitly returns just $sql when called in scalar context;
- in Search::Tokenizer, the tokenizer called in list context returns a tuple ($term, length($term), $start, $end, $term_index). When called in scalar context, it just returns the $term.
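
A minimal sketch of this pattern could look as follows. The sql_and_bind function and its hard-coded values are hypothetical: they only mimic the behaviour described for the sql method and are not the actual DBIx::DataModel code.

```perl
use strict;
use warnings;

# Hypothetical function, loosely modelled on the sql method described above
sub sql_and_bind {
  my $sql  = "SELECT * FROM t WHERE a = ? AND b = ?";
  my @bind = (1, 2);
  return wantarray ? ($sql, @bind)   # list context: SQL followed by bind values
                   : $sql;           # scalar context: just the SQL
}

my ($sql, @bind) = sql_and_bind();   # called in list context
my $only_sql     = sql_and_bind();   # called in scalar context

print "list:   $sql / @bind\n";
print "scalar: $only_sql\n";
```

Without the wantarray test, the scalar call would receive the last bind value (here 2) instead of the SQL string.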

Implicit conversions

When an expression is not context-sensitive, Perl may perform an implicit conversion to make the result fit the context.

Scalar value in list context

If a scalar result is used in list context, the obvious conversion is to make it a singleton list:

my @array1 = "foo"; # converted to ("foo")

If the scalar is undef or an empty string, this will still be a singleton list, not the same thing as an empty list: so in

my @array2 = undef; # converted to (undef)
my @array3;         # initialized to ()

@array2 is true in boolean context because it contains one element (even though that element is undef), while @array3 contains no element and therefore is false.

List value in scalar context

If a list value is used in scalar context, the initial members of the list are thrown away, and the context gets the last value:

my $scalar = (3, 2, 1, 0); # converted to 0

This behaviour is consistent with the comma operator inherited from C.

An array variable is not the same thing as a list value. An array is of course treated as a list when used in list context, but in scalar context it just returns the size of the array (an integer value). So in

use feature 'say';  # enables the say function
my @countdown    = (3, 2, 1, 0);
my $should_start = @countdown ? "yes" : "no";
say $should_start;  # says "yes"

the array holds 4 members and therefore is true in scalar context; by contrast the mere list has value 0 in scalar context and therefore is false:

$should_start = (3, 2, 1, 0) ? "yes" : "no";
say $should_start;  # says "no"

Programming languages without context-sensitive constructs

Since context-sensitivity is a specialty of Perl, how do other programming languages handle similar situations? Simply by providing differentiated methods for each context! Let us look for example at the "global matching" use case, namely getting either a list of all occurrences of a regular expression in a big piece of text, or iterating over those occurrences one at a time.

Global match in JavaScript

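In Perl a global match of shape $text =~ /$regex/g involves a string and a regex that are put together through the binding operator =~. In JavaScript, since there is no binding operator, regex matches are performed through method calls, which can go either way: some methods belong to the string and take a regex as argument, others belong to the regex and take a string as argument: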

the String class has methods:
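- match(), taking a regex as argument, returning an array of all matches when the regex has the /g flag (without that flag, only the first match is returned, together with its capture groups);
- matchAll(), taking a regex as argument (which must have the /g flag), returning an iterator;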
- search(), taking a regex as argument, returning the character index of the first match (and therefore ignoring the /g flag);
the RegExp class has methods:
- exec(), taking a string as argument, returning a "result array" that contains the matched string, substrings corresponding to capture groups, and positional information. When the regex has the /g flag for global match, the exec() method can be called repeatedly, iterating over the successive matches;
- test(), taking a string as argument, returning a boolean result.

The MDN documentation has a good guide on regular expressions in JavaScript. The purpose here is not to study these methods in detail, but merely to compare with the Perl API: in JavaScript the operations have explicit method names, but they are more numerous. The fact that method names are English words does not dispense the programmer from reading the documentation, because it cannot be guessed from the method names that match() returns an array and matchAll() returns an iterator!

Global match in Python

Regular expressions in Python do not belong to the core language, but are implemented through the re module in the standard library. Matching operations are performed by calling functions in that module, passing a regex and a string as arguments, plus possibly some other parameters. Functions re.search(), re.match() and re.fullmatch() are variants for performing a single match; for global match, which is the subject of our comparison, there is no /g flag, but there are specific functions:

re.findall(), taking a regex, a string and possibly some flags as arguments, returning a list of strings (or a list of tuples when the regex has several capture groups);
re.finditer(), also taking a regex, a string and possibly some flags as arguments, returning an iterator yielding Match objects.

Conclusion

Thanks to context-sensitive operations, Perl expressions are often very concise and nevertheless convey to the hasty reader an overview of what is going on. Detailed comprehension of course requires an investment in understanding the notion of context, how it is transmitted from caller to callee, and how the callee can decide to give different responses according to the context. Newcomers to Perl may think that the learning effort is greater than in other programming languages ... but we have seen that in the absence of context-sensitive operations, the complexity goes elsewhere, in a greater number of methods or subroutines for handling all the variant situations. So context-sensitivity is definitely a beautiful feature of Perl!

About the cover picture

This is a side-by-side view of the Victoria-Hall, the main concert hall in Geneva, where the stage holds either a full symphonic orchestra, or just a solo recital. Same place, different contexts!

¹ As stated in the official documentation, wantarray is ill-named and should really be called wantlist.
