Mark Gardner

Posted on Jul 20, 2021 • Originally published at phoenixtrap.com on Jul 20, 2021

The four noisy horsemen of Perl hate

#perl #regex #hate

I publish Perl stories on this blog once a week, and it seems every time there’s at least one response on social media that amounts to, “I hate Perl because of its weird syntax.” Or, “It looks like line noise.” (Perl seems to have outlasted that one—when’s the last time you used an acoustic modem?) Or the quote attributed to Keith Bostic: “The only language that looks the same before and after RSA encryption.”

So let’s address, confront, and demystify this hate. What are these objectionable syntactical, noisy, possibly encrypted bits? And why does Perl have them?

Regular expressions

Regular expressions, or regexps, are not unique to Perl. JavaScript has them. Java has them. Python has them as well as another module that adds even more features. It’s hard to find a language that doesn’t have them, either natively or through the use of a library. It’s common to want to search text using some kind of pattern, and regexps provide a fairly standardized if terse mini-language for doing so. There’s even a C‑based library called PCRE, or “Perl Compatible Regular Expressions,” enabling many other pieces of software to embed a regexp engine that’s inspired by (though not quite compatible with) Perl’s syntax.

Being itself inspired by Unix tools like grep, sed, and awk, Perl incorporated regular expressions into the language as few other languages have, with the binding operators =~ and !~ enabling easy matching and substitution against expressions, and pre-compilation of regexps (with the qr// operator) into their own type of value. Perl then added the ability to space out regexps with whitespace and comments to improve readability (the /x modifier), use different delimiters to avoid the leaning-toothpick syndrome of escaping slash (/) characters with backslashes (\), and name your capture groups and backreferences when substituting or extracting strings.

All this is to say that Perl regular expressions can be some of the most readable and robust when used to their full potential. Early on this helped cement Perl’s reputation as a text-processing powerhouse, though the core of regexps’ succinct syntax can result in difficult-to-read code. Such inscrutable examples can be found in any language that implements regular expressions; at least Perl offers the enhancements mentioned above.
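As a minimal sketch of those enhancements working together, the following uses the /x modifier, brace delimiters, and named capture groups in one pre-compiled pattern. The sample request line and the names $request_re and %request are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical request line, invented for this illustration.
my $line = 'GET /articles/2021/07/four-horsemen HTTP/1.1';

# qr{...}x: braces as delimiters avoid escaping the slashes in the path,
# and /x lets us add whitespace and comments inside the pattern.
my $request_re = qr{
    ^ (?<method> [A-Z]+ )        # HTTP method
    \s+
    (?<path> /\S* )              # request path with unescaped slashes
    \s+
    HTTP/ (?<version> [\d.]+ )   # protocol version
    $
}x;

my %request;
if ( $line =~ $request_re ) {
    %request = %+;    # copy the named captures out of the %+ hash
}

print "$request{method} $request{path} (HTTP $request{version})\n";
```

The same pattern crammed onto one line with slash delimiters would need a backslash in front of every / and would have no room for comments.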

Sigils

Perl has three built-in data types that enable you to build all other data structures no matter how complex. Its variable names are always preceded by a sigil, which is just a fancy term for a symbol or punctuation mark.

A scalar contains a string of characters, a number, or a reference to something, and is preceded with a $ (dollar sign).
An array is an ordered list of scalars beginning with an element numbered 0 and is preceded with a @ (at sign).
A hash, or associative array, is an unordered collection of scalars indexed by string keys and is preceded with a % (percent sign).

So variable names $look @like %this. Individual elements of arrays or hashes are scalars, so they $look[0] $like{'this'}. (That’s the first element of the @look array counting from zero, and the element in the %like hash with a key of 'this'.)
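Here is a small sketch of the three sigils and of element access. The variable contents are made up; only the names @look and %like come from the examples above:

```perl
use strict;
use warnings;

# Hypothetical values for illustration.
my $look = 'a scalar';                      # $ : one value
my @look = ( 'zero', 'one', 'two' );        # @ : ordered list
my %like = ( 'this' => 1, 'that' => 2 );    # % : key-value pairs

# One element of an array or hash is a single value, so it takes a $.
print "$look[0]\n";         # first element of @look
print "$like{'this'}\n";    # value stored under the key 'this'

# $look and @look are separate variables that happen to share a name.
print "$look\n";
```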

Perl also has a concept of slices, or selected parts of an array or hash. A slice of an array looks like @this[1, 2, 3], and a slice of a hash looks like @that{'one', 'two', 'three'}. You could write it out long-hand like ($this[1], $this[2], $this[3]) and ($that{'one'}, $that{'two'}, $that{'three'}), but slices are much easier. Plus you can even specify one or more ranges of elements with the .. operator, so @this[0 .. 9] would give you the first ten elements of @this, or @this[0 .. 4, 6 .. 9] would give you nine with the one at index 5 missing. Handy, that.
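To make the slice syntax concrete, here is a short sketch with made-up data. The contents of @this and %that are assumptions chosen so that every slice has something to return:

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration.
my @this = ( 0 .. 11 );    # twelve elements, so a ten-element slice fits
my %that = ( one => 1, two => 2, three => 3, four => 4 );

my @by_index = @this[ 1, 2, 3 ];                    # array slice
my @longhand = ( $this[1], $this[2], $this[3] );    # same thing, long-hand
my @by_key   = @that{ 'one', 'two', 'three' };      # hash slice: note the @
my @first10  = @this[ 0 .. 9 ];                     # one range of indexes
my @skip5    = @this[ 0 .. 4, 6 .. 9 ];             # two ranges, index 5 omitted

print "@by_index\n";
print "@by_key\n";
print scalar(@first10), " and ", scalar(@skip5), " elements\n";
```

Note that a hash slice starts with @ rather than %, because what comes back is a list of values.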

In other words, the sigil always tells you what you’re going to get. If it’s a single scalar value, it’s preceded with a $; if it’s a list of values, it’s preceded with a @; and if it’s a hash of key-value pairs, it’s preceded with a %. You never have to be confused about the contents of a variable because the name will tell you what’s inside.

Data structures, anonymous values, and dereferencing

I mentioned earlier that you can build complex data structures from Perl’s three built-in data types. Constructing them without a lot of intermediate variables requires you to use things like:

lists, denoted between ( parentheses )
anonymous arrays, denoted between [ square brackets ]
and anonymous hashes, denoted between { curly braces }.

Given these tools you could build, say, a scalar referencing an array of street addresses, each address being an anonymous hash:

$addresses = [
  { 'name' => 'John Doe',
    'address' => '123 Any Street',
    'city' => 'Anytown',
    'state' => 'TX',
  },
  { 'name' => 'Mary Smith',
    'address' => '100 Other Avenue',
    'city' => 'Whateverville',
    'state' => 'PA',
  },
];

(The => is just a way to show correspondence between a hash key and its value, and is just a funny way to write a comma (,). And like some other programming languages, it’s OK to have trailing commas in a list as we do for the 'state' entries above; it makes it easier to add more entries later.)
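A quick sketch of that equivalence, using two hypothetical hashes built from the same data. One detail beyond the text above: => also quotes a simple bareword on its left, so the quotes around keys like name are optional.

```perl
use strict;
use warnings;

# Both lists build the same hash. The first uses => (with unquoted keys
# and a trailing comma); the second uses plain commas throughout.
my %with_arrows = ( name => 'John Doe', state => 'TX', );
my %with_commas = ( 'name', 'John Doe', 'state', 'TX' );

print "same\n"
    if $with_arrows{name} eq $with_commas{name}
    and $with_arrows{state} eq $with_commas{state};
```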

Although I’ve nicely spaced out my example above, you can imagine a less sociable developer might cram everything together without any spaces or newlines. Further, to extract a specific value from this structure this same person might write the following, making you count dollar signs one after another while reading right-to-left then left-to-right:

say $$addresses[1]{'name'};

We don’t have to do that, though; we can use arrows that look like -> to dereference our array and hash elements:

say $addresses->[1]->{'name'};

We can even use postfix dereferencing, which is just a fancy way of saying “always reading left to right,” to pull a slice out of this structure:

say for $addresses->[1]->@{'name', 'city'};

Which prints out:

Mary Smith
Whateverville

Like I said above, the sigil always tells you what you’re going to get. In this case, we got:

a sliced list of values with the keys 'name' and 'city' out of…
an anonymous hash that was itself the second element (counting from zero, so index of 1) referenced in…
an anonymous array which was itself referenced by…
the scalar named $addresses.
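To compare the notations side by side, here is a sketch using a trimmed copy of the structure above. It assumes Perl 5.24 or later, where postfix dereferencing is available without enabling an experimental feature; the @prefix and @postfix names are just for illustration:

```perl
use strict;
use warnings;
use feature 'say';

# A trimmed copy of the $addresses structure from the article.
my $addresses = [
    { name => 'John Doe',   city => 'Anytown' },
    { name => 'Mary Smith', city => 'Whateverville' },
];

# Three spellings of the same lookup.
say $$addresses[1]{'name'};       # prefix dereference
say $addresses->[1]->{'name'};    # arrows between every step
say $addresses->[1]{'name'};      # the arrow is optional between subscripts

# Two spellings of the same hash slice.
my @prefix  = @{ $addresses->[1] }{ 'name', 'city' };    # read inside-out
my @postfix = $addresses->[1]->@{ 'name', 'city' };      # read left to right
say "@postfix" if "@prefix" eq "@postfix";
```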

That’s a mouthful, but complicated data structures often are. That’s why Perl provides a Data Structures Cookbook as the perldsc documentation page, a references tutorial as the perlreftut page, and finally a detailed guide to references and nested data structures as the perlref page.

Special variables

Perl was also inspired by Unix command shell languages like the Bourne shell (sh) or Bourne-again shell (bash), so it has many special variable names using punctuation. There’s @_ for the array of arguments passed to a subroutine, $$ for the process number the current program is using in the operating system, and so on. Some of these are so common in Perl programs they are written without commentary, but for the others there is always the English module, enabling you to substitute in friendly (or at least more awk-like) names.

With use English; at the top of your program, you can say:

$LIST_SEPARATOR instead of $"
$PROCESS_ID or $PID instead of $$
the @{^CAPTURE} array (built in since Perl 5.26, and available even without use English) instead of the numbered regular expression capture variables like $1, $2, and $3
et cetera.

All of these predefined variables, punctuation and English names alike, are documented on the perlvar documentation page.
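As a small sketch of the English names in use, the following sets the list separator and checks the process ID. The @colors array and the $joined variable are hypothetical, chosen only to give the separator something to do:

```perl
use strict;
use warnings;
use English;

my @colors = ( 'red', 'green', 'blue' );

# $LIST_SEPARATOR is $": the string placed between array elements
# when an array is interpolated into a double-quoted string.
my $joined = do {
    local $LIST_SEPARATOR = ', ';    # restored when the block ends
    "@colors";
};
print "$joined\n";

# $PROCESS_ID and $PID are both aliases for $$.
print "same process\n" if $PROCESS_ID == $$ and $PID == $$;
```

The punctuation forms keep working alongside the English ones, so a program can switch over gradually.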

The choice to use punctuation variables or their English equivalents is up to the developer. Some are more familiar with the punctuation variety and assume their readers understand it too. Other, less friendly developers engage in “code golf,” attempting to express their programs in as few keystrokes as possible.

To combat these and other unsociable tendencies, the perlstyle documentation page admonishes, “Perl is designed to give you several ways to do anything, so consider picking the most readable one.” Developers can (and should) also use the perlcritic tool and its included policies to encourage best practices, such as prohibiting all but a few common punctuation variables.

Conclusion: Do you still hate Perl?

There are only two kinds of languages: the ones people complain about and the ones nobody uses.

Bjarne Stroustrup, designer of the C++ programming language

It’s easy to hate what you don’t understand. I hope that reading this article has helped you decipher some of Perl’s “noisy” quirks as well as its features for increased readability. Let me know in the comments if you’re having trouble grasping any other aspects of the language or its ecosystem, and I’ll do my best to address them in future posts.
