DEV Community

Nicholas Hubbard
Nicholas Hubbard

Posted on

An Interesting Perl Pattern That Doesn't Work

I recently came up with a pattern that is supposed to use a closure to protect a configuration hash from being mutated by its callers. Unfortunately this pattern has a terrible flaw.

{
    my %config;

    sub config {
        return %config if %config;
        %config = create_config();
        return %config;
    }
}
Enter fullscreen mode Exit fullscreen mode

The config() subroutine is a lexical closure over the %config hash. No other code in a program would be able to access the %config variable, as everything is defined in its own block.

As an aside, this pattern could just as easily be written with a state variable, but I find it harder to explain the pattern when done this way.

Blocks

To understand this pattern we first have to understand how blocks work. Here is a code example that shows how blocks work:

{
    my $var = 'foo';
    print "from inside the block where \$var is defined \$var = $var\n";

    {
        print "from the most inner block \$var = $var\n";
    }
}

print "from outside the block \$var = $var\n";

__END__

$ perl tmp.pl
from inside the block where $var is defined $var = foo
from the most inner block $var = foo
from outside the block $var = 
Enter fullscreen mode Exit fullscreen mode

This code shows that a block introduces a lexical scope. Variables defined in a lexical scope are only available in its defining scope, and scopes nested inside their defining scope. The program output shows that the $var variable is available in the scope it is defined in, and from the scope nested in its defining scope. However, outside of its defining scope, $var is not defined.

If we turn on strict we get a fatal compilation error for trying to use $var from outside its defining scope:

Global symbol "$var" requires explicit package name (did you forget to declare "my $var"?) at tmp.pl line 14.
Execution of tmp.pl aborted due to compilation errors.
Enter fullscreen mode Exit fullscreen mode

You may be wondering if the config() subroutine is accessible from outside the block it was defined in. The answer is that it is indeed available, because subroutine declarations are always global to the current package. This code example shows this fact:

use strict;

{
    sub foo {
        print "hello from foo!\n";
    }
}

foo();

__END__

$ perl tmp.pl
hello from foo!
Enter fullscreen mode Exit fullscreen mode

Closures

Now that we understand blocks we can understand closures. In Perl, a closure is a subroutine that has access to the lexical environment that it was defined in. Here is the classic example:

use strict;

{
    my $n = 0;        

    sub increment {
        $n += 1;
        return $n;
    }

    sub decrement {
        $n -= 1;
        return $n;
    }
}

print 'increment() -> ', increment(), "\n";
print 'increment() -> ', increment(), "\n";
print 'decrement() -> ', decrement(), "\n";
print 'increment() -> ', increment(), "\n";
print 'increment() -> ', increment(), "\n";
print 'decrement() -> ', decrement(), "\n";

__END__

$ perl tmp.pl
increment() -> 1
increment() -> 2
decrement() -> 1
increment() -> 2
increment() -> 3
decrement() -> 2
Enter fullscreen mode Exit fullscreen mode

The increment() and decrement() subroutines are both able to access the $n variable, though no other subroutines in a larger program would be able to, due to the outer block. For this reason the increment() and decrement() subroutines are closures over $n;

The Problem

We should now have all the knowledge needed to understand the pattern that this article is about.

The idea of the pattern is that if the %config variable has already been set then we just return it, and otherwise we set its value before returning it. This means that %config will only be set the first time that we call config(), and on all subsequent calls it will simply be returned. Therefore config() can be thought of as a function constant ... right?

Here is a code example where our pattern works as expected:

use strict;

{
    my %config;

    sub config {
        return %config if %config;
        %config = create_config();
        return %config;
    }
}

sub create_config {
    print "hello from create_config()\n";
    return (foo => 12);
}

my %config1 = config();

$config1{foo} = 1004;

my %config2 = config();

print "%config2's foo key = $config2{foo}\n";

__END__

$ perl tmp.pl
hello from create_config()
%config2's foo key = 12
Enter fullscreen mode Exit fullscreen mode

This output displays a couple of important points. First, we know that the create_config() subroutine was only invoked a single time, even though we invoked config() twice. We know this because the "hello from create_config()" message is only printed a single time. The other important thing to note is that because we got the output "%config2's foo key = 12", we know that our modification of %config1's foo key (which we set to 1004), did not effect the %config variable that our config() subroutine closes over. If it had, then %config2's foo key would associate to 1004.

So what is the problem? Well ... everything falls apart when the %config variable is set to a multi-dimensional data structure. The following code encapsulates the fundamental problem with our pattern:

use strict;

{
    my %config;

    sub config {
        return %config if %config;
        %config = create_config();
        return %config;
    }
}

sub create_config {
    return (foo => [1, 2, 3]);
}

my %config1 = config();

$config1{foo}->[0] = 1004;

my %config2 = config();

print "%config2's foo key = [", join(', ', @{$config2{foo}}), "]\n";

__END__

$ perl tmp.pl
%config2's foo key = [1004, 2, 3]
Enter fullscreen mode Exit fullscreen mode

Uh oh! We were able to mutate the %config variable that config() closes over, which means that config() is not actually a constant function. Now we come to the fundamental problem of our pattern. Because in Perl multi-dimensional data structures are made up of references, and perl does not perform deep-copying by default, we are able to mutate the underlying references of multi-dimensional data structures.

Here is a code example that shows that Perl does not perform deep-copying:

use strict;

my @array1 = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);

print '@array1 contents:', "\n";
for my $elem (@array1) {
    print "    $elem\n";
}

# copy @array1 to @array2
my @array2 = @array1;

print '@array2 contents:', "\n";
for my $elem (@array2) {
       print "    $elem\n";
}

__END__

$ perl tmp.pl
@array1 contents:
    ARRAY(0x13875e8)
    ARRAY(0x13d1ef8)
    ARRAY(0x13d1fe8)
@array2 contents:
    ARRAY(0x13875e8)
    ARRAY(0x13d1ef8)
    ARRAY(0x13d1fe8)
Enter fullscreen mode Exit fullscreen mode

In this programs output we can see that @array1 and @array2 contain the exact same references, which means that Perl does not perform deep-copying. If Perl did perform deep-copying, then when we copied @array1 into @array2, Perl would have made (recursive) copies of all the references in @array1 into new refererences. Perl's lack of deep-copying is the fundamental flaw of our pattern, as it means that we can modify %config's references from its copies that are returned by config().

Solutions

There are many ways we can solve this problem. First, we could use lock_hash_recurse from the core Hash::Util module to lock %config. After locking %config, we would get an error if we tried to mutate any of its values.

We could also use Const::Fast from CPAN to make %config an actual read-only hash. Similarly to locking the hash, we would get an error if we tried to mutate %config.

Finally, we could use Clone from CPAN to return a deep-copy of %config from the config() subroutine. Unlike the other solutions, our code could freely modify copies of %config without getting any errors, but these modifications would not affect the actual %config.

Top comments (3)

Collapse
 
davorg profile image
Dave Cross

is supposed to use a closure to return a read-only configuration

Where have you seen this pattern described like that? There's nothing in that code that even hints at the returned hash being read-only.

Collapse
 
nicholasbhubbard profile image
Nicholas Hubbard • Edited

The fact that %config is returned by value instead of by reference is what hints that %config may not be able to be mutated. Perhaps it wasn't clear, but the point was not for config() to return a read-only value, but for %config to be un-mutatable by config()'s callers. I updated the articles opening paragraph to better communicate the patterns intent.

Collapse
 
rlauer6 profile image
Rob Lauer • Edited

but the point was not for config() to return a read-only value, but for %config to be un-mutatable by config()'s callers.

Even if that were the case you did not alter the hash element, you altered an array element pointed to by the hash. You could have made the hash and the array read only by using Readonly. If you are looking to create ro elements before you know about them, consider using Class::Accessor which has methods that would have allowed you to make things ro at run-time. Moreover, if this pattern was for providing a dynamic ro configuration variable facility I can't help but think there might be more than one CPAN module that already provides that in a standard, efficient manner. ;-)

p.s. I use the closure method typically when I want to cache elements, not protect them from destruction.