DEV Community

Perl 7: A Modest Proposal

Dan Book on February 08, 2021

My previous two blog posts (Perl 7: A Risk-Benefit Analysis and Perl 7 By Default) explored the reasons that a Perl 7 with incompatible interpreter...
 
Felipe Gasper • Edited

utf8, of course, does more than merely “declare” to Perl that the source is UTF-8; it also makes Perl decode() strings. Thus, simple one-liners like this:

perl -M7 -E'print "épée"'

… will print mojibake, thusly:

> perl -Mutf8 -e'print "épée"'
�p�e

If utf8 is to be on by default, should we not preserve the functionality of such simple one-liners? That would entail making STDOUT automatically encode to UTF-8. And then if STDOUT is UTF-8, should STDIN be?

My experience, FWIW, has been that utf8 makes sense only if you care about the strings as text. I, at least, have only been in that scenario a few times. Generally when multi-byte UTF-8 characters come across code I’ve written, I don’t care about the characters themselves; I’m just doing I/O.
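To make the behaviour under discussion concrete, here is a minimal sketch of what happens to a non-ASCII literal with and without the utf8 pragma. It assumes the script file is saved as UTF-8 and uses an in-memory filehandle as a stand-in for a STDOUT that has no encoding layer; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: this file is saved as UTF-8.
my $bytes = "épée";                    # no utf8 pragma in effect: 6 undecoded bytes
my $chars = do { use utf8; "épée" };   # utf8 pragma in effect: 4 decoded characters

# Print the decoded string to a handle with no encoding layer.
# An in-memory handle stands in for STDOUT so the bytes can be inspected.
open my $fh, '>', \my $raw or die "open failed: $!";
print {$fh} $chars;
close $fh;

printf "undecoded literal length: %d\n", length $bytes;   # 6
printf "decoded literal length:   %d\n", length $chars;   # 4

# The handle received Latin-1 bytes, not UTF-8: E9 70 E9 65
printf "bytes written: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $raw;

# Encoding explicitly recovers the original UTF-8 bytes.
print "explicit encode round-trips\n" if encode('UTF-8', $chars) eq $bytes;
```

A UTF-8 terminal cannot interpret the lone E9 bytes, which is why the one-liner shows replacement characters unless the output is encoded, either explicitly or via a layer such as -CS.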

Dan Book • Edited

If a user does use v7 or -M7 under this proposal, part of what they have opted into is for their source code to be decoded to characters. Changing the behavior of the global STDIN and STDOUT handles in a reasonable way is unfortunately impossible, but you can already do that with -CS if you accept the consequences.

My one-liners often use ojo, which already enables the utf8 pragma, and it has operated as expected. Data that flows from STDIN to STDOUT would be unchanged by this, though you already needed to use -CS or appropriate decoding and encoding if you want to operate on it as text. Unfortunately there is no way around learning how and when character encoding occurs if you want to interact with text as bytes.

Felipe Gasper • Edited

The proposal here, though, defines what someone opts into.

All the other pieces of your proposal seem, at least from my own vantage point, to be “easy wins”. Auto-decode without an auto-encode, though, seems ripe for subtle misuse. If Perl could somehow mark the PVs as decoded, and always trigger a warning or error on output, I’d be less concerned.

Forgive my ignorance, but why would enabling -CS by default be any less feasible than use utf8 by default?

Dan Book • Edited

I don't see how use utf8 is analogous to auto-encoding on STDOUT - the opposite is of course auto-decoding from STDIN. use utf8 is instead a lexical declaration of how the source code shall be interpreted.

The problem with -CS and any other application of layers to STDIN/STDOUT/STDERR is that the handles and any layers applied to them are global. So for example, it will cause Mojo::Log's encoded output to STDERR to be double-encoded. (This experiment was attempted in Perl 5.8.0 and failed miserably.)
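The double-encoding problem can be sketched as follows. This is only an illustration under stated assumptions: an in-memory handle with an :encoding(UTF-8) layer stands in for a STDERR that had an encoding layer applied globally, and the already-encoded string stands in for what a logger that encodes its own output would print. It is not Mojo::Log's actual code.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text    = "\x{e9}p\x{e9}e";          # "épée" as 4 characters
my $encoded = encode('UTF-8', $text);    # 6 bytes, as a self-encoding logger would emit

# Stand-in for a global handle that already has an encoding layer applied.
open my $fh, '>:encoding(UTF-8)', \my $out or die "open failed: $!";
print {$fh} $encoded;                    # the layer encodes the bytes a second time
close $fh;

printf "encoded once:  %d bytes\n", length $encoded;   # 6
printf "encoded twice: %d bytes\n", length $out;       # 10
```

Each of the four high bytes in the once-encoded string is treated as a separate character and expanded to two bytes by the layer, so the output no longer decodes back to the original text in a single step.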

Felipe Gasper

If there were a variant of use utf8 that didn’t auto-decode strings in the source, I’d be much less concerned. But the issue I see with defaulting use utf8 to on is that it would break any code like this:

perl -e'print "épée"'

In fact, it’ll even break things like this:

my $text = utf8_decode("épée");
_send_to_dbus($text);

Ostensibly the goal of Perl 7 would be to define a set of defaults that only break “undesirable” practices. Changing the value of hard-coded strings in the source code seems likely to break a lot of things and thus deter people from using the new set of defaults.

Dan Book

"A variant of use utf8 that didn't auto-decode strings in the source" would be a no-op - that is the only thing use utf8 does.

I appreciate your opinion though I believe it would be more helpful to new code than harmful. The purpose of use v7 is of course not to blindly apply to existing code - as proposed, it will also break any code defining subroutine prototypes, for example.

Felipe Gasper

Prototypes have been “gently discouraged” for some time, though, AFAIK. More so, I think, than writing new Perl without use utf8.

use utf8 seems the most disruptive of the changes you propose: disruptive insofar as developers themselves would need to exercise especial care when writing new code or porting existing code. use v7 defined with use utf8 would be problematic where I work, for example, where strings are understood by default to be undecoded/binary/encoded. Whereas enabling strict/warnings/signatures will generate “loud”, easily-fixed breakages, breakages from auto-decode of strings in source seem likely to be subtler.

Anyhow … the appreciation of opinions is mutual. :) We’ll see what comes. Thanks!

Sebastian Riedel

Agreed, those are very sensible goals.

Felipe Gasper

Also, does use utf8 slow Perl down by storing strings internally as upgraded?

To get the length() of an upgraded string, Perl has to parse the individual characters. But the length() of a downgraded string is just its SvCUR.

Dan Book

Operating on Unicode is of course always slower. But only non-ASCII strings are stored upgraded by use utf8. So the performance impact is necessary to get the correct length of such strings. (It's also cached in MG_LEN after the first access.)
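The upgraded/downgraded distinction in this exchange can be observed from Perl itself. The sketch below uses explicit escapes and utf8::upgrade rather than the utf8 pragma, so it only demonstrates the two internal representations and says nothing about timing.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without changing string semantics

my $down = "\x{e9}p\x{e9}e";   # "épée", stored downgraded (one byte per character)
my $up   = $down;
utf8::upgrade($up);            # same characters, stored internally as UTF-8

printf "downgraded: flag=%d chars=%d internal bytes=%d\n",
    (utf8::is_utf8($down) ? 1 : 0), length $down, bytes::length($down);   # 0, 4, 4
printf "upgraded:   flag=%d chars=%d internal bytes=%d\n",
    (utf8::is_utf8($up) ? 1 : 0), length $up, bytes::length($up);         # 1, 4, 6

# The two strings are equal as far as Perl-level semantics are concerned.
print "strings compare equal\n" if $down eq $up;
```

For the upgraded copy, length() has to count characters rather than read the stored byte count, which is the cost being asked about.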

Mitch McCracken

add async+await and we are in business :)

Toby Inkster

For one-liners, it would be good if -E automatically enabled all the "positive" v7 features by default. -e would still be backwards-compatible.

Dan Book • Edited

-E already has the behavior of enabling the feature bundle of the current Perl version. It does not enable strict, and so I would suggest it should not enable warnings or utf8 either (as mentioned in the post, -M7 can be used to apply whatever use v7 may end up entailing).

kjetillll

Perl 7 is a good idea. I use List::Util or List::MoreUtils all the time and was thinking Perl 7 could have at least some of those as built-ins: sum, min, max, uniq, zip and then some. Maybe even File::Slurp, since handling file content all at once has, for most files, become a lot cheaper since Perl started.
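For reference, this is roughly how some of those functions are used today via an explicit import. The sketch assumes a List::Util recent enough to provide uniq (1.45 or later); the sample data is made up.

```perl
use strict;
use warnings;
use List::Util qw(sum min max uniq);

my @numbers = (3, 1, 4, 1, 5, 9, 2, 6);   # made-up sample data

my $total  = sum(@numbers);    # 31
my $lowest = min(@numbers);    # 1
my $highest = max(@numbers);   # 9
my @distinct = uniq(@numbers); # duplicates removed, first-seen order kept

print "sum=$total min=$lowest max=$highest distinct=@distinct\n";
```

Making these built-ins would remove only the import line; the calls themselves would look the same.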

Peter Roberts

Sounds great to me!

sigzero

Those are great suggestions. I hope they are all adopted for Perl 7. Thanks for the write-up.

Tib

Great article, as usual.