DEV Community

Discussion on: Perl, Unicode, and Bytes

Felipe Gasper

Decode it so others can continue to pretend like Perl has one type of string.

This isn’t a pretense, though; it’s the literal truth. What defines a Perl string is its sequence of code points. Nothing more.
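A minimal sketch of what I mean, using only the built-in utf8:: functions. The variable names are just for illustration. Two strings with the same code points compare equal and have the same length, even when Perl stores them differently internally:

```perl
use strict;
use warnings;

# Same single code point (U+00E9) in both strings.
my $latin1 = "\xe9";
my $wide   = "\xe9";

# Change only the internal storage format; the code points stay the same.
utf8::upgrade($wide);

print "equal\n" if $latin1 eq $wide;
printf "lengths: %d and %d\n", length($latin1), length($wide);

# The internal flag differs, but that is an implementation detail,
# not part of what the string "is".
printf "internal flag: %d and %d\n",
    (utf8::is_utf8($latin1) ? 1 : 0),
    (utf8::is_utf8($wide)   ? 1 : 0);
```

The internal flag is the only thing that differs here, and it says nothing about whether the string has been decoded.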

And there is no reason to ever not use utf8 in your source files.

Source-decode by default makes some sense. I would personally rather it be deferred, though, until Perl can tell whether a string is decoded or not. There’s enough Perl out there already that screws this stuff up; changing recommended defaults without providing any additional “guard rails” seems likely to confuse.
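To illustrate the missing guard rail, here is a small sketch using the core Encode module (variable names are hypothetical). Perl has no way to know that a string is still undecoded, so it will happily encode it again without complaint:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes = "\xc3\xa9";               # UTF-8 bytes for U+00E9, not yet decoded
my $text  = decode('UTF-8', $bytes);  # decoded: one code point, U+00E9

printf "undecoded length: %d\n", length $bytes;   # two code points (C3, A9)
printf "decoded length:   %d\n", length $text;    # one code point

# Mistake: encoding the undecoded string. Perl treats the two bytes as
# code points U+00C3 and U+00A9 and encodes each one, with no warning.
my $double = encode('UTF-8', $bytes);
printf "double-encoded length: %d\n", length $double;
```

Nothing in the language flags the last step as wrong; keeping track of which strings are decoded is entirely up to the programmer.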

As I related in a thread on a recent article Dan wrote proposing that use utf8 be part of use v7, I’m also a bit worried about STDIN, pipes, and the like still defaulting to undecoded when the source code auto-decodes. If we’re going to source-decode, I’d rather we go the extra mile and make inputs/outputs default to UTF-8, or maybe ape Node.js and require that an encoding be specified in order to create a filehandle.
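A rough sketch of the mismatch, using an in-memory filehandle as a stand-in for STDIN or a pipe (that stand-in, and the variable names, are my assumptions for the sake of a self-contained example). The same input yields different strings depending on whether a decoding layer is applied:

```perl
use strict;
use warnings;

# UTF-8 bytes for "café" plus a newline, as they would arrive on a pipe.
my $input = "caf\xc3\xa9\n";

# No decoding layer: each byte becomes one code point.
open my $raw, '<:raw', \$input or die "open: $!";
my $undecoded = <$raw>;
close $raw;

# Explicit decoding layer: the bytes are decoded to code points on read.
open my $dec, '<:encoding(UTF-8)', \$input or die "open: $!";
my $decoded = <$dec>;
close $dec;

chomp($undecoded, $decoded);
printf "without layer: %d code points\n", length $undecoded;
printf "with layer:    %d code points\n", length $decoded;

# For the real standard handles, the usual opt-ins today are:
#   binmode(STDIN, ':encoding(UTF-8)');
#   use open qw(:std :encoding(UTF-8));
```

With decoded source literals but undecoded input, comparing a literal "café" against the first string would fail, which is the kind of confusion I’d like the defaults to rule out.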