DEV Community

Discussion on: 100 Languages Speedrun: Episode 47: Raku (Perl 6) Regular Expressions

Tomasz Wegrzanowski

Yes, it is a massive bug. It causes a lot of programs to match a lot more than they expect, very likely including a lot of security validations. Everyone, including the people who wrote those docs, assumes \d matches ASCII digits only, and that is what you need for basically any parsing of either machine formats or human text.

It is exceedingly rare to want to match <:Nd> (I doubt anyone ever actually used that), and if you absolutely need to, well, you can say <:Nd>, or more likely some more specific range.

It won't even do for extracting numbers from natural language text, as the most common other numeral systems (Roman and Chinese numerals) don't match <:Nd>, since they reuse letters.
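For anyone who wants to check the categories independently of Raku, here is a rough sketch in Python, which ships the same Unicode general categories in its `unicodedata` module. The sample characters are my own picks, not from the article, and Python's `re` is used only as an analogy: its `\d` on `str` patterns is also Unicode-aware, so it shows the same kind of over-matching.

```python
import re
import unicodedata

# Illustrative sample characters (an assumption of this sketch):
samples = [
    "7",       # ASCII digit seven
    "\u0667",  # ARABIC-INDIC DIGIT SEVEN
    "\u096d",  # DEVANAGARI DIGIT SEVEN
    "X",       # Latin capital letter, as used when typing Roman numerals
    "\u4e03",  # CJK ideograph for seven
]

def describe(ch):
    """Return (general category, matches \\d, matches [0-9]) for one character."""
    return (
        unicodedata.category(ch),
        re.fullmatch(r"\d", ch) is not None,     # Unicode-aware digit class
        re.fullmatch(r"[0-9]", ch) is not None,  # explicit ASCII range
    )

report = {ch: describe(ch) for ch in samples}

for ch, (cat, unicode_digit, ascii_digit) in report.items():
    print(f"U+{ord(ch):04X} category={cat} \\d={unicode_digit} [0-9]={ascii_digit}")
```

The non-ASCII digits come out as Nd and match `\d` but not `[0-9]`, while the Latin letter and the Chinese numeral are letter categories and match neither. In Raku the explicit ASCII class would be spelled `<[0..9]>`.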

Juan Julián Merelo Guervós

They don't really reuse letter codepoints; they use a different codepoint in Unicode. They match <:N> alright, and also <:Nl>:

raku -e 'say "Ⅻ " ~~ /<:Nl>/'
「Ⅻ」
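The same thing can be cross-checked outside Raku. This is a small sketch in Python using `unicodedata`, assuming only that it reads the same Unicode character database; the dictionary keys are names made up for this example.

```python
import unicodedata

roman_twelve = "\u216b"  # the dedicated Roman numeral twelve code point

info = {
    "name": unicodedata.name(roman_twelve),
    "category": unicodedata.category(roman_twelve),   # Nl = letter number
    "value": unicodedata.numeric(roman_twelve),
    # Compatibility normalization folds it back to plain Latin letters
    "nfkc": unicodedata.normalize("NFKC", roman_twelve),
    "latin_x_category": unicodedata.category("X"),    # ordinary capital letter
}

for key, value in info.items():
    print(f"{key}: {value}")
```

So the dedicated code point is category Nl with numeric value 12, while a numeral typed as the Latin letters XII is just three Lu letters and matches neither <:Nl> nor <:N>.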
Tomasz Wegrzanowski

Nice one, I didn't know they had separate characters for Roman numerals in Unicode. I don't think they're actually used in the wild much, but still, nice.