DEV Community

Sven

Posted on • Originally published at sven-dittmer.netlify.com

TIL: Match unicode letters in regex

Note: This post was originally posted on sven-dittmer.netlify.com.

Today I faced an interesting challenge on exercism.io.
To me, the most interesting part was finding out whether a sentence was
yelled or not. A sentence counts as yelled if all its words are upper case.

You can find the whole challenge here: Bob in elixir - exercism.io

Here's my first attempt:

String.upcase(input) == input

This one passes some tests, but it fails for sentences that don't contain any words, like "1, 2, 3". Since there's no upper case word, it shouldn't count as yelled.
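
The false positive is easy to reproduce. Here is a minimal sketch that can be pasted into iex; the names `no_letters` and `first_attempt` are made up for this illustration:

```elixir
# Digits and punctuation have no upper-case form, so String.upcase/1
# returns such input unchanged and the comparison is trivially true.
no_letters = "1, 2, 3"
first_attempt = fn input -> String.upcase(input) == input end

IO.inspect(first_attempt.(no_letters))    # true, although nothing was yelled
IO.inspect(first_attempt.("WATCH OUT!"))  # true, correctly
IO.inspect(first_attempt.("Watch out!"))  # false, correctly
```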
Next try:

input =~ ~r/\w/u && String.upcase(input) == input

With this regex, I wanted to make sure the input contains at least one letter. And since I'm still checking that every letter is upper case, that should do the job, right? Well, not quite.

The exercise's last test was about a sentence yelled in Russian. (Link:
Bob in elixir test suite)
Turns out \w is not a letters-only class. By default it is equivalent to [a-zA-Z0-9_], so it covers only letters of the latin alphabet and also accepts digits and the underscore. To make the regex work, I'd have to match any unicode letter and nothing else. After googling a little bit, I found the solution here: Regular-Expressions.info. With PCRE we can match different categories of unicode characters with \p{<some_unicode_category>}. For letters, the category is called Letter. Its short form \p{L} is the safe choice, since not every regex engine accepts the long name \p{Letter}. Here's the final code that makes the last test pass:

input =~ ~r/[\p{L}]/u && String.upcase(input) == input
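
Wrapped into a function, the check could look like the sketch below. The module name `Yelling` and the function `yelled?/1` are made up for this example and are not part of the exercise's required interface; the sample strings are only illustrations as well.

```elixir
defmodule Yelling do
  # Hypothetical helper, not part of the exercism skeleton.
  # An input counts as yelled when it contains at least one Unicode
  # letter and upcasing it changes nothing.
  def yelled?(input) do
    String.match?(input, ~r/\p{L}/u) and String.upcase(input) == input
  end
end

IO.inspect(Yelling.yelled?("WATCH OUT!"))  # true
IO.inspect(Yelling.yelled?("УХОДИ"))       # true, Cyrillic letters match \p{L}
IO.inspect(Yelling.yelled?("1, 2, 3"))     # false, no letter at all
IO.inspect(Yelling.yelled?("Watch out!"))  # false, not all upper case

# \w is broader than \p{L}: it also accepts digits and the underscore.
IO.inspect("1, 2, 3" =~ ~r/\w/u)           # true
IO.inspect("1, 2, 3" =~ ~r/\p{L}/u)        # false
```

The square brackets in the original pattern are optional: a character class that contains only \p{L} matches the same characters as \p{L} on its own.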

This is my first post on dev.to. Feedback is much appreciated!
