DEV Community

Discussion on: Quick and easy way of counting UTF-8 characters in Javascript

Gal Dolber

Hi Alexandru,

Nice post, I recently had to deal with this.
There are some cases where destructuring won't work, for example with Hebrew punctuation (vowel points and other combining marks).

▶ 9
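As a rough sketch of where that 9 comes from, assuming the post's approach is spreading the string into an array of code points (the variable names here are only for illustration):

```javascript
// The Hebrew word from this comment, written with escapes so every code point is explicit.
const word = "\u05D5\u05B0\u05D0\u05B5\u0597\u05DC\u05B6\u05BC\u05D4"; // וְאֵ֗לֶּה

// Spreading a string iterates by code point, so each combining mark is counted on its own.
const codePointCount = [...word].length;

console.log(codePointCount); // 9
```

The word reads as four letters, but every vowel point and accent is its own code point.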

My original code is in ClojureScript:

In JavaScript:
"וְאֵ֗לֶּה".split(/(\P{Mark}\p{Mark}*)/u).filter((val) => val)
▶ ["וְ", "אֵ֗", "לֶּ", "ה"]
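If only the count is needed, the same pattern can be wrapped in a small helper. This is a sketch, not a complete solution: `countVisibleCharacters` is a hypothetical name, it uses `match` with the `g` flag in place of `split` plus `filter`, and it only accounts for combining marks (other multi-code-point sequences, such as emoji joined with a zero-width joiner, are not handled).

```javascript
// Counts "base character plus any following combining marks" as one unit.
// Marks at the very start of the string, with no base character, are ignored.
function countVisibleCharacters(text) {
  const clusters = text.match(/\P{Mark}\p{Mark}*/gu);
  return clusters === null ? 0 : clusters.length;
}

// Same Hebrew word as above, written with escapes.
const sample = "\u05D5\u05B0\u05D0\u05B5\u0597\u05DC\u05B6\u05BC\u05D4"; // וְאֵ֗לֶּה

console.log(countVisibleCharacters(sample)); // 4
console.log(countVisibleCharacters("abc"));  // 3
console.log(countVisibleCharacters(""));     // 0
```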

Alexandru Bucur Author

Hi Gal,

That's really interesting. Any idea why that might be the case?

Gal Dolber

I think it's because the punctuation symbols are separate Unicode characters that are rendered collapsed onto the preceding non-Mark character.

Example: ד ָ דָ
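To see this in code, here is a small sketch that lists the code points of that pointed letter. It assumes the example is the letter dalet (U+05D3) followed by the vowel point qamats (U+05B8); the names `pointedLetter` and `breakdown` are made up for the illustration.

```javascript
// Dalet followed by the combining vowel point qamats: דָ (assumed code points)
const pointedLetter = "\u05D3\u05B8";

// For each code point, record its U+XXXX label and whether it is a combining mark.
const breakdown = [...pointedLetter].map((char) => ({
  codePoint: "U+" + char.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  isMark: /^\p{Mark}$/u.test(char),
}));

console.log(breakdown);
// [ { codePoint: 'U+05D3', isMark: false }, { codePoint: 'U+05B8', isMark: true } ]
```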

So if you want to count the visible characters, you need to account for the marks.