DEV Community

Discussion on: Quick and easy way of counting UTF-8 characters in Javascript

Gal Dolber

Hi Alexandru,

Nice post, I recently had to deal with this.
There are some cases where the spread approach won't work, for example with punctuation (combining marks such as Hebrew vowel points).

[..."וְאֵ֗לֶּה"].length
▶ 9

My original code is in ClojureScript:
gist.github.com/galdolber/1568e767...

In JavaScript:
"וְאֵ֗לֶּה".split(/(\P{Mark}\p{Mark}*)/u).filter((val) => val)
▶ ["וְ", "אֵ֗", "לֶּ", "ה"]
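If only the count is needed, the same idea can be wrapped in a small helper. This is a minimal sketch, not the code from the gist: the names `splitByMarks` and `countVisibleChars` are made up for illustration, and it uses `match` instead of `split` plus `filter`. The test word is written with escapes and is modeled on the example above (four letters carrying five marks).

```javascript
// Hypothetical helpers; the names are assumptions for illustration.

// A cluster is one non-Mark code point plus the Mark code points that follow it.
// The second alternative keeps stray marks that have no base character before them.
const CLUSTER = /\P{Mark}\p{Mark}*|\p{Mark}+/gu;

function splitByMarks(text) {
  // match() returns null when nothing matches (e.g. an empty string), so fall back to [].
  return text.match(CLUSTER) ?? [];
}

function countVisibleChars(text) {
  return splitByMarks(text).length;
}

// Four Hebrew letters carrying five combining marks, written with escapes.
const word = "\u05D5\u05B0\u05D0\u05B5\u0597\u05DC\u05B6\u05BC\u05D4";

console.log([...word].length);        // 9 (code points)
console.log(countVisibleChars(word)); // 4 (letters together with their marks)
```

Note that this only groups characters in the Unicode Mark category with their base, so things like emoji joined with zero-width joiners would still be counted as several characters.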

Alexandru Bucur Author

Hi Gal,

That's really interesting. Any idea why that might be the case?

Gal Dolber

I think it's because the punctuation symbols are separate Unicode characters (category Mark) that get rendered together with the nearest preceding non-Mark character.

Example: ד ָ דָ

So if you want to count the visible characters, you need to account for the marks.
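As an alternative to the regex, newer runtimes ship `Intl.Segmenter`, which splits text into grapheme clusters and so handles marks as well. This is a different technique from the one above, shown only as a rough sketch; `countGraphemes` is a hypothetical name, and it assumes an environment where `Intl.Segmenter` is available.

```javascript
// Hypothetical helper; assumes Intl.Segmenter exists in the runtime.
function countGraphemes(text, locale = undefined) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let count = 0;
  // segment() returns an iterable with one entry per grapheme cluster.
  for (const _segment of segmenter.segment(text)) {
    count++;
  }
  return count;
}

// The example above: the letter dalet (U+05D3) followed by the mark qamats (U+05B8).
const daletWithQamats = "\u05D3\u05B8";

console.log(daletWithQamats.length);          // 2 (UTF-16 code units)
console.log(countGraphemes(daletWithQamats)); // 1 (one visible character)
```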