In Swift, a Character is an extended grapheme cluster, which consists of one or more Unicode scalar values. It's what a reader of a string will perceive as a single character. And a String consists of zero or more Characters.
This is, I think, the compromise that comes closest to making sense. Check out the examples at grapheme-splitter -- I think the resulting graphemes align closely with the intuitive definition of a "character". However, think about how you would access and manipulate these graphemes programmatically: one code point at a time (or even one byte at a time). There's a disconnect between the programmer's understanding of a character and the layperson's understanding of a character. What I'm arguing is that eliminating the term "character" should eliminate that ambiguity.
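To make that disconnect concrete, here is a minimal Python 3 sketch (Python rather than Swift, purely for illustration). It assumes a decomposed "é", written as "e" followed by a combining acute accent, as the sample text; the variable names are made up for this example.

```python
import unicodedata

# "é" written as "e" + COMBINING ACUTE ACCENT (U+0301):
# a reader perceives one character, the program sees more than one unit.
s = "e\u0301"

code_points = [f"U+{ord(c):04X}" for c in s]
utf8_bytes = s.encode("utf-8")

print(len(s))           # 2 (code points)
print(len(utf8_bytes))  # 3 (bytes)
print(code_points)      # ['U+0065', 'U+0301']

# Python slices by code point, so a slice can cut through the grapheme:
base_only = s[:1]       # "e", the accent is silently dropped

# For this particular pair, NFC normalization composes a single code point.
# That does not hold for every grapheme cluster (many emoji sequences, for
# example, have no precomposed form).
composed = unicodedata.normalize("NFC", s)
print(len(composed))    # 1
```

The layperson's answer to "how many characters?" is one; the programmer's answer depends on which unit is being counted.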
The API in Swift allows getting to a UTF-8 code unit, a UTF-16 code unit, or a UTF-32 code point, treating each as an index into an array of those sub-Character things (depending on what the developer is trying to do).
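As a rough analogue only, the same three levels can be sketched in Python 3. This is not Swift's API: in Swift these are the `utf8`, `utf16`, and `unicodeScalars` views on `String`, whereas the helper names below (`utf8_units`, `utf16_units`, `scalars`) are hypothetical and exist only for this example.

```python
import struct

def utf8_units(s: str) -> list[int]:
    """UTF-8 code units (bytes) of s."""
    return list(s.encode("utf-8"))

def utf16_units(s: str) -> list[int]:
    """UTF-16 code units of s (little-endian encoding, no BOM)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack(f"<{len(data) // 2}H", data))

def scalars(s: str) -> list[int]:
    """Unicode scalar values (code points) of s."""
    return [ord(c) for c in s]

# Thumbs-up (U+1F44D) followed by a skin-tone modifier (U+1F3FD):
# one perceived character, i.e. one extended grapheme cluster.
text = "\U0001F44D\U0001F3FD"

print(len(utf8_units(text)))                # 8
print([hex(u) for u in utf16_units(text)])  # ['0xd83d', '0xdc4d', '0xd83c', '0xdffd']
print([hex(c) for c in scalars(text)])      # ['0x1f44d', '0x1f3fd']
```

Each list can be indexed like an array of sub-Character units, which is the idea being described; which one is appropriate depends on what the code is trying to do.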
Swift and Python 3 both seem to have a good handle on Unicode strings.
Alas, I have to work with C++, which has somewhat underwhelming support for Unicode.