Just out of curiosity, if the goal is simply to compare two strings, would this still be necessary? I assume that it must be possible to represent any string, regardless of encoding, in some lower-level way. All we'd want to do is to make sure that the two strings are the same when compared in this way, like a list of "characters" or even just an array of arbitrary values...
Yes. Presumably you want to compare the text as a reader would see it, not byte for byte. Unicode defines rules for how that should be done. Because of combining characters, precomposed characters, ligatures, and some other features, the same visible text can be encoded in several different ways.
Interesting and a bit scary! I’d have thought it should be possible to force some kind of ordering to prevent that...
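That kind of forced ordering does exist: Unicode normalization rewrites a string into a canonical form, which includes putting combining marks into a fixed order. Here is a minimal sketch in Python using the standard `unicodedata` module. The helper name `same_text` and the sample strings are only for illustration, and it assumes both inputs are already decoded to `str`; it is not a full answer to locale-aware or case-insensitive comparison.

```python
import unicodedata

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Precomposed vs. combining: both render as "é"
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # "e" + COMBINING ACUTE ACCENT

# Same two combining marks (dot below, acute) attached in different orders
marks_a = "a\u0323\u0301"
marks_b = "a\u0301\u0323"

# Ligature "fi" (U+FB01) vs. the two plain letters
ligature = "\ufb01"

print(precomposed == combining)            # raw code point comparison
print(same_text(precomposed, combining))   # canonical comparison
print(same_text(marks_a, marks_b))         # mark order is canonicalized
print(same_text(ligature, "fi"))           # NFC keeps the ligature distinct
print(same_text(ligature, "fi", "NFKC"))   # compatibility form folds it
```

The canonical forms (NFC, NFD) handle precomposed characters and mark ordering. Ligatures are only folded by the compatibility forms (NFKC, NFKD), which discard some distinctions, so which form to use depends on what "equal" should mean for your data.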