A diacritical mark is a mark on a letter that conveys extra meaning, such as a change in pronunciation; an accent is one example. In Unicode, a letter carrying a diacritical mark can often be encoded in more than one way.
For example, the letter c with a cedilla (ç) can be encoded as a single character:
Character | UTF-8 encoding | Name |
---|---|---|
ç | 0xc3 0xa7 | LATIN SMALL LETTER C WITH CEDILLA |
or as two characters, a base letter followed by a combining mark:
Character | UTF-8 encoding | Name |
---|---|---|
c | 0x63 | LATIN SMALL LETTER C |
◌̧ | 0xcc 0xa7 | COMBINING CEDILLA |
The first is known as Normalization Form C (NFC), the canonically composed form. The second is Normalization Form D (NFD), the canonical decomposition of the first.
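A minimal sketch using Python's standard `unicodedata` module reproduces both tables and converts between the two forms. `utf8_hex` is a helper invented for this example, and `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata

def utf8_hex(s: str) -> str:
    """Format the UTF-8 bytes of s as space-separated hex, e.g. '0xc3 0xa7'."""
    return " ".join(f"0x{b:02x}" for b in s.encode("utf-8"))

nfc = "\u00e7"                            # ç as one precomposed code point
nfd = unicodedata.normalize("NFD", nfc)   # decomposes to 'c' + COMBINING CEDILLA

# Print the UTF-8 bytes and Unicode name of every code point in each form
for form in (nfc, nfd):
    for ch in form:
        print(utf8_hex(ch), unicodedata.name(ch))
    print("--")

print(nfc == nfd)                                # False: different code point sequences
print(unicodedata.normalize("NFC", nfd) == nfc)  # True: NFC recomposes the pair
print(unicodedata.is_normalized("NFD", nfd))     # True
```

The two strings render identically but compare unequal, which is why the same visible file name can exist in two different encodings.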
Something interesting happens when you save a file with a name in NFC: macOS will not open it.
For example:
```python
# Assuming your editor saved 'ç' as the single code point U+00E7, this name is NFC
filename = 'français.txt'
with open(filename, 'wt') as f:
    print('Bonjour!', file=f)
```
Try double-clicking this file in Finder. TextEdit appears in the Dock, but nothing else happens.
Now close TextEdit, delete the file, and change the file name to its canonical decomposition (NFD):
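```python
import os
import unicodedata

filename = 'français.txt'
# Convert the name to its canonical decomposition: 'ç' becomes 'c' + U+0327
filename = unicodedata.normalize('NFD', filename)

# Remove any earlier copy so the file is recreated under the NFD name
try:
    os.remove(filename)
except FileNotFoundError:
    pass

with open(filename, 'wt') as f:
    print('Bonjour!', file=f)
```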
TextEdit will open this file just fine.
If you're wondering why a bunch of your files stopped opening after upgrading to the latest macOS (13.3.1), this is why.
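To see which of your files might be affected, a sketch like the following walks a directory tree and lists every name that is not already in NFD. It assumes, based only on the behavior described in this post, that non-NFD names are the ones Finder fails to open; it only reports names and renames nothing. `non_nfd_names` is a hypothetical helper, and `unicodedata.is_normalized` requires Python 3.8+. Pure ASCII names are identical in every normalization form, so they are never reported.

```python
import os
import sys
import unicodedata

def non_nfd_names(root: str) -> list[str]:
    """Walk root and return paths whose final component is not in NFD.

    Assumption (from this post's observation): names that are not NFD
    are the ones Finder fails to open.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for entry in dirnames + filenames:
            # Names come back as the file system reports them
            if not unicodedata.is_normalized("NFD", entry):
                hits.append(os.path.join(dirpath, entry))
    return hits

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in non_nfd_names(root):
        print(path)
```

Saved under any name, for example `find_non_nfd.py`, it can be run as `python3 find_non_nfd.py ~/Documents`.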