Here is a discipline I am trying to adopt in my Python programs: use "My string".casefold() instead of "My string".lower() when comparing strings irrespective of case.
When checking two strings for equality without caring about uppercase vs. lowercase, it is tempting to do something like this:
if "StrinG".lower() == "string".lower():
    print("Case-insensitive equality!")
Of course, it works.
But some human languages have case rules with a bit more nuance. Say we have three strings, each spelling Kubernetes slightly differently in Greek (writing Kubernetes in Greek makes you sound doubly smart).
k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ" # Apologies to the scribes of Athens
All three are mixed-case strings. The first correctly ends with a lowercase final sigma (ς), the second ends with a capital sigma (Σ), and the last, oddly, ends with the non-final form (σ).
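The three sigmas are easy to confuse on screen. This short sketch uses the standard library's unicodedata module to print the code point and official Unicode name of each string's last character.

```python
import unicodedata

k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ"

# Show the final character of each spelling with its code point and Unicode name
for word in (k8s, k8S, k8s_odd):
    last = word[-1]
    print(f"{last}  U+{ord(last):04X}  {unicodedata.name(last)}")

# ς  U+03C2  GREEK SMALL LETTER FINAL SIGMA
# Σ  U+03A3  GREEK CAPITAL LETTER SIGMA
# σ  U+03C3  GREEK SMALL LETTER SIGMA
```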
Suppose we want to treat all three as equal. Would str.lower() work?
>>> k8s.lower()
'κυβερνήτης'
>>> k8S.lower()
'κυβερνήτης'
>>> k8s_odd.lower()
'κυβερνήτησ'
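Apparently not: the capital Σ is lowered to a final ς, but the stray σ is left as it is, so the third string doesn't match the other two.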
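One quick way to see the failure is to collect the lowercased forms in a set, as in this minimal sketch. If str.lower() were enough for this comparison, the set would hold a single entry.

```python
k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ"

# One entry in the set would mean all three spellings compare equal
lowered = {word.lower() for word in (k8s, k8S, k8s_odd)}
print(len(lowered))  # 2: the final-ς spelling and the odd σ spelling
```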
Using str.casefold() instead:
>>> k8s.casefold()
'κυβερνήτησ'
>>> k8S.casefold()
'κυβερνήτησ'
>>> k8s_odd.casefold()
'κυβερνήτησ'
All are equal! Case folding maps ς, Σ, and σ alike to σ, which is exactly what we want for case-insensitive string comparison.
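Wrapped in a small helper, the comparison might look like the sketch below. The name equals_ignore_case is hypothetical and not part of Python. The German ß pair is an extra illustration taken from the Python docs' description of casefold(): lower() leaves ß alone, while casefold() turns it into "ss".

```python
def equals_ignore_case(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings under Unicode case folding."""
    return a.casefold() == b.casefold()


k8s = "ΚυβερνΉτης"
k8S = "ΚυβερνΉτηΣ"
k8s_odd = "ΚυβερνΉτησ"

print(equals_ignore_case(k8s, k8S))              # True
print(equals_ignore_case(k8s, k8s_odd))          # True
print(equals_ignore_case("StrinG", "string"))    # True
print(equals_ignore_case("straße", "STRASSE"))   # True; lower() alone gives False
```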
Don't use str.casefold() when you need correctly spelled output, though: as shown above, it turns a word-final ς into σ. If you want a normalized spelling to display, str.upper().lower() may give a cleaner result:
>>> k8s_odd.upper().lower()
'κυβερνήτης'
But for case-insensitive comparison that respects a wide range of human languages, str.casefold() is our friend.
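The two approaches combine naturally: match on the casefolded form, and show people a cleaned-up spelling. Here is a minimal sketch, using a hypothetical group_spellings function, that groups matching words under a casefold() key and labels each group with upper().lower() of its first member.

```python
def group_spellings(words):
    """Hypothetical helper: group case-insensitive matches.

    casefold() is used only as the matching key; each group is labelled
    with upper().lower() of its first member, for display.
    """
    groups = {}  # casefolded key -> original spellings, in input order
    labels = {}  # casefolded key -> display label
    for word in words:
        key = word.casefold()
        groups.setdefault(key, []).append(word)
        labels.setdefault(key, word.upper().lower())
    return [(labels[key], members) for key, members in groups.items()]


words = ["ΚυβερνΉτης", "ΚυβερνΉτηΣ", "ΚυβερνΉτησ"]
print(group_spellings(words))
# [('κυβερνήτης', ['ΚυβερνΉτης', 'ΚυβερνΉτηΣ', 'ΚυβερνΉτησ'])]
```

Keeping the original strings in each group means nothing is lost; the casefolded key is only used for matching and is never shown.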
References
- Python docs on str.casefold
- The Unicode Standard, Section "3.13 Default Case Algorithms" on page 150 of chapter 3