This is a piece of Python code that works, and no linter will complain about it.
from collections.abc import Sequence

def average_length(strings: Sequence[str]) -> float:
    total_length = sum(len(string) for string in strings)
    return float(total_length) / len(strings)
print(average_length(["foo", "bar", "spam"]))
#> 3.3333333333333335
print(average_length("Hello World!"))
#> 1.0
It looks like "Hello World!" shouldn't be a valid input for the average_length function: the intent is obviously to calculate the average length of a sequence of strings like ["foo", "bar", "spam"]. So why does the second call succeed?
Well, a str instance is a sequence of strings as well: it has a length, it's iterable, and it can even be reversed (see https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes). So, from the interpreter's point of view, any string in Python is just a sequence of one-character strings.
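You can check this in the interpreter: a plain str passes the same checks that a list of strings would, and indexing it just yields more strings:

from collections.abc import Sequence

s = "Hello World!"
print(isinstance(s, Sequence))
#> True
print(s[0], type(s[0]))
#> H <class 'str'>
print(list(reversed(s)))
#> ['!', 'd', 'l', 'r', 'o', 'W', ' ', 'o', 'l', 'l', 'e', 'H']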
I'm not saying that Python is broken. It is the way it is, and it is not changing soon. However, what could be changed that would cause minimal harm but still make things a bit better? I suggest introducing a built-in single-character type, i.e. char. This would immediately make str a Sequence[char] rather than a Sequence[str]. Other than requiring the length to be exactly one, the char type could behave exactly like str.
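Until something like char exists, the closest approximation is an explicit runtime guard. This is only a workaround sketch for the example above, not part of the proposal itself:

from collections.abc import Sequence

def average_length(strings: Sequence[str]) -> float:
    # A bare str would type-check as Sequence[str], so reject it explicitly.
    if isinstance(strings, str):
        raise TypeError("expected a sequence of strings, not a single str")
    total_length = sum(len(string) for string in strings)
    return float(total_length) / len(strings)

print(average_length(["foo", "bar", "spam"]))
#> 3.3333333333333335
average_length("Hello World!")  # now raises TypeError instead of returning 1.0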
Also, functions like chr and ord could benefit from a more precise type annotation, using char instead of str.
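As a rough sketch of what that could look like, here is a NewType stand-in for the hypothetical char (my_chr and my_ord are illustrative wrappers, not the real built-ins, and NewType can't actually enforce the length-one constraint):

from typing import NewType

char = NewType("char", str)  # stand-in for the proposed built-in type

def my_chr(i: int) -> char:  # today: chr(i: int) -> str
    return char(chr(i))

def my_ord(c: char) -> int:  # today: ord(c: str) -> int
    return ord(c)

print(my_ord(my_chr(97)))
#> 97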
I think this would be a great addition to the language.