
Why I stay away from Python type annotations

Guillaume Pasquet · 6 min read

Ever since optional static typing was added in Python 3.5, the question of using type annotations keeps creeping back everywhere I work. Some see them as a step forward for the Future of Python™, but to me and many others they're a step back from what coding in Python fundamentally is. I've been in a number of debates over type annotations at work, so I decided to compile some of the recurring points of discussion here.

Static typing will protect you

This is the argument universally put forward for type annotations: they'll save us from ourselves. Someone already did a study of this idea for TypeScript, but I think looking at code will suffice.

Let's take some of the examples of type annotations you can easily find out there:

def concat(a: int, b: int) -> str:
    return str(a) + str(b)

Okay, so you've written a custom concat that only operates on integers. But does it really? Python's str() works with any object, not just int, so this function will in fact accept any two arguments that can be converted to strings. Here's how the function would need to be written for the typing to be enforced at runtime:

def concat(a: int, b: int) -> str:
    """Raises TypeError."""  # <- type annotations can't express this
    if not isinstance(a, int) or not isinstance(b, int):
        raise TypeError()
    return str(a) + str(b)

The parameter types aren't checked at runtime, so it is essential to check them yourself. This is particularly true if the code you're writing will be used as a library, and given Python's modular nature, any code can be imported and reused. Relying on type annotations alone therefore isn't enough to ensure that your code is safe and honours the contract the annotations describe.
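A quick demonstration: using the annotated concat from above, the interpreter happily accepts arguments that violate the annotations. A checker like mypy would flag the call, but nothing stops it from running.

```python
def concat(a: int, b: int) -> str:
    return str(a) + str(b)

# The annotations are not enforced at runtime: this call "works"
# even though the arguments are lists, not ints. mypy would complain,
# but the interpreter does not.
result = concat([1], [2])
print(result)  # [1][2]
```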

Readability

Another claim I see often is that type hints improve readability. Let's take a look.

def concat(a, b):
    ...

def concat(a: int, b: int) -> str:
    ...

Okay, at face value this is actually clearer, or at the very least the annotations don't hurt readability. Now let's look at real life.

def serialize(instance, filename, content, **kwargs):
    ...

def serialize(instance: Instance, filename: str, content: Optional[Dict[str, Any]] = None, **kwargs: Any) -> bool:
    ...

Now that's becoming hairy. Don't laugh, this is inspired by real code I see daily.

So we have a function that serializes god knows what; it takes an instance, a filename and some content. With the type-annotated version, we can tell that the instance is, confusingly, an Instance; the filename is a str; and content is a horrible optional mess that probably goes deeper still, which is why the author gave up and put Any. It returns a boolean, but we have no idea what that boolean means.

So in this case, the type hints just let us ask more questions, which could be a good thing. Let's be honest, though: this function wouldn't pass code review in either form.

Here's a slightly better one:

def serialize_foo_on_instance(instance, filename, content, **kwargs):
    ...

class Foo:
    data: Dict[str, Any] = {}
    ...

def serialize_foo_on_instance(instance: Instance, filename: str, content: Optional[Foo], **kwargs: Any) -> bool:
    ...

Okay that's slightly better. The secret sauce here was just to improve our naming to make the function's role more explicit -- a best practice.

Note that to get rid of the lengthy type annotation, we had to define a new class in the second version. This is the recommended approach I've found. However, there are times where adding abstraction layers isn't right: they divorce the code from the original data and carry a certain performance cost.

It's also possible to alias the type; but I still feel the typing is pushing me towards more abstraction.
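For illustration, here's what aliasing looks like, applied to the verbose signature from earlier. The Content name (and the trivial body, so the sketch runs) are mine, not from any real codebase:

```python
from typing import Any, Dict, Optional

# Hypothetical alias; the name "Content" is made up for this sketch.
Content = Optional[Dict[str, Any]]

def serialize(instance: Any, filename: str, content: Content = None, **kwargs: Any) -> bool:
    # Stand-in body so the sketch runs; the real function would write a file.
    return content is not None
```

The signature reads better, but the alias just moves the abstraction elsewhere: a reader still has to go and look up what Content actually is.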

Self-documenting code?

Let's have one more go to see if we can improve readability further:

def serialize_foo_on_instance(instance, filename, content, **kwargs):
    """
    Serializes foo on a specific instance of bar.
    Takes a foo data, serializes it and saves it as ``filename`` on
    an instance of bar.
    :instance: instance to serialize the foo on
    :filename: file name to serialize to
    :content: foo data, just creates the file if None
    :returns: True on success, False on error
    """
    ...

Okay, now we know what the function does, and what the parameters are supposed to be. Let's see how that looks with type annotations:

def serialize_foo_on_instance(instance: Bar, filename: str, content: Optional[Foo], **kwargs: Any) -> bool:
    """
    Serializes foo on a specific instance of bar.
    Takes a foo data, serializes it and saves it as ``filename`` on
    an instance of bar.
    :instance Bar: instance to serialize the foo on
    :filename str: file name to serialize to
    :content Optional[Foo]: foo data, just creates the file if None
    :returns bool: True on success, False on error
    """
    ...

Right, so we're a bit more verbose and have specified the types each parameter takes. We've introduced docstrings for both our definitions, and they explain what the function does, the role of the parameters, what happens to optional ones, and what the boolean return value means.

Could we do away with the docstring and rely solely on "self-documentation" through type annotations? Not a chance: -> bool says nothing about what it means to receive True or False, and likewise Optional[Foo] gives us no clue about what happens when the value is None.

Write generic code = reuse

Python is magnificent in how reusable it is. Every file you write is a module and can be reused for any purpose. Ages ago I wrote a software forge for Bazaar just by reusing modules from Bazaar itself, even though they were never intended to be used that way. This permeates the entire language, right down to function definitions.

By clamping down on types, are we making our code less reusable? Possibly, let's experiment. Let's assume that instance is an object obtained from a string ID, and that we'd really like to use some kind of string generator for filename. Let's have a look:

class FilenameGenerator:
    def __str__(self):
        return "blah.txt"

def serialize_foo_on_instance(instance, filename, content, **kwargs):
    if isinstance(instance, str):
        instance = Bar.by_name(instance)
    ...

filename_gen = FilenameGenerator()
serialize_foo_on_instance("bob", filename_gen, content)

Pretty straight-forward here. Now let's annotate this.

def serialize_foo_on_instance(instance: Union[Bar, str], filename: Union[str, ??], content: Foo, **kwargs: Any):
    if isinstance(instance, str):
        instance = Bar.by_name(instance)

Wow, that's already more involved. But is it even truly generic? In other languages we'd use interfaces or abstract types and inheritance to make functions generic. I couldn't find the type name for "any object that can be converted to a str", so I've put ?? for now.
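For the record, structural typing can name this: since Python 3.8, typing.Protocol lets us define something like "supports __str__". But since every object defines __str__, the constraint is nearly vacuous, and we've added yet another layer of abstraction just to say so. Here's a sketch; Bar and the function body are simplified stand-ins for the earlier examples:

```python
from typing import Any, Protocol, Union

class Bar:
    # Stand-in for the earlier example's Bar.
    @classmethod
    def by_name(cls, name: str) -> "Bar":
        return cls()

class SupportsStr(Protocol):
    """Structural type: anything with a __str__ method (i.e. everything)."""
    def __str__(self) -> str: ...

class FilenameGenerator:
    def __str__(self) -> str:
        return "blah.txt"

def serialize_foo_on_instance(instance: Union[Bar, str], filename: SupportsStr,
                              content: Any = None, **kwargs: Any) -> str:
    if isinstance(instance, str):
        instance = Bar.by_name(instance)
    # Stand-in body: just resolve the filename.
    return str(filename)
```

It type-checks, but the Protocol admits literally any object, which rather underlines the point: the annotation machinery is pushing us toward abstraction without actually constraining anything.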

Without type annotations, our code is generic right off the bat -- possibly overly so, which is why we need some pre-flight checks. With annotations, our code is "specific" by default and we have to work hard to make it generic. Note the quotes around "specific": this is only enforced by linting tools like mypy, so you still need your pre-flight checks. This is a fundamental shift in the nature of the language.

Python vs the world

A lot of developers like to claim that type hints are the best thing for the language since sliced bread. However I get the feeling those people only look at Python in a vacuum.

Python's selling points as a language are readability and ease of coding, which translate into speed for developers. Adding type annotations erodes those advantages, and Python becomes less attractive compared to the rest of the programming world.

Below is a fibonacci implementation in Python:

from typing import List

def fibonacci(previous: List[int], length: int, depth: int = 0) -> List[int]:
    if depth == length:
        return previous
    previous.append(previous[-1] + previous[-2])
    return fibonacci(previous, length, depth + 1)

if __name__ == "__main__":
    start: List[int] = [1, 2]
    print(fibonacci(start, 100))

And the same with Rust:

fn fibonacci(previous: &mut Vec<u128>, length: u32, depth: u32) -> &Vec<u128> {
    if depth == length {
        return previous;
    }
    previous.push(previous[previous.len() - 2] + previous[previous.len() - 1]);
    fibonacci(previous, length, depth + 1)
}

fn main() {
    let mut start = vec![1, 2];
    println!("sequence: {:?}", fibonacci(&mut start, 100, 0))
}

Here we see that the difference between Python and a lower-level, more powerful language has become minimal. Rust is much faster and lets me do more than Python, so the question becomes: why choose Python at all?

Please don't pick on my choice of Rust; this argument works just as well with Go, Java, C#, and so on.

Conclusion

From my perspective, type annotations provide little benefit at the cost of extra work. They can create a false sense of security that the parameter types a function receives are guaranteed, when no such check is performed at runtime.

There's also a false sense that type annotations provide documentation, but they never explain what a function does or how the data within those types is affected during a call. They're no substitute for good docstrings.

Given this, I prefer to not use them, and to keep clean, well documented code.

Posted on Jun 22 by Guillaume Pasquet (@etenil), an ex-PHP dev, now full-time Python dev. Writes Python & modules in Rust.
