Carles Julià

Posted on Jun 20, 2023

Improving the Python code quality of the team with pylint (and ChatGPT)

ℹ️ Note

This is the English version of a post originally written in Catalan.
It was translated using ChatGPT (GPT-4).

Modern Development in Python

Python is not what it used to be. For a while now, modern development with Python has involved a whole set of tools that help us improve code quality.

Linting projects such as pylint, and more recently ruff, have been analyzing code statically to find issues for some time.

More recently, mypy helps us verify that the code is correct through an explicit declaration of types.

And to top it off, pre-commit provides a way to automate these tools to run regularly.

All of these tools turn out to be essential in the day-to-day life of the development team.

The Problem

If you have experience with mypy, you have probably encountered situations where it does not behave as you expected. The temptation to add a # type: ignore comment, or to use typing.cast to make the error go away, is great. After all, if we don't satisfy mypy, we won't pass the pre-commit.


import typing

class A:
    def a(self) -> None:
        print('a')

class B(A):
    def b(self) -> None:
        print('b')

things: dict[str,A] = {'a':A(), 'b':B()}

b = things['b']
# We know that b is an instance of B, but mypy doesn't

b.b()  # mypy doesn't like this

typing.cast(B, b).b()  # mypy accepts the change

Although this satisfies mypy, it is very dangerous. What if that part of the code changes and b ends up being an instance of A? typing.cast performs no check at runtime, so the mistake would only surface as a runtime error, possibly in a much less obvious place. If instead of directly calling b.b() we passed it to another function, it could go unnoticed for a long time:


from queue import Queue

cua: Queue[B] = Queue()

def do_something_later(b: B) -> None:
    cua.put(b)

b_b = typing.cast(B, b)  # mypy accepts the change
do_something_later(b_b)

Then we will not see the error until the element is consumed from the queue.
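The delayed failure can be reproduced in a few lines. This is only a minimal sketch: the names pending and consume are hypothetical, and the classes are redefined so the snippet runs on its own.

```python
import typing
from queue import Queue


class A:
    def a(self) -> str:
        return "a"


class B(A):
    def b(self) -> str:
        return "b"


# Hypothetical queue of work items that are all supposed to be B instances.
pending: "Queue[B]" = Queue()


def do_something_later(b: B) -> None:
    pending.put(b)


def consume() -> str:
    # The wrong type only blows up here, far from the cast.
    return pending.get_nowait().b()


not_a_b = A()

# typing.cast returns its argument unchanged and checks nothing at runtime.
casted = typing.cast(B, not_a_b)
same_object = casted is not_a_b

do_something_later(casted)  # no error at the point of the mistake

try:
    consume()
    failure = ""
except AttributeError as exc:
    failure = str(exc)
```

The AttributeError is raised inside consume, not next to the cast that caused it, which is what makes this kind of bug hard to trace.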

Alternatives

The simplest alternative to typing.cast is assert isinstance:


b = things['b']
# We know that b is an instance of B, but mypy doesn't
assert isinstance(b,B)  # now mypy does
b.b()  # No problem

If you notice, the code reads: "I, the programmer, assure you that I know b is an instance of B, and if I'm wrong, let an exception be raised!". This prevents us from experiencing what happened in the previous example where the error went undetected for a long time.

Also, in a production environment, asserts can be disabled (for example, by running Python with the -O flag), eliminating any runtime overhead.
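As a small illustration of the fail-fast behaviour, the sketch below wraps the assert in a hypothetical helper called call_b. It assumes asserts are enabled; under python -O the assert is skipped and the call would fail later with an AttributeError instead.

```python
class A:
    def a(self) -> str:
        return "a"


class B(A):
    def b(self) -> str:
        return "b"


def call_b(thing: A) -> str:
    # The assert both narrows the type for mypy and checks it at runtime.
    assert isinstance(thing, B), f"expected B, got {type(thing).__name__}"
    return thing.b()


ok_result = call_b(B())

try:
    call_b(A())
    message = ""
except AssertionError as exc:
    # The failure happens right where the wrong assumption was made.
    message = str(exc)
```

The error points at the exact line where the assumption about the type turned out to be wrong.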

However, this does not work for parameterized types like dict[str,B], because isinstance cannot check them. To solve this, we can use pydantic and create a replacement function:


from typing import Any, TypeVar, cast
from pydantic import parse_obj_as

T = TypeVar("T")

def assert_cast(expected_type: type[T], value: Any) -> T:
    """Drop-in replacement for typing.cast, but with runtime checks.
    If asserts are disabled, this function does nothing.
    Checks that value is of type expected_type and returns it as such.
    This also works with arbitrary types such as dict[...], Pydantic models, etc.
    Because of how Pydantic works, we can't prevent the check from using type coercion.
    This should be solved in Pydantic V2, in the future"""

    # If asserts are disabled, we don't want to do anything

    if not __debug__:
        return cast(T, value)

    try:
        parse_obj_as(expected_type, value)
    except ValueError as exc:
        raise AssertionError(
            f"Expected value of type {expected_type} but got {type(value)} instead"
        ) from exc
    return cast(T, value)

Now replace cast with assert_cast, and you will have all the benefits of cast and assert at the same time.

assert_cast(dict[str,B],b_dict)
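For context on why a plain isinstance check is not enough for dict[str,B], here is a standard-library-only sketch. The helper is_str_to_b_dict is hypothetical and hand-written for this one type; it is not part of assert_cast, which delegates the general case to pydantic.

```python
class B:
    def b(self) -> str:
        return "b"


# isinstance refuses parameterized generics such as dict[str, B].
try:
    isinstance({}, dict[str, B])
    generic_check_supported = True
except TypeError:
    generic_check_supported = False


def is_str_to_b_dict(value: object) -> bool:
    # Manual equivalent for this one type: check the container,
    # then every key and every value.
    return isinstance(value, dict) and all(
        isinstance(k, str) and isinstance(v, B) for k, v in value.items()
    )


good = is_str_to_b_dict({"x": B()})
bad = is_str_to_b_dict({"x": 1})
```

Writing one such helper per type does not scale, which is the motivation for a generic function like assert_cast.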

Note: Remember that you can also ask ChatGPT to generate unit tests for the function. And remember to review them!

Implementation

Once we have identified the problem in our codebase, we need to find a way to prevent it from happening again. Obviously, we can explain it to the team members, but we cannot rely solely on memory; it will happen again.

We would need to find a way to detect typing.cast in the pre-commit to prevent it from slipping through, but how? Well, let's make a pylint plugin.

The problem is that writing a pylint plugin seems to be a long and boring task. And, worst of all, we would become experts in pylint plugins and would be constantly asked to write them. Nobody wants that.

Therefore, the best alternative is to entrust it to ChatGPT.

ChatGPT

For ChatGPT to make us a pylint plugin we have to ask for it:

Please write a pylint plugin that warns about using typing.cast function anywhere in the code.

For best results, it's better to use GPT-4, since GPT-3.5 hallucinates too much.

On the first attempt, it will generate working code, but it will miss some cases. For example, initially it only detected when typing.cast was called, but not when cast was called after doing from typing import cast. The best approach to solve that is to keep asking ChatGPT to add these cases.

It helps a lot to have various canary files where we deliberately add typing.cast in different ways, to verify that the plugin indeed recognizes them.
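To give an idea of the cases a canary file should cover, here is a sketch of the detection logic using only the standard ast module. It is not the pylint plugin itself (that one is built on pylint's checker API); the function find_typing_cast_calls and the CANARY source are hypothetical stand-ins for illustration.

```python
import ast

# Hypothetical canary source covering the different ways cast can be reached.
CANARY = """\
import typing
import typing as t
from typing import cast
from typing import cast as c

a = typing.cast(int, "1")
b = t.cast(int, "2")
c1 = cast(int, "3")
d = c(int, "4")
e = int("5")
"""


def find_typing_cast_calls(source: str) -> list[int]:
    """Return the line numbers of calls to typing.cast in source."""
    tree = ast.parse(source)
    module_aliases: set[str] = set()  # names bound to the typing module
    func_aliases: set[str] = set()    # names bound to typing.cast

    # First pass: collect how typing and cast were imported.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "typing":
                    module_aliases.add(alias.asname or "typing")
        elif isinstance(node, ast.ImportFrom) and node.module == "typing":
            for alias in node.names:
                if alias.name == "cast":
                    func_aliases.add(alias.asname or "cast")

    # Second pass: flag calls that go through any of those names.
    hits: list[int] = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in func_aliases:
            hits.append(node.lineno)
        elif (
            isinstance(func, ast.Attribute)
            and func.attr == "cast"
            and isinstance(func.value, ast.Name)
            and func.value.id in module_aliases
        ):
            hits.append(node.lineno)
    return sorted(hits)


flagged_lines = find_typing_cast_calls(CANARY)
```

This sketch is name-based and ignores scoping, so it can be fooled by shadowed or reassigned names. A real plugin can rely on pylint's inference instead.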

Once it works, it's very helpful to make a document in the same repository explaining the whole rationale for banning the use of typing.cast and referring to it in the warning message. That way, everyone will be able to understand what's going on when the error pops up.

Recovering the `pre-commit`

Once the plugin is added, the pre-commit will fail, finding all the times typing.cast appears in the code.

We find ourselves in a situation where we would have to fix all the pylint errors before being able to push the rule into the repository, and this can be too big a task.

In this case, the strategy we chose was to add the pylint rule as a warning and not an error, so that over time, we can make the change and progressively replace the cases of typing.cast.

Conclusions

pylint and pre-commit are fantastic for this use case: enforcing a rule that bans the use of a function. This way we don't have to manually monitor that people don't skip it.

We discovered that ChatGPT is very effective for this task.

If you can't get the pre-commit green in one go, you will need to organize sessions to finish eliminating the banned function.
