DEV Community

Josh Holbrook

pyee Release 9.0: Type Annotations, New APIs & More!

I just published a fresh release of pyee and boy does it have a lot of changes from the last *counts on fingers* 5 months! I wanted to talk about the changes and what went into them, as some of them are fun and interesting.

I'm going to cover a lot of ground - this probably should've been multiple posts - but feel free to skip around the headings and read what interests you. Onward!

Let's Not Bury The Lede: Type Annotations!

The most important change in v9 is type annotations, type-checked with pyright and tagged with a py.typed file so it (hopefully) works in mypy - if your type checker complains to you about missing stubs, please file an issue!! I really want this to work!

Why I Chose pyright Instead of mypy

I've typically used mypy to do type checking. Python projects at work started integrating type annotations around 2017 or so (my fellow senior data engineer at GMG, Mike, quietly just up and annotated the entire warehouse codebase one day, wasn't even mad), and as I got better at it I started using them at home as well. The standard, obvious tool was always mypy, and so that's what we used.

People who follow me on Twitter may know that I find mypy's model of documentation and support absolutely infuriating. To be both clear and fair, this is a personal problem. The way mypy's docs are structured is such that if you have the patience to sit down and read something end-to-end you'll have a decent time. Most of what you need is in there. But if you have the attention span of a toddler like I do and prefer to lean on good error messages and interactive chats, well, it doesn't really work. A frustrating experience for sure, but asking people to read something isn't too much to ask either. At any rate: after a lot of pouting, stomping around and generally showing my ass, I figured out most of mypy's tricks. It's workable! I would - and will! - use it again. Didn't stop me from being grouchy though. Sorry, mypy devs!

But during my kvetching on the internet, someone turned me onto pyright. Their suggestion was that it would "do the right thing" more often than mypy for me, and that I'd like it better. I took a look at it, and while it appeared promising, mypy seemed the more conservative option (my typical preference for enterprise codebases) and also it was written in typescript, introducing a dependency on Node.js. Not great for a data engineering team with a Python codebase and a healthy fear of JavaScript!

I decided pretty quickly that it wasn't the right choice for DoubleVerify, at least, but I was pretty interested in trying it out at home. I began my career writing Node.js in the early tens - I still reach for it when I'm feeling nostalgic - and as of September I'm using it at work again. Installing it is no problem for me. I also like being more creative at home than at work (aside: Eaze has an unusually creativity-friendly culture - pleasant! - but alas uses Python sparingly), so trying something new was up my alley.

It turns out that, past frustrations with mypy aside (no, seriously), I like pyright quite a bit! It's true that it complains about missing stubs less often. I also much prefer the ./typings convention over mypy/stubgen's environment based configuring and ./out default. There's an argument for mypy's approach - "explicit is better," probably - but for someone who hates thinking, it feels nice.

I mentioned that pyright is written in typescript, but what's really funny is that pyright is basically tsc's type checker but for Python. The type checking output from tsc and pyright is so similar that I sometimes get confused about which language I'm working with. This, I suppose, explains the Node.js dependency. But I work with typescript quite a bit now - it's a standard part of Eaze's stack, sure, but it's also a language I earnestly enjoy. It feels like a structurally typed scala! It has a rough analog to scalaz! And its type system is way more powerful than it has any business being?? What's not to love? So given that typescript is already a standard part of my stack, this synergy is - as it turns out - really valuable. It's great!

As a bonus, pyright's baked-in vscode support - something it shares with typescript - not only implies a buttery smooth vscode environment, but also leaves the door open for other lsp-friendly editor/IDE plugins. I personally use neovim and coc.nvim, and as it turns out pyright integrates with coc.nvim quite nicely.

So when I decided to add type annotations to pyee, pyright was the choice that felt right in my heart. Adopting it was pretty smooth, both for local dev and in CI. I'm really happy with how adopting pyright went for me!

How Type-Safe is pyee?

The answer is "not very." But let's get into the weeds a little bit. What type signatures does pyee have? How safe are they? Why isn't the API more safe than it is, and what can we do about it? All questions I'm sure you're clamoring to ask! And ones I want to answer.

(Quick note: The following snippets have NOT been type checked. They were never meant to be complete/valid annotations BUT there may be clear typos in some details, such as whether a type variable is covariant or contravariant. Be forgiving, and flag bugs if you see them!)

To start, here's the type signature for EventEmitter#on that I shipped in v9.0:

Handler = TypeVar("Handler", bound=Callable)

def on(self, event: str, f: Optional[Handler] = None) -> Handler:
    ...

OK, so event is a string as it is in Node.js, and f is a type parameter bound to any Callable, regardless of type signature. Great start! The method also returns a Callable, but there are two callables which may be returned: either f itself, or a decorator with type Callable[[Handler], Handler], where Handler is the type of f.
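To make those two call shapes concrete, here's a minimal standalone sketch of a dual-mode on - a toy of my own, not pyee's actual implementation (the real thing delegates to add_listener and the new_listener event):

```python
from typing import Callable, Dict, List, Optional, TypeVar, Union

Handler = TypeVar("Handler", bound=Callable)

class MiniEmitter:
    """Toy emitter demonstrating the two call shapes of `on`."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable]] = {}

    def on(
        self, event: str, f: Optional[Handler] = None
    ) -> Union[Handler, Callable[[Handler], Handler]]:
        if f is not None:
            # direct usage: register and return the handler itself
            self.handlers.setdefault(event, []).append(f)
            return f

        # decorator usage: return a Callable[[Handler], Handler]
        def decorator(g: Handler) -> Handler:
            self.handlers.setdefault(event, []).append(g)
            return g

        return decorator

ee = MiniEmitter()

def direct(msg: str) -> None:
    print(msg)

assert ee.on("data", direct) is direct  # direct call returns the handler

@ee.on("data")  # decorator call: on returns a decorator
def decorated(msg: str) -> None:
    print(msg)

assert len(ee.handlers["data"]) == 2
```

Both shapes register a handler; the only difference is which of the two callables comes back.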

(Update, 2022/01/18 - I fixed a bug this morning in the types around on and friends, and updated the post in this general region accordingly.)

We can clarify things a little by adding two new methods: add_listener, which doesn't support the decorator usage, and listens_to, which always returns a decorator. Something like:

class EventEmitter:
    ...
    def on(self, event: str, f: Optional[Handler] = None) -> Union[Handler, Callable[[Handler], Handler]]:
        if f:
            return self.add_listener(event, f)
        return self.listens_to(event)

    def add_listener(self, event: str, f: Handler) -> Handler:
        ...

    def listens_to(self, event: str) -> Callable[[Handler], Handler]:
        ...

You can see that the return type for on is a union of Handler and Callable[[Handler], Handler], the return types for add_listener and listens_to respectively. Typing listens_to's return value as Callable[[Handler], Handler] ensures that it will always be called as a decorator, and similarly for add_listener. The arguments for add_listener and listens_to are also more strict than on's: you must call add_listener with a non-None, callable f, and listens_to doesn't accept that argument at all. Pyright, meanwhile, has no way of knowing that on's return value depends on f. Overall, adding these methods is a win for type safety.

But we're getting off-track! Here's the type signature for EventEmitter#emit:

def emit(self, event: str, *args: Any, **kwargs: Any) -> bool:
    ...

We again see that event is a string, but we also see that the remaining arguments are *args and **kwargs, both typed as Any! Calamity - Any is an explicit admission that no type safety is given, one that tells the type checker to throw its hands in the air. But then we recall that the handler called with those arguments is any Callable - so we're calling any function with any arguments? That sounds really dangerous! Is the EventEmitter abstraction really that unsafe?

We can start by looking at the DefinitelyTyped stubs for node core, where we see that the event name really is typed string (or Symbol, but Python doesn't have those), and that handlers are typed (...args: any[]) => void. This type annotation is slightly more strict than Callable. The closest thing in Python is a bit like this:

class Handler(Protocol):
    def __call__(self, *args: Any, **kwargs: Any) -> None:
        ...

There are two reasons this protocol type isn't more appropriate, however. The first is that in typescript, (...args: any[]) captures more or less every call signature (afaik! if you disagree please correct me!!) while the proposed Handler protocol only works with callables containing both *args and **kwargs in the signature. The second is a bit of a nit: the DefinitelyTyped stubs check that the handler doesn't return anything, as a convention for side-effect functions which don't have a useful return value; but we throw the value away anyway, so why enforce that aside from pedantry? It's also worth noting that a lot of non-type-checked Node.js code contains idioms like if (err) return callback(err) which violate that signature. Between allowing functions without *args and **kwargs in the signature and allowing any return value, Callable would do the job.

So OK, this is roughly as type-safe as Node.js. But why can't we parameterize the types for event and Handler? Perhaps we can use type variables for *args and **kwargs and build an interface around that?

Event = TypeVar("Event")
Arg = TypeVar("Arg", contravariant=True)
Kwarg = TypeVar("Kwarg", contravariant=True)

class Handler(Protocol[Arg, Kwarg]):
    def __call__(self, *args: Arg, **kwargs: Kwarg) -> Any:
        ...

class EventEmitter(Generic[Event, Arg, Kwarg]):
    def add_listener(self, event: Event, f: Handler[Arg, Kwarg]) -> Handler[Arg, Kwarg]:
        ...

    def emit(self, event: Event, *args: Arg, **kwargs: Kwarg) -> bool:
        ...

We would ideally like to accept other type signatures for Handlers than *args: Arg, **kwargs: Kwarg. Not only do signatures containing *args and **kwargs need to accept an arbitrary number of arguments; they can't differentiate between the types of individual *args and **kwargs either (their types being a Tuple[Arg, ...] and a Dict[str, Kwarg] respectively). It's not my favorite! But it's also how the handlers are being called today, and frankly we have bigger problems. So let's set that aside for now.
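A quick runtime illustration of that last point (a toy probe function of my own, not pyee code): inside the body, every positional argument is collapsed into one tuple and every keyword argument into one dict, so one Arg type and one Kwarg type is all the annotation can express:

```python
from typing import Any

def probe(*args: Any, **kwargs: Any):
    # `args` is always a tuple and `kwargs` always a dict, so annotating
    # them as `*args: Arg, **kwargs: Kwarg` types them Tuple[Arg, ...]
    # and Dict[str, Kwarg] - a single element type for everything.
    return type(args).__name__, type(kwargs).__name__

print(probe(1, "two", flag=True))  # ('tuple', 'dict')
```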

Otherwise, this design looks sensible - we have a consistent protocol for a handler and we have a parameterized type for events. But let's talk about why it doesn't hold up.

Our first problem comes from the type of event. We've typed it as Event here, but we actually have two "internal" events that all EventEmitters have: error and new_listener. So now we need to include those values in potential events:

ErrorEvent = Literal["error"]
NewListenerEvent = Literal["new_listener"]

class EventEmitter(Generic[Event, Arg, Kwarg]):
    def add_listener(
        self,
        event: Union[Event, ErrorEvent, NewListenerEvent],
        f: Handler[Arg, Kwarg]
    ) -> Handler[Arg, Kwarg]:
        ...

    def emit(
        self,
        event: Union[Event, ErrorEvent, NewListenerEvent],
        *args: Arg,
        **kwargs: Kwarg
    ) -> bool:
        ...

Our type signature has become more complicated. This sort of thing is a bit of a smell with type checking, a sign that an interface is too complex or unsafe. What we have so far isn't show-stopping, but it's a little annoying.

Don't worry, it gets worse. If those two events are there, what do their handlers look like? Here's what we have for error handlers:

Err = TypeVar(
    "Err",
    bound=Exception,
    contravariant=True
)

class ErrorHandler(Protocol[Err]):
    def __call__(self, error: Err) -> Any: ...

This protocol is markedly different from what we sketched out for the handler before! It can't absorb *args or **kwargs, for instance, and it's parameterized by an Err type, bound to Exception.

Then there's the "new_listener" handler:

class NewListenerHandler(Protocol[Event, Arg, Kwarg, Err]):
    def __call__(
        self,
        event: Union[Event, ErrorEvent, NewListenerEvent],
        handler: Union[
            Handler[Arg, Kwarg],
            ErrorHandler[Err],
            "NewListenerHandler[Event, Arg, Kwarg, Err]"
        ]
    ) -> Any:
        ...

This handler takes the event that a new listener is for and the listener itself, i.e. a Handler. In either case, we'll need to incorporate these signatures into those of possible handlers.

Then there's one other minor issue: If we're going to parameterize Event, Arg, Kwarg and Err, shouldn't we parameterize Error events as well? This is relevant to TwistedEventEmitter's "failure" event, which is handled like a second kind of error.

If we put all of that together, we start seeing type signatures like this:

class EventEmitter(Generic[Event, ErrorEvent, Arg, Kwarg, Err]):
    def add_listener(
        self,
        event: Union[
            Event,
            ErrorEvent,
            Literal["error"],
            Literal["new_listener"]
        ],
        f: Union[
            Handler[Arg, Kwarg],
            ErrorHandler[Err],
            NewListenerHandler[Event, ErrorEvent, Arg, Kwarg, Err]
        ]
    ) -> Union[
        Handler[Arg, Kwarg],
        ErrorHandler[Err],
        NewListenerHandler[Event, ErrorEvent, Arg, Kwarg, Err]
    ]:
        ...

    def emit(
        self,
        event: Union[
            Event,
            ErrorEvent,
            Literal["error"],
            Literal["new_listener"]
        ],
        *args: Arg,
        **kwargs: Kwarg
    ) -> bool:
        ...

(sidebar: I tried to wrap some of these up into type aliases to keep things less busy, but struggled to make them work well with type variables. If you have pro tricks for this sort of thing, let me know!)

Now the types are extremely unwieldy - EventEmitter has five type parameters?? - and this immediately signals that this interface is too complex. If nothing else, pyright's output when trying to check test code was completely unreadable. Things aren't looking very good here!

But believe it or not, this isn't actually why I threw in the towel on safer annotations. In fact, the core problem is that storing a handler is a "lossy" operation when it comes to its type, and it's not possible to get that information back. Let's get even deeper into the weeds to get into why that is.

Internally, pyee stores events in a nested dict data structure, something like:

self.events: Dict[
    Event,
    """OrderedDict[
        Union[Handler, ErrorHandler, NewListenerHandler],
        Union[Handler, ErrorHandler, NewListenerHandler]
    ]"""
] = dict()

The first key is the event that given handlers are registered to. The key to the OrderedDict is the handler itself (used for removing listeners), and the value is the function actually called by emit (in the case of once, a wrapped handler). OK, cool. So let's suppose we have a Handler, and we want to add it to some handlers:

# f: Handler

self.events["data"][f] = f

Suppose that we knew f to be a Handler prior to this operation. When we add it to self.events, we throw that away and treat f as any one of these unioned handler types:

for k, v in self.events["data"].items():
    # k: Union[Handler, ErrorHandler, NewListenerHandler]
    # v: Union[Handler, ErrorHandler, NewListenerHandler]
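As an aside, that handler-as-key layout is exactly what makes once work. Here's a toy sketch of the idea - my own simplified code with hypothetical names, not pyee's actual internals - storing a wrapper as the value while keeping the original handler as the key so it can still be removed by identity:

```python
from collections import OrderedDict

# event -> OrderedDict(original handler -> function emit actually calls)
events = {}

def add_once(event, f):
    def wrapper(*args, **kwargs):
        # unregister the original handler before its one and only call
        del events[event][f]
        return f(*args, **kwargs)
    # key: the original handler (so remove-by-identity works),
    # value: the wrapper that emit invokes
    events.setdefault(event, OrderedDict())[f] = wrapper

def emit(event, *args, **kwargs):
    for wrapped in list(events.get(event, {}).values()):
        wrapped(*args, **kwargs)

seen = []
add_once("data", seen.append)
emit("data", "hello")
emit("data", "ignored")  # the handler was already removed

print(seen)  # ['hello']
```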

In these situations, the trick is usually to use type guards to sort out the types. For an example of what I mean, consider this toy implementation of a Maybe type:

from abc import ABC, abstractmethod
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
U = TypeVar("U")

class Maybe(ABC, Generic[T]):
    @abstractmethod
    def map(self, fn: Callable[[T], U]) -> "Maybe[U]":
        raise NotImplementedError("Don't use Maybe directly!")


class Some(Maybe[T]):
    def __init__(self, value: T):
        self.value: T = value

    def map(self, fn: Callable[[T], U]) -> "Maybe[U]":
        return Some(fn(self.value))


class Nothing(Maybe[T]):
    def map(self, fn: Callable[[T], U]) -> "Maybe[U]":
        return Nothing()

It would be relatively common that we have a thing where we don't know if it's a Some or a Nothing, just a Maybe. But we can use isinstance checks to sort it out, and the type checker will respect that:

maybe_hello: Maybe[str] = Some("hello world")

if isinstance(maybe_hello, Some):
    # We can safely access .value because the type checker
    # knows it's a `Some` now
    print(f"Some({maybe_hello.value})")
    # We now know that `maybe_hello` is a `Some[str]`
else:
    print("Nothing()")

It would be nice if we could do something similar for pyee handlers:

if isinstance(f, Handler):
    # yeah?

but sadly, there is - as far as I know - no good way to type guard your way to an arbitrary callable's signature.
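Here's a small demonstration of that dead end. It uses typing.runtime_checkable (not mentioned above, and not something pyee uses - just the closest runtime tool): an isinstance check against a callable Protocol only verifies that __call__ exists, not what its signature is, so it can't narrow a union of callable types:

```python
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class Handler(Protocol):
    def __call__(self, *args: Any, **kwargs: Any) -> None: ...

def unary(x: int) -> int:
    return x

# Runtime protocol checks only verify member presence, so every function
# "is a" Handler regardless of its actual signature - this check can't
# distinguish a Handler from an ErrorHandler or anything else callable.
print(isinstance(unary, Handler))  # True
```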

So given the current EventEmitter API, you're stuck. Short of nominally typing the handlers, that's it - Callable is just about the best that you can do. I'm not going to make those changes to the existing EventEmitter implementations, and so these type signatures are the ones that make sense.

Will I revisit this someday? Perhaps. There's certainly room to experiment with typed EventEmitters, and it wouldn't be surprising to add a pyee.typed module to the package. I'm punting on it for this release, but am tracking this possibility in an issue.

Dropping Python 3.6 Support

I dropped support for Python 3.6, and while I think that news isn't ground-breaking in and of itself, it is a consequence of my general deprecation policy for pyee, which I don't think I've ever voiced before! Are you ready? My deprecation policy is:

Deprecate Python version support when test dependencies start failing for that version.

You see, pyee doesn't have any strict dependencies, aside from language features in Python. However it does have submodules which depend on various frameworks: asyncio and concurrent.futures from the stdlib; and twisted and trio from the greater ecosystem. So despite not having install dependencies, all of these things are required to run all the tests. It's these dependencies which push me to deprecate Python version support. Basically: if the tests fail for the oldest Python version in the matrix because they depend on a recent library version or Python language feature, remove that Python version from the test matrix without thinking about it too hard. If there's a new Python version out in the wild to replace it with, pay it no mental effort at all.

My stance on this wasn't always so, and in fact I arrived at this position somewhat organically. Long ago, pyee supported Python 2.7 and Python 3.5+. Writing code which would run in both 2 and 3 wasn't particularly challenging for pyee at first, and I was able to flag installed dependencies and tests as dependent on Python versions. It was a little hairy but OK.

However, Python 2 support became much more difficult when my testing dependencies started deleting their own compatibility hacks. I eventually found myself in a situation where I would need to track down the latest version of every dependency which supported Python 2, pin each of them as hard as I could, and hope they all still worked together (!!) if I wanted to continue maintaining Python 2 support. Alas, this was so hard I couldn't actually solve it with a reasonable amount of effort -- and hey, if the overall Python ecosystem was sweeping Python 2 support into the dustbin, perhaps it was time to follow that lead. So I did, and it let me simplify my code significantly. It was good.

I was also confronted with this push-and-pull after adopting trio support. Trio, in my experience, is relatively quick to adopt new Python language features. You may find room to quibble on that statement -- is December 2016 really all that recent?? -- but these days it is the most likely culprit for breaking tests. I could flag trio on the earliest version it supported at the time of me writing the tests, sure, but after the sheer relief I felt at ditching that stuff with Python 2, I decided to effectively let Trio's support policy drive mine.

Part of me feels like I could try to support my users better. After all, if the base experience still works in Python 3.5 and you're not using it with Trio, why not support it? Why not put in the extra time to have broader support?

On the other hand, Trio as far as I can tell simply drops support for Python versions when they reach end-of-life status from the Python core team, as with 3.6 this last December. (ed: I'm writing this in January 2022.) As with Python 2: If the bedrock - "the community" - is dropping support, who am I to enable the kind of people who smugly use CentOS? ;)

Moved Interfaces, Progressive Enhancement and Module Exports

A change I've been wanting to make for a long time is deprecating the imports/exports of ExecutorEventEmitter, TrioEventEmitter and TwistedEventEmitter from __init__, moving them to pyee.executor, pyee.trio and pyee.twisted respectively. This probably sounds like pushing the peas around, but I promise it's for a good reason!

Many moons ago, when I first implemented AsyncIOEventEmitter, pyee was just the single __init__.py file, and EventEmitter would progressively detect if it was handed a coroutine. This behavior extended to Twisted as well. You just imported pyee.EventEmitter, threw whatever you wanted at it, and went to town. This was fine at first, but stopped scaling when I added trio support. So, I took a nod from projects like apscheduler and implemented subclasses for each of these cases. I also slowly but surely deprecated the old behavior of EventEmitter, eventually removing it about a release ago. This left me with one "public" imported module (pyee) and a bunch of "private" modules for each implementation.

But a major pain point of that approach is that I need to attempt an import, catch ImportErrors, and shrug if one occurs. This means that from pyee import TwistedEventEmitter will only work if twisted is installed; no such name gets exported otherwise. In practice it's not very fun.

I learned the lesson of how I actually want this to work the hard way, after making similar (but worse) mistakes with an API client on the job. In that case, mypy was struggling to handle optional imports and was making the pain points apparent. It was while brainstorming with a colleague that I realized: What I really want is for someone to explicitly import a module with the dependency - if they don't import it they never wanted the functionality, and if they do import it they'll probably appreciate the error! I know I would.
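The pattern I landed on can be sketched like this - a generic sketch with a hypothetical load_backend helper, not pyee's actual code: import the optional dependency in the dedicated module and let the ImportError surface, ideally with an actionable message:

```python
import importlib

def load_backend(module_name: str):
    """Import an optional dependency, failing loudly with a useful hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name!r} is required for this module; "
            "install it to use this functionality"
        ) from err

load_backend("json")  # stdlib module: imports fine

try:
    load_backend("definitely_not_installed_dep")
except ImportError as err:
    print(err)  # the user gets an actionable error at import time
```

If you never import the module, you never wanted the functionality; if you do, you get told exactly what's missing.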

Like I said, I'd been sitting on this change for a while. I wanted to make it in a major release (it definitely breaks stuff!) and, well, now I have one to roll it into. This will hopefully be the last time the pyee API goes through major breaking API changes like this - fingers crossed!

A New Method: EventEmitter#event_names()

There's something about pyee that's a little funny to me: while I initially went for a straight implementation of Node.js's EventEmitter, my humble package's evolution hasn't tracked Node at all. Basically anything added to EventEmitter after *checks commits* Summer 2011 hasn't been on my radar at all, meanwhile coroutine support evolved completely separately. So it's not surprising that a user would go looking for a Node method analog and not find it. That's what happened here.

You can see in the initial PR that there was a question about why new_listener was showing up in the events when no handlers had been added. The answer to that was surprising - at least to me.

I mentioned earlier that the event table in pyee is typed Dict[str, OrderedDict[Callable, Callable]]. But in v8 of pyee, the outer dict was actually a defaultdict. This enabled handy shortcuts when developing the on and emit logic for sure, but it also meant that any time the dict was accessed - for example with an emit call - it would lazily create an internal OrderedDict. This is what was happening with new_listener - I'd emit the event internally and defaultdict would lazily add the key to the events table. What's more, I wasn't cleaning up these OrderedDicts when they became empty either. Whoops!
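The defaultdict footgun is easy to demonstrate in isolation (toy code, not pyee's actual table):

```python
from collections import OrderedDict, defaultdict

events = defaultdict(OrderedDict)

# Merely *reading* a missing key - which is roughly what emitting
# "new_listener" did internally - creates an empty OrderedDict under it:
handlers = events["new_listener"]

print(list(events.keys()))  # ['new_listener'] - with zero handlers added!
```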

If this kind of thing interests you, I tried my best to make the commit understandable -- take a look!

New Module: pyee.cls

Something else that's funny about pyee is that I use it sparingly and most of the new features I add are in response to user requests! On one hand it's pretty neat that enough people use my module that I get feature requests, but on the other it means I'm sometimes guessing at use cases.

One such issue came across my desk this Spring. Here, @drmikecrowe asked about using the @on decorator with instance methods of an EventEmitter.

The answer to this problem in v8 of pyee would be "no", with the workaround being to set up the handlers in the constructor instead:

class HelloEventEmitter(EventEmitter):
    def __init__(self):
        super(HelloEventEmitter, self).__init__()
        self.add_listener("message", self.print_message)

    def print_message(self, message):
        print(message)

ee = HelloEventEmitter()
ee.emit("message", "Hello world!")

Or, if we're to believe that inheritance is Bad:

class Hello:
    def __init__(self):
        self.event_emitter = EventEmitter()
        self.event_emitter.on("message", self.print_message)

    def print_message(self, message):
        print(message)

hello = Hello()
hello.event_emitter.emit("message", "Hello world!")

That's certainly doable. But wouldn't it be cool if there was a slick class/method decorator API for it? Well, this is what I came up with:

from pyee.cls import evented, on


@evented
class Hello:
    # generated __init__ defines self.event_emitter, but you can
    # also set it yourself in __init__ manually for clarity/typing reasons

    @on("message")
    def print_message(self, message):
        print(message)


hello = Hello()
hello.event_emitter.emit("message", "Hello world!")

Well, maybe. Like I said, sometimes this feels a little like guessing. I run into this kind of architecture sometimes, but don't have that problem right now, and I don't have a lot of people ready to code review me on this project. It really is a solo operation!

So I implemented it in August and sat on it until now. It's basically not going to see usage until it's released, and what, like it could be more bonkers than the pyee.uplift API? Please.

So consider this API "experimental", but please kick the tires on it! Let's figure out where the edge cases are and whether or not it's even a good idea!

Builds on GitHub Actions Instead of TravisCI

After years of Totally Meaning To, I finally got around to updating pyee's CI build to use GitHub Actions instead of Travis CI. Between the super sketchy layoffs at Travis CI in 2019, the unsettling changes to their pricing model and things generally feeling stale and clunky, the move was becoming pressing!

One minor hiccup in my plan was, if you can believe it, including PowerPC in the testing matrix - someone working at IBM added ppc64le to the Travis test matrix in late 2020, an architecture obscure enough that GitHub Actions didn't support it. I was going to use the Actions build for everything but ppc, under the theory that running ppc tests on Travis was harmless, but funnily enough what stopped me was a migration issue from Travis switching from a .org to a .com at some point, leaving my project busted until I dialed up Travis support. At that point, it just wasn't worth it to me!

GitHub Actions are pretty nice though. I like the UI a lot more than I did Travis, and the secrets management feels a lot more robust. I'm ultimately glad to have made the switch.

Local Development with Virtualenv Instead of Conda

Finally, I made one change that's going to affect nobody but which represents a major paradigm shift for me: I adjusted the default development toolset from conda to vanilla python3 -m venv with whatever Python 3 is in the $PATH.

The truth is that I actually love conda. A certain Josh Laurito introduced me to it when he decided GMG's data engineering team was going to standardize on it, and it really won me over!

One of the things I really like about conda is that it can create and manage environments using a pretty complete yaml-based declarative config DSL, giving them human-readable names and storing them in a sensible location. In general, I think that:

cat <<EOF > environment.yml
name: my-env
dependencies:
  - python=3.10.2
EOF
conda env create
conda activate my-env

is more ergonomic than:

pyenv install 3.10.2
"$(pyenv root)/versions/3.10.2/bin/python3" -m venv ./my-env
source ./my-env/bin/activate

but there are a few things virtualenvs are better at, which motivated me to switch.

For one, venv is a stdlib module available on just about every Python I've seen in recent memory, meaning the bar is installing Python instead of installing conda. Some of this advantage wanes when you consider using pyenv to manage the Python version; but Homebrew and Fedora both ship (afaik) Python 3.10 right now, so the motivation of wanting a newer Python mostly goes away. What's left is wanting to avoid things breaking on old Pythons, and for that I've found that things like tox work more or less fine.

For another, it's easier to integrate with automation, such as pyee's Makefile. In main, running tests looks like this:

if [ -d venv ]; then . ./venv/bin/activate; fi; pytest ./tests

That's pretty easy - each line of the Makefile is run in a shell, sourcing the script is straightforward, and we move on.

On the other hand, activating conda (which is a prerequisite for calling conda activate my-env!) in my ~/.bashrc looks like this:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/josh/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/josh/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/josh/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/josh/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

There's way more going on here than in the activate script, because conda delegates more to the sourcing script. Very roughly, this calls conda shell.bash hook and evals the output, with backup plans of sourcing a conda.sh and adding conda's bins to the PATH.

In our case we could simplify, assuming that the shell hook bit works right:

eval "$(~/anaconda3/bin/conda shell.bash hook 2> /dev/null)" && conda activate my-env; pytest ./tests

Unlike the venv snippet, this one doesn't fall back to the host environment when the environment isn't there (that fallback is, incidentally, what kept conda workflows supported), but it's otherwise equivalent enough. It's more moving parts for sure, though.

But the biggest reason I've shifted towards venv is that even when I was using conda, I wasn't effectively able to ditch requirements.txt files, and it's the venv strategy with direct calls to pip that feels best here.

You can see a little of what I mean if you look at the environment.yml file still included with pyee:

name: pyee
channels:
  - conda-forge
  - default
dependencies:
  - python=3.8.3
  - pip=20.2.3
  - trio=0.17.0
  - twine=3.2.0
  - twisted=20.3.0
  - pip:
    - -r requirements.txt
    - -r requirements_dev.txt
    - -e .

This file installs a few dependencies from conda's channels directly, but the bulk of the dependencies are captured in requirements.txt, requirements_dev.txt and an editable install of pyee itself, and this configuration causes conda to execute pip with those requirements as arguments.

This is partially because some packages are only available on PyPI. But mainly it's because using the requirements.txt files means direct pip workflows are doable. In fact, most of the changes I made to support this shift were in the Makefile - the environment.yml file is still available and, minus any regressions I introduced, works just fine.

Thanks For Coming to my Ted Talk

If you made it this far: thanks for reading! 😅 Or if you just scrolled to the bottom: Hello!

You can check out pyee on GitHub or install v9 from PyPI!

p.s.: BY THE WAY my team is hiring rust developers! I know this post was about Python - if you write rust take a look!
