Introducing PipFlow

I recently published a new Python package manager called PipFlow. If you're like me and use Docker for everything (local development + containers in production), you might like it.

Let's take a step back and dive into the problem it solves, and at the end I'll explain why the newer package managers weren't a good fit for me. Here are my priors:

  • I believe in making my local development mirror production, NOT making production mirror my local development.
  • Therefore I do everything in Docker (and k8s).
  • In Docker, the slimmer the image the better. Also, if you can avoid redundant dependencies and build steps, you should.

With that out of the way, let's add a new package. Requests is cool, so let's install it the old way:

pip install requests

I now have this package in my host operating system, but it's useless there because I use Docker locally. So let's grab its pinned version and add it to my requirements.txt file:

pip freeze | grep requests >> requirements.txt

My requirements file now has requests==2.22.0 in it. Okay, cool, but now we need to rebuild our Docker image to bake in the new dependency:

docker-compose build app

Everything works great now, but let's take a moment to identify some waste: I just installed the same package twice in order to use it once. There must be a better way, right? There is:

pipflow add requests

The command above fetches the latest requests version, adds it to our requirements file, sorts the file, and rebuilds the Docker image, which is the only installation step that matters. The great thing is that we don't need to install pipflow in our Dockerfile; we install it just once in our host operating system.
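For the curious, here's roughly the manual workflow that single command replaces, without a second install on the host (an illustrative shell sketch of the steps described above, not pipflow's actual implementation):

latest=$(curl -s https://pypi.org/pypi/requests/json \
         | python -c "import sys, json; print(json.load(sys.stdin)['info']['version'])")   # look up the newest release on PyPI
echo "requests==${latest}" >> requirements.txt   # pin it without installing anything on the host
sort -o requirements.txt requirements.txt        # keep the file sorted
docker-compose build app                         # rebuild the image, the only install that matters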

Pipflow is really a "pip workflow" that collapses a few steps into one. Inspired by the yarn command-line API, pipflow also upgrades single packages with pipflow upgrade <package> and removes packages with pipflow remove <package>. You can also upgrade all packages with pipflow upgrade-all.
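For reference, here are the commands mentioned so far, with my one-line summary of each (check the project README for the authoritative list):

pipflow add requests       # pin the latest version in requirements.txt and rebuild the image
pipflow upgrade requests   # bump a single pinned package
pipflow remove requests    # drop a package from requirements.txt
pipflow upgrade-all        # bump every pinned package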

So why don't I use one of these newer tools like Poetry or Pipenv? The answer is that these tools solve problems that don't exist in Docker.

Let's read the Pipenv README:

You no longer need to use pip and virtualenv separately. They work together.

Virtual environments are not necessary in Docker: Docker already provides isolation and sandboxing. Next time you do ADD venv . in Docker, ask yourself what problem you're solving.

What about Poetry and other tools that use lock files? Those are useful, aren't they?

Nothing is gained by using a secondary lock file in Docker. A requirements.txt file is perfectly fine at pinning versions (it's your manifest and lockfile already). In fact, there is a loss: you have to generate the lock file and/or install the tool in your Docker image. Both are wasteful, redundant build steps.
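For concreteness, here's roughly how that extra step tends to show up in a Dockerfile (an illustrative comparison, not taken from the Poetry docs; the exact invocation and flags vary by version and setup):

# lock-file route: the tool itself becomes a build dependency of the image
RUN pip install poetry && poetry install --no-root

# plain pip route argued for here: one step, no extra tool baked in
RUN pip install -r requirements.txt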

Another thing to remember is that when you do RUN pip install -r requirements.txt in Docker, a new image layer is created. This layer represents all the new packages added to the file system, and each layer gets a unique sha256 digest (in effect, a lock file mechanism). If you build again and nothing in your requirements changed, the layer is cached. It's a beautiful thing.
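You can see those layer digests, and the cache hit, for yourself (the image tag app here is just an example):

docker build -t app .                                          # first build: the pip install layer is created
docker image inspect app --format '{{json .RootFS.Layers}}'    # one sha256 digest per layer
docker build -t app .                                          # requirements.txt unchanged: the pip layer comes from cache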

Here's what my Dockerfile looks like in a production project that uses pipflow:

FROM python:3.8.0-alpine

EXPOSE 8000

WORKDIR /app

# run as a non-root user
RUN addgroup -S app && adduser -S -G app app

# add requirements first so the pip install layer stays cached until requirements.txt changes
ADD requirements.txt .

RUN pip install -r requirements.txt

# then add the rest of the source code
ADD . .

USER app

ENTRYPOINT ["scripts/entrypoint.sh"]

You can find the repo on GitHub and the package on PyPI, or just install it with pip install pipflow. The only assumptions are that your Dockerfile/image has pip installed and that you have these two lines in your Dockerfile (ideally before you add the rest of the source code, for optimal layer caching):

ADD requirements.txt .

RUN pip install -r requirements.txt

Let me know if you love it, hate it, or ¯\_(ツ)_/¯. It's a young project, so expect a few kinks 😃.
