
How to Convert your Python project into a private pip Package

I was recently tasked with turning our simple Python project (let’s say a very simple REST API with 1 or 2 endpoints) into a private package, installable with pip or poetry. What sounds simple and quick at first turned out to be a real challenge.

There are many resources online, but some are not up to date, and others do not use best practices. So today, I’ll walk you through what I did to convert a Python project into a Python package.

The story

At Datalynx we initially started with a two-level API to achieve separation of concerns: the main API ‘backend’, which serves the frontend directly, and a secondary API ‘ml_backend’, which serves the main backend application.

That also required us to spin up 2 ECS services (more 💰 at the end of the month 😔) to serve a single web interface! Over time we realized that ‘ml_backend’ was not as computationally intensive as expected, and that it would be nice to import the classes that were once called by hitting an API directly in ‘backend’.

Prepare code for transition

  • Remove environment dependencies

Our application relied on a number of API keys that were stored in .env files. We addressed that by requiring all classes that used these variables to accept them as additional constructor parameters.

Users (projects that use the library) are now required to pass these parameters.

From:

import os
from dotenv import load_dotenv

class User:
    def __init__(self):
        load_dotenv()
        self.api_key = os.getenv("API_KEY")

To:

class User:
    def __init__(self, api_key):
        self.api_key = api_key

At the end of this process you should be able to use the classes without any .ini or .env files (make sure you test for that!).
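As a quick sanity check, a consuming project can now construct the class with the key passed in explicitly. Where the key comes from is entirely up to the caller; the environment variable name below is just an example:

import os

# The consuming project decides where the key comes from: its own .env file,
# a secrets manager, or a plain environment variable.
user = User(api_key=os.environ["API_KEY"])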

  • Rename the project and modules to avoid confusion

PEP8 defines a standard for how to name packages and modules:

Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

To comply, we had to rename ‘ml_backend’ to ‘mlbackend’ and rename or delete redundant folders. I used this opportunity to rename classes and variables as well (like a Sunday cleaning).
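For reference, here is a minimal sketch of what the layout might look like after the rename (the file names are just an assumption, not our actual repo structure):

mlbackend/            # importable package: short, all-lowercase name
    __init__.py
    user.py           # e.g. the User class from above
pyproject.toml        # package metadata (see next section)
README.md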

Create pyproject.toml metadata file

The pyproject.toml is a config file that contains metadata and build instructions for your package. This file is used by a build backend (like setuptools or hatch) that builds your package and creates the associated distribution files. Those files will then be uploaded to your package repository (in our case AWS CodeArtifact).

The build frontend (like pip or poetry) is responsible for downloading the package and managing its installation in the user's environment.
In our case this file will include the project info, a version, and the dependencies. Nothing else.

Here is a sample pyproject.toml that works for this simple project:

[project]
name = "mlbackend"
version = "1.0.0"
dependencies = [
  "boto3>=1.16.0",
  "numpy>=1.21.5",
  "pydantic>=2.5.2",
  "pytest>=7.4.3",
  "requests>=2.31.0",
  "websockets>=11.0.3"
]

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

Here is a list of other config info you can add.
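For example, a few commonly used fields you might add (the values below are placeholders, not our real metadata):

[project]
description = "Internal ML utilities for the backend"   # one-line summary
readme = "README.md"                                     # long description pulled from a file
requires-python = ">=3.9"                                # minimum supported Python version
authors = [{ name = "Riccardo" }]

[project.optional-dependencies]
dev = ["pytest>=7.4.3"]   # installable with: pip install "mlbackend[dev]"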

You are now ready to build your project. Make sure you install build

python3 -m pip install build

Then build the package

python3 -m build --sdist

You will notice that it creates a dist/ folder, and inside it there’s your .tar.gz package 🎉
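Note that --sdist builds only the source distribution; running build without flags also produces a wheel (.whl), which is generally faster for users to install, so you may want to upload both:

python3 -m build   # builds both the sdist and a wheel into dist/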

Upload package to AWS CodeArtifact

We now want to upload our package to a private repository. At Datalynx we use AWS for basically everything, so we’ll stick with AWS CodeArtifact to create a private repository and upload our package to it.

After you create your private repository, AWS provides a command to authenticate with it from your local machine. It looks like this:

aws codeartifact login --tool twine --repository [your-repository] --domain [your-domain] --domain-owner [aws-account-id] --region [your-region]

Once authentication is done you can go ahead and upload your package. There are a few tools for this, but I’ll stick with the most popular one: twine.

Install twine

python3 -m pip install twine

Then upload your package

twine upload --repository codeartifact dist/*

And that’s it!
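On the consuming side (our ‘backend’ project), a similar login command configures pip against the same repository, and the package then installs like any other dependency:

aws codeartifact login --tool pip --repository [your-repository] --domain [your-domain] --domain-owner [aws-account-id] --region [your-region]
pip install mlbackend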

Next up would be building a pipeline to automate this process on your preferred trigger (new PR created, push to a branch, ...).
