Mohammad-Ali A'RÂBI for Docker

Originally published at aerabi.Medium

Scikit-Learn Hello World with Docker Init

This article walks through a minimal example: training a classifier on the Iris dataset with Scikit-Learn, then packaging the script in a container with docker init.

Installation

In this example, we use Python 3.12 together with virtualenv. Ubuntu 24.04 comes with Python 3.12 pre-installed.
To install Python 3.12 on macOS, you can use Homebrew:

brew install python@3.12

On Windows, you can use the official Python installer from the Python website.

Later we also use Docker to run the Python code in a container. The easiest way to install Docker is to install
Docker Desktop. For Linux, you can alternatively install Docker Engine natively.

Create the Project with a Hello World Python script

First, create a new directory for the project and navigate into it. Then create a new file called main.py with
the following content:

def main():
    print("Hello, world!")


if __name__ == '__main__':
    main()

This is a simple Python script that prints "Hello, world!" to the console. Try running it with the following command:

python main.py

If your Python binary requires you to specify the version, you can use python3.12 instead of python.
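
If you are not sure which interpreter a given command starts, a small check can tell you. This is a minimal sketch using only the standard library; the helper name python_is_recent_enough and the REQUIRED constant are illustrative assumptions, not part of the project.

```python
import sys

# Assumption: the minimum version this tutorial targets.
REQUIRED = (3, 12)


def python_is_recent_enough(version_info=sys.version_info, required=REQUIRED):
    """Return True when the given interpreter version is at least `required`."""
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    print(f"Running Python {sys.version.split()[0]}")
    if not python_is_recent_enough():
        print("This tutorial was written for Python 3.12 or newer.")
```

Save it under any name you like and run it with both python and python3.12 to see which one matches.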

Create a virtual environment and install Scikit-Learn

Make sure you have virtualenv installed (if the command below fails, install it with pip install virtualenv):

virtualenv --version

Then create a new virtual environment for the project:

virtualenv -p python3.12 venv

Activate the virtual environment (on Windows, run venv\Scripts\activate instead):

source venv/bin/activate

Install Scikit-Learn:

pip install scikit-learn

To record all the installed packages with their exact versions, run:

pip freeze > requirements.txt

This writes every installed package, pinned to its version, into a file called requirements.txt. The next person who works on
the project can then install exactly the same versions with pip install -r requirements.txt.
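
To see what "pinned" means, the following sketch builds similar name==version lines from inside Python with the standard library's importlib.metadata. It is only an approximation of pip freeze, which handles more cases (for example editable installs), and the function name pinned_requirements is an illustrative assumption.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return sorted 'name==version' lines for the installed distributions."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata has no name
            pins.add(f"{name}=={dist.version}")
    # Sort case-insensitively, similar to how pip lists packages.
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    for line in pinned_requirements():
        print(line)
```

Run inside the activated virtual environment, the output should include a scikit-learn line along with its dependencies.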

Create a simple classifier with Scikit-Learn

Now that we have Scikit-Learn installed, let's create a simple classifier based on the Iris dataset. Change the content
of main.py to the following:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier


def main():
    """
    This function is a hello-world function for scikit-learn.
    It loads the Iris dataset, splits it into training and testing sets, and trains a random forest classifier.
    :return: The model
    """
    # Load the Iris dataset
    iris = load_iris()
    X = iris.data
    y = iris.target

    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Train a random forest classifier
    model = RandomForestClassifier()
    model.fit(X_train, y_train)

    # Print the model's accuracy
    print(f'Model accuracy: {model.score(X_test, y_test)}')

    return model


if __name__ == '__main__':
    main()

Some explanations about the code:

  • We load the Iris dataset using load_iris() from Scikit-Learn. Iris is a small dataset with 150 samples, 4 features, and 3 classes.
  • We split the dataset into training and testing sets using train_test_split(). We use 80% of the data for training and 20% for testing, so 120 samples for training and 30 for testing.
  • We train a random forest classifier using RandomForestClassifier(). A random forest is an ensemble of decision trees; with its default settings it works well on the Iris dataset.
  • We print the model's accuracy on the test set using model.score(X_test, y_test).
  • Finally, we return the trained model.
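
The dataset size and the 80/20 split described above can be checked directly. The sketch below is illustrative: the describe_split helper, the seed value 42, and the stratify=y argument are assumptions that are not part of main.py. Stratification is added here so that the three classes stay balanced in the test set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def describe_split(test_size=0.2, seed=42):
    """Load Iris, split it, and report the sizes of each part."""
    X, y = load_iris(return_X_y=True)
    # stratify=y keeps the class proportions equal in both parts;
    # random_state makes the split repeatable.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    return {
        "n_samples": X.shape[0],
        "n_features": X.shape[1],
        "n_train": len(X_train),
        "n_test": len(X_test),
        "test_class_counts": np.bincount(y_test).tolist(),
    }


if __name__ == "__main__":
    print(describe_split())
```

Without stratify=y, as in main.py, the test set still has 30 samples, but the number of samples per class can vary from run to run.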

Try running the script with:

python main.py

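You should see the model's accuracy printed to the console. Your number will probably differ from mine: both the split and the forest are randomized, and with only 30 test samples each misclassification changes the accuracy by about 0.033. In my case, the output was: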

Model accuracy: 0.9
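
To see how much the accuracy moves between runs, and how to make a run repeatable, you can pass a seed to both the split and the classifier. This is a minimal sketch, not a change to main.py; the function name accuracy_for_seed and the seed values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def accuracy_for_seed(seed):
    """Train and score the same model as main.py, with fixed randomness."""
    X, y = load_iris(return_X_y=True)
    # The same seed always produces the same split...
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # ...and the same forest, so the score is repeatable.
    model = RandomForestClassifier(random_state=seed)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


if __name__ == "__main__":
    for seed in range(5):
        print(f"seed={seed}: accuracy={accuracy_for_seed(seed):.3f}")
```

Calling the function twice with the same seed returns the same score, while different seeds may give different scores.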

Dockerize the project

To run the Python code in a Docker container, use the docker init command to generate a Dockerfile and the files that go with it:

docker init

The prompt will ask you a few questions. Here are the answers you should provide:

  • What application platform does your project use? Select Python. It should be automatically detected.
  • What version of Python do you want to use? It is also automatically detected. In my case, it was 3.12.3. Just press Enter.
  • What port do you want your app to listen on? Press Enter to accept the default value 8080. The script does not listen on any port, so the value does not matter here.
  • What is the command you use to run your app? Enter python main.py.

The following resources will be added to your project:

  • A Dockerfile that describes how to build the Docker image.
  • A .dockerignore file that specifies which files and directories to exclude from the Docker build context.
  • A compose.yaml file that describes how to run the Docker container.
  • A README.Docker.md file with instructions on how to build and run the Docker container.
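
If you want to confirm that docker init produced all four files, a short check is enough. This is a minimal sketch; the file names come from the list above, while the helper name missing_docker_files is an illustrative assumption.

```python
from pathlib import Path

# The four files listed above.
EXPECTED_FILES = ("Dockerfile", ".dockerignore", "compose.yaml", "README.Docker.md")


def missing_docker_files(project_dir="."):
    """Return the expected files that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_docker_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Docker files are in place.")
```

Run it from the project directory after docker init has finished.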

To run the Docker container, use the following command:

docker compose up --build

This will build the Docker image and start the container. You should see the output of the Python script in the console. The accuracy can differ from the local run because each run draws a new random split:

Attaching to server-1
server-1  | Model accuracy: 1.0
server-1 exited with code 0

The next time you want to start the container, you can use:

docker compose up

Without --build, Compose reuses the existing image, so the container starts faster. Add --build again after you change main.py or requirements.txt.

Conclusion

This article showed you how to create a simple classifier using Scikit-Learn and run it in a Docker container. You learned how to:

  • Install Python and Scikit-Learn.
  • Create a simple Python script.
  • Train a classifier on the Iris dataset.
  • Dockerize the project and run it in a container.

You can now extend this example by trying different classifiers, datasets, or even building a web service around the classifier.
Scikit-Learn is a powerful library for machine learning, and Docker is a great way to package and run your Python code.
By building the Docker image for the target platform, you can deploy your application to the cloud with ease, or share it with others.
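
As a starting point for trying different classifiers, the sketch below compares three of them on Iris using 5-fold cross-validation. The choice of models, the max_iter value, and the function name compare_classifiers are illustrative assumptions, not part of the project above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def compare_classifiers(cv=5):
    """Return the mean cross-validated accuracy of a few classifiers on Iris."""
    X, y = load_iris(return_X_y=True)
    candidates = {
        "random_forest": RandomForestClassifier(random_state=0),
        # A higher max_iter gives the solver more room to converge.
        "logistic_regression": LogisticRegression(max_iter=1000),
        "k_nearest_neighbors": KNeighborsClassifier(),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv).mean()
        for name, model in candidates.items()
    }


if __name__ == "__main__":
    for name, score in compare_classifiers().items():
        print(f"{name}: {score:.3f}")
```

Cross-validation averages over several splits, so the scores are less sensitive to one lucky or unlucky split than the single 80/20 split in main.py.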
