Docker Compose is a tool for defining and managing multi-container Docker applications. In this article, we'll cover the basic concepts of Docker Compose and how to start using it to orchestrate your applications.
What is Docker Compose?
Docker Compose is a tool that allows you to define and run multi-container Docker applications using a YAML file to configure your application's services. Then, with a single command, you can create and run all the defined containers. This simplifies the creation and configuration of complex development and production environments where multiple services need to interact with each other.
Installing Docker Compose (optional)
Before getting started, make sure you have Docker Compose installed on your machine. You can install it by following the official Docker instructions.
If you are using a Mac, you can install Docker Compose with the following command. Before running it, make sure you have Docker Desktop installed on your machine.
$ brew install docker-compose
Now, verify the version you have installed:
$ docker-compose --version
Before trying Docker Compose with a Machine Learning project, let's clarify the difference between Docker Compose and Kubernetes.
Docker Compose:
What it's for: Docker Compose is a tool that lets you run multiple containers together. It's mainly designed for development and testing. It's ideal if you want to quickly spin up multiple services on your machine, like a database, an API, and a web app, all running locally.
How it works: You use a file called docker-compose.yml to define which containers you are going to use and how they connect to each other. For example, you can say, "I want to launch my application and connect it to a database." Compose will take care of that for you with a single command.
Ideal for: Small projects or development environments where you don't need a very complex system and just want to quickly test on your machine.
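As a minimal sketch of what that docker-compose.yml looks like (the service names here are hypothetical, just to illustrate the idea of "an app connected to a database"):

```yaml
services:
  web:
    build: .          # build the app image from the local Dockerfile
    ports:
      - "8000:8000"   # expose the app on the host
    depends_on:
      - db            # start the database container first
  db:
    image: postgres:15
```

With this file in place, `docker compose up` launches both containers and puts them on a shared network where `web` can reach the database by the hostname `db`.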
Kubernetes:
What it's for: Kubernetes is much larger and more powerful than Docker Compose. It not only helps you launch containers but also helps you manage applications in production, on real servers, efficiently and at scale.
How it works: Kubernetes ensures that your application is always running, has enough resources, and can handle many users. If one of your containers fails, Kubernetes will automatically replace it. It can also scale (increase or decrease the number of containers) based on what your application needs at any given moment.
Ideal for: Large production applications that need to be always available and handle high traffic. Large companies or projects that plan to grow significantly often use Kubernetes.
In summary:
Docker Compose is for quickly launching and managing multiple containers on your local machine, ideal for development or testing. Kubernetes is for managing large production applications that need more control, stability, and scalability. Compose is like a small engine that helps you work on your project. Kubernetes is like a large machine that keeps your application running smoothly, even with lots of users and traffic.
Now, let's get to what we're here for 😎
I’ll show you how to apply Docker Compose in an ML project. We’re going to create a simple application that trains a machine learning model and exposes a web service for making predictions.
Project Goal
Our project will consist of two services:
ML Service: A machine learning model trained using scikit-learn, exposed through a web API using Flask.
Database Service: A PostgreSQL database to store prediction results.
Project Structure
The basic file structure will be as follows:
ml_project/
│
├── docker-compose.yml
├── ml_service/
│   ├── Dockerfile
│   ├── app.py
│   ├── model.py
│   └── requirements.txt
└── db/
    └── init.sql
1. Define the docker-compose.yml
The first step is to define the services in a docker-compose.yml file.
version: '3'
services:
  ml_service:
    build: ./ml_service
    ports:
      - "5000:5000"
    depends_on:
      - db
  db:
    image: postgres
    environment:
      POSTGRES_DB: ml_results
      POSTGRES_USER: ml_user
      POSTGRES_PASSWORD: ml_password
    volumes:
      - ./db/init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - "5432:5432"
2. Create the Machine Learning Service
Inside the ml_service/ folder, we create a Dockerfile that will install the necessary Python dependencies, train a model, and expose the service.
Dockerfile (for the ML service)
FROM python:3.8
WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY . .
# Train the model at build time so that app.py finds model.pkl
RUN python model.py
CMD ["python", "app.py"]
requirements.txt
Here we define the dependencies we will use, such as Flask for creating the web server and scikit-learn for the ML model.
Flask==2.1.0
scikit-learn==1.0.2
psycopg2-binary==2.9.3 # To connect to PostgreSQL
model.py
This file contains the code to train the machine learning model. We’ll use a simple classification model like Logistic Regression.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import pickle

def train_model():
    # Load an example dataset
    iris = load_iris()
    X, y = iris.data, iris.target

    # Train the model (max_iter raised to avoid a convergence warning on iris)
    model = LogisticRegression(max_iter=200)
    model.fit(X, y)

    # Save the model to a file
    with open('model.pkl', 'wb') as f:
        pickle.dump(model, f)

if __name__ == "__main__":
    train_model()
This code trains a simple classification model and saves it as a model.pkl file.
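Since the API will load model.pkl at startup, it's worth sanity-checking the pickle round-trip before containerizing anything. A minimal sketch, using a trivial stand-in predictor instead of the scikit-learn model so it runs with the standard library alone (StubModel is hypothetical, just mimicking the `predict` interface):

```python
import pickle

class StubModel:
    """Stand-in for the trained LogisticRegression (hypothetical)."""
    def predict(self, X):
        # Predict class 0 for every row, like the iris example's first class
        return [0 for _ in X]

# Save the model the same way model.py does
with open('model.pkl', 'wb') as f:
    pickle.dump(StubModel(), f)

# Load it back the same way app.py will
with open('model.pkl', 'rb') as f:
    loaded = pickle.load(f)

result = loaded.predict([[5.1, 3.5, 1.4, 0.2]])
print(result)  # → [0]
```

If this round-trip works for the real scikit-learn model too, the Flask service can rely on loading model.pkl at startup.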
app.py
This is the Flask file that creates the API to make predictions using the trained model. We will also store predictions in the database.
from flask import Flask, request, jsonify
import pickle
import psycopg2

app = Flask(__name__)

# Load the trained ML model
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

# Connect to the PostgreSQL database
conn = psycopg2.connect(
    dbname="ml_results", user="ml_user", password="ml_password", host="db"
)
cur = conn.cursor()

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    X_new = [data['features']]

    # Make the prediction
    prediction = model.predict(X_new)[0]

    # Save the prediction in the database
    cur.execute(
        "INSERT INTO predictions (input, result) VALUES (%s, %s)",
        (str(X_new), int(prediction)),
    )
    conn.commit()

    return jsonify({"prediction": int(prediction)})

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000)
This Flask service listens on port 5000 and, upon receiving a POST request with input features, returns a prediction and saves the result in PostgreSQL.
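One caveat with this setup: app.py opens its database connection at import time, and `depends_on` only waits for the db container to start, not for PostgreSQL to be ready to accept connections. A small retry wrapper makes startup more robust; this is a sketch where `connect_with_retry` and the timings are illustrative, and the flaky fake stands in for psycopg2:

```python
import time

def connect_with_retry(connect, attempts=10, delay=2):
    """Call connect() until it succeeds or attempts run out.

    connect: a zero-argument callable, e.g.
        lambda: psycopg2.connect(dbname="ml_results", user="ml_user",
                                 password="ml_password", host="db")
    """
    last_error = None
    for _ in range(attempts):
        try:
            return connect()
        except Exception as exc:  # psycopg2.OperationalError in practice
            last_error = exc
            time.sleep(delay)
    raise last_error

# Demo with a fake connection that fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("db not ready")
    return "connection"

conn = connect_with_retry(flaky, attempts=5, delay=0)
print(conn)  # → connection
```

In app.py you would wrap the psycopg2.connect call in this helper so the ML container simply waits out the database's startup window instead of crashing.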
3. Set Up the Database
In the db/ directory, we create an init.sql file to initialize the database with a table for storing predictions.
init.sql
CREATE TABLE predictions (
    id SERIAL PRIMARY KEY,
    input TEXT,
    result INTEGER
);
This script will automatically run when the PostgreSQL container starts and will create a table named predictions.
4. Running the Project
Now that everything is set up, we can run the entire project using Docker Compose. From the project's root directory, run the following command:
$ docker-compose up
This will have Docker Compose:
- Build the machine learning service image.
- Start the ML and database containers.
- Run the ML model and the Flask web server on port 5000.
5. Testing the API
To make a prediction, you can send a POST request to http://localhost:5000/predict with a JSON body containing the dataset features.
Example curl command:
$ curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d '{"features": [5.1, 3.5, 1.4, 0.2]}'
If everything is configured correctly, you will receive a response like this:
{
  "prediction": 0
}
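You can also call the API from Python using only the standard library. A sketch (the endpoint URL matches the compose setup above; the `predict` helper is illustrative):

```python
import json
from urllib import request

def predict(features, url="http://localhost:5000/predict"):
    """POST a feature vector to the Flask service and return its prediction."""
    payload = json.dumps({"features": features}).encode("utf-8")
    req = request.Request(url, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["prediction"]

# The request body sent for the iris example above:
body = json.dumps({"features": [5.1, 3.5, 1.4, 0.2]})
print(body)
```

With the containers running, `predict([5.1, 3.5, 1.4, 0.2])` performs the same call as the curl command.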
Conclusion
With Docker Compose, we've created a machine learning project that includes a web service for making predictions with an ML model, as well as a PostgreSQL database to store results. Docker Compose simplifies managing these services, especially in development and testing environments, allowing you to work with multiple containers in a coordinated way.
This example is just a starting point, and you can expand it by adding more services, connecting other machine learning models, or integrating tools like Redis for caching or Celery for asynchronous tasks.
Now you're ready to use Docker Compose in more complex machine learning projects!
Autor: Yhary Arias.
LinkedIn: @yharyarias
Instagram: @ia.fania