Mendy Kevin

Complete Guide to Deploying Machine Learning Models with Flask and Docker (No Fluff: Configure and Run Like a Pro)

Hello all! Welcome. This article covers the technical side of deploying a machine learning model (here, logistic regression, a linear model that makes predictions from trained data) as a web service. Stick around to the end and you'll be configuring and shipping models like a pro.

What You'll Learn

  • Packaging models with Pickle
  • Serving ML models with Flask
  • Containerizing apps with Docker
  • Exposing inference endpoints in Docker
  • Deploying to AWS Elastic Beanstalk


The Big Picture: Understanding ML Model Deployment

Let's understand the overall workflow of deploying a machine learning model:

  1. Save the Model: Export the trained model from your Jupyter notebook to a file (we'll use a .bin extension).

  2. Load as a Web Service: Load this model from a different process (using a Python script) in a web service—for example, a "churn service" that predicts a customer's churn rate. We'll use Flask to transform the model into a web service.

  3. Isolate Python Dependencies: Use pipenv (similar to conda or pip) to isolate the dependencies for this service and prevent interference with other services on your machine.

  4. Isolate System Dependencies: Add another layer using Docker to isolate system dependencies.

  5. Deploy to the Cloud: Once the local setup is complete, deploy the service to the cloud. You can use any cloud platform, but we'll use AWS Elastic Beanstalk (EB) for this tutorial.
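
By the end of the local steps, your project directory will look roughly like this (the folder name is up to you; the files match what we create below):

churn-service/
├── notebook.ipynb      # training notebook (name may differ)
├── Pipfile
├── Pipfile.lock
├── predict.py          # Flask app that serves the model
├── model_C=1.0.bin     # pickled (DictVectorizer, model)
└── Dockerfile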


Setting Up the Environment

Before training our model, we need to ensure our development environment has the right dependencies without interfering with other projects (which might require different versions of scikit-learn, pandas, etc.). We'll use pipenv for this.

Installing Pipenv

Note: If you have Anaconda installed and on your PATH, it automatically activates the (base) conda environment in every new shell/terminal. We don't want to install pipenv inside conda, so deactivate it first and install pipenv globally instead.

# Deactivate conda
conda deactivate

# Install uv (a faster Python package manager)
pip install uv

# Install pipenv globally
# (--system tells uv to target the global Python interpreter,
#  since no virtual environment is active after conda deactivate)
uv pip install --system pipenv

Creating a Pipenv Virtual Environment

Once you're in your project directory, manage all Python libraries and dependencies via pipenv:

# Create a pipenv virtual environment
# This automatically creates Pipfile and Pipfile.lock
pipenv --python 3.12

# Activate the virtual environment
pipenv shell

# Install all requirements for your project
# (scikit-learn is pinned to 1.5.1 because the model was trained with it;
#  gunicorn will serve the app inside Docker later)
pipenv install flask scikit-learn==1.5.1 numpy pandas requests gunicorn

# Note: pickle is built into Python, no need to install it separately

# Check dependencies
pipenv graph

# Update Pipfile.lock according to your current Pipfile
pipenv lock
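
For reference, the resulting Pipfile will look roughly like this (exact pins and the Python version may differ on your machine):

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
flask = "*"
scikit-learn = "==1.5.1"
numpy = "*"
pandas = "*"
requests = "*"
gunicorn = "*"

[requires]
python_version = "3.12"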

[Image: pipenv installation output]

Launching Jupyter Notebook

After installation, launch Jupyter notebook inside your virtual environment:

# If you prefer VS Code
code .

# Or if you have Anaconda installed
jupyter lab
# or
jupyter notebook

If done correctly, you'll see your virtual environment in the Jupyter launcher.
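
If the environment doesn't show up in the launcher, you may need to register it as a Jupyter kernel first. A quick sketch (the kernel name is arbitrary):

# Install ipykernel as a dev dependency and register the environment
pipenv install ipykernel --dev
pipenv run python -m ipykernel install --user --name=churn-env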

[Image: Jupyter launcher showing the pipenv environment]


Training the Model

Let's look at how we trained our model. (This isn't the primary focus, so I'll keep it brief.)

Making Necessary Imports

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

Data Preparation

# Read and prepare data
df = pd.read_csv('data-week-3.csv')

# Make column names homogeneous
df.columns = df.columns.str.lower().str.replace(' ', '_')

# Handle categorical columns
categorical_columns = list(df.dtypes[df.dtypes == 'object'].index)

for c in categorical_columns:
    df[c] = df[c].str.lower().str.replace(' ', '_')

# Handle numerical data
df.totalcharges = pd.to_numeric(df.totalcharges, errors='coerce')
df.totalcharges = df.totalcharges.fillna(0)

# Convert target variable to binary
df.churn = (df.churn == 'yes').astype(int)

# Define feature types
numerical = ['tenure', 'monthlycharges', 'totalcharges']

categorical = [
    'gender', 'seniorcitizen', 'partner', 'dependents',
    'phoneservice', 'multiplelines', 'internetservice',
    'onlinesecurity', 'onlinebackup', 'deviceprotection', 
    'techsupport', 'streamingtv', 'streamingmovies', 
    'contract', 'paperlessbilling', 'paymentmethod'
]

Data Splitting

# Split into training and test sets
df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)

Define Training Function

Important: To apply the model later, we need to return both the DictVectorizer and the model. Otherwise, the function will return None.

def train(df_train, y_train, C=1.0):
    dicts = df_train[categorical + numerical].to_dict(orient='records')

    dv = DictVectorizer(sparse=False)
    X_train = dv.fit_transform(dicts)

    model = LogisticRegression(C=C, max_iter=1000)
    model.fit(X_train, y_train)

    return dv, model

Define Prediction Function

def predict(df, dv, model):
    dicts = df[categorical + numerical].to_dict(orient='records')

    X = dv.transform(dicts)
    y_pred = model.predict_proba(X)[:, 1]

    return y_pred

K-Fold Cross Validation

C = 1.0
n_splits = 5
kfold = KFold(n_splits=n_splits, shuffle=True, random_state=1)

scores = []

for train_idx, val_idx in kfold.split(df_full_train):
    df_train = df_full_train.iloc[train_idx]
    df_val = df_full_train.iloc[val_idx]

    y_train = df_train.churn.values
    y_val = df_val.churn.values

    dv, model = train(df_train, y_train, C=C)
    y_pred = predict(df_val, dv, model)

    auc = roc_auc_score(y_val, y_pred)
    scores.append(auc)

print('C=%s %.3f +- %.3f' % (C, np.mean(scores), np.std(scores)))
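
One step worth making explicit: the loop above leaves dv and model pointing at the last validation fold. Before saving, train a final model on the full training set (reusing the train and predict functions defined earlier):

# Train the final model on all of df_full_train and check it on the held-out test set
dv, model = train(df_full_train, df_full_train.churn.values, C=1.0)
y_pred = predict(df_test, dv, model)

auc = roc_auc_score(df_test.churn.values, y_pred)
print('final model test auc: %.3f' % auc)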

Saving the Model with Pickle

We use 'wb' (Write Binary) mode to save the model. Include the DictVectorizer in your file so that when you load the model in your churn service, you can convert customer data from a dictionary into a feature matrix (which the model requires for predictions).

import pickle

output_file = 'model_C=1.0.bin'

with open(output_file, 'wb') as f_out:
    pickle.dump((dv, model), f_out)
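
Before wiring the model into a service, it's worth sanity-checking the file from a fresh Python session. A minimal sketch (a partial customer dict is enough here, since DictVectorizer fills missing features with zeros):

import pickle

# Load the (DictVectorizer, model) tuple saved above
with open('model_C=1.0.bin', 'rb') as f_in:
    dv, model = pickle.load(f_in)

# Features absent from the dict are encoded as zeros by the vectorizer
customer = {'contract': 'month-to-month', 'tenure': 1, 'monthlycharges': 90.05}
X = dv.transform([customer])
print('churn probability:', model.predict_proba(X)[0, 1])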

Creating the Churn Service

Use a Python script for this. Load the model and make sure to use the POST HTTP method since we need to send information to the web service.

predict.py:

import pickle
from flask import Flask, request, jsonify

model_file = 'model_C=1.0.bin'

# Load the model
with open(model_file, 'rb') as f_in:
    dv, model = pickle.load(f_in)

app = Flask('churn')

@app.route('/predict', methods=['POST'])
def predict():
    # Get customer data from JSON request
    customer = request.get_json()

    # Transform and predict
    X = dv.transform([customer])
    y_pred = model.predict_proba(X)[0, 1]
    churn = y_pred >= 0.5

    result = {
        'churn_probability': float(y_pred),  # Convert to native Python type
        'churn': bool(churn)  # Convert to native Python type
    }

    return jsonify(result)

if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=9696)

Running the Service

Launch your app on your local server. The --host=0.0.0.0 flag makes the server publicly available. The --debug flag auto-reloads the app when you save changes.

flask --app predict.py run --debug --host=0.0.0.0

Querying the Service

Now we'll send a POST request to our server with customer details and receive a churn prediction. Use a Jupyter notebook or separate Python script:

import requests

url = 'http://localhost:9696/predict'

# Sample test data from the test dataset
customer = {
    'customerid': '4183-myfrb',
    'gender': 'female',
    'seniorcitizen': 0,
    'partner': 'no',
    'dependents': 'no',
    'tenure': 21,
    'phoneservice': 'yes',
    'multiplelines': 'no',
    'internetservice': 'fiber_optic',
    'onlinesecurity': 'no',
    'onlinebackup': 'yes',
    'deviceprotection': 'yes',
    'techsupport': 'no',
    'streamingtv': 'no',
    'streamingmovies': 'yes',
    'contract': 'month-to-month',
    'paperlessbilling': 'yes',
    'paymentmethod': 'electronic_check',
    'monthlycharges': 90.05,
    'totalcharges': 1862.9
}

response = requests.post(url, json=customer).json()
print(response)

if response['churn']:
    print(f'Sending promo email to {customer["customerid"]}')
else:
    print(f'Not sending promo email to {customer["customerid"]}')

The server responds with a 200 OK response:

[Image: server log showing the 200 OK response]

The query was successful:

[Image: query result with churn prediction]


Docker Time!

Understanding Docker Components

DOCKERFILE: A text file (usually named Dockerfile) containing a series of instructions for building Docker images. Each instruction adds a layer, and layers are cached: building the same image twice reuses the cache, while changing a line rebuilds that instruction and every one after it. This is why the Dockerfile below copies Pipfile and Pipfile.lock and installs dependencies before copying the application code; editing predict.py then only invalidates the final layers.

IMAGE: The output of building a Dockerfile. Think of it as an executable—just like clicking an icon launches an application, you start an image to launch a container. The image encapsulates your application code and all dependencies, ensuring consistency across environments.

CONTAINER: A dynamic, running instance of a Docker image. One image can spawn many containers. On Linux, containers run as processes on the host machine. On Windows/macOS, Docker runs in a VM. Containers share the kernel but have isolated file systems—they appear like VMs but are much lighter.

Creating a Dockerfile

# Create a new Dockerfile
touch Dockerfile

# Open in your editor
code Dockerfile

Important Notes:

  • Make sure the script and app names in the ENTRYPOINT layer match your actual files (ours is predict.py, so the Gunicorn target is predict:app)
  • Gunicorn only runs on Unix-based systems; use Waitress on Windows
  • Leave a blank line between instructions for readability

Dockerfile:

FROM python:3.12-slim

RUN pip install pipenv

WORKDIR /app

# Copy dependency files
COPY ["Pipfile", "Pipfile.lock", "./"]

# --system installs into the container's system Python (no virtualenv needed);
# --deploy fails the build if Pipfile.lock is out of sync with Pipfile
RUN pipenv install --system --deploy

# Copy application files
COPY ["predict.py", "model_C=1.0.bin", "./"]

EXPOSE 9696

# Start the application with Gunicorn
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]
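
Optionally, add a .dockerignore next to the Dockerfile so the notebook, data, and git history don't bloat the build context (a suggested starting point; adjust to your repo):

.git
__pycache__/
*.ipynb
data-week-3.csv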

CMD vs ENTRYPOINT

ENTRYPOINT defines the main command that must always run—it's like the container's executable.

ENTRYPOINT ["python", "app.py"]

Running docker run myapp executes python app.py. You can pass parameters: docker run myapp --debug becomes python app.py --debug.

CMD defines default arguments or a fallback command that can be completely overridden.

CMD ["python", "app.py"]

Running docker run myapp bash overrides CMD and runs bash instead.

Combining Both:

ENTRYPOINT ["python", "app.py"]
CMD ["--port=9696"]
  • docker run myapp runs python app.py --port=9696
  • docker run myapp --debug runs python app.py --debug (overriding CMD only)

ENTRYPOINT can only be overridden with the --entrypoint flag.
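
For example, to drop into a shell instead of launching the app (using the hypothetical myapp image from above):

docker run -it --entrypoint bash myapp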

Multistage Dockerfiles

For Machine Learning applications, you often need a large environment to build/train your model but only a small runtime environment to serve predictions. Multistage Dockerfiles help create:

  • Smaller images – no unused dependencies
  • More secure – fewer libraries = less attack surface
  • Faster deployment – smaller images push/pull faster
  • Better maintainability – clean separation of concerns
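
Here is a minimal two-stage sketch of the idea (an illustrative example rather than this project's exact setup: it exports the lock file to requirements.txt in a builder stage, then copies only the installed packages into the runtime image):

FROM python:3.12-slim AS builder
RUN pip install pipenv
WORKDIR /app
COPY ["Pipfile", "Pipfile.lock", "./"]
# Resolve the lock file and install packages into an isolated prefix
RUN pipenv requirements > requirements.txt && \
    pip install --prefix=/install -r requirements.txt

FROM python:3.12-slim
WORKDIR /app
# Bring in only the installed packages, not pipenv or build tooling
COPY --from=builder /install /usr/local
COPY ["predict.py", "model_C=1.0.bin", "./"]
EXPOSE 9696
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]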

Building and Running the Container

Build the Docker image:
churn-prediction is just a name for the Docker image; you can call it literally anything. Just make sure the name you build with matches the one you pass to the docker run command.

docker build -t churn-prediction .

[Image: docker build output]

Run the container:
PS: The errors you see in the terminal of the running container are because I accidentally installed scikit-learn==1.6.1 instead of 1.5.1; otherwise the model works just fine.


docker run -it --rm -p 9696:9696 churn-prediction

Docker Commands Reference

Managing Images

# List all local images
docker image ls

# Run an image
docker run churn-prediction

Docker first looks for the image in your local cache. If it isn't found locally, it pulls from Docker Hub. You can also use a custom registry by prefixing the image name with the registry's hostname:

# Pull and run from a custom registry
# (the hostname is part of the image reference; no https:// prefix)
docker run registrydomain.com/repository-server:0.1.0

Managing Containers

List containers:

# List running containers
docker container ls
# or
docker ps

# List all containers (including stopped)
docker container ls --all

Start, stop, and remove containers:

# Stop a container
docker container stop <container-id>

# Restart a container
docker container restart <container-id>

# Remove a stopped container
docker container rm <container-id>

# Kill a running container
docker kill <container-id>

Cleanup Commands

# Remove unused containers, networks, and images
docker system prune

# Remove everything including unused images
docker system prune -a

# Also remove volumes (deletes data!)
docker system prune -a --volumes

Accessing Containers

For debugging purposes:

# Access a container's shell
docker exec -it <container-id> bash
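
For example, you can check the scikit-learn version inside the running container (handy for the version-mismatch warning earlier):

docker exec -it <container-id> python -c "import sklearn; print(sklearn.__version__)"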


Deploying to AWS Elastic Beanstalk

Now that we have our containerized application, let's deploy it to AWS Elastic Beanstalk using the EB CLI.

Prerequisites

Before deploying, ensure you have:

  • An AWS account
  • AWS credentials configured
  • A credit card to verify and activate your account
  • Your Docker image working locally

Other alternatives include Render, Railway, Fly.io, and Heroku. The deployment process follows much the same logic on each; check their documentation for the equivalent commands. For now, let's use AWS Elastic Beanstalk.

Installing the EB CLI

First, install the Elastic Beanstalk CLI:

# Install EB CLI using pipenv (recommended for this project)
pipenv install awsebcli --dev

# Or install globally using pip
pip install awsebcli

Verify the installation:

eb --version

Configuring AWS Credentials

If you haven't configured your AWS credentials yet:

# Configure AWS CLI
aws configure

You'll be prompted to enter:

  • AWS Access Key ID
  • AWS Secret Access Key
  • Default region (e.g., us-east-1)
  • Default output format (e.g., json)

Initializing Elastic Beanstalk

Navigate to your project directory and initialize EB:

# Initialize Elastic Beanstalk
eb init -p docker -r us-east-1 churn-prediction

Flags explained:

  • -p docker: Specifies the platform (Docker)
  • -r us-east-1: AWS region (choose your preferred region)
  • churn-prediction: Your application name

You'll be prompted with questions:

  1. Select your region
  2. Enter application name (or accept default)
  3. Choose whether to use CodeCommit (typically select "no")
  4. Set up SSH for your instances (recommended for debugging)

Creating an Environment

Create an environment to run your application:

# Create an environment named 'churn-prediction-env'
eb create churn-prediction-env

This process takes several minutes as AWS:

  • Creates an EC2 instance
  • Sets up load balancers
  • Configures security groups
  • Deploys your Docker container

You'll see real-time logs of the deployment process.

Monitoring Deployment Status

Check the status of your environment:

# Check environment status
eb status

# View recent events
eb events

# Follow logs in real-time
eb logs --stream

Testing Your Deployed Application

Once deployment is complete, get your application's URL:

# Open your application in a browser
eb open

Or manually test the endpoint:

import requests

# Replace with your actual EB URL
url = 'http://churn-prediction-env.us-east-1.elasticbeanstalk.com/predict'

customer = {
    'customerid': '4183-myfrb',
    'gender': 'female',
    'seniorcitizen': 0,
    'partner': 'no',
    'dependents': 'no',
    'tenure': 21,
    'phoneservice': 'yes',
    'multiplelines': 'no',
    'internetservice': 'fiber_optic',
    'onlinesecurity': 'no',
    'onlinebackup': 'yes',
    'deviceprotection': 'yes',
    'techsupport': 'no',
    'streamingtv': 'no',
    'streamingmovies': 'yes',
    'contract': 'month-to-month',
    'paperlessbilling': 'yes',
    'paymentmethod': 'electronic_check',
    'monthlycharges': 90.05,
    'totalcharges': 1862.9
}

response = requests.post(url, json=customer).json()
print(response)

Updating Your Application

When you make changes to your code:

# Deploy updates
eb deploy

# Monitor deployment
eb status

Environment Configuration

You can modify environment variables and settings:

# Set environment variables
eb setenv FLASK_ENV=production MODEL_VERSION=1.0

# View current configuration
eb config
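
If you configure settings this way, the service can read them at runtime. A sketch of how predict.py might pick up the MODEL_VERSION variable set above (the variable itself is just an example):

import os

# Use the env var when present, falling back to the default version
model_version = os.environ.get('MODEL_VERSION', '1.0')
model_file = f'model_C={model_version}.bin'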

Scaling Your Application

Scale your application based on traffic:

# Set the number of running instances to 2
eb scale 2

# Auto-scaling (e.g., minimum 1, maximum 4 instances) is configured
# through the AWS Console or eb config rather than eb scale

Monitoring and Logs

Access logs and monitoring:

# Download logs
eb logs

# Stream logs in real-time
eb logs --stream

# View health status
eb health

Troubleshooting Common Issues

Issue: Deployment fails

# Check detailed logs
eb logs

# SSH into the instance for debugging
eb ssh

Issue: Health status is degraded

  • Check if the application is responding on the correct port (9696)
  • Verify the Docker container is running
  • Check environment variables
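
A common cause of a degraded status: the load balancer health check hits GET / by default, while our app only defines POST /predict, so the check sees 404s. A minimal fix is a lightweight health endpoint in predict.py (then point the health check at /ping):

@app.route('/ping', methods=['GET'])
def ping():
    # Lightweight endpoint for load balancer health checks
    return 'pong', 200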

Issue: Connection timeout

  • Ensure security groups allow inbound traffic on port 80/443
  • Verify the load balancer health check settings

Cost Management

Important: Elastic Beanstalk environments incur costs. To avoid unnecessary charges:

# Terminate the environment when not in use
eb terminate churn-prediction-env

# Note: the EB CLI has no stop/pause command; to bring the service
# back later, recreate the environment with eb create

Configuration Files

For more control, create an .ebextensions directory with configuration files. (Note: the WSGIPath setting below applies to Elastic Beanstalk's Python platform; on the Docker platform used in this tutorial, the container's ENTRYPOINT determines how the app is served.)

.ebextensions/01_flask.config:

option_settings:
  aws:elasticbeanstalk:application:environment:
    PYTHONPATH: "/var/app/current:$PYTHONPATH"
  aws:elasticbeanstalk:container:python:
    WSGIPath: predict:app

Best Practices

  1. Use environment variables for sensitive data (API keys, database credentials)
  2. Set up health checks to ensure your application is responding correctly
  3. Enable logging to CloudWatch for better monitoring
  4. Use HTTPS by configuring SSL certificates
  5. Implement auto-scaling based on your traffic patterns
  6. Tag your resources for better cost tracking

Useful EB CLI Commands Summary

# Initialize EB in your project
eb init

# Create a new environment
eb create <environment-name>

# Deploy application updates
eb deploy

# Open application in browser
eb open

# Check environment status
eb status

# View logs
eb logs

# SSH into instance
eb ssh

# Terminate environment
eb terminate

# List all environments
eb list

# Set environment variables
eb setenv KEY=VALUE

# Scale application
eb scale <number-of-instances>

Conclusion

You've successfully:

  • Trained a machine learning model
  • Packaged it with Pickle
  • Created a Flask web service
  • Containerized the application with Docker
  • Deployed it to AWS Elastic Beanstalk

Your churn prediction model is now production-ready and accessible via a public URL. The containerized approach ensures consistency across environments, and AWS Elastic Beanstalk handles scaling, monitoring, and infrastructure management automatically.

