Jabastin .A

Building a Logistic Regression Model with Python & Running the Project in Docker: A Hands-On Guide

Building a Logistic Regression Model with Python: A Hands-On Guide

Logistic regression is one of the most commonly used machine learning algorithms for classification problems. In this post, we will walk through the steps to implement a simple logistic regression model on the famous Iris dataset, visualize some data characteristics, and save the trained model for future use.

Download this Dataset


Step 1: Loading the Dataset

First, we load the Iris dataset, which is stored in a CSV file. The dataset contains 150 rows and six columns: an Id column, the four measurements (Sepal Length, Sepal Width, Petal Length, and Petal Width), and the Species label. Each row represents one flower sample.

from pandas import read_csv
filename = "Iris.csv"
data = read_csv(filename)

Step 2: Displaying Dataset Information

After loading the dataset, it's essential to understand its structure. We can check the dataset's shape and preview the first 20 rows to get a feel for the data.

print("Shape of the dataset:", data.shape)
print("First 20 rows:\n", data.head(20))

Step 3: Visualizing Data with Histograms

Before building a model, it's crucial to visualize the data. Histograms give us a quick overview of each feature's distribution. We save the plots to a file instead of displaying them, since the script will later run non-interactively inside Docker.

from matplotlib import pyplot
data.hist()
pyplot.savefig("histograms.png")
pyplot.close()

Step 4: Density Plot for Data Distribution

Next, we plot density plots to better understand the distribution of each feature. This can help us detect skewness or identify any patterns that might be useful during training.

data.plot(kind='density', subplots=True, layout=(3,3), sharex=False)
pyplot.savefig("density_plots.png")
pyplot.close()

Step 5: Preparing the Data for Training

Now, we convert the dataset into a NumPy array and separate the features (the measurements) from the labels (species). This will allow us to feed the data into the logistic regression model.

array = data.values
X = array[:, 1:5]  # Features: the four Sepal/Petal measurements (column 0 is the Id, so we skip it)
Y = array[:, 5]    # Target: Species

Step 6: Splitting the Data

We split the data into training (67%) and testing (33%) sets to evaluate our model's performance. The train_test_split function from sklearn handles this.

from sklearn.model_selection import train_test_split
test_size = 0.33
seed = 7
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

Step 7: Building and Training the Logistic Regression Model

We now create a logistic regression model and train it using the training data. We set max_iter=200 to ensure the model has enough iterations to converge.

from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=200)
model.fit(X_train, Y_train)

Step 8: Evaluating Model Accuracy

Once the model is trained, we evaluate its performance on the testing set. We calculate the accuracy to see how well the model generalizes.

result = model.score(X_test, Y_test)
print("Accuracy: {:.2f}%".format(result * 100))
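If you want more detail than a single accuracy figure, scikit-learn's classification_report shows per-class precision, recall, and F1. This is an optional extra, not part of the original script:

from sklearn.metrics import classification_report

# Per-class precision, recall, and F1 on the held-out test set
predictions = model.predict(X_test)
print(classification_report(Y_test, predictions))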

Step 9: Saving the Trained Model

Finally, we save the trained model using joblib so we can load it later for predictions without retraining.

import joblib
joblib.dump(model, "logistic_model.pkl")
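Later, the saved model can be loaded back with joblib and used for predictions without retraining. Here is a minimal sketch, assuming the logistic_model.pkl file saved above; the sample measurements are made up purely for illustration:

import joblib

# Load the model saved above
loaded_model = joblib.load("logistic_model.pkl")

# Predict the species for one made-up sample:
# [Sepal Length, Sepal Width, Petal Length, Petal Width]
sample = [[5.1, 3.5, 1.4, 0.2]]
print("Predicted species:", loaded_model.predict(sample)[0])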

Running the Project in Docker

Prerequisites

  • Download and install Docker Desktop
  • Make sure Docker Desktop is running before you build the image

Create the Project Structure

Create a folder named iris_logistic_regression/ and place the following files inside it. model.py holds the full training script from Steps 1–9, and Iris.csv is the dataset it reads:

iris_logistic_regression/
│
├── Iris.csv
├── model.py
├── requirements.txt
└── Dockerfile
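The post doesn't show the contents of requirements.txt. A minimal version covering the libraries the script imports might look like this (joblib is already pulled in as a scikit-learn dependency, but listing it explicitly does no harm):

pandas
scikit-learn
matplotlib
joblib

If you prefer, the Dockerfile could install from this file with pip install -r requirements.txt instead of listing the packages inline.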

Dockerfile code

FROM python:3.9-slim

WORKDIR /app

# Install the libraries the training script needs
RUN pip install pandas scikit-learn matplotlib

# Copy the dataset and the script into the image
COPY . .

# Run the training script (model.py from the project structure above)
CMD ["python", "model.py"]

Build the Docker Image

docker build -t iamjabastin/24mcr037-ml:latest .

Check the image

docker images

Push the image

docker push iamjabastin/24mcr037-ml
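Note that the push only succeeds if you are logged in to the Docker Hub account that owns the repository (iamjabastin here). If you haven't authenticated yet, log in first:

docker login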

Open Docker Desktop

After the build, the image should also show up in Docker Desktop's Images list.

Run the Docker Image
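The exact command isn't shown in the post (it appears only as a screenshot), but running the image from a terminal would look something like this, using the tag from the build step:

docker run --rm iamjabastin/24mcr037-ml:latest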

The container runs the training script and prints its output: the dataset shape, the first 20 rows, and the model accuracy.

