We'll start with the questions on your minds right now.
What is Logistic Regression?
- Logistic Regression is a Machine Learning classification algorithm that is used to predict the probability of a categorical dependent variable.
- In logistic regression, the dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).
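Under the hood, logistic regression passes a weighted sum of the features through the sigmoid (logistic) function, which squashes any real number into the (0, 1) range so the output can be read as a probability of class 1. A minimal sketch:

```python
import numpy as np

def sigmoid(z):
    """Map any real number into (0, 1), interpretable as a probability."""
    return 1 / (1 + np.exp(-z))

# Large positive inputs approach 1, large negative inputs approach 0.
print(sigmoid(0))   # 0.5 — the decision boundary
print(sigmoid(4))   # close to 1 → predicted class 1
print(sigmoid(-4))  # close to 0 → predicted class 0
```

If the predicted probability is above 0.5, the model assigns class 1; otherwise it assigns class 0.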
The takeaway from the figure (omitted here) is that in logistic regression, your data is classified into one of two classes: 0 or 1.
If you've been following the series, know that this one is special, because today you're gonna do the feature extraction yourself.
The question that's probably on your mind if you haven't been following the series:
What is Feature Extraction?
Feature extraction is a process of dimensionality reduction by which an initial set of raw data is reduced to more manageable groups for processing.
In other terms, it is the act of selecting useful features from a dataset and dumping the rest.
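In pandas terms, that "selecting and dumping" is just keeping the useful columns of a DataFrame. A tiny sketch with hypothetical column names (not the heart dataset's):

```python
import pandas as pd

# A toy frame standing in for a raw dataset (column names are made up).
df = pd.DataFrame({
    'age':  [52, 43, 61],
    'chol': [212, 250, 180],
    'name': ['a', 'b', 'c'],  # an identifier — carries no predictive signal
})

# Feature extraction here: keep the useful columns, dump the rest.
features = df[['age', 'chol']]
print(features.columns.tolist())  # ['age', 'chol']
```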
Click here to download the dataset we're gonna be using today. Normally, once you click the link it starts downloading, but as I said, this article is different. Since you're doing the feature extraction yourself, you'll have to know which features you're gonna select. This means you'll have to study the attribute information yourself.
We're gonna make a model that can predict whether or not someone has heart disease.
We're gonna start coding now.
```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
```
```python
df = pd.read_csv('heart.csv')
df.head()
```
This is where you do your research and check which features are important.
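One quick, rough way to rank candidate features is to look at their correlation with the label; features with near-zero correlation are candidates to drop. The sketch below uses stand-in random data (column names `age`, `chol`, `target` are assumptions) — in practice you'd run it on the frame loaded from `heart.csv`:

```python
import numpy as np
import pandas as pd

# Stand-in data; in practice: df = pd.read_csv('heart.csv')
rng = np.random.default_rng(1)
df = pd.DataFrame({
    'age':    rng.integers(30, 70, 100),
    'chol':   rng.integers(150, 300, 100),
    'target': rng.integers(0, 2, 100),
})

# Correlation of every column with the label, strongest first.
print(df.corr()['target'].sort_values(ascending=False))
```

Correlation only captures linear relationships, so treat it as a starting point, not a verdict.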
When a large amount of data is at hand, a set of samples can be set aside to evaluate the final model. The "training" data set is the general term for the samples used to create the model, while the “test” or “validation” data set is used to qualify performance.
```python
# data holds the feature columns you selected; labels holds the target column.
train_data, validation_data, train_labels, validation_labels = train_test_split(
    data, labels, train_size=0.8, test_size=0.2, random_state=1)
```
`train_size` is how big or small you want your training set to be; `test_size` works the same way for the test set.
`random_state` is basically used so that the split is reproduced the same way every time the code is run.
```python
model = LogisticRegression()
model.fit(train_data, train_labels)
print(model.score(validation_data, validation_labels))
```
The score isn't bad, but it isn't great either — if it's low, go back and try a different set of features.
Now it's time to make a prediction using the features you've picked.
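Prediction means passing the model a row with the same feature columns, in the same order, as the training data. A self-contained sketch with made-up training data (the three features — age, resting blood pressure, cholesterol — are assumptions, not necessarily the ones you picked):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in training data: [age, resting BP, cholesterol] → has disease (1) or not (0).
X = np.array([[40, 120, 180], [65, 150, 260], [50, 130, 200], [70, 160, 280]])
y = np.array([0, 1, 0, 1])

model = LogisticRegression()
model.fit(X, y)

# One new, hypothetical patient — same feature order as the training data.
sample = [[63, 145, 233]]
print(model.predict(sample))        # predicted class: 1 (disease) or 0 (no disease)
print(model.predict_proba(sample))  # [[P(class 0), P(class 1)]]
```

`predict` gives you the hard class label, while `predict_proba` gives the underlying probabilities, which is often more useful in a medical setting.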
You can visit Kaggle to find more datasets that you can perform Logistic Regression on.
Feel free to ask questions in the comments.
GOOD LUCK 👍