Domain Knowledge
In this project, we train our model on medical information such as each patient's blood glucose level and insulin level, together with a label stating whether that patient is diabetic or non-diabetic.
We feed this labeled data to our support vector machine (SVM) model. Conceptually, the model plots each patient as a point in a feature space.
Once the data is plotted, the model tries to find a hyperplane that separates the two groups of points, diabetic and non-diabetic. The image shows such a hyperplane.
When we then feed new data to the model, it places that data point on one side of the hyperplane or the other. That is how it predicts whether the person is diabetic or non-diabetic.
The medical information used here includes BMI, blood glucose level, insulin level, and other patient measurements.
Requirements for this project
→ Python
→ Support Vector Machine
First, let's look at the Support Vector Machine (SVM).
SVM is one of the most popular supervised learning algorithms and can be used for both classification and regression problems.
In machine learning practice, however, it is used primarily for classification.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put the new data point in the correct category in the future. This best decision boundary is called a hyperplane.
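To make the idea of a hyperplane more concrete, here is a minimal sketch on made-up 2D points (the data and variable names are hypothetical, not taken from the diabetes dataset). With a linear kernel, scikit-learn exposes the fitted hyperplane as `coef_` (the weight vector w) and `intercept_` (the bias b), and the sign of w · x + b tells us on which side a new point lies.

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two features per point, two well-separated groups.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel="linear")
clf.fit(X, y)

# For a linear kernel, the hyperplane is the set of points where w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w, "b =", b)

# The sign of w . x + b decides which side of the hyperplane a new point falls on.
new_points = np.array([[2.0, 2.0], [6.0, 7.0]])
scores = new_points @ w + b
print("scores:", scores)
print("predicted classes:", clf.predict(new_points))
```

A negative score puts the point in class 0 and a positive score in class 1, which is the same decision `clf.predict` makes.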
Workflow
Diabetes Data → Collect data from hospitals or patients.
Data Preprocessing → The raw data is not directly suitable for a machine learning model, so we first analyze and preprocess it. The dataset contains many medical measurements with very different ranges, and we want all of them on the same scale. We therefore standardize the data so that every feature lies in a comparable range.
All of this happens in the data preprocessing step.
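As a rough illustration of standardization, the sketch below applies `StandardScaler` to a few made-up rows (the numbers and the column meaning of glucose, insulin, and BMI are assumptions for the example). After scaling, each column has a mean of about 0 and a standard deviation of about 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values: columns are glucose, insulin, BMI (very different ranges).
X = np.array([[148.0,   0.0, 33.6],
              [ 85.0,  94.0, 26.6],
              [183.0, 168.0, 23.3],
              [ 89.0,  88.0, 28.1]])

scaler = StandardScaler()
# Subtract each column's mean and divide by each column's standard deviation.
X_std = scaler.fit_transform(X)

print(X_std.mean(axis=0))  # approximately 0 for every column
print(X_std.std(axis=0))   # approximately 1 for every column
```

A common practice is to fit the scaler on the training data only and then reuse the same fitted scaler to transform the test data and any new patient data.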
Train-test-split → We split the data into a training set and a test set. We train our machine learning model on the training data and then compute its accuracy score on the test data, which tells us how well the model performs on data it has not seen.
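Here is a small sketch of the split on synthetic data (the array sizes and the 35% share of diabetic labels are assumptions for illustration). Passing `stratify=y` asks `train_test_split` to keep roughly the same proportion of diabetic and non-diabetic cases in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 patients, 3 features, 35 of them labeled diabetic (1).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([1] * 35 + [0] * 65)

# Hold out 20% of the rows for testing, keeping the class ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

print(X_train.shape, X_test.shape)
print("share of diabetic labels in train:", y_train.mean())
print("share of diabetic labels in test:", y_test.mean())
```

Fixing `random_state` makes the split reproducible, so the same rows end up in the test set on every run.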
Support Vector Machine Classifier → Once the data has been split into training and test sets, we feed the training data to our SVM classifier. The model learns to classify whether a patient is diabetic or non-diabetic.
Once training is done, we have a trained SVM classifier. When we give it new data, it can predict whether the patient is diabetic or non-diabetic.
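The last step, predicting for a new patient, might look like the sketch below. The data is synthetic and the two columns (glucose and BMI) as well as the `new_patient` values are hypothetical. The point is that a single new record must be passed as a 2D array and transformed with the same scaler that was fitted on the training data.

```python
import numpy as np
from sklearn import svm
from sklearn.preprocessing import StandardScaler

# Hypothetical, synthetic data: columns are glucose and BMI.
rng = np.random.default_rng(42)
non_diabetic = rng.normal(loc=[95.0, 24.0], scale=[8.0, 2.0], size=(50, 2))
diabetic = rng.normal(loc=[165.0, 35.0], scale=[8.0, 2.0], size=(50, 2))
X = np.vstack([non_diabetic, diabetic])
y = np.array([0] * 50 + [1] * 50)  # 0 = non-diabetic, 1 = diabetic (assumed encoding)

# Fit the scaler and the classifier on the training data.
scaler = StandardScaler().fit(X)
classifier = svm.SVC(kernel="linear")
classifier.fit(scaler.transform(X), y)

# One new patient: shape (1, n_features), scaled with the already fitted scaler.
new_patient = np.array([[170.0, 36.0]])
prediction = classifier.predict(scaler.transform(new_patient))
print("diabetic" if prediction[0] == 1 else "non-diabetic")
```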
Code Summary:
# Importing dependencies
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler  # standardizes data to a common range
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
# Loading the diabetes dataset into a pandas DataFrame
df = pd.read_csv("diabetes.csv")
# Features: all columns except the label
X = df.drop(columns="Outcome")
# Labels
y = df["Outcome"]
# Train-test split: hold out 20% for testing, keep the class ratio with stratify
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=2)
# Model training with a linear-kernel SVM classifier
classifier = svm.SVC(kernel="linear")
classifier.fit(X_train, y_train)
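The summary above stops after training. As a hedged sketch of the evaluation step described in the workflow, the block below computes the accuracy score on both the training and the test data. Since `diabetes.csv` is not bundled here, it uses `make_classification` as a synthetic stand-in (an assumption for this example, not the author's data), so the printed numbers say nothing about the real dataset.

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for diabetes.csv (assumption: 8 features, binary outcome).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    n_clusters_per_class=1, random_state=2
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

classifier = svm.SVC(kernel="linear")
classifier.fit(X_train, y_train)

# Accuracy = fraction of predictions that match the true labels.
train_accuracy = accuracy_score(y_train, classifier.predict(X_train))
test_accuracy = accuracy_score(y_test, classifier.predict(X_test))
print("train accuracy:", train_accuracy)
print("test accuracy:", test_accuracy)
```

If the training accuracy is much higher than the test accuracy, that usually suggests the model is overfitting the training data.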
Github Link for Jupyter File and Dataset : Click Here