
Oluseye Jeremiah


Building A Heart Disease Prediction Model Using Machine Learning

In the dynamic world of healthcare, we’re witnessing a groundbreaking shift towards using technology to better understand and tackle life-threatening conditions. One such leap forward is the integration of machine learning (ML) in predicting heart disease, a pervasive threat that claims lives globally. In this article, we embark on a journey to explore the development of an innovative ML model, aiming to redefine how we approach and safeguard cardiovascular health.

Heart disease, with its various complexities affecting the heart and blood vessels, calls for a proactive approach to healthcare. While traditional risk assessments have been helpful, the rise of ML holds the promise of heightened precision and accuracy. This article dives into the process of creating a predictive model that can identify potential heart issues before symptoms emerge, leveraging algorithms and vast datasets.

Come along as we unravel the intricate steps involved in building a machine-learning model for heart disease prediction. We’ll explore the pivotal roles of data collection, feature engineering, model selection, and validation strategies. This article not only sheds light on the technical side of ML but also emphasizes the profound impact these innovations can have on reshaping the landscape of preventive healthcare.

Picture a future where data-driven insights empower healthcare professionals to intervene early, potentially saving lives and fostering a healthier society. As we delve into the world of predictive analytics for heart disease, let’s envision a human-centric approach that prioritizes well-being and brings us one step closer to a healthier tomorrow.

For our development environment, we'll be using Deepnote.

Deepnote is a cloud-based collaborative workspace for data science and analytics teams. Think of it as a supercharged Jupyter notebook with built-in features for:

Seamless collaboration: Multiple users can work on notebooks together in real time, see each other’s changes, and even chat within the platform.
Powerful data analysis: It combines code blocks, SQL queries, and visualization tools, enabling teams to explore and analyze data efficiently.
Easy sharing and documentation: Notebooks can be easily shared with colleagues and stakeholders, and version control ensures everyone’s on the same page.
Beautiful dashboards and reports: Create interactive dashboards and reports to present findings clearly and compellingly.
Integrated tools and extensions: Connect to popular data sources, libraries, and cloud platforms directly within Deepnote.

Overall, Deepnote streamlines data science workflows, fosters collaboration, and empowers teams to turn data into actionable insights. It’s a popular choice for organizations looking to boost their data science productivity and impact.

Getting Started

Step 1: The first step involves importing all the required dependencies.

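A minimal sketch of the imports this workflow relies on, assuming NumPy, pandas, and scikit-learn, might look like this:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score
```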

Step 2: This step involves loading the dataset to begin data preprocessing.

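Assuming the data is stored as a CSV file (the file name below is a placeholder), loading it into a pandas DataFrame might look like this:

```python
# Load the dataset into a pandas DataFrame.
# 'diabetes.csv' is a placeholder for wherever the data actually lives.
diabetes_dataset = pd.read_csv('diabetes.csv')
```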

Step 3: To better understand the dataset, we view its first 10 rows.


We also use the .describe() method to get summary insights from the dataset.

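In a notebook, these two inspection steps might look like the following sketch (each line would typically sit in its own cell so the output is displayed):

```python
# Preview the first 10 rows of the dataset
diabetes_dataset.head(10)

# Summary statistics (count, mean, std, min, quartiles, max) for every column
diabetes_dataset.describe()
```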

Step 4: This code groups the diabetes dataset by the ‘Outcome’ column and then calculates the mean of each group. The ‘Outcome’ column usually represents the categories or classes; in this case, ‘0’ for non-diabetic and ‘1’ for diabetic. By using the mean() function, it computes the average value of each feature or column for each outcome. Therefore, the resulting output includes the average of every column of the dataset for each outcome (0 and 1).

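A sketch of that grouping step:

```python
# Mean of every feature for each class in 'Outcome' (0 = non-diabetic, 1 = diabetic)
diabetes_dataset.groupby('Outcome').mean()
```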

Step 5: This code is used to separate the features and target from the ‘diabetes_dataset’ data frame. The drop() function is used to remove the ‘Outcome’ column from the data frame. The resultant data frame, which contains all columns other than 'Outcome', is assigned to ‘X’, which will serve as a feature matrix for the machine learning model. ‘Y’ is assigned the ‘Outcome’ column from the 'diabetes_dataset', which acts as a target variable. This will be used to train the machine-learning model. The ‘Outcome’ column typically contains the label or result that the model will attempt to predict.

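A sketch of the feature/target separation described above:

```python
# Feature matrix: every column except the target
X = diabetes_dataset.drop(columns='Outcome')

# Target variable: the label the model will learn to predict
Y = diabetes_dataset['Outcome']
```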

Step 6: This line of code splits the dataset into a training set and a testing set using the ‘train_test_split()’ function from the sklearn library. The ‘test_size’ parameter is set to 0.2, which means 20% of the data will be used for testing and the remaining 80% will be used for training the model. The ‘stratify’ parameter is set to Y, which means the train-test split is made so that the proportion of classes in the resulting samples matches the proportion of values in the ‘Outcome’ column. The ‘random_state’ parameter is set to 2, which ensures that the splits you generate are reproducible.

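A sketch of the split with the parameters described above:

```python
# 80/20 train-test split, stratified on the target so both sets keep the
# same proportion of 0s and 1s; random_state=2 makes the split reproducible
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, stratify=Y, random_state=2
)
```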

Table: pregnancies by age

From the table above, we can see the pregnancies by age and the effect this has on heart disease.

Step 7: This line of code is printing the shape of three different data frames: ‘X’, ‘X_train’, and ‘X_test’. The shape of a data frame is a tuple that contains the number of rows and columns in the data frame. ‘X’ is the data frame that contains the entire feature set; ‘X_train’ contains the features for the training set; and ‘X_test’ contains the features for the test set. The output would be three tuples, each representing the number of rows and columns for the respective data frame.

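A sketch of that check:

```python
# Rows and columns of the full, training, and test feature sets
print(X.shape, X_train.shape, X_test.shape)
```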

Step 8: The next step involves instantiating a Support Vector Machine (SVM) classifier using the ‘svm’ module from the ‘sklearn’ Python library. The SVM classifier’s kernel is set to ‘linear’. In essence, this piece of code creates a linear SVM model that can be trained with the ‘fit’ function on a labeled dataset so it can classify new, previously unseen data into pre-defined categories. The performance of the trained classifier can then be evaluated using different metrics applicable to classification problems.

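A sketch of the classifier instantiation:

```python
# Support Vector Machine classifier with a linear kernel
classifier = svm.SVC(kernel='linear')
```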

Moving on, the piece of code below is responsible for training the Support Vector Machine (SVM) classifier on the training data. The 'classifier.fit(X_train, Y_train)' method is called, where ‘X_train’ is the set of input features for the training data and ‘Y_train’ contains the corresponding output labels. The model learns from this data, and the learned model can then be used to make predictions on unseen data.

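A sketch of the training call:

```python
# Train the SVM on the training features and their labels
classifier.fit(X_train, Y_train)
```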

Step 9: We then calculate the accuracy of the predictive model on the training data. The ‘classifier.predict(X_train)’ function generates predictions for the training data based on the trained model, and the results are stored in ‘X_train_prediction’. The ‘accuracy_score()’ function from the sklearn library is then utilized to compare these predictions with the actual labels (‘Y_train’) to compute the accuracy of the model. The calculated accuracy score is stored in ‘training_data_accuracy’.

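A sketch of the accuracy check on the training data:

```python
# Predict the labels of the training set
X_train_prediction = classifier.predict(X_train)

# Fraction of training samples classified correctly
training_data_accuracy = accuracy_score(X_train_prediction, Y_train)
print('Accuracy on training data:', training_data_accuracy)
```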

We can see that the accuracy of our model on the training data is acceptable. We can then go on to build the prediction model.

We can see that our prediction model works, and the patient here is diabetic.

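A minimal sketch of what that prediction step might look like; the input values below are purely illustrative, and the feature order is assumed to match the dataset’s columns:

```python
# One patient's feature values, in the same order as the dataset's columns
# (e.g. Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI,
# DiabetesPedigreeFunction, Age). These numbers are hypothetical.
input_data = (5, 166, 72, 19, 175, 25.8, 0.587, 51)

# Reshape into a single-row 2D array, since the model expects a batch of samples
input_array = np.asarray(input_data).reshape(1, -1)

prediction = classifier.predict(input_array)

if prediction[0] == 1:
    print('The patient is diabetic')
else:
    print('The patient is not diabetic')
```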

Conclusion

In our pursuit to predict heart disease through machine learning, the constructed model stands as a beacon of hope, promising a transformative approach to preventive healthcare. The model’s accuracy, validated through meticulous calibration and training, reflects its efficacy in discerning subtle patterns for early detection. Metrics such as precision, recall, and F1 score can further affirm its reliability in identifying potential cardiovascular issues. However, acknowledging the dynamic nature of healthcare data, ongoing refinement is crucial for the model’s adaptability. Beyond numerical accuracy, the model symbolizes our collective commitment to a future where predictive analytics becomes a transformative force, guiding us toward early intervention and shaping a healthier tomorrow.
