DEV Community

Dvir Segal

Posted on • Edited on • Originally published at Medium

ML.NET: Heart disease prediction

[Heart Attack Stroke Disease — Free image on Pixabay](https://pixabay.com/illustrations/heart-attack-stroke-heart-disease-3177360/)


Microsoft announced ML.NET last May. As an avid user of the .NET framework with experience in machine learning, I knew I’d have to give it a try, knowing full well that Python’s various frameworks (such as scikit-learn) rule this domain.

ML.NET is a free, cross-platform, open-source machine learning framework made explicitly for .NET developers. The preview release includes learners for binary classification, multi-class classification, and regression tasks. Additional ML tasks such as recommendation, clustering, anomaly detection, ranking, and deep learning architectures have been added.

In this blog post, my purpose is to play with ML.NET’s capabilities and demonstrate its ease of use. My goal is to predict heart disease for a given patient based on an open-source dataset.

ML.NET logo by Wikimedia Commons

Getting started

Assuming Visual Studio 2017 is installed on your system, create a new Console App (.NET Core) project:

Afterward, go to the ‘Tools’ menu and choose ‘Manage NuGet Packages for Solution…’:

There, browse for ‘Microsoft.ML’ and choose the latest stable version (at the time of writing these lines, the stable version is 0.11.0) to install it for the current solution. As a side note, if you prefer using the package manager console, just run the following command:

Install-Package Microsoft.ML -Version 0.11.0

Now that all prerequisites are in place, we can start doing some machine learning magic.

The Heart Disease Dataset

For the binary classification task, I used the Heart Disease UCI dataset from Kaggle. It contains 14 columns and 303 records.

Heart Disease Dataset Columns (screenshot is taken from Kaggle)

Heart Disease UCI

I’ve manually split the CSV file into two files: a smaller one for the test data and a larger one for the training data. Below is the test data CSV as an example.

Test Data CSV
age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
48,1,1,130,245,0,0,180,0,0.2,1,0,2,1
45,1,0,104,208,0,0,148,1,3,1,0,2,1
53,0,0,130,264,0,0,143,0,0.4,1,0,2,1
39,1,2,140,321,0,0,182,0,0,2,0,2,1
62,0,0,124,209,0,1,163,0,0,2,0,2,1
54,1,2,120,258,0,0,147,0,0.4,1,0,3,1
51,1,2,94,227,0,1,154,1,0,2,1,3,1
29,1,1,130,204,0,0,202,0,0,2,0,2,1
51,1,0,140,261,0,0,186,1,0,2,0,2,1
43,0,2,122,213,0,1,165,0,0.2,1,0,2,1
55,0,1,135,250,0,0,161,0,1.4,1,0,2,1
51,1,2,125,245,1,0,166,0,2.4,1,0,2,1
59,1,1,140,221,0,1,164,1,0,2,0,2,1
52,1,1,128,205,1,1,184,0,0,2,0,2,1
48,1,0,130,256,1,0,150,1,0,2,2,3,0
63,0,0,150,407,0,0,154,0,4,1,3,3,0
55,1,0,140,217,0,1,111,1,5.6,0,0,3,0
65,1,3,138,282,1,0,174,0,1.4,1,1,2,0
56,0,0,200,288,1,0,133,1,4,0,2,3,0
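
The split described above was done by hand, but it could also be scripted. The following is a minimal sketch in plain C# (no ML.NET needed), not the procedure used for this post. The class name CsvSplitter, the testFraction and seed parameters, and the file paths are all hypothetical. Note that a random split like this one does not try to preserve the ratio of positive to negative records.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: splits a CSV file with a header row into two files.
public static class CsvSplitter
{
    // Shuffles the data rows (header excluded) and splits them in two.
    // testFraction and seed are illustrative parameters, not values from the post.
    public static (List<string> Train, List<string> Test) Split(
        IReadOnlyList<string> rows, double testFraction, int seed)
    {
        var random = new Random(seed);
        var shuffled = rows.ToList(); // copy, so the caller's list is untouched

        // Fisher-Yates shuffle.
        for (int i = shuffled.Count - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            (shuffled[i], shuffled[j]) = (shuffled[j], shuffled[i]);
        }

        int testCount = (int)Math.Round(shuffled.Count * testFraction);
        return (shuffled.Skip(testCount).ToList(), shuffled.Take(testCount).ToList());
    }

    // Reads one CSV file and writes two, each starting with the original header.
    public static void SplitFile(string inputPath, string trainPath, string testPath,
        double testFraction, int seed)
    {
        string[] lines = File.ReadAllLines(inputPath);
        string header = lines[0];
        var rows = lines.Skip(1).Where(l => !string.IsNullOrWhiteSpace(l)).ToList();

        var (train, test) = Split(rows, testFraction, seed);
        File.WriteAllLines(trainPath, new[] { header }.Concat(train));
        File.WriteAllLines(testPath, new[] { header }.Concat(test));
    }
}
```

A call such as CsvSplitter.SplitFile("heart.csv", "train.csv", "test.csv", 0.1, 1) would then produce the two files; the file names and the 10% fraction are placeholders.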

ML.NET provides mapping attributes to model the dataset structure: each CSV column is mapped to a property through a LoadColumn attribute that carries the column’s zero-based index, as follows:

using Microsoft.ML.Data;

// Maps each CSV column (by zero-based index) to a property.
public class PatientData
{
    [LoadColumn(0)]
    public float Age { get; set; }
    [LoadColumn(1)]
    public float Sex { get; set; }
    [LoadColumn(2)]
    public float Cp { get; set; }
    [LoadColumn(3)]
    public float TrestBps { get; set; }
    [LoadColumn(4)]
    public float Chol { get; set; }
    [LoadColumn(5)]
    public float Fbs { get; set; }
    [LoadColumn(6)]
    public float RestEcg { get; set; }
    [LoadColumn(7)]
    public float Thalac { get; set; }
    [LoadColumn(8)]
    public float Exang { get; set; }
    [LoadColumn(9)]
    public float OldPeak { get; set; }
    [LoadColumn(10)]
    public float Slope { get; set; }
    [LoadColumn(11)]
    public float Ca { get; set; }
    [LoadColumn(12)]
    public float Thal { get; set; }
    // The "target" column, used as the label to predict.
    [LoadColumn(13)]
    public bool Label { get; set; }
}
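
To make the index mapping concrete, here is a rough hand-written equivalent for three of the columns. It is only an illustration of what "column index to property" means, not how ML.NET implements its loader. PatientRow and RowParser are hypothetical names, and the sketch assumes a comma-separated line.

```csharp
using System;
using System.Globalization;

// Hypothetical record holding just three of the 14 columns, to keep the sketch short.
public class PatientRow
{
    public float Age { get; set; }   // column 0
    public float Chol { get; set; }  // column 4
    public bool Label { get; set; }  // column 13 ("target")
}

public static class RowParser
{
    // Picks values out of one comma-separated line by zero-based column index.
    public static PatientRow Parse(string csvLine)
    {
        string[] cells = csvLine.Split(',');
        return new PatientRow
        {
            Age = float.Parse(cells[0], CultureInfo.InvariantCulture),
            Chol = float.Parse(cells[4], CultureInfo.InvariantCulture),
            // The target column holds 0 or 1; treat 1 as true.
            Label = cells[13].Trim() == "1"
        };
    }
}
```

For the first test row written with comma separators, RowParser.Parse("48,1,1,130,245,0,0,180,0,0.2,1,0,2,1") yields Age 48, Chol 245 and Label true.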

Training the model

Training a model on a given dataset requires defining a context as an entry point. As the ML.NET API documentation puts it:

The MLContext is a starting point for all ML.NET operations. It is instantiated by the user, provides mechanisms for logging and entry points for training, prediction, model operations, etc.

Afterward, the training and test datasets are loaded using the context, based on the mapped structure.

IDataView trainingDataView = mlContext.Data.LoadFromTextFile<PatientData>(path: GetAbsolutePath(TrainingDataRelativePath), hasHeader: true, separatorChar: ',');
IDataView testDataView = mlContext.Data.LoadFromTextFile<PatientData>(path: GetAbsolutePath(TestDataRelativePath), hasHeader: true, separatorChar: ',');

The next step is to transform the data and add a learning algorithm. Only numbers can be processed during model training, so text values would first have to be mapped to numeric ones; in this dataset every column is already numeric, so the pipeline only concatenates the input columns into a single "Features" column. The problem at hand is a binary classification (two classes: the patient has or doesn’t have the disease). The selected trainer, FastTree, is based on (gradient-boosted) decision trees, which can be interpreted by humans as rules, usually perform well on tabular data, and make few assumptions about the data.

var pipeline = mlContext.Transforms.Concatenate("Features", "Age", "Sex", "Cp", "TrestBps", "Chol", "Fbs",
        "RestEcg", "Thalac", "Exang", "OldPeak", "Slope", "Ca", "Thal")
    .Append(mlContext.BinaryClassification.Trainers.FastTree());

Finally, the model is trained by calling the Fit method:

Console.WriteLine("=============== Training the model ===============");
trainedModel = pipeline.Fit(trainingDataView);
Console.WriteLine("");
Console.WriteLine("");
Console.WriteLine("=============== Finish the train model. Push Enter ===============");
Console.WriteLine("");
Console.WriteLine("");

We’ve trained our model with a few steps, as simple as that.

BTW, the framework provides capabilities for saving your model. According to the tutorial it is recommended to save it as a ZIP file.

Testing

Given the trained model, predictions can be made.

For that purpose, a sample class containing a list of patients is given to the model as input.

internal class HeartDiseaseSampleData
{
    internal static readonly IList<PatientData> Data = new List<PatientData>()
    {
        new PatientData()
        {
            Age = 36.0f,
            Sex = 1.0f,
            Cp = 4.0f,
            TrestBps = 145.0f,
            Chol = 210.0f,
            Fbs = 0.0f,
            RestEcg = 2.0f,
            Thalac = 148.0f,
            Exang = 1.0f,
            OldPeak = 1.9f,
            Slope = 2.0f,
            Ca = 1.0f,
            Thal = 7.0f,
        },
        new PatientData()
        {
            Age = 95.0f,
            Sex = 1.0f,
            Cp = 4.0f,
            TrestBps = 145.0f,
            Chol = 210.0f,
            Fbs = 0.0f,
            RestEcg = 2.0f,
            Thalac = 148.0f,
            Exang = 1.0f,
            OldPeak = 1.9f,
            Slope = 2.0f,
            Ca = 1.0f,
            Thal = 7.0f,
        },
        new PatientData()
        {
            Age = 46.0f,
            Sex = 1.0f,
            Cp = 4.0f,
            TrestBps = 135.0f,
            Chol = 192.0f,
            Fbs = 0.0f,
            RestEcg = 0.0f,
            Thalac = 148.0f,
            Exang = 0.0f,
            OldPeak = 0.3f,
            Slope = 2.0f,
            Ca = 0.0f,
            Thal = 6.0f,
        },
        new PatientData()
        {
            Age = 45.0f,
            Sex = 0.0f,
            Cp = 1.0f,
            TrestBps = 140.0f,
            Chol = 221.0f,
            Fbs = 1.0f,
            RestEcg = 1.0f,
            Thalac = 150.0f,
            Exang = 0.0f,
            OldPeak = 2.3f,
            Slope = 3.0f,
            Ca = 0.0f,
            Thal = 6.0f,
        },
        new PatientData()
        {
            Age = 88.0f,
            Sex = 0.0f,
            Cp = 1.0f,
            TrestBps = 140.0f,
            Chol = 221.0f,
            Fbs = 1.0f,
            RestEcg = 1.0f,
            Thalac = 150.0f,
            Exang = 0.0f,
            OldPeak = 2.3f,
            Slope = 3.0f,
            Ca = 0.0f,
            Thal = 6.0f,
        },
    };
}

The prediction result structure must be defined as well:

public class HeartDiseasePrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction;
}

In the gist below, I load the model from disk, create a prediction engine based on the result structure (defined above), and use the engine to predict whether heart disease is present for each patient in the list.

// Requires: using System; using System.IO; using Microsoft.ML;
private void Test(MLContext mlContext)
{
    ITransformer trainedModel;
    // Load the model from disk
    using (var stream = new FileStream(GetAbsolutePath(ModelRelativePath), FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        trainedModel = mlContext.Model.Load(stream);
    }
    // Create a prediction engine based on the result structure (HeartDiseasePrediction)
    var predictionEngine = trainedModel.CreatePredictionEngine<PatientData, Structure.HeartDiseasePrediction>(mlContext);
    // Go over each patient and predict for disease
    foreach (var heartData in HeartDiseaseSampleData.Data)
    {
        var prediction = predictionEngine.Predict(heartData);
        Console.WriteLine($"=============== Single Prediction ===============");
        Console.WriteLine($"Age: {heartData.Age} ");
        Console.WriteLine($"Sex: {heartData.Sex} ");
        Console.WriteLine($"Cp: {heartData.Cp} ");
        Console.WriteLine($"TrestBps: {heartData.TrestBps} ");
        Console.WriteLine($"Chol: {heartData.Chol} ");
        Console.WriteLine($"Fbs: {heartData.Fbs} ");
        Console.WriteLine($"RestEcg: {heartData.RestEcg} ");
        Console.WriteLine($"Thalac: {heartData.Thalac} ");
        Console.WriteLine($"Exang: {heartData.Exang} ");
        Console.WriteLine($"OldPeak: {heartData.OldPeak} ");
        Console.WriteLine($"Slope: {heartData.Slope} ");
        Console.WriteLine($"Ca: {heartData.Ca} ");
        Console.WriteLine($"Thal: {heartData.Thal} ");
        Console.WriteLine($"Prediction Value: {prediction.Prediction} ");
        Console.WriteLine($"Prediction: {(prediction.Prediction ? "A disease could be present" : "Not present disease")} ");
        Console.WriteLine($"==================================================");
        Console.WriteLine("");
        Console.WriteLine("");
    }
    Console.ReadKey();
}

This prints the following prediction output:

=============== Single Prediction ===============
Age: 36
Sex: 1
Cp: 4
TrestBps: 145
Chol: 210
Fbs: 0
RestEcg: 2
Thalac: 148
Exang: 1
OldPeak: 1.9
Slope: 2
Ca: 1
Thal: 7
Prediction Value: True
Prediction: A disease could be present
==================================================
=============== Single Prediction ===============
Age: 95
Sex: 1
Cp: 4
TrestBps: 145
Chol: 210
Fbs: 0
RestEcg: 2
Thalac: 148
Exang: 1
OldPeak: 1.9
Slope: 2
Ca: 1
Thal: 7
Prediction Value: False
Prediction: Not present disease
==================================================
=============== Single Prediction ===============
Age: 46
Sex: 1
Cp: 4
TrestBps: 135
Chol: 192
Fbs: 0
RestEcg: 0
Thalac: 148
Exang: 0
OldPeak: 0.3
Slope: 2
Ca: 0
Thal: 6
Prediction Value: False
Prediction: Not present disease
==================================================
=============== Single Prediction ===============
Age: 45
Sex: 0
Cp: 1
TrestBps: 140
Chol: 221
Fbs: 1
RestEcg: 1
Thalac: 150
Exang: 0
OldPeak: 2.3
Slope: 3
Ca: 0
Thal: 6
Prediction Value: True
Prediction: A disease could be present
==================================================
=============== Single Prediction ===============
Age: 88
Sex: 0
Cp: 1
TrestBps: 140
Chol: 221
Fbs: 1
RestEcg: 1
Thalac: 150
Exang: 0
OldPeak: 2.3
Slope: 3
Ca: 0
Thal: 6
Prediction Value: True
Prediction: A disease could be present
==================================================

In terms of accuracy, the model reaches 78.95% on the 19-record test set. Below are more evaluation metrics, as generated by ML.NET’s Evaluate method:

************************************************************
* Accuracy: 78.95%
* Auc: 91.43%
* Auprc: 96.67%
* F1Score: 84.62%
* LogLoss: .92
* LogLossReduction: -10.94
* PositivePrecision: .92
* PositiveRecall: .79
* NegativePrecision: .57
* NegativeRecall: 80.00%
************************************************************
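
Most of the metrics above follow from the four confusion-matrix counts. As a minimal sketch in plain C# (no ML.NET required), the helper below recomputes them. The class name BinaryMetrics and its methods are hypothetical. The counts used in the note after the code (11 true positives, 1 false positive, 4 true negatives, 3 false negatives) are not stated anywhere in this post; they are inferred from the reported percentages and the 19 test rows, so treat them as an assumption.

```csharp
using System;

// Hypothetical helper: standard binary classification metrics
// computed from confusion-matrix counts.
public static class BinaryMetrics
{
    // Share of all predictions that were correct.
    public static double Accuracy(int tp, int fp, int tn, int fn) =>
        (double)(tp + tn) / (tp + fp + tn + fn);

    // Of all records predicted positive, the share that really were positive.
    public static double PositivePrecision(int tp, int fp) => (double)tp / (tp + fp);

    // Of all truly positive records, the share that was found.
    public static double PositiveRecall(int tp, int fn) => (double)tp / (tp + fn);

    // Of all records predicted negative, the share that really were negative.
    public static double NegativePrecision(int tn, int fn) => (double)tn / (tn + fn);

    // Of all truly negative records, the share that was found.
    public static double NegativeRecall(int tn, int fp) => (double)tn / (tn + fp);

    // Harmonic mean of positive precision and positive recall.
    public static double F1Score(int tp, int fp, int fn) =>
        2.0 * tp / (2.0 * tp + fp + fn);
}
```

With the assumed counts, Accuracy(11, 1, 4, 3) is 15/19 ≈ 0.7895 and F1Score(11, 1, 3) is 22/26 ≈ 0.8462, in line with the reported 78.95% and 84.62%. Likewise, PositivePrecision(11, 1) ≈ 0.92, PositiveRecall(11, 3) ≈ 0.79, NegativePrecision(4, 3) ≈ 0.57 and NegativeRecall(4, 1) = 0.80.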

Overall, ML.NET has it all: it is flexible, robust, and backed by a big company that provides the engineering vision behind it. I recommend you try it too; I’ll certainly use it again soon.

This blog post is based on the Heart Disease Classification coding sample, with some modifications; more ML.NET samples can be found at:

Machine Learning Samples: samples for ML.NET, an open source and cross-platform machine learning framework
