Hi friends!
Today we’re diving into something quietly revolutionary: how AI, when personalized, can drastically improve our digital experiences. As developers, many of us use AI tools daily. Often, they feel like magic — you ask a question, and AI responds with context-aware suggestions. Behind the scenes, this magic is powered by Large Language Models (LLMs).
These models are trained on massive datasets and can learn from feedback. But there's a catch: what’s relevant for one person might not be for another. Take email filtering, for example. One user’s spam is another’s important newsletter. General spam filters can’t always adapt to individual preferences.
Wouldn’t it be great to train your own model — one that evolves with your feedback, works offline, and respects your privacy? Thanks to ML.NET, Microsoft’s machine learning library for .NET, you can. In this article, we'll build a personalized spam detector using C# and ML.NET that learns from your emails and improves over time.
Step 1: Create a New Project
Create a simple Console Application:
dotnet new console -n SpamDetector
Step 2: Add Required Packages
Add the following NuGet packages to your project:
- Microsoft.ML
- Microsoft.ML.FastTree
dotnet add package Microsoft.ML
dotnet add package Microsoft.ML.FastTree
Step 3: Prepare the Dataset
Add a TSV (Tab-Separated Values) file with the training data. TSV is preferred over CSV because commas often appear in email bodies and would break the parsing. The full dataset can be found in the source code.
Example: email_dataset.tsv
Sender Subject Body IsSpam
reports@company.com Monthly Report Attached is the report for the current month False
meetings@calendar.com Meeting Tomorrow Don't forget about the meeting tomorrow at 10:00 False
hr@company.com Documents Sending the necessary documents False
pm@projecthub.com Project Ready The project is completed and ready for review False
vacations@company.com Vacation Submitting a vacation request False
win@lottery-prize.com Win a Million! You won a million dollars! Click here! True
loans@fastcash-now.com Online Loan Get a loan without documents in 5 minutes True
deals@superdiscounts.com 90% Discount Incredible 90% discount on all products! Hurry! True
help@urgent-finance.org Urgent Help Urgent financial help without refusals True
homejobs@easyprofit.net Earn at Home Make $5000 at home without leaving the house True
promo@freeiphones.com Free iPhone Get a free iPhone right now True
info@national-lottery.ua Lottery Congratulations! You won $1,000,000 in the lottery True
cash@quickmoney.co Quick Money Quick money without checks and certificates True
jobs@dream-career.net Dream Job Dream job with a salary of $100,000 True
ads@miraclepills.org Miracle Pills Lose 20 kg in a week with our pills True
admin@company.com Meeting Tomorrow at 14:00 there is a meeting in the conference room False
support@company.com Tech Support Your request to tech support has been processed False
sales@onlinestore.com Order Your order #12345 is ready for pickup False
schedule@university.edu Schedule New class schedule for next week False
notices@subscription.com Subscription Your subscription expires in 3 days False
Add to Project File
Optionally, you can reference the created TSV file in the .csproj file so that it is copied to the output directory.
<ItemGroup>
<None Update="email_dataset.tsv">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</None>
</ItemGroup>
Step 4: Define Data Models
Once you have added the dataset, create a model class that mirrors its columns. The LoadColumn attributes tell ML.NET which column index each property is read from, so the order must match the file.
Input Data Model
public class EmailData
{
[LoadColumn(0)] public string Sender { get; set; } = string.Empty;
[LoadColumn(1)] public string Subject { get; set; } = string.Empty;
[LoadColumn(2)] public string Body { get; set; } = string.Empty;
[LoadColumn(3)] public bool IsSpam { get; set; }
}
Another model is needed to show prediction information.
Prediction Output Model
public class SpamPrediction
{
[ColumnName("PredictedLabel")] public bool IsSpam { get; set; }
[ColumnName("Probability")] public float Probability { get; set; }
[ColumnName("Score")] public float Score { get; set; }
}
Step 5: Define File Paths
You need to define the paths for the input dataset, the trained model, and the user's feedback (corrections) file.
static class Program
{
private const string DataPath = "email_dataset.tsv";
private const string ModelPath = "spam_model.zip";
private const string FeedbackPath = "feedback.tsv";
static void Main() {}
}
Step 6: Create ML Context
This is the entry point for using ML.NET — similar to a DbContext in EF Core.
static void Main()
{
Console.WriteLine("=== System for checking emails for spam ===\n");
var mlContext = new MLContext();
}
Step 7: Build the Pipeline
The pipeline transforms the text in each column into numerical features. These features are then combined into a single feature vector. At the end of the pipeline, we specify the FastTree algorithm as our trainer. FastTree uses this feature vector along with the target label column, IsSpam, which is not included in the feature vector itself.
While there are other available algorithms, FastTree is a well-suited choice for this task. It is a gradient boosting machine (GBM) algorithm designed for binary classification, regression, and ranking problems. In this context, it is used for binary classification — specifically, to distinguish between spam and non-spam emails.
FastTree is optimized for speed and performs well on tabular data. However, it has an important limitation: it requires a dataset with at least 1,000 samples to achieve acceptable accuracy. Smaller datasets may result in poor model performance.
static void Main()
{
Console.WriteLine("=== System for checking emails for spam ===\n");
var mlContext = new MLContext();
var pipeline = BuildPipeline(mlContext);
}
private static IEstimator<ITransformer> BuildPipeline(MLContext mlContext)
{
    return mlContext.Transforms.Text
        // Turn each text column into a numerical feature vector
        .FeaturizeText("SenderFeatures", nameof(EmailData.Sender))
        .Append(mlContext.Transforms.Text.FeaturizeText("SubjectFeatures", nameof(EmailData.Subject)))
        .Append(mlContext.Transforms.Text.FeaturizeText("BodyFeatures", nameof(EmailData.Body)))
        // Combine the three vectors into a single "Features" column and normalize it
        .Append(mlContext.Transforms.Concatenate("Features", "SenderFeatures", "SubjectFeatures", "BodyFeatures"))
        .Append(mlContext.Transforms.NormalizeLpNorm("Features"))
        // Train a FastTree binary classifier using IsSpam as the label
        .Append(mlContext.BinaryClassification.Trainers.FastTree(
            labelColumnName: nameof(EmailData.IsSpam), featureColumnName: "Features"));
}
Step 8: Train the Model
Step 8.1: Load or Train
Check for a saved model. If one exists, load it; otherwise, load the dataset and train a new model. We should avoid retraining unless necessary.
if (File.Exists(ModelPath))
...
Step 8.2: Load Datasets
If no trained model is available, we need to load the text data into a DataView using the correct separator. Failing to do so will result in an error during training.
In addition to the main email dataset, we also load the user's feedback dataset. This is important for scenarios where the trained model has been deleted and needs to be retrained — ensuring that any user-provided corrections are preserved and included in the new model.
var allData = LoadAllData(mlContext);
...
Step 8.3: Split the Data
The test set ratio determines how much of the data is reserved for evaluation.
In general, a smaller test fraction allows for better training, as more data is used to train the model. However, having a test set is essential for measuring the model’s performance objectively. Without it, the model might simply "memorize" the training data, giving a false impression of accuracy.
The test set helps validate how well the model generalizes to unseen data and is used to compute key performance metrics. For small datasets, it's acceptable to allocate less than 20% to testing. Otherwise, a test split of 20–30% is typically recommended.
var split = mlContext.Data.TrainTestSplit(allDataView, testFraction: 0.2);
...
Step 8.4: Train and Evaluate
After data splitting, we use the training set for training the model and the test set for evaluation.
Console.WriteLine("Training model...");
model = pipeline.Fit(split.TrainSet);
Console.WriteLine("Evaluating model...");
var predictions = model.Transform(split.TestSet);
...
Step 8.5: Generate Metrics
After training, we evaluate the model's performance using several key metrics based on its predictions. These include Accuracy, AUC, and F1 Score:
- Accuracy measures the percentage of correct predictions made by the model.
- AUC (Area Under the ROC Curve) indicates how well the model distinguishes between spam and non-spam emails. A value of 1.0 represents perfect classification.
- F1 Score is the harmonic mean of precision and recall, providing a balanced measure of the model’s ability to correctly identify spam while minimizing false positives and false negatives.
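For example, with a hypothetical precision of 0.90 and recall of 0.80:
F1 = 2 × (0.90 × 0.80) / (0.90 + 0.80) ≈ 0.85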
var metrics = mlContext.BinaryClassification.Evaluate(predictions, labelColumnName: nameof(EmailData.IsSpam));
...
Step 8.6: Save the Model
Once you have generated the model and metrics, you should save the trained model for further use.
mlContext.Model.Save(model, allDataView.Schema, ModelPath);
...
Optionally, you can copy the model to the project directory. If you did everything properly, you'll see a zip archive in your project.
CopyFileToProjectDirectory(ModelPath);
...
The complete code:
static void Main()
{
Console.WriteLine("=== System for checking emails for spam ===\n");
var mlContext = new MLContext();
var pipeline = BuildPipeline(mlContext);
ITransformer model = LoadOrTrainModel(mlContext, pipeline);
}
private static ITransformer LoadOrTrainModel(MLContext mlContext, IEstimator<ITransformer> pipeline)
{
if (File.Exists(ModelPath))
{
Console.WriteLine("Loading saved model...");
return mlContext.Model.Load(ModelPath, out _);
}
Console.WriteLine("The model is not found. Training the new model...");
var allData = LoadAllData(mlContext);
return TrainEvaluateSaveModel(mlContext, pipeline, allData, saveFeedback: false);
}
private static List<EmailData> LoadAllData(MLContext mlContext)
{
IDataView originalData = mlContext.Data.LoadFromTextFile<EmailData>(
DataPath, separatorChar: '\t', hasHeader: true);
var allExamples = mlContext.Data
.CreateEnumerable<EmailData>(originalData, reuseRowObject: false)
.ToList();
if (File.Exists(FeedbackPath))
{
Console.WriteLine("Found feedback data. Including it in training...");
IDataView feedbackData = mlContext.Data.LoadFromTextFile<EmailData>(
FeedbackPath, separatorChar: '\t', hasHeader: false);
var feedbackList = mlContext.Data
.CreateEnumerable<EmailData>(feedbackData, reuseRowObject: false)
.ToList();
allExamples.AddRange(feedbackList);
}
return allExamples;
}
private static ITransformer TrainEvaluateSaveModel(
MLContext mlContext,
IEstimator<ITransformer> pipeline,
List<EmailData> allData,
bool saveFeedback)
{
var allDataView = mlContext.Data.LoadFromEnumerable(allData);
var split = mlContext.Data.TrainTestSplit(allDataView, testFraction: 0.2);
Console.WriteLine("Training model...");
var model = pipeline.Fit(split.TrainSet);
Console.WriteLine("Evaluating model...");
var predictions = model.Transform(split.TestSet);
var metrics = mlContext.BinaryClassification.Evaluate(predictions, labelColumnName: nameof(EmailData.IsSpam));
Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
Console.WriteLine($"AUC: {metrics.AreaUnderRocCurve:P2}");
Console.WriteLine($"F1 Score: {metrics.F1Score:P2}\n");
mlContext.Model.Save(model, allDataView.Schema, ModelPath);
CopyFileToProjectDirectory(ModelPath);
if (saveFeedback)
CopyFileToProjectDirectory(FeedbackPath);
Console.WriteLine($"The model saved to {ModelPath}\n");
return model;
}
private static void CopyFileToProjectDirectory(string fileName)
{
    // The app runs from bin/<Configuration>/<TargetFramework>,
    // so going three levels up leads back to the project directory.
    string currentDir = Directory.GetCurrentDirectory();
    string projectDir = Path.GetFullPath(Path.Combine(currentDir, "..", "..", ".."));
    string sourcePath = Path.Combine(currentDir, fileName);
    string destPath = Path.Combine(projectDir, fileName);
    File.Copy(sourcePath, destPath, overwrite: true);
}
Step 9: Make Predictions
Create a PredictionEngine to test individual emails. A quick usage example follows the snippet below.
static void Main()
{
Console.WriteLine("=== System for checking emails for spam ===\n");
var mlContext = new MLContext();
var pipeline = BuildPipeline(mlContext);
ITransformer model = LoadOrTrainModel(mlContext, pipeline);
var predictionEngine = mlContext.Model.CreatePredictionEngine<EmailData, SpamPrediction>(model);
}
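To see how the engine is used, here is a quick illustrative snippet that classifies one of the spam rows from the training dataset (this call is shown only for demonstration and is not part of the final program):
// Classify a single email with the prediction engine
var sample = new EmailData
{
    Sender = "promo@freeiphones.com",
    Subject = "Free iPhone",
    Body = "Get a free iPhone right now"
};
SpamPrediction result = predictionEngine.Predict(sample);
Console.WriteLine($"Spam: {result.IsSpam}, probability: {result.Probability:P1}");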
Step 10: Interactive Input & Feedback Loop
This code lets you enter email data and get a SPAM / NOT SPAM prediction. A minimal sketch of RunInteractiveCheck follows the snippet below.
static void Main()
{
Console.WriteLine("=== System for checking emails for spam ===\n");
var mlContext = new MLContext();
var pipeline = BuildPipeline(mlContext);
ITransformer model = LoadOrTrainModel(mlContext, pipeline);
var predictionEngine = mlContext.Model.CreatePredictionEngine<EmailData, SpamPrediction>(model);
RunInteractiveCheck(mlContext, pipeline, ref model, ref predictionEngine);
Console.WriteLine("The app completed successfully. Goodbye!");
}
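RunInteractiveCheck is where everything comes together; the full version is available in the source code. A minimal sketch of what it might look like, built from the helpers shown in this article (PromptInput and SaveFeedback are introduced in Steps 10.1 and 10.2), is:
private static void RunInteractiveCheck(
    MLContext mlContext,
    IEstimator<ITransformer> pipeline,
    ref ITransformer model,
    ref PredictionEngine<EmailData, SpamPrediction> predictionEngine)
{
    Console.WriteLine("Enter email data to check it, or 'q' to quit.\n");
    while (true)
    {
        // Read the email fields; PromptInput returns null when the user types "q"
        var sender = PromptInput("Sender: ");
        if (sender == null) break;
        var subject = PromptInput("Subject: ");
        if (subject == null) break;
        var body = PromptInput("Body: ");
        if (body == null) break;

        // Run the model on the entered email and show the result
        var email = new EmailData { Sender = sender, Subject = subject, Body = body };
        var prediction = predictionEngine.Predict(email);
        Console.WriteLine($"Result: {(prediction.IsSpam ? "SPAM" : "NOT SPAM")} " +
                          $"(probability: {prediction.Probability:P1})\n");

        // Ask whether the user agrees with the prediction
        var feedback = PromptInput("Do you agree with the result? (y/n): ", toLower: true);
        if (feedback == null) break;
        if (feedback == "n")
        {
            // The user disagrees, so the opposite label is the correct one
            bool userLabel = !prediction.IsSpam;
            SaveFeedback(sender, subject, body, userLabel);

            // Retrain on the original dataset plus the accumulated feedback
            var allData = LoadAllData(mlContext);
            model = TrainEvaluateSaveModel(mlContext, pipeline, allData, saveFeedback: true);
            predictionEngine = mlContext.Model.CreatePredictionEngine<EmailData, SpamPrediction>(model);
        }
    }
}
Passing model and predictionEngine by ref lets the loop swap in the retrained model without restarting the app.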
Step 10.1: Correction
From time to time, you may encounter cases where you disagree with the model’s prediction. For example, an important email might be incorrectly marked as spam, and you manually reclassify it as not spam.
We’ve implemented a similar mechanism: users can correct incorrect predictions, and the system will use this feedback to retrain the model and save the updated version. This helps improve accuracy over time by learning from real-world corrections.
var feedback = PromptInput("Do you agree with the result? (y/n): ", toLower: true);
if (feedback == "n")
...
private static string? PromptInput(string message, bool toLower = false)
{
Console.Write(message);
var input = Console.ReadLine();
if (input?.ToLower() == "q") return null;
return toLower ? input?.ToLower() : input;
}
Step 10.2: Save the Feedback
Once you have made the correction, save this data so it can be used to retrain the model later (see the sketch after the snippet below).
SaveFeedback(sender, subject, body, userLabel);
...
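The exact implementation is in the source code; a minimal sketch of SaveFeedback, assuming the feedback file uses the same column order as EmailData and has no header (matching how LoadAllData reads it), could be:
private static void SaveFeedback(string sender, string subject, string body, bool isSpam)
{
    // Append one tab-separated row in the same column order as EmailData:
    // Sender, Subject, Body, IsSpam (no header, as LoadAllData expects)
    var line = string.Join("\t", sender, subject, body, isSpam);
    File.AppendAllText(FeedbackPath, line + Environment.NewLine);
}
Because the file is only ever appended to, corrections accumulate over time and survive retraining, which is exactly what LoadAllData relies on in Step 8.2.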
Step 11: Testing
Let’s run the application and test it.
After launching and exiting the app, you’ll notice that a new model was trained during the first run. If you check the project directory, you’ll find a .zip file containing the trained model.
Now, let's run the app again.
As you can see, we loaded a pre-trained model.
Next, let’s input some data. The machine learning model predicts that the email is not spam.
However, if a stranger sends you an email offering to sell you an elephant, it's clear the prediction is incorrect — and you'd likely disagree with it.
In this case, we need to correct the prediction. Type "n" to indicate that the email is spam.
As you can see, the model has been updated accordingly. You can also find the user feedback dataset saved in your project directory.
The feedback has now been stored as a new training example for this email.
Let's repeat the process and enter the same data again. This time, the email is detected as SPAM.
Conclusion
The ML.NET library is a powerful tool for training custom machine learning models using your own datasets. You can also find a wide variety of high-quality datasets on platforms like Kaggle.com. Unlike popular AI services such as OpenAI or Claude, which often raise privacy concerns, ML.NET keeps all your data securely on your own server.
However, ML.NET does have some challenges. It has a relatively steep learning curve, requiring a solid understanding of machine learning algorithms. You also need to source or create appropriate datasets, and the quality of your trained model heavily depends on the quality of the data you use.
I hope you found this guide helpful and that it encourages you to implement similar solutions in your own projects.
For your convenience, the complete source code is available on my GitHub repository for reference and further exploration.