Ever wondered how AI can help you detect spam messages? In this tutorial, we’ll build SpamShield v3.1, a Python app that classifies SMS messages as SPAM or HAM using machine learning. Even if you’re a beginner, you’ll be able to follow along!
💻 Project on GitHub: SpamShield v3.1
Step 1: Setting Up the Project
First, create a new folder for your project and install the required Python libraries. Open your terminal and run:
pip install pandas scikit-learn joblib ttkbootstrap
pip install tkinterdnd2 # Optional: Enables drag & drop in the GUI
pandas: Handles CSV/TXT data.
scikit-learn: Provides machine learning tools.
joblib: Saves and loads trained models.
ttkbootstrap: Makes your GUI look modern.
tkinterdnd2: Adds drag-and-drop support (optional).
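Before moving on, it can help to confirm that the packages are actually importable. The sketch below is one way to check with the standard library; the helper name `missing_packages` is made up for this example. Note that scikit-learn is imported as `sklearn`, so the import names can differ from the pip package names.

```python
import importlib.util

def missing_packages(import_names):
    """Return the names from import_names that cannot be found."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]

# Import names for the required libraries (not the pip package names)
required = ["pandas", "sklearn", "joblib", "ttkbootstrap"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```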
Step 2: Download the SMS Spam Dataset Automatically
We’ll use the SMSSpamCollection dataset from the UCI repository. The script downloads it automatically if it’s missing.
import urllib.request
import zipfile
import os
import sys

def resource_path(file_name):
    # Resolve paths relative to the script, or to the PyInstaller bundle when frozen
    base_path = getattr(sys, "_MEIPASS", os.path.dirname(os.path.abspath(__file__)))
    return os.path.join(base_path, file_name)

def download_dataset():
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip"
    zip_path = resource_path("smsspamcollection.zip")
    urllib.request.urlretrieve(url, zip_path)
    # Unpack next to the script, then delete the archive
    with zipfile.ZipFile(zip_path, 'r') as z:
        z.extractall(resource_path(""))
    os.remove(zip_path)
    print("[INFO] Dataset downloaded successfully!")
This ensures that even if the dataset is missing, the app will fetch it automatically.
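`download_dataset()` unpacks everything in the archive. If only the data file is wanted, `zipfile` can extract a single member instead. The following is a minimal sketch of that idea: the helper `extract_member` is hypothetical, and the archive built here is a small stand-in so the example runs offline, not the real download.

```python
import os
import tempfile
import zipfile

def extract_member(zip_path, member, dest_dir):
    """Extract one named file from the archive and return its path."""
    with zipfile.ZipFile(zip_path) as z:
        if member not in z.namelist():
            raise FileNotFoundError(f"{member} not found in {zip_path}")
        return z.extract(member, dest_dir)

# Build a small stand-in archive with two members
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as z:
    z.writestr("SMSSpamCollection", "ham\tSee you at lunch\nspam\tWin a free prize now\n")
    z.writestr("readme", "placeholder notes")

# Only the requested member is written to disk
data_path = extract_member(demo_zip, "SMSSpamCollection", work_dir)
print(os.path.basename(data_path))
```

Checking `namelist()` first gives a clear error if the archive layout is not what the script expects.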
Step 3: Train the Machine Learning Model
We’ll use Naive Bayes with TF-IDF vectorization to classify SMS messages.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import joblib

def train_sms_model():
    ds_path = resource_path("SMSSpamCollection")
    if not os.path.exists(ds_path):
        download_dataset()
    # The file is tab-separated with no header row: label first, then the message text
    df = pd.read_csv(ds_path, sep="\t", header=None, names=["label", "text"])
    df["label_num"] = df["label"].map({"ham": 0, "spam": 1})
    # Hold out 20% of the messages for testing
    X_train, X_test, y_train, y_test = train_test_split(df["text"], df["label_num"], test_size=0.2, random_state=42)
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"[INFO] Model trained — Test Accuracy: {accuracy_score(y_test, y_pred)*100:.2f}%")
    # Save the fitted pipeline (vectorizer + classifier) for later runs
    joblib.dump(model, resource_path("sms_spam_model.pkl"))
    return model
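Accuracy alone can look better than the model really is when ham messages heavily outnumber spam. Precision and recall for the spam class give a fuller picture. The sketch below computes them in plain Python on made-up labels (the lists are illustrative, not results from this model); scikit-learn's `precision_score` and `recall_score` in `sklearn.metrics` could be used on `y_test` and `y_pred` in the same way.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels: 8 ham (0) and 2 spam (1); the classifier misses one spam
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here accuracy is 90% even though half of the spam slipped through, which is exactly what recall exposes.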
Tip: The TF-IDF vectorizer converts each message into a vector of numeric word weights, and the Naive Bayes classifier uses those weights to predict whether the message is spam.
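To make the tip more concrete, here is a small pure-Python sketch of the TF-IDF idea. It assumes the smoothed IDF formula and L2 normalization that scikit-learn documents as the `TfidfVectorizer` defaults; it is an illustration only, not the library's implementation, and the three sample messages are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep words of two or more characters
    return re.findall(r"\b\w\w+\b", text.lower())

def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency: rarer terms get larger values
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        raw = {t: count * idf[t] for t, count in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        vectors.append({t: w / norm for t, w in raw.items()})
    return vectors

docs = ["free prize call now", "call me later", "see you later"]
vecs = tfidf(docs)
for term, weight in sorted(vecs[0].items()):
    print(f"{term}: {weight:.3f}")
```

In the first message, "free" appears in only one document while "call" appears in two, so "free" ends up with the larger weight.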
Step 4: Load the Model
We’ll create a helper function that loads the saved model if it already exists and trains a new one otherwise.
def load_model():
    model_path = resource_path("sms_spam_model.pkl")
    if os.path.exists(model_path):
        return joblib.load(model_path)
    return train_sms_model()
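One thing `load_model()` does not handle is a model file that exists but cannot be read, for example after an interrupted save. The sketch below shows a load-or-rebuild pattern that covers the empty or truncated case. It uses the standard-library `pickle` module as a stand-in for joblib so that it runs without extra packages, and `load_or_build` and `build_stand_in` are hypothetical names.

```python
import os
import pickle
import tempfile

def load_or_build(path, build):
    """Load a pickled object from path; build and save it if loading fails."""
    if os.path.exists(path):
        try:
            with open(path, "rb") as f:
                return pickle.load(f)
        except (pickle.UnpicklingError, EOFError):
            pass  # unreadable file: fall through and rebuild
    obj = build()
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return obj

build_calls = []

def build_stand_in():
    # Stand-in for train_sms_model(); records how often it runs
    build_calls.append(1)
    return {"kind": "stand-in model"}

model_file = os.path.join(tempfile.mkdtemp(), "model.pkl")
first = load_or_build(model_file, build_stand_in)   # file missing: builds
second = load_or_build(model_file, build_stand_in)  # file present: loads
print(len(build_calls))
```

Only load model files you created yourself, since unpickling untrusted data can execute arbitrary code.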
Step 5: Create a Worker to Process SMS Files
For batch classification, we’ll build a SpamWorker class that reads CSV/TXT files and labels messages.
import csv

class SpamWorker:
    def __init__(self, files, model):
        self.files = files
        self.model = model

    def run(self):
        for path in self.files:
            with open(path, newline="", encoding="utf-8", errors="ignore") as f:
                reader = csv.reader(f)
                # Only the first column of each non-empty row is classified
                texts = [row[0].strip() for row in reader if row]
            if not texts:
                continue  # nothing to classify in this file
            labels_num = self.model.predict(texts)
            labels = ["SPAM" if n == 1 else "HAM" for n in labels_num]
            for t, lbl in zip(texts, labels):
                print(f"{lbl} | {t}")
This prints each SMS with its predicted label. Later, we’ll connect it to a GUI for a better user experience.
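One caveat: because `SpamWorker` parses every file with `csv.reader`, a plain TXT line that contains a comma is cut at the first comma. A possible alternative, sketched below, treats `.txt` files as one message per line and keeps the first-column behavior for CSV. The helper `read_messages` is hypothetical and the sample file is written to a temporary folder.

```python
import csv
import os
import tempfile

def read_messages(path):
    """Return one message per line for .txt files, or the first CSV column otherwise."""
    with open(path, newline="", encoding="utf-8", errors="ignore") as f:
        if path.lower().endswith(".txt"):
            lines = (line.strip() for line in f)
            return [line for line in lines if line]
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]

# Write a sample TXT file whose messages contain commas
sample_dir = tempfile.mkdtemp()
txt_path = os.path.join(sample_dir, "messages.txt")
with open(txt_path, "w", encoding="utf-8") as f:
    f.write("Congratulations, you won a prize\n\nSee you at 5, ok?\n")

messages = read_messages(txt_path)
print(messages)
```

The returned list can be passed straight to `model.predict()`.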
Step 6: Build a GUI with Tkinter
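We’ll use ttkbootstrap for styling. The version shown here selects files through a file dialog and processes them as a batch; drag-and-drop needs the optional tkinterdnd2 package and is not wired up in this snippet.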
import ttkbootstrap as tb
from tkinter import filedialog

class SpamShieldApp:
    def __init__(self):
        self.root = tb.Window(themename="darkly")
        self.root.title("SpamShield v3.1")
        self.model = load_model()
        self.files = []
        self.build_ui()

    def build_ui(self):
        tb.Label(self.root, text="📩 SpamShield - AI SMS Detector", font=("Segoe UI", 22, "bold")).pack(pady=10)
        self.path_input = tb.Entry(self.root, width=80)
        self.path_input.pack(pady=5)
        tb.Button(self.root, text="📂 Browse Files", bootstyle="info", command=self.browse_files).pack(pady=5)
        tb.Button(self.root, text="🚀 Start Classification", bootstyle="success", command=self.start).pack(pady=5)

    def browse_files(self):
        self.files = filedialog.askopenfilenames(filetypes=[("CSV Files","*.csv"), ("Text Files","*.txt")])
        self.path_input.delete(0, "end")
        self.path_input.insert(0, f"{len(self.files)} files selected")

    def start(self):
        worker = SpamWorker(self.files, self.model)
        worker.run()

    def run(self):
        self.root.mainloop()
The GUI lets users select files and classify them with one click. In this version, the results are printed to the terminal rather than shown in the window.
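As written, `start()` runs the worker on the GUI thread, so the window can stop responding while a large file is classified. One common approach is to run the prediction on a background thread and hand results back through a queue. The sketch below shows that pattern without any Tkinter code; `classify_in_background` and `fake_predict` are hypothetical names, and `fake_predict` is a stand-in for `model.predict`.

```python
import queue
import threading

def classify_in_background(texts, predict, results):
    """Run predict(texts) on a worker thread and put (label, text) pairs on results."""
    def work():
        for text, label in zip(texts, predict(texts)):
            results.put(("SPAM" if label == 1 else "HAM", text))
        results.put(None)  # sentinel: no more results

    thread = threading.Thread(target=work, daemon=True)
    thread.start()
    return thread

def fake_predict(texts):
    # Stand-in for model.predict so the example runs without a trained model
    return [1 if "prize" in t.lower() else 0 for t in texts]

results = queue.Queue()
thread = classify_in_background(["Win a prize now", "Lunch at noon?"], fake_predict, results)
thread.join()

# Drain the queue until the sentinel appears
collected = []
while (item := results.get()) is not None:
    collected.append(item)
print(collected)
```

In a Tkinter app, the main thread would poll the queue periodically with `root.after(...)` instead of calling `join()`, since widgets should only be updated from the main thread.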
Step 7: Run the App
Finally, add the main section to run your app:
if __name__ == "__main__":
    app = SpamShieldApp()
    app.run()
Now you have a fully functional SMS spam classifier with AI-powered detection and a modern GUI!
✅ What You Learned
Downloading datasets programmatically
Building a machine learning pipeline with TF-IDF + Naive Bayes
Saving/loading ML models with joblib
Creating a GUI for batch processing
Classifying SMS messages as SPAM or HAM
💻 Check out the full code and files here:
https://github.com/rogers-cyber/python-tiny-tools/tree/main/SMS-spam-classifier-app
