Mate Technologies

Posted on Jan 28

Build a Python SMS Spam Classifier with SpamShield v3.1 🚀

#pandas #python #programming #tutorial

Ever wondered how AI can help you detect spam messages? In this tutorial, we’ll build SpamShield v3.1, a Python app that classifies SMS messages as SPAM or HAM using machine learning. Even if you’re a beginner, you’ll be able to follow along!

💻 Project on GitHub: SpamShield v3.1

Step 1: Setting Up the Project

First, create a new folder for your project and install the required Python libraries. Open your terminal and run:

pip install pandas scikit-learn joblib ttkbootstrap
pip install tkinterdnd2  # Optional: Enables drag & drop in the GUI

pandas: Handles CSV/TXT data.

scikit-learn: Provides machine learning tools.

joblib: Saves and loads trained models.

ttkbootstrap: Makes your GUI look modern.

tkinterdnd2: Adds drag-and-drop support (optional).
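Before moving on, it can help to confirm that the libraries are importable. The snippet below is a small sketch and not part of SpamShield itself; `missing_packages` is a hypothetical helper name. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

def missing_packages(import_names):
    """Return the names from import_names that cannot be found for import."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    # Import names for the required libraries from the pip command above.
    required = ["pandas", "sklearn", "joblib", "ttkbootstrap"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```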

Step 2: Download the SMS Spam Dataset Automatically

We’ll use the SMS Spam Collection dataset from the UCI Machine Learning Repository. The script downloads it automatically if it’s missing.

import urllib.request
import zipfile
import os
import sys

def resource_path(file_name):
    # Resolve a path next to this script, or inside PyInstaller's bundle
    # folder (sys._MEIPASS) when running as a packaged executable.
    base_path = getattr(sys, "_MEIPASS", os.path.dirname(os.path.abspath(__file__)))
    return os.path.join(base_path, file_name)

def download_dataset():
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip"
    zip_path = resource_path("smsspamcollection.zip")

    # Download the archive, unpack it next to the script, then delete the zip.
    urllib.request.urlretrieve(url, zip_path)

    with zipfile.ZipFile(zip_path, 'r') as z:
        z.extractall(resource_path(""))

    os.remove(zip_path)
    print("[INFO] Dataset downloaded successfully!")

Step 3 calls download_dataset() only when the SMSSpamCollection file is missing, so the dataset is fetched on the first run and reused afterwards.
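download_dataset() extracts every file in the archive. A slightly more defensive variant, sketched below, extracts only the dataset file and reports a clear error when it is absent. `extract_dataset` is a hypothetical helper, and the member name `SMSSpamCollection` is assumed from the path used in Step 3.

```python
import os
import zipfile

def extract_dataset(zip_path, dest_dir, member="SMSSpamCollection"):
    """Extract a single member from the archive and return the path it was written to."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        if member not in archive.namelist():
            raise FileNotFoundError(f"{member!r} not found in {zip_path}")
        # extract() returns the normalized path of the file it created.
        return archive.extract(member, path=dest_dir)
```

Extracting one named member also avoids leaving extra files from the archive in the project folder.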

Step 3: Train the Machine Learning Model

We’ll use Naive Bayes with TF-IDF vectorization to classify SMS messages.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import joblib

def train_sms_model():
    ds_path = resource_path("SMSSpamCollection")

    # Fetch the dataset on the first run only.
    if not os.path.exists(ds_path):
        download_dataset()

    # The file is tab-separated with no header row: label, then message text.
    df = pd.read_csv(ds_path, sep="\t", header=None, names=["label", "text"])
    df["label_num"] = df["label"].map({"ham": 0, "spam": 1})

    # Hold out 20% for testing; stratify keeps the spam/ham ratio equal in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"], df["label_num"], test_size=0.2, random_state=42,
        stratify=df["label_num"],
    )

    # Vectorizer and classifier share one pipeline, so predict() accepts raw text.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    print(f"[INFO] Model trained. Test accuracy: {accuracy_score(y_test, y_pred)*100:.2f}%")

    joblib.dump(model, resource_path("sms_spam_model.pkl"))
    return model
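
Accuracy alone can look better than the model really is, because ham messages heavily outnumber spam in this dataset. A classifier that misses half the spam can still score 90% on a mostly-ham sample. The labels below are made up for illustration; they show how precision and recall, both available in scikit-learn, expose what accuracy hides.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels for illustration: 0 = ham, 1 = spam.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]  # one spam message slips through

accuracy = accuracy_score(y_true, y_pred)    # share of all messages labeled correctly
precision = precision_score(y_true, y_pred)  # of messages flagged as spam, share that were spam
recall = recall_score(y_true, y_pred)        # of actual spam, share that was caught

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# prints: accuracy=0.90 precision=1.00 recall=0.50
```

Inside train_sms_model(), the same two functions could be called on y_test and y_pred next to accuracy_score.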

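Tip: The TF-IDF vectorizer converts each message into a vector of word weights (words that are frequent in a message but rare across the dataset score higher), and Naive Bayes uses those weights to predict whether the message is spam.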
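
To see both steps in isolation, here is a minimal sketch on a tiny made-up corpus (four messages invented for illustration, not taken from the dataset). It uses the same pipeline as train_sms_model() and inspects the vectorizer's output before predicting.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus for illustration: 1 = spam, 0 = ham.
texts = [
    "win free prize now",
    "free cash claim now",
    "see you at lunch",
    "are we meeting tomorrow",
]
labels = [1, 1, 0, 0]

toy_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
toy_model.fit(texts, labels)

# The vectorizer step turns each message into one row of TF-IDF weights,
# with one column per distinct word seen during training.
vectorizer = toy_model.named_steps["tfidfvectorizer"]
matrix = vectorizer.transform(["claim your free prize"])
print(matrix.shape)

# The pipeline accepts raw strings and returns 1 (spam) or 0 (ham).
predictions = toy_model.predict(["claim your free prize", "lunch tomorrow"])
print(list(predictions))
```

Words the vectorizer never saw during training, such as "your" here, are simply ignored at prediction time.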

Step 4: Load the Model

We’ll create a helper function that loads the saved model if it exists and trains a new one otherwise.

def load_model():
    model_path = resource_path("sms_spam_model.pkl")
    if os.path.exists(model_path):
        return joblib.load(model_path)
    return train_sms_model()
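
A saved model can fail to load, for example when the file is truncated or was pickled with an incompatible library version. One option, sketched below under that assumption, is to fall back to retraining. `load_or_retrain` and its `train_fn` parameter are hypothetical names, not part of SpamShield.

```python
import os
import joblib

def load_or_retrain(model_path, train_fn):
    """Load a joblib file if possible; otherwise call train_fn and save its result."""
    if os.path.exists(model_path):
        try:
            return joblib.load(model_path)
        except Exception as exc:  # broad on purpose: unpickling can fail in many ways
            print(f"[WARN] Could not load {model_path}: {exc}")
    model = train_fn()
    joblib.dump(model, model_path)
    return model
```

With the functions from this tutorial, it could be called as load_or_retrain(resource_path("sms_spam_model.pkl"), train_sms_model). The model would then be saved twice on a fresh run, which is harmless.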

Step 5: Create a Worker to Process SMS Files

For batch classification, we’ll build a SpamWorker class that reads CSV/TXT files and labels messages.

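import csv

class SpamWorker:
    def __init__(self, files, model):
        self.files = files
        self.model = model

    def run(self):
        for path in self.files:
            with open(path, newline="", encoding="utf-8", errors="ignore") as f:
                if path.lower().endswith(".csv"):
                    # CSV: the message is expected in the first column.
                    texts = [row[0].strip() for row in csv.reader(f) if row]
                else:
                    # TXT: one message per line, so commas stay part of the message.
                    texts = [line.strip() for line in f]

            # Drop blank entries and skip files with nothing to classify.
            texts = [t for t in texts if t]
            if not texts:
                continue

            labels_num = self.model.predict(texts)
            labels = ["SPAM" if n == 1 else "HAM" for n in labels_num]

            for t, lbl in zip(texts, labels):
                print(f"{lbl} | {t}")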

This prints each SMS with its predicted label. In the next step, we’ll connect it to a GUI so users can pick files without touching the code.
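
Printing is fine for a quick check, but batch results are usually easier to work with in a file. The sketch below writes each message and its label to a CSV. `label_messages` and `save_results` are hypothetical helpers; the only assumption about `model` is that it has a `predict` method returning 1 for spam and 0 for ham, as in Step 3.

```python
import csv

def label_messages(texts, model):
    """Pair each message with 'SPAM' or 'HAM' based on the model's prediction."""
    if not texts:
        return []
    predictions = model.predict(texts)
    return [("SPAM" if p == 1 else "HAM", t) for p, t in zip(predictions, texts)]

def save_results(rows, out_path):
    """Write (label, text) rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["label", "text"])
        writer.writerows(rows)
```

Returning the rows instead of printing them also makes the labeling step easy to reuse from a GUI or a test.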

Step 6: Build a GUI with Tkinter

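We’ll use ttkbootstrap for styling. The window below has a file picker and a button that runs batch classification. Drag-and-drop needs the optional tkinterdnd2 package and is not wired up in this minimal version.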

import ttkbootstrap as tb
from tkinter import filedialog

class SpamShieldApp:
    def __init__(self):
        self.root = tb.Window(themename="darkly")
        self.root.title("SpamShield v3.1")
        self.model = load_model()
        self.files = []

        self.build_ui()

    def build_ui(self):
        tb.Label(self.root, text="📩 SpamShield - AI SMS Detector", font=("Segoe UI", 22, "bold")).pack(pady=10)

        self.path_input = tb.Entry(self.root, width=80)
        self.path_input.pack(pady=5)

        tb.Button(self.root, text="📂 Browse Files", bootstyle="info", command=self.browse_files).pack(pady=5)
        tb.Button(self.root, text="🚀 Start Classification", bootstyle="success", command=self.start).pack(pady=5)

    def browse_files(self):
        self.files = filedialog.askopenfilenames(filetypes=[("CSV Files","*.csv"), ("Text Files","*.txt")])
        self.path_input.delete(0, "end")
        self.path_input.insert(0, f"{len(self.files)} files selected")

    def start(self):
        # Nothing to do until the user has picked at least one file.
        if not self.files:
            return
        worker = SpamWorker(self.files, self.model)
        worker.run()

    def run(self):
        self.root.mainloop()

The GUI lets users select files and classify them with one click. In this version, the results are printed to the terminal rather than shown in the window.
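
Because start() calls worker.run() directly, the window cannot respond until classification finishes. A common Tkinter pattern is to do the work in a background thread and hand results back through a queue that the GUI polls. The sketch below shows only the thread-and-queue half, with hypothetical names (`start_background`, `drain`); in the app, `drain` could be scheduled repeatedly with `self.root.after(...)`.

```python
import queue
import threading

def start_background(task, results):
    """Run task() in a daemon thread and put its return value on the results queue."""
    def runner():
        results.put(task())
    thread = threading.Thread(target=runner, daemon=True)
    thread.start()
    return thread

def drain(results):
    """Return everything currently on the queue without blocking."""
    items = []
    while True:
        try:
            items.append(results.get_nowait())
        except queue.Empty:
            return items
```

Keeping widget updates on the main thread matters because Tkinter widgets are not safe to modify from other threads.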

Step 7: Run the App

Finally, add the main section to run your app:

if __name__ == "__main__":
    app = SpamShieldApp()
    app.run()

Now you have a working SMS spam classifier with a machine learning model behind it and a modern GUI!

✅ What You Learned

Downloading datasets programmatically

Building a machine learning pipeline with TF-IDF + Naive Bayes

Saving/loading ML models with joblib

Creating a GUI for batch processing

Classifying SMS messages as SPAM or HAM

💻 Check out the full code and files here:
https://github.com/rogers-cyber/python-tiny-tools/tree/main/SMS-spam-classifier-app

#Python #MachineLearning #SMSClassifier #DataScience #AI #SpamDetection #OpenSource #PythonProjects #DevTutorial
