A beginner-friendly walkthrough of SmartExcelGuardian โ a professional Excel data cleaning and reporting tool built with Python.
What youโll build:
A desktop GUI app for Excel cleanup
Automatic missing value handling
Duplicate detection
Heuristic โdata healthโ scoring
Export results to Excel, PDF, JSON, and TXT
GitHub repo (full script):
๐ https://github.com/rogers-cyber/SmartExcelGuardian
๐งฐ Prerequisites
You should have:
Python 3.9+
Basic Python knowledge
Installed packages:
pip install pandas numpy openpyxl ttkbootstrap reportlab
๐ Project Structure
SmartExcelGuardian/
โโโ main.py
โโโ logo.ico
โโโ excelguardian.log
1๏ธโฃ Import Required Libraries
We start by importing everything we need.
import os, sys, threading, json
import tkinter as tk
from tkinter import filedialog
import ttkbootstrap as tb
from ttkbootstrap.constants import *
from datetime import datetime
Why?
tkinter โ GUI foundation
ttkbootstrap โ modern dark UI
threading โ keep UI responsive during cleanup
Data & Excel Libraries
import pandas as pd
import numpy as np
import re
from openpyxl import Workbook
from openpyxl.styles import PatternFill, Font
from openpyxl.utils.dataframe import dataframe_to_rows
These handle:
Data cleaning (pandas)
Excel export & formatting (openpyxl)
Column name normalization (re)
PDF Export Tools
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import mm
from reportlab.lib.colors import red, orange, green, black
Used for generating professional PDF reports.
2๏ธโฃ Global State & Logging
stop_event = threading.Event()
cleanup_results = {}
log_file = os.path.join(os.getcwd(), "excelguardian.log")
Purpose
stop_event โ allows canceling cleanup
cleanup_results โ shared export data
log_file โ error tracking
3๏ธโฃ Utility Helper Functions
Resource Loader (for packaged apps)
def resource_path(file_name):
base_path = getattr(sys, "_MEIPASS", os.path.dirname(os.path.abspath(__file__)))
return os.path.join(base_path, file_name)
This helps when bundling with PyInstaller.
Column Name Cleaner
def clean_column_name(name):
name = name.strip().lower()
name = re.sub(r"[^\w\s]", "", name)
name = re.sub(r"\s+", "_", name)
return name
Example:
Original Cleaned
Total Sales ($) total_sales
NumPy โ JSON Converter
def convert_numpy(obj):
if isinstance(obj, np.integer): return int(obj)
if isinstance(obj, np.floating): return float(obj)
if isinstance(obj, np.ndarray): return obj.tolist()
raise TypeError
Needed because JSON canโt serialize NumPy types.
4๏ธโฃ Creating the Main Window
app = tb.Window(themename="darkly")
app.title("SmartExcelGuardian v1.1.0")
app.geometry("1100x650")
Why ttkbootstrap?
Modern styling
Dark mode support
Responsive layouts
5๏ธโฃ Title Section
tb.Label(app, text="SmartExcelGuardian",
font=("Segoe UI", 22, "bold")).pack(pady=(10, 2))
tb.Label(app, text="Professional Excel Data Guardian Tool",
font=("Segoe UI", 10, "italic"),
foreground="#9ca3af").pack(pady=(0, 8))
This creates the app header.
6๏ธโฃ Excel File Selector
file_path = tk.StringVar()
tb.Entry(row1, textvariable=file_path, width=60).pack(side="left", padx=6)
tb.Button(
row1,
text="๐ Excel File",
command=lambda: file_path.set(
filedialog.askopenfilename(filetypes=[("Excel Files", "*.xlsx *.xls")])
)
).pack(side="left")
This lets users choose their Excel workbook.
7๏ธโฃ Cleanup Control Buttons
start_btn = tb.Button(row2, text="๐ก CLEAN DATA", bootstyle="success")
stop_btn = tb.Button(row2, text="๐ STOP", bootstyle="danger-outline", state="disabled")
CLEAN DATA โ starts background thread
STOP โ safely halts processing
8๏ธโฃ Results Table (TreeView)
cols = (
"column", "original_type", "suggested_type",
"cleaned_type", "missing_values",
"duplicates_detected", "heuristic_score",
"rename_suggestion"
)
tree = tb.Treeview(row3, columns=cols, show="headings")
This displays column-by-column health analysis.
9๏ธโฃ Heuristic Scoring System
def heuristic_score(missing, duplicates, type_issue):
score = 0
score += min(30, missing * 2)
score += min(30, duplicates * 2)
score += 40 if type_issue else 0
return min(score, 100)
Score Meaning
Score Risk
0โ30 Healthy ๐ข
31โ70 Moderate ๐
71โ100 High Risk ๐ด
๐ Data Cleaning Engine
def assess_and_clean(df):
for col in df.columns:
series = df[col]
Numeric Columns
coerced = pd.to_numeric(series, errors="coerce")
cleaned_series = coerced.fillna(coerced.mean())
โ Converts text โ numbers
โ Fills missing values with mean
Text Columns
cleaned_series = series.astype("string").fillna(series.mode()[0])
โ Converts to strings
โ Fills missing values with most common value
1๏ธโฃ1๏ธโฃ Threaded Cleanup Execution
threading.Thread(
target=run_cleanup,
daemon=True
).start()
Why threading?
Keeps UI responsive
Prevents freezing on large Excel files
1๏ธโฃ2๏ธโฃ Excel Export with Formulas
sum_formula = f"=SUM(A2:A{ws.max_row})"
mean_formula = f"=AVERAGE(A2:A{ws.max_row})"
Automatically adds:
SUM
AVERAGE
to numeric columns.
1๏ธโฃ3๏ธโฃ Conditional Formatting
fill = PatternFill(start_color="FF9999", fill_type="solid")
cell.font = Font(bold=True)
High-risk columns are:
Highlighted ๐ด
Bolded for visibility
1๏ธโฃ4๏ธโฃ PDF Report Export
def score_color(score):
if score >= 71: return red
elif score >= 31: return orange
else: return green
Creates a multi-page PDF audit report with:
Color-coded scores
Column summaries
Page numbers
1๏ธโฃ5๏ธโฃ About & Help Window
tb.Label(frame, text="How to Use", font=("Segoe UI", 12, "bold"))
Provides:
Feature overview
Usage steps
Developer info
๐ Final Result
You now have:
A professional Excel cleaner
A desktop GUI
A heuristic scoring system
Multi-format export
๐ Next Improvements
Add per-sheet selection
Add charts (data health trends)
Save user presets
Package as .exe
๐ Full Source Code

Top comments (0)