A beginner-friendly walkthrough of SmartExcelGuardian β a professional Excel data cleaning and reporting tool built with Python.
What youβll build:
A desktop GUI app for Excel cleanup
Automatic missing value handling
Duplicate detection
Heuristic βdata healthβ scoring
Export results to Excel, PDF, JSON, and TXT
GitHub repo (full script):
π https://github.com/rogers-cyber/SmartExcelGuardian
π§° Prerequisites
You should have:
Python 3.9+
Basic Python knowledge
Installed packages:
pip install pandas numpy openpyxl ttkbootstrap reportlab
π Project Structure
SmartExcelGuardian/
βββ main.py
βββ logo.ico
βββ excelguardian.log
1οΈβ£ Import Required Libraries
We start by importing everything we need.
import os, sys, threading, json
import tkinter as tk
from tkinter import filedialog
import ttkbootstrap as tb
from ttkbootstrap.constants import *
from datetime import datetime
Why?
tkinter β GUI foundation
ttkbootstrap β modern dark UI
threading β keep UI responsive during cleanup
Data & Excel Libraries
import pandas as pd
import numpy as np
import re
from openpyxl import Workbook
from openpyxl.styles import PatternFill, Font
from openpyxl.utils.dataframe import dataframe_to_rows
These handle:
Data cleaning (pandas)
Excel export & formatting (openpyxl)
Column name normalization (re)
PDF Export Tools
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import mm
from reportlab.lib.colors import red, orange, green, black
Used for generating professional PDF reports.
2οΈβ£ Global State & Logging
stop_event = threading.Event()
cleanup_results = {}
log_file = os.path.join(os.getcwd(), "excelguardian.log")
Purpose
stop_event β allows canceling cleanup
cleanup_results β shared export data
log_file β error tracking
3οΈβ£ Utility Helper Functions
Resource Loader (for packaged apps)
def resource_path(file_name):
base_path = getattr(sys, "_MEIPASS", os.path.dirname(os.path.abspath(__file__)))
return os.path.join(base_path, file_name)
This helps when bundling with PyInstaller.
Column Name Cleaner
def clean_column_name(name):
name = name.strip().lower()
name = re.sub(r"[^\w\s]", "", name)
name = re.sub(r"\s+", "_", name)
return name
Example:
Original Cleaned
Total Sales ($) total_sales
NumPy β JSON Converter
def convert_numpy(obj):
if isinstance(obj, np.integer): return int(obj)
if isinstance(obj, np.floating): return float(obj)
if isinstance(obj, np.ndarray): return obj.tolist()
raise TypeError
Needed because JSON canβt serialize NumPy types.
4οΈβ£ Creating the Main Window
app = tb.Window(themename="darkly")
app.title("SmartExcelGuardian v1.1.0")
app.geometry("1100x650")
Why ttkbootstrap?
Modern styling
Dark mode support
Responsive layouts
5οΈβ£ Title Section
tb.Label(app, text="SmartExcelGuardian",
font=("Segoe UI", 22, "bold")).pack(pady=(10, 2))
tb.Label(app, text="Professional Excel Data Guardian Tool",
font=("Segoe UI", 10, "italic"),
foreground="#9ca3af").pack(pady=(0, 8))
This creates the app header.
6οΈβ£ Excel File Selector
file_path = tk.StringVar()
tb.Entry(row1, textvariable=file_path, width=60).pack(side="left", padx=6)
tb.Button(
row1,
text="π Excel File",
command=lambda: file_path.set(
filedialog.askopenfilename(filetypes=[("Excel Files", "*.xlsx *.xls")])
)
).pack(side="left")
This lets users choose their Excel workbook.
7οΈβ£ Cleanup Control Buttons
start_btn = tb.Button(row2, text="π‘ CLEAN DATA", bootstyle="success")
stop_btn = tb.Button(row2, text="π STOP", bootstyle="danger-outline", state="disabled")
CLEAN DATA β starts background thread
STOP β safely halts processing
8οΈβ£ Results Table (TreeView)
cols = (
"column", "original_type", "suggested_type",
"cleaned_type", "missing_values",
"duplicates_detected", "heuristic_score",
"rename_suggestion"
)
tree = tb.Treeview(row3, columns=cols, show="headings")
This displays column-by-column health analysis.
9οΈβ£ Heuristic Scoring System
def heuristic_score(missing, duplicates, type_issue):
score = 0
score += min(30, missing * 2)
score += min(30, duplicates * 2)
score += 40 if type_issue else 0
return min(score, 100)
Score Meaning
Score Risk
0β30 Healthy π’
31β70 Moderate π
71β100 High Risk π΄
π Data Cleaning Engine
def assess_and_clean(df):
for col in df.columns:
series = df[col]
Numeric Columns
coerced = pd.to_numeric(series, errors="coerce")
cleaned_series = coerced.fillna(coerced.mean())
β Converts text β numbers
β Fills missing values with mean
Text Columns
cleaned_series = series.astype("string").fillna(series.mode()[0])
β Converts to strings
β Fills missing values with most common value
1οΈβ£1οΈβ£ Threaded Cleanup Execution
threading.Thread(
target=run_cleanup,
daemon=True
).start()
Why threading?
Keeps UI responsive
Prevents freezing on large Excel files
1οΈβ£2οΈβ£ Excel Export with Formulas
sum_formula = f"=SUM(A2:A{ws.max_row})"
mean_formula = f"=AVERAGE(A2:A{ws.max_row})"
Automatically adds:
SUM
AVERAGE
to numeric columns.
1οΈβ£3οΈβ£ Conditional Formatting
fill = PatternFill(start_color="FF9999", fill_type="solid")
cell.font = Font(bold=True)
High-risk columns are:
Highlighted π΄
Bolded for visibility
1οΈβ£4οΈβ£ PDF Report Export
def score_color(score):
if score >= 71: return red
elif score >= 31: return orange
else: return green
Creates a multi-page PDF audit report with:
Color-coded scores
Column summaries
Page numbers
1οΈβ£5οΈβ£ About & Help Window
tb.Label(frame, text="How to Use", font=("Segoe UI", 12, "bold"))
Provides:
Feature overview
Usage steps
Developer info
π Final Result
You now have:
A professional Excel cleaner
A desktop GUI
A heuristic scoring system
Multi-format export
π Next Improvements
Add per-sheet selection
Add charts (data health trends)
Save user presets
Package as .exe
π Full Source Code

Top comments (0)