Mate Technologies

Posted on Jan 21

🧠 Building a Topic Modeling & Word Cloud App in Python (Step-by-Step)

#tutorial #python #modeling #beginners

In this tutorial, we’ll build TopicVis, a desktop application that:

Loads multiple text files

Uses LDA topic modeling

Visualizes topics as word clouds

Exports results to CSV

Saves & loads projects

The app is built with Tkinter, ttkbootstrap, and scikit-learn, which makes it a good first project for beginners who want to combine machine learning with a desktop GUI.

🛠 Prerequisites

Install the required libraries first:

pip install ttkbootstrap tkinterdnd2 scikit-learn wordcloud pillow
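
Two of the pip package names differ from the names you import: scikit-learn is imported as sklearn, and pillow as PIL. Here is a small sketch for checking that everything is importable before launching the app. The helper name missing_modules is made up for this example, and it only relies on the standard library.

```python
from importlib.util import find_spec

# Import names used by TopicVis (some differ from the pip package names).
REQUIRED = ["tkinter", "ttkbootstrap", "tkinterdnd2", "sklearn", "wordcloud", "PIL"]

def missing_modules(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "nothing, you're set")
```

If tkinter shows up as missing, pip will not help: it ships with Python itself and on some Linux distributions has to be installed through the system package manager.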

1️⃣ Importing Required Libraries

We start by importing everything we need:
GUI tools, ML tools, file handling, and visualization libraries.

import sys, os, json, csv
import tkinter as tk
from tkinter import filedialog, messagebox
import ttkbootstrap as tb
from tkinterdnd2 import TkinterDnD, DND_FILES

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

from wordcloud import WordCloud
from PIL import Image, ImageTk

🔍 What’s happening?

Tkinter / ttkbootstrap → UI

scikit-learn → topic modeling (LDA)

WordCloud + PIL → image generation

json / csv → saving & exporting results

2️⃣ App Metadata (Professional Touch)

Define product-level metadata for versioning and licensing.

APP_NAME = "TopicVis"
APP_VERSION = "2.1"
COMPANY = "Mate Technologies"

LICENSE_TEXT = (
    "This software is licensed for personal, academic, and commercial use.\n\n"
    "THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY."
)

Keeping these values in one place means the window title, and anything else that shows the name or version, stays in sync.

3️⃣ Creating the Main Application Window

Now we initialize the main window using TkinterDnD.Tk(), a drop-in replacement for tk.Tk() that adds drag-and-drop support.

app = TkinterDnD.Tk()
app.title(f"{APP_NAME} v{APP_VERSION}")
app.geometry("1250x720")

tb.Style("darkly")

💡 Why ttkbootstrap?

It gives us modern, ready-made themes, so we don't have to style each ttk widget by hand.

4️⃣ Global App State

We store app-wide data here.

documents = []
topics_cache = []
wordcloud_images = []

These lists hold:

File paths

Topic modeling results

WordCloud images (to prevent garbage collection)

5️⃣ Utility Dialog Functions

Reusable helpers for showing messages.

def show_error(title, msg):
    messagebox.showerror(title, msg)

def show_info(title, msg):
    messagebox.showinfo(title, msg)

6️⃣ App Header UI

A subtle header for branding.

tb.Label(
    app,
    text="Commercial Topic Modeling & Visualization Tool",
    font=("Segoe UI", 10, "italic"),
    foreground="#AAAAAA"
).pack(fill="x", padx=12, pady=(6, 8))

7️⃣ Model Settings Panel

Users control the topic model here.

row1 = tb.Labelframe(app, text="Model Settings", padding=10)
row1.pack(fill="x", padx=10)

Reusable Input Field Helper

def field(parent, label, default, width=10):
    tb.Label(parent, text=label, width=14, anchor="w").pack(side="left")
    entry = tb.Entry(parent, width=width)
    entry.insert(0, default)
    entry.pack(side="left", padx=(0, 8))
    return entry

Fields

num_topics = field(row1, "Num Topics", "5", 6)
max_words  = field(row1, "Words / Topic", "10", 6)
theme_name = field(row1, "Theme Name", "superhero", 16)
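
Entry widgets return strings, so a call like int(num_topics.get()) raises ValueError if someone types "five" or leaves the box empty. One way to guard against that is a small parser that falls back to a default. parse_positive_int is a hypothetical helper name, not part of the original app.

```python
def parse_positive_int(text, default):
    """Convert entry text to a positive int, falling back to `default`."""
    try:
        value = int(text.strip())
    except ValueError:
        return default
    return value if value > 0 else default

print(parse_positive_int("8", 5))     # 8
print(parse_positive_int("abc", 5))   # 5
print(parse_positive_int("-3", 5))    # 5
```

In the app this could be called as parse_positive_int(num_topics.get(), 5) wherever the raw int(...) conversion appears.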

8️⃣ Action Buttons

Buttons for loading files, running the model, and exporting data.

row2 = tb.Labelframe(app, text="Actions", padding=10)
row2.pack(fill="x", padx=10, pady=6)

Add Text Files

tb.Button(
    row2,
    text="Add Text Files",
    command=lambda: add_documents(
        filedialog.askopenfilenames(filetypes=[("Text Files","*.txt")])
    )
).pack(side="left", padx=4)

Run Model Button

tb.Button(
    row2,
    text="RUN MODEL",
    bootstyle="success",
    command=lambda: run_model()
).pack(side="right", padx=6)

9️⃣ Scrollable Word Cloud Gallery

This section displays generated topic visuals.

gallery = tb.Labelframe(app, text="Topic Word Clouds", padding=10)
gallery.pack(fill="both", expand=True, padx=10)

Canvas + Scrollbar Setup

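canvas = tk.Canvas(gallery)
scroll = tk.Scrollbar(gallery, command=canvas.yview)
frame = tk.Frame(canvas)

canvas.create_window((0,0), window=frame, anchor="nw")
canvas.configure(yscrollcommand=scroll.set)
# Keep the scrollable area in sync with the frame's size
frame.bind("<Configure>", lambda e: canvas.configure(scrollregion=canvas.bbox("all")))

# Without these two calls the canvas and scrollbar never appear
scroll.pack(side="right", fill="y")
canvas.pack(side="left", fill="both", expand=True)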

This lets us show many topics without resizing the window.

🔟 Loading Documents

def add_documents(paths):
    for p in app.tk.splitlist(paths):
        if p not in documents:
            documents.append(p)

Each selected .txt file is added once.
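
The same idea can be written as a GUI-free function, which is easier to test. This sketch also skips anything that is not a .txt file, on the assumption that paths may arrive from somewhere other than the filtered file dialog (a drag-and-drop handler, for example). merge_paths is a hypothetical name.

```python
def merge_paths(existing, new_paths):
    """Append new .txt paths to `existing` in place, skipping duplicates."""
    for path in new_paths:
        if path.lower().endswith(".txt") and path not in existing:
            existing.append(path)
    return existing

docs = ["a.txt"]
merge_paths(docs, ["a.txt", "b.txt", "notes.pdf", "B2.TXT"])
print(docs)  # ['a.txt', 'b.txt', 'B2.TXT']
```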

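1️⃣1️⃣ Running Topic Modeling (Core Logic)

The snippets in this step and the next form the body of run_model(), the function the RUN MODEL button calls.

Read Text Files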

from pathlib import Path  # add this next to the other imports

texts = [Path(p).read_text(encoding="utf-8") for p in documents]

Convert Text → Numbers

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

Train LDA Model

lda = LatentDirichletAllocation(
    n_components=int(num_topics.get()),
    random_state=42
)
lda.fit(X)

1️⃣2️⃣ Generating Word Clouds

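feature_names = vectorizer.get_feature_names_out()  # look up once, not per word
topics_cache.clear()

for idx, topic in enumerate(lda.components_):
    # Keep the highest-weighted words for this topic
    words = {
        feature_names[i]: float(topic[i])
        for i in topic.argsort()[-int(max_words.get()):]
    }
    topics_cache.append(words)  # export_csv and Save read from this list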

Create WordCloud Image

wc = WordCloud(
    width=420,
    height=300,
    background_color="white"
).generate_from_frequencies(words)

Display in UI

img = wc.to_image()
img.thumbnail((260, 200))
tk_img = ImageTk.PhotoImage(img)
wordcloud_images.append(tk_img)  # keep a reference so the image isn't garbage collected

lbl = tb.Label(
    frame,
    image=tk_img,
    text=f"Topic {idx+1}",
    compound="top"
)
lbl.grid(row=idx//4, column=idx%4, padx=6, pady=6)
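
The row=idx//4, column=idx%4 arithmetic lays the thumbnails out four per row: integer division picks the row and the remainder picks the column. A tiny sketch of that mapping, with grid_position as a made-up helper name:

```python
def grid_position(index, columns=4):
    """Map a zero-based item index to a (row, column) cell."""
    return divmod(index, columns)

for idx in range(6):
    print(idx, grid_position(idx))
# 0 (0, 0)
# 1 (0, 1)
# 2 (0, 2)
# 3 (0, 3)
# 4 (1, 0)
# 5 (1, 1)
```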

1️⃣3️⃣ Export Topics to CSV

def export_csv():
    path = filedialog.asksaveasfilename(defaultextension=".csv")
    if not path:
        return

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Topic", "Word", "Weight"])

        for i, topic in enumerate(topics_cache, 1):
            for word, weight in topic.items():
                writer.writerow([i, word, round(weight, 4)])
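
To see what the exported rows look like without opening a save dialog, the writing part can be pulled into a function that accepts any open text stream. This is a sketch under that assumption. write_topics is a hypothetical name, and the sample topics are made up.

```python
import csv
import io

def write_topics(topics, stream):
    """Write one row per (topic number, word, weight) to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(["Topic", "Word", "Weight"])
    for number, topic in enumerate(topics, 1):
        for word, weight in topic.items():
            writer.writerow([number, word, round(weight, 4)])

# Write to an in-memory buffer instead of a file, just to inspect the result.
buffer = io.StringIO()
write_topics([{"cats": 2.51234, "dogs": 1.5}, {"python": 3.0}], buffer)
print(buffer.getvalue())
```

The same function works with a real file opened as in export_csv, since csv.writer only needs an object with a write method.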

1️⃣4️⃣ Save & Load Projects

Save

with open(path, "w", encoding="utf-8") as f:
    json.dump({
        "documents": documents,
        "theme": theme_name.get(),
        "topics": topics_cache
    }, f, indent=2)

Load

with open(path, encoding="utf-8") as f:
    data = json.load(f)
documents[:] = data.get("documents", [])
topics_cache[:] = data.get("topics", [])
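
A quick way to convince yourself the project format survives a round trip is to save to a temporary file and load it back. This is a minimal sketch. save_project, load_project, the demo.json file name, and the "darkly" fallback theme are assumptions for illustration rather than the app's actual functions.

```python
import json
import os
import tempfile

def save_project(path, documents, theme, topics):
    """Write the project state to `path` as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"documents": documents, "theme": theme, "topics": topics}, f, indent=2)

def load_project(path):
    """Read a project file, filling in defaults for any missing keys."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {
        "documents": data.get("documents", []),
        "theme": data.get("theme", "darkly"),   # assumed default theme
        "topics": data.get("topics", []),
    }

# Round trip through a temporary folder that is cleaned up afterwards.
with tempfile.TemporaryDirectory() as folder:
    project_file = os.path.join(folder, "demo.json")
    save_project(project_file, ["a.txt"], "superhero", [{"cats": 2.5}])
    restored = load_project(project_file)

print(restored)
```

Using data.get with a default for every key means an older or hand-edited project file that lacks a field still loads instead of raising KeyError.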

1️⃣5️⃣ About & License Window

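def show_about():
    win = tb.Toplevel(app)
    win.title("About")
    win.geometry("560x460")
    # Show the metadata and license text defined in step 2
    tb.Label(win, text=f"{APP_NAME} v{APP_VERSION}\n{COMPANY}\n\n{LICENSE_TEXT}",
             wraplength=520, justify="left").pack(padx=16, pady=16)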

This adds professional polish and licensing clarity.

🚀 Final Step: Run the App

app.mainloop()

🎉 What You Built

✅ A real desktop ML app
✅ Topic modeling with LDA
✅ Visual word clouds
✅ CSV export
✅ Save/load projects
