DEV Community

Mate Technologies

🧠 Building a Topic Modeling & Word Cloud App in Python (Step-by-Step)

In this tutorial, we’ll build TopicVis, a desktop application that:

Loads multiple text files

Uses LDA topic modeling

Visualizes topics as word clouds

Exports results to CSV

Saves & loads projects

The app is built with Tkinter, ttkbootstrap, and scikit-learn, which makes it a good fit for beginners who want to combine machine learning with a GUI.

🛠 Prerequisites

Install the required libraries first:

pip install ttkbootstrap tkinterdnd2 scikit-learn wordcloud pillow

1️⃣ Importing Required Libraries

We start by importing everything we need:
GUI tools, ML tools, file handling, and visualization libraries.

import sys, os, json, csv
import tkinter as tk
from tkinter import filedialog, messagebox
import ttkbootstrap as tb
from tkinterdnd2 import TkinterDnD, DND_FILES

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

from wordcloud import WordCloud
from PIL import Image, ImageTk

🔍 What’s happening?

Tkinter / ttkbootstrap → UI

scikit-learn → topic modeling (LDA)

WordCloud + PIL → image generation

json / csv → saving & exporting results

2️⃣ App Metadata (Professional Touch)

Define product-level metadata for versioning and licensing.

APP_NAME = "TopicVis"
APP_VERSION = "2.1"
COMPANY = "Mate Technologies"

LICENSE_TEXT = (
    "This software is licensed for personal, academic, and commercial use.\n\n"
    "THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY."
)

This makes your app feel commercial-ready.

3️⃣ Creating the Main Application Window

Now we initialize the main window using TkinterDnD.

app = TkinterDnD.Tk()
app.title(f"{APP_NAME} v{APP_VERSION}")
app.geometry("1250x720")
tb.Style("darkly")

💡 Why ttkbootstrap?

It gives us modern themes without styling each widget by hand.

4️⃣ Global App State

We store app-wide data here.

documents = []
topics_cache = []
wordcloud_images = []

These lists hold:

File paths

Topic modeling results

WordCloud images (to prevent garbage collection)

5️⃣ Utility Dialog Functions

Reusable helpers for showing messages.

def show_error(title, msg):
    messagebox.showerror(title, msg)

def show_info(title, msg):
    messagebox.showinfo(title, msg)

6️⃣ App Header UI

A subtle header for branding.

tb.Label(
    app,
    text="Commercial Topic Modeling & Visualization Tool",
    font=("Segoe UI", 10, "italic"),
    foreground="#AAAAAA"
).pack(fill="x", padx=12, pady=(6, 8))

7️⃣ Model Settings Panel

Users control the topic model here.

row1 = tb.Labelframe(app, text="Model Settings", padding=10)
row1.pack(fill="x", padx=10)

Reusable Input Field Helper

def field(parent, label, default, width=10):
    tb.Label(parent, text=label, width=14, anchor="w").pack(side="left")
    entry = tb.Entry(parent, width=width)
    entry.insert(0, default)
    entry.pack(side="left", padx=(0, 8))
    return entry

Fields

num_topics = field(row1, "Num Topics", "5", 6)
max_words  = field(row1, "Words / Topic", "10", 6)
theme_name = field(row1, "Theme Name", "superhero", 16)
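The entry widgets return strings, so a call like `int(num_topics.get())` raises a `ValueError` if the user types something that is not a number. Here is a minimal sketch of a validation helper. `parse_positive_int` is a hypothetical name, not part of the original app, and falling back to a default is an assumption about the desired behavior.

```python
def parse_positive_int(text, default):
    """Return text as a positive int, or default if it is not one.

    Hypothetical helper (not in the original app).
    """
    try:
        value = int(text.strip())
    except (ValueError, AttributeError):
        return default
    return value if value > 0 else default


# In the app this could wrap num_topics.get() or max_words.get()
print(parse_positive_int("8", 5))
print(parse_positive_int("abc", 5))
print(parse_positive_int("-3", 5))
```

With this in place, a typo in the settings panel falls back to the default instead of crashing the model run.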

8️⃣ Action Buttons

Buttons for loading files, running the model, and exporting data.

row2 = tb.Labelframe(app, text="Actions", padding=10)
row2.pack(fill="x", padx=10, pady=6)

Add Text Files

tb.Button(
    row2,
    text="Add Text Files",
    command=lambda: add_documents(
        filedialog.askopenfilenames(filetypes=[("Text Files","*.txt")])
    )
).pack(side="left", padx=4)

Run Model Button

tb.Button(
    row2,
    text="RUN MODEL",
    bootstyle="success",
    command=lambda: run_model()
).pack(side="right", padx=6)

9️⃣ Scrollable Word Cloud Gallery

This section displays generated topic visuals.

gallery = tb.Labelframe(app, text="Topic Word Clouds", padding=10)
gallery.pack(fill="both", expand=True, padx=10)

Canvas + Scrollbar Setup

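canvas = tk.Canvas(gallery)
scroll = tk.Scrollbar(gallery, command=canvas.yview)
frame = tk.Frame(canvas)

canvas.create_window((0,0), window=frame, anchor="nw")
canvas.configure(yscrollcommand=scroll.set)
# Resize the scrollable area whenever the inner frame changes size
frame.bind("<Configure>", lambda e: canvas.configure(scrollregion=canvas.bbox("all")))

scroll.pack(side="right", fill="y")
canvas.pack(side="left", fill="both", expand=True)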

This lets us show many topics without resizing the window.

🔟 Loading Documents

def add_documents(paths):
    for p in app.tk.splitlist(paths):
        if p not in documents:
            documents.append(p)

Each selected .txt file is added once.
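The membership check compares raw strings, so the same file reached through two spellings of its path (say, a relative and an absolute one) would be added twice. Below is a sketch of one way to tighten this, assuming paths are normalized with `os.path.abspath` and `os.path.normcase`. `add_unique_paths` is a hypothetical helper, and it takes the list as a parameter so it can be tried without a GUI.

```python
import os


def add_unique_paths(paths, store):
    """Append each path to store unless an equivalent path is already there.

    Hypothetical helper; returns the paths that were actually added.
    """
    seen = {os.path.normcase(os.path.abspath(p)) for p in store}
    added = []
    for p in paths:
        key = os.path.normcase(os.path.abspath(p))
        if key not in seen:
            seen.add(key)
            store.append(p)
            added.append(p)
    return added


docs = []
add_unique_paths(["notes.txt", os.path.join(".", "notes.txt"), "other.txt"], docs)
print(docs)  # "notes.txt" appears only once
```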

1️⃣1️⃣ Running Topic Modeling (Core Logic)

The steps below are what run_model(), the function behind the RUN MODEL button, needs to perform.

Read Text Files

from pathlib import Path
texts = [Path(p).read_text(encoding="utf-8") for p in documents]  # read_text closes each file

Convert Text → Numbers

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

Train LDA Model

lda = LatentDirichletAllocation(
    n_components=int(num_topics.get()),
    random_state=42
)
lda.fit(X)

1️⃣2️⃣ Generating Word Clouds

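feature_names = vectorizer.get_feature_names_out()  # look up the vocabulary once
top_n = int(max_words.get())

for idx, topic in enumerate(lda.components_):
    # Map this topic's top-N highest-weight words to their weights
    words = {
        feature_names[i]: float(topic[i])
        for i in topic.argsort()[-top_n:]
    }
    topics_cache.append(words)  # kept for CSV export and project saving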
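`topic.argsort()` orders indices by ascending weight, so a negative slice from the end keeps the heaviest words. One edge case: with a count of 0, the slice `[-0:]` is the same as `[0:]` and returns every word. The sketch below isolates that step in a hypothetical helper, `top_words`, which reverses the order first and so returns nothing for a count of 0. The weights and vocabulary are made up.

```python
import numpy as np


def top_words(topic_weights, feature_names, n):
    """Return {word: weight} for the n highest-weight words, heaviest first."""
    order = np.argsort(topic_weights)[::-1][:n]   # descending order, then keep n
    return {str(feature_names[i]): float(topic_weights[i]) for i in order}


weights = np.array([0.1, 2.0, 0.5, 1.5])              # made-up topic row
names = np.array(["apple", "data", "tree", "model"])  # made-up vocabulary
print(top_words(weights, names, 2))  # {'data': 2.0, 'model': 1.5}
```

Casting to `str` and `float` also keeps the dictionary free of NumPy types.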

Create WordCloud Image

wc = WordCloud(
    width=420,
    height=300,
    background_color="white"
).generate_from_frequencies(words)

Display in UI

img = wc.to_image()
img.thumbnail((260, 200))
tk_img = ImageTk.PhotoImage(img)
wordcloud_images.append(tk_img)  # keep a reference so the image is not garbage-collected

lbl = tb.Label(
    frame,
    image=tk_img,
    text=f"Topic {idx+1}",
    compound="top"
)
lbl.grid(row=idx//4, column=idx%4, padx=6, pady=6)
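The `grid` call lays thumbnails out four per row: integer division picks the row and the remainder picks the column. A small sketch with a hypothetical `grid_position` helper shows the mapping, using `divmod` to compute both values at once.

```python
def grid_position(index, columns=4):
    """Return (row, column) for the index-th item in a grid with `columns` columns."""
    return divmod(index, columns)


for i in range(6):
    print(i, grid_position(i))
```

Items 0 to 3 fill the first row, and item 4 wraps to the start of the second.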

1️⃣3️⃣ Export Topics to CSV

def export_csv():
    path = filedialog.asksaveasfilename(defaultextension=".csv")
    if not path:
        return  # user cancelled the dialog

    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Topic", "Word", "Weight"])

        for i, topic in enumerate(topics_cache, 1):
            for word, weight in topic.items():
                writer.writerow([i, word, round(weight, 4)])
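`export_csv` expects `topics_cache` to be a list of `{word: weight}` dictionaries, one per topic. To check the output format without opening a file dialog, the writing step can be pulled into a function that accepts any file-like object. This is only a sketch: `write_topics_csv` and the sample data are hypothetical.

```python
import csv
import io


def write_topics_csv(topics, fileobj):
    """Write one row per (topic number, word, weight) to fileobj."""
    writer = csv.writer(fileobj)
    writer.writerow(["Topic", "Word", "Weight"])
    for i, topic in enumerate(topics, 1):
        for word, weight in topic.items():
            writer.writerow([i, word, round(weight, 4)])


sample = [{"data": 2.03456, "model": 1.5}, {"tree": 0.75}]  # made-up topics
buffer = io.StringIO()
write_topics_csv(sample, buffer)
print(buffer.getvalue())
```

In the app, the same function could be called with the file opened inside `export_csv`.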

1️⃣4️⃣ Save & Load Projects

Save

# path comes from a save dialog; the with-block closes the file after writing
with open(path, "w", encoding="utf-8") as f:
    json.dump({
        "documents": documents,
        "theme": theme_name.get(),
        "topics": topics_cache
    }, f, indent=2)

Load

with open(path, encoding="utf-8") as f:  # close the file after reading
    data = json.load(f)
documents[:] = data.get("documents", [])
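Both snippets assume a `path` variable, presumably supplied by a file dialog. As a sketch of the full round trip, the two steps can be wrapped in hypothetical `save_project` and `load_project` functions. Restoring the theme and topics on load is an assumption, since the original snippet only restores the document list.

```python
import json
import os
import tempfile


def save_project(path, documents, theme, topics):
    """Write the project state to path as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"documents": documents, "theme": theme, "topics": topics}, f, indent=2)


def load_project(path):
    """Read a project file and return (documents, theme, topics)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data.get("documents", []), data.get("theme", ""), data.get("topics", [])


# Round trip through a temporary file (file name is made up)
with tempfile.TemporaryDirectory() as tmp:
    project_path = os.path.join(tmp, "demo_project.json")
    save_project(project_path, ["a.txt", "b.txt"], "darkly", [{"data": 2.0}])
    loaded = load_project(project_path)

print(loaded)
```

Using `data.get` with a default means an older or partial project file still loads instead of raising a `KeyError`.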

1️⃣5️⃣ About & License Window

def show_about():
    win = tb.Toplevel(app)
    win.title("About")
    win.geometry("560x460")

This adds professional polish and licensing clarity.

🚀 Final Step: Run the App

app.mainloop()

🎉 What You Built

✅ A real desktop ML app
✅ Topic modeling with LDA
✅ Visual word clouds
✅ CSV export
✅ Save/load projects
