Mate Technologies

Posted on Jan 14

How to Build DupCleaner PRO – A Python Duplicate File Finder & Cleaner

#python #desktopapp #duplicatefiles #filemanagement

DupCleaner PRO v1.0.0 is a professional Python desktop app that helps you find, preview, and safely delete duplicate files. In this tutorial, we’ll walk through how it works, step by step, with code snippets and explanations.

Step 1: Project Setup

First, clone the repository from GitHub:

git clone https://github.com/rogers-cyber/DupCleanerPRO.git
cd DupCleanerPRO

Then, create a Python virtual environment (optional but recommended):

python -m venv venv
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate     # Windows

Install the required Python packages:

pip install ttkbootstrap pillow send2trash

ttkbootstrap: Modern Tkinter themes for your GUI

Pillow: Image handling for thumbnails

send2trash: Safe deletion (moves files to the Recycle Bin or Trash instead of deleting them permanently)

Step 2: Basic GUI Setup

We’ll use ttkbootstrap and Tkinter to create the main window:

import ttkbootstrap as tb

# Create main application window
app = tb.Window(themename="darkly")
app.title("DupCleaner PRO")
app.geometry("1200x680")

# Add title label
tb.Label(app, text="DupCleaner PRO", font=("Segoe UI", 22, "bold")).pack(pady=(10, 2))

app.mainloop()

Explanation:

tb.Window creates the main window with a dark theme

tb.Label adds a title

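app.mainloop() starts the GUI event loop and blocks until the window is closed, so the widgets from the later steps must be created before this call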

Step 3: Add File and Folder Selection

Users need to choose files or folders to scan. Let’s add a listbox with buttons:

from tkinter import filedialog

target_paths = []

def add_files():
    files = filedialog.askopenfilenames()
    if files:
        for f in files:
            if f not in target_paths:
                target_paths.append(f)
                target_listbox.insert("end", f)

def add_folder():
    folder = filedialog.askdirectory()
    if folder and folder not in target_paths:
        target_paths.append(folder)
        target_listbox.insert("end", folder)

askopenfilenames() lets users pick multiple files

askdirectory() lets users select a folder

target_paths stores all selected paths

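Add a Listbox to show the selected paths, plus two buttons that call the functions above:

import tkinter as tk

target_listbox = tk.Listbox(app, height=5, width=60)
target_listbox.pack(pady=10)

# Buttons that open the file and folder dialogs
tb.Button(app, text="Add Files", command=add_files).pack(pady=2)
tb.Button(app, text="Add Folder", command=add_folder).pack(pady=2)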

Step 4: Scanning for Duplicates

We’ll scan files using size first, then hash for accuracy:

import hashlib
import os
from collections import defaultdict

def file_hash(path, chunk_size=65536):
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            md5.update(data)
    return md5.hexdigest()

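def scan_duplicates(files):
    # Pass 1: group by size; a file with a unique size cannot have a duplicate
    size_map = defaultdict(list)
    for f in files:
        try:
            size_map[os.path.getsize(f)].append(f)
        except OSError:
            continue  # skip missing or unreadable files

    # Pass 2: hash only the files that share their size with another file
    duplicates = []
    for group in size_map.values():
        if len(group) > 1:
            hash_map = defaultdict(list)
            for f in group:
                try:
                    h = file_hash(f)
                except OSError:
                    continue  # file vanished or became unreadable mid-scan
                hash_map[h].append(f)
            for dup_group in hash_map.values():
                if len(dup_group) > 1:
                    duplicates.append(dup_group)
    return duplicates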

Explanation:

size_map groups files by size (fast pre-check)

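hash_map groups same-sized files by MD5 digest; files with the same size and digest are treated as identical (MD5 serves here as a fast content fingerprint, not as a security measure)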

Only groups with more than one file are considered duplicates
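Note that scan_duplicates expects a flat list of files, while target_paths from Step 3 can hold folders as well. Below is a minimal sketch of one way to expand the selection, assuming a hypothetical helper named collect_files (not part of the original code). It walks folders with os.walk and de-duplicates by resolved path, so a file that was picked directly and also sits inside a selected folder is not later reported as a duplicate of itself.

```python
import os


def collect_files(paths):
    """Expand a mix of file and folder paths into a flat list of unique files.

    Hypothetical helper for illustration; its name and behavior are assumptions.
    """
    seen = set()  # resolved paths that were already collected
    files = []
    for path in paths:
        if os.path.isdir(path):
            # Walk the folder tree and gather every file beneath it
            candidates = (
                os.path.join(root, name)
                for root, _dirs, names in os.walk(path)
                for name in names
            )
        else:
            candidates = [path]
        for candidate in candidates:
            real = os.path.realpath(candidate)
            # Ignore paths that do not exist and files seen before
            if os.path.isfile(real) and real not in seen:
                seen.add(real)
                files.append(real)
    return files
```

With such a helper, a scan could be started as scan_duplicates(collect_files(target_paths)).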

Step 5: Display Results in a Treeview

We can show duplicate groups using a Treeview widget:

tree = tb.Treeview(app, columns=("group", "count"), show="headings")
tree.heading("group", text="Group")
tree.heading("count", text="Count")
tree.pack(fill="both", expand=True, padx=10, pady=10)

def show_duplicates(duplicates):
    tree.delete(*tree.get_children())
    for i, group in enumerate(duplicates, 1):
        tree.insert("", "end", values=(f"Group {i}", len(group)))

Explanation:

Treeview shows duplicate groups in a table format

Selecting a group can later show its file list or thumbnails (the table itself stores only the group label and the file count)
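Because the table rows carry only a label and a count, a click handler needs some way to get from a row back to its files. One possible approach is to record the item ID that Treeview.insert returns for each row and look it up when the selection changes. The sketch below assumes two hypothetical helpers, populate_tree and files_for_selection, neither of which is in the original code.

```python
def populate_tree(tree, duplicates):
    """Fill the Treeview and return a dict mapping row ID -> list of files.

    Hypothetical helper; `tree` is assumed to be a ttk/ttkbootstrap Treeview.
    """
    tree.delete(*tree.get_children())
    row_files = {}
    for i, group in enumerate(duplicates, 1):
        # insert() returns the ID of the newly created row
        item_id = tree.insert("", "end", values=(f"Group {i}", len(group)))
        row_files[item_id] = list(group)
    return row_files


def files_for_selection(tree, row_files):
    """Return the files that belong to the currently selected rows."""
    selected = []
    for item_id in tree.selection():
        selected.extend(row_files.get(item_id, []))
    return selected
```

A handler bound with tree.bind("<<TreeviewSelect>>", ...) could then pass the result of files_for_selection to a preview or delete function.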

Step 6: Preview Files with Thumbnails

For image files, we can display thumbnails:

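from PIL import Image, ImageTk

# Container for the thumbnails, packed below the results table
preview_frame = tk.Frame(app)
preview_frame.pack(fill="x", padx=10, pady=5)

thumbnail_cache = []  # keeps PhotoImage references alive

def show_thumbnails(files):
    # Clear the previous preview before drawing the new one
    for widget in preview_frame.winfo_children():
        widget.destroy()
    thumbnail_cache.clear()
    for f in files:
        if f.lower().endswith((".png", ".jpg", ".jpeg", ".gif", ".bmp")):
            try:
                with Image.open(f) as img:
                    img.thumbnail((120, 120))
                    tk_img = ImageTk.PhotoImage(img)
            except OSError:
                continue  # skip missing, unreadable, or corrupt images
            thumbnail_cache.append(tk_img)
            lbl = tk.Label(preview_frame, image=tk_img, text=os.path.basename(f), compound="top")
            lbl.pack(side="left", padx=5, pady=5)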

thumbnail_cache keeps references to images to prevent garbage collection

Only image files are displayed as thumbnails

Step 7: Safe Deletion

We can delete selected duplicates using send2trash:

from send2trash import send2trash

def delete_files(file_list):
    for f in file_list:
        try:
            send2trash(f)
        except Exception as e:
            print(f"Failed to delete {f}: {e}")

Explanation:

send2trash moves files to the Recycle Bin instead of permanent deletion

Files deleted by mistake can be restored from there

Step 8: Export Results

Users can save duplicate reports to JSON or TXT:

import json

def export_json(duplicates, path="duplicates.json"):
    data = {f"Group {i+1}": lst for i, lst in enumerate(duplicates)}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)

def export_txt(duplicates, path="duplicates.txt"):
    # UTF-8 so non-ASCII file names do not fail under a platform-specific default encoding
    with open(path, "w", encoding="utf-8") as f:
        for i, group in enumerate(duplicates, 1):
            f.write(f"Group {i} ({len(group)} files)\n")
            for file in group:
                f.write(f"{file}\n")
            f.write("\n")

Step 9: Putting It All Together

Combine all the steps into the main app:

GUI layout

File selection

Scan duplicates button

Preview panel

Delete duplicates button

Export buttons

At the end, call:

app.mainloop()

And your DupCleaner PRO is ready to use!
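One detail matters when wiring the scan button: hashing large files inside a button callback blocks the event loop, so the window freezes until the scan finishes. Below is a minimal sketch of moving the work to a worker thread. It assumes a hypothetical helper run_in_background (not in the original code) that hands the outcome back through a queue.

```python
import queue
import threading


def run_in_background(func, *args):
    """Run func(*args) in a worker thread and return the queue that receives the outcome.

    Hypothetical helper. The queue gets exactly one item:
    ("ok", result) on success or ("error", exception) on failure.
    """
    results = queue.Queue()

    def worker():
        try:
            results.put(("ok", func(*args)))
        except Exception as exc:
            # Report the failure instead of losing it inside the thread
            results.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    return results
```

In the GUI, the scan button could call run_in_background(scan_duplicates, files) and then poll the queue from app.after(100, ...), calling results.get_nowait() and rescheduling itself on queue.Empty. That way all widget updates stay on the main thread.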

Step 10: Next Steps / Enhancements

Add logic to keep the newest or the first file in each group

Add progress bar and ETA during scanning

Add custom settings save/load

Add About / Help section
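As a starting point for the keep-newest/first idea, here is a minimal sketch of a keep-one rule. The helper name files_to_delete and its keep parameter are hypothetical, and "newest" is assumed to mean the latest modification time as reported by os.path.getmtime.

```python
import os


def files_to_delete(group, keep="newest"):
    """Return every file in a duplicate group except the one to keep.

    Hypothetical helper. keep="newest" keeps the most recently modified file;
    keep="first" keeps the first file in the list.
    """
    if len(group) < 2:
        return []  # nothing to delete when there is no duplicate
    if keep == "newest":
        keeper = max(group, key=os.path.getmtime)
    elif keep == "first":
        keeper = group[0]
    else:
        raise ValueError(f"unknown keep rule: {keep!r}")
    return [f for f in group if f != keeper]
```

The returned list could be passed to delete_files from Step 7, once per duplicate group.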

✅ Summary:
This step-by-step guide walked through building DupCleaner PRO, covering file selection, duplicate detection, thumbnails, safe deletion, and export. With these basics, you can extend the app with features like threaded scanning, more file previews, or enhanced UI.

GitHub Repository: https://github.com/rogers-cyber/DupCleanerPRO
