DEV Community

Mate Technologies

🔗 Build a Professional Link Extractor GUI in Python (Step-by-Step)

In this tutorial, we’ll build LinkVault, a professional desktop application that:

✅ Recursively scans folders
✅ Extracts URLs from .txt, .pdf, and .html files
✅ Removes duplicate links automatically
✅ Shows a smooth animated progress bar
✅ Lets users export or copy links
✅ Uses a modern PySide6 GUI

This guide is beginner-friendly and explains why each part exists.

🧰 Requirements

Before starting, install the dependencies:

pip install PySide6 PyPDF2

📁 Project Structure
linkvault/
├── main.py
└── logo.ico

🧠 Step 1: Import Required Modules

We start by importing Python’s built-in modules and PySide6 components.

import os
import sys
import re
import subprocess
import platform

Why these?

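os → file system access (walking folders, building paths)

sys → command-line arguments, the exit code, and the PyInstaller bundle path

re → extract URLs using regex

platform, subprocess → open links cross-platform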

PySide6 GUI Imports
from PySide6.QtWidgets import (
    QApplication, QWidget, QFileDialog, QVBoxLayout, QHBoxLayout,
    QPushButton, QLabel, QLineEdit, QListWidget, QProgressBar,
    QMessageBox, QCheckBox
)
from PySide6.QtCore import Qt, QThread, Signal, QTimer
from PySide6.QtGui import QIcon, QGuiApplication

These components allow us to build a modern desktop interface.

PDF Support

import PyPDF2

This library lets us extract the text of each PDF page, which we then search for links.

📦 Step 2: Handle Bundled App Resources

When packaging with PyInstaller, files like icons need special handling.

def resource_path(file_name):
    base_path = getattr(sys, '_MEIPASS', os.path.dirname(os.path.abspath(__file__)))
    return os.path.join(base_path, file_name)

This ensures logo.ico works both in development and packaged builds.

🧵 Step 3: Create a Worker Thread (Very Important!)
Why a Worker Thread?

If we scan files on the main GUI thread, the app will freeze.
We fix this using QThread.

Worker Thread Skeleton

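class LinkExtractWorker(QThread):
    found = Signal(str)     # emitted once per newly discovered URL
    progress = Signal(int)  # emitted with the percentage of files scanned
    finished = Signal()     # note: this shadows QThread's built-in finished signal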

Signals allow safe communication from the worker to the UI.

Worker Initialization

def __init__(self, folder, file_types):
    super().__init__()
    self.folder = folder
    self.file_types = file_types
    self._running = True
    self.seen_links = set()

seen_links is a set of URLs already reported, so each link is emitted only once.

_running is a flag the scan loop checks, which lets the user cancel a scan.

Stop the Worker Safely

def stop(self):
    self._running = False
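The flag only helps if the scanning loop actually checks it. Here is a minimal, Qt-free sketch of that cooperative pattern. `CancellableScan` and `on_item` are hypothetical names used only for illustration; in the real worker this check would presumably live inside the thread's `run()` method.

```python
class CancellableScan:
    """Plain-Python stand-in for the worker's cooperative cancellation."""

    def __init__(self, items):
        self.items = list(items)
        self._running = True
        self.processed = []

    def stop(self):
        # Only sets a flag; the loop below decides when to exit.
        self._running = False

    def run(self, on_item=None):
        for item in self.items:
            if not self._running:  # checked once per item
                break
            self.processed.append(item)
            if on_item is not None:
                on_item(item)


scan = CancellableScan(["a.txt", "b.pdf", "c.html"])
# Simulate the user pressing Cancel while the second file is being handled.
scan.run(on_item=lambda name: scan.stop() if name == "b.pdf" else None)
print(scan.processed)
```

The file that is in progress still finishes; the loop exits before starting the next one. That is why cancellation in this style is never instant for very large files.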

Walk the Folder Recursively

all_files = []  # paths that pass the file-type filter
for root, dirs, files in os.walk(self.folder):  # visits every subfolder
    for f in files:
        ext = os.path.splitext(f)[1].lower()  # ".PDF" becomes ".pdf"

We scan every subfolder automatically.

Filter Files by Type

if (ext == '.txt' and self.file_types['txt']) or \
   (ext == '.pdf' and self.file_types['pdf']) or \
   (ext in ['.html', '.htm'] and self.file_types['html']):
    all_files.append(os.path.join(root, f))

Checkboxes control what gets scanned.
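Put together, the walk and the filter can be sketched as one standalone function. `collect_files` is a hypothetical helper name, and `file_types` is assumed to be a dict with the keys `'txt'`, `'pdf'`, and `'html'`, as in the snippet above.

```python
import os


def collect_files(folder, file_types):
    """Return paths under `folder` whose extension is enabled in `file_types`."""
    enabled = set()
    if file_types.get('txt'):
        enabled.add('.txt')
    if file_types.get('pdf'):
        enabled.add('.pdf')
    if file_types.get('html'):
        enabled.update({'.html', '.htm'})

    all_files = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext in enabled:
                all_files.append(os.path.join(root, name))
    return all_files
```

Building the list first also gives us the total file count, which the progress calculation needs later.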

Extract URLs from Text & HTML

urls = re.findall(r'https?://[^\s"\'>]+', text)

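This regex matches http and https URLs and stops at whitespace, a quote, or a closing angle bracket. It can still pick up trailing punctuation, such as the period at the end of a sentence.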
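To see what the pattern captures, here is a small sketch that runs it on a sample string and trims common trailing punctuation. `extract_urls` and the sample text are illustrative assumptions, and the trimming step is an addition that is not part of the original snippet.

```python
import re

URL_PATTERN = re.compile(r'https?://[^\s"\'>]+')


def extract_urls(text):
    """Find http(s) URLs and trim punctuation that usually belongs to the sentence."""
    return [match.rstrip('.,;:!?)') for match in URL_PATTERN.findall(text)]


sample = 'See https://example.com/docs. Also <a href="http://example.org/a?b=1">link</a>'
print(extract_urls(sample))
# ['https://example.com/docs', 'http://example.org/a?b=1']
```

Trimming `)` is a trade-off: it cleans up links written inside parentheses but would also shorten a URL that legitimately ends with one.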

Extract URLs from PDFs

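reader = PyPDF2.PdfReader(f)  # f: the PDF file opened in binary mode ('rb')
for page in reader.pages:
    text = page.extract_text() or ""  # pages without extractable text become ""

We read the PDF one page at a time and search each page's text for URLs.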
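One way to make the page loop more defensive is to skip pages that fail to parse instead of aborting the whole file. The sketch below is an assumption about how that could look, not the article's exact code. `urls_from_pages` is a hypothetical helper that accepts any objects with an `extract_text()` method, such as `reader.pages`.

```python
import re

URL_PATTERN = re.compile(r'https?://[^\s"\'>]+')


def urls_from_pages(pages):
    """Collect URLs from objects exposing extract_text(), e.g. reader.pages."""
    urls = []
    for page in pages:
        try:
            text = page.extract_text() or ""  # guard against None or empty pages
        except Exception:
            continue  # skip a page that fails to parse, keep scanning the rest
        urls.extend(URL_PATTERN.findall(text))
    return urls
```

With PyPDF2 this would be called as `urls_from_pages(PyPDF2.PdfReader(f).pages)` while the file `f` is still open. Links stored only as clickable annotations, without visible text, are not found this way.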

Emit Found Links

if url not in self.seen_links:
    self.seen_links.add(url)
    self.found.emit(url)

No duplicates. Clean output.
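The same set-based check works outside the worker too. This sketch uses a hypothetical helper, `unique_in_order`, to show that the first occurrence of each URL is kept and the original order is preserved.

```python
def unique_in_order(urls):
    """Drop repeated URLs while keeping the order of first appearance."""
    seen = set()
    unique = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


links = ["https://a.example", "https://b.example", "https://a.example"]
print(unique_in_order(links))
# ['https://a.example', 'https://b.example']
```

The comparison is an exact string match, so `https://a.example` and `https://a.example/` count as two different links.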

Update Progress

percent = int((i + 1) / total_files * 100)
self.progress.emit(percent)
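Here `i` is the zero-based index of the file just processed and `total_files` is the number of files collected earlier. As a sketch, the same formula can be wrapped in a function with a guard for an empty folder. `percent_done` is a hypothetical name, and returning 100 for zero files is an assumed design choice.

```python
def percent_done(index, total_files):
    """Percentage complete after finishing the file at zero-based `index`."""
    if total_files <= 0:
        return 100  # nothing to scan: report completion instead of dividing by zero
    return int((index + 1) / total_files * 100)


print([percent_done(i, 3) for i in range(3)])
# [33, 66, 100]
```

Because `int()` truncates, intermediate values round down, but the last file always reports exactly 100.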

🖥 Step 4: Create the Main Application Window

class LinkExtractorApp(QWidget):
    def __init__(self):
        super().__init__()

This class controls everything the user sees.

Window Setup

self.setWindowTitle("LinkVault – Professional Link Extractor")
self.setMinimumSize(1200, 680)
self.setWindowIcon(QIcon(resource_path("logo.ico")))

🎛 Step 5: Build the User Interface
Folder Selection

self.path_input = QLineEdit()
self.path_input.setReadOnly(True)

Users can’t type paths manually; they can only browse for a folder.

Buttons

browse_btn = QPushButton("📂 Browse Folder")
self.start_btn = QPushButton("🚀 Extract Links")
self.cancel_btn = QPushButton("⏹ Cancel")

Clear, emoji-based UX 👍

File Type Filters

self.txt_checkbox = QCheckBox(".txt")
self.pdf_checkbox = QCheckBox(".pdf")
self.html_checkbox = QCheckBox(".html/.htm")

Results List

self.results_list = QListWidget()
self.results_list.itemDoubleClicked.connect(self.open_item)

Double-click opens links in the browser.

✨ Step 6: Animated Progress Bar
Why Not Default?

We want a smooth glowing animation, not a jumpy bar.

Smooth Progress Logic

def update_progress_smooth(self):
    if self.smooth_value < self.target_progress:
        self.smooth_value += 1
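The idea is that the worker sets a target percentage, and a timer moves the displayed value toward it one step per tick. Below is a Qt-free sketch of that stepping logic. `SmoothProgress`, `set_target`, and `tick` are hypothetical names; in the app, a `QTimer` would presumably call `update_progress_smooth` and pass the result to the progress bar.

```python
class SmoothProgress:
    """Moves a displayed value toward a target, one step per tick."""

    def __init__(self):
        self.smooth_value = 0
        self.target_progress = 0

    def set_target(self, percent):
        # Clamp to the 0-100 range a progress bar expects.
        self.target_progress = max(0, min(100, percent))

    def tick(self):
        """One timer tick: advance by at most one and return the value to show."""
        if self.smooth_value < self.target_progress:
            self.smooth_value += 1
        return self.smooth_value


bar = SmoothProgress()
bar.set_target(3)
frames = [bar.tick() for _ in range(5)]
print(frames)
# [1, 2, 3, 3, 3]
```

Once the displayed value reaches the target, further ticks leave it unchanged until the worker reports more progress.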

Glowing Gradient Effect

QProgressBar::chunk {
    /* horizontal gradient: coordinates run from the left edge to the right edge */
    background: qlineargradient(
        x1:0, y1:0, x2:1, y2:0,
        stop:0 #2563eb,
        stop:0.5 #60a5fa,
        stop:1 #2563eb
    );
}

Looks professional and modern.

📤 Step 7: Export & Clipboard Support
Export to TXT

with open(path, 'w', encoding='utf-8') as f:
    # write one link per line, in the order shown in the results list
    for i in range(self.results_list.count()):
        f.write(self.results_list.item(i).text() + "\n")

Copy to Clipboard

links = "\n".join(self.results_list.item(i).text() for i in range(self.results_list.count()))
QGuiApplication.clipboard().setText(links)  # links is one newline-separated string

Works instantly across platforms.

🌐 Step 8: Open Links Cross-Platform

if platform.system() == "Windows":
    os.startfile(url)
elif platform.system() == "Darwin":
    subprocess.Popen(["open", url])
else:
    subprocess.Popen(["xdg-open", url])
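Because these calls hand the string straight to the operating system, it can be worth confirming that it really is a web URL first. This minimal sketch uses the standard library's `urllib.parse`. `is_openable_url` is a hypothetical helper, and restricting links to http and https is an assumption based on the regex used for extraction.

```python
from urllib.parse import urlparse


def is_openable_url(url):
    """True only for http(s) URLs that include a host."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        return False  # malformed input, e.g. an unbalanced IPv6 bracket
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_openable_url("https://example.com/page"))  # True
print(is_openable_url("file:///etc/passwd"))        # False
```

Calling this check before the platform branch keeps local file paths and other schemes from being launched by a double-click.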

🎨 Step 9: Apply Modern Styling

QWidget {
    background-color: #0f172a;
    color: #e5e7eb;
}

Dark mode, rounded buttons, and soft colors.

▶ Step 10: Run the App

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = LinkExtractorApp()
    window.show()
    sys.exit(app.exec())

✅ Final Features Recap

✔ Recursive folder scanning
✔ URL extraction from multiple formats
✔ Duplicate removal
✔ Cancelable background processing
✔ Animated progress bar
✔ Export & clipboard support
✔ Modern UI

🚀 Next Improvements (Optional)

CSV export

Domain grouping

Regex customization

Drag-and-drop folders

URL validation

LinkVault
