In this tutorial, we'll build LinkVault, a professional desktop application that:
- Recursively scans folders
- Extracts URLs from .txt, .pdf, and .html files
- Removes duplicate links automatically
- Shows a smooth animated progress bar
- Lets users export or copy links
- Uses a modern PySide6 GUI
This guide is beginner-friendly and explains why each part exists.
Requirements
Before starting, install the dependencies:
pip install PySide6 PyPDF2
Project Structure
linkvault/
├── main.py
└── logo.ico
Step 1: Import Required Modules
We start by importing Python's built-in modules and PySide6 components.
import os
import sys
import re
import subprocess
import platform
Why these?
os, sys → file system access, command-line arguments, and the PyInstaller resource path
re → extract URLs using regex
platform, subprocess → open links cross-platform
PySide6 GUI Imports
from PySide6.QtWidgets import (
    QApplication, QWidget, QFileDialog, QVBoxLayout, QHBoxLayout,
    QPushButton, QLabel, QLineEdit, QListWidget, QProgressBar,
    QMessageBox, QCheckBox
)
from PySide6.QtCore import Qt, QThread, Signal, QTimer
from PySide6.QtGui import QIcon, QGuiApplication
These components allow us to build a modern desktop interface.
PDF Support
import PyPDF2
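This library lets us extract the text of PDF files, which we then search for URLs. Links that exist only as clickable annotations, with no visible URL text on the page, are not picked up this way.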
Step 2: Handle Bundled App Resources
When packaging with PyInstaller, files like icons need special handling.
def resource_path(file_name):
    base_path = getattr(sys, '_MEIPASS', os.path.dirname(os.path.abspath(__file__)))
    return os.path.join(base_path, file_name)
This ensures logo.ico works both in development and packaged builds.
Step 3: Create a Worker Thread (Very Important!)
Why a Worker Thread?
If we scan files on the main GUI thread, the app will freeze.
We fix this using QThread.
Worker Thread Skeleton
class LinkExtractWorker(QThread):
    found = Signal(str)
    progress = Signal(int)
    finished = Signal()
Signals allow safe communication from the worker to the UI.
Worker Initialization
    def __init__(self, folder, file_types):
        super().__init__()
        self.folder = folder
        self.file_types = file_types
        self._running = True
        self.seen_links = set()
seen_links prevents duplicates
_running allows cancellation
Stop the Worker Safely
    def stop(self):
        self._running = False
Walk the Folder Recursively
for root, dirs, files in os.walk(self.folder):
    for f in files:
        ext = os.path.splitext(f)[1].lower()
We scan every subfolder automatically.
Filter Files by Type
if (ext == '.txt' and self.file_types['txt']) or \
   (ext == '.pdf' and self.file_types['pdf']) or \
   (ext in ['.html', '.htm'] and self.file_types['html']):
    all_files.append(os.path.join(root, f))
Checkboxes control what gets scanned.
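Put together, the walk and the filter can be sketched as one standalone function. This is a minimal version under assumptions: the name collect_files and the EXTENSION_GROUPS table are illustrative, and file_types is taken to be a dict with 'txt', 'pdf', and 'html' keys, as the snippet above suggests.

```python
import os

# Maps each extension to the file_types key (checkbox) that enables it.
# This lookup table is an illustrative replacement for the chained if-condition.
EXTENSION_GROUPS = {".txt": "txt", ".pdf": "pdf", ".html": "html", ".htm": "html"}


def collect_files(folder, file_types):
    """Return paths under `folder` whose extension is enabled in `file_types`."""
    all_files = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # Lower-case the extension so "REPORT.PDF" is treated like "report.pdf".
            ext = os.path.splitext(name)[1].lower()
            group = EXTENSION_GROUPS.get(ext)
            if group is not None and file_types.get(group, False):
                all_files.append(os.path.join(root, name))
    return all_files
```

Collecting the full list before reading any file is what makes a percentage-based progress bar possible, because the total is known up front.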
Extract URLs from Text & HTML
urls = re.findall(r'https?://[^\s"\'>]+', text)
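This regex matches anything that starts with http:// or https:// and runs until the next whitespace, quote, or > character. It is deliberately loose, so trailing punctuation such as a period or a closing parenthesis can end up inside a match.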
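One way to tidy the matches is to trim punctuation that commonly trails a URL in running text. The sketch below is an assumption about how you might do this, not part of the original app; extract_urls and TRAILING_PUNCTUATION are illustrative names.

```python
import re

# Same pattern as in the tutorial, compiled once for reuse.
URL_PATTERN = re.compile(r'https?://[^\s"\'>]+')

# Characters that often follow a URL in prose but rarely belong to it (assumed set).
TRAILING_PUNCTUATION = ".,;:!?)]}"


def extract_urls(text):
    """Find http(s) URLs in `text` and trim punctuation that trails a match."""
    return [match.rstrip(TRAILING_PUNCTUATION) for match in URL_PATTERN.findall(text)]
```

The trade-off is that a URL which legitimately ends in one of these characters, for example a path ending in a closing parenthesis, gets shortened too.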
Extract URLs from PDFs
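reader = PyPDF2.PdfReader(f)
for page in reader.pages:
    # Image-only pages may yield no text; fall back to an empty string
    text = page.extract_text() or ""
We read the PDF one page at a time and treat pages without extractable text as empty.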
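What "safely" means in practice is a judgment call. One reading is that a damaged page should be skipped instead of aborting the whole file. The sketch below assumes that interpretation. It accepts any objects with an extract_text() method, so it runs without a real PDF; in the app you would pass PyPDF2.PdfReader(path).pages. The name urls_from_pages is illustrative.

```python
import re

URL_PATTERN = re.compile(r'https?://[^\s"\'>]+')


def urls_from_pages(pages):
    """Collect URLs from objects that expose an extract_text() method."""
    urls = []
    for page in pages:
        try:
            # An empty or missing result is treated as a page with no text.
            text = page.extract_text() or ""
        except Exception:
            # Assumed policy: skip a page that fails to parse and keep going.
            continue
        urls.extend(URL_PATTERN.findall(text))
    return urls
```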
Emit Found Links
if url not in self.seen_links:
    self.seen_links.add(url)
    self.found.emit(url)
No duplicates. Clean output.
Update Progress
percent = int((i + 1) / total_files * 100)
self.progress.emit(percent)
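To see how cancellation, duplicate removal, and progress reporting fit together, here is a Qt-free sketch of the worker loop. It is an illustration under assumptions: scan, read_text, and is_running are hypothetical names, and the yielded tuples stand in for the found and progress signal emissions that the real worker would make inside its run() method.

```python
import re

URL_PATTERN = re.compile(r'https?://[^\s"\'>]+')


def scan(files, read_text, is_running=lambda: True):
    """Yield ("found", url) and ("progress", percent) events for `files`."""
    seen_links = set()
    total_files = len(files)
    for i, path in enumerate(files):
        # Check the flag once per file so a cancel request takes effect quickly.
        if not is_running():
            break
        for url in URL_PATTERN.findall(read_text(path)):
            if url not in seen_links:
                seen_links.add(url)
                yield ("found", url)
        # Inside the loop total_files is at least 1, so the division is safe.
        yield ("progress", int((i + 1) / total_files * 100))
```

Because the flag is only checked between files, cancelling in the middle of one very large file still waits for that file to finish.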
Step 4: Create the Main Application Window
class LinkExtractorApp(QWidget):
    def __init__(self):
        super().__init__()
This class controls everything the user sees.
Window Setup
self.setWindowTitle("LinkVault - Professional Link Extractor")
self.setMinimumSize(1200, 680)
self.setWindowIcon(QIcon(resource_path("logo.ico")))
Step 5: Build the User Interface
Folder Selection
self.path_input = QLineEdit()
self.path_input.setReadOnly(True)
Users can't type paths manually; they can only browse.
Buttons
browse_btn = QPushButton("📁 Browse Folder")
self.start_btn = QPushButton("🔍 Extract Links")
self.cancel_btn = QPushButton("⏹ Cancel")
Clear, emoji-based UX.
File Type Filters
self.txt_checkbox = QCheckBox(".txt")
self.pdf_checkbox = QCheckBox(".pdf")
self.html_checkbox = QCheckBox(".html/.htm")
Results List
self.results_list = QListWidget()
self.results_list.itemDoubleClicked.connect(self.open_item)
Double-click opens links in the browser.
Step 6: Animated Progress Bar
Why Not Default?
We want a smooth glowing animation, not a jumpy bar.
Smooth Progress Logic
    def update_progress_smooth(self):
        if self.smooth_value < self.target_progress:
            self.smooth_value += 1
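The idea is that the worker sets a target percentage, and a timer nudges the displayed value toward it one step per tick. The sketch below models only that arithmetic, without Qt. step_toward and animate are illustrative names, and the assumption is that the app drives update_progress_smooth from a QTimer at a fixed interval.

```python
def step_toward(smooth_value, target_progress, step=1):
    """Move `smooth_value` one step closer to `target_progress` without overshooting."""
    if smooth_value < target_progress:
        return min(smooth_value + step, target_progress)
    return smooth_value


def animate(start, target_progress):
    """Return the values the bar would display, one per timer tick."""
    frames = []
    value = start
    while value < target_progress:
        value = step_toward(value, target_progress)
        frames.append(value)
    return frames
```

With a step of 1, a jump from 0 to 50 percent is drawn as fifty small updates instead of one large one, which is what makes the bar look smooth.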
Glowing Gradient Effect
QProgressBar::chunk {
    /* x1/y1/x2/y2 set the direction; left-to-right is assumed here */
    background: qlineargradient(
        x1:0, y1:0, x2:1, y2:0,
        stop:0 #2563eb,
        stop:0.5 #60a5fa,
        stop:1 #2563eb
    );
}
Looks professional and modern.
Step 7: Export & Clipboard Support
Export to TXT
with open(path, 'w', encoding='utf-8') as f:
    f.write(link + "\n")
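The export snippet writes a single link; a complete version loops over all of them. Here is a minimal sketch, assuming the links are available as a list of strings and that path comes from a save dialog. export_links is an illustrative name.

```python
def export_links(links, path):
    """Write one link per line to `path` as UTF-8 and return how many were written."""
    with open(path, "w", encoding="utf-8") as f:
        for link in links:
            f.write(link + "\n")
    return len(links)
```

Returning the count makes it easy to show a confirmation such as "Saved 42 links" in a message box afterwards.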
Copy to Clipboard
QGuiApplication.clipboard().setText(links)
Works instantly across platforms.
Step 8: Open Links Cross-Platform
if platform.system() == "Windows":
    os.startfile(url)
elif platform.system() == "Darwin":
    subprocess.Popen(["open", url])
else:
    subprocess.Popen(["xdg-open", url])
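The platform check is easier to test if the decision is separated from the side effect. The sketch below returns the command to run without launching anything. opener_command is an illustrative name, and the optional system parameter exists only so each branch can be exercised on any machine.

```python
import platform


def opener_command(url, system=None):
    """Return the argv list that opens `url`, or None where os.startfile is used."""
    system = system or platform.system()
    if system == "Windows":
        # Windows has no opener binary here; the app calls os.startfile(url).
        return None
    if system == "Darwin":
        return ["open", url]
    # Assumed default for Linux and other desktops that ship xdg-open.
    return ["xdg-open", url]
```

If you would prefer not to branch at all, the standard library's webbrowser.open(url) is a common alternative that picks a browser for the current platform.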
Step 9: Apply Modern Styling
QWidget {
    background-color: #0f172a;
    color: #e5e7eb;
}
Dark mode, rounded buttons, and soft colors.
Step 10: Run the App
if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = LinkExtractorApp()
    window.show()
    sys.exit(app.exec())
Final Features Recap
- Recursive folder scanning
- URL extraction from multiple formats
- Duplicate removal
- Cancelable background processing
- Animated progress bar
- Export & clipboard support
- Modern UI
Next Improvements (Optional)
CSV export
Domain grouping
Regex customization
Drag-and-drop folders
URL validation
