In this tutorial, we’ll build a desktop app that:
✅ Extracts links from files (.txt, .pdf, .html)
✅ Filters links (include/exclude keywords)
✅ Checks if links are broken
✅ Displays results with colors (🟢 working / 🔴 broken)
✅ Uses a modern GUI with PySide6
📦 Step 1: Install Dependencies
First, install required packages:
pip install PySide6 requests PyPDF2
🔧 Step 2: Import Required Libraries
We start by importing everything we need:
import os
import sys
import re
import requests
import time
import platform
import subprocess
from PySide6.QtWidgets import *
from PySide6.QtCore import Qt, QThread, Signal, QTimer
from PySide6.QtGui import QColor, QIcon, QGuiApplication
import PyPDF2
💡 Explanation:
os, re → file handling + regex
requests → check links
PySide6 → GUI framework
PyPDF2 → extract text from PDFs
platform, subprocess → open links with the system’s default handler
🧵 Step 3: Create a Background Worker (QThread)
We use a thread so the UI doesn’t freeze while scanning.
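class LinkWorker(QThread):
    found = Signal(str, bool)   # emitted per link: (url, is_broken)
    progress = Signal(int)      # percent complete, 0–100
    # No custom "finished" signal: QThread already emits finished when run() returns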
💡 Why?
GUI apps must stay responsive, so the heavy work (reading files and sending HTTP requests) runs in a separate thread.
Step 3.1: Initialize Worker
def __init__(self, folder, file_types, check_broken, include_words=None, exclude_words=None):
    super().__init__()
    self.folder = folder
    self.file_types = file_types
    self.check_broken = check_broken
    self.include_words = include_words or []
    self.exclude_words = exclude_words or []
    self.seen_links = set()
    self._running = True
💡 Features:
Avoids duplicate links (seen_links)
Supports include/exclude filters
Allows stopping the scan (_running flag)
Step 3.2: Scan Files
def run(self):
    all_files = []
    for root, _, files in os.walk(self.folder):
        for f in files:
            ext = os.path.splitext(f)[1].lower()
            if (ext == '.txt' and self.file_types['txt']) or \
               (ext == '.pdf' and self.file_types['pdf']) or \
               (ext in ['.html', '.htm'] and self.file_types['html']):
                all_files.append(os.path.join(root, f))
💡 What happens:
Recursively scans folders
Filters only selected file types
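If you want to test the scanning logic without Qt, the same walk-and-filter step can be pulled into a plain function. This is a sketch under assumptions: collect_files and the EXTENSIONS mapping are hypothetical names, and the result is sorted only to make the output predictable.

```python
import os

# Hypothetical mapping from checkbox key to the extensions it enables
EXTENSIONS = {
    "txt": {".txt"},
    "pdf": {".pdf"},
    "html": {".html", ".htm"},
}


def collect_files(folder, file_types):
    """Recursively collect files whose extension is enabled in file_types."""
    # Build the set of allowed extensions from the ticked options
    allowed = set()
    for key, enabled in file_types.items():
        if enabled:
            allowed |= EXTENSIONS.get(key, set())

    matches = []
    for root, _, files in os.walk(folder):
        for name in files:
            # lower() so "report.PDF" is treated like "report.pdf"
            ext = os.path.splitext(name)[1].lower()
            if ext in allowed:
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

A set lookup keeps the condition flat, so adding another file type only means adding one entry to the mapping.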
Step 3.3: Extract Links
urls = re.findall(r'https?://[^\s"\'>]+', text)
💡 Regex explained:
Matches http:// or https://
Then takes every following character until it hits whitespace, a double or single quote, or >
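To see what the pattern captures, here is a small standalone check. One thing worth knowing: a trailing period, comma, or closing parenthesis is not excluded by the pattern, so a URL at the end of a sentence comes back with that punctuation attached. The rstrip clean-up below is an optional addition (an assumption, not something the app above does), and extract_urls is a hypothetical name.

```python
import re

URL_PATTERN = re.compile(r'https?://[^\s"\'>]+')


def extract_urls(text):
    """Return URLs found in text, with common trailing punctuation removed."""
    # Optional clean-up (assumption): drop sentence punctuation glued to the URL end
    return [url.rstrip('.,;:!?)') for url in URL_PATTERN.findall(text)]


sample = 'See https://example.com/docs. Or <a href="http://test.org/page">here</a>, (https://a.b/c).'
print(extract_urls(sample))
# ['https://example.com/docs', 'http://test.org/page', 'https://a.b/c']
```

Without the clean-up, the first and last matches would be 'https://example.com/docs.' and 'https://a.b/c).'. The trade-off: stripping ")" also shortens URLs that legitimately end in a parenthesis, such as some Wikipedia article links.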
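Handle PDF Files
reader = PyPDF2.PdfReader(f)  # f is the path of the PDF being processed
for page in reader.pages:
    # extract_text() can come back empty (e.g. scanned, image-only pages)
    text = page.extract_text() or ""
    urls = re.findall(r'https?://[^\s"\'>]+', text)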
🎯 Step 3.4: Apply Filters
if self.include_words and not any(w in url for w in self.include_words):
    continue
if self.exclude_words and any(w in url for w in self.exclude_words):
    continue
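💡 Example:
Include: google → keep only links that contain "google"
Exclude: facebook → skip any link that contains "facebook"
Matching is a case-sensitive substring check.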
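Here is the filter logic as a standalone sketch, applied to the example above. parse_keywords and passes_filters are hypothetical helpers. parse_keywords also shows why the comma-separated input needs cleaning: "".split(",") returns [""], and an empty string is a substring of every URL, so an empty Exclude box would otherwise reject every link.

```python
def parse_keywords(raw):
    """Split a comma-separated box like 'google, youtube' into clean keywords."""
    # Drop blanks: "".split(",") gives [""], and "" is a substring of every URL
    return [w.strip() for w in raw.split(",") if w.strip()]


def passes_filters(url, include_words, exclude_words):
    """Same rules as Step 3.4: keep the URL only if it matches an include word
    (when any are given) and matches no exclude word."""
    if include_words and not any(w in url for w in include_words):
        return False
    if exclude_words and any(w in url for w in exclude_words):
        return False
    return True


urls = [
    "https://www.google.com/search",
    "https://www.facebook.com/google",
    "https://example.com",
]
include = parse_keywords("google")
exclude = parse_keywords("facebook")
print([u for u in urls if passes_filters(u, include, exclude)])
# ['https://www.google.com/search']
```

The second URL contains "google" but is still dropped, because an exclude match always wins over an include match.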
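Step 3.5: Check Broken Links
def check_link(self, url):
    try:
        # stream=True returns once the headers arrive; the body is not read
        with requests.get(url, timeout=10, stream=True) as res:
            return not (200 <= res.status_code < 400)
    except requests.RequestException:
        # Timeouts, DNS failures, refused connections and invalid URLs count as broken
        return True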
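💡 Logic:
200–399 → OK (requests follows redirects for GET by default, so this is usually the final status)
400+ → broken
Timeouts and connection errors → broken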
🖥️ Step 4: Build the GUI
Create the main window:
class LinkApp(QWidget):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("LinkGuardian")
        self.setMinimumSize(1000, 600)
Step 4.1: Folder Selection
self.path_input = QLineEdit()
self.path_input.setReadOnly(True)
browse_btn = QPushButton("Browse")
browse_btn.clicked.connect(self.browse_folder)

def browse_folder(self):
    folder = QFileDialog.getExistingDirectory(self)
    if folder:
        self.path_input.setText(folder)
        self.folder = folder
Step 4.2: Options (Checkboxes)
self.txt_checkbox = QCheckBox(".txt")
self.pdf_checkbox = QCheckBox(".pdf")
self.html_checkbox = QCheckBox(".html")
self.check_broken_checkbox = QCheckBox("Check Broken Links")
Step 4.3: Filters
self.include_input = QLineEdit()
self.include_input.setPlaceholderText("Include words (comma-separated)")
self.exclude_input = QLineEdit()
self.exclude_input.setPlaceholderText("Exclude words (comma-separated)")
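▶️ Step 4.4: Start Scan
def start_scan(self):
    folder = self.path_input.text()
    if not folder:
        return  # no folder selected yet

    def parse_words(text):
        # "".split(",") returns [""], and "" matches every URL, so drop empty entries
        return [w.strip() for w in text.split(",") if w.strip()]

    self.results_list.clear()  # start each scan with an empty list
    self.worker = LinkWorker(
        folder,
        {
            'txt': self.txt_checkbox.isChecked(),
            'pdf': self.pdf_checkbox.isChecked(),
            'html': self.html_checkbox.isChecked()
        },
        self.check_broken_checkbox.isChecked(),
        parse_words(self.include_input.text()),
        parse_words(self.exclude_input.text())
    )
    self.worker.found.connect(self.add_link)
    self.worker.start()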
🎨 Step 5: Display Results
# self.results_list is a QListWidget created in __init__
def add_link(self, link, is_broken):
    item = QListWidgetItem(link)
    color = QColor("red") if is_broken else QColor("green")
    item.setForeground(color)
    self.results_list.addItem(item)
💡 Result:
🟢 Green → Working link
🔴 Red → Broken link
Step 6: Progress Bar
self.progress_bar = QProgressBar()
self.progress_bar.setMaximum(100)
Connect it to the worker’s progress signal in start_scan, before self.worker.start():
self.worker.progress.connect(self.progress_bar.setValue)
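The snippets never show the worker emitting progress. One way to do it, sketched here as plain Python so it runs without Qt: compute an integer percentage after each file and hand it to a callback, which inside the worker would be self.progress.emit. The function name scan_with_progress and its callback parameters are assumptions for illustration.

```python
def scan_with_progress(files, process, report):
    """Call process(path) for each file and report(percent) after each one."""
    total = len(files)
    if total == 0:
        report(100)  # nothing to scan: show the bar as complete
        return
    for index, path in enumerate(files, start=1):
        process(path)
        # Integer percentage in 0–100, matching setMaximum(100) on the bar
        report(index * 100 // total)


# Usage with plain callbacks standing in for Qt signals
seen = []
scan_with_progress(["a.txt", "b.pdf", "c.html"], process=lambda p: None, report=seen.append)
print(seen)  # [33, 66, 100]
```

Integer division keeps the values as ints, which is what Signal(int) and QProgressBar.setValue expect, and the last file always lands exactly on 100.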
Step 7: Copy All Links
def copy_all_links(self):
    links = "\n".join(
        self.results_list.item(i).text()
        for i in range(self.results_list.count())
    )
    QGuiApplication.clipboard().setText(links)
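Step 8: Open Links on Double Click
# In __init__: self.results_list.itemDoubleClicked.connect(self.open_item)
def open_item(self, item):
    url = item.text()
    system = platform.system()
    if system == "Windows":
        os.startfile(url)
    elif system == "Darwin":  # macOS has no xdg-open
        subprocess.Popen(["open", url])
    else:  # Linux desktops with xdg-utils
        subprocess.Popen(["xdg-open", url])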
Step 9: Run the App
if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = LinkApp()
    window.show()
    sys.exit(app.exec())
Final Result
You now have a desktop tool that:
✅ Extracts links from .txt, .pdf, and .html files
✅ Filters links by include/exclude keywords
✅ Detects broken links
✅ Displays color-coded results
✅ Keeps the UI responsive by scanning in a background thread
💡 Bonus Ideas
Want to upgrade it further?
Export results to CSV
Add domain grouping
Add link preview
Add multi-threaded link checking (faster)
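The last idea can be sketched with the standard library’s concurrent.futures. This is an assumption-laden sketch, not part of the app: check_all and fake_check are hypothetical names, fake_check stands in for real HTTP calls so the example runs offline, and max_workers=8 is an arbitrary default. In the app, you would pass the worker’s check_link as check and emit found for each result from run().

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_all(urls, check, max_workers=8):
    """Run check(url) for every URL in parallel and return {url: is_broken}.

    check is any callable that takes a URL and returns True when the link
    is broken, for example a wrapper around check_link from Step 3.5.
    """
    urls = list(urls)  # materialize so we can zip with the results afterwards
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so they line up with urls
        results = list(pool.map(check, urls))
    return dict(zip(urls, results))


def fake_check(url):
    time.sleep(0.1)  # stand-in for network latency
    return "broken" in url


print(check_all(["https://ok.example", "https://broken.example"], fake_check))
# {'https://ok.example': False, 'https://broken.example': True}
```

Link checks spend most of their time waiting on the network, so threads speed them up even with Python’s GIL.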