DEV Community

Dragos Balota
Bulk URL Checker with uv: Validate Website Accessibility in Python

Building a URL validation tool has never been easier! With Python's uv package manager, you can create a URL checker that validates hundreds of websites concurrently, with zero setup required.

🚀 What Makes This Special?

  • Zero Configuration: Run immediately with uv - no virtual environments or dependency management
  • Concurrent Processing: Check multiple URLs simultaneously using ThreadPoolExecutor
  • Smart Error Detection: Categorizes timeouts, connection errors, and HTTP status codes
  • Detailed Reporting: Response times, status codes, and comprehensive error analysis
  • File I/O: Read URLs from files and save problematic URLs for review
  • Cross-Platform: Works seamlessly on macOS, Windows, and Linux

🛠️ The Complete Script

Save this as url_checker.py and run it with uv run url_checker.py:

#!/usr/bin/env -S uv run
# /// script
# dependencies = [
#     "requests",
# ]
# ///

import requests
from urllib.parse import urlparse
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
import sys

def check_url(url, timeout=10):
    """Check if a URL is accessible and return status information."""
    if not url.startswith(('http://', 'https://')):
        url = 'https://' + url

    start_time = time.time()

    try:
        response = requests.get(url, timeout=timeout, allow_redirects=True)
        response_time = time.time() - start_time

        return {
            'url': url,
            'status': 'OK',
            'status_code': response.status_code,
            'error_type': None,
            'response_time': round(response_time, 2)
        }

    except requests.exceptions.Timeout:
        return {
            'url': url,
            'status': 'TIMEOUT',
            'status_code': None,
            'error_type': 'Connection timeout',
            'response_time': timeout
        }

    except requests.exceptions.ConnectionError as e:
        return {
            'url': url,
            'status': 'CONNECTION_ERROR',
            'status_code': None,
            'error_type': f'Connection error: {str(e)[:100]}...',
            'response_time': time.time() - start_time
        }

    except requests.exceptions.RequestException as e:
        return {
            'url': url,
            'status': 'ERROR',
            'status_code': None,
            'error_type': f'Request error: {str(e)[:100]}...',
            'response_time': time.time() - start_time
        }

def check_urls_batch(urls, timeout=10, max_workers=10):
    """Check multiple URLs concurrently."""
    results = []

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(check_url, url, timeout): url for url in urls}

        for i, future in enumerate(as_completed(future_to_url), 1):
            result = future.result()
            results.append(result)
            print(f"Checked {i}/{len(urls)} URLs: {result['url']} - {result['status']}")

    return results

# Full script available in the complete article
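The script above stops before the driver code. A minimal sketch of the remaining pieces (reading a URL file, printing the summary, exporting problems) might look like the following; the helper names and the `broken_urls.txt` output file are my assumptions, not taken from the full article:

```python
# Sketch of the helpers the full script would need. Function names and
# the "broken_urls.txt" filename are assumptions for illustration.

def read_urls(path):
    """Read one URL per line, skipping blank lines and '#' comments."""
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.strip().startswith('#')]

def summarize(results):
    """Split results into working/broken and print the summary block."""
    working = [r for r in results if r['status'] == 'OK']
    broken = [r for r in results if r['status'] != 'OK']
    print("=" * 50)
    print("SUMMARY")
    print("=" * 50)
    print(f"Total URLs checked: {len(results)}")
    print(f"Working URLs: {len(working)}")
    print(f"Problematic URLs: {len(broken)}")
    return working, broken

def save_broken(broken, path="broken_urls.txt"):
    """Write problematic URLs to a file for later review."""
    with open(path, "w") as f:
        for r in broken:
            f.write(f"{r['url']}  # {r['status']}\n")
```

Wiring these together in a `main()` that calls `check_urls_batch(read_urls('urls.txt'))` and passes the results to `summarize` reproduces the output shown below.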

📊 Perfect For:

  • Website Audits: Validate all external links on your site
  • SEO Monitoring: Check backlinks and partner sites
  • API Health Checks: Monitor endpoint availability
  • Competitor Analysis: Track competitor website status
  • CI/CD Integration: Automated link validation in pipelines

🎯 Sample Output
URL Connection Checker
==============================
Found 8 URLs to check.
Using timeout: 10 seconds

Checked 1/8 URLs: https://www.google.com - OK
Checked 2/8 URLs: https://nonexistent-site.com - CONNECTION_ERROR

==================================================
SUMMARY
==================================================
Total URLs checked: 8
Working URLs: 7
Problematic URLs: 1

WORKING URLs (7):
  ✓ https://www.google.com (Status: 200, Time: 0.15s)
  ✓ https://www.github.com (Status: 200, Time: 0.23s)

🚀 Why uv?

  • Instant execution: No setup, no virtual environments
  • Automatic dependency management: uv handles everything
  • Lightning fast: Faster than pip and conda
  • Modern Python tooling: The future of Python package management

💡 Advanced Features
  • Custom timeouts and worker counts
  • Automatic HTTPS protocol addition
  • Progress tracking with real-time updates
  • Categorized error reporting
  • Export problematic URLs to files
  • Support for comments in URL lists

This tool has saved me countless hours in website maintenance and SEO auditing. The concurrent processing makes it incredibly fast, and the detailed error categorization helps prioritize which issues to fix first.
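Custom timeouts and worker counts would typically be exposed as command-line flags. A small sketch using argparse, assuming hypothetical flag names not shown in the original script:

```python
import argparse

def parse_args(argv=None):
    # Flag names here are illustrative, not taken from the original script.
    p = argparse.ArgumentParser(description="Bulk URL checker")
    p.add_argument("file", nargs="?", default="urls.txt",
                   help="file with one URL per line ('#' starts a comment)")
    p.add_argument("--timeout", type=int, default=10,
                   help="per-request timeout in seconds")
    p.add_argument("--workers", type=int, default=10,
                   help="number of concurrent worker threads")
    return p.parse_args(argv)
```

The parsed values would then be forwarded as `check_urls_batch(urls, timeout=args.timeout, max_workers=args.workers)`.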

Try it yourself: Create a urls.txt file with your links and run uv run url_checker.py!
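For example, a urls.txt might look like this (the URLs are placeholders; the comment syntax follows the feature list above):

```text
# Production sites
https://www.google.com
https://www.github.com

# Scheme is optional: https:// is prepended automatically
example.com
```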

Want the complete script with all advanced features?

Check out the full article with detailed explanations, troubleshooting tips, and integration examples.
