DEV Community

Dragos Balota
Bulk URL Checker with uv: Validate Website Accessibility in Python

Building a URL validation tool has never been easier! With Python's uv package manager, you can create a URL checker that validates hundreds of websites concurrently, with zero setup required.

🚀 What Makes This Special?

  • Zero Configuration: Run immediately with uv - no virtual environments or dependency management
  • Concurrent Processing: Check multiple URLs simultaneously using ThreadPoolExecutor
  • Smart Error Detection: Categorizes timeouts, connection errors, and HTTP status codes
  • Detailed Reporting: Response times, status codes, and comprehensive error analysis
  • File I/O: Read URLs from files and save problematic URLs for review
  • Cross-Platform: Works seamlessly on macOS, Windows, and Linux

🛠️ The Complete Script

Save this as url_checker.py and run it with uv run url_checker.py:

#!/usr/bin/env -S uv run
# /// script
# dependencies = [
#     "requests",
# ]
# ///

import requests
from urllib.parse import urlparse
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
import sys

def check_url(url, timeout=10):
    """Check if a URL is accessible and return status information."""
    if not url.startswith(('http://', 'https://')):
        url = 'https://' + url

    start_time = time.time()

    try:
        response = requests.get(url, timeout=timeout, allow_redirects=True)
        response_time = time.time() - start_time

        return {
            'url': url,
            'status': 'OK',
            'status_code': response.status_code,
            'error_type': None,
            'response_time': round(response_time, 2)
        }

    except requests.exceptions.Timeout:
        return {
            'url': url,
            'status': 'TIMEOUT',
            'status_code': None,
            'error_type': 'Connection timeout',
            'response_time': timeout
        }

    except requests.exceptions.ConnectionError as e:
        return {
            'url': url,
            'status': 'CONNECTION_ERROR',
            'status_code': None,
            'error_type': f'Connection error: {str(e)[:100]}...',
            'response_time': time.time() - start_time
        }

    except requests.exceptions.RequestException as e:
        return {
            'url': url,
            'status': 'ERROR',
            'status_code': None,
            'error_type': f'Request error: {str(e)[:100]}...',
            'response_time': time.time() - start_time
        }

def check_urls_batch(urls, timeout=10, max_workers=10):
    """Check multiple URLs concurrently."""
    results = []

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(check_url, url, timeout): url for url in urls}

        for i, future in enumerate(as_completed(future_to_url), 1):
            result = future.result()
            results.append(result)
            print(f"Checked {i}/{len(urls)} URLs: {result['url']} - {result['status']}")

    return results

# Full script available in the complete article
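The script above stops before the driver code. A minimal sketch of the remaining pieces (reading a URL file, printing the summary, exporting problems) might look like the following; the helper names and the `broken_urls.txt` output file are my assumptions, not taken from the full article:

```python
# Sketch of the helpers the full script would need. Function names and
# the "broken_urls.txt" filename are assumptions for illustration.

def read_urls(path):
    """Read one URL per line, skipping blank lines and '#' comments."""
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.strip().startswith('#')]

def summarize(results):
    """Split results into working/broken and print the summary block."""
    working = [r for r in results if r['status'] == 'OK']
    broken = [r for r in results if r['status'] != 'OK']
    print("=" * 50)
    print("SUMMARY")
    print("=" * 50)
    print(f"Total URLs checked: {len(results)}")
    print(f"Working URLs: {len(working)}")
    print(f"Problematic URLs: {len(broken)}")
    return working, broken

def save_broken(broken, path="broken_urls.txt"):
    """Write problematic URLs to a file for later review."""
    with open(path, "w") as f:
        for r in broken:
            f.write(f"{r['url']}  # {r['status']}\n")
```

Wiring these together in a `main()` that calls `check_urls_batch(read_urls('urls.txt'))` and passes the results to `summarize` reproduces the output shown below.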

📊 Perfect For:

  • Website Audits: Validate all external links on your site
  • SEO Monitoring: Check backlinks and partner sites
  • API Health Checks: Monitor endpoint availability
  • Competitor Analysis: Track competitor website status
  • CI/CD Integration: Automated link validation in pipelines

🎯 Sample Output
URL Connection Checker
==============================
Found 8 URLs to check.
Using timeout: 10 seconds

Checked 1/8 URLs: https://www.google.com - OK
Checked 2/8 URLs: https://nonexistent-site.com - CONNECTION_ERROR

==================================================
SUMMARY
==================================================
Total URLs checked: 8
Working URLs: 7
Problematic URLs: 1

WORKING URLs (7):
  ✓ https://www.google.com (Status: 200, Time: 0.15s)
  ✓ https://www.github.com (Status: 200, Time: 0.23s)

🚀 Why uv?

  • Instant execution: No setup, no virtual environments
  • Automatic dependency management: uv handles everything
  • Lightning fast: Faster than pip and conda
  • Modern Python tooling: The future of Python package management

💡 Advanced Features
  • Custom timeouts and worker counts
  • Automatic HTTPS protocol addition
  • Progress tracking with real-time updates
  • Categorized error reporting
  • Export problematic URLs to files
  • Support for comments in URL lists

This tool has saved me countless hours in website maintenance and SEO auditing. The concurrent processing makes it incredibly fast, and the detailed error categorization helps prioritize which issues to fix first.
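Custom timeouts and worker counts would typically be exposed as command-line flags. A small sketch using argparse, assuming hypothetical flag names not shown in the original script:

```python
import argparse

def parse_args(argv=None):
    # Flag names here are illustrative, not taken from the original script.
    p = argparse.ArgumentParser(description="Bulk URL checker")
    p.add_argument("file", nargs="?", default="urls.txt",
                   help="file with one URL per line ('#' starts a comment)")
    p.add_argument("--timeout", type=int, default=10,
                   help="per-request timeout in seconds")
    p.add_argument("--workers", type=int, default=10,
                   help="number of concurrent worker threads")
    return p.parse_args(argv)
```

The parsed values would then be forwarded as `check_urls_batch(urls, timeout=args.timeout, max_workers=args.workers)`.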

Try it yourself: Create a urls.txt file with your links and run uv run url_checker.py!
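For example, a urls.txt might look like this (the URLs are placeholders; the comment syntax follows the feature list above):

```text
# Production sites
https://www.google.com
https://www.github.com

# Scheme is optional: https:// is prepended automatically
example.com
```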

Want the complete script with all advanced features?

Check out the full article with detailed explanations, troubleshooting tips, and integration examples.
