AxonCraft

Stop Writing Messy Web Scrapers — A Clean Python Template That Actually Works

Most web scraping tutorials give you 20 lines of spaghetti code that breaks the moment the website changes. Here's a better way.

The Problem

You search "Python web scraping tutorial", copy the code, and end up with something like this:

import requests
from bs4 import BeautifulSoup
r = requests.get("https://example.com")
soup = BeautifulSoup(r.content, 'html.parser')
data = soup.find_all('div', class_='product')
# ... 50 more lines of hardcoded mess

It works once. Then the URL changes. Or you need to scrape a different site. You rewrite everything from scratch.

The Fix: CONFIG-Based Template

The solution is simple: separate configuration from logic. Put everything that might change at the top, keep the functions clean.

import csv

import requests
from bs4 import BeautifulSoup

# ─── CONFIG — change only this section ───
CONFIG = {
    'url': 'https://example.com',
    'output_file': 'results.csv',
    'timeout': 10,
    'headers': {
        'User-Agent': 'Mozilla/5.0 (compatible; MyBot/1.0)'
    }
}
# ─────────────────────────────────────────

def get_page(url):
    response = requests.get(url, headers=CONFIG['headers'], timeout=CONFIG['timeout'])
    response.raise_for_status()
    return BeautifulSoup(response.content, 'html.parser')

def extract_data(soup):
    # Guard both missing <title> tags and empty ones
    title = soup.title.string.strip() if soup.title and soup.title.string else "N/A"
    links = [a['href'] for a in soup.find_all('a', href=True)]
    text  = ' '.join(p.get_text(strip=True) for p in soup.find_all('p'))
    return {'title': title, 'links': ' | '.join(links), 'text': text[:500]}

def save_to_csv(data, path):
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=data.keys())
        writer.writeheader()
        writer.writerow(data)
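A nice side effect of keeping `extract_data()` free of network code: you can sanity-check it offline by feeding it a small inline HTML string. Here's a quick demo (the sample page is made up, and the function body is repeated so this snippet runs on its own):

```python
from bs4 import BeautifulSoup

# Made-up sample page, just to exercise extract_data() without a request
SAMPLE_HTML = """
<html>
  <head><title>  Demo Page  </title></head>
  <body>
    <p>Hello world.</p>
    <a href="/about">About</a>
    <a href="/contact">Contact</a>
  </body>
</html>
"""

def extract_data(soup):
    # Same logic as the template above
    title = soup.title.string.strip() if soup.title and soup.title.string else "N/A"
    links = [a['href'] for a in soup.find_all('a', href=True)]
    text = ' '.join(p.get_text(strip=True) for p in soup.find_all('p'))
    return {'title': title, 'links': ' | '.join(links), 'text': text[:500]}

data = extract_data(BeautifulSoup(SAMPLE_HTML, 'html.parser'))
print(data['title'])   # Demo Page
print(data['links'])   # /about | /contact
```

If the parsing logic ever breaks after a site redesign, this is the first thing to rerun.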

Why This Pattern Works

  1. Change target URL → edit 1 line in CONFIG
  2. Different output file → edit 1 line in CONFIG
  3. Add rate limiting → add 'delay': 2 to CONFIG
  4. Hand it to a teammate → they only need to read CONFIG

No hunting through 200 lines of code to find what to change.
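To make point 3 concrete, here's a minimal sketch of what adding a `'delay'` key looks like. The `fetch` argument stands in for `get_page()`, and `scrape_all` is a name I'm inventing for illustration:

```python
import time

CONFIG = {
    'delay': 0.1,  # seconds between requests; use 1-2 s for real sites
}

def scrape_all(urls, fetch):
    """Fetch each URL in turn, sleeping CONFIG['delay'] between requests."""
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(CONFIG.get('delay', 0))  # rate limit; .get() keeps it optional
    return results
```

The `.get('delay', 0)` means old CONFIGs without the key keep working unchanged.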

Running It

pip install requests beautifulsoup4
python scraper.py

Output: a clean CSV with title, all links, and page text. Ready to open in Excel or feed into your next script.

What's Next

Once you have the base template, extending it is easy:

  • Add pagination: loop through page=1, page=2, etc. in CONFIG
  • Add error retries: wrap get_page() in a retry decorator
  • Add scheduling: use schedule library to run it daily
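The retry idea from the list above can be sketched as a small decorator. `with_retries` is a hypothetical name, not something from the full template, and the backoff scheme is just one reasonable choice:

```python
import time
from functools import wraps

def with_retries(attempts=3, backoff=1.0):
    """Retry a flaky function, doubling the wait after each failure."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise  # out of retries: let the caller see the error
                    time.sleep(backoff * 2 ** attempt)
        return wrapper
    return decorator
```

Then wrapping the fetch function is one line: decorate `get_page()` with `@with_retries(attempts=3, backoff=2.0)` and transient timeouts stop killing the whole run.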

I packaged the complete version (with error handling, CSV output, retry logic, and full comments) as a ready-to-run script. If you want to skip the setup: grab the full template here — it's $15 and runs out of the box.

Otherwise, the pattern above is enough to get you started. Happy scraping.
