AxonCraft

Stop Writing Messy Web Scrapers — A Clean Python Template That Actually Works

Most web scraping tutorials give you 20 lines of spaghetti code that breaks the moment the website changes. Here's a better way.

The Problem

You search "Python web scraping tutorial", copy the code, and end up with something like this:

import requests
from bs4 import BeautifulSoup
r = requests.get("https://example.com")
soup = BeautifulSoup(r.content, 'html.parser')
data = soup.find_all('div', class_='product')
# ... 50 more lines of hardcoded mess

It works once. Then the URL changes. Or you need to scrape a different site. You rewrite everything from scratch.

The Fix: CONFIG-Based Template

The solution is simple: separate configuration from logic. Put everything that might change at the top, keep the functions clean.

import csv

import requests
from bs4 import BeautifulSoup

# ─── CONFIG — change only this section ───
CONFIG = {
    'url': 'https://example.com',
    'output_file': 'results.csv',
    'timeout': 10,
    'headers': {
        'User-Agent': 'Mozilla/5.0 (compatible; MyBot/1.0)'
    }
}
# ─────────────────────────────────────────

def get_page(url):
    response = requests.get(url, headers=CONFIG['headers'], timeout=CONFIG['timeout'])
    response.raise_for_status()
    return BeautifulSoup(response.content, 'html.parser')

def extract_data(soup):
    # Guard both missing <title> tags and empty ones
    title = soup.title.string.strip() if soup.title and soup.title.string else "N/A"
    links = [a['href'] for a in soup.find_all('a', href=True)]
    text  = ' '.join(p.get_text(strip=True) for p in soup.find_all('p'))
    return {'title': title, 'links': ' | '.join(links), 'text': text[:500]}

def save_to_csv(data, path):
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=data.keys())
        writer.writeheader()
        writer.writerow(data)
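A nice side effect of keeping `extract_data()` free of network code: you can sanity-check it offline by feeding it a small inline HTML string. Here's a quick demo (the sample page is made up, and the function body is repeated so this snippet runs on its own):

```python
from bs4 import BeautifulSoup

# Made-up sample page, just to exercise extract_data() without a request
SAMPLE_HTML = """
<html>
  <head><title>  Demo Page  </title></head>
  <body>
    <p>Hello world.</p>
    <a href="/about">About</a>
    <a href="/contact">Contact</a>
  </body>
</html>
"""

def extract_data(soup):
    # Same logic as the template above
    title = soup.title.string.strip() if soup.title and soup.title.string else "N/A"
    links = [a['href'] for a in soup.find_all('a', href=True)]
    text = ' '.join(p.get_text(strip=True) for p in soup.find_all('p'))
    return {'title': title, 'links': ' | '.join(links), 'text': text[:500]}

data = extract_data(BeautifulSoup(SAMPLE_HTML, 'html.parser'))
print(data['title'])   # Demo Page
print(data['links'])   # /about | /contact
```

If the parsing logic ever breaks after a site redesign, this is the first thing to rerun.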

Why This Pattern Works

  1. Change target URL → edit 1 line in CONFIG
  2. Different output file → edit 1 line in CONFIG
  3. Add rate limiting → add 'delay': 2 to CONFIG
  4. Hand it to a teammate → they only need to read CONFIG

No hunting through 200 lines of code to find what to change.
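To make point 3 concrete, here's a minimal sketch of what adding a `'delay'` key looks like. The `fetch` argument stands in for `get_page()`, and `scrape_all` is a name I'm inventing for illustration:

```python
import time

CONFIG = {
    'delay': 0.1,  # seconds between requests; use 1-2 s for real sites
}

def scrape_all(urls, fetch):
    """Fetch each URL in turn, sleeping CONFIG['delay'] between requests."""
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(CONFIG.get('delay', 0))  # rate limit; .get() keeps it optional
    return results
```

The `.get('delay', 0)` means old CONFIGs without the key keep working unchanged.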

Running It

pip install requests beautifulsoup4
python scraper.py

Output: a clean CSV with title, all links, and page text. Ready to open in Excel or feed into your next script.

What's Next

Once you have the base template, extending it is easy:

  • Add pagination: loop through page=1, page=2, etc. in CONFIG
  • Add error retries: wrap get_page() in a retry decorator
  • Add scheduling: use schedule library to run it daily
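The retry idea from the list above can be sketched as a small decorator. `with_retries` is a hypothetical name, not something from the full template, and the backoff scheme is just one reasonable choice:

```python
import time
from functools import wraps

def with_retries(attempts=3, backoff=1.0):
    """Retry a flaky function, doubling the wait after each failure."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise  # out of retries: let the caller see the error
                    time.sleep(backoff * 2 ** attempt)
        return wrapper
    return decorator
```

Then wrapping the fetch function is one line: decorate `get_page()` with `@with_retries(attempts=3, backoff=2.0)` and transient timeouts stop killing the whole run.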

I packaged the complete version (with error handling, CSV output, retry logic, and full comments) as a ready-to-run script. If you want to skip the setup: grab the full template here — it's $15 and runs out of the box.

Otherwise, the pattern above is enough to get you started. Happy scraping.
