Most web scraping tutorials give you 20 lines of spaghetti code that breaks the moment the website changes. Here's a better way.
The Problem
You search "Python web scraping tutorial", copy the code, and end up with something like this:
```python
import requests
from bs4 import BeautifulSoup

r = requests.get("https://example.com")
soup = BeautifulSoup(r.content, 'html.parser')
data = soup.find_all('div', class_='product')
# ... 50 more lines of hardcoded mess
```
It works once. Then the URL changes. Or you need to scrape a different site. You rewrite everything from scratch.
The Fix: CONFIG-Based Template
The solution is simple: separate configuration from logic. Put everything that might change at the top, keep the functions clean.
```python
import csv

import requests
from bs4 import BeautifulSoup

# ─── CONFIG — change only this section ───
CONFIG = {
    'url': 'https://example.com',
    'output_file': 'results.csv',
    'timeout': 10,
    'headers': {
        'User-Agent': 'Mozilla/5.0 (compatible; MyBot/1.0)'
    }
}
# ─────────────────────────────────────────

def get_page(url):
    """Fetch a URL and return parsed HTML, raising on HTTP errors."""
    response = requests.get(url, headers=CONFIG['headers'], timeout=CONFIG['timeout'])
    response.raise_for_status()
    return BeautifulSoup(response.content, 'html.parser')

def extract_data(soup):
    """Pull the page title, every link target, and the first 500 chars of body text."""
    title = soup.title.string.strip() if soup.title and soup.title.string else "N/A"
    links = [a['href'] for a in soup.find_all('a', href=True)]
    text = ' '.join(p.get_text(strip=True) for p in soup.find_all('p'))
    return {'title': title, 'links': ' | '.join(links), 'text': text[:500]}

def save_to_csv(data, path):
    """Write a single result row to a CSV file with a header."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=data.keys())
        writer.writeheader()
        writer.writerow(data)

if __name__ == "__main__":
    soup = get_page(CONFIG['url'])
    save_to_csv(extract_data(soup), CONFIG['output_file'])
```
Why This Pattern Works
- Change target URL → edit 1 line in `CONFIG`
- Different output file → edit 1 line in `CONFIG`
- Add rate limiting → add `'delay': 2` to `CONFIG`
- Hand it to a teammate → they only need to read `CONFIG`

No hunting through 200 lines of code to find what to change.
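To show how a `'delay'` entry would plug in, here's a minimal sketch. The `polite_get` name and the stand-in `fetch` callable are hypothetical (in the real script you'd put the `time.sleep` call at the top of `get_page()`), and the delay is set tiny so the example runs instantly:

```python
import time

# Hypothetical CONFIG fragment: 'delay' is read before every request.
CONFIG = {'delay': 0.01}  # seconds between requests (use 2 in a real scraper)

def polite_get(url, fetch=lambda u: f"<html>{u}</html>"):
    """Sleep for CONFIG['delay'] before fetching, so throttling lives
    in CONFIG and never touches the scraping logic itself."""
    time.sleep(CONFIG.get('delay', 0))  # no-op if 'delay' is absent
    return fetch(url)

pages = [polite_get(f"https://example.com/page{i}") for i in range(3)]
```

Because the delay comes from `CONFIG.get(...)`, deleting the key turns throttling off without editing any function.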
Running It
```
pip install requests beautifulsoup4
python scraper.py
```
Output: a clean CSV with title, all links, and page text. Ready to open in Excel or feed into your next script.
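Feeding that CSV into a next script takes only the stdlib. A quick sketch, using a sample row in the same shape `save_to_csv()` writes (the values here are made up for illustration):

```python
import csv

# Write a sample row shaped like save_to_csv()'s output...
row = {'title': 'Example Domain', 'links': '/a | /b', 'text': 'Example text'}
with open('results.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=row.keys())
    writer.writeheader()
    writer.writerow(row)

# ...then read it back as a list of dicts.
with open('results.csv', newline='', encoding='utf-8') as f:
    records = list(csv.DictReader(f))

print(records[0]['title'])  # → Example Domain
```

Splitting `records[0]['links']` on `' | '` recovers the original link list.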
What's Next
Once you have the base template, extending it is easy:
- Add pagination: loop through `page=1`, `page=2`, etc. in CONFIG
- Add error retries: wrap `get_page()` in a retry decorator
- Add scheduling: use the `schedule` library to run it daily
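The retry decorator is the quickest win of the three. Here's a minimal sketch; `flaky_fetch` is a hypothetical stand-in for `get_page()` that fails twice before succeeding, so the example runs without touching the network:

```python
import functools
import time

def retry(times=3, delay=0.01):
    """Retry a function up to `times` attempts, sleeping `delay`
    seconds between tries; re-raise the last error if all fail."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == times:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator

# Hypothetical stand-in for get_page(): fails twice, then succeeds.
calls = {'n': 0}

@retry(times=3)
def flaky_fetch(url):
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError("temporary failure")
    return f"ok:{url}"

print(flaky_fetch("https://example.com"))  # → ok:https://example.com
```

In the real script you'd just stack `@retry(times=3)` on top of `def get_page(url):` and leave everything else alone.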
I packaged the complete version (with error handling, CSV output, retry logic, and full comments) as a ready-to-run script. If you want to skip the setup: grab the full template here — it's $15 and runs out of the box.
Otherwise, the pattern above is enough to get you started. Happy scraping.