Every time I start a new scraping project, I rebuild the same things: retry logic, rate limiting, proxy rotation, error tracking.
So I built a template. Now I clone it and start writing selectors in 5 minutes.
python-web-scraper-template — open source, MIT licensed.
## What's Inside

- Async scraping with aiohttp (concurrent requests, typically far faster than sequential `requests`)
- Exponential backoff retries (back off instead of hammering a server and getting banned)
- Rate limiting (configurable requests/sec)
- Proxy rotation (round-robin through a proxy list)
- User-Agent rotation (5 realistic browser UAs)
- Pydantic models (validate data before export)
- Four exporters: CSV, JSON, SQLite, PostgreSQL
- Docker support (run anywhere)
- Error tracking (success rate, error breakdown)
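If you haven't seen exponential backoff before, the idea is to double the wait after each failed attempt and add a little jitter so many clients don't retry in lockstep. Here's a minimal sketch; the function names and constants are illustrative, not the template's actual API:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Delay before retry number `attempt` (0-indexed):
    base * 2**attempt, capped at `cap`, plus up to 10% random jitter."""
    delay = min(base * (2 ** attempt), cap)
    return delay + random.uniform(0, delay * 0.1)

def retry(fn, retries: int = 4, sleep=time.sleep):
    """Call fn() up to `retries` times, sleeping with backoff between failures.
    `sleep` is injectable so the loop can be tested without real waiting."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(backoff_delay(attempt))
```

Without jitter the waits are 1s, 2s, 4s, 8s, ... capped at 30s, which keeps a flaky target from being hit in a tight loop.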
## Quick Start

```bash
git clone https://github.com/spinov001-art/python-web-scraper-template.git
cd python-web-scraper-template
pip install -r requirements.txt
python scraper.py --url "https://example.com" --output results.csv
```
## How to Customize

The key file is `scraper.py`. Override the `parse()` method:
```python
from scraper import Scraper
from models import Product

class ProductScraper(Scraper):
    async def parse(self, html, url):
        soup = self.get_soup(html)
        products = []
        for item in soup.select('.product-card'):
            products.append({
                "name": item.select_one('.title').text.strip(),
                "price": item.select_one('.price').text,
                "url": item.select_one('a')['href'],
            })
        return products

scraper = ProductScraper(
    urls=["https://shop.example.com/page/1"],
    rate_limit=2,
    output="products.csv",
)
scraper.run()
```
That's it. The template handles retries, rate limiting, error tracking, and export.
## Architecture

- `scraper.py`: Main scraper (customize the `parse()` method)
- `config.py`: Rate limits, timeouts, concurrency
- `middleware.py`: Rate limiter, retries, proxy rotation
- `models.py`: Pydantic data models (Product, Job, Article)
- `exporters.py`: CSV, JSON, SQLite, PostgreSQL exporters
- `Dockerfile`: Container support
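Round-robin proxy rotation, the strategy `middleware.py` uses, is conceptually tiny: cycle through a fixed list, one proxy per request. A minimal sketch with illustrative names (the proxy URLs here are placeholders):

```python
from itertools import cycle

class ProxyRotator:
    """Hand out proxies from a fixed list in round-robin order."""

    def __init__(self, proxies):
        self._pool = cycle(proxies)  # infinite iterator that wraps around

    def next_proxy(self) -> str:
        return next(self._pool)

rotator = ProxyRotator([
    "http://proxy-a:8080",  # placeholder addresses
    "http://proxy-b:8080",
    "http://proxy-c:8080",
])
```

The same pattern works for User-Agent rotation: swap the proxy list for a list of UA strings and call `next_proxy()` (or a `next_ua()` twin) per request.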
## Why Not Just Use Scrapy?
Scrapy is excellent for large-scale crawling. But for targeted scraping of 1-10 sites:
- Scrapy has a steep learning curve (middlewares, pipelines, settings)
- This template is about 200 lines total; you can read it in 10 minutes
- aiohttp gives you the same async performance
- You control everything; no framework magic

If you need to crawl 10,000+ pages with sitemap discovery, use Scrapy. For everything else, a clean template is faster to customize.
## Related Tools
Building a scraper? You'll need these:
- Awesome Web Scraping 2026: 250+ scraping tools, proxies, anti-detection
- Awesome Free APIs 2026: 150+ APIs without auth keys
- Awesome Data Engineering 2026: 150+ tools for the pipeline after scraping
What do you scrape? I'm curious what projects people are building. Drop a comment with your use case.
Star the repo if it's useful: github.com/spinov001-art/python-web-scraper-template
Need a custom scraper built? I maintain 285+ repos and 77 Apify actors.
spinov001-art.github.io | Spinov001@gmail.com