Alex Spinov
I Built a Production Web Scraper Template That Handles Everything (Open Source)

Every time I start a new scraping project, I rebuild the same things: retry logic, rate limiting, proxy rotation, error tracking.

So I built a template. Now I clone it and start writing selectors in 5 minutes.

python-web-scraper-template — open source, MIT licensed.


What's Inside

  • Async scraping with aiohttp (concurrent requests, far faster than sequential requests-based code)
  • Exponential backoff retries (don't get banned)
  • Rate limiting (configurable requests/sec)
  • Proxy rotation (round-robin through proxy list)
  • User-Agent rotation (5 realistic browser UAs)
  • Pydantic models (validate data before export)
  • 4 exporters — CSV, JSON, SQLite, PostgreSQL
  • Docker support (run anywhere)
  • Error tracking (success rate, error breakdown)
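To make the retry behavior concrete, here is a minimal sketch of the exponential-backoff idea. The attempt count, delays, and jitter here are illustrative, not the template's actual defaults:

```python
import asyncio
import random

async def fetch_with_retry(fetch, url, max_attempts=4, base_delay=1.0):
    """Retry an async fetcher with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return await fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus jitter so clients don't retry in lockstep
            delay = base_delay * 2 ** attempt + random.uniform(0, 0.5)
            await asyncio.sleep(delay)
```

The jitter matters: without it, a fleet of scrapers that all failed at the same moment would hammer the server again in sync.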

Quick Start

git clone https://github.com/spinov001-art/python-web-scraper-template.git
cd python-web-scraper-template
pip install -r requirements.txt
python scraper.py --url "https://example.com" --output results.csv

How to Customize

The key file is scraper.py. Override the parse() method:

from scraper import Scraper

class ProductScraper(Scraper):
    async def parse(self, html, url):
        soup = self.get_soup(html)
        products = []
        for item in soup.select('.product-card'):
            title = item.select_one('.title')
            price = item.select_one('.price')
            link = item.select_one('a')
            # Skip cards missing a required field instead of crashing on None
            if not (title and price and link):
                continue
            products.append({
                "name": title.text.strip(),
                "price": price.text.strip(),
                "url": link['href'],
            })
        return products

scraper = ProductScraper(
    urls=["https://shop.example.com/page/1"],
    rate_limit=2,
    output="products.csv"
)
scraper.run()

That's it. The template handles retries, rate limiting, error tracking, and export.
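The validate-before-export step is worth a closer look. The template uses Pydantic models in models.py for this; the sketch below shows the same idea dependency-free with a plain dataclass, and the field names simply mirror the product example above:

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: str
    url: str

    def __post_init__(self):
        # Reject obviously broken rows before they reach an exporter
        if not self.name:
            raise ValueError("name must be non-empty")
        if not self.url.startswith(("http://", "https://")):
            raise ValueError("url must be absolute")

def validate_rows(rows):
    """Drop rows that fail validation instead of crashing the export."""
    valid = []
    for row in rows:
        try:
            valid.append(Product(**row))
        except (ValueError, TypeError):
            continue
    return valid
```

Dropping bad rows (rather than raising) keeps a long scrape alive when one page has malformed markup; the template's error tracking is where those drops should be counted.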

Architecture

scraper.py      — Main scraper (customize parse() method)
config.py       — Rate limits, timeouts, concurrency
middleware.py   — Rate limiter, retry, proxy rotation
models.py       — Pydantic data models (Product, Job, Article)
exporters.py    — CSV, JSON, SQLite, PostgreSQL exporters
Dockerfile      — Container support
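The rate limiter in middleware.py boils down to one invariant: requests are spaced at least 1/rate seconds apart. A minimal sketch of that idea (the real implementation may differ):

```python
import asyncio
import time

class RateLimiter:
    """Space async requests at least 1/rate seconds apart."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self):
        # The lock serializes waiters so concurrent tasks can't all
        # read the same _last timestamp and burst through together.
        async with self._lock:
            now = time.monotonic()
            delay = self._last + self.min_interval - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()
```

Each fetch task calls `await limiter.wait()` before issuing its request; the lock is what makes this safe under concurrency.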

Why Not Just Use Scrapy?

Scrapy is excellent for large-scale crawling. But for targeted scraping of 1-10 sites:

  • Scrapy has a steep learning curve (middlewares, pipelines, settings)
  • This template is 200 lines total — you can read it in 10 minutes
  • aiohttp gives you the same async performance
  • You control everything — no framework magic

If you need to crawl 10,000+ pages with sitemap discovery, use Scrapy. For everything else, a clean template is faster to customize.
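The async performance claim rests on one pattern: cap in-flight requests with a semaphore and let the event loop overlap the rest. A library-agnostic sketch (here `fetch` stands in for any awaitable fetcher, such as an aiohttp-backed one):

```python
import asyncio

async def gather_bounded(fetch, urls, concurrency=10):
    """Fetch all URLs concurrently, with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def one(url):
        async with sem:
            return await fetch(url)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(one(u) for u in urls))
```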

What do you scrape? I'm curious what projects people are building. Drop a comment with your use case.

Star the repo if it's useful: github.com/spinov001-art/python-web-scraper-template


Need a custom scraper built? I maintain 285+ repos and 77 Apify actors.
spinov001-art.github.io | Spinov001@gmail.com
