Alex Spinov
I Built a Production Web Scraper Template That Handles Everything (Open Source)

Every time I start a new scraping project, I rebuild the same things: retry logic, rate limiting, proxy rotation, error tracking.

So I built a template. Now I clone it and start writing selectors in 5 minutes.

python-web-scraper-template — open source, MIT licensed.


What's Inside

  • Async scraping with aiohttp (concurrent requests, far faster than sequential requests-based code)
  • Exponential backoff retries (don't get banned)
  • Rate limiting (configurable requests/sec)
  • Proxy rotation (round-robin through proxy list)
  • User-Agent rotation (5 realistic browser UAs)
  • Pydantic models (validate data before export)
  • 4 exporters — CSV, JSON, SQLite, PostgreSQL
  • Docker support (run anywhere)
  • Error tracking (success rate, error breakdown)
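To make the retry behavior concrete, here is a minimal sketch of the exponential-backoff idea. The attempt count, delays, and jitter here are illustrative, not the template's actual defaults:

```python
import asyncio
import random

async def fetch_with_retry(fetch, url, max_attempts=4, base_delay=1.0):
    """Retry an async fetcher with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return await fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus jitter so clients don't retry in lockstep
            delay = base_delay * 2 ** attempt + random.uniform(0, 0.5)
            await asyncio.sleep(delay)
```

The jitter matters: without it, a fleet of scrapers that all failed at the same moment would hammer the server again in sync.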

Quick Start

git clone https://github.com/spinov001-art/python-web-scraper-template.git
cd python-web-scraper-template
pip install -r requirements.txt
python scraper.py --url "https://example.com" --output results.csv

How to Customize

The key file is scraper.py. Override the parse() method:

from scraper import Scraper

class ProductScraper(Scraper):
    async def parse(self, html, url):
        soup = self.get_soup(html)
        products = []
        for item in soup.select('.product-card'):
            title = item.select_one('.title')
            price = item.select_one('.price')
            link = item.select_one('a')
            # Skip cards missing a required field instead of crashing on None
            if not (title and price and link):
                continue
            products.append({
                "name": title.text.strip(),
                "price": price.text.strip(),
                "url": link['href'],
            })
        return products

scraper = ProductScraper(
    urls=["https://shop.example.com/page/1"],
    rate_limit=2,
    output="products.csv"
)
scraper.run()

That's it. The template handles retries, rate limiting, error tracking, and export.
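The validate-before-export step is worth a closer look. The template uses Pydantic models in models.py for this; the sketch below shows the same idea dependency-free with a plain dataclass, and the field names simply mirror the product example above:

```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: str
    url: str

    def __post_init__(self):
        # Reject obviously broken rows before they reach an exporter
        if not self.name:
            raise ValueError("name must be non-empty")
        if not self.url.startswith(("http://", "https://")):
            raise ValueError("url must be absolute")

def validate_rows(rows):
    """Drop rows that fail validation instead of crashing the export."""
    valid = []
    for row in rows:
        try:
            valid.append(Product(**row))
        except (ValueError, TypeError):
            continue
    return valid
```

Dropping bad rows (rather than raising) keeps a long scrape alive when one page has malformed markup; the template's error tracking is where those drops should be counted.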

Architecture

scraper.py      — Main scraper (customize parse() method)
config.py       — Rate limits, timeouts, concurrency
middleware.py   — Rate limiter, retry, proxy rotation
models.py       — Pydantic data models (Product, Job, Article)
exporters.py    — CSV, JSON, SQLite, PostgreSQL exporters
Dockerfile      — Container support
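The rate limiter in middleware.py boils down to one invariant: requests are spaced at least 1/rate seconds apart. A minimal sketch of that idea (the real implementation may differ):

```python
import asyncio
import time

class RateLimiter:
    """Space async requests at least 1/rate seconds apart."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self):
        # The lock serializes waiters so concurrent tasks can't all
        # read the same _last timestamp and burst through together.
        async with self._lock:
            now = time.monotonic()
            delay = self._last + self.min_interval - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()
```

Each fetch task calls `await limiter.wait()` before issuing its request; the lock is what makes this safe under concurrency.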

Why Not Just Use Scrapy?

Scrapy is excellent for large-scale crawling. But for targeted scraping of 1-10 sites:

  • Scrapy has a steep learning curve (middlewares, pipelines, settings)
  • This template is 200 lines total — you can read it in 10 minutes
  • aiohttp gives you the same async performance
  • You control everything — no framework magic

If you need to crawl 10,000+ pages with sitemap discovery, use Scrapy. For everything else, a clean template is faster to customize.
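The async performance claim rests on one pattern: cap in-flight requests with a semaphore and let the event loop overlap the rest. A library-agnostic sketch (here `fetch` stands in for any awaitable fetcher, such as an aiohttp-backed one):

```python
import asyncio

async def gather_bounded(fetch, urls, concurrency=10):
    """Fetch all URLs concurrently, with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def one(url):
        async with sem:
            return await fetch(url)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(one(u) for u in urls))
```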

What do you scrape? I'm curious what projects people are building. Drop a comment with your use case.

Star the repo if it's useful: github.com/spinov001-art/python-web-scraper-template


Need a custom scraper built? I maintain 285+ repos and 77 Apify actors.
spinov001-art.github.io | Spinov001@gmail.com
