DEV Community

Xinglin Ming


Build a Production-Ready Web Scraper in Python (Anti-Detection Included)

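Web scraping is one of the most valuable Python skills. The scraper below is a compact starting point with basic anti-detection: a randomized User-Agent header, a reusable session, and a jittered delay between requests.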

The Complete Web Scraper

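import random
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup


class SmartScraper:
    # Abbreviated User-Agent strings; real browsers send longer ones,
    # so substitute full strings in practice.
    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/121.0",
    ]

    def __init__(self):
        # A Session reuses connections and keeps cookies across requests.
        self.session = requests.Session()
        # One User-Agent is picked at random for the lifetime of this scraper.
        self.session.headers["User-Agent"] = random.choice(self.USER_AGENTS)

    def scrape(self, url, delay=2):
        # Wait `delay` seconds plus up to one second of random jitter.
        time.sleep(delay + random.random())
        r = self.session.get(url, timeout=30)
        # Raise on 4xx/5xx responses instead of parsing an error page.
        r.raise_for_status()
        return r.text

    def scrape_to_csv(self, urls, selector, filename):
        results = []
        for url in urls:
            soup = BeautifulSoup(self.scrape(url), "html.parser")
            # One row per element matched by the CSS selector.
            for el in soup.select(selector):
                results.append({"url": url, "text": el.text.strip()})
        # Explicit columns keep the CSV header even when nothing matched.
        pd.DataFrame(results, columns=["url", "text"]).to_csv(filename, index=False)
        return len(results)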
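The class above makes a single attempt per URL. As a minimal sketch of how retries could be layered on top, the helper below wraps any fetch callable in a retry loop with exponential backoff. The name `fetch_with_retry` and all of its parameters are hypothetical and not part of the original class; the backoff schedule is just one reasonable choice.

```python
import random
import time


def fetch_with_retry(fetch, url, retries=3, backoff=1.0,
                     exceptions=(Exception,), sleep=time.sleep):
    """Call fetch(url), retrying on failure with exponential backoff.

    fetch      -- any callable taking a URL and returning the page text
    retries    -- total number of attempts before giving up
    backoff    -- base wait in seconds; doubled after each failed attempt
    exceptions -- exception types that should trigger a retry
    sleep      -- injectable so the waits can be skipped in tests
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except exceptions:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Wait backoff * 2^(attempt - 1) seconds plus up to 1 s of jitter.
            sleep(backoff * 2 ** (attempt - 1) + random.random())
```

With the scraper, this could be called as `fetch_with_retry(scraper.scrape, url, exceptions=(requests.RequestException,))`. HTTP error statuses only trigger a retry if `scrape()` raises on them, for example through `r.raise_for_status()`.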

Features

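  • Random User-Agent picked from a pool when the scraper is created
  • Configurable delay with up to one second of random jitter
  • CSV export via pandas; JSON or Excel output needs DataFrame.to_json or DataFrame.to_excel in place of to_csv
  • Error handling and retry: left to the caller, for example by wrapping scrape() in a retry loop
  • Cookie/session management via requests.Session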
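For the export bullet, here is one possible sketch of choosing the output format from the file extension. The function name `export_rows` is hypothetical, and it assumes rows shaped like the `{"url": ..., "text": ...}` dicts built in `scrape_to_csv`. CSV and JSON use only the standard library; the Excel branch delegates to pandas `DataFrame.to_excel`, which additionally needs an Excel engine such as openpyxl installed.

```python
import csv
import json
from pathlib import Path


def export_rows(rows, filename):
    """Write a list of {"url": ..., "text": ...} dicts; format chosen by extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".csv":
        with open(filename, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["url", "text"])
            writer.writeheader()
            writer.writerows(rows)
    elif suffix == ".json":
        with open(filename, "w", encoding="utf-8") as f:
            json.dump(rows, f, ensure_ascii=False, indent=2)
    elif suffix == ".xlsx":
        # Imported lazily so CSV/JSON export works without pandas installed.
        import pandas as pd
        pd.DataFrame(rows, columns=["url", "text"]).to_excel(filename, index=False)
    else:
        raise ValueError(f"unsupported export format: {suffix!r}")
    return len(rows)
```

Dispatching on the extension keeps a single entry point, so a caller only changes the filename to switch formats.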

Follow for more Python automation!
