DEV Community

Alex Spinov

How I Run Web Scrapers for Free Using GitHub Actions (Complete Setup)

I needed to scrape pricing data from 5 websites every day. A VPS would cost $5-20/month. GitHub Actions costs $0.

Here's my exact setup — including cron scheduling, data storage, and error alerts.


Why GitHub Actions for Scraping?

  • Free: 2,000 minutes/month on free tier (enough for most scrapers)
  • Scheduled: Cron syntax, runs automatically
  • No server: No VPS, no Docker, no deployment
  • Built-in storage: Commit results to the repo
  • Error alerts: GitHub notifies you if a run fails

Step 1: The Scraper Script

Create scraper.py in your repo:

import httpx
import json
from datetime import datetime
from pathlib import Path

def scrape_prices():
    targets = [
        {'name': 'Product A', 'url': 'https://api.example.com/product/123'},
        {'name': 'Product B', 'url': 'https://api.example.com/product/456'},
    ]

    results = []
    for target in targets:
        try:
            response = httpx.get(target['url'], timeout=30)
            data = response.json()
            results.append({
                'name': target['name'],
                'price': data.get('price'),
                'timestamp': datetime.utcnow().isoformat(),
            })
        except Exception as e:
            results.append({
                'name': target['name'],
                'error': str(e),
                'timestamp': datetime.utcnow().isoformat(),
            })

    # Save to data/YYYY-MM-DD.json
    Path('data').mkdir(exist_ok=True)
    filename = f"data/{datetime.utcnow().strftime('%Y-%m-%d')}.json"

    existing = []
    if Path(filename).exists():
        existing = json.loads(Path(filename).read_text())

    existing.extend(results)
    Path(filename).write_text(json.dumps(existing, indent=2))
    print(f"Saved {len(results)} results to {filename}")

if __name__ == '__main__':
    scrape_prices()
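The bare `httpx.get` above gives up on the first network hiccup. A small stdlib-only retry wrapper (the names `with_retries` and `fetch_json` are my own, not part of the script above) that could wrap the fetch call:

```python
import time

def with_retries(fetch, retries=3, backoff=2.0):
    # Wrap any fetch callable with simple linear-backoff retries:
    # attempt 1 fails -> sleep backoff, attempt 2 fails -> sleep 2*backoff, etc.
    def wrapper(url):
        for attempt in range(1, retries + 1):
            try:
                return fetch(url)
            except Exception:
                if attempt == retries:
                    raise  # out of attempts, surface the error
                time.sleep(backoff * attempt)
    return wrapper

# Usage with the scraper above (replaces the direct httpx.get call):
# fetch_json = with_retries(lambda url: httpx.get(url, timeout=30).json())
# data = fetch_json(target['url'])
```

Because a failed target is already caught and recorded in the results list, retries only change how persistent each attempt is, not the shape of the output.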

Step 2: GitHub Actions Workflow

Create .github/workflows/scrape.yml:

name: Daily Scraper

on:
  schedule:
    - cron: '0 8 * * *'    # Every day at 8 AM UTC
    - cron: '0 20 * * *'   # Every day at 8 PM UTC
  workflow_dispatch:         # Manual trigger button

jobs:
  scrape:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: pip install httpx

      - name: Run scraper
        run: python scraper.py

      - name: Commit results
        run: |
          git config user.name 'GitHub Actions'
          git config user.email 'actions@github.com'
          git add data/
          git diff --staged --quiet || git commit -m "Data update $(date -u +'%Y-%m-%d %H:%M')"
          git push

Key line: git diff --staged --quiet || git commit — only commits if there are actual changes.
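With two cron entries plus a manual trigger, two runs can overlap and race each other on the `git push`. A workflow-level concurrency group (standard GitHub Actions syntax; the group name is an arbitrary label) serializes them:

```yaml
concurrency:
  group: daily-scraper
  cancel-in-progress: false   # queue the second run instead of killing the first
```

This goes at the top level of scrape.yml, next to `on:` and `jobs:`.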


Step 3: Price Change Alerts

Add a comparison step to detect changes:

from datetime import timedelta  # new import; datetime, json, Path are imported above

def check_price_changes():
    today = datetime.utcnow().strftime('%Y-%m-%d')
    yesterday = (datetime.utcnow() - timedelta(days=1)).strftime('%Y-%m-%d')

    today_file = Path(f'data/{today}.json')
    yesterday_file = Path(f'data/{yesterday}.json')

    if not yesterday_file.exists():
        return

    today_prices = {r['name']: r['price'] for r in json.loads(today_file.read_text()) if r.get('price') is not None}
    yesterday_prices = {r['name']: r['price'] for r in json.loads(yesterday_file.read_text()) if r.get('price') is not None}

    for name, price in today_prices.items():
        old_price = yesterday_prices.get(name)
        if old_price and price != old_price:
            change = ((price - old_price) / old_price) * 100
            print(f"ALERT: {name} changed {change:+.1f}% (${old_price} → ${price})")
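A `print` only shows up if you open the run's logs. On GitHub Actions, appending to the file named by the `GITHUB_STEP_SUMMARY` environment variable renders the text on the run's summary page. A small helper (my own sketch; falls back to plain stdout when run locally):

```python
import os

def notify(message: str) -> None:
    # On GitHub Actions, GITHUB_STEP_SUMMARY points at a Markdown file
    # that is rendered on the workflow run page; append the alert there.
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with open(summary_path, "a") as f:
            f.write(f"- {message}\n")
    # Always echo to the log as well.
    print(message)
```

Swapping the `print(f"ALERT: ...")` call above for `notify(...)` makes price changes visible without digging into logs.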

Step 4: Add Browser Scraping (Optional)

For JavaScript-heavy sites, add Playwright:

      - name: Install Playwright
        run: |
          pip install playwright
          playwright install chromium

      - name: Run browser scraper
        run: python browser_scraper.py

And browser_scraper.py:

from playwright.sync_api import sync_playwright

def scrape_with_browser(url):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until='networkidle')

        price = page.locator('.price').text_content()
        browser.close()
        return price
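Note that `text_content()` on a `.price` node typically returns something like `"$1,299.99"`, not a number. A small normalizer (my own helper; assumes US-style formatting with `.` as the decimal separator):

```python
import re

def parse_price(text: str) -> float:
    # Drop currency symbols, thousands separators, and surrounding
    # whitespace; keep only digits and the decimal point.
    cleaned = re.sub(r"[^\d.]", "", text)
    if not cleaned:
        raise ValueError(f"no numeric price in {text!r}")
    return float(cleaned)
```

Running the scraped text through this before storing it keeps the JSON files comparable with the Step 3 change detection.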

Cost Breakdown

| What            | GitHub Actions        | VPS Alternative |
|-----------------|-----------------------|-----------------|
| Monthly cost    | $0                    | $5-20           |
| Runs per day    | 2 (cron) + manual     | Unlimited       |
| Minutes per run | ~1-2 min              | N/A             |
| Monthly usage   | ~60 min / 2,000 free  | 720 hrs         |
| Setup time      | 5 minutes             | 30+ minutes     |

For most scraping projects, GitHub Actions is the better choice. You only need a VPS when you're scraping >100 pages per run or need sub-hourly scheduling.


Common Pitfalls

  1. Cron timezone: GitHub Actions cron uses UTC. Convert your local time.
  2. Rate limits: The shortest supported cron interval is 5 minutes, and scheduled runs can be delayed (or skipped) during periods of high load — don't depend on exact timing.
  3. Secrets: Never hardcode API keys. Use Settings → Secrets → Actions.
  4. Large files: Don't commit 100MB+ JSON files. Use Git LFS or external storage.
  5. Failing silently: GitHub emails you when a run fails, but only if your notification settings allow it — and scheduled workflows are automatically disabled after 60 days without repository activity, so check in on long-idle repos.
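For pitfall 3, the pattern is: store the key under Settings → Secrets → Actions, expose it to the step as an environment variable, and read it with `os.environ` in Python. A sketch assuming a secret named `API_KEY`:

```yaml
      - name: Run scraper
        run: python scraper.py
        env:
          API_KEY: ${{ secrets.API_KEY }}   # read in Python via os.environ["API_KEY"]
```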

More scraping tools and infrastructure: awesome-web-scraping-2026 — 130+ tools organized by category.

Are you running scrapers on GitHub Actions? What's your setup? Comments below 👇
