Alex Spinov

5 GitHub Actions Workflows I Use to Run Free Web Scrapers, Monitors, and Data Pipelines

GitHub gives you 2,000 free CI/CD minutes per month on private repos (Actions is entirely free for public repos). Most developers spend them only on tests and deploys. I use them to run web scrapers, data pipelines, and monitoring scripts.

Here are 5 workflows you can steal.

1. Daily Data Scraper

Scrape any public data source and commit results to your repo:

name: Daily Scrape
on:
  schedule:
    - cron: "0 6 * * *"  # 6 AM UTC daily
  workflow_dispatch:

jobs:
  scrape:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # the default GITHUB_TOKEN may be read-only; git push needs this
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install httpx
      - run: python scraper.py
      - name: Commit data
        run: |
          git config user.name "Bot"
          git config user.email "bot@example.com"
          git add data/
          git diff --cached --quiet || git commit -m "data: $(date -u +%Y-%m-%d)"
          git push

Your scraped data lives in the repo's git history. Free version control for your data.
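The workflow assumes a scraper.py at the repo root. Here is a minimal sketch of what one might look like; the endpoint URL and the {"items": [...]} payload shape are placeholders, not part of the original post:

```python
# scraper.py -- minimal sketch; the URL and payload shape below are placeholders.
import datetime
import json
import os
import pathlib

FEED_URL = "https://example.com/api/items"  # hypothetical endpoint


def to_rows(payload: dict) -> list[dict]:
    """Flatten the assumed {"items": [...]} payload into rows worth committing."""
    return [
        {"title": item.get("title", ""), "url": item.get("url", "")}
        for item in payload.get("items", [])
    ]


def save(rows: list[dict], out_dir: str = "data") -> pathlib.Path:
    """Write one dated JSON file per run, so git history doubles as a time series."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{datetime.date.today().isoformat()}.json"
    path.write_text(json.dumps(rows, indent=2))
    return path


def main() -> None:
    import httpx  # installed in the workflow via `pip install httpx`

    resp = httpx.get(FEED_URL, timeout=30)
    resp.raise_for_status()
    save(to_rows(resp.json()))


if __name__ == "__main__" and os.environ.get("CI"):
    main()  # GitHub Actions sets CI=true; local runs skip the network call
```

One file per day keeps snapshots easy to diff side by side; overwriting a single file would also work, since git history preserves every version anyway.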

2. Multi-Source Aggregator

Scrape 5 sources in parallel using matrix strategy:

name: Aggregate Sources
on:
  schedule:
    - cron: "0 */4 * * *"  # Every 4 hours

jobs:
  scrape:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        source: [hackernews, reddit, producthunt, devto, lobsters]
      fail-fast: false
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install httpx
      - run: python scrapers/${{ matrix.source }}.py
      - uses: actions/upload-artifact@v4
        with:
          name: data-${{ matrix.source }}
          path: output/

All 5 scrapers run simultaneously, so total wall-clock time equals the slowest scraper, not the sum of all. (Billed minutes are still counted per job, though, so parallelism saves time, not quota.)
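Each matrix entry expects a matching script under scrapers/. A sketch for the hackernews entry, hitting the public Hacker News Firebase API (the endpoints are real; the field selection and the 20-story cutoff are my assumptions):

```python
# scrapers/hackernews.py -- sketch for one matrix entry; writes to output/
# to match the upload-artifact path in the workflow.
import json
import os
import pathlib

API = "https://hacker-news.firebaseio.com/v0"


def pick_fields(item: dict) -> dict:
    """Keep only the fields the aggregate step needs."""
    return {k: item.get(k) for k in ("id", "title", "score", "url")}


def write_output(items: list[dict], out_dir: str = "output") -> None:
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "hackernews.json").write_text(json.dumps(items, indent=2))


def main() -> None:
    import httpx  # installed in the workflow via `pip install httpx`

    with httpx.Client(timeout=30) as client:
        ids = client.get(f"{API}/topstories.json").json()[:20]
        items = [
            pick_fields(client.get(f"{API}/item/{i}.json").json()) for i in ids
        ]
    write_output(items)


if __name__ == "__main__" and os.environ.get("CI"):
    main()  # network fetch only inside the Action (CI=true)
```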

3. Website Change Detector

Get notified when a page changes:

name: Monitor Changes
on:
  schedule:
    - cron: "0 */2 * * *"

jobs:
  check:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # needed for the git push in the commit step
    steps:
      - uses: actions/checkout@v4
      - name: Check for changes
        run: |
          NEW_HASH=$(curl -s https://example.com/pricing | sha256sum | cut -d' ' -f1)
          OLD_HASH=$(cat hashes/pricing.txt 2>/dev/null || echo "none")

          if [ "$NEW_HASH" != "$OLD_HASH" ]; then
            mkdir -p hashes
            echo "$NEW_HASH" > hashes/pricing.txt
            curl -s -X POST "https://api.telegram.org/bot${{ secrets.TG_TOKEN }}/sendMessage" \
              -d "chat_id=${{ secrets.TG_CHAT_ID }}&text=Pricing page changed!"
          fi
      - name: Commit
        run: |
          git config user.name "Bot"
          git config user.email "bot@example.com"
          git add hashes/
          git diff --cached --quiet || git commit -m "hash update"
          git push

4. API Health Monitor

Check if your APIs are responding:

name: Health Check
on:
  schedule:
    - cron: "*/15 * * * *"  # Every 15 minutes

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - name: Check endpoints
        run: |
          ENDPOINTS="https://api.yourapp.com/health https://api.yourapp.com/v2/status"
          for url in $ENDPOINTS; do
            STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$url")
            if [ "$STATUS" != "200" ]; then
              curl -s -X POST "https://api.telegram.org/bot${{ secrets.TG_TOKEN }}/sendMessage" \
                -d "chat_id=${{ secrets.TG_CHAT_ID }}&text=ALERT: $url returned $STATUS"
            fi
          done

Free uptime monitoring. No Pingdom, no Datadog, no monthly bill. One caveat: GitHub does not guarantee exact schedule times, and cron triggers can be delayed during peak load, so treat the 15-minute interval as approximate.

5. Weekly Report Generator

Compile data from the week and send a summary:

name: Weekly Report
on:
  schedule:
    - cron: "0 18 * * 5"  # Friday 6 PM UTC

jobs:
  report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install httpx jinja2
      - name: Generate report
        run: python generate_report.py
      - name: Send via Telegram
        run: |
          curl -s -X POST "https://api.telegram.org/bot${{ secrets.TG_TOKEN }}/sendMessage" \
            -d "chat_id=${{ secrets.TG_CHAT_ID }}" \
            --data-urlencode "text=$(cat report.txt)"

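generate_report.py is left to the reader; here is one minimal sketch. It assumes the daily scrapers wrote JSON files under data/ with a source field per row (both assumptions), and uses a plain f-string where the workflow's jinja2 would let you grow fancier templates:

```python
# generate_report.py -- hypothetical sketch; the data/ layout and the
# "source" field are assumptions, not from the original post.
import json
import os
import pathlib


def summarize(rows: list[dict]) -> dict:
    """Aggregate a week of rows into the numbers the Telegram message needs."""
    return {
        "total": len(rows),
        "sources": sorted({r.get("source", "unknown") for r in rows}),
    }


def render(summary: dict) -> str:
    """Plain-text body for the sendMessage call."""
    return (
        f"Weekly report\n"
        f"Items: {summary['total']}\n"
        f"Sources: {', '.join(summary['sources'])}\n"
    )


def load_week(data_dir: str = "data") -> list[dict]:
    """Read the last seven daily JSON files, each holding a list of rows."""
    rows: list[dict] = []
    for f in sorted(pathlib.Path(data_dir).glob("*.json"))[-7:]:
        rows.extend(json.loads(f.read_text()))
    return rows


if __name__ == "__main__" and os.environ.get("CI"):
    pathlib.Path("report.txt").write_text(render(summarize(load_week())))
```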
Cost Calculator

Frequency        Minutes/run   Monthly minutes   Monthly cost
4x/day           2 min         240               Free
Hourly           1 min         720               Free
Every 15 min     0.5 min       1,440             Free
Free-tier cap    --            2,000             $0

At 2 minutes per run, you can run 8 scrapers at 4x/day (1,920 minutes) completely within the free tier.
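That claim is easy to sanity-check (pure arithmetic, assuming a 30-day month):

```python
# Back-of-envelope minutes math for the free tier.
def monthly_minutes(minutes_per_run: float, runs_per_day: float, days: int = 30) -> float:
    """Minutes one scheduled job consumes in a month."""
    return minutes_per_run * runs_per_day * days


per_scraper = monthly_minutes(minutes_per_run=2, runs_per_day=4)  # 240 minutes
fleet = 8 * per_scraper  # 1,920 minutes, under the 2,000-minute allowance
```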

Tips

  1. Use workflow_dispatch — adds a manual "Run" button for debugging
  2. Set fail-fast: false — one failed scraper should not kill others
  3. Cache pip packages — setting cache: "pip" on actions/setup-python saves 30-60 seconds per run
  4. Use secrets — never hardcode API keys in workflow files
  5. Check run history — GitHub keeps logs for 90 days

📧 spinov001@gmail.com — I automate data collection and monitoring pipelines.

Related: 77 Scrapers on a Schedule | 10 Dev Tools I Use Daily | 150+ Free APIs
