After publishing 600+ articles and building 77 web scrapers, I have a clear picture of which tools and APIs actually get used day after day. Here are the ones I keep coming back to.
1. httpx (Python HTTP Client)
Forget requests. httpx keeps a requests-style API but adds async support and HTTP/2:
import httpx
# Sync
resp = httpx.get("https://api.github.com/repos/encode/httpx")
print(resp.json()["stargazers_count"])
# Async (inside a coroutine, driven by asyncio.run)
import asyncio

async def main():
    async with httpx.AsyncClient() as client:
        resp = await client.get("https://api.github.com/repos/encode/httpx")
        print(resp.json()["stargazers_count"])

asyncio.run(main())
Why I use it: Every scraper I build starts with httpx. Unlike requests, it applies a timeout by default, so one hung server can't stall a whole scraping run.
2. jq (Command-Line JSON Processor)
The single most useful tool for working with API responses:
# Pretty print
curl -s https://api.github.com/users/torvalds | jq .
# Extract specific fields
curl -s https://api.github.com/users/torvalds | jq '{name, followers, repos: .public_repos}'
# Filter arrays
curl -s https://api.github.com/users/torvalds/repos | jq '[.[] | select(.stargazers_count > 1000)] | length'
Why I use it: I pipe every API response through jq first. It saves me from writing Python scripts for simple data exploration.
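For comparison, the last jq filter above translates to a few lines of stdlib Python. The sample data here is made up to mimic the shape of GitHub's /repos response:

```python
import json

# Hypothetical sample mimicking the GitHub /repos response shape
repos_json = json.dumps([
    {"name": "linux", "stargazers_count": 150000},
    {"name": "subsurface", "stargazers_count": 2500},
    {"name": "test-tlb", "stargazers_count": 400},
])

# Equivalent of: jq '[.[] | select(.stargazers_count > 1000)] | length'
popular = [r for r in json.loads(repos_json) if r["stargazers_count"] > 1000]
print(len(popular))  # 2
```

The one-liner in the shell is still faster to type, which is the whole point of jq.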
3. DuckDB (In-Process Analytics Database)
SQLite for analytics. Reads CSV, Parquet, and JSON directly:
-- Query a CSV file without importing
SELECT country, COUNT(*) as users
FROM read_csv_auto('users.csv')
GROUP BY country
ORDER BY users DESC
LIMIT 10;
-- Query JSON API response saved to file
SELECT title, score
FROM read_json_auto('hn_stories.json')
WHERE score > 100
ORDER BY score DESC;
Why I use it: When a scraper outputs 100K rows, I analyze them with DuckDB instead of loading everything into pandas.
4. GitHub Actions (Free CI/CD)
I run 8 scrapers on GitHub Actions for $0/month:
on:
  schedule:
    - cron: "0 */6 * * *"
jobs:
  scrape:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # needed so the GITHUB_TOKEN can push
    steps:
      - uses: actions/checkout@v4
      - run: pip install httpx
      - run: python scraper.py
      - run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add data/
          git diff --cached --quiet || git commit -m "data update"
          git push
Why I use it: 2,000 free minutes per month. No server to maintain.
5. SQLite + FTS5 (Full-Text Search)
Built-in full-text search that handles millions of documents:
import sqlite3

conn = sqlite3.connect("articles.db")
conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")
conn.execute("INSERT INTO articles VALUES (?, ?)",
             ("How to scrape", "Tutorial about web scraping..."))
conn.commit()
# Search
results = conn.execute(
    "SELECT * FROM articles WHERE articles MATCH 'scraping'"
).fetchall()
Why I use it: For any project that needs search, I start with SQLite FTS5 before considering Elasticsearch.
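FTS5 also ships relevance ranking via bm25() out of the box, which is often the feature people reach for Elasticsearch to get. A quick sketch (in-memory database, made-up documents):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [
        ("How to scrape", "Tutorial about web scraping basics"),
        ("Scraping at scale", "Scraping, scraping, scraping: advanced web scraping"),
    ],
)
conn.commit()

# bm25() returns a relevance score; in SQLite's convention, lower = better match
rows = conn.execute(
    "SELECT title, bm25(articles) FROM articles "
    "WHERE articles MATCH 'scraping' ORDER BY bm25(articles)"
).fetchall()
for title, score in rows:
    print(title, score)
```

Note that the default tokenizer does no stemming, so 'scraping' will not match "scrape"; add the porter tokenizer if you need that.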
6. Hacker News Firebase API
Real-time access to every HN story, comment, and user — no API key needed:
# Top stories
curl -s https://hacker-news.firebaseio.com/v0/topstories.json | jq '.[0:5]'
# Get a story
curl -s https://hacker-news.firebaseio.com/v0/item/1.json | jq .
Why I use it: I monitor HN for trending topics in my niche. When a relevant post hits the front page, I comment with useful context.
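The two endpoints compose into a simple poller. A stdlib-only sketch; the 100-point threshold and the helper names are my own choices, not part of the API:

```python
import json
import urllib.request

BASE = "https://hacker-news.firebaseio.com/v0"

def fetch_json(path: str):
    # Fetch and decode one endpoint, e.g. fetch_json("/topstories.json")
    with urllib.request.urlopen(f"{BASE}{path}") as resp:
        return json.load(resp)

def hot_stories(items: list, min_score: int = 100) -> list:
    # Pure filter, so it can be tested without touching the network
    return [i for i in items
            if i.get("type") == "story" and i.get("score", 0) >= min_score]

# Network usage (uncomment to run):
# ids = fetch_json("/topstories.json")[:5]
# stories = [fetch_json(f"/item/{i}.json") for i in ids]
# for s in hot_stories(stories):
#     print(s["score"], s["title"])
```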
7. Telegram Bot API
The simplest notification system for any automated workflow:
import httpx

BOT_TOKEN = "..."  # token from @BotFather
CHAT_ID = "..."    # id of the chat to notify

def notify(message: str):
    httpx.post(
        f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
        json={"chat_id": CHAT_ID, "text": message},
    )
# Use in any scraper
notify("Scraper completed: 1,234 items collected")
Why I use it: Every scraper, every cron job, every GitHub Action sends me a Telegram message on completion or failure.
8. Open-Meteo API
Weather data for any location, no API key:
curl -s "https://api.open-meteo.com/v1/forecast?latitude=55.75&longitude=37.62&current_weather=true" | jq .current_weather
Why I use it: Free, fast, no authentication. Perfect for any project that needs weather data.
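The same call from Python, with parsing split into its own function so the field names (taken from Open-Meteo's documented current_weather shape) are easy to adjust:

```python
import json
import urllib.request

def current_weather(lat: float, lon: float) -> dict:
    url = (
        "https://api.open-meteo.com/v1/forecast"
        f"?latitude={lat}&longitude={lon}&current_weather=true"
    )
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["current_weather"]

def describe(cw: dict) -> str:
    # current_weather carries temperature (°C) and windspeed (km/h)
    return f"{cw['temperature']}°C, wind {cw['windspeed']} km/h"

# describe(current_weather(55.75, 37.62))  # network call; uncomment to run
```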
9. ripgrep (rg)
grep but 10x faster. Essential for searching through scraped data:
# Search through all JSON files
rg "error" data/ --type json
# Count matches
rg -c "404" logs/
# Search with context
rg -C 2 "timeout" scraper_*.py
10. Makefiles
I put a Makefile in every project:
.PHONY: scrape test deploy

scrape:
	python scraper.py

test:
	python -m pytest tests/ -v

deploy:
	git push origin main
Why I use it: make scrape is easier to remember than python -m scrapers.main --config prod.yaml --output data/.
The Pattern
All 10 tools share the same traits:
- Free (open source or generous free tier)
- Single-purpose (do one thing well)
- Composable (work together via stdin/stdout/files)
- No vendor lock-in (can switch anytime)
The best developer tools are boring. They just work.
📧 spinov001@gmail.com — I build custom scrapers and data tools. Tell me what you need.