Websites change all the time — pricing pages, docs, terms of service. I wanted a simple CLI to tell me what changed, like git log but for any URL.
So I built crawldiff.
How it works
```shell
pip install crawldiff

# Snapshot a site
crawldiff crawl https://stripe.com/pricing

# Come back later — see what changed
crawldiff diff https://stripe.com/pricing --since 7d
```
It uses Cloudflare's new /crawl endpoint to snapshot pages, stores everything locally in SQLite, and produces git-style unified diffs — with optional AI summaries via Claude, GPT, or Cloudflare Workers AI.
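To make the core idea concrete, here is a minimal sketch of that snapshot-then-diff flow: page text goes into SQLite, and the two most recent snapshots are compared with `difflib.unified_diff`. The schema and function names are hypothetical illustrations, not crawldiff's actual internals.

```python
# Hypothetical sketch: store snapshots in SQLite, diff the latest two.
import difflib
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # crawldiff keeps its DB under ~/.crawldiff/
conn.execute(
    "CREATE TABLE IF NOT EXISTS snapshots (url TEXT, fetched_at REAL, content TEXT)"
)

def save_snapshot(url: str, content: str) -> None:
    conn.execute(
        "INSERT INTO snapshots VALUES (?, ?, ?)", (url, time.time(), content)
    )
    conn.commit()

def diff_latest(url: str) -> str:
    # rowid preserves insertion order even if two timestamps collide
    rows = conn.execute(
        "SELECT content FROM snapshots WHERE url = ? ORDER BY rowid DESC LIMIT 2",
        (url,),
    ).fetchall()
    if len(rows) < 2:
        return ""  # need at least two snapshots to diff
    new, old = rows[0][0], rows[1][0]
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="before",
            tofile="after",
        )
    )

save_snapshot("https://example.com/pricing", "Pro plan: $25/mo\n")
save_snapshot("https://example.com/pricing", "Pro plan: $30/mo\n")
print(diff_latest("https://example.com/pricing"))
```

The output is a standard unified diff, which is what makes the git-style terminal coloring and piping straightforward.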
Features
- Git-style diffs — colored unified diffs in your terminal
- AI summaries — "Pricing increased from $25 to $30, new Enterprise tier added"
- Watch mode — `crawldiff watch https://competitor.com --every 1h`
- Multiple outputs — terminal, JSON (pipe to jq/Slack), Markdown reports
- Incremental crawling — only fetches changed pages via Cloudflare's `modifiedSince`
- Local storage — everything in SQLite at `~/.crawldiff/`
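The watch-mode pattern above boils down to a poll-compare-report loop. A hedged sketch of that loop, assuming a hypothetical `fetch_page` helper in place of the Cloudflare-backed fetch crawldiff actually uses:

```python
# Hypothetical watch-mode loop: poll a URL on an interval, print a
# unified diff whenever the content changes.
import difflib
import time
import urllib.request

def fetch_page(url: str) -> str:
    # Stand-in fetch; crawldiff fetches via Cloudflare's /crawl API instead.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def check_once(url: str, previous: str, fetch=fetch_page) -> tuple[str, str]:
    """Fetch once; return (new_content, diff_text). Diff is empty if unchanged."""
    current = fetch(url)
    if current == previous:
        return current, ""
    diff = "".join(
        difflib.unified_diff(
            previous.splitlines(keepends=True),
            current.splitlines(keepends=True),
            fromfile="previous",
            tofile="current",
        )
    )
    return current, diff

def watch(url: str, every_seconds: int, fetch=fetch_page) -> None:
    previous = fetch(url)
    while True:
        time.sleep(every_seconds)
        previous, diff = check_once(url, previous, fetch)
        if diff:
            print(diff)
```

Separating `check_once` from the loop keeps the compare step testable without sleeping or hitting the network.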
The tech stack
- Python 3.12, typer, rich, httpx
- Cloudflare Browser Rendering /crawl API
- difflib for unified diffs
- SQLite for local snapshot storage
- 96 tests, mypy strict, CI on GitHub Actions
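For readers curious about the CLI shape, here is a dependency-free sketch of the `crawl`/`diff` subcommand layout using stdlib `argparse` (crawldiff itself is built on typer); the handlers are placeholders and the names simply mirror the usage shown earlier.

```python
# Hypothetical CLI skeleton mirroring crawldiff's command shape,
# written with stdlib argparse rather than typer.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="crawldiff")
    sub = parser.add_subparsers(dest="command", required=True)

    crawl = sub.add_parser("crawl", help="snapshot a URL")
    crawl.add_argument("url")

    diff = sub.add_parser("diff", help="show what changed since a snapshot")
    diff.add_argument("url")
    diff.add_argument("--since", default="7d", help="look-back window, e.g. 7d")
    return parser

args = build_parser().parse_args(["crawl", "https://stripe.com/pricing"])
print(args.command, args.url)
```

typer generates the same structure from type-annotated functions with far less boilerplate, which is presumably why it's in the stack.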
Why not existing tools?
Most website monitoring tools are SaaS dashboards built for marketing teams. crawldiff is for developers — it's a CLI, it diffs like git, it pipes to anything, and it stores everything locally.
The only requirement is a free Cloudflare account.
GitHub: github.com/GeoRouv/crawldiff
PyPI: pip install crawldiff
Happy to answer any questions!