## The problem
You spend hours writing a scraper, run it, and immediately get a 403.
Or you build it with requests, only to realize the site needs JavaScript to render.
I got tired of this loop, so I built scrapalyser — a Python library that scans
any website before you write a single line of scraper code.
## Install

```shell
pip install scrapalyser
```
## Usage

```python
import scrapalyser

report = scrapalyser.scan(
    url="https://example.com",
    output="txt",
    lang="en",
)
```
## What it detects
- Anti-bot: Cloudflare, DataDome, PerimeterX, Akamai, Kasada, reCAPTCHA, hCaptcha
- Tech stack: React, Vue, Angular, Next.js, Nuxt, WordPress, Shopify...
- JS required: so you know if requests is enough or if you need Playwright
- API endpoints: via CSP headers, inline scripts, or XHR interception (Playwright mode)
- robots.txt & sitemap
- Login wall: form, redirect, button, OAuth
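scrapalyser's internal heuristics aren't shown here, but signature-based detection of this kind can be sketched with plain string matching. The marker strings and thresholds below are illustrative only, not scrapalyser's actual signature lists:

```python
import re

# Illustrative markers; a real detector uses many more signals
# (headers, cookies, script URLs). These are NOT scrapalyser's lists.
ANTIBOT_MARKERS = {
    "Cloudflare": ["cf-browser-verification", "challenge-platform"],
    "DataDome": ["datadome"],
    "hCaptcha": ["hcaptcha.com"],
}
FRAMEWORK_MARKERS = {
    "React": ["data-reactroot", "__NEXT_DATA__"],  # __NEXT_DATA__ implies Next.js too
    "Vue": ["data-v-app", "__NUXT__"],
    "WordPress": ["wp-content"],
}

def detect(html: str, markers: dict) -> list:
    """Return every key whose markers appear in the page source."""
    lower = html.lower()
    return [name for name, needles in markers.items()
            if any(n.lower() in lower for n in needles)]

def looks_js_required(html: str) -> bool:
    """Crude check: an app-shell page with a near-empty <body> usually
    needs JavaScript to render its real content."""
    body = re.search(r"<body[^>]*>(.*?)</body>", html, re.S | re.I)
    inner = body.group(1) if body else ""
    text = re.sub(r"<script.*?</script>|<[^>]+>", "", inner, flags=re.S)
    return len(text.strip()) < 50  # arbitrary illustrative threshold
```

With a typical single-page-app shell, `detect` flags the framework and `looks_js_required` returns `True`, which is the signal that `requests` alone won't be enough.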
## Two engines

- `curl_cffi` (default): fast, no browser, a single HTTP request.
- `playwright`: full browser with XHR interception and screenshot capture.

```python
report = scrapalyser.scan(
    url="https://example.com",
    engine="playwright",
    headless=False,
    screenshot="capture.png",
)
```
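The CSP-based endpoint discovery listed above can be approximated by hand: a site's `Content-Security-Policy` header often names the API origins the frontend is allowed to call in its `connect-src` directive. A minimal standalone sketch (not scrapalyser's implementation):

```python
def api_origins_from_csp(csp_header: str) -> list:
    """Pull candidate API origins from a Content-Security-Policy
    header's connect-src directive."""
    for directive in csp_header.split(";"):
        parts = directive.split()
        if parts and parts[0] == "connect-src":
            # Drop keywords like 'self' and bare wildcards; keep real origins.
            return [p for p in parts[1:]
                    if p.startswith(("http://", "https://", "wss://"))]
    return []
```

For example, a header like `connect-src 'self' https://api.example.com wss://live.example.com` yields two candidate origins worth probing before writing any scraper code.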
## If you get blocked

If the site returns a 403 or a captcha page, the report immediately tells you
which anti-bot blocked you; all other fields return `"blocked by antibot"`.
No guessing.
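That fail-fast behavior is easy to picture: once an anti-bot page is identified, nothing downstream can be trusted, so every remaining field gets stamped rather than half-filled. A hypothetical sketch of the idea (the field names here are made up, not scrapalyser's real report schema):

```python
from typing import Optional

BLOCKED = "blocked by antibot"
# Hypothetical field names for illustration only.
REPORT_FIELDS = ("tech_stack", "js_required", "api_endpoints", "login_wall")

def build_report(antibot: Optional[str], analysis: dict) -> dict:
    """Fail fast: if an anti-bot was identified, name it and stamp every
    other field; otherwise pass the analysis results through."""
    if antibot:
        return {"antibot": antibot, **{f: BLOCKED for f in REPORT_FIELDS}}
    return {"antibot": None, **{f: analysis.get(f) for f in REPORT_FIELDS}}
```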
## Links
- GitHub: https://github.com/codesme34/scrapalyser
- PyPI: https://pypi.org/project/scrapalyser/
- YouTube (French): https://www.youtube.com/@CodesMe