_____ _--_ _
| __ \ | _ \ | |
| |__) | _| |_) |__ _ __| | ___ ___ _ _ _ __
| ___/ | | | _ // _` | '__| |/ / _ \| | | | '__|
| | | |_| | | \ \ (_| | | | < (_) | |_| | |
|_| \__, |_| \_\__,_|_| |_|\_\___/ \__,_|_|
__/ |
|___/
Version: 2.0.0
Author: zinzied (zinzied@gmail.com)
π Py-Parkour: The Hybrid Scraper Framework
Py-Parkour is a lightweight automation utility designed to solve the biggest annoyances in modern web scraping:
- πͺ Cookie Consents: Detecting and destroying GDPR/modal popups.
- π§ Pagination: Auto-detecting "Next" buttons or infinite scroll.
- π Verification Gates: Generating temporary identities (Email/SMS) for signups.
- π» Hybrid Scraping: Break in with the browser, then steal the session for fast API calls.
- π‘ API Discovery: Automatically detect hidden JSON APIs.
It turns your scraper into a workflow automation platform.
π¦ Installation
pip install py-parkour[full]
Or for development:
pip install -r requirements.txt
playwright install
π How to Use It
1. The "Unified" Bot
The ParkourBot is your main entry point. It wraps a Playwright browser and gives you access to all gadgets.
import asyncio
from py_parkour import ParkourBot
async def main():
bot = ParkourBot(headless=False)
await bot.start()
await bot.goto("https://target-website.com")
# ... use gadgets here ...
await bot.close()
asyncio.run(main())
2. πͺ Gadget: Crusher (Cookie Bypasser)
Don't write brittle selectors for every "Accept Cookies" button.
await bot.crush_cookies()
3. π§ Gadget: Compass (Auto-Pagination)
Stop guessing if the site uses ?page=2 or a "Next >" button.
async for page_number in bot.crawl(max_pages=10):
print(f"Scraping Page {page_number}: {bot.current_url}")
4. π Gadget: Disguises (Temp Identity)
Need to sign up to view data? Generate a burner identity.
identity = await bot.identity.generate_identity(country="US")
print(f"Using email: {identity.email}")
code = await bot.identity.wait_for_code()
await bot.driver.page.fill("#otp-input", code)
5. β¨ The "Magic" Auto-Setup
Try to automate the entire signup flow (Experimental).
await bot.auto_setup_identity("https://example.com/signup")
6. π» Gadget: Shadow (Session Bridge) β NEW
Stop choosing between "fast" (requests) and "capable" (browser). Use both.
Break in with the browser, then steal the session for high-speed API calls.
# 1. Login with the browser
await bot.goto("https://target.com/login")
# ... do login stuff ...
# 2. Transfer the session to a fast aiohttp client
async with await bot.shadow.create_session() as session:
async with session.get("https://target.com/api/secret-data") as resp:
print(await resp.json())
7. π‘ Gadget: Radar (API Detector) β NEW
Why scrape HTML if there's a hidden API? Radar listens to background traffic.
await bot.goto("https://complex-spa-site.com")
# Check what we found
print(f"Latest JSON found: {bot.radar.latest_json}")
# Replay captured requests
for req in bot.radar.requests:
if "api/v1/users" in req['url']:
print(f"Found User API: {req['url']}")
π― Where to use it?
Py-Parkour is best for:
- Complex Scraping: Sites that require login or have heavy popups.
- QA Automation: Testing "User Registration" flows without using real email addresses.
- Bot Development: Quickly spinning up bots that need to pass "verify your email" checks.
- API Hunting: Discovering undocumented APIs behind SPAs.
π Architecture
- Core: Async Playwright wrapper.
-
Gadgets: Modular tools attached to the bot (
.crusher,.compass,.identity,.shadow,.radar).
Built with β€οΈ for Scrapers who hate boilerplate.
Top comments (1)
Nice little project! ππ₯
Creative ideas like this make learning Python way more fun.