_____ _--_ _
| __ \ | _ \ | |
| |__) | _| |_) |__ _ __| | ___ ___ _ _ _ __
| ___/ | | | _ // _` | '__| |/ / _ \| | | | '__|
| | | |_| | | \ \ (_| | | | < (_) | |_| | |
|_| \__, |_| \_\__,_|_| |_|\_\___/ \__,_|_|
__/ |
|___/
Version: 2.0.0
Author: zinzied (zinzied@gmail.com)
Py-Parkour: The Hybrid Scraper Framework
Py-Parkour is a lightweight automation utility designed to solve the biggest annoyances in modern web scraping:
- 🍪 Cookie Consents: Detecting and destroying GDPR/modal popups.
- 🧭 Pagination: Auto-detecting "Next" buttons or infinite scroll.
- Verification Gates: Generating temporary identities (Email/SMS) for signups.
- 👻 Hybrid Scraping: Break in with the browser, then steal the session for fast API calls.
- 📡 API Discovery: Automatically detect hidden JSON APIs.
It turns your scraper into a workflow automation platform.
📦 Installation
pip install py-parkour[full]
Or for development:
pip install -r requirements.txt
playwright install
How to Use It
1. The "Unified" Bot
The ParkourBot is your main entry point. It wraps a Playwright browser and gives you access to all gadgets.
import asyncio
from py_parkour import ParkourBot

async def main():
    bot = ParkourBot(headless=False)
    await bot.start()
    await bot.goto("https://target-website.com")
    # ... use gadgets here ...
    await bot.close()

asyncio.run(main())
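If a gadget call between start() and close() raises, the browser may be left running. A minimal sketch of a guard, assuming only the async start() and close() methods shown above; `run_with_bot` and `task` are hypothetical names, not part of Py-Parkour:

```python
import asyncio


async def run_with_bot(bot, task):
    """Start the bot, run task(bot), and always close the bot afterwards.

    `bot` is any object with async start() and close() methods, such as the
    ParkourBot instance from the example above. This helper is hypothetical.
    """
    await bot.start()
    try:
        return await task(bot)
    finally:
        # Runs even if task() raises, so the browser is not left open.
        await bot.close()


# Usage (assumes py_parkour is installed and my_task is an async function):
# asyncio.run(run_with_bot(ParkourBot(headless=False), my_task))
```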
2. 🍪 Gadget: Crusher (Cookie Bypasser)
Don't write brittle selectors for every "Accept Cookies" button.
await bot.crush_cookies()
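The article does not say how Crusher recognizes a consent button. One common heuristic is to match button labels against known phrases, and the sketch below illustrates only that idea. `looks_like_consent_button` and the phrase list are assumptions, not Py-Parkour's implementation:

```python
import re

# Hypothetical phrase list; a real tool would need more phrases and languages.
CONSENT_PHRASES = ("accept all", "accept cookies", "i agree", "allow all", "got it")


def looks_like_consent_button(label: str) -> bool:
    """Return True if a button label reads like a cookie-consent action."""
    # Collapse whitespace and lowercase so "Accept  ALL" matches "accept all".
    normalized = re.sub(r"\s+", " ", label).strip().lower()
    return any(phrase in normalized for phrase in CONSENT_PHRASES)
```

Matching on visible text rather than CSS selectors is what lets a check like this survive markup changes between sites.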
3. 🧭 Gadget: Compass (Auto-Pagination)
Stop guessing if the site uses ?page=2 or a "Next >" button.
async for page_number in bot.crawl(max_pages=10):
    print(f"Scraping Page {page_number}: {bot.current_url}")
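For comparison, this is roughly the manual work the ?page=2 case involves when done by hand. It is a standard-library sketch, not Compass internals; `next_page_url` and the default parameter name `page` are assumptions:

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_page_url(url: str, param: str = "page") -> str:
    """Return url with its page query parameter incremented by one.

    A missing parameter is treated as page 1, so the result asks for page 2.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    current = int(query.get(param, ["1"])[0])
    query[param] = [str(current + 1)]
    # doseq=True writes list values back as repeated key=value pairs.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
```

This only covers sites that paginate through a numeric query parameter, which is exactly the guesswork the crawl loop above is meant to remove.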
4. Gadget: Disguises (Temp Identity)
Need to sign up to view data? Generate a burner identity.
identity = await bot.identity.generate_identity(country="US")
print(f"Using email: {identity.email}")
code = await bot.identity.wait_for_code()
await bot.driver.page.fill("#otp-input", code)
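How wait_for_code() finds the code inside the incoming message is not described in the article. A plausible approach is a regular expression over the message text, sketched here under that assumption; `extract_code` and its default length of six digits are hypothetical:

```python
import re
from typing import Optional


def extract_code(message: str, length: int = 6) -> Optional[str]:
    """Return the first standalone run of `length` digits in message, or None."""
    # (?<!\d) and (?!\d) stop the pattern from matching inside a longer number.
    match = re.search(rf"(?<!\d)\d{{{length}}}(?!\d)", message)
    return match.group(0) if match else None
```

The code is returned as a string so that leading zeros survive, which matters when it is typed into an OTP field.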
5. ✨ The "Magic" Auto-Setup
Try to automate the entire signup flow (Experimental).
await bot.auto_setup_identity("https://example.com/signup")
6. 👻 Gadget: Shadow (Session Bridge) ⭐ NEW
Stop choosing between "fast" (requests) and "capable" (browser). Use both.
Break in with the browser, then steal the session for high-speed API calls.
# 1. Login with the browser
await bot.goto("https://target.com/login")
# ... do login stuff ...
# 2. Transfer the session to a fast aiohttp client
async with await bot.shadow.create_session() as session:
    async with session.get("https://target.com/api/secret-data") as resp:
        print(await resp.json())
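The article does not show what create_session() does internally. The general idea of a session bridge can be sketched as copying the browser's cookies into an HTTP client. The input shape below (dicts with name, value and domain keys) follows what Playwright's context.cookies() returns; `cookies_for_host` is a hypothetical helper, not a Py-Parkour API:

```python
from typing import Dict, Iterable, Mapping


def cookies_for_host(cookies: Iterable[Mapping], host: str) -> Dict[str, str]:
    """Pick the cookies whose domain covers `host` and return name -> value.

    Simplified on purpose: path, expiry, and the Secure flag are ignored.
    """
    host = host.lower()
    selected = {}
    for cookie in cookies:
        domain = cookie["domain"].lstrip(".").lower()
        # A cookie for "target.com" also applies to "api.target.com".
        if host == domain or host.endswith("." + domain):
            selected[cookie["name"]] = cookie["value"]
    return selected
```

A dict like this could then be handed to an HTTP client, for example through the `cookies` argument of aiohttp.ClientSession.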
7. 📡 Gadget: Radar (API Detector) ⭐ NEW
Why scrape HTML if there's a hidden API? Radar listens to background traffic.
await bot.goto("https://complex-spa-site.com")
# Check what we found
print(f"Latest JSON found: {bot.radar.latest_json}")
# Replay captured requests
for req in bot.radar.requests:
    if "api/v1/users" in req['url']:
        print(f"Found User API: {req['url']}")
🎯 Where to use it?
Py-Parkour is best for:
- Complex Scraping: Sites that require login or have heavy popups.
- QA Automation: Testing "User Registration" flows without using real email addresses.
- Bot Development: Quickly spinning up bots that need to pass "verify your email" checks.
- API Hunting: Discovering undocumented APIs behind SPAs.
Architecture
- Core: Async Playwright wrapper.
- Gadgets: Modular tools attached to the bot (.crusher, .compass, .identity, .shadow, .radar).
Built with ❤️ for Scrapers who hate boilerplate.
Top comments (4)
This Py-Parkour tool looks incredibly useful for my daily web scraping and bot development work! I especially love how it solves the pain point of handling email verification steps and undocumented APIs in SPAs. Could you share some basic examples of how to get started with the .compass or .radar gadgets? I'm eager to test it out for my upcoming projects.
🧭 Py-Parkour Gadgets Guide
This guide focuses on two powerful gadgets in the py-parkour library: Compass and Radar. These tools help you navigate complex websites and discover hidden APIs.
1. 📡 Radar (.radar)
The Passive API Detector
What is it?
The Radar gadget automatically listens to background network traffic while your bot navigates. It specifically looks for JSON responses, which often indicate an underlying API that powers the website.
Why use it?
Modern websites (SPAs) often load data via APIs. Instead of parsing messy HTML with selectors, you can often just grab the clean JSON data that the Radar captured.
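How Radar decides that a response is JSON is not stated here. A typical passive detector checks the Content-Type header, and the sketch below shows that check on its own; `is_json_content_type` is a hypothetical name and this is an assumption about the approach, not Radar's actual code:

```python
def is_json_content_type(content_type: str) -> bool:
    """Return True for media types such as application/json or application/ld+json."""
    # Drop parameters like "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")
```

Accepting the "+json" suffix catches API variants such as application/ld+json or application/vnd.api+json that a plain equality test would miss.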
Basic Usage
The Radar starts automatically when you run await bot.start().
Advanced: Inspecting All Requests
You can also inspect the history of all JSON requests captured during the session. This is useful for finding specific API endpoints (e.g., user lists, product catalogs).
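The article's own example reads entries of bot.radar.requests as dicts with a 'url' key. Assuming that shape, a small filter for endpoint hunting might look like the following; `find_endpoints` is a hypothetical helper, not part of the library:

```python
from typing import Iterable, List, Mapping


def find_endpoints(requests: Iterable[Mapping], fragment: str) -> List[str]:
    """Return unique URLs containing `fragment`, in first-seen order."""
    seen = set()
    matches = []
    for req in requests:
        url = req["url"]
        # SPAs often poll the same endpoint, so skip URLs already reported.
        if fragment in url and url not in seen:
            seen.add(url)
            matches.append(url)
    return matches
```

With a running bot this would be called as `find_endpoints(bot.radar.requests, "api/v1/users")`.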
2. 🧭 Compass (.compass)
The Auto-Navigator
What is it?
The Compass gadget handles pagination for you. It tries to automatically detect how to get to the next page, whether it's by clicking a "Next" button or scrolling down (infinite scroll).
Strategies
The Compass uses two main strategies: clicking a "Next" button, and scrolling down to trigger infinite scroll.
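Choosing between a "Next" button and infinite scroll, as described above, could be sketched as a check over the link labels found on the page. The pattern, the function name `choose_strategy`, and the two return values are all assumptions made for illustration; the guide does not describe Compass's real detection logic:

```python
import re
from typing import Iterable

# Hypothetical pattern: a label that is just "next" / "next page", optionally
# followed by arrows, or a label made only of arrows.
_NEXT_LABEL = re.compile(r"^\s*(next(\s+page)?\s*[>»›→]*|[>»›→]+)\s*$", re.IGNORECASE)


def choose_strategy(link_labels: Iterable[str]) -> str:
    """Return "click_next" if any label looks like a Next control, else "infinite_scroll"."""
    if any(_NEXT_LABEL.match(label) for label in link_labels):
        return "click_next"
    # No recognizable Next control: fall back to scrolling for more content.
    return "infinite_scroll"
```

Anchoring the pattern to the whole label keeps ordinary links that merely start with "Next" from being mistaken for pagination.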
Basic Usage
The easiest way to use the Compass is through the bot.crawl() helper method.
Manual Control
You can also access the compass directly if you need more granular control, though bot.crawl() is recommended for most cases.
How they relate to the Library
Both Radar and Compass are Gadgets attached to the ParkourBot.
- Both are set up in bot.start() and attached to bot.radar and bot.compass.
- Radar hooks the .on("response") event to passively collect data without interrupting your flow.
- Compass is driven through bot.crawl().
They are designed to work together: you might use Compass to move to the next page, and Radar to collect the API data triggered by that navigation!
you are very good!
Nice little project! 🔥
Creative ideas like this make learning Python way more fun.