DEV Community

Zied
Zied

Posted on

Py-Parkour

  _____       _--_          _                   
 |  __ \     |  _ \        | |                  
 | |__) |   _| |_) |__ _ __| | ___ ___  _   _ _ __ 
 |  ___/ | | |  _ // _` | '__| |/ / _ \| | | | '__|
 | |   | |_| | | \ \ (_| | |  |   < (_) | |_| | |  
 |_|    \__, |_|  \_\__,_|_|  |_|\_\___/ \__,_|_|  
         __/ |                                     
        |___/                                      
Enter fullscreen mode Exit fullscreen mode

Version: 2.0.0

Author: zinzied (zinzied@gmail.com)

PyPI version

πŸƒ Py-Parkour: The Hybrid Scraper Framework

Py-Parkour is a lightweight automation utility designed to solve the biggest annoyances in modern web scraping:

  1. πŸͺ Cookie Consents: Detecting and destroying GDPR/modal popups.
  2. 🧭 Pagination: Auto-detecting "Next" buttons or infinite scroll.
  3. 🎭 Verification Gates: Generating temporary identities (Email/SMS) for signups.
  4. πŸ‘» Hybrid Scraping: Break in with the browser, then steal the session for fast API calls.
  5. πŸ“‘ API Discovery: Automatically detect hidden JSON APIs.

It turns your scraper into a workflow automation platform.


πŸ“¦ Installation

pip install py-parkour[full]
Enter fullscreen mode Exit fullscreen mode

Or for development:

pip install -r requirements.txt
playwright install
Enter fullscreen mode Exit fullscreen mode

πŸš€ How to Use It

1. The "Unified" Bot

The ParkourBot is your main entry point. It wraps a Playwright browser and gives you access to all gadgets.

import asyncio
from py_parkour import ParkourBot

async def main():
    bot = ParkourBot(headless=False)
    await bot.start()
    await bot.goto("https://target-website.com")
    # ... use gadgets here ...
    await bot.close()

asyncio.run(main())
Enter fullscreen mode Exit fullscreen mode

2. πŸͺ Gadget: Crusher (Cookie Bypasser)

Don't write brittle selectors for every "Accept Cookies" button.

await bot.crush_cookies()
Enter fullscreen mode Exit fullscreen mode

3. 🧭 Gadget: Compass (Auto-Pagination)

Stop guessing if the site uses ?page=2 or a "Next >" button.

async for page_number in bot.crawl(max_pages=10):
    print(f"Scraping Page {page_number}: {bot.current_url}")
Enter fullscreen mode Exit fullscreen mode

4. 🎭 Gadget: Disguises (Temp Identity)

Need to sign up to view data? Generate a burner identity.

identity = await bot.identity.generate_identity(country="US")
print(f"Using email: {identity.email}")

code = await bot.identity.wait_for_code()
await bot.driver.page.fill("#otp-input", code)
Enter fullscreen mode Exit fullscreen mode

5. ✨ The "Magic" Auto-Setup

Try to automate the entire signup flow (Experimental).

await bot.auto_setup_identity("https://example.com/signup")
Enter fullscreen mode Exit fullscreen mode

6. πŸ‘» Gadget: Shadow (Session Bridge) ⭐ NEW

Stop choosing between "fast" (requests) and "capable" (browser). Use both.
Break in with the browser, then steal the session for high-speed API calls.

# 1. Login with the browser
await bot.goto("https://target.com/login")
# ... do login stuff ...

# 2. Transfer the session to a fast aiohttp client
async with await bot.shadow.create_session() as session:
    async with session.get("https://target.com/api/secret-data") as resp:
        print(await resp.json())
Enter fullscreen mode Exit fullscreen mode

7. πŸ“‘ Gadget: Radar (API Detector) ⭐ NEW

Why scrape HTML if there's a hidden API? Radar listens to background traffic.

await bot.goto("https://complex-spa-site.com")

# Check what we found
print(f"Latest JSON found: {bot.radar.latest_json}")

# Replay captured requests
for req in bot.radar.requests:
    if "api/v1/users" in req['url']:
        print(f"Found User API: {req['url']}")
Enter fullscreen mode Exit fullscreen mode

🎯 Where to use it?

Py-Parkour is best for:

  1. Complex Scraping: Sites that require login or have heavy popups.
  2. QA Automation: Testing "User Registration" flows without using real email addresses.
  3. Bot Development: Quickly spinning up bots that need to pass "verify your email" checks.
  4. API Hunting: Discovering undocumented APIs behind SPAs.

πŸ— Architecture

  • Core: Async Playwright wrapper.
  • Gadgets: Modular tools attached to the bot (.crusher, .compass, .identity, .shadow, .radar).

Built with ❀️ for Scrapers who hate boilerplate.

Top comments (1)

Collapse
 
shahrouzlogs profile image
Shahrouz Nikseresht

Nice little project! 🐍πŸ”₯

Creative ideas like this make learning Python way more fun.