DEV Community

Zied
Zied

Posted on

Py-Parkour

  _____       _--_          _                   
 |  __ \     |  _ \        | |                  
 | |__) |   _| |_) |__ _ __| | ___ ___  _   _ _ __ 
 |  ___/ | | |  _ // _` | '__| |/ / _ \| | | | '__|
 | |   | |_| | | \ \ (_| | |  |   < (_) | |_| | |  
 |_|    \__, |_|  \_\__,_|_|  |_|\_\___/ \__,_|_|  
         __/ |                                     
        |___/                                      
Enter fullscreen mode Exit fullscreen mode

Version: 2.0.0

Author: zinzied (zinzied@gmail.com)

PyPI version

πŸƒ Py-Parkour: The Hybrid Scraper Framework

Py-Parkour is a lightweight automation utility designed to solve the biggest annoyances in modern web scraping:

  1. πŸͺ Cookie Consents: Detecting and destroying GDPR/modal popups.
  2. 🧭 Pagination: Auto-detecting "Next" buttons or infinite scroll.
  3. 🎭 Verification Gates: Generating temporary identities (Email/SMS) for signups.
  4. πŸ‘» Hybrid Scraping: Break in with the browser, then steal the session for fast API calls.
  5. πŸ“‘ API Discovery: Automatically detect hidden JSON APIs.

It turns your scraper into a workflow automation platform.


πŸ“¦ Installation

pip install py-parkour[full]
Enter fullscreen mode Exit fullscreen mode

Or for development:

pip install -r requirements.txt
playwright install
Enter fullscreen mode Exit fullscreen mode

πŸš€ How to Use It

1. The "Unified" Bot

The ParkourBot is your main entry point. It wraps a Playwright browser and gives you access to all gadgets.

import asyncio
from py_parkour import ParkourBot

async def main():
    bot = ParkourBot(headless=False)
    await bot.start()
    await bot.goto("https://target-website.com")
    # ... use gadgets here ...
    await bot.close()

asyncio.run(main())
Enter fullscreen mode Exit fullscreen mode

2. πŸͺ Gadget: Crusher (Cookie Bypasser)

Don't write brittle selectors for every "Accept Cookies" button.

await bot.crush_cookies()
Enter fullscreen mode Exit fullscreen mode

3. 🧭 Gadget: Compass (Auto-Pagination)

Stop guessing if the site uses ?page=2 or a "Next >" button.

async for page_number in bot.crawl(max_pages=10):
    print(f"Scraping Page {page_number}: {bot.current_url}")
Enter fullscreen mode Exit fullscreen mode

4. 🎭 Gadget: Disguises (Temp Identity)

Need to sign up to view data? Generate a burner identity.

identity = await bot.identity.generate_identity(country="US")
print(f"Using email: {identity.email}")

code = await bot.identity.wait_for_code()
await bot.driver.page.fill("#otp-input", code)
Enter fullscreen mode Exit fullscreen mode

5. ✨ The "Magic" Auto-Setup

Try to automate the entire signup flow (Experimental).

await bot.auto_setup_identity("https://example.com/signup")
Enter fullscreen mode Exit fullscreen mode

6. πŸ‘» Gadget: Shadow (Session Bridge) ⭐ NEW

Stop choosing between "fast" (requests) and "capable" (browser). Use both.
Break in with the browser, then steal the session for high-speed API calls.

# 1. Login with the browser
await bot.goto("https://target.com/login")
# ... do login stuff ...

# 2. Transfer the session to a fast aiohttp client
async with await bot.shadow.create_session() as session:
    async with session.get("https://target.com/api/secret-data") as resp:
        print(await resp.json())
Enter fullscreen mode Exit fullscreen mode

7. πŸ“‘ Gadget: Radar (API Detector) ⭐ NEW

Why scrape HTML if there's a hidden API? Radar listens to background traffic.

await bot.goto("https://complex-spa-site.com")

# Check what we found
print(f"Latest JSON found: {bot.radar.latest_json}")

# Replay captured requests
for req in bot.radar.requests:
    if "api/v1/users" in req['url']:
        print(f"Found User API: {req['url']}")
Enter fullscreen mode Exit fullscreen mode

🎯 Where to use it?

Py-Parkour is best for:

  1. Complex Scraping: Sites that require login or have heavy popups.
  2. QA Automation: Testing "User Registration" flows without using real email addresses.
  3. Bot Development: Quickly spinning up bots that need to pass "verify your email" checks.
  4. API Hunting: Discovering undocumented APIs behind SPAs.

πŸ— Architecture

  • Core: Async Playwright wrapper.
  • Gadgets: Modular tools attached to the bot (.crusher, .compass, .identity, .shadow, .radar).

Built with ❀️ for Scrapers who hate boilerplate.

Top comments (4)

Collapse
 
arrebol profile image
Aurora

This Py-Parkour tool looks incredibly useful for my daily web scraping and bot development work! I especially love how it solves the pain point of handling email verification steps and undocumented APIs in SPAs. Could you share some basic examples of how to get started with the .compass or .radar gadgets? I’m eager to test it out for my upcoming projects.

Collapse
 
zinzied profile image
Zied

🧭 Py-Parkour Gadgets Guide

This guide focuses on two powerful gadgets in the py-parkour library: Compass and Radar. These tools help you navigate complex websites and discover hidden APIs.

1. πŸ“‘ Radar (.radar)

The Passive API Detector

What is it?

The Radar gadget automatically listens to background network traffic while your bot navigates. It specifically looks for JSON responses, which often indicate an underlying API that powers the website.

Why use it?

Modern websites (SPAs) often load data via APIs. Instead of parsing messy HTML with selectors, you can often just grab the clean JSON data that the Radar captured.

Basic Usage

The Radar starts automatically when you run await bot.start().

import asyncio
from py_parkour import ParkourBot

async def example_radar():
    bot = ParkourBot(headless=True)
    await bot.start()

    print("Visiting a site...")
    # Navigate to a site that loads data dynamically
    await bot.goto("https://jsonplaceholder.typicode.com/posts/1")

    # Wait a moment for network requests to finish
    await asyncio.sleep(2)

    # Access the most recent JSON captured
    latest_data = bot.radar.latest_json
    if latest_data:
        print("πŸŽ‰ Capture Success!")
        print(f"Data: {latest_data}")
    else:
        print("No JSON found.")

    await bot.close()
Enter fullscreen mode Exit fullscreen mode

Advanced: Inspecting All Requests

You can also inspect the history of all JSON requests captured during the session. This is useful for finding specific API endpoints (e.g., user lists, product catalogs).

# After navigation...
for req in bot.radar.requests:
    print(f"URL: {req['url']}")
    print(f"Method: {req['method']}")
    print(f"Payload Preview: {req['payload_preview']}")
    print("---")
Enter fullscreen mode Exit fullscreen mode

2. 🧭 Compass (.compass)

The Auto-Navigator

What is it?

The Compass gadget handles pagination for you. It tries to automatically detect how to get to the next page, whether it's by clicking a "Next" button or scrolling down (infinite scroll).

Strategies

The Compass uses two main strategies:

  1. Keyword Matching: Looks for buttons with text like "Next", "Older", ">", ">>", etc.
  2. Infinite Scroll: Checks if the page height increases after scrolling to the bottom.

Basic Usage

The easiest way to use the Compass is through the bot.crawl() helper methods.

import asyncio
from py_parkour import ParkourBot

async def example_compass():
    bot = ParkourBot(headless=False)
    await bot.start()
    await bot.goto("https://books.toscrape.com/")

    # Scroll through up to 5 pages
    # The loop yields control back to you for each page found
    async for page_num in bot.crawl(max_pages=5):
        current_url = bot.current_url
        print(f"πŸ“ Scraping Page {page_num} at {current_url}")

        # ... Perform your scraping logic here (e.g. finding products) ...
        # items = await bot.driver.page.locator(".product_pod").all_text_contents()
        # print(f"Found {len(items)} items")

    await bot.close()
Enter fullscreen mode Exit fullscreen mode

Manual Control

You can also access the compass directly if you need more granular control, though bot.crawl() is recommended for most cases.

# Inside your main loop
await bot.compass.crawl(max_pages=3) 
Enter fullscreen mode Exit fullscreen mode

πŸ”— How they relate to the Library

Both Radar and Compass are Gadgets attached to the ParkourBot.

  • Initialization: They are initialized in bot.start() and attached to bot.radar and bot.compass.
  • Integration:
    • Radar hooks into the Playwright page's on("response") event to passively collect data without interrupting your flow.
    • Compass actively interacts with the page (clicking or scrolling) when you iterate through it using bot.crawl().

They are designed to work together: you might use Compass to move to the next page, and Radar to collect the API data triggered by that navigation!

Collapse
 
213123213213 profile image
123

you are very good!

Collapse
 
shahrouzlogs profile image
Shahrouz Nikseresht

Nice little project! 🐍πŸ”₯

Creative ideas like this make learning Python way more fun.