Tired of building yet another RSS client or web crawler?
Don't worry - Crawler Buddy is here to save the day! This project makes it easy to crawl web pages and return digestible responses in JSON format.
Key Features:
- No more reliance on external tools: Forget about yt-dlp or Beautiful Soup for link metadata extraction.
- Standardized metadata: Get consistent fields like title, description, date_published, and more.
- Bot protection? No problem: Access RSS feeds, even on sites with tricky bot protection, without custom HTTP wrappers.
- Automatic feed detection: It can automatically discover RSS feed URLs for websites and YouTube channels in many cases.
- Simplified data handling: Skip parsing RSS files. Just consume easy-to-use JSON.
- Unified interface: Access all metadata from a single, simple interface.
- Docker container: Keep crawling problems isolated from your host OS.
- Scalability: Whether you're running a single server or multiple, Crawler Buddy fits your needs.
- UTF-8 encoding: Say goodbye to encoding issues. Everything is in UTF-8.
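To give a feel for what "just consume easy-to-use JSON" could look like on the client side, here is a minimal sketch. The field names title, description, and date_published come from the feature list above. The flat response shape, the LinkMetadata class, and the parse_metadata helper are assumptions made for illustration, not the project's actual schema or API, so check the repository for the real response format.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkMetadata:
    """Hypothetical container for the standardized fields named above."""
    title: Optional[str]
    description: Optional[str]
    date_published: Optional[str]


def parse_metadata(raw: str) -> LinkMetadata:
    """Parse a JSON response body into LinkMetadata.

    Assumption: the response is a flat JSON object. The real Crawler Buddy
    response may nest these fields differently.
    """
    data = json.loads(raw)
    # .get() returns None for missing keys, so a sparse response still parses.
    return LinkMetadata(
        title=data.get("title"),
        description=data.get("description"),
        date_published=data.get("date_published"),
    )


# Made-up sample payload, used only to show the parsing step.
sample = (
    '{"title": "Example Domain", '
    '"description": "An example page", '
    '"date_published": "2024-01-01T00:00:00+00:00"}'
)
meta = parse_metadata(sample)
print(meta.title)
```

The point of the sketch is that the client never touches RSS or HTML. It reads the same few keys no matter which kind of page was crawled.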
Available Crawlers:
- RequestsCrawler: Python requests
- CrawleeScript: Crawlee with BeautifulSoup
- PlaywrightScript: Crawlee with Playwright
- SeleniumUndetected: Undetected Selenium
- SeleniumChromeHeadless: Selenium in headless mode
- SeleniumChromeFull: Full Selenium mode
- StealthRequestsCrawler: Stealthy requests
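If you run Crawler Buddy as a server, picking one of these crawlers would likely come down to passing its name along with the page URL. The sketch below only builds such a request URL. The crawler names are taken from the list above, but the /crawl endpoint, the url and crawler query parameters, and the server address are all hypothetical placeholders, so look up the real ones in the repository.

```python
from typing import Optional
from urllib.parse import urlencode

# Crawler names exactly as listed in this post.
KNOWN_CRAWLERS = {
    "RequestsCrawler",
    "CrawleeScript",
    "PlaywrightScript",
    "SeleniumUndetected",
    "SeleniumChromeHeadless",
    "SeleniumChromeFull",
    "StealthRequestsCrawler",
}


def build_request_url(
    server: str,
    page_url: str,
    crawler_name: Optional[str] = None,
    endpoint: str = "/crawl",  # hypothetical path, not confirmed by the project
) -> str:
    """Build a request URL for a crawl, optionally pinning a crawler."""
    if crawler_name is not None and crawler_name not in KNOWN_CRAWLERS:
        raise ValueError(f"Unknown crawler: {crawler_name}")

    # "url" and "crawler" are assumed parameter names.
    params = {"url": page_url}
    if crawler_name is not None:
        params["crawler"] = crawler_name

    # urlencode percent-escapes the target URL so it survives as a query value.
    return f"{server.rstrip('/')}{endpoint}?{urlencode(params)}"


request_url = build_request_url(
    "http://localhost:3000",  # assumed local server address
    "https://example.com/feed",
    crawler_name="RequestsCrawler",
)
print(request_url)
```

Validating the name up front means a typo fails on your side instead of turning into a confusing response from the server.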
Want to learn more?
Check out the official repository: Crawler Buddy GitHub