Web Crawling and RSS Reading Made Easy

#webscraping #rss #python

Tired of building yet another RSS client or web crawler?

Don't worry - Crawler Buddy is here to save the day! This project crawls web pages for you and returns digestible responses in JSON format.

Key Features:

No more reliance on external tools: Forget about yt-dlp or Beautiful Soup for link metadata extraction.
Standardized metadata: Get consistent fields like title, description, date_published, and more.
Bot protection? No problem: Access RSS feeds—even on sites with tricky bot protection—without custom HTTP wrappers.
Automatic feed detection: It can automatically discover RSS feed URLs for websites and YouTube channels in many cases.
Simplified data handling: Skip parsing RSS files. Just consume easy-to-use JSON.
Unified interface: Access all metadata from a single, simple interface.
Containerized Docker environment: Crawler Buddy runs in a Docker container, so crawling problems stay isolated from your host OS.
Scalability: Whether you're running a single server or multiple, Crawler Buddy fits your needs.
UTF-8 encoding: Say goodbye to encoding issues. Everything is in UTF-8.
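
To give a feel for the "just consume JSON" idea, here is a minimal Python sketch of a client. It is not taken from the project's documentation: the server address, the `/getj` endpoint path, the `url` query parameter, and the flat shape of the response are all assumptions made for illustration, so check the project's README for the real interface. Only the field names (title, description, date_published) come from the feature list above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions (hypothetical, not confirmed by the project):
# a Crawler Buddy instance listens at BASE_URL and exposes a JSON
# endpoint at ENDPOINT that takes the page address as a "url" parameter.
BASE_URL = "http://localhost:3000"
ENDPOINT = "/getj"

# Standardized fields named in the feature list.
FIELDS = ("title", "description", "date_published")


def build_request_url(page_url, base_url=BASE_URL):
    """Build the request URL, percent-encoding the page address."""
    return f"{base_url}{ENDPOINT}?{urlencode({'url': page_url})}"


def extract_metadata(payload):
    """Pick the standardized fields out of a decoded JSON object.

    Assumes a flat object; fields missing from the payload become None.
    """
    return {name: payload.get(name) for name in FIELDS}


def fetch_metadata(page_url):
    """Ask the crawler server for a page and return its metadata."""
    with urlopen(build_request_url(page_url), timeout=30) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return extract_metadata(payload)
```

With a server running, `fetch_metadata("https://example.com/feed")` would return a plain dictionary with the three fields, so there is no RSS parsing and no HTML scraping on the client side.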

Available Crawlers: