
Piotr


Web Crawling and RSS Reading Made Easy

Tired of building yet another RSS client or web crawler?

Don't worry: Crawler Buddy is here to save the day! It crawls web pages for you and returns the results as digestible JSON.
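
As a rough sketch of what consuming such a response could look like on the client side: the sample payload below is made up, and its shape is an assumption rather than Crawler Buddy's documented output. Only the field names `title`, `description`, and `date_published` come from this post; `PageMetadata` and `parse_metadata` are hypothetical helpers, so check the repository for the real response format.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PageMetadata:
    """Typed view over a few metadata fields (hypothetical helper)."""
    title: Optional[str]
    description: Optional[str]
    date_published: Optional[datetime]


def parse_metadata(raw_json: str) -> PageMetadata:
    """Turn a JSON response into a typed object (assumed payload shape)."""
    data = json.loads(raw_json)
    published = data.get("date_published")
    return PageMetadata(
        title=data.get("title"),
        description=data.get("description"),
        # Assumes an ISO 8601 timestamp; the real format may differ.
        date_published=datetime.fromisoformat(published) if published else None,
    )


# Made-up sample response, for illustration only.
sample = (
    '{"title": "Example post", "description": "A short summary", '
    '"date_published": "2025-01-15T10:30:00+00:00"}'
)
meta = parse_metadata(sample)
print(meta.title, meta.date_published)
```

Missing fields come back as `None` instead of raising, which keeps downstream code simple when a page has no description or publication date.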

Key Features:

  • No more reliance on external tools: Forget about yt-dlp or Beautiful Soup for link metadata extraction.
  • Standardized metadata: Get consistent fields like title, description, date_published, and more.
  • Bot protection? No problem: Access RSS feeds, even on sites with tricky bot protection, without custom HTTP wrappers.
  • Automatic feed detection: It can automatically discover RSS feed URLs for websites and YouTube channels in many cases.
  • Simplified data handling: Skip parsing RSS files. Just consume easy-to-use JSON.
  • Unified interface: Access all metadata from a single, simple interface.
  • Containerized Docker environment: Isolate problems from your host OS for seamless operation.
  • Scalability: Whether you're running a single server or several, Crawler Buddy fits your needs.
  • UTF-8 encoding: Say goodbye to encoding issues, because everything is in UTF-8.
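
To give a sense of the work that automatic feed detection saves you: many sites advertise their feeds through `<link rel="alternate">` tags in the page head. The standard-library sketch below implements only that convention. It is not Crawler Buddy's actual detection logic, and the names `FeedLinkParser` and `discover_feeds` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects hrefs of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel and is_feed and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html: str, base_url: str) -> List[str]:
    """Return absolute feed URLs advertised in the given HTML."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


sample_html = (
    '<html><head><link rel="alternate" type="application/rss+xml" '
    'href="/feed.xml"></head><body></body></html>'
)
print(discover_feeds(sample_html, "https://example.com/blog/"))
```

This only covers pages that declare their feeds in markup; sites that don't, or that sit behind bot protection, are where a dedicated tool earns its keep.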

Available Crawlers:

  • RequestsCrawler: Python requests
  • CrawleeScript: Crawlee with BeautifulSoup
  • PlaywrightScript: Crawlee with Playwright
  • SeleniumUndetected: Undetected Selenium
  • SeleniumChromeHeadless: Selenium in headless mode
  • SeleniumChromeFull: Selenium with a full (non-headless) Chrome browser
  • StealthRequestsCrawler: Stealthy requests

Want to learn more?
Check out the official repository: Crawler Buddy GitHub
