DEV Community

Piotr

Web Crawling and RSS Reading Made Easy

Tired of building yet another RSS client or web crawler?

Don't worry - Crawler Buddy is here to save the day! This project makes it easy to crawl web pages and return digestible responses in JSON format.
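To give a feel for what consuming such a response could look like, here is a minimal Python sketch. It is not taken from the project: the base URL, the `/crawl` path, the `url` query parameter, and the sample payload are all hypothetical placeholders, so check the repository for the real endpoint names and response shape. The field names (`title`, `description`, `date_published`) are the ones this post mentions.

```python
import json
from urllib.parse import urlencode

# Hypothetical values for illustration only. The real host, port, endpoint
# path, and parameter name may differ; consult the Crawler Buddy repository.
BASE_URL = "http://localhost:3000"
ENDPOINT = "/crawl"


def build_request_url(page_url: str) -> str:
    """Build the request URL, percent-encoding the target page URL."""
    return f"{BASE_URL}{ENDPOINT}?{urlencode({'url': page_url})}"


def extract_metadata(raw_json: str) -> dict:
    """Pick the standardized fields out of a JSON response body.

    Missing fields come back as None instead of raising KeyError.
    """
    data = json.loads(raw_json)
    wanted = ("title", "description", "date_published")
    return {key: data.get(key) for key in wanted}


# A made-up response body, standing in for what the service might return.
sample_response = json.dumps(
    {
        "title": "Example page",
        "description": "An example description",
        "date_published": "2025-01-01T00:00:00Z",
        "extra_field": 1,
    }
)

if __name__ == "__main__":
    print(build_request_url("https://example.com/feed?a=1"))
    print(extract_metadata(sample_response))
```

The point of the sketch is that the client only needs `json.loads` and a few dictionary lookups, with no RSS or HTML parsing on its side.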

Key Features:

  • No more reliance on external tools: Forget about yt-dlp or Beautiful Soup for link metadata extraction.
  • Standardized metadata: Get consistent fields like title, description, date_published, and more.
  • Bot protection? No problem: Access RSS feeds—even on sites with tricky bot protection—without custom HTTP wrappers.
  • Automatic feed detection: It can automatically discover RSS feed URLs for websites and YouTube channels in many cases.
  • Simplified data handling: Skip parsing RSS files. Just consume easy-to-use JSON.
  • Unified interface: Access all metadata from a single, simple interface.
  • Containerized Docker environment: Isolate problems from your host OS for seamless operation.
  • Scalability: Whether you're running a single server or multiple, Crawler Buddy fits your needs.
  • UTF-8 encoding: Say goodbye to encoding issues. All output is UTF-8.
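For readers curious about what automatic feed detection usually involves, here is a rough sketch of the common technique: scanning a page's HTML for `<link rel="alternate">` tags that advertise an RSS or Atom feed. This is an assumption about the general approach, not a description of how Crawler Buddy implements it, and the names `FeedLinkParser` and `discover_feeds` are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that conventionally mark a feed in a <link> tag.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link rel>), so normalize them.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").lower()
        href = attr.get("href", "")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html: str, base_url: str) -> list:
    """Return the feed URLs advertised in the given HTML document."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<link rel="alternate" type="application/rss+xml" href="/feed.xml">'
        '<link rel="stylesheet" href="/style.css">'
        '</head><body></body></html>'
    )
    print(discover_feeds(page, "https://example.com/blog/"))
```

Sites that do not advertise a feed this way need other heuristics, which is part of what a dedicated tool saves you from writing.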

Available Crawlers:

  • RequestsCrawler: Python requests
  • CrawleeScript: Crawlee with BeautifulSoup
  • PlaywrightScript: Crawlee with Playwright
  • SeleniumUndetected: Undetected Selenium
  • SeleniumChromeHeadless: Selenium in headless mode
  • SeleniumChromeFull: Selenium with a full (non-headless) Chrome browser
  • StealthRequestsCrawler: Stealthy requests

Want to learn more?
Check out the official repository: Crawler Buddy GitHub
