DEV Community

Vasiliy

Introducing QCrawl — A Modern Async Web Crawler Framework for Python

Hi everyone, I’ve released an open-source project I’ve been building: https://github.com/crawlcore/qcrawl

qCrawl features

  • Async architecture - High-performance concurrent crawling based on asyncio
  • Performance optimized - Redis queue backend with direct delivery, MessagePack serialization, connection pooling, DNS caching
  • Powerful parsing - CSS/XPath selectors with lxml
  • Middleware system - Customizable request/response processing
  • Flexible export - Multiple output formats including JSON, CSV, XML
  • Flexible queue backends - Memory or Redis-based (+disk) schedulers for different scale requirements
  • Item pipelines - Data transformation, validation, and processing pipeline
  • Pluggable downloaders - HTTP (aiohttp), Camoufox (stealth browser) for JavaScript rendering and anti-bot evasion
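
To make the "async architecture" and "pluggable downloaders" ideas concrete, here is a minimal sketch of the general pattern using only the standard library. It is not qCrawl's API: `crawl`, `fake_fetch`, and `FAKE_SITE` are hypothetical names invented for illustration, and the in-memory "site" stands in for a real HTTP downloader.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical type: any coroutine that takes a URL and returns the links found on it.
Fetch = Callable[[str], Awaitable[list[str]]]


async def crawl(start_url: str, fetch: Fetch, concurrency: int = 4) -> set[str]:
    """Visit every URL reachable from start_url with up to `concurrency` workers."""
    queue = asyncio.Queue()
    seen = {start_url}
    queue.put_nowait(start_url)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                for link in await fetch(url):
                    if link not in seen:  # deduplicate before scheduling
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()  # returns once every queued URL has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


# Stand-in for a real downloader: a tiny in-memory link graph (assumption).
FAKE_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": [],
    "/c": ["/"],
}


async def fake_fetch(url: str) -> list[str]:
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return FAKE_SITE.get(url, [])


def run_demo() -> set[str]:
    return asyncio.run(crawl("/", fake_fetch))
```

Because the fetch function is passed in as a parameter, swapping the fake downloader for an HTTP client or a browser-based one does not change the crawl loop, which is the idea behind a pluggable downloader.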

If this sounds interesting to you, I’d really appreciate:

  • early technical feedback
  • a star ⭐ on GitHub to help with visibility.

Thank you!
