After years of web scraping for many clients, and billions of pages scraped at DataHen, I realized that we kept solving the same problems over and over again: scalability, unblockability, and the other general issues that web scrapers typically face.
So I built Till, a companion tool that integrates with any scraper in 5 minutes, without many code changes.
All you need to do is connect to Till via the proxy protocol, and Till handles things such as:
- User agent generation and randomization
- Proxy IP randomization
- Cookie management
- HTTP caching
- HTTP request interception
- Sticky sessions
- Request logging
When you use Till, you don't need to build much of the repetitive logic required to scale and unblock scrapers. You can simply focus on the main scraping tasks themselves.
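To give a rough idea of what "connecting via the proxy protocol" can look like from the scraper's side, here is a minimal Python sketch using only the standard library. The address `http://localhost:2933` and the helper names `till_proxies`, `build_till_opener`, and `fetch` are assumptions made for illustration, so substitute the host and port your own Till instance listens on.

```python
import urllib.request

# Assumption: Till is running locally and listening on this address.
# Replace it with the host and port of your own Till instance.
TILL_PROXY_URL = "http://localhost:2933"


def till_proxies(proxy_url: str = TILL_PROXY_URL) -> dict:
    """Return a proxy mapping that sends HTTP and HTTPS traffic to the proxy."""
    return {"http": proxy_url, "https": proxy_url}


def build_till_opener(proxy_url: str = TILL_PROXY_URL) -> urllib.request.OpenerDirector:
    """Build a urllib opener whose requests are routed through the proxy."""
    handler = urllib.request.ProxyHandler(till_proxies(proxy_url))
    return urllib.request.build_opener(handler)


def fetch(url: str, proxy_url: str = TILL_PROXY_URL, timeout: float = 30.0) -> bytes:
    """Fetch a page through the proxy and return the raw response body."""
    opener = build_till_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

With Till running, a call such as `fetch("http://example.com")` would go through the proxy instead of directly to the site. The rest of the scraper stays the same, since only the proxy setting changes. Fetching HTTPS pages through an intercepting proxy may need extra certificate setup, which this sketch does not cover.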