Over years of web scraping for many clients, and billions of pages scraped at DataHen, I realized that we kept solving the same problems over and over again: scalability, unblockability, and the other general issues that web scrapers typically face.
So I built Till, a companion tool that integrates with any scraper in 5 minutes, without many code changes.
It works as a man-in-the-middle proxy that your scraper connects to.
All you need to do is connect to Till via the proxy protocol, and Till handles things such as:
- User agent generation and randomization
- Proxy IP randomization
- Cookie management
- HTTP caching
- HTTP request interception
- Sticky sessions
- Request logging
When you use Till, you don't need to build much of the repetitive logic required to scale and unblock scrapers; you can simply focus on the main scraping tasks themselves.
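To give a rough idea of what "connect to Till via the proxy protocol" can look like, here is a minimal sketch in Python using only the standard library. It assumes Till is already running locally and listening at `http://localhost:2933`; that address, and the helper names `proxy_map`, `build_till_opener` and `fetch_via_till`, are my assumptions for illustration, so adjust them to your own setup.

```python
import urllib.request

# Assumption: Till is running locally and listening on this address.
# Change it to wherever your Till instance actually listens.
TILL_PROXY_URL = "http://localhost:2933"


def proxy_map(proxy_url: str = TILL_PROXY_URL) -> dict:
    """Route both plain HTTP and HTTPS traffic through the same proxy."""
    return {"http": proxy_url, "https": proxy_url}


def build_till_opener(proxy_url: str = TILL_PROXY_URL) -> urllib.request.OpenerDirector:
    """Build a urllib opener that sends every request through the proxy."""
    handler = urllib.request.ProxyHandler(proxy_map(proxy_url))
    return urllib.request.build_opener(handler)


def fetch_via_till(url: str, timeout: float = 30.0) -> bytes:
    """Hypothetical helper: fetch one page through the proxy, return the body."""
    opener = build_till_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

The scraper itself stays the same; only the proxy setting changes. Calling `fetch_via_till("http://example.com")` would then go through Till rather than directly to the site. Because a man-in-the-middle proxy re-signs HTTPS traffic, your HTTP client will most likely also need to trust the proxy's CA certificate (or have certificate verification relaxed in a test environment) before HTTPS requests succeed.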
Let me know if you have any feedback or comments.
Here is the GitHub link. Please give it a star if you find it useful.
And here is the product link.
Thanks