I built a web scraping companion tool to instantly make any scraper scalable and unblockable

#webscraping #scrapingtools

Over years of web scraping for many clients, and billions of pages scraped at DataHen, I realized that we kept doing the same things over and over to deal with scalability, unblockability, and the other general problems that web scraping typically faces.

So I built Till, a companion tool that integrates with any scraper in 5 minutes, without many code changes.

It works as a man-in-the-middle proxy that your scraper connects to.

All you need to do is connect to Till via the proxy protocol, and Till handles things such as:

User agent generation and randomization
Proxy IP randomization
Cookie management
HTTP Caching
HTTP Request interceptions
Sticky Sessions
Request Logging

When you use Till, you don't need to build much of the repetitive logic required to scale and unblock scrapers. You can simply focus on the main scraping steps and tasks themselves.
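To give a rough idea of what "connect to Till via the proxy protocol" can look like, here is a minimal sketch in Python using only the standard library. The address `http://localhost:2933` is an assumption about where a local Till instance is listening, and the helper names (`build_proxy_settings`, `build_opener_via_proxy`, `fetch`) are made up for this illustration. Adjust them to your own setup.

```python
import urllib.request

# Assumption: Till is running locally and listening on this address.
# Change the host and port to match your own Till instance.
TILL_PROXY_URL = "http://localhost:2933"


def build_proxy_settings(proxy_url: str) -> dict:
    """Map both URL schemes to the same proxy endpoint."""
    return {"http": proxy_url, "https": proxy_url}


def build_opener_via_proxy(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends every request through the given proxy."""
    handler = urllib.request.ProxyHandler(build_proxy_settings(proxy_url))
    return urllib.request.build_opener(handler)


def fetch(url: str, proxy_url: str = TILL_PROXY_URL) -> bytes:
    """Fetch a page through the proxy; the scraping logic itself is unchanged."""
    opener = build_opener_via_proxy(proxy_url)
    with opener.open(url, timeout=30) as response:
        return response.read()
```

With the proxy running, calling `fetch("http://example.com")` would route the request through it, and the scraper code does not need to know about user agents, proxy IPs, or cookies. Since this is a man-in-the-middle proxy, HTTPS requests will generally also require your HTTP client to trust the proxy's CA certificate.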

Let me know if you have any feedback or comments.
Here is the GitHub link. Please give it a star if you find it useful.
And here is the product link.

Thanks

Top comments (1)

Crawlbase • Mar 4 '24 • Edited

Amazing article. It's impressive how it simplifies the scalability and unblockability issues that many scrapers face. With features like user agent randomization and proxy IP management, Till takes the hassle out of repetitive tasks, letting you focus on scraping efficiently. Please do explore and check out Crawlbase and share your reviews.