DEV Community

idakballardp

Zyte Proxy: Smart rotating proxy for web scraping

Struggling to manage your proxies when web scraping? Try Zyte, developed by Scrapinghub.com.

Let’s face it, managing your proxy pool is an absolute pain! Nothing annoys developers more than crawlers failing because their proxies are continuously getting banned.

Not only do you constantly find yourself putting out proxy fires, but the people who rely on this web data also grow increasingly frustrated with you because the data feed is so unreliable.

We were in the same boat for years, until we hit our breaking point and decided to solve this problem forever.

At the time, Scrapinghub had been in business for about 3 years, providing web scraping consultancy services to companies looking to outsource their data extraction.

Then along came this one project…

The client wanted us to build a web scraping infrastructure to scrape product data from 20 e-commerce sites, at about 1 million requests per day, which at the time was a big deal!

Everything started off great. We developed the spiders, ran a number of pilot crawls and delivered the data to the customer.

However, we ran into serious problems scaling the crawls.

Although our spiders were well designed and configured to crawl at a polite speed, when we moved the project from proof of concept to production our proxies were being banned at an alarming rate.

Eventually, it got to the point that we couldn’t scale the crawl anymore as we couldn’t put out the proxy fires fast enough.

Initially, we told the client that we’d have the issue fixed in 1 or 2 days “as it was just a matter of swapping out the banned IPs”.

However, the days kept ticking by and we still hadn’t found a permanent solution.

Finally, nearly a month later, we fixed it!

The solution…

We stopped focusing on the underlying IPs and put all our energy into intelligently managing the IPs so that we could scrape reliably without the fear of being banned.

This breakthrough was a game-changer for us. With this new proxy management layer, we were able to scale our crawls nearly 100X and completely remove the headache of managing proxies.

This new proxy management layer would automatically select the best proxy to use for the target website and handle all the proxy rotation, throttling, blacklisting, and so on, ensuring that we could reliably extract the data we needed.

All without any manual intervention from our engineers!
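To make the idea concrete, here is a minimal sketch of what such a layer could look like. It is not the code we ran in production: the `ProxyPool` class, its parameters, and the "ban after N consecutive failures" rule are all assumptions made for illustration. It tracks each proxy per target website, skips proxies that are throttled or temporarily blacklisted, and picks the one with the best success rate.

```python
import time
from dataclasses import dataclass


@dataclass
class ProxyState:
    """Per-website bookkeeping for one proxy (hypothetical structure)."""
    address: str
    successes: int = 0
    failures: int = 0
    consecutive_failures: int = 0
    next_allowed: float = 0.0   # throttling: earliest time it may be reused
    banned_until: float = 0.0   # blacklist: time at which the ban expires

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 1.0


class ProxyPool:
    """Illustrative proxy manager: selection, rotation, throttling, blacklisting."""

    def __init__(self, addresses, min_delay=1.0, ban_seconds=300.0,
                 max_failures=3, clock=time.monotonic):
        self._addresses = list(addresses)
        self._min_delay = min_delay        # seconds between uses of one proxy per site
        self._ban_seconds = ban_seconds    # how long a blacklisted proxy sits out
        self._max_failures = max_failures  # consecutive failures before a ban
        self._clock = clock
        self._state = {}                   # (domain, address) -> ProxyState

    def _get(self, domain, address):
        return self._state.setdefault((domain, address), ProxyState(address))

    def acquire(self, domain):
        """Return the best available proxy for `domain`, or None if all are resting."""
        now = self._clock()
        candidates = [
            s for s in (self._get(domain, a) for a in self._addresses)
            if s.banned_until <= now and s.next_allowed <= now
        ]
        if not candidates:
            return None
        # Prefer the highest success rate, then the proxy that has rested longest.
        best = max(candidates, key=lambda s: (s.success_rate, -s.next_allowed))
        best.next_allowed = now + self._min_delay
        return best.address

    def report(self, domain, address, ok):
        """Record the outcome of a request so future selections can adapt."""
        s = self._get(domain, address)
        if ok:
            s.successes += 1
            s.consecutive_failures = 0
            return
        s.failures += 1
        s.consecutive_failures += 1
        if s.consecutive_failures >= self._max_failures:
            s.banned_until = self._clock() + self._ban_seconds
            s.consecutive_failures = 0
```

A crawler would call `acquire("shop.example")` before each request and `report(...)` afterwards. Because the per-proxy delay keeps a just-used proxy out of the candidate list, rotation falls out of the throttling rule rather than needing a separate mechanism.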

As we continued to scale, our customers increasingly asked us how we were achieving such reliability with our proxies.

So in 2012, we decided to make this technology available to everyone in the form of Crawlera.

Zyte: The smartest rotating proxy for web scraping

Specially designed for web scraping, Zyte allows you to crawl quickly and reliably, managing thousands of proxies internally, so you don’t have to. You never need to rotate a proxy again.

Since then, Zyte has undergone numerous redesigns and improvements to keep pace with the changes in web scraping technologies and cope with the ever more complex challenges experienced when scraping the web.
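From the client side, "never rotating a proxy again" typically means your scraper talks to one proxy endpoint and the service picks the outgoing IP for you. The sketch below shows that pattern using only Python's standard library. The host, port, and the convention of passing an API key as the proxy username are placeholders assumed for illustration, not confirmed Zyte settings, so check the service documentation for the real values.

```python
import urllib.request


def build_proxy_opener(proxy_host: str, proxy_port: int,
                       api_key: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes every request through one proxy endpoint.

    All three arguments are hypothetical placeholders; the key is sent as the
    proxy username with an empty password.
    """
    proxy_url = f"http://{api_key}:@{proxy_host}:{proxy_port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Usage (placeholder values, so no request is actually sent here):
# opener = build_proxy_opener("proxy.example.com", 8011, "YOUR_API_KEY")
# html = opener.open("https://example.com/products", timeout=30).read()
```

The scraper code stays the same no matter how many IPs sit behind the endpoint, which is the point of pushing rotation into the service.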

Top comments (1)

Crawlbase • Edited

Zyte is great for proxy management but if you want a cheaper proxy management service then I'd recommend Crawlbase. It has more than 2M rotating proxies. Say goodbye to manual rotation with Crawlbase Smart Proxy. 🚀 #WebScraping #ProxyManagement #Crawlbase