DEV Community

Discussion on: How To Scrape Amazon at Scale With Python Scrapy, And Never Get Banned

Pacharapol Withayasakpunt • Edited

As discussed at the start of this article, Scraper API is a proxy API designed to take the hassle out of using web scraping proxies.

This is probably the most important point when scraping a website that doesn't have a robots.txt, or when you want to go beyond what its robots.txt allows (hence proxy rotation and User-Agent spoofing).

I can see that web scraping is useful when the site admin does not provide a public API. But as an admin myself, I'd say security and server-load control come first, even ahead of access by end users (which sometimes means a poorer experience for human users).

I can see that there is JavaScript rendering as well, which is nice for web automation, such as handling JavaScript forms.

429 responses

Without a proxy, though, handling these comes down to knowing how to rate limit. This matters just as much when you access a public API, i.e. one the site admin fully allows you to use but doesn't want overloaded (which is not web scraping).
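A minimal sketch of what "knowing how to rate limit" can look like in plain Python (standard library only): calls are spaced evenly, and on a 429 the client waits for the `Retry-After` header if the server sends one, otherwise it backs off exponentially. `RateLimiter`, `retry_after_seconds`, `call_with_backoff` and `send` are hypothetical names for illustration; `send` stands in for whatever HTTP client you use and is assumed to return a `(status_code, headers)` pair.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


class RateLimiter:
    """Hypothetical helper: space calls at least 1/rate seconds apart."""

    def __init__(self, rate):
        self.interval = 1.0 / rate
        self.next_allowed = time.monotonic()

    def wait(self):
        now = time.monotonic()
        if now < self.next_allowed:
            time.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


def retry_after_seconds(value, default):
    """Parse Retry-After (delta-seconds or HTTP-date); fall back to `default`."""
    if value is None:
        return default
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def call_with_backoff(send, limiter, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() under the rate limiter, retrying on HTTP 429.

    `send` is a hypothetical callable returning (status_code, headers).
    Note: a plain dict is case-sensitive; many HTTP clients return a
    case-insensitive header mapping instead.
    """
    for attempt in range(max_retries + 1):
        limiter.wait()
        status, headers = send()
        if status != 429 or attempt == max_retries:
            return status
        # Prefer the server's Retry-After hint; otherwise back off exponentially.
        fallback = base_delay * (2 ** attempt)
        sleep(retry_after_seconds(headers.get("Retry-After"), fallback))
```

Inside Scrapy itself, the `DOWNLOAD_DELAY` and AutoThrottle settings cover similar ground, so a hand-rolled limiter like this is mostly useful for plain API clients.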

Just recently I had to send ~500 PUT requests (not GET) to the API server, and I still had to wait 10 minutes for them to finish...
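For scale: 10 minutes is 600 s, so ~500 requests averages out to about 1.2 s per request (roughly 0.83 requests per second). If the server enforces a per-client rate, sending in parallel cannot finish faster than n / rate; what bounded concurrency can do is stop slow individual responses from stretching the total. Below is a minimal asyncio sketch under those assumptions; `send_all` and the `put` coroutine are hypothetical (wrap whatever async HTTP client you use), and `rate` / `max_concurrency` are illustrative knobs, not values from any particular API.

```python
import asyncio
import time


async def send_all(items, put, rate, max_concurrency=5):
    """Send every item via the hypothetical `put` coroutine.

    Request starts are spaced at least 1/rate seconds apart, and at most
    `max_concurrency` requests are in flight at once. Results keep input order.
    """
    interval = 1.0 / rate
    sem = asyncio.Semaphore(max_concurrency)
    lock = asyncio.Lock()
    next_start = time.monotonic()

    async def one(item):
        nonlocal next_start
        async with sem:
            # Reserve the next start slot under the lock, then sleep outside it.
            async with lock:
                now = time.monotonic()
                slot = max(now, next_start)
                next_start = slot + interval
            delay = slot - now
            if delay > 0:
                await asyncio.sleep(delay)
            return await put(item)

    return await asyncio.gather(*(one(item) for item in items))
```

Reserving each start slot under the lock keeps start times evenly spaced no matter how many tasks are waiting, while the semaphore separately caps how many requests are open at the same time.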