DEV Community

Discussion on: DOs and DON'Ts of Web Scraping

Bauke Regnerus

Great article, thanks.

I'm working on a scraper that gathers news articles from my favorite sites and presents them in a dashboard hosted on a local server. The idea is to update every 5 or 10 minutes, so I guess rotating IPs and user agents is overkill. But I'll implement it anyway as good practice.
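For a refresh every 5 or 10 minutes, one polite polling loop may be all that's needed. Below is a minimal sketch using only the Python standard library. It assumes a hypothetical FEEDS list and a fixed User-Agent of your choosing; next_delay, build_request, fetch and run_forever are illustrative names, not code from the article.

```python
import random
import time
import urllib.request

# Hypothetical list of pages to poll; replace with your favorite news sites.
FEEDS = ["https://example.com/news"]

MIN_DELAY = 5 * 60   # 5 minutes, in seconds
MAX_DELAY = 10 * 60  # 10 minutes, in seconds


def next_delay(rng=random):
    """Wait a random 5-10 minutes so requests don't arrive on a fixed beat."""
    return rng.uniform(MIN_DELAY, MAX_DELAY)


def build_request(url, user_agent):
    """Attach an explicit User-Agent header to the request."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=30):
    """Download one page and return its body as text."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def run_forever(user_agent):
    """Poll every page in FEEDS, then sleep; runs until interrupted."""
    while True:
        for url in FEEDS:
            try:
                html = fetch(url, user_agent)
                print(f"fetched {url} ({len(html)} chars)")  # parse and update the dashboard here
            except OSError as exc:  # URLError, HTTPError and timeouts all subclass OSError
                print(f"skipping {url}: {exc}")
        time.sleep(next_delay())
```

Calling run_forever("my-news-dashboard/0.1") starts the loop; that User-Agent value is only a placeholder. Randomizing the delay is a design choice rather than a requirement, and a fixed time.sleep(300) would also work.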

Btw, I noticed user_agents isn't used anywhere in the example.

Ander Rodriguez

Hi, thanks!!

It's overkill if you are sure that they won't block you. But depending on the site, you cannot be 100% sure.

If by user_agents you mean the library: as far as I know, it detects capabilities from an existing User-Agent string rather than producing one. For our case, we would need a generator.
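The user_agents library starts from a User-Agent string and reports which browser and device it describes. Rotation needs the opposite direction: something that hands out strings. Here is a minimal sketch of such a generator in Python. It assumes a small hand-picked pool; the strings below are illustrative examples, not taken from the article, and they will go stale as browsers update. user_agent_generator is a hypothetical name.

```python
import random

# Illustrative desktop User-Agent strings (assumption: not from the article).
# Real browser strings age quickly, so refresh this pool from time to time.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.2 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def user_agent_generator(pool, rng=random):
    """Yield User-Agent strings forever, reshuffling the pool after each full pass."""
    pool = list(pool)  # copy so the caller's list is never reordered
    if not pool:
        raise ValueError("pool must contain at least one User-Agent string")
    while True:
        rng.shuffle(pool)
        yield from pool
```

Usage: create agents = user_agent_generator(USER_AGENTS) once, then build each request's headers with {"User-Agent": next(agents)}. Reshuffling per pass uses every string once per pass, whereas random.choice can pick the same one several times in a row.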