Great article, thanks.
I'm working on a scraper that gathers news articles from my favorite sites and presents them in a dashboard hosted on a local server. The idea is to update every 5 or 10 minutes, so I guess rotating IPs and user agents is overkill. But I'll implement it anyway as good practice.
Btw, I notice user_agents isn't used anywhere in the example.

Hi, thanks!!
It's overkill if you are sure that they won't block you. But depending on the sites, you cannot be 100% sure.
By user_agents, I take it you mean the library. As far as I know, it detects capabilities based on a User-Agent string. For our case, we would need a generator.
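
To make the "generator" idea concrete, here is a minimal sketch using only the Python standard library. It is not the article's code: the names `USER_AGENTS`, `user_agent_generator`, and `build_headers` are hypothetical, and the three User-Agent strings are illustrative samples that a real scraper would replace with a larger, up-to-date list.

```python
import random

# Illustrative sample strings (an assumption, not taken from the article).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def user_agent_generator(agents, seed=None):
    """Yield User-Agent strings forever, reshuffling after each full pass."""
    if not agents:
        raise ValueError("agents must not be empty")
    rng = random.Random(seed)  # seed is optional, useful for reproducible runs
    while True:
        batch = list(agents)
        rng.shuffle(batch)
        # Every string is used once before any of them repeats.
        yield from batch


def build_headers(ua_iter):
    """Build request headers using the next User-Agent from the generator."""
    return {"User-Agent": next(ua_iter)}


if __name__ == "__main__":
    ua_iter = user_agent_generator(USER_AGENTS)
    for _ in range(4):
        print(build_headers(ua_iter))
```

The resulting dict can be passed as the `headers` argument of whatever HTTP client the scraper uses. Shuffling per pass, rather than calling `random.choice` each time, is just one possible design: it keeps the rotation even across a small list.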