DEV Community

Discussion on: If you would need to scrape many different websites nowadays, which tool/language combo would you pick?

Pacharapol Withayasakpunt

Node.js, with or without Puppeteer, would probably be the natural first choice, although I am not that familiar with Puppeteer.

I used to use the Selenium API with Python when I needed to scrape dynamic websites. But async in Python does not seem as natural as in Node.js.

I don't know much about Golang. How often is it used for web scraping?

Mario Davchevski (Author)

But async in Python does not seem as natural as in Node.js

This is one of the reasons I listed Go in the tags. I'm still learning it, but it feels like well-thought-out concurrent code can go a long way when scraping at scale.
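To make "concurrent code when scraping at scale" a bit more concrete, here is a minimal sketch of a bounded worker pool. It is written in Python with threads only so that it runs on the standard library alone; in Go the same idea would typically be expressed with goroutines and channels. The names `fetch`, `crawl`, `fetch_page` and `max_workers`, and the `example.com` URL, are assumptions made for this illustration and come from nowhere in the thread.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(urls, fetch_page=fetch, max_workers=8):
    """Fetch all urls with at most max_workers requests in flight.

    Returns a dict mapping each URL to a (body, error) pair,
    where exactly one of the two is None.
    """
    def safe_fetch(url):
        try:
            return url, fetch_page(url), None
        except Exception as exc:  # one broken site should not stop the crawl
            return url, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input urls.
        return {url: (body, err) for url, body, err in pool.map(safe_fetch, urls)}


if __name__ == "__main__":
    # Placeholder URL; replace with the blogs you actually want to crawl.
    for url, (body, err) in crawl(["https://example.com/"]).items():
        if err is not None:
            print(f"{url}: failed with {err!r}")
        else:
            print(f"{url}: {len(body)} characters")
```

Passing the fetch function in as a parameter keeps the pool logic testable without network access, and `max_workers` is the single knob that limits how hard the crawler hits remote sites.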

Basically, I want to crawl simple blogs and extract their blog posts. The biggest challenge here would probably be parsing the data and understanding the different content parts within a blog post.
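As a rough illustration of the parsing problem, the sketch below pulls a title and the body paragraphs out of a page using only Python's standard `html.parser`. It assumes, purely for the example, that a blog wraps each post in an `<article>` element and puts the title in the first `<h1>`; real blogs vary widely, so this heuristic is a starting point and not a general solution. `PostExtractor` and `extract_post` are hypothetical names.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collects the first <h1> as the title and <p> text inside <article>."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.paragraphs = []
        self._article_depth = 0  # > 0 while inside an <article>
        self._current = None     # tag whose text is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._article_depth += 1
        elif tag == "h1" and self.title is None:
            self._current, self._buffer = "h1", []
        elif tag == "p" and self._article_depth > 0:
            self._current, self._buffer = "p", []

    def handle_endtag(self, tag):
        if tag == "article" and self._article_depth > 0:
            self._article_depth -= 1
        elif tag == self._current:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if tag == "h1":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)


def extract_post(html):
    """Return {'title': ..., 'paragraphs': [...]} for one HTML page."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "paragraphs": parser.paragraphs}


if __name__ == "__main__":
    page = "<article><h1>My post</h1><p>Hello, <b>world</b>.</p></article>"
    print(extract_post(page))
```

Paragraphs outside `<article>`, such as navigation or footer text, are ignored, which is the part that usually needs per-site tuning once the blogs stop being "simple".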