DEV Community

Nishant Mittal

What is the best way for web scraping?

Hi Guys,

I know that Scrapy can be used to scrape data, but I also want the code to be presentable on GitHub. What are the best practices for web scraping with Python?

Also, if you know of any web scraping projects on GitHub, please share the link.

Latest comments (29)

RedAndGreen Web Scraping

github.com/RGGH

Scrapy repos x 6

Scrapy has a steeper learning curve, but that means it's better once you've learned it!

Nishant Mittal

Thanks for sharing your code!

Pacharapol Withayasakpunt • Edited

Indeed, in the past, I used Python.

  • Concurrency: try ThreadPoolExecutor or some kind of coroutine. It may speed things up a lot.
  • Fetching: GET the content. I guess requests is OK.
  • Locating the content: I now prefer lxml to BeautifulSoup.
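
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not a recommended implementation: to stay dependency-free it uses urllib and html.parser from the standard library as stand-ins for requests and lxml, and the names TitleParser, fetch, extract_title and scrape_titles are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch(url):
    # Stand-in for requests.get(url).text; the timeout keeps a worker from hanging.
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_title(html):
    # Stand-in for an lxml lookup; here we only pull out the page title.
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def scrape_titles(urls, fetch_page=fetch, max_workers=8):
    # Threads fit this job because the work is I/O-bound (waiting on the network).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetch_page, urls))  # results keep the input order
    return {url: extract_title(page) for url, page in zip(urls, pages)}
```

Passing the fetch function in as a parameter keeps the parsing logic testable without network access, which also helps if the code is meant to look presentable in a public repository.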

As some of the comments mention, you might also try Node.js, where you can use Cheerio. It is jQuery-ish but runs outside the browser, so CORS is not a problem. (You will still need to fetch the page with axios or node-fetch, though.)

Nishant Mittal

Yup, but I'm still wondering why almost no one is in favour of Scrapy.

Andrew Brown 🇨🇦

The best way is to not get caught 🙃
I just use Nokogiri

Nishant Mittal

Haha

Mladen Milosavljevic

Scrapy is the fastest but the hardest to master.
Beautiful Soup and Selenium are better for beginners.

Nishant Mittal

I agree.

Jesse

BeautifulSoup is great, and I've had a good experience with it. lxml (XPath) is my go-to though, and I like it!
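
For anyone who hasn't seen XPath-style selection, here is a rough sketch of the idea. It uses xml.etree.ElementTree from the standard library, which supports only a limited XPath subset and needs well-formed markup, as a stand-in for lxml; with lxml the comparable call would be lxml.html.fromstring(markup).xpath("//li[@class='repo']/a"). The SNIPPET markup and the repo_links helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup used only for this example.
SNIPPET = """
<html>
  <body>
    <ul id="repos">
      <li class="repo"><a href="/a">alpha</a></li>
      <li class="repo"><a href="/b">beta</a></li>
      <li class="other"><a href="/c">gamma</a></li>
    </ul>
  </body>
</html>
"""


def repo_links(markup):
    """Return (text, href) pairs for links inside <li class="repo"> items."""
    root = ET.fromstring(markup)
    # Select every <a> that is a direct child of an <li> with class="repo".
    anchors = root.findall(".//li[@class='repo']/a")
    return [(a.text, a.get("href")) for a in anchors]


print(repo_links(SNIPPET))
```

Real pages are rarely well-formed XML, which is one reason a forgiving HTML parser such as lxml.html is usually the practical choice.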

Nishant Mittal

Thanks for sharing!

Nishant Mittal

Actually both!

Leon Lafayette

I've found puppeteer and cheerio to be a good combo.

Nishant Mittal • Edited

This is the first time I've heard about Cheerio. It looks nice, but honestly I'm not a big fan of jQuery.

Leon Lafayette

Me neither.

I was pushed for time, to be fair, and found that Cheerio made things a little less verbose, which let me get things done quickly :D

Nishant Mittal

Oh, ok.

Jason Britto

I would suggest Selenium. It's very easy to use, with a few methods for exploring the DOM and fetching data.
I have made a few projects myself with the help of YouTube.
Try any project tutorial on YouTube; if you are familiar with basic Python, you will follow it without any problem.

Nishant Mittal

Thanks for the suggestion, but I don't think Selenium would be the best fit for me.

Martin Dawson

There is more than one way to scrape with Python, but Beautiful Soup is definitely a stable, well-documented, tried-and-tested library to use. I made a video about how to use it, if that helps. I also wrote an article about it 😀

Nishant Mittal

Thanks! Will definitely take a look!

Mahesh Sv

I created an automated Telegram bot and used web scraping in the project. You can check out the project here: github.com/maheshthedev/DataScienc...

Nishant Mittal

I see you've used BeautifulSoup. Thanks for the code!