DEV Community

Discussion on: Beginner: Web Scraping in Mostly Pure Python

AllanLRH

Regexes might work just fine if you're only interested in titles, headers and links, but otherwise you should consider using a proper HTML parser, like Beautiful Soup with lxml as the parser backend.
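To make the "regexes are fine for the simple stuff" part concrete, here's a minimal sketch using only the standard library. The sample HTML, the patterns and the `extract_basics` helper are all made up for illustration, and the patterns assume tidy markup with double-quoted `href` attributes. Nested tags, odd quoting or commented-out markup will trip them up, which is exactly where a real parser earns its keep.

```python
import re

# Hypothetical sample page, made up for this sketch.
SAMPLE_HTML = """
<html>
  <head><title>Example page</title></head>
  <body>
    <h1>Main heading</h1>
    <a href="/first">First link</a>
    <a class="ext" href="https://example.com/second">Second link</a>
  </body>
</html>
"""

# Deliberately simple patterns: they assume well-formed, un-nested tags.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
HEADER_RE = re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
LINK_RE = re.compile(r'<a\s[^>]*?href="([^"]*)"', re.IGNORECASE)


def extract_basics(html):
    """Pull the title, header texts and link targets out of an HTML string."""
    title_match = TITLE_RE.search(html)
    return {
        "title": title_match.group(1).strip() if title_match else None,
        # findall yields (level, text) tuples because HEADER_RE has two groups.
        "headers": [text.strip() for _level, text in HEADER_RE.findall(html)],
        "links": LINK_RE.findall(html),
    }


result = extract_basics(SAMPLE_HTML)
print(result)
```

As soon as you need something like "the text inside the second table cell of every row", I'd stop extending the patterns and switch to a parser.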

Though it's worth noting that you won't be able to scrape elements that are rendered with JavaScript, since there's no JavaScript engine in a pure Python scraper: you only get the HTML the server sends.

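If you're after performance, you can use async I/O or just threading. They work differently under the hood, but for scraping, where most of the time is spent waiting on the network, both get you roughly the same benefit: many requests in flight at once.
Or use grequests in lieu of requests, which handles the concurrency for you (via gevent) while mostly retaining the same API. You build the requests first and then send them together with grequests.map, or with grequests.imap, which gives you a generator. I believe it was originally written by the author of the requests library.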
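For the plain threading route, a rough sketch with a thread pool from the standard library could look like this. To keep it self-contained I'm assuming `urllib.request` for the download instead of requests. The `fetch` and `fetch_all` helpers and the example.com URLs are hypothetical names for illustration, and you could pass your own requests-based function as `fetcher`.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Run fetcher on every URL in a thread pool.

    Returns a dict mapping each URL to its result, or to the exception
    it raised, so one failing page doesn't abort the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front so the downloads overlap.
        futures = {url: pool.submit(fetcher, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results


if __name__ == "__main__":
    # Placeholder URLs; replace with the pages you actually want.
    pages = fetch_all(["https://example.com/", "https://example.org/"])
    for url, body in pages.items():
        if isinstance(body, Exception):
            print(url, "failed:", body)
        else:
            print(url, len(body), "bytes")
```

Keep `max_workers` modest so you don't hammer the site you're scraping.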

Happy scraping!