DEV Community

petercour

Web scraping with PyQuery

How do you do web scraping these days? You may be working with BeautifulSoup, or automating a web browser with Selenium.

If you have very basic scraping needs, you could consider PyQuery.

PyQuery lets you run jQuery-style queries on XML documents. That's great, because it means you can use it on HTML too.

First install PyQuery with pip (pip install pyquery). Then you can use it like this:

#!/usr/bin/python3
from pyquery import PyQuery as pq

# Download the page and parse it
doc = pq(url="https://pythonbasics.org")
# Select the <title> element and print its text
print(doc('title').text())

That will grab the title from the web page.
Want to get all links from a web page?

#!/usr/bin/python3
from pyquery import PyQuery as pq

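# Download the page and parse it
doc = pq(url="https://dev.to")

# Print the href of every link; skip anchors that have none
for link in doc('a'):
    href = link.attrib.get('href')
    if href:
        print(href)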

Easy, right?

Do you prefer getting images?

#!/usr/bin/python3
from pyquery import PyQuery as pq

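# Download the page and parse it
doc = pq(url="https://dev.to")

# Print the src of every image; skip images that have none
for img in doc('img'):
    src = img.attrib.get('src')
    if src:
        print(src)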

Discussion (6)

Prashant Sengar

How is it different from BeautifulSoup?

kaelscion

In terms of what it does? It doesn't seem to be different at all. It just looks like the HTML parsing and element selector syntax would be more comfortable for developers coming from the front end than from the back end. IME, web scrapers are either front-end devs who want to use web scraping for automated front-end QA, or back-end devs who use it to collect data for a data set or API they want to build. BeautifulSoup is very Pythonic in its use and, if you're new to Python from the front end, using it might be a bit of a tough gear change. This library simply looks like a bridge that lets JS/jQuery folks break into Python web scraping more comfortably. Sure, there are ways to perform web scraping tasks in JS, but this fills a nice little niche for "new transfers" 😁😁

petercour Author

PyQuery is as similar to jQuery as possible. Functionally you can do the same things, just with a different syntax.

James Brukz

Interesting library, but you still need to render JS to get the content on most sites. I usually go with Puppeteer or the webscraping.ai/ API for that.

Raunanza

Brother, what to type if I want a custom user agent?

Comment marked as low quality/non-constructive by the community. View Code of Conduct
Software Craftsman

Check out I'm using Scrapy tool in python!!!
fiverr.com/m_waqarsikandar