DEV Community

petercour
petercour

Posted on

2

Web Scraping HN with Python

You may know HN. A news aggregator with tech articles.

Let's scrape that with Python. The PyQuery module allows you to query HTML pages. You can collect all the links with PyQuery

#!/usr/bin/python3
from pyquery import PyQuery as pq

doc =pq(url = "https://news.ycombinator.com/front?day=2019-07-14" )

for link in doc('a.storylink'):
    print(link.attrib['href'])
Enter fullscreen mode Exit fullscreen mode

That returns the links for the day "2019-07-14". So you have a list of links printed to the screen. You want that in a file.

Hockey dockey.

You can save the output into a csv file. A csv file is a file with all values stored with a delimiter in between, usually a colon but we'll use a semicolon.

#!/usr/bin/python3
from pyquery import PyQuery as pq

date = "2019-07-14"
doc =pq(url = "https://news.ycombinator.com/front?day=" + date )

links = []
for link in doc('a.storylink'):
    links.append(link.attrib['href'])

with open('output.csv','w+') as csvfile:
    for link in links:
        csvfile.write( date + ";" + link + ";" )
        csvfile.write('\n')
Enter fullscreen mode Exit fullscreen mode

Simple right? :) Run it and you'll have all the links in a nicely formatted csv file.

A csv file can read with an office program (any spreadsheet) or you can read them using Python pandas.

Related links:

Postmark Image

Speedy emails, satisfied customers

Are delayed transactional emails costing you user satisfaction? Postmark delivers your emails almost instantly, keeping your customers happy and connected.

Sign up

Top comments (0)

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay