You may know Hacker News (HN), a news aggregator for tech articles.
Let's scrape it with Python. The PyQuery module lets you query HTML pages with jQuery-style selectors, so you can collect all the story links:
```python
#!/usr/bin/python3
from pyquery import PyQuery as pq

doc = pq(url="https://news.ycombinator.com/front?day=2019-07-14")
for link in doc('a.storylink'):
    print(link.attrib['href'])
```
That returns the links for the day 2019-07-14 and prints them to the screen. But you probably want them in a file.
You can save the output into a CSV file. A CSV file stores values separated by a delimiter, usually a comma, but we'll use a semicolon.
```python
#!/usr/bin/python3
from pyquery import PyQuery as pq

date = "2019-07-14"
doc = pq(url="https://news.ycombinator.com/front?day=" + date)

# collect the story links into a list
links = []
for link in doc('a.storylink'):
    links.append(link.attrib['href'])

# write one semicolon-delimited line per link
with open('output.csv', 'w+') as csvfile:
    for link in links:
        csvfile.write(date + ";" + link + ";")
        csvfile.write('\n')
```
Simple, right? :) Run it and you'll have all the links in a nicely formatted CSV file.
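As an aside, Python's built-in csv module can write the delimiter for you and handles quoting if a link ever contains a semicolon. A minimal sketch, using placeholder links in place of the scraped ones:

```python
import csv

date = "2019-07-14"
# placeholder links standing in for the scraped list
links = ["https://example.com/a", "https://example.com/b"]

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=';')
    for link in links:
        writer.writerow([date, link])  # one row per link: date;link
```

This produces the same date-and-link rows without manually concatenating delimiters.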
A CSV file can be read with any spreadsheet program, or you can load it with Python pandas.
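Loading the file with pandas is a one-liner plus a little cleanup. This sketch writes a tiny sample in the same format as the script above (each row ends with a trailing `;`, which produces an empty third column that we name and drop):

```python
import pandas as pd

# a tiny sample file in the format the scraper writes (assumed)
with open('output.csv', 'w') as f:
    f.write('2019-07-14;https://example.com/story1;\n')
    f.write('2019-07-14;https://example.com/story2;\n')

# the trailing ';' creates an empty extra column; name it, then drop it
df = pd.read_csv('output.csv', sep=';', header=None,
                 names=['date', 'link', 'trailing']).drop(columns='trailing')
print(df)
```

From here you can filter, sort, or export the links like any other DataFrame.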