DEV Community

Durga Pokharel


Day 77 Of 100DaysOfCode: Scraping News From Gorkha Patra Online

Today is my 77th day of my #100daysofcode and #python learning journey. As usual, I spent some hours learning about pandas data visualization on DataCamp.

For the rest of the time, I kept working on my first project (news scraping). Today I scraped news from Gorkha Patra Online. I could scrape news from a few different pages, but I need to write different code for each news category, such as national, economics, business, and province, so it takes a lot of time to scrape a single news portal. Below is the code I used to scrape news from the national category.

Python code with BeautifulSoup

First, I import the dependencies:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import re
from bs4 import BeautifulSoup as BS
import requests
import urllib3

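Before the real scraper, here is a quick illustration of how BeautifulSoup pulls out the fields used below. The HTML snippet is made up to mimic the site's card structure (the class names match the selectors in my code, but this is not the real page markup):

```python
from bs4 import BeautifulSoup as BS

# made-up snippet shaped like the site's news card (not the real markup)
html = """
<div class="business">
  <div class="trending2">
    <a href="https://example.com/news/1"><p> Sample headline </p></a>
    <small>Reporter\xa0\xa0\xa0\xa0\nMarch 16, 2021</small>
    <div class="description"> Short summary. </div>
  </div>
</div>
"""

soup = BS(html, "html.parser")
card = soup.select_one(".business .trending2")

title = card.find("p").text.strip()
# author and date share one <small> tag, separated by non-breaking spaces
author, date = card.find("small").text.strip().split("\xa0\xa0\xa0\xa0\n")

print(title)   # Sample headline
print(author)  # Reporter
```

The same `select_one` / `find` / `split` pattern is what the scraper below applies to each card on the live page.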

The URL of the required news category is given below:

url = ""

Parse the News Title, Author, Date, and Contents

ndict = {'Title': [], "URL": [], "Date": [], "Author": [], "Author URL": [],
         "Category": [], "Description": [], "Content": []}

# a PoolManager with a User-Agent header, so the site serves the pages
http = urllib3.PoolManager(headers={'User-agent': 'Mozilla/61.0'})

# fetch and parse the category page
web_page = http.request('GET', url)
soup = BS(web_page.data, 'html5lib')

for content in soup.select(".business"):
    trend2 = content.select_one(".trending2")
    title = trend2.find("p").text.strip()
    # link to the full article
    newsurl = trend2.find("a").get("href")

    # author and date share one <small> tag, separated by non-breaking spaces
    small = trend2.find('small').text.strip()
    author = small.split('\xa0\xa0\xa0\xa0\n')[0]
    date = small.split('\xa0\xa0\xa0\xa0\n')[1]
    description = trend2.select_one(".description").text.strip()

    # now go to this news URL and parse the full article
    web_page = http.request('GET', newsurl)
    news_soup = BS(web_page.data, 'html5lib')
    author_url = news_soup.select_one(".post-author-name").find("a").get("href")
    news_content = "\n".join(
        p.text for p in news_soup.select_one(".newstext").findAll("p"))
    category = url.split("/")[-1]

    # collect the row
    ndict['Title'].append(title)
    ndict['URL'].append(newsurl)
    ndict['Date'].append(date)
    ndict['Author'].append(author)
    ndict['Author URL'].append(author_url)
    ndict['Category'].append(category)
    ndict['Description'].append(description)
    ndict['Content'].append(news_content)

    print(f"""
          Title: {title}, URL: {newsurl}
          Date: {date}, Author: {author},
          Category: {category},
          Author URL: {author_url},
          Description: {description},
          Content: {news_content}""")
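Since pandas is already imported, the rows collected in `ndict` can be turned into a DataFrame and saved at the end. A minimal sketch, with made-up sample values in the same shape as `ndict` (the filename `gorkhapatra_national.csv` is my own choice):

```python
import pandas as pd

# sample rows in the same shape as ndict (made-up values for illustration)
ndict = {'Title': ['Sample headline'], 'URL': ['https://example.com/news/1'],
         'Date': ['March 16, 2021'], 'Author': ['Reporter'],
         'Author URL': ['https://example.com/author'], 'Category': ['national'],
         'Description': ['Short summary.'], 'Content': ['Full text...']}

df = pd.DataFrame(ndict)
df.to_csv('gorkhapatra_national.csv', index=False)
print(df.shape)  # (1, 8)
```

Each scraped category can be saved to its own CSV this way and merged later for analysis.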

Day 77 Of #100DaysOfCode and #Python
Worked On My First Project (Scrapping news of gorkhapatraonline using beautifulSoup)#WomenWhoCode #CodeNewbie #100DaysOfCode #DEVCommunity

— Durga Pokharel (@mathdurga) March 16, 2021
