Introduction
This dataset contains the World University Rankings for the past 8 years, along with parameters such as Quality of Faculty, Alumni Employment, Publications, and Citations. I used the Beautiful Soup parser to extract the data from the HTML source code, which I obtained with Python's urllib.request module.
You can find the dataset on Kaggle: https://www.kaggle.com/saumitrajagdale/university-rankings
Scraping Data
I scraped this data from the official rankings page provided by CWUR.
URL: https://cwur.org/
The packages I used were:
- Pandas: For keeping data in dataframe format.
- Beautiful Soup (bs4): For parsing HTML source code.
- urllib.request (Python standard library): For obtaining the source code of a given URL.
- NumPy: For basic array operations.
Code Snippet For Scraping:
# Dependencies
import urllib.request

import bs4
import numpy as np
import pandas as pd

# Obtaining source code from the url
url = "https://cwur.org/2012.php"
url_contents = urllib.request.urlopen(url).read()

# Parsing the HTML source code
soup = bs4.BeautifulSoup(url_contents, "html.parser")

# Extracting the data according to the HTML tags
rows = []
table_rows = soup.find_all("tr")
# Start at index 1 to skip the first <tr> (the header row)
for tr in table_rows[1:]:
    cells = tr.find_all("td")
    # Keep only the text of each <td> cell instead of slicing off the tags by hand
    row = [cell.get_text(strip=True) for cell in cells]
    print(row)
    rows.append(row)

# Converting data into a dataframe using pandas
columns = ["World Rank", "University", "Location", "National Rank",
           "Quality of Education", "Alumni Employment", "Quality of Faculty",
           "Publications", "Influence", "Citations", "Patents", "Score"]
df = pd.DataFrame(rows, columns=columns)
print(df)

# Creating csv file from the dataframe
df.to_csv("University_Ranks_2012.csv")
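The snippet above handles a single year. One way to reuse it for other years is to wrap the parsing step in a function that takes the HTML as a string. The sketch below is only an illustration, not the code used to build the dataset: the function name table_to_dataframe, the SAMPLE_HTML fragment, and its values are made up for the example, and the columns on other yearly pages may differ from the 2012 ones.

```python
import bs4
import pandas as pd

# Column names used for the 2012 table in the snippet above
COLUMNS_2012 = [
    "World Rank", "University", "Location", "National Rank",
    "Quality of Education", "Alumni Employment", "Quality of Faculty",
    "Publications", "Influence", "Citations", "Patents", "Score",
]


def table_to_dataframe(html, columns):
    """Turn every <tr> whose <td> count matches `columns` into a dataframe row."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Skip rows whose cell count does not match (for example a <th> header row)
        if len(cells) == len(columns):
            rows.append(cells)
    return pd.DataFrame(rows, columns=columns)


# Made-up two-column fragment, only to show the function working offline
SAMPLE_HTML = """
<table>
  <tr><th>World Rank</th><th>University</th></tr>
  <tr><td>1</td><td><a href="#">Example University</a></td></tr>
  <tr><td>2</td><td>Sample Institute</td></tr>
</table>
"""

sample_df = table_to_dataframe(SAMPLE_HTML, ["World Rank", "University"])
print(sample_df)
```

For a live page, the bytes returned by urllib.request.urlopen(url).read() could be passed in place of SAMPLE_HTML, together with COLUMNS_2012.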
Scope of Analysis
This dataset can be used for the following analyses:
- Finding the most significant and heavily weighted parameters affecting university ranks.
- Finding trends in the rankings over the past 8 years, based on the parameters provided as columns in the dataset.
- Visualising the rise and fall of a particular university's ranking, with rank on the y-axis and year on the x-axis (line graphs).
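As a starting point for these analyses, the sketch below stacks per-year dataframes into one table, extracts one university's world rank by year (the series a line graph would plot), and orders parameters by the absolute value of their correlation with the world rank. All numbers and university names are invented for illustration, the helper names are hypothetical, and correlation is only a rough proxy for how much a parameter matters.

```python
import pandas as pd

# Invented numbers for illustration only; real data would come from the yearly CSV files
frames = {
    2012: pd.DataFrame({
        "University": ["Alpha University", "Beta Institute", "Gamma College"],
        "World Rank": [1, 2, 3],
        "Publications": [1, 3, 2],
        "Citations": [1, 2, 3],
    }),
    2013: pd.DataFrame({
        "University": ["Alpha University", "Beta Institute", "Gamma College"],
        "World Rank": [2, 1, 3],
        "Publications": [3, 1, 2],
        "Citations": [2, 1, 3],
    }),
}

# Stack the yearly tables and record which year each row came from
combined = pd.concat(
    [df.assign(Year=year) for year, df in frames.items()],
    ignore_index=True,
)


def rank_by_year(data, university):
    """Return one university's world rank indexed by year."""
    subset = data[data["University"] == university]
    return subset.set_index("Year")["World Rank"].sort_index()


def parameters_by_correlation(data, parameters, target="World Rank"):
    """Order parameters by absolute Pearson correlation with the target column."""
    correlations = data[parameters].corrwith(data[target])
    return correlations.abs().sort_values(ascending=False)


trend = rank_by_year(combined, "Beta Institute")
print(trend)

strength = parameters_by_correlation(combined, ["Publications", "Citations"])
print(strength)
```

Because the scraping snippet stores each cell as text, the scraped columns would likely need pd.to_numeric before calculations like these. With matplotlib installed, trend.plot() would draw the line graph of rank against year.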