Data analysis
Data analysis, as I understand it, is the process of making sense of the data you have. Data is often called the new oil, and like oil it has to be refined: cleaned, organized, and turned into something meaningful that you can draw insights from.
Why am I here?
This is my first post on Dev.to, and it covers my first personal project.
I am trying to hone my data science and analysis skills and learn new ones.
Let's begin
The dataset I will be using is Weather Data for beginners from Kaggle, and I will try to answer the following questions:
Week 1 Project:
- Find all records where the weather was exactly clear.
- Find the number of times the wind speed was exactly 4 km/hr.
- Check if there are any NULL values present in the dataset.
- Rename the column "Weather" to "Weather_Condition."
- What is the mean visibility of the dataset?
- Find the number of records where the wind speed is greater than 24 km/hr and visibility is equal to 25 km.
- What is the mean value of each column for each weather condition?
- Find all instances where the weather is clear and the relative humidity is greater than 50, or visibility is above 40.
- Find the number of weather conditions that include snow.
Part 2: Move this CSV into a database of your choice and use SQL to answer 4 of the questions above.
The libraries you need for this project
In my case, these are the libraries I used:
import os
import numpy as np
import pandas as pd
import csv
Next, load the downloaded and extracted dataset, making sure the path points to where the file lives on your machine. This is the path I used and how I read the dataset into a DataFrame:
#Loading Data from the CSV file
read_data = pd.read_csv(r'C:\Users\DELL\Desktop\Portfolio Data Analyst\Lux Academy\Week 1 Project - Weather Dataset for Beginners\1. Weather Data.csv')
#Displaying the loaded data
read_data
Question 1 - Find all records where the weather was exactly clear
To answer this question, we look at the Weather column and use value_counts() to see which weather conditions appear in the dataset and how many times each one occurs.
read_data['Weather'].value_counts()
The weather was exactly Clear 1326 times.
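value_counts() gives the count, but the question also asks for the records themselves. One way to pull them out is a boolean filter on the Weather column. A minimal sketch on a few invented sample rows (the values are made up for illustration, not taken from the dataset):

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has many more rows and columns.
sample = pd.DataFrame({
    "Weather": ["Clear", "Mainly Clear", "Snow", "Clear", "Fog"],
    "Visibility_km": [25.0, 24.1, 4.0, 48.3, 1.2],
})

# Boolean mask: True only where the label is exactly "Clear",
# so "Mainly Clear" is not included.
clear_rows = sample[sample["Weather"] == "Clear"]

print(clear_rows)
print(len(clear_rows))  # number of exactly-clear records in the sample
```

The same filter applied to read_data should return as many rows as value_counts() reports for Clear.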
Question 2 - Find the number of times the wind speed was exactly 4 km/hr.
To answer this question, we filter on the Wind Speed_km/h column and keep only the rows where the value is exactly 4.
read_data[read_data['Wind Speed_km/h'] == 4] #Shows all the data that has a 4km/h wind speed
To get the number of records with that wind speed, we can use count().
read_data[read_data['Wind Speed_km/h'] == 4].count()
Using count(), we can see that 474 records have a wind speed of 4 km/h.
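count() returns one number per column (the non-null cells in each), so the answer shows up repeated down the column list. If a single number is preferred, summing the boolean mask is one option. A minimal sketch on invented sample values:

```python
import pandas as pd

# Hypothetical sample; the None shows how count() skips missing cells.
sample = pd.DataFrame({
    "Wind Speed_km/h": [4, 7, 4, 15, 4],
    "Visibility_km": [8.0, 9.0, None, 9.7, 12.9],
})

mask = sample["Wind Speed_km/h"] == 4

per_column = sample[mask].count()  # one non-null count per column
n_rows = int(mask.sum())           # a single number: rows where the mask is True

print(per_column)
print(n_rows)
```

On a dataset without missing values the two approaches agree, which is the case here according to Question 3.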
Question 3 - Check if there are any NULL values present in the dataset
read_data.isnull().sum()
The data set doesn't have any null values
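For comparison, here is what the same check looks like when values are missing. This is a small sketch on invented data, since the real dataset has no nulls to show:

```python
import pandas as pd

# Hypothetical sample with two deliberately missing cells.
sample = pd.DataFrame({
    "Weather": ["Clear", None, "Snow"],
    "Visibility_km": [25.0, 24.1, None],
})

nulls_per_column = sample.isnull().sum()           # missing cells per column
has_any_null = bool(sample.isnull().values.any())  # single yes/no answer

print(nulls_per_column)
print(has_any_null)
```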
Question 4 - Rename the column "Weather" to "Weather_Condition."
read_data.columns
read_data.rename(columns={'Weather':'Weather_Condition'}) #Returns a renamed copy; read_data itself keeps the 'Weather' column
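Note that rename() returns a new DataFrame and leaves the original untouched unless the result is assigned back. A minimal sketch on an invented two-row table, where the name renamed is just an illustrative variable:

```python
import pandas as pd

# Hypothetical sample table.
sample = pd.DataFrame({"Weather": ["Clear", "Snow"], "Visibility_km": [25.0, 4.0]})

# rename() hands back a new DataFrame with the new column name.
renamed = sample.rename(columns={"Weather": "Weather_Condition"})

print(list(sample.columns))   # the original still has "Weather"
print(list(renamed.columns))  # the copy has "Weather_Condition"
```

The later questions in this post keep using the 'Weather' column name, so the rename is not assigned back to read_data.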
Question 5 - What is the mean visibility of the dataset?
There are two ways to do this. The first method:
read_data['Visibility_km'].mean()
The second way is describe(), which reports the mean of every numeric column, including Visibility_km:
read_data.describe()
Question 6 - Find the number of records where the wind speed is greater than 24 km/hr and visibility is equal to 25 km
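read_data[(read_data['Wind Speed_km/h']>24) & (read_data['Visibility_km']==25)] #Shows the matching rows
len(read_data[(read_data['Wind Speed_km/h']>24) & (read_data['Visibility_km']==25)]) #Number of matching records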
Question 7 - What is the mean value of each column for each weather condition?
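For this question describe() is not enough, because it summarizes the whole table at once. Grouping by the Weather column and taking the mean gives one row of averages per weather condition:
read_data.groupby('Weather').mean(numeric_only=True)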
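One way to get a separate mean for each weather condition is to group by the Weather column first. A minimal sketch on a small made-up table (the values are invented, not taken from the dataset):

```python
import pandas as pd

# Hypothetical sample with two weather conditions.
sample = pd.DataFrame({
    "Weather": ["Clear", "Snow", "Clear", "Snow"],
    "Visibility_km": [25.0, 4.0, 35.0, 6.0],
    "Rel Hum_%": [40, 90, 60, 80],
})

# One row per weather condition, one column per numeric column.
# numeric_only=True skips text columns such as a date/time string.
means = sample.groupby("Weather").mean(numeric_only=True)

print(means)
```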
Question 8 - Find all instances where the weather is clear and the relative humidity is greater than 50, or visibility is above 40.
read_data[((read_data['Weather'] =='Clear') & (read_data['Rel Hum_%']>50)) | (read_data['Visibility_km']>40)]
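Because the question mixes "and" with "or", the way the conditions are grouped matters. A minimal sketch on invented sample rows, reading the question as (clear and humid) or (visibility above 40) and comparing that with chaining every condition with &:

```python
import pandas as pd

# Hypothetical sample rows.
sample = pd.DataFrame({
    "Weather": ["Clear", "Clear", "Fog", "Snow"],
    "Rel Hum_%": [60, 30, 95, 80],
    "Visibility_km": [25.0, 48.3, 1.2, 45.0],
})

is_clear_humid = (sample["Weather"] == "Clear") & (sample["Rel Hum_%"] > 50)
far_visibility = sample["Visibility_km"] > 40

# "(clear AND humid) OR far visibility": a row needs only one side to match.
either = sample[is_clear_humid | far_visibility]

# All conditions joined with &: a row must satisfy every one of them.
all_three = sample[is_clear_humid & far_visibility]

print(either)
print(all_three)
```

Naming the two masks before combining them keeps the parentheses manageable and makes the intended reading explicit.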
Question 9 - Find the number of weather conditions that include snow.
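read_data[read_data['Weather'].str.contains('Snow')] #Matches any weather label that includes the word Snow, not only the exact label 'Snow'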
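The phrase "include snow" suggests matching labels that contain the word rather than equal it. A minimal sketch using str.contains on invented labels, which also counts both the matching records and the distinct matching conditions, since the question can be read either way:

```python
import pandas as pd

# Hypothetical weather labels; combined labels are assumed to be comma-separated.
sample = pd.DataFrame({
    "Weather": ["Snow", "Snow,Fog", "Freezing Drizzle,Snow", "Clear", "Snow", "Rain"],
})

# True wherever the label contains "Snow" anywhere in the text.
has_snow = sample["Weather"].str.contains("Snow", case=False)

snow_rows = sample[has_snow]
n_snow_records = len(snow_rows)                     # rows mentioning snow
n_snow_conditions = snow_rows["Weather"].nunique()  # distinct labels mentioning snow

print(n_snow_records, n_snow_conditions)
```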
And that's how I handled this task.
Cheers to all.