DEV Community

Dennis Kariuki

Weather Dataset - Data Analysis - Personal Project

Data analysis

Data analysis, as I understand it, is the work of making sense of the data you have. Data is often called the new oil, and like oil it has to be refined: cleaned, organized, and turned into something meaningful that you can draw insights from.

Why Am I Here?

This is my first post on Dev.to, and it covers my first personal project.
I am trying to hone my data science and analysis skills and learn new ones.

Let's Begin

The dataset I will be using is Weather Data for beginners from Kaggle, and I will try to answer the following questions:

Week 1 Project:

  1. Find all records where the weather was exactly clear.
  2. Find the number of times the wind speed was exactly 4 km/hr.
  3. Check if there are any NULL values present in the dataset.
  4. Rename the column "Weather" to "Weather_Condition."
  5. What is the mean visibility of the dataset?
  6. Find the number of records where the wind speed is greater than 24 km/hr and visibility is equal to 25 km.
  7. What is the mean value of each column for each weather condition?
  8. Find all instances where the weather is clear and the relative humidity is greater than 50, or visibility is above 40.
  9. Find the number of weather conditions that include snow.

Part 2: Move this CSV into a database of your choice and use SQL to answer 4 of the questions above.

The Libraries you need for this Project

In my case these are the Libraries that I have used:

import os
import numpy as np
import pandas as pd
import csv

Then import the downloaded and extracted dataset, making sure the path points to where you saved it. In my case this is the path, and this is how I loaded and displayed the dataset:

#Loading Data from the CSV file
read_data = pd.read_csv(r'C:\Users\DELL\Desktop\Portfolio Data Analyst\Lux Academy\Week 1 Project - Weather Dataset for Beginners\1. Weather Data.csv')

#Displaying the loaded DataFrame
read_data

Question 1 - Find all records where the weather was exactly clear

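To answer this question we look at the Weather column. value_counts() lists every distinct weather label in the dataset together with how often it occurs, and a boolean filter returns the matching rows themselves.

read_data['Weather'].value_counts() #Number of occurrences of each weather label
read_data[read_data['Weather'] == 'Clear'] #All records where the weather is exactly Clear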

The weather was exactly Clear 1326 times.

Question 2 - Find the number of times the wind speed was exactly 4 km/hr.

To answer this question we filter the rows on the Wind Speed_km/h column and keep only those where the value is exactly 4.

read_data[read_data['Wind Speed_km/h'] == 4] #Shows all the data that has a 4km/h wind speed

To get the number of records with that speed we can use count()

read_data[read_data['Wind Speed_km/h'] == 4].count() 

Using count() we can see that 474 records have a wind speed of 4 km/h.
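One detail worth knowing: count() on a DataFrame returns one number per column (the non-null values in that column), not a single total. A minimal sketch of the difference, using a small made-up sample (the `sample` frame and its values are assumptions for illustration, not the Kaggle data):

```python
import pandas as pd

# Hypothetical mini-dataset standing in for the Kaggle file
sample = pd.DataFrame({
    "Wind Speed_km/h": [4, 7, 4, 15, 4],
    "Visibility_km": [8.0, 25.0, None, 48.3, 9.7],
})

# Keep only the rows where the wind speed is exactly 4
filtered = sample[sample["Wind Speed_km/h"] == 4]

# count() gives a Series: non-null values per column
per_column = filtered.count()

# len() gives one number: the rows that matched
row_count = len(filtered)

print(per_column)
print(row_count)
```

When a column contains missing values the per-column counts can differ from the row count, so len() is the safer choice if a single number is wanted.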

Question 3 - Check if there are any NULL values present in the dataset

read_data.isnull().sum()

Every column sums to 0, so the dataset doesn't have any null values.
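If a single yes/no answer is enough, the per-column sums can be collapsed further. A minimal sketch on a made-up sample that deliberately contains gaps (the `sample` frame is an assumption for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two missing values
sample = pd.DataFrame({
    "Temp_C": [1.5, np.nan, 3.0],
    "Weather": ["Clear", "Fog", None],
})

# Missing values per column, as in the post
nulls_per_column = sample.isnull().sum()

# One boolean for the whole dataset
has_any_null = bool(sample.isnull().values.any())

# One total across all columns
total_nulls = int(nulls_per_column.sum())

print(nulls_per_column)
print(has_any_null, total_nulls)
```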

Question 4 - Rename the column "Weather" to "Weather_Condition."

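read_data.columns #List the current column names

#rename() returns a new DataFrame, so the result has to be stored; the question asks for 'Weather_Condition' with an underscore
#read_data itself is left unchanged so the later steps that use 'Weather' still work
renamed_data = read_data.rename(columns={'Weather': 'Weather_Condition'})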

Question 5 - What is the mean visibility of the dataset?

There are two ways to do this. The first method:

read_data['Visibility_km'].mean()

The second way is describe(), whose mean row includes Visibility_km:

read_data.describe()
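Both routes should give the same number. A minimal sketch that checks this on a made-up three-row sample (the `sample` frame and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample; values are not from the Kaggle file
sample = pd.DataFrame({
    "Visibility_km": [8.0, 25.0, 48.3],
    "Temp_C": [-1.8, 0.0, 1.8],
})

# First method: mean of one column
direct_mean = sample["Visibility_km"].mean()

# Second method: pick the 'mean' row out of the describe() summary
summary = sample.describe()
mean_from_describe = summary.loc["mean", "Visibility_km"]

print(direct_mean, mean_from_describe)
```

The first method is shorter when only one value is needed, while describe() is handy for a quick look at every numeric column at once.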

Question 6 - Find the number of records where the wind speed is greater than 24 km/hr and visibility is equal to 25 km

read_data[(read_data['Wind Speed_km/h']>24) & (read_data['Visibility_km']==25)]
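The filter above returns the matching rows, while the question asks for how many there are. One way to get the count is to sum the boolean mask, since True counts as 1. A minimal sketch on a made-up sample (the `sample` frame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample; values are not from the Kaggle file
sample = pd.DataFrame({
    "Wind Speed_km/h": [30, 26, 10, 40],
    "Visibility_km": [25.0, 24.1, 25.0, 25.0],
})

# True where both conditions hold on the same row
mask = (sample["Wind Speed_km/h"] > 24) & (sample["Visibility_km"] == 25)

# Summing the mask counts the True rows
record_count = int(mask.sum())

print(record_count)
```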

Question 7 - What is the mean value of each column for each weather condition?

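For this question we group the rows by the Weather column and take the mean of every numeric column within each group. describe() on its own only summarizes the dataset as a whole.

read_data.groupby('Weather').mean(numeric_only=True)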
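Since the question asks for a mean per weather condition, grouping is one way to get it. A minimal sketch on a made-up four-row sample (the `sample` frame and its values are assumptions for illustration), showing that each weather label gets its own row of means:

```python
import pandas as pd

# Hypothetical sample; values are not from the Kaggle file
sample = pd.DataFrame({
    "Weather": ["Clear", "Clear", "Snow", "Snow"],
    "Temp_C": [10.0, 20.0, -4.0, -2.0],
    "Visibility_km": [25.0, 48.3, 4.0, 6.0],
})

# One row per weather label, one column per numeric field;
# numeric_only=True skips text columns such as dates stored as strings
means = sample.groupby("Weather").mean(numeric_only=True)

print(means)
```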

Question 8 - Find all instances where the weather is clear and the relative humidity is greater than 50, or visibility is above 40.

read_data[((read_data['Weather'] == 'Clear') & (read_data['Rel Hum_%'] > 50)) | (read_data['Visibility_km'] > 40)] #(Clear and humidity above 50) or visibility above 40
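The wording of this question mixes "and" with "or", so where the parentheses go changes the answer. In Python, & binds more tightly than |, which means `a & b | c` is read as `(a & b) | c`. A minimal sketch comparing the two possible readings on a made-up sample (the `sample` frame and the variable names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample; values are not from the Kaggle file
sample = pd.DataFrame({
    "Weather": ["Clear", "Clear", "Fog", "Fog"],
    "Rel Hum_%": [60, 40, 90, 90],
    "Visibility_km": [25.0, 25.0, 48.3, 10.0],
})

is_clear = sample["Weather"] == "Clear"
humid = sample["Rel Hum_%"] > 50
far = sample["Visibility_km"] > 40

# Reading 1: (clear AND humid) OR far visibility
clear_and_humid_or_far = sample[(is_clear & humid) | far]

# Reading 2: clear AND (humid OR far visibility)
clear_and_either = sample[is_clear & (humid | far)]

print(list(clear_and_humid_or_far.index))
print(list(clear_and_either.index))
```

The foggy row with visibility above 40 only appears under the first reading, so it is worth writing the parentheses explicitly to match whichever interpretation is intended.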

Question 9 - Find the number of weather conditions that include snow.

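read_data[read_data['Weather'].str.contains('Snow')] #== 'Snow' matches only the exact label; str.contains also matches labels that merely include the word
read_data.loc[read_data['Weather'].str.contains('Snow'), 'Weather'].nunique() #Number of distinct weather conditions that include snow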
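Part 2 of the brief asks for the CSV to be moved into a database and queried with SQL. A minimal sketch of how that could look with SQLite, which ships with Python. Everything here is an assumption for illustration: the in-memory database, the table name `weather`, and the made-up `sample` frame that stands in for the real file.

```python
import sqlite3

import pandas as pd

# Hypothetical sample standing in for the Kaggle CSV
sample = pd.DataFrame({
    "Weather": ["Clear", "Snow", "Clear", "Snow Showers"],
    "Wind Speed_km/h": [4, 30, 4, 26],
    "Visibility_km": [25.0, 25.0, 48.3, 4.0],
})

# In-memory database; a file path would make it persistent
conn = sqlite3.connect(":memory:")
sample.to_sql("weather", conn, index=False)

# Question 1: records where the weather is exactly Clear
clear_count = conn.execute(
    "SELECT COUNT(*) FROM weather WHERE Weather = 'Clear'"
).fetchone()[0]

# Question 2: records with a wind speed of exactly 4 km/h
# (column names with spaces or slashes need double quotes)
wind_count = conn.execute(
    'SELECT COUNT(*) FROM weather WHERE "Wind Speed_km/h" = 4'
).fetchone()[0]

# Question 5: mean visibility
mean_visibility = conn.execute(
    "SELECT AVG(Visibility_km) FROM weather"
).fetchone()[0]

# Question 9: distinct weather conditions that include snow
snow_conditions = conn.execute(
    "SELECT COUNT(DISTINCT Weather) FROM weather WHERE Weather LIKE '%Snow%'"
).fetchone()[0]

conn.close()
print(clear_count, wind_count, mean_visibility, snow_conditions)
```

With the real dataset, the `sample` frame would be replaced by the DataFrame loaded from the CSV.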

And that's how I handled this task.
Cheers to all.