Introduction
Hey everyone!
Ever wondered how weather affects bike rentals in the city? Well, Codecademy did too! As part of their Data Engineering Course, they handed me some data and said, "Analyze this!" So, I grabbed my Python, Jupyter Notebook, and PostgreSQL, and embarked on a data adventure to uncover the secrets of bike rental patterns.
In this post, I'll walk you through the process, share some cool insights, and maybe even crack a few jokes along the way. So, grab your helmet, and let's ride through this data journey together!
Data Preparation
First things first, let's talk data. For this project, Codecademy provided two datasets:
- Citi Bike ridership data: This gives us the lowdown on when and where people are riding those snazzy blue bikes.
- NOAA weather data: Because we need to know if it's raining cats and dogs or if it's sunny and perfect for a ride.
Cleaning and transforming the data is like tuning up your bike – it's gotta be done before you hit the road. Here are some highlights of what I did:
- Dealt with missing values: Because incomplete data is like riding a bike with a flat tire.
- Merged the datasets: Think of it as combining the best parts of two bikes into one super-bike.
- Formatted columns: Making sure dates look like dates and not some weird alien language.
Here's a snippet of the code that got the job done (you can check out the entire code at the GitHub repository :
import pandas as pd
# Load the datasets
bike_data = pd.read_csv('citi_bike_data.csv')
weather_data = pd.read_csv('weather_data.csv')
# Handle missing values
bike_data.fillna(0, inplace=True)
weather_data.fillna(method='ffill', inplace=True)
# Merge datasets on the date column
merged_data = pd.merge(bike_data, weather_data, on='date')
# Format columns
merged_data['date'] = pd.to_datetime(merged_data['date'])
With the data prepped and ready, it was time to create the backbone of our analysis – the PostgreSQL database.
Setting Up PostgreSQL
Building a database schema is like designing the perfect bike frame – it needs to be sturdy and efficient. Here's the lowdown on how I set up my PostgreSQL database:
- Tables: Created tables for bike rentals and weather data.
- Relationships: Linked the tables on the date column because it's all about timing!
Here's a sneak peek at the SQL code for creating the tables:
CREATE TABLE bike_rentals (
ride_id SERIAL PRIMARY KEY,
date DATE,
start_station VARCHAR(100),
end_station VARCHAR(100),
user_type VARCHAR(50)
);
CREATE TABLE weather (
weather_id SERIAL PRIMARY KEY,
date DATE,
temperature FLOAT,
precipitation FLOAT,
snowfall FLOAT
);
After setting up the tables, I inserted the cleaned data into PostgreSQL. Smooth sailing so far!
Data Analysis
Now comes the fun part – analyzing the data! WiGitHub repositoryth our PostgreSQL database set up, it was time to dig into the data and uncover some insights. Here are the key questions we aimed to answer:
- How does weather affect the number of bike rentals?
- Are there specific weather conditions that lead to a spike or drop in bike rentals?
To tackle these questions, I used SQL to query the database and visualize the results with Python. Here’s an example of one of the SQL queries I used:
SELECT
br.date,
br.user_type,
COUNT(br.ride_id) AS trip_count,
w.temperature,
w.precipitation
FROM
bike_rentals br
JOIN
weather w ON br.date = w.date
GROUP BY
br.date, br.user_type, w.temperature, w.precipitation
ORDER BY
br.date;
This query gave us a detailed view of bike rentals per day, categorized by user type and correlated with temperature and precipitation.
Results and Discussion
Here are some of the cool insights we uncovered:
- Temperature vs. Bike Rentals: Warmer days see more bike rentals. No surprises there! Who doesn't love a sunny ride?
- Precipitation vs. Bike Rentals: Rainy days? Not so much. Bike rentals drop significantly when it’s raining.
Here’s a visualization of the temperature vs. trip counts:
import matplotlib.pyplot as plt
# Sample data for plotting
temperature = merged_data['temperature']
trip_counts = merged_data['trip_count']
plt.scatter(temperature, trip_counts)
plt.title('Temperature vs. Bike Rentals')
plt.xlabel('Temperature (°C)')
plt.ylabel('Trip Counts')
plt.show()
And another one for precipitation:
precipitation = merged_data['precipitation']
plt.scatter(precipitation, trip_counts)
plt.title('Precipitation vs. Bike Rentals')
plt.xlabel('Precipitation (mm)')
plt.ylabel('Trip Counts')
plt.show()
As expected, there’s a strong positive correlation between warm weather and bike rentals. On the flip side, rain is a big no-no for bike enthusiasts.
Conclusion
This project was a blast to work on and gave me some great insights into how weather affects bike rentals. It was a perfect blend of data cleaning, database management, and analysis – all while pedaling through Codecademy's Data Engineering Course.
If you have any thoughts, suggestions, or just want to share your own bike rental stories, drop a comment below! Let's keep the conversation rolling.
Code and Data
You can check out the complete code and datasets on my GitHub repository. Feel free to fork it, play around with the data, and maybe even come up with your own insights.
Thanks for reading, and happy riding!
Top comments (0)