Table of Contents
- Welcome to Day 3
- Review of Day 2
- Introduction to Supervised Learning: Regression
- Regression Algorithms
- Implementing Regression Algorithms with Scikit-Learn
- Model Evaluation for Regression
- Example Project: Housing Price Prediction
- Conclusion and Next Steps
- Summary of Day 3
1. Welcome to Day 3
Welcome to Day 3 of "Becoming a Scikit-Learn Boss in 90 Days"! Today, we'll explore Supervised Learning with a focus on Regression Algorithms. You'll learn about different regression techniques, implement them using Scikit-Learn, and evaluate their performance to build more accurate predictive models.
2. Review of Day 2
Before diving into today's topics, let's briefly recap what we covered yesterday:
- Supervised Learning: Classification Algorithms: Explored Logistic Regression, Decision Trees, K-Nearest Neighbors (KNN), and Support Vector Machines (SVM).
- Implementing Classification Algorithms with Scikit-Learn: Built, trained, and evaluated various classification models.
- Model Evaluation for Classification: Learned about accuracy, precision, recall, F1-score, confusion matrix, and ROC curves.
- Example Project: Advanced Iris Classification: Developed a comprehensive classification pipeline using multiple algorithms to classify Iris species and compared their performance.
With this foundation, we're ready to delve into regression techniques that will enhance your ability to make continuous predictions.
3. Introduction to Supervised Learning: Regression
What is Regression?
Regression is a type of supervised learning where the goal is to predict a continuous target variable based on one or more predictor variables. Unlike classification, which deals with categorical outcomes, regression focuses on estimating numerical values.
Types of Regression Problems
- Simple Linear Regression: Predicting a target variable using a single feature.
- Multiple Linear Regression: Predicting a target variable using multiple features.
- Regularized Regression: Techniques like Ridge, Lasso, and Elastic Net that add penalties to prevent overfitting.
- Polynomial Regression: Extending linear models to capture non-linear relationships (a minimal sketch follows this list).
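Polynomial regression is not implemented in the examples later in this post, so here is a minimal, self-contained sketch of the usual scikit-learn pattern. The toy data below is made up purely for illustration:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
# Toy data: a noisy quadratic relationship
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=0.5, size=100)
# Expand the features to degree-2 polynomial terms, then fit an ordinary linear model
poly_reg = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_reg.fit(X, y)
print(f"Training R²: {poly_reg.score(X, y):.2f}")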
4. Regression Algorithms
Linear Regression
A foundational regression technique that models the relationship between the target variable and one or more features by fitting a linear equation.
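Concretely, the model predicts ŷ = w₀ + w₁x₁ + … + wₚxₚ, and ordinary least squares chooses the weights that minimize the sum of squared residuals Σ(yᵢ − ŷᵢ)².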
Ridge Regression
A regularized version of linear regression that adds an L2 penalty to the loss function to prevent overfitting by shrinking the coefficients.
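In scikit-learn's formulation, Ridge minimizes ‖y − Xw‖₂² + α‖w‖₂²; larger values of alpha shrink the coefficients more aggressively.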
Lasso Regression
Another regularized regression method that adds an L1 penalty, which can shrink some coefficients to zero, effectively performing feature selection.
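scikit-learn's Lasso minimizes (1 / (2n)) ‖y − Xw‖₂² + α‖w‖₁, and it is the L1 term that can push individual coefficients exactly to zero.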
Elastic Net
Combines both L1 and L2 regularization penalties, balancing the benefits of Ridge and Lasso regression.
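Its penalty is the weighted mix α · l1_ratio · ‖w‖₁ + ½ · α · (1 − l1_ratio) · ‖w‖₂², so l1_ratio=1 recovers Lasso while l1_ratio=0 behaves like a Ridge-style pure L2 penalty.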
5. Implementing Regression Algorithms with Scikit-Learn
Linear Regression Example
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
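# Note: this snippet assumes X_train_scaled, X_test_scaled, y_train, and y_test
# already exist (they are created in the project walkthrough later in this post)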
# Initialize the model
linear_reg = LinearRegression()
# Train the model
linear_reg.fit(X_train_scaled, y_train)
# Make predictions
y_pred_linear = linear_reg.predict(X_test_scaled)
# Evaluate the model
mse_linear = mean_squared_error(y_test, y_pred_linear)
r2_linear = r2_score(y_test, y_pred_linear)
print(f"Linear Regression MSE: {mse_linear:.2f}")
print(f"Linear Regression R²: {r2_linear:.2f}")
Ridge Regression Example
from sklearn.linear_model import Ridge
# Initialize the model with alpha=1.0
ridge_reg = Ridge(alpha=1.0)
# Train the model
ridge_reg.fit(X_train_scaled, y_train)
# Make predictions
y_pred_ridge = ridge_reg.predict(X_test_scaled)
# Evaluate the model
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
r2_ridge = r2_score(y_test, y_pred_ridge)
print(f"Ridge Regression MSE: {mse_ridge:.2f}")
print(f"Ridge Regression R²: {r2_ridge:.2f}")
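The alpha=1.0 above is a common default rather than a tuned value. If you would rather let the data choose it, RidgeCV cross-validates over a grid of candidate strengths; a minimal sketch, reusing the scaled training data from above:
from sklearn.linear_model import RidgeCV
# Try several regularization strengths and keep the one with the best CV score
ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0])
ridge_cv.fit(X_train_scaled, y_train)
print(f"Selected alpha: {ridge_cv.alpha_}")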
Lasso Regression Example
from sklearn.linear_model import Lasso
# Initialize the model with alpha=0.1
lasso_reg = Lasso(alpha=0.1)
# Train the model
lasso_reg.fit(X_train_scaled, y_train)
# Make predictions
y_pred_lasso = lasso_reg.predict(X_test_scaled)
# Evaluate the model
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
r2_lasso = r2_score(y_test, y_pred_lasso)
print(f"Lasso Regression MSE: {mse_lasso:.2f}")
print(f"Lasso Regression R²: {r2_lasso:.2f}")
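Because the L1 penalty can drive coefficients to exactly zero, it is worth checking which features the fitted model kept. A small sketch, assuming X is a pandas DataFrame whose columns hold the feature names (as in the housing project later in this post):
# List each feature alongside its learned coefficient; zeros were dropped by Lasso
for feature, coef in zip(X.columns, lasso_reg.coef_):
    status = 'dropped' if coef == 0 else f'{coef:.3f}'
    print(f'{feature}: {status}')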
Elastic Net Example
from sklearn.linear_model import ElasticNet
# Initialize the model with alpha=0.1 and l1_ratio=0.5
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
# Train the model
elastic_net.fit(X_train_scaled, y_train)
# Make predictions
y_pred_elastic = elastic_net.predict(X_test_scaled)
# Evaluate the model
mse_elastic = mean_squared_error(y_test, y_pred_elastic)
r2_elastic = r2_score(y_test, y_pred_elastic)
print(f"Elastic Net MSE: {mse_elastic:.2f}")
print(f"Elastic Net R²: {r2_elastic:.2f}")
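Here l1_ratio=0.5 weights the two penalties equally; values closer to 1 make the model behave more like Lasso, while values closer to 0 make it behave more like Ridge.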
6. Model Evaluation for Regression
Mean Squared Error (MSE)
Measures the average of the squared errors; because errors are squared before averaging, large mistakes are penalized heavily, which makes MSE sensitive to outliers.
from sklearn.metrics import mean_squared_error
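# y_test and y_pred stand for the true targets and the predictions of any fitted model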
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
Root Mean Squared Error (RMSE)
The square root of MSE, which puts the error back in the same units as the target variable and is therefore easier to interpret.
import numpy as np
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Root Mean Squared Error: {rmse:.2f}")
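If you are on scikit-learn 1.4 or newer, the same value is available in a single call via sklearn.metrics.root_mean_squared_error.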
Mean Absolute Error (MAE)
Calculates the average of the absolute errors; it is easy to interpret and less sensitive to outliers than MSE.
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error: {mae:.2f}")
R-squared (R²)
Indicates the proportion of the variance in the dependent variable that is predictable from the independent variables.
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2:.2f}")
7. Example Project: Housing Price Prediction
Let's apply today's concepts by building a regression model to predict housing prices using the California Housing Dataset.
Project Overview
Objective: Develop a machine learning pipeline to predict housing prices based on various features such as location, size, and demographics.
Tools: Python, Scikit-Learn, pandas, Matplotlib, Seaborn
Step-by-Step Guide
1. Load and Explore the Dataset
from sklearn.datasets import fetch_california_housing
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load California Housing dataset
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = pd.Series(housing.target, name='MedHouseVal')
# Combine features and target
df = pd.concat([X, y], axis=1)
print(df.head())
# Visualize distribution of target variable
sns.histplot(df['MedHouseVal'], bins=50, kde=True)
plt.title('Distribution of Median House Values')
plt.xlabel('Median House Value')
plt.ylabel('Frequency')
plt.show()
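It can also help to look at how the features correlate with the target (and with each other) before modeling. An optional exploration step on the combined DataFrame:
# Correlation of each feature with the target and with the other features
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Feature Correlation Matrix')
plt.show()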
2. Data Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Feature Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
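Scaling matters here because Ridge, Lasso, and Elastic Net penalize coefficient magnitudes, so unscaled features would be penalized unevenly. As a side note, scikit-learn's Pipeline can bundle the scaler and the model so that scaling is always fit on the training split only; a minimal sketch with Ridge:
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge
# Chain the scaler and the model; fit() learns the scaling from training data only
ridge_pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
ridge_pipeline.fit(X_train, y_train)
print(f"Pipeline test R²: {ridge_pipeline.score(X_test, y_test):.4f}")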
3. Building and Training the Models
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
# Initialize models
linear_reg = LinearRegression()
ridge_reg = Ridge(alpha=1.0)
lasso_reg = Lasso(alpha=0.1)
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
# Train models
linear_reg.fit(X_train_scaled, y_train)
ridge_reg.fit(X_train_scaled, y_train)
lasso_reg.fit(X_train_scaled, y_train)
elastic_net.fit(X_train_scaled, y_train)
4. Making Predictions and Evaluating the Models
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
models = {
    'Linear Regression': linear_reg,
    'Ridge Regression': ridge_reg,
    'Lasso Regression': lasso_reg,
    'Elastic Net': elastic_net
}
for name, model in models.items():
    y_pred = model.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    print(f"{name} Evaluation Metrics:")
    print(f"  MSE: {mse:.4f}")
    print(f"  RMSE: {rmse:.4f}")
    print(f"  MAE: {mae:.4f}")
    print(f"  R²: {r2:.4f}\n")
5. Comparing Model Performance
# Collect the metrics into a DataFrame for a side-by-side comparison
# (DataFrame.append was removed in pandas 2.0, so build a list of rows instead)
rows = []
for name, model in models.items():
    y_pred = model.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    rows.append({
        'Model': name,
        'MSE': mse,
        'RMSE': np.sqrt(mse),
        'MAE': mean_absolute_error(y_test, y_pred),
        'R²': r2_score(y_test, y_pred)
    })
evaluation_df = pd.DataFrame(rows)
print(evaluation_df)
# Visualize the comparison
sns.barplot(x='R²', y='Model', data=evaluation_df, palette='coolwarm')
plt.title('R² Score Comparison of Regression Models')
plt.xlabel('R² Score')
plt.ylabel('Model')
plt.xlim(0, 1)
plt.show()
8. Conclusion and Next Steps
Congratulations on completing Day 3 of "Becoming a Scikit-Learn Boss in 90 Days"! Today, you delved into Supervised Learning: Regression Algorithms, mastering techniques like Linear Regression, Ridge, Lasso, and Elastic Net. You implemented these algorithms using Scikit-Learn, evaluated their performance, and applied them to a real-world dataset to predict housing prices.
What's Next?
- Day 4: Model Evaluation and Selection: Learn about cross-validation, hyperparameter tuning, and strategies to select the best model.
- Day 5: Unsupervised Learning – Clustering and Dimensionality Reduction: Understand clustering algorithms like K-Means and techniques like PCA.
- Day 6: Advanced Feature Engineering: Master techniques to create and select features that enhance model performance.
- Day 7: Ensemble Methods: Explore ensemble techniques like Bagging, Boosting, and Stacking.
- Day 8: Model Deployment with Scikit-Learn: Learn how to deploy your models into production environments.
- Days 9-90: Specialized Topics and Projects: Engage in specialized topics and comprehensive projects to solidify your expertise.
Tips for Success
- Practice Regularly: Apply the concepts through exercises and real-world projects.
- Engage with the Community: Join forums, attend webinars, and collaborate with peers.
- Stay Curious: Continuously explore new features and updates in Scikit-Learn.
- Document Your Work: Keep a detailed journal of your learning progress and projects.
Keep up the great work, and stay motivated as you continue your journey to mastering Scikit-Learn and machine learning!
Summary of Day 3
- Introduction to Supervised Learning: Regression: Gained a foundational understanding of regression tasks and their types.
- Regression Algorithms: Explored Linear Regression, Ridge Regression, Lasso Regression, and Elastic Net.
- Implementing Regression Algorithms with Scikit-Learn: Learned how to build, train, and evaluate different regression models using Scikit-Learn.
- Model Evaluation for Regression: Mastered evaluation metrics including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R²).
- Example Project: Housing Price Prediction: Developed a comprehensive regression pipeline using multiple algorithms to predict housing prices and compared their performance.