Over the past few weeks, I’ve been diving deep into machine learning by working on a project that predicts California housing prices. This hands-on journey not only strengthened my technical skills but also gave me a clearer understanding of the workflow that turns raw data into actionable insights.
In this article, I’ll walk you through:
What I built
The skills I gained
Why these skills matter in the real world
Project Overview
The goal was to build a regression model that could predict median house prices in California using the California Housing dataset.
Here’s the process I followed:
Loading the dataset
from sklearn import datasets

housing = datasets.fetch_california_housing()
x = housing.data    # 8 numeric features
y = housing.target  # median house value, in units of $100,000
This dataset contains information such as median income, house age, and average rooms per household.
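For reference, the dataset object exposes its eight feature names directly:

print(housing.feature_names)
# ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population',
#  'AveOccup', 'Latitude', 'Longitude']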
Feature Engineering
I expanded the dataset using Polynomial Features to capture more complex relationships between the variables:
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures()  # default degree=2: squares plus pairwise interactions
x = poly.fit_transform(x)
This generated 37 additional features: starting from the 8 original columns, the default degree-2 expansion yields 45 in total (a bias term, the 8 originals, their 8 squares, and 28 pairwise products), giving the model more information to learn from.
Train-Test Split
To ensure the model could generalize, I split the data into training (80%) and testing (20%) sets.
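For reference, here is a minimal sketch of that split with scikit-learn's train_test_split (the random_state value is an illustrative choice, not necessarily the one I used):

from sklearn.model_selection import train_test_split

# 80/20 split; random_state fixes the shuffle for reproducibility
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)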
Model Optimization
I experimented with different learning rates and iteration counts using the HistGradientBoostingRegressor, a powerful gradient boosting algorithm:
from sklearn.ensemble import HistGradientBoostingRegressor

model = HistGradientBoostingRegressor(
    max_iter=350,       # number of boosting iterations
    learning_rate=0.05  # shrinkage applied to each new tree
)
model.fit(x_train, y_train)
Evaluation
I measured model performance using the R² score:
from sklearn.metrics import r2_score

y_pred = model.predict(x_test)
r2 = r2_score(y_test, y_pred)
print(r2)
This score reflects how well the model explains the variation in housing prices.
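Concretely, R² compares the model's squared error against a baseline that always predicts the mean of the test targets. A quick sketch of the same computation by hand:

# R^2 = 1 - SS_res / SS_tot
ss_res = ((y_test - y_pred) ** 2).sum()         # model's residual error
ss_tot = ((y_test - y_test.mean()) ** 2).sum()  # error of a predict-the-mean baseline
r2_manual = 1 - ss_res / ss_tot                 # agrees with r2_score(y_test, y_pred)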
Model Deployment
I saved the trained model using joblib so it can be reused in future applications without retraining:
import joblib

joblib.dump(model, "housing_price_model.joblib")
Key Skills I Gained
Data Preprocessing & Feature Engineering
Learned how to transform raw datasets into forms that machine learning models can better understand.
Understood the importance of feature interactions through polynomial feature expansion.
Model Selection & Optimization
Experimented with different learning rates, iteration counts, and model architectures.
Gained experience in tuning hyperparameters to balance accuracy and computational efficiency (see the sketch below).
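For illustration, a small sweep like the one below is one way to compare settings; the grid values here are hypothetical, and scoring on a held-out validation split would be more rigorous than reusing the test set:

from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import r2_score

# Hypothetical grid; the exact values I tried differed
for lr in (0.01, 0.05, 0.1):
    for iters in (200, 350, 500):
        candidate = HistGradientBoostingRegressor(max_iter=iters, learning_rate=lr)
        candidate.fit(x_train, y_train)
        score = r2_score(y_test, candidate.predict(x_test))
        print(f"lr={lr}, max_iter={iters}, R2={score:.4f}")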
Model Evaluation
Applied the R² score to assess model performance.
Learned how to interpret evaluation metrics in a real-world context.
Model Persistence
Used joblib to save and load trained models, a critical skill for deploying ML solutions (a reload sketch follows below).
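For completeness, a minimal sketch of reloading the saved model in a later session, assuming the filename used earlier:

import joblib

# Load the persisted model; no retraining needed
model = joblib.load("housing_price_model.joblib")
predictions = model.predict(x_test)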
Why These Skills Matter
These skills aren’t just academic exercises — they’re exactly what data scientists and machine learning engineers use in real-world projects.
Feature engineering is the backbone of improving model performance.
Hyperparameter tuning can make the difference between an okay model and a production-ready one.
Model evaluation ensures you’re building something that works beyond your own dataset.
Model persistence bridges the gap between experimentation and real-world application.
With these capabilities, I can confidently approach real-world datasets, build predictive models, and prepare them for production environments.
Next Steps
This project has been a solid step forward in my machine learning journey. My plan is to:
Experiment with ensemble models to further improve performance.
Deploy the trained model via an API so it can be used in web applications.
Apply similar workflows to other datasets, such as sales forecasting and recommendation systems.
If you’re a developer or employer looking for someone who can turn data into decisions, this project is a small window into how I approach machine learning challenges: methodically, with curiosity, and with a focus on results.
I’d love to hear your thoughts: how would you have improved this model?