
Kamaumbugua-dev

From Data to Predictions: My Journey Building a California Housing Price Model.

Over the past few weeks, I’ve been diving deep into machine learning by working on a project that predicts California housing prices. This hands-on journey not only strengthened my technical skills but also gave me a clearer understanding of the workflow that turns raw data into actionable insights.

In this article, I’ll walk you through:

What I built

The skills I gained

Why these skills matter in the real world

Project Overview
The goal was to build a regression model that could predict median house prices in California using the California Housing dataset.

Here’s the process I followed:

Loading the dataset

```python
from sklearn import datasets

housing = datasets.fetch_california_housing()
x = housing.data
y = housing.target
```

This dataset contains information such as median income, house age, and average rooms per household.

Feature Engineering
I expanded the dataset using Polynomial Features to capture more complex relationships between the variables:

```python
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures()  # degree=2 by default
x = poly.fit_transform(x)
```

This generated 37 additional features, essentially pairwise combinations and squared values of the original eight, giving the model more information to learn from.
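You can verify that count directly: with 8 input features, a degree-2 expansion yields 45 columns in total (a bias term, the 8 originals, and their 36 squares and pairwise products), which is 37 beyond the originals. A quick sketch using random stand-in data with the same 8-column shape:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# stand-in data with the same column count as the California Housing features
x = np.random.rand(5, 8)

poly = PolynomialFeatures()  # degree=2, include_bias=True by default
x_expanded = poly.fit_transform(x)

print(x_expanded.shape)  # (5, 45): 1 bias + 8 originals + 36 squares/products
```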

Train-Test Split
To ensure the model could generalize, I split the data into training (80%) and testing (20%) sets.
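That split is a one-liner with scikit-learn's `train_test_split`; here is a minimal sketch on stand-in data (the `random_state` is my own addition, for reproducibility):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# stand-in for housing.data / housing.target
x = np.random.rand(100, 8)
y = np.random.rand(100)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42  # 80/20 split, seeded
)

print(x_train.shape, x_test.shape)  # (80, 8) (20, 8)
```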

Model Optimization
I experimented with different learning rates and iteration counts using the HistGradientBoostingRegressor, a powerful gradient boosting algorithm:

```python
from sklearn.ensemble import HistGradientBoostingRegressor

model = HistGradientBoostingRegressor(
    max_iter=350,
    learning_rate=0.05
)
model.fit(x_train, y_train)
```

Evaluation
I measured model performance using the R² score:

```python
from sklearn.metrics import r2_score

y_pred = model.predict(x_test)
r2 = r2_score(y_test, y_pred)
print(r2)
```

This score reflects how well the model explains the variation in housing prices.
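For intuition: R² compares the model's squared error against a baseline that always predicts the mean, so 1 means perfect predictions and 0 means no better than the mean. A quick sanity check of that definition against scikit-learn's implementation, on made-up numbers:

```python
import numpy as np
from sklearn.metrics import r2_score

# illustrative values, not from the actual model
y_test = np.array([3.0, 1.5, 2.2, 4.1])
y_pred = np.array([2.8, 1.7, 2.0, 4.0])

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(np.isclose(r2_manual, r2_score(y_test, y_pred)))  # True
```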

Model Deployment
I saved the trained model using joblib so it can be reused in future applications without retraining:

```python
import joblib

joblib.dump(model, "housing_price_model.joblib")
```
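Loading the model back is just as simple. A self-contained round-trip sketch, using a small `LinearRegression` as a stand-in for the boosting model (and a demo filename of my own choosing):

```python
import numpy as np
import joblib
from sklearn.linear_model import LinearRegression  # stand-in model for the demo

x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * x.ravel() + 1
model = LinearRegression().fit(x, y)

joblib.dump(model, "demo_model.joblib")    # persist to disk
loaded = joblib.load("demo_model.joblib")  # restore without retraining

print(np.allclose(loaded.predict(x), model.predict(x)))  # True
```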

Key Skills I Gained
Data Preprocessing & Feature Engineering

Learned how to transform raw datasets into forms that machine learning models can better understand.

Understood the importance of feature interactions through polynomial feature expansion.

Model Selection & Optimization

Experimented with different learning rates, iteration counts, and model architectures.

Gained experience in tuning hyperparameters to balance accuracy and computational efficiency.

Model Evaluation

Applied the R² score to assess model performance.

Learned how to interpret evaluation metrics in a real-world context.

Model Persistence

Used joblib to save and load trained models — a critical skill for deploying ML solutions.

Why These Skills Matter
These skills aren’t just academic exercises — they’re exactly what data scientists and machine learning engineers use in real-world projects.

Feature engineering is the backbone of improving model performance.

Hyperparameter tuning can make the difference between an okay model and a production-ready one.

Model evaluation ensures you’re building something that works beyond your own dataset.

Model persistence bridges the gap between experimentation and real-world application.

With these capabilities, I can confidently approach real-world datasets, build predictive models, and prepare them for production environments.

Next Steps
This project has been a solid step forward in my machine learning journey. My plan is to:

Experiment with ensemble models to further improve performance.

Deploy the trained model via an API so it can be used in web applications.

Apply similar workflows to other datasets, such as sales forecasting and recommendation systems.

If you’re a developer or employer looking for someone who can turn data into decisions, this project offers a small window into how I approach machine learning challenges: methodically, with curiosity, and with a focus on results.

I’d love to hear your thoughts: how would you have improved this model?
