I was proud of my first machine learning model. Linear regression on a housing dataset, decent R² score, clean visualizations. I showed my mentor.
"What's the use case?" they asked.
I didn't have an answer.
The Problem With My First Model
I'd done what a lot of bootcamp grads do: grabbed a dataset from Kaggle, ran it through scikit-learn, got some metrics, called it done. The model worked. The code ran. The accuracy was... fine.
But I couldn't answer a simple question: Why does this exist?
"Well, it predicts house prices," I said.
"For who? When would someone actually use this?"
Silence.
I'd built a solution without understanding the problem. Classic mistake.
Starting Over With a Real Question
I scrapped the fabricated dataset and asked myself: What question am I actually trying to answer?
Here's what I landed on: I'm looking at properties in Sydney. I want to know what price range to expect for different suburbs and property types. I need something I can actually use when browsing real estate listings.
Not "build a machine learning model." That's the how, not the why.
The why: Give myself a realistic price estimate before I waste time on overpriced listings.
That's a real use case.
What Changed When I Had a Real Problem
1. The Dataset Mattered
Fabricated data → Real Sydney housing transactions (10,373 properties, 2016-2021)
I needed actual suburbs, actual price distributions, actual market segments. The model had to reflect reality, not just pass a training loop.
2. The Features Changed
Generic columns → Domain knowledge
I added:
- Suburb clustering (5 market tiers from entry to luxury)
- Price-per-sqm normalization (because a 50sqm apartment and 500sqm house shouldn't be in the same model unscaled)
- Deviation from suburb median (is this property over/underpriced for its area?)
- Inflation adjustment (2016 prices scaled to 2026 dollars)
These features came from asking: "What actually affects whether a property is expensive?"
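The feature list above can be sketched in plain Python. This is an illustrative reconstruction, not the project's actual code: the field names, the 3%-per-year inflation rate, and the `engineer_features` helper are all assumptions for the sake of the example.

```python
# Illustrative sketch of the engineered features described above.
# Field names and the 3%/year inflation assumption are hypothetical.

from statistics import median

def engineer_features(properties, target_year=2026, annual_inflation=0.03):
    """Add price-per-sqm, suburb-median deviation, and an
    inflation-adjusted price to each property dict."""
    # Median price per suburb, used as the local baseline
    by_suburb = {}
    for p in properties:
        by_suburb.setdefault(p["suburb"], []).append(p["price"])
    suburb_median = {s: median(v) for s, v in by_suburb.items()}

    for p in properties:
        # Normalise by floor area so apartments and houses are comparable
        p["price_per_sqm"] = p["price"] / p["sqm"]
        # Is this property over/under-priced for its suburb?
        med = suburb_median[p["suburb"]]
        p["median_deviation"] = (p["price"] - med) / med
        # Scale historic prices to target-year dollars (compound inflation)
        years = target_year - p["year"]
        p["adjusted_price"] = p["price"] * (1 + annual_inflation) ** years
    return properties
```

Each derived column answers one of the questions above: price-per-sqm makes sizes comparable, median deviation captures "expensive for its area", and the inflation adjustment puts 2016 and 2021 sales on the same scale.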
3. The Accuracy Suddenly Mattered
R² = 0.62 → R² = 0.95
When it was a classroom exercise, an R² of 0.62 was fine. When I'm using the model to evaluate actual properties, that level of fit meant being off by $850k on a $1.5M house. That's useless.
I needed to get the error down to about ±$146k (8.7% MAPE). That's a range I can work with.
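For anyone unfamiliar with the metric, MAPE is just the average of the absolute percentage errors, which is a tiny function (note that dividing ±$146k by 8.7% implies an average sale price around $1.7M, an inference from the numbers above rather than a figure from the project):

```python
# Mean Absolute Percentage Error: average of |actual - predicted| / actual.
def mape(actual, predicted):
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)
```

An 8.7% MAPE means that, on average, the prediction misses by 8.7% of the true price in either direction.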
Getting there required:
- Switching to XGBoost (non-linear relationships matter in real estate)
- Proper feature engineering (not just throwing columns at the model)
- Extensive hyperparameter tuning
- Actually understanding why the model was making mistakes
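The post doesn't show the actual XGBoost configuration, so here is a minimal sketch of the gradient-boosting-plus-tuning idea using scikit-learn's `GradientBoostingRegressor` instead (a stand-in for XGBoost, assumed installed), trained on synthetic data with a deliberately non-linear price signal:

```python
# Sketch of the gradient-boosting + hyperparameter-tuning step.
# The post uses XGBoost; scikit-learn's GradientBoostingRegressor is
# substituted here to keep the example self-contained. Data is synthetic.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(40, 500, size=(300, 1))          # floor area in sqm
y = 8_000 * X[:, 0] + 50_000 * np.sqrt(X[:, 0])  # non-linear price signal

# Cross-validated search over a small hyperparameter grid
grid = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [2, 3]},
    scoring="r2",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The real work is in choosing a sensible grid and, as the last bullet says, inspecting where the tuned model still makes its worst mistakes.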
4. Deployment Became Non-Negotiable
Jupyter notebook → Live Streamlit app
If I'm the only person who can use this, what's the point? I built a web interface where anyone can select a suburb, input property details, and get an instant estimate with an uncertainty range.
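The uncertainty range the app displays can be derived directly from the model's MAPE. This `estimate_range` helper is hypothetical, a sketch of what the Streamlit front end might wrap around the model's point estimate:

```python
# Hypothetical helper: turn a point estimate into the uncertainty band
# shown to the user, using the model's 8.7% MAPE as the band width.

def estimate_range(point_estimate, mape=0.087):
    low = point_estimate * (1 - mape)
    high = point_estimate * (1 + mape)
    return low, high

low, high = estimate_range(1_500_000)
print(f"${low:,.0f} - ${high:,.0f}")  # a ±8.7% band around $1.5M
```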
It's live. You can use it right now: house-price-predictor.streamlit.app
What I Actually Learned
Machine learning isn't about models. It's about problems.
The model is just the tool. If you don't know what problem you're solving, you're just doing math exercises.
Here's the framework I use now:
1. Start with the question
- Who needs this answer?
- When would they use it?
- What decision does it inform?
2. Build for that question
- What data actually answers this?
- What features capture the real dynamics?
- What accuracy is "good enough" for the use case?
3. Make it usable
- Can someone else run this?
- Does it work on real data, not just test sets?
- Is the output actionable?
The Difference
Before: "I built a model that predicts house prices with an R² of 0.62"
After: "I built a tool that tells me if a Sydney property is overpriced before I waste time inspecting it, with ±8.7% error on typical listings"
One is a portfolio project. The other is something I actually use.
What Changed for Me
I still build models. But now I start every project with: What question am I answering?
If I can't articulate that in one sentence, I don't write the code yet.
That one question from my mentor—"What's the use case?"—completely shifted how I approach machine learning. I'm not building models anymore. I'm solving problems that happen to need models.
Turns out that's what the job actually is.
The predictor is live if you want to try it: house-price-predictor.streamlit.app
Code on GitHub: github.com/JemHRice/house-price-predictor
Building in public as I transition from operations manager to ML engineer. Follow along on Dev.to or connect on LinkedIn.