In this project, I built a complete end-to-end machine learning system that:
- Scrapes property listings
- Cleans and engineers features
- Trains multiple ML models
- Deploys a pricing app
- Builds a business-ready dashboard
This article walks through the entire pipeline from raw web data to a deployed ML product.
Step 1 --- Web Scraping
I built a Selenium scraper to extract:
- Location
- Property Type
- Bedrooms
- Bathrooms
- Size (sqm)
- Amenities
- Price (KES)
- Listing Date
Sample Scraping Logic
# Grab every listing card on the results page
listings = driver.find_elements(
    By.XPATH,
    "//div[contains(@class,'listing') or contains(@class,'property') or contains(@class,'card')]"
)
for listing in listings:
    # Each card links to a detail page; collect the URL for a second scraping pass
    link = listing.find_element(By.XPATH, ".//a[contains(@href,'/listings/')]")
    property_url = link.get_attribute("href")
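Step 2 --- Data Cleaning & Feature Engineering

The modeling step later relies on an amenity_score feature derived from the scraped amenities. A minimal sketch, assuming Amenities arrives as a comma-separated string and amenity_score is a simple count (both assumptions, since the original cleaning code isn't shown):

```python
import pandas as pd

# Toy rows standing in for the scraped data (illustrative only)
df = pd.DataFrame({
    "Amenities": ["pool, gym, parking", "parking", None],
    "Price (KES)": [25_000_000, 8_000_000, 5_000_000],
})

# amenity_score: count of non-empty amenity entries (an assumed definition)
df["amenity_score"] = (
    df["Amenities"]
    .fillna("")
    .apply(lambda s: sum(1 for a in s.split(",") if a.strip()))
)
```

A richer version might weight high-value amenities (pool, parking) more heavily, but a plain count is a reasonable baseline.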
Step 3 --- Exploratory Analysis
Most Expensive Locations
location_prices = df.groupby("Location")["Price (KES)"].median().sort_values(ascending=False)
print(location_prices)
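The dashboard later compares price per square metre, which follows the same groupby pattern. A minimal sketch with toy data (column names follow the scraped schema):

```python
import pandas as pd

df = pd.DataFrame({
    "Location": ["Kilimani", "Westlands", "Kilimani"],
    "Price (KES)": [20_000_000, 30_000_000, 18_000_000],
    "Size (sqm)": [100, 120, 90],
})

# Normalise price by size so locations with different stock sizes compare fairly
df["price_per_sqm"] = df["Price (KES)"] / df["Size (sqm)"]
print(df.groupby("Location")["price_per_sqm"].median().sort_values(ascending=False))
```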
Step 4 --- Modeling
Train/Test Split
from sklearn.model_selection import train_test_split
X = df[["Bedrooms", "Bathrooms", "Size (sqm)", "amenity_score"]]
y = df["Price (KES)"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
Linear Regression (Baseline)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
lr = LinearRegression()
lr.fit(X_train, y_train)
pred = lr.predict(X_test)
mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
r2 = r2_score(y_test, pred)
Random Forest
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(
    n_estimators=200,
    random_state=42
)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)
XGBoost
from xgboost import XGBRegressor
xgb = XGBRegressor(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=5,
    random_state=42
)
xgb.fit(X_train, y_train)
xgb_pred = xgb.predict(X_test)
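To pick a winner, all models can be scored on the same held-out metrics. A self-contained sketch with synthetic data (XGBoost omitted so the snippet needs only scikit-learn, but it plugs into the same loop):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: price roughly linear in size, plus noise
rng = np.random.default_rng(42)
size = rng.uniform(30, 300, 200)
X = size.reshape(-1, 1)
y = 150_000 * size + rng.normal(0, 1_000_000, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}
for name, model in [
    ("Linear", LinearRegression()),
    ("Random Forest", RandomForestRegressor(n_estimators=200, random_state=42)),
]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = (
        mean_absolute_error(y_test, pred),
        np.sqrt(mean_squared_error(y_test, pred)),  # RMSE
        r2_score(y_test, pred),
    )

for name, (mae, rmse, r2) in results.items():
    print(f"{name}: MAE={mae:,.0f}  RMSE={rmse:,.0f}  R2={r2:.3f}")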
Step 5 --- Deployment (Streamlit App)
The pricing app allows users to input:
- Location
- Bedrooms
- Bathrooms
- Size
- Amenities
And returns:
- Predicted price
- Estimated range (± MAE)
- Explanation of price drivers
Run locally:
streamlit run Streamlit_app.py
Step 6 --- Executive Dashboard
Built using Streamlit with interactive filters.
Includes:
- Median price by location
- Monthly price trends
- Price per sqm comparison
- Amenity impact analysis
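The aggregations behind these views can be sketched in pandas (column names follow the scraped schema; the monthly resample logic is an assumption):

```python
import pandas as pd

# Toy rows standing in for the cleaned dataset (illustrative only)
df = pd.DataFrame({
    "Location": ["Kilimani", "Westlands", "Kilimani", "Westlands"],
    "Price (KES)": [20_000_000, 30_000_000, 22_000_000, 28_000_000],
    "Listing Date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-15"]
    ),
})

# Median price by location: feeds the first dashboard chart
by_location = df.groupby("Location")["Price (KES)"].median()

# Monthly price trend: feeds the trend chart ("MS" = month start)
monthly = df.set_index("Listing Date")["Price (KES)"].resample("MS").median()

print(by_location)
print(monthly)
```

Streamlit filters (e.g. st.multiselect on Location) would simply subset df before these aggregations run.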
Run:
streamlit run Dashboard.py
Key Insights
- Size is the strongest determinant of price.
- Premium neighborhoods significantly increase valuation.
- Amenities increase value but are secondary drivers.
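Insights like "size is the strongest driver" typically come from inspecting tree-model feature importances. A self-contained sketch with synthetic data constructed so size dominates (the real ranking would come from the fitted rf above):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Bedrooms": rng.integers(1, 6, 500),
    "Bathrooms": rng.integers(1, 4, 500),
    "Size (sqm)": rng.uniform(30, 300, 500),
    "amenity_score": rng.integers(0, 10, 500),
})
# Toy target: price driven mostly by size, with a smaller amenity effect
y = 150_000 * X["Size (sqm)"] + 500_000 * X["amenity_score"] \
    + rng.normal(0, 1_000_000, 500)

rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)
importances = pd.Series(
    rf.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```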