DEV Community

Rupak Biswas

House price prediction using Lasso Regression

Lasso regression analysis was performed to evaluate the importance of a series of explanatory variables in predicting a response, in this case the price of a house in Melbourne. The explanatory variables included as possible contributors are the building area, number of rooms, land size, and many more.
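To illustrate why Lasso is useful for judging which variables matter, here is a minimal sketch on synthetic data (not the Melbourne dataset). scikit-learn's Lasso minimizes (1 / (2n)) * ||y - Xw||² + alpha * ||w||₁, and the L1 penalty can push the coefficients of uninformative features to exactly zero. The feature count, true coefficients, and alpha below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 5))
# Only the first two features influence the target; the other three are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Ordinary least squares keeps small non-zero weights on the noise features;
# Lasso shrinks all weights and sets the noise features' weights to zero.
print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
```

Features whose coefficient survives the penalty are the ones the model treats as useful predictors; the surviving coefficients are also shrunk toward zero, so their size depends on alpha.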

This is my copy of the Colab notebook, so everyone can see the output along with the code.

import pandas as pd
df = pd.read_csv("Melbourne_housing_FULL.csv")
df.nunique()

Suburb 351
Address 34009
Rooms 12
Type 3
Price 2871
Method 9
SellerG 388
Date 78
Distance 215
Postcode 211
Bedroom2 15
Bathroom 11
Car 15
Landsize 1684
BuildingArea 740
YearBuilt 160
CouncilArea 33
Lattitude 13402
Longtitude 14524
Regionname 8
Propertycount 342
dtype: int64

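# Keep only the columns used for modelling; .copy() avoids SettingWithCopyWarning
# when missing values are filled in later.
dfS = df[['Suburb', 'Rooms', 'Type', 'Method', 'SellerG', 'Regionname', 'Propertycount', 
               'Distance', 'CouncilArea', 'Bedroom2', 'Bathroom', 'Car', 'Landsize', 'BuildingArea', 'Price']].copy()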

dfS

       Suburb  Rooms  Type  Method  SellerG             Regionname  Propertycount  Distance         CouncilArea  Bedroom2  Bathroom
0  Abbotsford      2     h      SS   Jellis  Northern Metropolitan         4019.0       2.5  Yarra City Council       2.0       1.0
34857 rows × 15 columns

dfS.isna().sum()

Suburb 0
Rooms 0
Type 0
Method 0
SellerG 0
Regionname 3
Propertycount 3
Distance 1
CouncilArea 3


dfS[['Propertycount','Distance','Bedroom2','Bathroom','Car']] = dfS[['Propertycount','Distance','
dfS['Landsize']=dfS['Landsize'].fillna(dfS['Landsize'].mean())
dfS['BuildingArea']=dfS['BuildingArea'].fillna(dfS['BuildingArea'].mean())

dfS.dropna(inplace=True)
dfS = pd.get_dummies(dfS,drop_first=True)
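The cleaning steps above (fill missing values, drop what is left, one-hot encode) can be sketched on a toy DataFrame. The values here are made up for illustration, and filling the count-like column with 0 is an assumption, since the original fill expression is cut off in the notebook export.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data; values are invented for illustration.
toy = pd.DataFrame({
    "Type": ["h", "u", "t", "h"],
    "Car": [1.0, np.nan, 2.0, 0.0],
    "Landsize": [200.0, np.nan, 400.0, 300.0],
    "Price": [900000.0, 500000.0, np.nan, 700000.0],
})

# Assumption: a missing count is treated as 0.
toy[["Car"]] = toy[["Car"]].fillna(0)
# Continuous column: fill gaps with the column mean.
toy["Landsize"] = toy["Landsize"].fillna(toy["Landsize"].mean())
# Rows that still contain NaN (here, a missing Price) are dropped.
toy = toy.dropna()
# One-hot encode categorical columns; drop_first removes one level per
# column so the dummy columns are not perfectly collinear.
toy = pd.get_dummies(toy, drop_first=True)

print(toy)
```

Note that the target itself is never imputed: rows with a missing Price are removed by dropna, which is usually preferable to training on invented prices.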

# Features: every column except the target
X = dfS.drop('Price', axis=1)
# Target: the sale price
Y = dfS['Price']

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X,Y,test_size=0.2)
from sklearn.linear_model import Lasso  # L1 regularization
lasso = Lasso(alpha=50, max_iter=1000, tol=0.1)
lasso.fit(X_train,Y_train)

CPU times: user 3 µs, sys: 2 µs, total: 5 µs
Wall time: 10.3 µs
Lasso(alpha=50, tol=0.1)

Output

predictions = lasso.predict(X_test)
predictions
array([1323721.23922339, 721160.34344916, 623689.80964616, ...,
 987946.0460597 , 983561.59313765, 160658.00272658])
lasso.score(X_test,Y_test)
0.6388165172009165

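The Lasso model scores about 0.64 on the test set (see lasso.score above). For a regressor this score is the coefficient of determination R², so the model explains roughly 64% of the variance in the test prices; it is not a classification accuracy.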
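To make that number concrete, here is a minimal sketch on synthetic data (not the Melbourne dataset) that compares the value returned by score() with r2_score from sklearn.metrics and with R² computed by hand as 1 - SS_res / SS_tot. The data, true coefficients, alpha, and random seeds are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = X @ np.array([5.0, -3.0, 0.0, 1.0]) + rng.normal(scale=1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = Lasso(alpha=0.1).fit(X_train, y_train)
pred = model.predict(X_test)

# R^2 by hand: 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
manual_r2 = 1.0 - ss_res / ss_tot

print("model.score:", model.score(X_test, y_test))
print("r2_score:   ", r2_score(y_test, pred))
print("manual R^2: ", manual_r2)
# An error in the target's own units is often easier to interpret than R^2.
print("MAE:        ", mean_absolute_error(y_test, pred))
```

All three R² values agree. For house prices, reporting an error metric such as the mean absolute error alongside R² shows how far off a typical prediction is in dollars.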
