Lasso regression analysis was performed to evaluate the importance of a series of explanatory variables in predicting an outcome, in this case a price. The model estimates the probable price of a house in Melbourne (the output) from explanatory variables that include the area, number of rooms, land size, and many more.
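To make the idea of "evaluating importance" concrete, here is a minimal sketch on synthetic data (not the Melbourne set). It assumes five made-up features of which only the first two influence the target; the alpha value is chosen only for illustration. With an L1 penalty, the coefficients of the irrelevant features are typically driven to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption, not the housing data): 5 features,
# only the first two actually influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# alpha controls the strength of the L1 penalty (illustrative value).
model = Lasso(alpha=0.5)
model.fit(X, y)

# Features with a zero coefficient were judged unimportant by the model.
print(model.coef_)
```

The surviving non-zero coefficients are shrunk toward zero as well, so they indicate which variables matter rather than giving unbiased effect sizes.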
This is my copy of the Colab notebook, so everyone can see the output along with the code.
import pandas as pd
df = pd.read_csv("Melbourne_housing_FULL.csv")
df.nunique()
Suburb 351
Address 34009
Rooms 12
Type 3
Price 2871
Method 9
SellerG 388
Date 78
Distance 215
Postcode 211
Bedroom2 15
Bathroom 11
Car 15
Landsize 1684
BuildingArea 740
YearBuilt 160
CouncilArea 33
Lattitude 13402
Longtitude 14524
Regionname 8
Propertycount 342
dtype: int64
dfS = df[['Suburb', 'Rooms', 'Type', 'Method', 'SellerG', 'Regionname', 'Propertycount',
'Distance', 'CouncilArea', 'Bedroom2', 'Bathroom', 'Car', 'Landsize', 'BuildingArea', 'Price']]
dfS
Suburb Rooms Type Method SellerG Regionname Propertycount Distance CouncilArea Bedroom2 Bathroom
0 Abbotsford 2 h SS Jellis Northern Metropolitan 4019.0 2.5 Yarra City Council 2.0 1.0
34857 rows × 15 columns
dfS.isna().sum()
Suburb 0
Rooms 0
Type 0
Method 0
SellerG 0
Regionname 3
Propertycount 3
Distance 1
CouncilArea 3
dfS[['Propertycount','Distance','Bedroom2','Bathroom','Car']] = dfS[['Propertycount','Distance','
dfS['Landsize']=dfS['Landsize'].fillna(dfS['Landsize'].mean())
dfS['BuildingArea']=dfS['BuildingArea'].fillna(dfS['BuildingArea'].mean())
dfS.dropna(inplace=True)
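The cleaning steps above can be tried on a small toy frame. This is only a sketch: the values below are invented, and only the two column names are borrowed from the notebook. It shows mean imputation for a numeric column, followed by dropping the rows that still contain gaps.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; one numeric gap and one text gap.
toy = pd.DataFrame({
    "Landsize": [100.0, np.nan, 300.0, 500.0],
    "Regionname": ["A", "B", None, "C"],
})

# Numeric gap: replace NaN with the mean of the observed values,
# here (100 + 300 + 500) / 3 = 300.
toy["Landsize"] = toy["Landsize"].fillna(toy["Landsize"].mean())

# Any row that still has a missing value (the missing Regionname) is dropped.
toy = toy.dropna()
print(toy)
```

Mean imputation keeps the row count up but pulls the filled values toward the average, which is worth keeping in mind for columns with many gaps.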
dfS = pd.get_dummies(dfS,drop_first=True)
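As a rough illustration of what get_dummies with drop_first=True does, the sketch below encodes a toy frame. The rows are invented for the example. Each text column becomes a set of indicator columns, and the first category is dropped because it is implied when all the others are zero.

```python
import pandas as pd

# Toy frame with invented rows; "Type" is a text column with three categories.
toy = pd.DataFrame({"Type": ["h", "u", "t", "h"], "Rooms": [2, 1, 3, 4]})

# Numeric columns pass through unchanged; "Type" is expanded into indicators.
# drop_first=True removes the first category ("h") to avoid redundant columns.
encoded = pd.get_dummies(toy, drop_first=True)
print(encoded.columns.tolist())  # ['Rooms', 'Type_t', 'Type_u']
```

On the real data, high-cardinality columns such as Suburb and SellerG produce hundreds of indicator columns, which is one reason a penalized model like Lasso is a sensible choice here.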
X = dfS.drop('Price', axis=1)  # features: every column except the target
Y = dfS['Price']  # target: sale price
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X,Y,test_size=0.2)
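Because the split above is random, the score reported later will vary slightly from run to run. A minimal sketch of making the split repeatable, assuming a fixed seed is acceptable (the seed value 42 and the toy arrays are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy arrays standing in for the real features and target.
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

# random_state=42 is an arbitrary, illustrative seed.
split_a = train_test_split(X, Y, test_size=0.2, random_state=42)
split_b = train_test_split(X, Y, test_size=0.2, random_state=42)

# With the same seed, both calls return identical train/test partitions.
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same)  # True
```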
from sklearn.linear_model import Lasso #l1
CPU times: user 3 µs, sys: 2 µs, total: 5 µs
Wall time: 10.3 µs
lasso = Lasso(alpha=50, max_iter=1000, tol=0.1)
lasso.fit(X_train,Y_train)
Lasso(alpha=50, tol=0.1)
predictions = lasso.predict(X_test)
predictions
array([1323721.23922339, 721160.34344916, 623689.80964616, ...,
987946.0460597 , 983561.59313765, 160658.00272658])
lasso.score(X_test,Y_test)
0.6388165172009165
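As the Lasso regression code above shows, the model scores about 0.64 on the test set. Note that lasso.score returns the coefficient of determination (R²) rather than a classification accuracy, so the model explains roughly 64% of the variance in house prices.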
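To show what the score measures, the sketch below computes R² by hand and compares it with the value returned by score. It uses synthetic data and an illustrative alpha, not the notebook's model.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic regression data (assumption, not the housing data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=100)

model = Lasso(alpha=0.1).fit(X, y)
pred = model.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# The two numbers should agree up to floating-point error.
print(r2_manual, model.score(X, y))
```

A value of 1.0 would mean perfect predictions, while 0.0 means the model does no better than always predicting the mean price.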