khaula nauman

Posted on Dec 16, 2024

Predicting House Rent with Linear Regression in Python

#buildinpublic #machinelearning

Introduction
Dataset
Visualizing the Data
Splitting Data: Features and Target
Splitting Data: Training and Testing
Implementing Linear Regression
Making Predictions
Wrapping Up
What’s Next?

Introduction

Are you curious about predicting house rent based on factors like area? Let's walk through building a simple predictive model using Python! We’ll use the House Rent Prediction Dataset from Kaggle and tools like Google Colab, Pandas, NumPy, and Matplotlib. For machine learning, we’ll leverage Scikit-Learn.

Dataset

We’ll use the House Rent Prediction dataset from Kaggle. Download the .xlsx file and load it with pandas:

import pandas as pd

# Load the Excel file
df = pd.read_excel("Rent.xlsx")

Explanation:

The pd.read_excel() function from the pandas library reads data from an Excel file.
Parsing Data: The function parses the data from the file and creates a pandas DataFrame object.
DataFrame (df): This variable now holds the data from the Excel file in a structured format that can be easily manipulated and analyzed.

# Preview the data
df.head()

Explanation:

.head(): Displays the first 5 rows of the DataFrame by default. This is a quick way to preview the structure of the data, including its column names and a sample of the actual values. Datatypes are not part of this preview.
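To check datatypes and the size of the table alongside .head(), something like the sketch below can help. The small DataFrame here is a hypothetical stand-in with made-up values and assumed column names ('area', 'rent'); with the real data you would run the same two calls on the df loaded from the Excel file.

```python
import pandas as pd

# Hypothetical stand-in for the rent data (values are made up for illustration)
df = pd.DataFrame({"area": [650, 800, 1200], "rent": [12000, 15500, 24000]})

# One datatype per column
print(df.dtypes)

# (number of rows, number of columns)
print(df.shape)
```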

Fun Fact 🧠

The .head() method gets its name because it displays the "head", or first few rows, of a dataset, much like a quick peek at the top of a document.

Visualizing the Data

To understand the relationship between area and rent, let’s plot a scatter plot.

import matplotlib.pyplot as plt

plt.scatter(df['area'], df['rent'])
plt.xlabel('area')
plt.ylabel('rent')
plt.show()

Explanation:

plt.scatter(): Creates a scatter plot with:
- x=df['area']: Values on the x-axis (independent variable, area).
- y=df['rent']: Values on the y-axis (dependent variable, rent).
plt.xlabel() and plt.ylabel(): Add labels for the x and y axes.

Observation 📊

Here’s what we observe from the plot:

The relationship between area and rent looks roughly linear, which makes a linear regression model a reasonable first choice.
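If you want a number to go with the visual impression, one option is the Pearson correlation coefficient, which pandas can compute directly. This is only a sketch on made-up values with assumed column names ('area', 'rent'); a value close to 1 suggests a strong positive linear relationship.

```python
import pandas as pd

# Hypothetical sample values, not taken from the Kaggle dataset
df = pd.DataFrame({
    "area": [500, 750, 1000, 1250, 1500],
    "rent": [10500, 14800, 20300, 24900, 30100],
})

# Pearson correlation between the two columns (ranges from -1 to 1)
r = df["area"].corr(df["rent"])
print(round(r, 3))
```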

Splitting Data: Features and Target

We’ll now separate our dataset into the feature (x) and the target (y).

# Selecting the feature and target variables
x = df.iloc[:, 0:1]  # Feature: Area
y = df.iloc[:, -1]   # Target: Rent

y

Explanation:

iloc: This indexer (used with square brackets, not called like a function) provides integer-position-based selection of rows and columns.
[:, 0:1]: Selects all rows and the first column (0:1 includes column index 0, excludes column index 1).
[:, -1]: Selects all rows and the last column (-1 refers to the last column).
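Why 0:1 rather than just 0? A slice keeps the result two-dimensional (a DataFrame), while a single integer returns a one-dimensional Series. Scikit-Learn estimators expect the feature input to be 2-D, so the slice form is the convenient one for x. Here is a small sketch on a made-up DataFrame (the column names and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({"area": [650, 800, 1200], "rent": [12000, 15500, 24000]})

x = df.iloc[:, 0:1]   # slice: DataFrame with one column (2-D)
y = df.iloc[:, -1]    # single integer: Series (1-D)
x_1d = df.iloc[:, 0]  # single integer on the first column: also a Series (1-D)

print(type(x).__name__, x.shape)
print(type(y).__name__, y.shape)
print(type(x_1d).__name__, x_1d.shape)
```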

Pro Tip 📝

The iloc indexer is useful for slicing data by position:

x = df.iloc[:, 0:1] → Selects the first column (Area).
y = df.iloc[:, -1] → Selects the last column (Rent).

Splitting Data: Training and Testing

We need to split our data into training and testing sets. This helps us evaluate how well our model performs on unseen data.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)

Explanation:

train_test_split(): Splits arrays or matrices into random training and testing subsets.
Arguments:
- test_size=0.2: Allocates 20% of the data to testing, 80% to training.
- random_state=2: Ensures reproducibility of the split.
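To see both arguments in action, the sketch below splits a made-up 10-row DataFrame (assumed column names, synthetic values) and checks the resulting sizes. Running the split twice with the same random_state picks the same rows for the test set.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row dataset
df = pd.DataFrame({
    "area": range(500, 1500, 100),
    "rent": range(10000, 30000, 2000),
})
x = df[["area"]]
y = df["rent"]

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=2
)
print(len(x_train), len(x_test))  # 80% and 20% of the 10 rows

# Same random_state, same split
_, x_test_again, _, _ = train_test_split(x, y, test_size=0.2, random_state=2)
print(x_test.index.equals(x_test_again.index))
```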

Fun Fact 🤓

Scikit-Learn, the library we’re using, started out as a Google Summer of Code project. It has since grown into one of the most widely used tools for machine learning.

Implementing Linear Regression

Time to build and train our linear regression model! 🎉

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(x_train, y_train)

Explanation:

LinearRegression(): Initializes the linear regression model.
fit(): Trains the model using:
- x_train: Training feature data (area).
- y_train: Training target data (rent).
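After fit(), the learned line is stored on the model: coef_ holds the slope (rent change per unit of area) and intercept_ holds the rent predicted at area 0. As a sanity check, the sketch below fits a model on made-up data that follows rent = 20 * area + 1000 exactly, so the recovered values should match those numbers up to floating-point error. The data and column names are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data that lies exactly on the line rent = 20 * area + 1000
df = pd.DataFrame({"area": [500, 750, 1000, 1250]})
df["rent"] = 20 * df["area"] + 1000

lr = LinearRegression()
lr.fit(df[["area"]], df["rent"])

slope = lr.coef_[0]        # one coefficient per feature; we have one feature
intercept = lr.intercept_  # predicted rent when area is 0

print(slope, intercept)
```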

Making Predictions

Let’s test our model by predicting the rent for a sample area from our test data.

# Predict rent for a specific area in the test set
lr.predict(x_test.loc[[x_test.index[2]], ['area']])

Explanation:

The predict() method takes data points as input and returns predicted target values (e.g., rent).
x_test.loc[]: Retrieves specific rows and columns using label-based indexing.
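[x_test.index[2]]: Looks up the index label of the third row in the test set. Wrapping it in a list makes .loc return a DataFrame (2-D), which is the shape predict() expects.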
['area']: Ensures only the 'area' column is used as the feature for prediction.

Output 🏡

Our model predicts a rent of ₹21,112, which is quite close to the actual rent of ₹21,500! 🎯
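A single prediction only gives a rough feel for the model. To judge it on the whole test set, you could compute metrics such as mean absolute error (MAE) and R² with sklearn.metrics. The sketch below is not run on the Kaggle data: it uses synthetic, seeded data with assumed column names, so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: a linear trend plus random noise (purely illustrative)
rng = np.random.default_rng(0)
area = rng.uniform(400, 2000, size=200)
rent = 20 * area + 1000 + rng.normal(0, 1500, size=200)
df = pd.DataFrame({"area": area, "rent": rent})

x_train, x_test, y_train, y_test = train_test_split(
    df[["area"]], df["rent"], test_size=0.2, random_state=2
)

lr = LinearRegression().fit(x_train, y_train)
y_pred = lr.predict(x_test)  # one prediction per test row

mae = mean_absolute_error(y_test, y_pred)  # average absolute error, in rent units
r2 = r2_score(y_test, y_pred)              # 1.0 would be a perfect fit

print(f"MAE: {mae:.0f}")
print(f"R^2: {r2:.3f}")
```

With the real data, the same two metric calls would work on the y_test from the split above and lr.predict(x_test).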

Wrapping Up

In this tutorial, we:

Explored the House Rent Prediction Dataset.
Visualized the relationship between area and rent.
Built a Linear Regression Model using Scikit-Learn.
Made a prediction and compared it with the actual rent.

What’s Next?

Learn multivariate linear regression!
Happy coding 🌠

DEV Community
