Table of Contents
- Introduction
- Dataset
- Visualizing the Data
- Splitting Data: Features and Target
- Splitting Data: Training and Testing
- Implementing Linear Regression
- Making Predictions
- Wrapping Up
- What’s Next?
Introduction
Are you curious about predicting house rent based on factors like area? Let's walk through building a simple predictive model using Python! We’ll use the House Rent Prediction Dataset from Kaggle and tools like Google Colab, Pandas, NumPy, and Matplotlib. For machine learning, we’ll leverage Scikit-Learn.
Dataset
We’ll use the House Rent Prediction dataset from Kaggle. Download the .xlsx file and load it with pandas:
import pandas as pd
# Load the Excel file
df = pd.read_excel("Rent.xlsx")
Explanation:
- The pd.read_excel() function from the pandas library reads data from an Excel file.
- Parsing Data: The function parses the data from the file and creates a pandas DataFrame object.
- DataFrame (df): This variable now holds the data from the Excel file in a structured format that can be easily manipulated and analyzed.
# Preview the data
df.head()
Explanation:
- .head(): Displays the first 5 rows of the DataFrame. This is a quick way to preview the structure of the data, including its column names and a sample of the actual values.
Fun Fact 🧠
The .head() function gets its name because it displays the "head" or the first few rows of a dataset—just like a quick peek at the top of a document.
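Beyond .head(), a few other pandas calls give a fuller picture of a DataFrame's structure. The sketch below uses a small made-up DataFrame in place of Rent.xlsx; the column names area and rent and all of the values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the rent data; the real file's columns may differ.
df = pd.DataFrame({
    "area": [650, 800, 1200, 1500, 2000, 2400],
    "rent": [9000, 11500, 17000, 21500, 28000, 33000],
})

print(df.shape)         # (number of rows, number of columns)
print(df.dtypes)        # data type of each column
print(df.isna().sum())  # count of missing values per column
```

Checking for missing values early is worthwhile, since LinearRegression does not accept NaN in its inputs.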
Visualizing the Data
To understand the relationship between area and rent, let’s plot a scatter plot.
import matplotlib.pyplot as plt
plt.scatter(df['area'], df['rent'])
plt.xlabel('area')
plt.ylabel('rent')
plt.show()
Explanation:
- plt.scatter(): Creates a scatter plot with:
  - x=df['area']: Values on the x-axis (independent variable, area).
  - y=df['rent']: Values on the y-axis (dependent variable, rent).
- plt.xlabel() and plt.ylabel(): Add labels for the x and y axes.
Observation 📊
Here’s what we observe from the plot:
The relationship between area and rent seems somewhat linear—perfect for a Linear Regression Model!
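One way to put a number on "somewhat linear" is the Pearson correlation coefficient. Here is a minimal sketch with made-up values (they are not taken from the Kaggle dataset):

```python
import numpy as np

# Hypothetical area and rent values, invented for illustration.
area = np.array([650, 800, 1200, 1500, 2000, 2400])
rent = np.array([9000, 11500, 17000, 21500, 28000, 33000])

# Pearson correlation: values near +1 indicate a strong positive linear trend,
# values near 0 indicate little or no linear relationship.
r = np.corrcoef(area, rent)[0, 1]
print(round(r, 3))
```

With the real data, df['area'].corr(df['rent']) should compute the same quantity, assuming the columns carry those names.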
Splitting Data: Features and Target
We’ll now separate our dataset into the feature (x) and the target (y).
# Selecting the feature and target variables
x = df.iloc[:, 0:1]  # Feature: Area
y = df.iloc[:, -1]   # Target: Rent
y
Explanation:
- iloc: This indexer provides integer-position-based selection of specific rows and columns.
- [:, 0:1]: Selects all rows and the first column (0:1 includes column index 0, excludes column index 1).
- [:, -1]: Selects all rows and the last column (-1 refers to the last column).
Pro Tip 📝
The iloc indexer is useful for slicing data:
- x = df.iloc[:, 0:1] → Selects the first column (Area).
- y = df.iloc[:, -1] → Selects the last column (Rent).
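Why the slice 0:1 instead of a plain 0? A slice keeps the result two-dimensional (a DataFrame), which is the shape scikit-learn estimators expect for features, while a single integer returns a one-dimensional Series. A small sketch on a made-up frame (the column names and values are assumptions):

```python
import pandas as pd

# Hypothetical frame with the feature in the first column and the target in the last.
df = pd.DataFrame({
    "area": [650, 800, 1200],
    "rent": [9000, 11500, 17000],
})

x_frame = df.iloc[:, 0:1]  # slice -> DataFrame with one column (2-D)
x_series = df.iloc[:, 0]   # single integer -> Series (1-D)
y = df.iloc[:, -1]         # last column as a Series (1-D is fine for the target)

print(type(x_frame).__name__, x_frame.shape)
print(type(x_series).__name__, x_series.shape)
print(type(y).__name__, y.shape)
```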
Splitting Data: Training and Testing
We need to split our data into training and testing sets. This helps us evaluate how well our model performs on unseen data.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)
Explanation:
- train_test_split(): Splits arrays or matrices into random training and testing subsets.
- Arguments:
  - test_size=0.2: Allocates 20% of the data to testing, 80% to training.
  - random_state=2: Ensures reproducibility of the split.
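To see both arguments in action, here is a small sketch on ten made-up rows (the data is hypothetical, not from the Kaggle file). It checks the sizes of the two subsets and confirms that repeating the split with the same random_state selects the same test rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: ten rows, so a 20% test split holds out two of them.
df = pd.DataFrame({
    "area": [650, 800, 1200, 1500, 2000, 2400, 900, 1100, 1750, 3000],
    "rent": [9000, 11500, 17000, 21500, 28000, 33000, 12500, 15500, 24500, 41000],
})
x = df.iloc[:, 0:1]
y = df.iloc[:, -1]

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)
print(x_train.shape, x_test.shape)

# Repeating the split with the same random_state picks the same test rows.
_, x_test_again, _, _ = train_test_split(x, y, test_size=0.2, random_state=2)
print(x_test.index.equals(x_test_again.index))
```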
Fun Fact 🤓
Scikit-Learn, the library we’re using, was originally a Google Summer of Code project. It has since grown into one of the most widely used tools for machine learning.
Implementing Linear Regression
Time to build and train our linear regression model! 🎉
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x_train, y_train)
Explanation:
- LinearRegression(): Initializes the linear regression model.
- fit(): Trains the model using:
  - x_train: Training feature data (area).
  - y_train: Training target data (rent).
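After fit(), the model stores the line it learned: a slope in coef_ and an intercept in intercept_. The sketch below uses made-up training data generated exactly from rent = 15 * area + 500 (an assumption chosen so the expected line is known in advance), so the fitted values should come out very close to 15 and 500.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data lying exactly on the line rent = 15 * area + 500.
x_train = pd.DataFrame({"area": [600, 900, 1200, 1800, 2400]})
y_train = 15 * x_train["area"] + 500

lr = LinearRegression()
lr.fit(x_train, y_train)

slope = lr.coef_[0]        # change in predicted rent per unit of area
intercept = lr.intercept_  # predicted rent when area is 0
print(slope, intercept)
```

On real data the points will not sit exactly on a line, so the fitted slope and intercept are the values that minimize the squared error over the training set.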
Making Predictions
Let’s test our model by predicting the rent for a sample area from our test data.
# Predict rent for a specific area in the test set
lr.predict(x_test.loc[[x_test.index[2]], ['area']])
Explanation:
- The predict() method takes data points as input and returns predicted target values (e.g., rent).
- x_test.loc[]: Retrieves specific rows and columns using label-based indexing.
- [x_test.index[2]]: Selects the third row from the test set by its index.
- ['area']: Ensures only the 'area' column is used as the feature for prediction.
Output 🏡
Our model predicts a rent of ₹21,112, which is quite close to the actual rent of ₹21,500! 🎯
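A single prediction is a useful spot check, but scoring the whole test set gives a broader view. The sketch below is one possible way to do that with mean absolute error and R², using synthetic data (the line rent = 15 * area + 500 plus random noise is an assumption, not the Kaggle dataset), so the numbers it prints say nothing about the real model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: a linear trend plus Gaussian noise.
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=100)
rent = 15 * area + 500 + rng.normal(0, 1500, size=100)
df = pd.DataFrame({"area": area, "rent": rent})

x = df[["area"]]
y = df["rent"]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)

lr = LinearRegression().fit(x_train, y_train)
y_pred = lr.predict(x_test)  # one prediction per test row

mae = mean_absolute_error(y_test, y_pred)  # average absolute error, in rent units
r2 = r2_score(y_test, y_pred)              # 1.0 would be a perfect fit
print(f"MAE: {mae:.0f}, R^2: {r2:.3f}")
```

With the tutorial's own variables, the last few lines can be applied directly to lr, x_test, and y_test.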
Wrapping Up
In this tutorial, we:
- Explored the House Rent Prediction Dataset.
- Visualized the relationship between area and rent.
- Built a Linear Regression Model using Scikit-Learn.
- Made a prediction and compared it with the actual rent.
What’s Next?
Learn multivariate linear regression!
Happy coding 🌠
 
 
              


 
    