Welcome back, data enthusiasts! Today we start a new chapter in our journey through linear regression. We learned that simple linear regression works when we have one input variable, but what if more than one factor influences our predictions? In that case, we use multiple linear regression.
From Lines to Planes and Beyond:
Let's work with an example similar to the previous one. Imagine you are trying to predict a student's placement package (in lakhs per annum, LPA) using the student's CGPA and IQ. In this case, we have two input variables, CGPA and IQ, and our output variable is the placement package (LPA).
In simple linear regression, we tried to draw the line that makes the fewest mistakes on the data; here, we will instead try to fit the plane that makes the fewest mistakes on the training data.
Equation Behind Multiple Linear Regression:
Simple linear regression used a straight line (y = mx + c). In multiple linear regression, the equation gets a bit more complex, but the underlying concept remains the same: finding the best fit for the data. Here is the equation of a plane in 3D (applicable to multiple linear regression with two input variables):

Y = b0 + b1X1 + b2X2
Here:
- Y is the dependent (output) variable,
- X1 and X2 are the input variables,
- b0 is the intercept, and b1 and b2 are the coefficients of X1 and X2.
But when there are more than two input variables, we fit a hyperplane in the higher-dimensional (nD) space, and the equation becomes:

Y = b0 + b1X1 + b2X2 + ... + bnXn
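To make the general form concrete, here is a minimal sketch in NumPy. The coefficient and input values below are made up purely for illustration; the point is that a prediction is just the intercept plus a dot product of the coefficients with the inputs:

```python
import numpy as np

# Hypothetical values, for illustration only
b0 = 1.5                          # intercept (b0)
b = np.array([0.8, 0.05, 1.0])    # coefficients (b1, b2, b3)
x = np.array([6.5, 110.0, 2.0])   # one sample with three input features

# Y = b0 + b1*X1 + b2*X2 + b3*X3, written compactly as a dot product
y = b0 + np.dot(b, x)
print(y)  # 1.5 + 0.8*6.5 + 0.05*110 + 1.0*2.0 = 14.2
```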
Coefficients:
The coefficients act like weights: each one tells us how much the output depends on its respective input variable.
For example, if one of the coefficients is zero, the output does not depend on that input variable at all.
Our main task is to find the intercept and the coefficients of this equation.
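As a quick sanity check of that idea, here is a small sketch using synthetic data (invented for illustration) in which the target depends only on the first feature, so the fitted coefficient of the second feature comes out at essentially zero:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))   # two input features
y = 3.0 * X[:, 0] + 1.0         # target depends only on the first feature

model = LinearRegression().fit(X, y)
print(model.intercept_)  # ~1.0
print(model.coef_)       # ~[3.0, 0.0] -- second coefficient is ~0
```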
Implementing with scikit-learn
In scikit-learn, we can perform multiple linear regression with the same LinearRegression() class; the only difference is that instead of a single input column, we pass an array (or DataFrame) with n input features.
Let's take an example:
```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data with 2 input features, generated from Y = 1 + 2*X1 + 1*X2
X = np.array([[1, 2], [2, 4], [3, 3]])  # input variables (X1, X2)
Y = np.array([5, 9, 10])                # dependent variable

# Create a linear regression model and fit it to the data
model = LinearRegression()
model.fit(X, Y)

# b0 is the intercept; b1 and b2 are the coefficients
print("Intercept (b0):", model.intercept_)
print("Coefficients (b1, b2):", model.coef_)

# Predict the output for a new sample
new_data = np.array([[7, 8]])
pred = model.predict(new_data)
print("Predicted value for new data:", pred)  # 1 + 2*7 + 1*8 = 23
```
Conclusion
Congratulations! We have moved from predicting along a straight line to predicting on a plane, and even a hyperplane in higher-dimensional space. Next, we will discuss the mathematical formulation of multiple linear regression. Stay tuned for that!