Disclaimer: My study note on Linear Regression
Linear Regression is an algorithm used to find the best-fit line (the trend) for a set of data points.
It revolves around the straight-line equation

y = w₁X + w₂

where
- y is the dependent variable (the variable we want to predict),
- X is the independent variable (also called the predictor or feature variable), and
- w₁ and w₂ are the two parameters or weights, which are the slope and the intercept respectively.
The whole game or idea in a linear regression model is to adjust the weights to find the best-suited line, one that fits the data well enough to predict the y variable accurately or near-accurately. In turn, the best-fit line is the one whose equation gives the lowest error.
There are two major ways of getting the best line by adjusting the weights:
- The absolute and square tricks
- An error function (MAE or MSE) along with gradient descent (a sketch follows below)
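As a rough illustration of the second approach (my own sketch, not code from any particular course or library), here is MSE-based gradient descent fitting the simple line y = w₁X + w₂ with NumPy:

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.01, epochs=1000):
    """Fit y ≈ w1 * X + w2 by minimizing MSE with gradient descent."""
    w1, w2 = 0.0, 0.0                      # start slope and intercept at zero
    n = len(X)
    for _ in range(epochs):
        error = (w1 * X + w2) - y          # residuals of the current line
        dw1 = (2 / n) * np.dot(error, X)   # gradient of MSE with respect to the slope
        dw2 = (2 / n) * error.sum()        # gradient of MSE with respect to the intercept
        w1 -= learning_rate * dw1          # step against the gradient
        w2 -= learning_rate * dw2
    return w1, w2

# Toy usage: points roughly on y = 2x + 1
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])
print(gradient_descent(X, y))              # should land close to (2, 1)
```

The learning rate and number of epochs here are arbitrary; in practice they would be tuned to the data.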
Types of Gradient Descent
There are three major types:
- Batch Gradient Descent: This involves summing the errors over all data points and then updating the weights once per pass.
- Stochastic Gradient Descent: This involves using each data point, one at a time, to update the weights.
- Mini-Batch Gradient Descent: This involves splitting the data into small, equal batches and using each batch to update the weights.
(N.B.: the mini-batch method is used most of the time, because full-batch updates can be computationally expensive on large datasets while per-point updates can be noisy; a mini-batch sketch follows below.)
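As a rough sketch of the mini-batch loop (again my own illustration, reusing the gradient idea from the earlier snippet):

```python
import numpy as np

def minibatch_gradient_descent(X, y, batch_size=2, learning_rate=0.01, epochs=200):
    w1, w2 = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        order = np.random.permutation(n)             # shuffle so batches differ each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]  # indices of the current mini-batch
            error = (w1 * X[batch] + w2) - y[batch]
            w1 -= learning_rate * (2 / len(batch)) * np.dot(error, X[batch])
            w2 -= learning_rate * (2 / len(batch)) * error.sum()
    return w1, w2
```

Setting batch_size=1 turns this into Stochastic Gradient Descent, and batch_size=len(X) turns it into Batch Gradient Descent, which is a handy way to see all three as one idea.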
It would be great to understand the mechanics behind each type of gradient descent visually.
Dimensions in Linear Regression Models
- Two dimensions: This includes one feature variable and a dependent variable. In this case, the prediction is in the form of a line.
- Three dimensions: This includes two feature variables and a dependent variable. In this case, the prediction is in the form of a plane.
- N dimensions: This includes n features along with a dependent variable. In this case, the prediction is in the form of a hyperplane.
It is good to note that when we have one feature (X) variable, it is known as Simple Linear Regression because it builds a simple model. More than one feature variable makes it Multiple Linear Regression, and as more features are introduced, the model becomes more complex.
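To make this concrete, here is a small sketch using scikit-learn (my choice of library; the notes don't name one) showing the same LinearRegression model fitting one feature versus two:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple Linear Regression: one feature, so the fit is a line.
X_simple = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
simple = LinearRegression().fit(X_simple, y)
print(simple.coef_, simple.intercept_)       # one slope plus an intercept

# Multiple Linear Regression: two features, so the fit is a plane.
X_multi = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
multi = LinearRegression().fit(X_multi, y)
print(multi.coef_, multi.intercept_)         # one weight per feature, plus an intercept
```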
There are two major red flags to note about the Linear Regression model:
- It works best when the data is linear or has a linear relationship. If the X and y variables have no linear relationship, you may need to:
- Make adjustments (or transform the data)
- Add more features
- Use another type of non-linear model
- Linear Regression is sensitive to outliers.
Polynomial Regression is a method to transform data that has a non-linear relationship. It is more of a preprocessing/data transformation step, and it adds more complexity to the linear model. It would be great to understand more of its intuition.
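A brief sketch of the idea, again assuming scikit-learn: the feature is expanded with polynomial terms, and an ordinary linear model is then fit on the expanded features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 - X.ravel() + 2      # a clearly non-linear relationship

# degree=2 adds an X^2 column, so the otherwise-linear model can bend with the data.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[2.0]]))                 # should be close to 0.5*4 - 2 + 2 = 2.0
```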
Regularization is a way to penalize complex models and help smoothen the line to make the model more generalized.
There are two types of Regularization, L1 and L2, which are the penalties used in Lasso Regression and Ridge Regression respectively (these are variations of the vanilla Linear Regression). In each of these variants, the parameter lambda is used to regulate the strength of the penalty.
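As a quick sketch (assuming scikit-learn, where the lambda parameter is called alpha):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
y = np.array([3.0, 4.5, 7.0, 9.5])

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can push some weights exactly to zero
ridge = Ridge(alpha=0.1).fit(X, y)   # L2 penalty: shrinks weights smoothly toward zero
print(lasso.coef_, ridge.coef_)
```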
Below is an image that shows the difference between L1 and L2
Feature Scaling
This is a way of transforming your data into a common range of values. There are two types (a short sketch follows the list):
- Standardizing: This is the process of subtracting the mean from each data point and dividing by the standard deviation.
- Normalizing: This is the process where the data is scaled to a range between 0 and 1.
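A tiny sketch of the two approaches (again assuming scikit-learn):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0], [20.0]])

standardized = StandardScaler().fit_transform(X)   # (x - mean) / standard deviation
normalized = MinMaxScaler().fit_transform(X)       # rescaled to the range [0, 1]
print(standardized.ravel())
print(normalized.ravel())
```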
When to feature scale
- When your algorithm uses a distance-based metric to make predictions, like SVMs, KNN, K-means, etc.
- When you incorporate regularization.
Some questions to ponder are:
- Can feature scaling improve the linearity of the variables?
- How does scaling affect distance-based models or regularization?