Disclaimer: My study note on Linear Regression
Linear Regression is an algorithm used to find the best-fit line (the trend) for a set of data points.
It revolves around the straight-line equation

y = w₁X + w₂

where
- y is the dependent variable (the variable we want to predict),
- X is the independent variable (also called the predictor or feature variable), and
- w₁ and w₂ are the two parameters or weights, which are the slope and the intercept respectively.
The whole game or idea in a linear regression model is to adjust the weights to find the best-suited line, one that fits the data well enough to predict the y variable accurately or near-accurately. In turn, the best-fit line is the one whose equation gives the lowest error.
There are two major ways of getting the best line by adjusting the weights:
- The absolute and square tricks
- An error function (MAE or MSE) along with gradient descent (a sketch follows below)
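As a rough illustration of the second approach (my own sketch, not code from any particular course or library), here is MSE-based gradient descent fitting the simple line y = w₁X + w₂ with NumPy:

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.01, epochs=1000):
    """Fit y ≈ w1 * X + w2 by minimizing MSE with gradient descent."""
    w1, w2 = 0.0, 0.0                      # start slope and intercept at zero
    n = len(X)
    for _ in range(epochs):
        error = (w1 * X + w2) - y          # residuals of the current line
        dw1 = (2 / n) * np.dot(error, X)   # gradient of MSE with respect to the slope
        dw2 = (2 / n) * error.sum()        # gradient of MSE with respect to the intercept
        w1 -= learning_rate * dw1          # step against the gradient
        w2 -= learning_rate * dw2
    return w1, w2

# Toy usage: points roughly on y = 2x + 1
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])
print(gradient_descent(X, y))              # should land close to (2, 1)
```

The learning rate and number of epochs here are arbitrary; in practice they would be tuned to the data.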
Types of Gradient Descent
There are three major types:
- Batch Gradient Descent: This involves summing the errors over all data points and then updating the weights once per pass.
- Stochastic Gradient Descent: This involves using each data point, one at a time, to update the weights.
- Mini-Batch Gradient Descent: This involves splitting the data into small, equal batches and using each batch to update the weights.
(N.B.: the mini-batch method is used most of the time, because full-batch updates can be computationally expensive on large datasets while per-point updates can be noisy; a mini-batch sketch follows below.)
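As a rough sketch of the mini-batch loop (again my own illustration, reusing the gradient idea from the earlier snippet):

```python
import numpy as np

def minibatch_gradient_descent(X, y, batch_size=2, learning_rate=0.01, epochs=200):
    w1, w2 = 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        order = np.random.permutation(n)             # shuffle so batches differ each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]  # indices of the current mini-batch
            error = (w1 * X[batch] + w2) - y[batch]
            w1 -= learning_rate * (2 / len(batch)) * np.dot(error, X[batch])
            w2 -= learning_rate * (2 / len(batch)) * error.sum()
    return w1, w2
```

Setting batch_size=1 turns this into Stochastic Gradient Descent, and batch_size=len(X) turns it into Batch Gradient Descent, which is a handy way to see all three as one idea.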
It would be great to understand the mechanics behind each type of gradient descent visually.
Dimensions in Linear Regression Models
- Two dimensions: This includes one feature variable and a dependent variable. In this case, the prediction is in the form of a line.
- Three dimensions: This includes two feature variables and a dependent variable. In this case, the prediction is in the form of a plane.
- N dimensions: This includes n features along with a dependent variable. In this case, the prediction is in the form of a hyperplane.
It is good to note that when we have one feature (X) variable, it is known as Simple Linear Regression because it builds a simple model. More than one feature variable makes it Multiple Linear Regression, and as more features are introduced, the model becomes more complex.
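To make this concrete, here is a small sketch using scikit-learn (my choice of library; the notes don't name one) showing the same LinearRegression model fitting one feature versus two:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple Linear Regression: one feature, so the fit is a line.
X_simple = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
simple = LinearRegression().fit(X_simple, y)
print(simple.coef_, simple.intercept_)       # one slope plus an intercept

# Multiple Linear Regression: two features, so the fit is a plane.
X_multi = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
multi = LinearRegression().fit(X_multi, y)
print(multi.coef_, multi.intercept_)         # one weight per feature, plus an intercept
```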
There are two major red flags to note about the Linear Regression model:
- It works best when the data is linear or has a linear relationship. If the X and y variables have no linear relationship, you may need to:
- Make adjustments (or transform the data)
- Add more features
- Use another type of non-linear model
- Linear Regression is sensitive to outliers.
Polynomial Regression is a method to transform data that has a non-linear relationship. It is more of a preprocessing/data transformation step, and it adds more complexity to the linear model. It would be great to understand more of its intuition.
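A brief sketch of the idea, again assuming scikit-learn: the feature is expanded with polynomial terms, and an ordinary linear model is then fit on the expanded features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 - X.ravel() + 2      # a clearly non-linear relationship

# degree=2 adds an X^2 column, so the otherwise-linear model can bend with the data.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[2.0]]))                 # should be close to 0.5*4 - 2 + 2 = 2.0
```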
Regularization is a way to penalize complex models and help smoothen the line to make the model more generalized.
There are two types of Regularization, L1 and L2, which are the penalties used in Lasso Regression and Ridge Regression respectively (these are variations of the vanilla Linear Regression). In each of these variants, the parameter lambda is used to regulate the strength of the penalty.
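As a quick sketch (assuming scikit-learn, where the lambda parameter is called alpha):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
y = np.array([3.0, 4.5, 7.0, 9.5])

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can push some weights exactly to zero
ridge = Ridge(alpha=0.1).fit(X, y)   # L2 penalty: shrinks weights smoothly toward zero
print(lasso.coef_, ridge.coef_)
```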
Below is an image that shows the difference between L1 and L2
Feature Scaling
This is a way of transforming your data into a common range of values. There are two types (a short sketch follows the list):
- Standardizing: This is the process of subtracting the mean from each data point and dividing by the standard deviation.
- Normalizing: This is the process where the data is scaled to a range between 0 and 1.
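A tiny sketch of the two approaches (again assuming scikit-learn):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0], [20.0]])

standardized = StandardScaler().fit_transform(X)   # (x - mean) / standard deviation
normalized = MinMaxScaler().fit_transform(X)       # rescaled to the range [0, 1]
print(standardized.ravel())
print(normalized.ravel())
```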
When to feature scale
- When your algorithm uses a distance-based metric to make predictions, like SVMs, KNN, K-means, etc.
- When you incorporate regularization.
Some questions to ponder are:
- Can feature scaling improve the linearity of the variables?
- How does scaling affect distance-based models or regularization?