Linear Regression Model
Concept map:
Linear regression model
SSE
Gradient descent
Training the model
1. Linear Regression Model:
Linear regression is a type of supervised learning algorithm used to predict a continuous target variable based on one or more input features. Other common supervised learning algorithms include logistic regression, decision trees, and neural networks.
The goal of linear regression is to find the best linear relationship between the input features and the target variable. The model takes the form of a linear equation:

y = b + w_1*x_1 + w_2*x_2 + ... + w_n*x_n

where:
y is the target variable
x_1, ..., x_n are the input features
b is the y-intercept (also known as the bias)
w_1, ..., w_n are the coefficients (also known as weights) of the input features

The objective of linear regression is to find the values of b and w_1, ..., w_n that minimize the difference between the predicted and actual values of y. This is typically done by minimizing the sum of squared errors (SSE) between the predicted and actual values.
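As a minimal sketch, the linear model above can be written as a small Python function. The weights, bias, and input values below are made-up numbers chosen only for illustration:

```python
def predict(x, w, b):
    """Return the model's prediction b + w_1*x_1 + ... + w_n*x_n for one data point."""
    return b + sum(w_j * x_j for w_j, x_j in zip(w, x))

w = [2.0, -1.0]   # weights w_1, w_2 (assumed example values)
b = 0.5           # bias / y-intercept (assumed example value)
x = [3.0, 4.0]    # input features x_1, x_2

print(predict(x, w, b))  # 0.5 + 2*3 - 1*4 = 2.5
```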
2. SSE:
By definition:

SSE = Σ_i (y_i - ŷ_i)²

where the sum runs over all data points, and:
y_i is the actual value of the target variable for the i-th data point
ŷ_i is the predicted value of the target variable for the i-th data point

The minimization of SSE is typically achieved using gradient descent, an optimization algorithm that iteratively adjusts the weights and bias to find the values that minimize SSE.
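The SSE definition translates directly into a few lines of Python. This is a sketch with made-up data values, purely for illustration:

```python
def sse(y_actual, y_pred):
    """Sum of squared errors between actual and predicted target values."""
    return sum((y_i - yhat_i) ** 2 for y_i, yhat_i in zip(y_actual, y_pred))

y_actual = [1.0, 2.0, 3.0]  # actual target values (assumed example data)
y_pred = [1.5, 1.5, 3.0]    # model predictions (assumed example data)

print(sse(y_actual, y_pred))  # (-0.5)^2 + (0.5)^2 + 0^2 = 0.5
```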
3. Gradient Descent:
Gradient descent is an iterative optimization algorithm used to find the values of the model parameters (the weights and bias) that minimize the cost function (SSE in this case). It works by updating the parameters in the opposite direction of the gradient of the cost function with respect to the parameters.
The update rule for the weights in gradient descent is:

w_j := w_j - α * ∂SSE/∂w_j

where:
w_j is the j-th weight
α is the learning rate, i.e., the step size for each iteration of gradient descent
∂SSE/∂w_j is the partial derivative of the cost function with respect to the weight w_j

The update rule for the bias term is similar:

b := b - α * ∂SSE/∂b

where:
b is the bias term
∂SSE/∂b is the partial derivative of the cost function with respect to the bias term

The partial derivatives of the cost function with respect to the weights and bias can be calculated using calculus. Differentiating the SSE defined above gives:

∂SSE/∂w_j = -2 * Σ_i (y_i - ŷ_i) * x_ij
∂SSE/∂b = -2 * Σ_i (y_i - ŷ_i)

where x_ij is the value of the j-th feature for the i-th data point.
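Differentiating SSE with respect to each parameter gives ∂SSE/∂w_j = -2·Σ_i (y_i - ŷ_i)·x_ij and ∂SSE/∂b = -2·Σ_i (y_i - ŷ_i). A single update step can then be sketched as follows; the dataset and learning rate are made-up illustrative choices:

```python
def predict(x, w, b):
    return b + sum(w_j * x_j for w_j, x_j in zip(w, x))

def gradient_step(X, y, w, b, alpha):
    """Perform one gradient descent update of w and b to reduce SSE."""
    errors = [y_i - predict(x_i, w, b) for x_i, y_i in zip(X, y)]
    # Partial derivatives of SSE with respect to each weight and the bias.
    grad_w = [-2 * sum(e * x_i[j] for e, x_i in zip(errors, X))
              for j in range(len(w))]
    grad_b = -2 * sum(errors)
    # Move opposite to the gradient, scaled by the learning rate alpha.
    w = [w_j - alpha * g_j for w_j, g_j in zip(w, grad_w)]
    b = b - alpha * grad_b
    return w, b

X = [[1.0], [2.0], [3.0]]  # one feature per data point (assumed example data)
y = [2.0, 4.0, 6.0]        # targets follow y = 2x
w, b = gradient_step(X, y, [0.0], 0.0, alpha=0.05)
print(w, b)  # parameters move toward the data; SSE drops after this step
```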
4. Training the Model:
To train the model, we first initialize the weights and bias to some random values. We then iteratively update the weights and bias using the gradient descent algorithm until the cost function reaches a minimum or a predefined stopping criterion is met (e.g., maximum number of iterations reached).
Once the model is trained, we can use it to make predictions on new data by plugging the values of the input features into the linear equation and calculating the predicted value of the target variable.
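The full procedure described above — random initialization, repeated gradient descent updates until a stopping criterion (here a fixed iteration count), then prediction on new data — can be sketched as below. The dataset, learning rate, and iteration count are assumed values for illustration:

```python
import random

def predict(x, w, b):
    return b + sum(w_j * x_j for w_j, x_j in zip(w, x))

def train(X, y, alpha=0.01, iterations=1000):
    """Fit a linear model to (X, y) by gradient descent on SSE."""
    random.seed(0)
    w = [random.uniform(-1, 1) for _ in X[0]]  # random initial weights
    b = random.uniform(-1, 1)                  # random initial bias
    for _ in range(iterations):                # stopping criterion: fixed count
        errors = [y_i - predict(x_i, w, b) for x_i, y_i in zip(X, y)]
        grad_w = [-2 * sum(e * x_i[j] for e, x_i in zip(errors, X))
                  for j in range(len(w))]
        grad_b = -2 * sum(errors)
        w = [w_j - alpha * g_j for w_j, g_j in zip(w, grad_w)]
        b = b - alpha * grad_b
    return w, b

# Example data generated from y = 3x + 1 (assumed for illustration).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 4.0, 7.0, 10.0]
w, b = train(X, y)
print(round(w[0], 2), round(b, 2))     # close to 3.0 and 1.0
print(round(predict([4.0], w, b), 2))  # prediction for a new input, close to 13.0
```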