vindianadoan

Why is SSE commonly used in linear regression?

SSE (sum of squared errors) is commonly used in linear regression because it has some desirable mathematical properties and is relatively easy to optimize using techniques like gradient descent.
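For reference, writing $y_i$ for the actual values, $\hat{y}_i$ for the predictions, and $n$ for the number of samples:

$$
\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$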

One reason SSE is used is because it is a squared error metric. Squaring the errors has a few benefits. First, it makes all errors non-negative, which means that errors in both directions (i.e., overpredictions and underpredictions) are treated equally. Second, it penalizes large errors more heavily than small errors. This is desirable because we typically care more about reducing large errors than small errors.
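As a minimal sketch of both points (using NumPy, with made-up numbers), here is SSE computed by hand, showing how a single large error dominates the total:

```python
import numpy as np

# Hypothetical actual and predicted values, chosen for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 11.0])

errors = y_true - y_pred        # signed residuals: [ 0.5, -0.5,  0. , -2. ]
sse = np.sum(errors ** 2)       # squaring drops the sign: 0.25 + 0.25 + 0 + 4 = 4.5

print(sse)
# The single error of 2.0 contributes 4.0 to the SSE, far more than the
# two errors of 0.5 combined (0.5): large errors dominate the penalty.
```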

Another reason SSE is used is because it leads to a convex optimization problem: the cost function has no local minima other than the global minimum, so gradient descent and other optimization techniques are guaranteed to converge to an optimal solution (given a suitable learning rate). In fact, for linear regression the SSE minimizer can even be written in closed form via the normal equations.
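To make the closed-form claim concrete, here is a hedged sketch on synthetic data (the coefficients, noise level, and sample sizes are arbitrary choices):

```python
import numpy as np

# Synthetic data: y = 2*x1 - 1*x2 + 3, plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                       # 100 samples, 2 features
y = X @ np.array([2.0, -1.0]) + 3.0 + rng.normal(scale=0.1, size=100)

Xb = np.column_stack([np.ones(len(X)), X])          # prepend ones for the bias term
# lstsq solves the normal equations (X^T X) w = X^T y in a numerically stable way
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print(w)  # approximately [3.0, 2.0, -1.0]: bias, then the two weights
```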

Finally, SSE is easy to work with mathematically. The cost function is differentiable, which means that we can calculate its gradient (i.e., the partial derivatives of the cost function with respect to the weights and bias) analytically. This makes it possible to use optimization techniques like gradient descent to find the optimal values of the model parameters.
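To show what the analytic gradient buys us, here is a minimal gradient-descent sketch for the same kind of model (the learning rate and iteration count are arbitrary choices, not tuned values):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 3.0      # noise-free target: y = 2*x1 - x2 + 3

w = np.zeros(2)                          # weights
b = 0.0                                  # bias
lr = 0.05                                # arbitrary learning rate

for _ in range(2000):
    residuals = X @ w + b - y            # signed prediction errors
    # Analytic gradients of MSE = SSE / n (same minimizer, easier step sizing)
    grad_w = 2.0 * X.T @ residuals / len(y)
    grad_b = 2.0 * residuals.mean()
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # converges toward [2.0, -1.0] and 3.0
```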

In addition to SSE, other error metrics are commonly used in machine learning (a computed example for each appears after the list):

Mean Absolute Error (MAE): This measures the average of the absolute differences between the predicted values and the actual values. MAE is less sensitive to outliers compared to SSE.

Root Mean Squared Error (RMSE): This measures the square root of the average of the squared differences between the predicted values and the actual values. RMSE is expressed in the same units as the target, which makes it easier to interpret than SSE, and minimizing it is equivalent to minimizing SSE. Because it is built from squared errors, it is a natural choice when the errors are roughly normally distributed.

Mean Absolute Percentage Error (MAPE): This measures the average absolute percentage difference between the predicted values and the actual values. MAPE is commonly used in forecasting and time series analysis, though it is undefined when any actual value is zero.

R-squared (R2): This measures the proportion of the variance in the dependent variable that is explained by the independent variables. For a least-squares linear model with an intercept, R2 ranges from 0 to 1 on the training data, where 1 indicates a perfect fit; it can be negative for models that fit worse than simply predicting the mean.
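Here is a hedged sketch computing each metric with plain NumPy on made-up values (scikit-learn's sklearn.metrics module offers equivalent ready-made functions):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # hypothetical actual values
y_pred = np.array([2.5, 5.5, 7.0, 11.0])  # hypothetical predictions

errors = y_true - y_pred

mae  = np.mean(np.abs(errors))                      # Mean Absolute Error
rmse = np.sqrt(np.mean(errors ** 2))                # Root Mean Squared Error
mape = np.mean(np.abs(errors / y_true)) * 100       # MAPE in percent (y_true must be nonzero)
r2   = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # R-squared

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  MAPE={mape:.2f}%  R2={r2:.3f}")
```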

The choice of error metric depends on the type of problem and the nature of the data. Some metrics are better suited to certain types of data, while others are more robust to outliers or more intuitive to interpret. It is important to choose an error metric that aligns with the problem and to evaluate the model's performance using that metric.
