The idea behind simple linear regression is to "fit" the observations of two variables into a linear relationship between them. Graphically, the task is to draw the line that is "best-fitting" or "closest" to the points.
The equation of a straight line is written using the
y = mx + b, where
m is the slope (Gradient) and
b is y-intercept (where the line crosses the Y axis).Where,
m is the slope (Gradient) and
b is y-intercept (Bias).In calculation of mean and y-intercept, we use some of the mathematical concepts explained below:
This term is used to describe properties of statistical distributions. It is determined by adding all the data points in a population and then dividing the total by the number of points. The resulting number is known as the mean or the average.
x̄ = Sum of observations / number of observations
σ2) is a measurement of the spread between numbers in a data set. That is, it measures how far each number in the set is from the mean and therefore from every other number in the set.
Variance = n * sum of all(xi − x̄)2
xi = ith data point
x̄ = the mean of all data points
n = the number of data points
Covariance is a measure of how much two random variables vary together. It’s similar to variance, but where variance tells you how a single variable varies, co-variance tells you how two variables vary together.Square root of variance is called
Cov(X,Y) = Σ (E(X) - μ) * (E(Y) - ν) / (n - 1)
Xis a random variable
μis the expected value (the mean) of the random variable
νis the expected value (the mean) of the random variable
n= the number of items in the data set
Sy= Standard deviation of
The slope of the line is calculated as the change in y divided by change in x.
slope m = change in y / change in x
The y-intercept over bias shall be calculated using the formula
y = m(x - x1) + y1
These values are different from what was actually there in the training set and if we plot this (x, y) graph against the original graph, the straight line will be way off the original points in the graph. This may lead to error which is the difference of values between actual points and the points on the straight line. Ideally, we’d like to have a straight line where the error is minimized across all points.Error can be reduced using many mathematical ways. One of such method is "Least Square Regression"
Least Square Regression is a method which minimizes the error in such a way that the sum of all square error is minimized.
m = (Σ ((x - x̄) * (y - ȳ)) / Σ (x - x̄))2
m = r(Sy / Sx)
(and we get the y-interept)
b = ȳ - m * x̄
Sx is standard deviation of
Sy is standard deviation of
r is correlation between
m is slope
b is the y-intercept
This method is intended to reduce the sum square of all error values. The lower the error, lesser the overall deviation from the original point.
The cost function calculates the square of the error for each example in the dataset, sums it up and divides this value by the number of examples in the dataset (denoted by
m).This cost function helps in determining the best fit line.The cost function for two variables
θ1 denoted by
J and is given as follows.
Now, we have to make use of cost function to adjust our parameters
θ1 such that they result in the least cost function value. We make use of a technique called Gradient Descent to minimize the cost function.
This series is made possible by help from: