In the world of machine learning, understanding the core concepts of how models are trained is essential. One fundamental approach is supervised learning, where we provide the algorithm with a training set comprising input features and their corresponding output targets. The learning algorithm then processes this information and produces a function, which we'll denote as f.
Historically, this function f has been referred to as a hypothesis. The primary role of f is to take a new input, denoted as x, and output an estimate or prediction, which we'll call ŷ (y-hat).
Linear Regression: Fitting the Data
One common method within supervised learning is linear regression. The goal of linear regression is to determine the values for the parameters w (weights) and b (bias) so that the resulting straight line from the function f fits the data points well.
Here's a quick breakdown:
- f is the model or function.
- x is the input or input features.
- ŷ (y-hat) is the output of the model, representing the prediction or estimate for y.
The model's prediction, ŷ, is an estimate of the true value y. When we simply use the symbol y, we refer to the target value, which is the actual true value within the training set. It's crucial to note that ŷ is merely an estimate and might not always align perfectly with the actual true value y.
The Role of Parameters w and b
It's important to understand that w and b are numbers, and the values chosen for them determine the prediction ŷ based on the input feature x. When we write f_{w,b}(x), it means f is a function that takes x as input, and depending on the values of w and b, f will output some prediction ŷ. For simplicity, we'll often write f(x) without explicitly including w and b in the subscript, though it means the same thing as f_{w,b}(x).
In simpler terms, this function f uses the input feature x to produce ŷ, an estimate of the output y.
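To make the notation concrete, here is a minimal Python sketch of f_{w,b}. The parameter values below are arbitrary placeholders, not values learned from data:

```python
def f(x, w, b):
    """Linear model f_{w,b}(x) = w*x + b: returns the prediction y-hat."""
    return w * x + b

# With placeholder values w = 2 and b = 5, an input x = 3 gives y-hat = 11.
print(f(3, w=2, b=5))  # 11
```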
Application: House Sizes and Prices
The image above provides a clear illustration of this concept. It shows how a linear regression model predicts house prices based on their sizes. The graph demonstrates a positive correlation between house size (in square feet) and price (in thousands of dollars). The blue regression line indicates the relationship, highlighting how larger houses tend to have higher prices.
To illustrate these concepts, let's consider the example of predicting house prices based on house sizes:
- x represents the house size in square feet.
- y is the actual house price in thousands of dollars.
- ŷ (y-hat) is the predicted house price based on the input feature x.
For example, consider a house size of 1,250 square feet (x = 1250). The actual price of the house might be $220,000 (y = 220).
To make a prediction, we use the function f_{w,b}(x) = wx + b. This function computes ŷ (the predicted price) from the input x; during training, the computer chooses the parameters w and b that minimize the difference between y (actual price) and ŷ (predicted price) across the training data. For illustration, let's assume the model finds w = 0.15 and b = 100, so the prediction function becomes f(x) = 0.15x + 100. These values are hypothetical, chosen purely to illustrate the concept.
Applying this function:
For a house size of 1,250 square feet (x = 1250), the predicted price (ŷ) would be:
ŷ = 0.15(1250) + 100 = 187.5 + 100 = 287.5
This would mean the model predicts the house price to be $287,500.
Comparing this prediction (ŷ = 287.5) to the actual price (y = 220), we can see there's a difference. The goal of the linear regression model is to minimize such differences across all data points to make the predictions as accurate as possible.
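As a quick sanity check, here is the same calculation as a short Python sketch, using the hypothetical parameters w = 0.15 and b = 100 from above:

```python
def f(x, w=0.15, b=100):
    """Hypothetical prediction function f(x) = 0.15x + 100."""
    return w * x + b

x = 1250   # house size in square feet
y = 220    # actual price in thousands of dollars

y_hat = f(x)        # 287.5 -> the model predicts $287,500
error = y_hat - y   # 67.5  -> the model overestimates by $67,500
print(y_hat, error)
```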
In summary, the linear regression model uses the input feature (x), applies the learned parameters (w and b), and provides a prediction (ŷ) that estimates the true value (y).
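One standard way to quantify those differences across the whole training set is the squared-error cost used for linear regression, J(w, b) = (1/2m) Σ (f(x⁽ⁱ⁾) − y⁽ⁱ⁾)². Below is a minimal sketch; the four houses in the toy dataset are made up for illustration:

```python
def cost(xs, ys, w, b):
    """Squared-error cost J(w, b) averaged over m training examples."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical training set: sizes in square feet, prices in $1000s.
sizes  = [1000, 1250, 1500, 2000]
prices = [200,  220,  260,  330]

# Evaluate the cost at the hypothetical parameters used above.
print(cost(sizes, prices, w=0.15, b=100))  # ~2022.66
```

Training the model amounts to searching for the values of w and b that make this cost as small as possible.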
Why Use a Linear Function?
You might wonder: why use a straight-line (linear) function instead of a more complex curve, such as a parabola (a non-linear function)? Although non-linear functions can sometimes fit the data more closely, linear functions are simpler to work with and easier to interpret, which makes them a great starting point.
By using a linear function, we can gain a foundational understanding of the relationship between the input and output variables. Once this foundation is established, we can build on it and move to more complex non-linear models that might capture the intricacies of the data better. Essentially, mastering linear regression provides the stepping stones necessary to tackle more sophisticated machine learning models effectively.
This post is based on notes taken from Andrew Ng's Machine Learning Specialization course on Coursera.