Introduction:
"Torture the data, and it will confess to anything." – Ronald Coase
In the vast field of machine learning, regression models play a vital role in understanding and predicting continuous outcomes. Regression is a supervised learning technique that establishes the relationship between a dependent (target) variable and one or more independent variables. It is widely used in finance, marketing, healthcare, and other fields, and the choice of regression model depends on the nature of the data involved.
In this article, we will explore the concept of regression machine learning models, their applications, and popular algorithms used for regression tasks.
Regression analysis
Regression analysis is a predictive modelling technique that models the relationship between a dependent (target) variable and one or more independent (predictor) variables. It helps us understand how the dependent variable changes as the independent variables vary. For example, predicting the number of ice creams sold (target) from the temperature (independent variable).
The primary goal of regression models is to find a mathematical function that best fits the observed data points, allowing us to predict the value of the dependent variable. In regression, the predicted outputs are real numbers, so it suits problems such as predicting the price of a house or the trend in a stock's price at a given time.
Types of regression models
Linear Regression
This regression technique finds a linear relationship between a dependent variable and the given independent variables. The linear regression model is denoted by the equation below:
y=mx+c+e
where m is the slope of the line, c is an intercept, and e represents the error in the model.
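As a quick sanity check of the equation above, we can generate points from a known line, add a random error term, and recover m and c with NumPy's `polyfit`. This is a minimal sketch on synthetic data; the slope and intercept values chosen here are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# True parameters for y = m*x + c + e, with m=2.0 and c=1.0 (illustrative values)
m_true, c_true = 2.0, 1.0
x = np.linspace(0, 10, 100)
e = rng.normal(scale=0.5, size=x.shape)   # random error term e
y = m_true * x + c_true + e

# Fit a degree-1 polynomial: returns the estimated [m, c]
m_hat, c_hat = np.polyfit(x, y, deg=1)
print(f"estimated slope m = {m_hat:.2f}, intercept c = {c_hat:.2f}")
```

With enough points, the fitted slope and intercept land close to the true values; the gap that remains comes from the error term e.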
Training and evaluating a linear regression model
We start by splitting the dataset into training and test sets:
from sklearn.model_selection import train_test_split

# X (features) and y (target) are assumed to have been defined earlier
# Split data 70%-30% into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
print('Training Set: %d rows\nTest Set: %d rows' % (X_train.shape[0], X_test.shape[0]))
We then fit the model to the training data:
# Train the model
from sklearn.linear_model import LinearRegression
# Fit a linear regression model on the training set
model = LinearRegression().fit(X_train, y_train)
print(model)
Predict
import numpy as np

predictions = model.predict(X_test)
np.set_printoptions(suppress=True)
print('Predicted labels:', np.round(predictions)[:10])
print('Actual labels:   ', y_test[:10])
Evaluate
from sklearn.metrics import mean_squared_error, r2_score

# Mean squared error: average of the squared residuals
mse = mean_squared_error(y_test, predictions)
print("MSE:", mse)

# Root mean squared error: same units as the target variable
rmse = np.sqrt(mse)
print("RMSE:", rmse)

# R2: proportion of variance in the target explained by the model
r2 = r2_score(y_test, predictions)
print("R2:", r2)
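These metrics are simple enough to compute by hand, which is a useful way to confirm what the library calls return. A minimal sketch on a small hypothetical set of actual and predicted values (not taken from any real dataset):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.5])

# MSE: mean of the squared residuals
mse_manual = np.mean((y_true - y_pred) ** 2)

# R2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
r2_manual = 1 - ss_res / ss_tot

# The manual values match the sklearn helpers
assert np.isclose(mse_manual, mean_squared_error(y_true, y_pred))
assert np.isclose(r2_manual, r2_score(y_true, y_pred))
print("MSE:", mse_manual, "RMSE:", np.sqrt(mse_manual), "R2:", r2_manual)
```

Note that RMSE is just the square root of MSE, which puts the error back in the same units as the target.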
Here's an example notebook.
In this notebook, we focus on regression, using an example based on a real study in which data from a bicycle-sharing scheme were collected and used to predict the number of rentals from seasonality and weather conditions. We'll use a simplified version of the dataset from that study.
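Since the notebook and its dataset aren't reproduced here, the full split/fit/predict/evaluate workflow can be sketched end-to-end on synthetic data. The `temperature`-to-`rentals` relationship below is made up for illustration and stands in for the real bike-sharing data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the bike-sharing data:
# rentals rise roughly linearly with temperature, plus noise
temperature = rng.uniform(0, 35, size=200).reshape(-1, 1)
rentals = 30 * temperature.ravel() + 100 + rng.normal(scale=50, size=200)

# Same 70%-30% split as above
X_train, X_test, y_train, y_test = train_test_split(
    temperature, rentals, test_size=0.30, random_state=0)

# Fit on the training set, evaluate on the held-out test set
model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

print("RMSE:", np.sqrt(mean_squared_error(y_test, predictions)))
print("R2:", r2_score(y_test, predictions))
```

Because the synthetic relationship is strongly linear, the model explains most of the variance; real rental data with seasonality and weather effects would typically need more features and score lower.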