DEV Community

Mugi Mugendi

Learn regression model

Introduction:

“Torture the data, and it will confess to anything.” – Ronald Coase

In machine learning, regression models are used to understand and predict continuous outcomes. Regression is a supervised learning technique that models the relationship between a dependent (target) variable and one or more independent variables. It is widely used in fields such as finance, marketing, and healthcare, and the choice of regression model depends on the nature of the data involved.

In this article, we will explore regression models in machine learning, their applications, and popular algorithms used for regression tasks.

Regression analysis

Regression analysis is a predictive modelling technique that models the relationship between a dependent (target) variable and one or more independent (predictor) variables. It helps us understand how the dependent variable changes as the independent variables change. For example, we can predict the number of ice creams sold (the target) from the temperature (the independent variable).

The primary goal of a regression model is to find a mathematical function that best fits the observed data points, allowing us to predict the value of the dependent variable for new inputs. In regression, the predicted output values are real numbers, so it suits problems such as predicting the price of a house or the trend in a stock price at a given time.

Types of regression models

Linear Regression
This regression technique finds a linear relationship between a dependent variable and the given independent variables. With a single independent variable, the linear regression model is written as:

y = mx + c + e

where y is the dependent variable, x is the independent variable, m is the slope of the line, c is the intercept, and e represents the error in the model.
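
To make the equation concrete, here is a minimal sketch that computes m and c by ordinary least squares with NumPy. The temperature and ice cream numbers are made up for illustration and are not taken from any dataset in this article.

```python
import numpy as np

# Made-up illustrative data (an assumption, not from the article):
# temperature in degrees Celsius and number of ice creams sold.
x = np.array([18.0, 21.0, 24.0, 27.0, 30.0, 33.0])
y = np.array([50.0, 63.0, 69.0, 80.0, 87.0, 98.0])

# Ordinary least squares for a single predictor:
#   m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   c = mean_y - m * mean_x
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

# e is the part of y the line does not explain: actual minus fitted value.
e = y - (m * x + c)

print(f"slope m = {m:.3f}, intercept c = {c:.3f}")
print("residuals e:", np.round(e, 3))
```

Because the model includes an intercept, the residuals of a least-squares fit sum to zero, which is a quick sanity check on the result.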

Training and evaluating linear regression

We start by splitting the dataset into training and test sets:

from sklearn.model_selection import train_test_split

# Split data 70%-30% into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

print('Training Set: %d rows\nTest Set: %d rows' % (X_train.shape[0], X_test.shape[0]))
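
The split above assumes that a feature matrix X and a target vector y already exist. In the notebook they come from its dataset; the sketch below uses synthetic data as a stand-in (the shapes, coefficients, and noise level are assumptions chosen only for illustration) so the split can be run on its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the article's dataset):
# 100 rows, 2 features, and a target that is linear in the features plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(0, 1, size=100)

# Same call as in the article: hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

print('Training Set: %d rows\nTest Set: %d rows' % (X_train.shape[0], X_test.shape[0]))
```

Fixing random_state makes the split reproducible, so repeated runs evaluate the model on the same held-out rows.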

We then fit the model to the training set:

# Train the model
from sklearn.linear_model import LinearRegression

# Fit a linear regression model on the training set
model = LinearRegression().fit(X_train, y_train)
print(model)
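
A fitted LinearRegression exposes the learned slope and intercept through its coef_ and intercept_ attributes, which correspond to m and c in the equation above. The following is a minimal sketch on synthetic single-feature data; the true slope of 2 and intercept of 1 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic single-feature data (an assumption for illustration):
# y = 2x + 1 plus a small amount of noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.1, size=50)

model = LinearRegression().fit(X, y)

# coef_ holds one slope per feature; intercept_ is the constant term.
m = model.coef_[0]
c = model.intercept_
print(f"m = {m:.3f}, c = {c:.3f}")
```

With more than one feature, coef_ contains one coefficient per column of X, in column order.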

Predict

import numpy as np

# Predict on the held-out test set
predictions = model.predict(X_test)
np.set_printoptions(suppress=True)  # print floats without scientific notation
print('Predicted labels: ', np.round(predictions)[:10])
print('Actual labels   : ', y_test[:10])

Evaluate

from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, predictions)
print("MSE:", mse)

rmse = np.sqrt(mse)
print("RMSE:", rmse)

r2 = r2_score(y_test, predictions)
print("R2:", r2)

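
To show what these three metrics measure, here is a small sketch that computes them by hand and compares the results with scikit-learn. The arrays y_true and y_pred are made-up values standing in for y_test and predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up values standing in for y_test and predictions (an assumption).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

# MSE: the average of the squared prediction errors.
mse = np.mean((y_true - y_pred) ** 2)

# RMSE: square root of MSE, expressed in the same units as the target.
rmse = np.sqrt(mse)

# R2: 1 minus (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("MSE:", mse, "| sklearn:", mean_squared_error(y_true, y_pred))
print("RMSE:", rmse)
print("R2:", r2, "| sklearn:", r2_score(y_true, y_pred))
```

Lower MSE and RMSE mean predictions are closer to the actual values, while an R2 closer to 1 means the model explains more of the variance in the target.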

Here's an example notebook

In this notebook, we'll focus on regression, using an example based on a real study in which data for a bicycle sharing scheme was collected and used to predict the number of rentals based on seasonality and weather conditions. We'll use a simplified version of the dataset from that study.
