
Silvia-nyawira


Week 2 project: Comparing linear regression and random forest regression models for Airbnb booking price prediction

Introduction

In statistics, **linear regression** is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as the dependent and independent variables, respectively). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.
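To make the distinction between one and several explanatory variables concrete, here is a minimal sketch using scikit-learn. It is not the code from the project: the data is synthetic, and the feature names `bedrooms` and `distance_km` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data (hypothetical, not the Airbnb dataset):
# price depends linearly on two made-up features plus random noise.
n = 200
bedrooms = rng.uniform(1, 5, size=n)
distance_km = rng.uniform(0, 10, size=n)
price = 40 + 25 * bedrooms - 3 * distance_km + rng.normal(0, 2, size=n)

# Simple linear regression: one explanatory variable.
simple = LinearRegression().fit(bedrooms.reshape(-1, 1), price)

# Multiple linear regression: two explanatory variables.
X = np.column_stack([bedrooms, distance_km])
multiple = LinearRegression().fit(X, price)

print("simple slope:", simple.coef_)
print("multiple coefficients:", multiple.coef_)
print("multiple intercept:", multiple.intercept_)
```

The fitted coefficients of the multiple model should land close to the values used to generate the data (25 and -3), which is what makes linear regression easy to interpret.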

  1. Linear regression can be used when the goal is:
  • To reduce error in prediction or forecasting on smaller data sets
  • To keep the model simple and straightforward to interpret
  • To explain variation in the response variable that can be attributed to variation in the explanatory variables
  • To quantify the strength of the relationship between the response and the explanatory variables
  2. **Random forest regression** Random forest regression is an ensemble learning method that trains many decision trees, each on a random bootstrap sample of the data and with a random subset of the features considered at each split, and then averages the trees' predictions. Because no single tree has to account for every variable, the method copes well with data sets that are large and/or have many variables, and averaging over many trees makes the final prediction less sensitive to noise in any one sample.

Random forest regression can be used when the goal is:

  • To capture complex non-linear relationships
  • To provide feature importance scores
  • To capture intricate patterns
  • To provide more stable and robust predictions when dealing with larger data sets
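The points above can be sketched with scikit-learn's `RandomForestRegressor`. This is an illustration under assumptions, not the project's code: the data is synthetic, and `size_m2` and `noise_feature` are hypothetical features, one with a non-linear effect on price and one with no effect at all.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data (hypothetical, not the Airbnb dataset):
# price follows a non-linear function of size_m2; noise_feature is irrelevant.
n = 500
size_m2 = rng.uniform(20, 120, size=n)
noise_feature = rng.uniform(0, 1, size=n)
price = 50 + 40 * np.sin(size_m2 / 15) + rng.normal(0, 3, size=n)

X = np.column_stack([size_m2, noise_feature])

# Fit a forest of 100 trees; random_state fixes the bootstrap samples.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, price)

# Impurity-based importance scores, one per feature, summing to 1.
importances = forest.feature_importances_
for name, score in zip(["size_m2", "noise_feature"], importances):
    print(f"{name}: {score:.3f}")
```

The informative feature should receive most of the importance, which is how these scores help identify which variables drive the prediction.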

To decide between them, I tested both models on the same dataset. Based on the output, random forest regression was the better fit, since it had the lower mean squared error (MSE).
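A comparison of this kind might look like the sketch below. It assumes scikit-learn and uses synthetic, deliberately non-linear data rather than the Airbnb dataset, so the numbers it prints are not the project's results; the split sizes and model settings are also assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic non-linear data (hypothetical stand-in for the real features).
n = 800
X = rng.uniform(-3, 3, size=(n, 2))
y = 50 + 30 * np.sin(2 * X[:, 0]) + 10 * X[:, 1] ** 2 + rng.normal(0, 2, size=n)

# Both models are trained and scored on the same split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Mean squared error on the held-out test set for each model.
mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse[name] = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: MSE = {mse[name]:.2f}")

# The model with the lower test MSE is the better fit on this data.
best = min(mse, key=mse.get)
print("best model:", best)
```

Scoring both models on the same held-out split keeps the comparison fair; on data where the relationship is close to linear, the ranking could come out the other way.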
Here's a link to the project: Airbnbs Price Prediction.
