Discussion on: Machine Learning Classification vs Regression

View post

Regression aims to predict a continuous output value. For example, say that you are trying to predict the revenue of a certain brand as a function of many input parameters. A regression model would literally be a function which can output potentially any revenue number based on certain inputs. It could even output revenue numbers which never appeared anywhere in your training set.

Classification aims to predict which class (a discrete integer or categorical label) the input corresponds to. e.g. let us say that you had divided the sales into Low and High sales, and you were trying to build a model which could predict Low or High sales (binary/two-class classication). The inputs might even be the same as before, but the output would be different. In the case of classification, your model would output either "Low" or "High," and in theory every input would generate only one of these two responses.

The above description is is true for any data science method; but, there is a gray area: There are algorithms which predict probability, which is a continuous value, between 0 and 1. By the above definition, you can consider them regression algorithms (think of logistic regression). At the same time, this probability refers to classes, so they can be used for classification (just set a threshold for probability: everything with probability < 0.5 goes into one class, and with > 0.5 into the other). How you "classify" these algorithms is a philosophical question, of little practical importance.

net-informations.com/ds/iq/default...