Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable, although many more complex extensions exist.
— Wikipedia.
— All the images (plots) are generated and modified by Author.
For most data practitioners, Linear Regression is the starting point in Machine Learning: it is where you learn to predict a continuous value from a given set of independent variables.
Why Logistic, not Linear?
Let us start with the basics. In binary classification, the model should predict the dependent variable as one of two possible classes, 0 or 1. A Linear Regression model can produce a prediction for a given set of input features, but it outputs unbounded continuous values such as 0.03, +1.2, or -0.9. Such values can neither be assigned directly to one of the two classes nor be read as the probability of belonging to a class.
For example, when we have to predict whether a website is malicious using the length of its URL as a feature, the response variable takes two values: benign and malicious.
If we try to fit a Linear Regression model to a binary classification problem, the fit is a straight line. Because that line extends below 0 and above 1, it cannot represent a class probability, which is why it is not suitable here.
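To make this concrete, here is a minimal sketch that fits a straight line to a toy version of the URL-length example. The data values are made up for illustration, and `np.polyfit` stands in for any least-squares linear regression.

```python
import numpy as np

# Hypothetical toy data: URL lengths and labels (0 = benign, 1 = malicious).
url_length = np.array([10, 15, 20, 25, 60, 70, 80, 90], dtype=float)
is_malicious = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Ordinary least-squares straight line: prediction = slope * x + intercept.
slope, intercept = np.polyfit(url_length, is_malicious, deg=1)

def linear_predict(x):
    """Prediction of the fitted straight line for URL length x."""
    return slope * x + intercept

# Predictions for a very short and a very long URL fall outside [0, 1],
# so they cannot be read as probabilities.
short_url_pred = linear_predict(2.0)
long_url_pred = linear_predict(150.0)
print(short_url_pred, long_url_pred)
```

The two printed values land on either side of the valid probability range, which is the problem the sigmoid function addresses.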
To overcome this problem, we use a sigmoid function, which fits an S-shaped curve, bounded between 0 and 1, to the data.
Logistic/Sigmoid Function
Logistic Regression can be explained with the logistic function, also known as the sigmoid function, which takes any real input x and outputs a value between 0 and 1 that can be read as a probability. It is defined as,
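In standard notation, using σ as an assumed name for the function, the definition reads:

```latex
\sigma(x) = \frac{1}{1 + e^{-x}} = \frac{e^{x}}{e^{x} + 1}
```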
The model fit using the above Logistic function can be seen as below:
Further, let t be a linear function of a single independent variable x, as in a univariate regression model, where β0 is the intercept and β1 is the slope. It is given by,
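With the intercept β0 and slope β1 named as in the text, and x as the single input variable, the linear function takes the usual form:

```latex
t = \beta_0 + \beta_1 x
```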
The general logistic function p, which outputs a value between 0 and 1, then becomes,
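Substituting t = β0 + β1x into the standard logistic function gives, in the same notation:

```latex
p(x) = \sigma(t) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
```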
We can see that data separable into two classes can be modelled by applying a logistic function to a linear function of the given variable. However, because the relation between the input variable x and the output probability is given by the sigmoid function, it cannot be interpreted easily. We therefore introduce the logit (log-odds) function, which makes this model interpretable in a linear fashion.
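As a rough sketch of the model so far, the snippet below computes p(x) by passing a linear function through the sigmoid. The coefficient values are hypothetical and chosen only so that the numbers are easy to follow.

```python
import math

def sigmoid(t):
    """Standard logistic function: maps any real t into the range 0..1."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    # For negative t, use the equivalent form to avoid overflow in exp().
    z = math.exp(t)
    return z / (1.0 + z)

def predict_probability(x, beta0, beta1):
    """p(x) = sigmoid(beta0 + beta1 * x) for a single feature x."""
    return sigmoid(beta0 + beta1 * x)

# Hypothetical coefficients, for illustration only.
BETA0, BETA1 = -4.0, 0.1

for x in (0, 40, 100):
    print(x, round(predict_probability(x, BETA0, BETA1), 3))
```

Whatever value the linear part takes, the output stays between 0 and 1, and it crosses 0.5 exactly where the linear part equals zero.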
Logit (Log-Odds) Function
The log-odds function, i.e. the natural logarithm of the odds, is the inverse of the standard logistic function. It can be defined and further simplified as,
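Written out for a probability p(x) modelled by a linear function β0 + β1x, the standard definition and its simplification are:

```latex
g(p(x)) = \sigma^{-1}(p(x)) = \ln\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x
```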
In the above equation, the terms are as follows:
g is the logit function. The equation for g(p(x)) shows that the logit is equivalent to the linear regression expression
ln denotes the natural logarithm
p(x) is the probability that the dependent variable falls in the positive class (1) rather than the other class (0), given some linear combination of the predictors
β0 is the intercept from the linear regression equation
β1x is the regression coefficient β1 multiplied by some value x of the predictor
On further simplifying the above equation and exponentiating both sides, we can deduce the relationship between the probability and the linear model as,
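Exponentiating both sides of the logit equation ln(p(x) / (1 - p(x))) = β0 + β1x yields:

```latex
\frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x}
```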
The left-hand term is called the odds; it equals the exponential of the linear regression expression. Taking ln (log base e) of both sides shows that the log-odds is linear in the independent variable x.
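The relationships between probability, odds, and log-odds can be checked numerically. The sketch below uses hypothetical coefficients and an arbitrary input value. It confirms that the logit recovers the linear expression and that a one-unit increase in x adds β1 to the log-odds.

```python
import math

def sigmoid(t):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-t))

def odds(p):
    """Odds of the positive class: p / (1 - p)."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds: natural logarithm of the odds."""
    return math.log(odds(p))

# Hypothetical coefficients and input, for illustration only.
BETA0, BETA1 = -4.0, 0.1
x = 55.0

linear = BETA0 + BETA1 * x
p = sigmoid(linear)

# The logit undoes the sigmoid, and the odds equal exp(linear).
print(logit(p), linear)
print(odds(p), math.exp(linear))

# One extra unit of x shifts the log-odds by BETA1.
p_next = sigmoid(BETA0 + BETA1 * (x + 1.0))
log_odds_shift = logit(p_next) - logit(p)
print(log_odds_shift)
```

Equivalently, each unit increase in x multiplies the odds by exp(β1), which is the usual way a fitted coefficient is read.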
Why Regression?
The change in the probability p(x) for a given change in x cannot be read off directly, because it is governed by the sigmoid function. With the above expression, however, we can see that the log-odds changes linearly with x: each unit increase in x adds β1 to the log-odds. The plot of the log-odds against the linear equation can be seen as,
The linear regression expression can take any value from negative to positive infinity, yet after transformation with the sigmoid function the resulting probability p(x) lies between 0 and 1, i.e. 0 < p < 1. This is what makes Logistic Regression a classification algorithm despite being a regression: it models the log-odds with a regression and then assigns the result to a particular class depending on the decision boundary.
Decision Boundary
The decision boundary is a threshold value that lets us assign the probability predicted by the sigmoid function to a particular class, positive or negative. A common choice of threshold is 0.5.
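A minimal sketch of thresholding, again with hypothetical coefficients, might look like this. The default threshold of 0.5 is a common convention, not something fixed by the model.

```python
import math

def sigmoid(t):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-t))

def classify(x, beta0, beta1, threshold=0.5):
    """Return 1 (positive) if p(x) >= threshold, otherwise 0 (negative)."""
    p = sigmoid(beta0 + beta1 * x)
    return 1 if p >= threshold else 0

# Hypothetical coefficients, for illustration only.
BETA0, BETA1 = -4.0, 0.1

# With a threshold of 0.5 the boundary sits where beta0 + beta1 * x = 0.
boundary_x = -BETA0 / BETA1

labels = [classify(x, BETA0, BETA1) for x in (10, 39, 41, 90)]
print(boundary_x, labels)
```

Raising the threshold makes the model more conservative about predicting the positive class, which moves the boundary along x without changing the fitted coefficients.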
Linear Decision Boundary
When two or more classes are linearly separable,
Non-Linear Boundary
When two or more classes are not linearly separable,
Multi-Class Classification
The basic intuition behind Multi-Class and Binary Logistic Regression is the same. However, for a multi-class classification problem, we follow a ***one-vs-all classification*** approach. If there are multiple independent variables for the model, the traditional equation is modified as,
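In the usual notation, with x_1, ..., x_m as assumed names for the m independent variables and p as the probability of the positive class:

```latex
\ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m
```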
Here, the log-odds is linearly related to the m independent variables, as the linear regression becomes a multiple regression with m explanatory variables.
For example, if we have to predict whether the weather is sunny, rainy, or windy, we are dealing with a multi-class problem. We turn this problem into three binary classification problems: whether it is sunny or not, whether it is rainy or not, and whether it is windy or not. We run all three classifiers independently on the input features, and the class whose predicted probability is the highest becomes the prediction.
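The weather example can be sketched as three independent binary models followed by an arg-max. Everything specific here is hypothetical: the two features (humidity and wind speed, both scaled to 0..1), the coefficient values, and the helper name `one_vs_rest_predict`. In practice the coefficients would come from fitting each binary classifier to data.

```python
import math

def sigmoid(t):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-t))

# One hypothetical binary classifier per class: (intercept, weights).
# Features are [humidity, wind_speed]; the values are made up for illustration.
CLASSIFIERS = {
    "sunny": (2.0, [-4.0, -2.0]),
    "rainy": (-3.0, [6.0, 0.5]),
    "windy": (-3.0, [0.5, 6.0]),
}

def one_vs_rest_predict(features):
    """Score each 'class or not' model and return the most probable class."""
    scores = {}
    for label, (intercept, weights) in CLASSIFIERS.items():
        linear = intercept + sum(w * f for w, f in zip(weights, features))
        scores[label] = sigmoid(linear)
    best_label = max(scores, key=scores.get)
    return best_label, scores

label, scores = one_vs_rest_predict([0.9, 0.1])  # humid, calm day
print(label, scores)
```

Because the three models are fitted independently, their probabilities need not sum to 1; only their relative order matters for the final choice.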
Conclusion
Logistic regression is one of the simplest Machine Learning models. It is easy to understand, interpretable, and can give good results. Every practitioner using Logistic Regression should understand the log-odds, which is the main concept behind this learning algorithm. The model is highly interpretable from a business perspective, because it explains how each independent variable contributes to the prediction. This post aimed to provide an easy way to understand the idea behind the regression and the transparency that Logistic Regression provides.
Thanks for reading. You can find my other Machine Learning related posts here.
I hope this post has been useful. I appreciate feedback and constructive criticism. If you want to talk about this article or other related topics, you can drop me a text here or at LinkedIn.