Harsimranjit Singh

Maximum Likelihood Estimation with Logistic Regression

In our previous article, we introduced logistic regression, a fundamental machine learning technique for binary classification. Logistic regression predicts the probability of a binary outcome from input features.
This article dives into the mathematical foundation behind it: maximum likelihood estimation.

Understanding Likelihood

Likelihood refers to the chance of observing a specific outcome or event given a particular model or set of parameter values.
A breakdown to understand it better:

  • Focus on a Specific Outcome: Unlike probability, which deals with the general chance of an event happening, likelihood asks how plausible a specific observed outcome is under an assumed model.
  • Model-Based: We use the model to calculate the likelihood of observing a specific data set, assuming particular values for the model's parameters.
  • Higher Likelihood: A higher likelihood means the parameter values are a better fit for explaining the data.

Example

Imagine you have a coin that might be biased, and you flip it 5 times, getting the sequence:
Heads, Tails, Heads, Heads, Tails (three heads, two tails). You want to estimate θ, the probability of getting heads.
1-> Suppose θ = 0.5:

  • The likelihood of getting the sequence is: L(0.5) = P(H) × P(T) × P(H) × P(H) × P(T) = 0.5 × 0.5 × 0.5 × 0.5 × 0.5 = 0.03125

2-> Suppose θ = 0.7:

  • Here P(H) = 0.7 and P(T) = 1 − 0.7 = 0.3, so the likelihood of the same sequence is: L(0.7) = P(H) × P(T) × P(H) × P(H) × P(T) = 0.7 × 0.3 × 0.7 × 0.7 × 0.3 ≈ 0.0309

Since L(0.7) < L(0.5), θ = 0.5 actually explains this particular sequence better than θ = 0.7. The value that explains it best is θ̂ = 3/5 = 0.6 (the observed fraction of heads), with L(0.6) = 0.6³ × 0.4² ≈ 0.0346. Finding that best value is exactly the idea behind maximum likelihood estimation, introduced below.
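As a quick sanity check, here is a minimal sketch of these calculations (the sequence and the candidate values of θ come from the example above):

import numpy as np

# Observed sequence: 1 = heads, 0 = tails
flips = np.array([1, 0, 1, 1, 0])

def likelihood(theta, flips):
    # Product of per-flip probabilities: theta for heads, 1 - theta for tails
    return np.prod(np.where(flips == 1, theta, 1 - theta))

for theta in [0.5, 0.6, 0.7]:
    print(f"L({theta}) = {likelihood(theta, flips):.5f}")
# L(0.5) = 0.03125, L(0.6) = 0.03456, L(0.7) = 0.03087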

Difference Between Likelihood and Probability

  • Probability: Focuses on the chance of future outcomes, with the model's parameters held fixed.
  • Likelihood: Focuses on how well different parameter values explain an outcome that has already been observed.

Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE) is a method for estimating the parameters of a statistical model. The goal is to find the parameter values that maximize the likelihood function, i.e., the values under which the observed data are most probable.

Step-by-Step

  1. Define the Likelihood Function: The likelihood function L(θ) represents the probability of observing the data as a function of the model parameters θ.

  2. Log-Likelihood Function: For mathematical convenience, we usually work with the log-likelihood function ℓ(θ), the natural logarithm of the likelihood:
    ℓ(θ) = log L(θ)
    Because the logarithm is monotonically increasing, maximizing ℓ(θ) is equivalent to maximizing L(θ).

  3. Maximize the Log-Likelihood: Find the parameter values that maximize the log-likelihood function. This involves taking the derivative of the log-likelihood with respect to the parameters and setting it to zero to solve for them, as the worked coin example after this list shows.
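To make step 3 concrete, here is the derivation for the coin example above. With k heads in n flips:

L(θ) = θ^k (1 − θ)^(n − k)
ℓ(θ) = log L(θ) = k log θ + (n − k) log(1 − θ)
dℓ/dθ = k/θ − (n − k)/(1 − θ) = 0  ⟹  θ̂ = k/n

For the sequence above, n = 5 and k = 3, so θ̂ = 3/5 = 0.6, matching the value that gave the highest likelihood in the example.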

MLE in Logistic Regression

Logistic regression models the probability of a binary outcome (success/failure) as a function of the input features:

p(xi) = P(yi = 1 | xi) = 1 / (1 + exp(−β·xi))

where xi is the vector of input features for observation i (including a constant 1 for the intercept), β is the vector of parameters to be estimated, and yi ∈ {0, 1} is the binary outcome.

Likelihood and Log-Likelihood:

Assuming the observations are independent, the likelihood of observing the given data under logistic regression is:

L(β) = ∏i p(xi)^yi × (1 − p(xi))^(1 − yi)

Taking the natural logarithm gives the log-likelihood:

ℓ(β) = Σi [ yi log p(xi) + (1 − yi) log(1 − p(xi)) ]

Deriving the MLE for Logistic Regression

To find the MLE for β, we need to maximize the log-likelihood function ℓ(β). This involves:

  1. Calculating the Gradient: Compute the derivative of the log-likelihood with respect to β (written out just after this list).
  2. Optimization: Use a numerical optimization algorithm (e.g., gradient descent or a quasi-Newton method such as BFGS) to find the parameter values that maximize the log-likelihood.
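Writing X for the matrix whose rows are the xi, y for the vector of outcomes, and p for the vector of predicted probabilities p(xi), the gradient of the log-likelihood takes a compact form:

∂ℓ/∂β = Σi (yi − p(xi)) xi = Xᵀ (y − p)

Setting this to zero has no closed-form solution, because p depends on β through the sigmoid, which is why the implementation below relies on a numerical optimizer.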

Practical Implementation

import numpy as np
import scipy.optimize as opt

# Toy data: the first column of ones provides the intercept term
X = np.array([[1, 2], [1, 3], [1, 4], [1, 5]])
y = np.array([0, 0, 1, 1])

# Sigmoid function: maps a linear score to a probability in (0, 1)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Negative log-likelihood: scipy.optimize minimizes, so we negate
# the log-likelihood in order to maximize it
def neg_log_likelihood(beta, X, y):
    p = sigmoid(np.dot(X, beta))
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0) for extreme beta
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta_init = np.zeros(X.shape[1])

# Minimizing the negative log-likelihood maximizes the log-likelihood
result = opt.minimize(neg_log_likelihood, beta_init, args=(X, y), method='BFGS')
beta_hat = result.x

print("Estimated parameters:", beta_hat)

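Once beta_hat is available, predicting with the fitted model reuses the same sigmoid. A minimal usage sketch, reusing sigmoid, X, and beta_hat from the block above:

# Predicted probabilities P(y = 1 | x) for the training points
probs = sigmoid(np.dot(X, beta_hat))
print("Predicted probabilities:", probs)
print("Predicted classes:", (probs >= 0.5).astype(int))

One caveat worth knowing: this tiny data set is perfectly separable (every y = 0 point has a smaller feature value than every y = 1 point), so the unregularized MLE is not finite; the optimizer simply stops once the gradient is small, at large coefficient values that push the predicted probabilities toward 0 and 1.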

Conclusion

Understanding the mathematical foundations of logistic regression and maximum likelihood estimation is essential for applying these techniques effectively in machine learning. By maximizing the likelihood function, logistic regression identifies the parameters β that best fit the observed data, enabling accurate predictions of binary outcomes from input features.
