Bayesian Statistics 101

#statistics #datascience #machinelearning #beginners

This article is for beginners in statistics, or for anyone who has heard something about "Bayes" but never really knew what it means. Welcome aboard, my friends.

A bit of history

One of the most important and exciting developments in statistics in recent years has been the rise of Bayesian statistics. Unlike traditional frequentist statistics, which relies on hypothesis testing and p-values, Bayesian statistics offers a flexible and intuitive approach to data analysis that can help researchers better understand the uncertainty inherent in their data.

Bayesian statistics is a way to update our beliefs about something as we gather more information. The method is named after Thomas Bayes, who in the 18th century first proposed a way to update probabilities.

What's going on

The basic idea behind Bayesian statistics is to start with a prior belief or probability about something, and then update that belief based on new data using Bayes' theorem. Bayes' theorem states that the probability of a hypothesis (what we believe about something) given the data (what we observe) is proportional to the probability of the hypothesis multiplied by the probability of the data given the hypothesis.

In mathematical terms, this can be written as:

P(hypothesis | data) = P(hypothesis) x P(data | hypothesis) / P(data).
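
To make the formula concrete, here is a minimal Python sketch of it. The function name `posterior` and the numbers passed to it are made up for illustration; they are not from any library or dataset.

```python
def posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Return P(hypothesis | data) using Bayes' theorem.

    prior      -- P(hypothesis)
    likelihood -- P(data | hypothesis)
    evidence   -- P(data), must be greater than 0
    """
    if not 0.0 < evidence <= 1.0:
        raise ValueError("evidence must be a probability greater than 0")
    return prior * likelihood / evidence


# Hypothetical numbers, chosen only to show the call.
belief = posterior(prior=0.3, likelihood=0.8, evidence=0.5)
print(belief)
```

Note that the evidence P(data) only rescales the result so that the posterior probabilities of all competing hypotheses sum to 1.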

An example

Imagine you have just been tested for a rare disease, and the result came back positive, suggesting that you have the disease.
Suppose the prior probability of having the disease is very small: 0.1%. The probability of a positive test given that you have the disease is high: 99%. However, the probability of a positive test given that you do not have the disease is not negligible either. This is the false positive rate of the test, and here it is 2%.

What are your real chances of having the disease?

In our example, this means:

P(sick) = 0.001
P(healthy) = 1 - P(sick) = 0.999
P(positive test | sick) = 0.99
P(positive test | healthy) = 0.02
P(sick | positive test) = P(sick) x P(positive test | sick) / P(positive test)

First, let's compute the total probability of testing positive.

P(positive test) = P(positive test | sick) x P(sick) + P(positive test | healthy) x P(healthy)
= 0.99 x 0.001 + 0.02 x 0.999 = 0.00099 + 0.01998 = 0.02097 ≈ 0.021
Plugging this into Bayes' formula gives
P(sick | positive test) = 0.001 x 0.99 / 0.02097 ≈ 0.047

As you can see, even after a positive test you still have roughly a 95.3% chance of being healthy, so you should consider getting tested a second time.
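
The calculation above can be reproduced in a few lines of Python. The sketch below also shows what a second positive test would do to your belief, under the assumption (not stated in the example) that the two test results are independent given your true health status. The function name `posterior_sick` is just an illustrative choice.

```python
def posterior_sick(prior_sick: float, sensitivity: float,
                   false_positive_rate: float) -> float:
    """P(sick | positive test) for a single positive result."""
    # Total probability of a positive test (sick or healthy).
    p_positive = (sensitivity * prior_sick
                  + false_positive_rate * (1.0 - prior_sick))
    return sensitivity * prior_sick / p_positive


# Numbers from the example: 0.1% prevalence, 99% sensitivity, 2% false positives.
after_first = posterior_sick(0.001, 0.99, 0.02)
print(f"{after_first:.3f}")   # prints 0.047

# Assumption: the second test is independent of the first given health status,
# so yesterday's posterior becomes today's prior.
after_second = posterior_sick(after_first, 0.99, 0.02)
print(f"{after_second:.3f}")  # prints 0.710
```

This is the "update as we gather more information" idea in action: one positive result moves you from 0.1% to about 4.7%, and a second one (under the independence assumption) to about 71%.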

Applications

Bayesian statistics has many applications in machine learning, including:

Bayesian Optimization: This technique is used to optimize hyperparameters of machine learning models. It involves constructing a probabilistic model of the objective function and using Bayesian inference to find the best hyperparameters.
Bayesian Regression: This technique is used to model the relationship between input variables and output variables in a regression problem. It involves specifying a prior distribution over the model parameters and using Bayesian inference to obtain the posterior distribution.
Bayesian Neural Networks: These are neural networks with a prior distribution over the network weights. The posterior distribution is obtained using Bayesian inference, which allows for uncertainty quantification in the predictions.
Bayesian Decision Theory: This is a framework for decision making under uncertainty. It involves specifying a prior distribution over the possible states of the world and using Bayesian inference to obtain the posterior distribution. This posterior distribution can then be used to make decisions that minimize expected loss.
Bayesian Networks: These are probabilistic graphical models that represent the dependencies between variables. Bayesian networks can be used for classification, prediction, and decision making.
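
To make one of these items concrete, here is a minimal sketch of Bayesian regression with a single slope parameter and no intercept. It assumes a zero-mean Gaussian prior on the slope and Gaussian noise with a known precision, so the posterior over the slope is also Gaussian and has a closed form. The data points, `prior_precision`, and `noise_precision` are made-up values for illustration only.

```python
import math


def slope_posterior(xs, ys, prior_precision, noise_precision):
    """Posterior mean and standard deviation of w in the model y = w * x + noise.

    Assumes w ~ Normal(0, 1 / prior_precision) and
    noise ~ Normal(0, 1 / noise_precision) with known noise_precision.
    """
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Precisions add: prior information plus information from the data.
    post_precision = prior_precision + noise_precision * sum_xx
    post_mean = noise_precision * sum_xy / post_precision
    post_std = 1.0 / math.sqrt(post_precision)
    return post_mean, post_std


# Hypothetical data that roughly follows y = 2 * x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]

mean, std = slope_posterior(xs, ys, prior_precision=1.0, noise_precision=4.0)
print(f"slope = {mean:.2f} +/- {std:.2f}")
```

Unlike ordinary least squares, the result is a whole distribution over the slope, so the standard deviation tells you how uncertain the estimate still is.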

Real-world applications of Bayesian statistics in machine learning include image recognition, speech recognition, natural language processing, fraud detection, and recommendation systems. For example, Bayesian optimization can be used to optimize the hyperparameters of a convolutional neural network for image recognition tasks. Bayesian regression can be used to predict customer churn in a subscription-based service. Bayesian decision theory can be used to decide whether to approve or reject a loan application based on the applicant's credit score and income.
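
The loan decision can be sketched as an expected-loss calculation. The code below assumes you already have a posterior probability that the applicant will default (in practice it would come from a model using credit score and income), and the loss values in `LOSS` are invented for illustration.

```python
# Hypothetical losses: approving a loan that defaults is ten times worse
# than rejecting an applicant who would have repaid.
LOSS = {
    ("approve", "default"): 10.0,
    ("approve", "repay"): 0.0,
    ("reject", "default"): 0.0,
    ("reject", "repay"): 1.0,
}


def expected_loss(action: str, p_default: float) -> float:
    """Average loss of an action, weighted by the posterior over outcomes."""
    return (p_default * LOSS[(action, "default")]
            + (1.0 - p_default) * LOSS[(action, "repay")])


def best_action(p_default: float) -> str:
    """Pick the action with the smallest expected loss."""
    return min(("approve", "reject"),
               key=lambda action: expected_loss(action, p_default))


print(best_action(0.05))  # prints approve
print(best_action(0.20))  # prints reject
```

With these made-up losses the decision flips at a default probability of 1/11, about 9%. Changing the loss table changes the threshold, which is exactly the point of separating beliefs (the posterior) from preferences (the losses).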