Classifiers are special functions or algorithms that are used to assign labels to points in a dataset.
For example, in identifying credit card fraud, the classifier would be an algorithm that takes payments as input and labels each one as fraudulent or not fraudulent, based on a set of features.
Naive bayes classifiers are a set of probabilistic classifying algorithms based on Bayes' Theorem.
To understand this theorem, let's briefly discuss conditional probability.
Conditional probability is well demonstrated in the context of a deck of cards. A deck has 52 cards, with 13 cards in each suit, and you draw a diamond. You then go to draw another card, hoping that it will also be a diamond. Since you have already drawn one diamond, you have 12 diamonds left, and 51 cards in total, giving you a 12/51 chance of drawing another diamond.
So we can summarise this as: Given your first card is a diamond, the probability of your second card being a diamond is 12/51.
We can define this statement as:

P(second card is a diamond | first card is a diamond) = 12/51

where P(A|B) is read as "the probability of event A, given event B".
And this is conditional probability, where the chance of one event occurring is dependent on another event or state.
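The card example can be checked numerically. A minimal sketch using Python's fractions module, so the probabilities stay exact:

```python
from fractions import Fraction

# P(second card is a diamond | first card is a diamond):
# after drawing one diamond, 12 diamonds remain out of 51 cards.
p_second_diamond = Fraction(12, 51)

# The same result via the conditional-probability formula
# P(A|B) = P(A AND B) / P(B):
p_first_diamond = Fraction(13, 52)                      # P(B)
p_both_diamonds = Fraction(13, 52) * Fraction(12, 51)   # P(A AND B)
print(p_both_diamonds / p_first_diamond)                # Fraction(4, 17), i.e. 12/51
```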
Moving on to Bayes' Theorem, this is a way to go from a known P(B|A) to P(A|B).
This theorem is stated as:

P(A|B) = [P(B|A) * P(A)] / P(B)
But how did we get here? Let's look at some maths:
P(B|A) = P(A AND B) / P(A) --- (1)
P(A|B) = P(A AND B) / P(B) --- (2)

From (1) and (2):

P(A AND B) = P(B|A) * P(A) = P(A|B) * P(B)

=> P(A|B) = [P(B|A) * P(A)] / P(B)
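The derivation can be tried out on the credit card fraud example from earlier. All the numbers below are made up purely for illustration:

```python
# Illustrative (made-up) numbers for the credit card fraud example.
p_fraud = 0.01             # P(A): prior probability a payment is fraudulent
p_flag_given_fraud = 0.90  # P(B|A): a fraudulent payment gets flagged
p_flag_given_legit = 0.05  # P(B|not A): a legitimate payment gets flagged

# P(B) via the law of total probability.
p_flag = (p_flag_given_fraud * p_fraud
          + p_flag_given_legit * (1 - p_fraud))

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_fraud_given_flag = p_flag_given_fraud * p_fraud / p_flag
print(round(p_fraud_given_flag, 3))  # -> 0.154
```

Note how the answer differs sharply from P(B|A): even with a 90% detection rate, most flagged payments are still legitimate because fraud itself is rare.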
So we now understand the basis of Bayes' Theorem, and how to calculate the probability of event A, given B. However, in real-life situations we have multiple B's, i.e. a range of different events or features determine the event or label A.
So, we can extend Bayes' Theorem to multiple features. It is considered 'naive' because we are assuming the events are all independent of each other (in the real world, that's rare).
Naive Bayes becomes:
P(A|b1..bn) = [P(b1|A) * P(b2|A) * ... * P(bn|A) * P(A)] / [P(b1) * P(b2) * ... * P(bn)]
Where b1..bn represent the different events (the features).
From this you can see we are making the assumption that all events are independent and have an equal effect on the outcome.
The method above is just one type of naive bayes classifier, and it works for discrete data.
Another type, Gaussian naive bayes, assumes each feature follows a gaussian distribution (normal distribution) and uses a different formula for the likelihood P(bi|A):

P(bi|A) = (1 / sqrt(2 * pi * sigma^2)) * exp(-((bi - mu)^2) / (2 * sigma^2))

where mu and sigma are the mean and standard deviation of feature bi within class A.
This method works for continuous data and is one of the simplest classifiers to understand and use.
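A minimal sketch of the Gaussian piece: for a continuous feature, the per-class likelihood is evaluated with the normal probability density function, given that class's mean and standard deviation (the example numbers are made up):

```python
import math

def gaussian_pdf(x, mean, std):
    """Normal density: P(feature = x | class with this mean and std)."""
    coeff = 1.0 / (std * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2 * std ** 2))

# e.g. payment amounts for a hypothetical "legit" class
# with mean 40 and standard deviation 10:
print(gaussian_pdf(40.0, mean=40.0, std=10.0))  # density at the mean, ~0.0399
print(gaussian_pdf(95.0, mean=40.0, std=10.0))  # far from the mean: tiny density
```

These densities then replace the discrete P(bi|A) counts in the naive bayes product; everything else stays the same.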
In later posts I will go into other classifiers and examples of using them.