Hello Scientist
Before we start: this article is a follow-up to the previous one about Sets in Math and Python.
To understand today's post you should already know the basics of sets; if you don't, you can go and read What Is Set In Math And Python first.
Ok? Are you ready??
Our first example is detecting Corona in people.
- Let's call all the people that we chose as a test sample -> X
- And the set of people among X who are sick (have Corona) -> S
- The set of people among X who are healthy (don't have Corona) -> H
To represent S in mathematics we write:
S={x ∈ X: x has Corona}
The small x means one element of the set X (a member of X).
So the formula reads: S is the set of every x (a person from the test sample) that is a member of X (the set of all the people in the test sample) such that x has Corona.
And to represent H in mathematics we write:
H={x ∈ X: x doesn't have Corona}
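Since the previous post covered sets in Python, here is a small sketch of how S and H could be built with set comprehensions, which read almost exactly like the set-builder notation above. The names and who is sick are made up for illustration.

```python
# Hypothetical test sample: each person maps to whether they really have Corona.
# Names and health statuses are invented for this example.
has_corona = {"Sara": True, "Adam": True, "Lina": False, "Omar": False}

X = set(has_corona)  # the whole test sample (the dict's keys)

# S = {x ∈ X : x has Corona}
S = {x for x in X if has_corona[x]}

# H = {x ∈ X : x doesn't have Corona}
H = {x for x in X if not has_corona[x]}

print(sorted(S))  # ['Adam', 'Sara']
print(sorted(H))  # ['Lina', 'Omar']
```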
Now, is it possible that one of the X people has Corona and doesn't have Corona at the same time?
Of course, this is impossible!
So we know that S ∩ H = ∅ and S ∪ H = X (every person in the test sample is either sick or healthy, so if we put all the sick and healthy people together we get the whole sample).
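A quick check in Python with a made-up sample: `&` is intersection, `|` is union, and `len()` gives the size of a set, which is what the bars in |S| mean in math.

```python
# Hypothetical sample, split into sick (S) and healthy (H) people
X = {"Sara", "Adam", "Lina", "Omar"}
S = {"Sara", "Adam"}
H = {"Lina", "Omar"}

# Nobody is both sick and healthy: S ∩ H = ∅
print(S & H == set())             # True

# Sick and healthy people together make up the whole sample: S ∪ H = X
print(S | H == X)                 # True

# Because the sets don't overlap, their sizes add up: |S| + |H| = |X|
print(len(S) + len(H) == len(X))  # True
```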
Now let's add our Corona test predictions, because we want to know how accurate our PCR test is.
- Let the people from the sample who tested positive -> P, and we represent them as P = {x ∈ X : x tested positive for Corona}
Note that positive here means that the PCR test predicts that someone has Corona.
- Let the people from the sample who tested negative -> N, and we represent them as N = {x ∈ X : x tested negative for Corona}
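P and N can be built the same way from test results. The results below are hypothetical; `True` stands for a positive PCR test.

```python
# Hypothetical PCR results: True means the test came back positive
tested_positive = {"Sara": True, "Adam": False, "Lina": True, "Omar": False}

X = set(tested_positive)

# P = {x ∈ X : x tested positive for Corona}
P = {x for x in X if tested_positive[x]}

# N = {x ∈ X : x tested negative for Corona}
N = {x for x in X if not tested_positive[x]}

print(sorted(P))  # ['Lina', 'Sara']
print(sorted(N))  # ['Adam', 'Omar']
```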
Now we have 4 possibilities:
- someone who has Corona and his test for Corona is positive.
- someone who does not have Corona and his test for Corona is negative.
- someone who has Corona but his test for Corona is negative.
- someone who does not have Corona but his test for Corona is positive.
Together, these four possibilities make up the confusion matrix.
Confusion Matrix:
A confusion matrix is a technique for summarizing the performance of a classification algorithm.
Calculating a confusion matrix can give you a better idea of what your classification model is getting right and what types of errors it is making.
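As a rough sketch, assuming we already have the four sets S, H, P and N, the four possibilities listed above can be counted with set intersections. `confusion_counts` is a helper name made up for this example, and the data is hypothetical.

```python
def confusion_counts(S, H, P, N):
    """Count the four outcomes using this article's set definitions.

    S: people who have Corona, H: healthy people,
    P: people who tested positive, N: people who tested negative.
    """
    return {
        "TP": len(S & P),  # sick and tested positive
        "TN": len(H & N),  # healthy and tested negative
        "FP": len(H & P),  # healthy but tested positive
        "FN": len(S & N),  # sick but tested negative
    }

# Hypothetical sample of six people
S = {"Sara", "Adam", "Lina"}
H = {"Omar", "Mona", "Ali"}
P = {"Sara", "Adam", "Omar"}
N = {"Lina", "Mona", "Ali"}

print(confusion_counts(S, H, P, N))  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```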
Now let's discuss the 4 possibilities:
Someone who has Corona and his test for Corona is positive.
The representation of this possibility is: S ∩ P
because S is the set of people who have Corona and P is the set of people who tested positive.
Someone who has Corona and tested positive must be in both sets (S and P) at once, which means the intersection.
Say we have a girl called Sara. If Sara is in the P set, the PCR test predicted that Sara is in the S set (the set of sick people).
Because Sara tested positive, we call this prediction "positive", and if she really is sick (she is in the S set), we say the prediction is True -> True Positive.
- When the test predicts the thing we are trying to find (Corona), we call it positive.
- When the reality matches the prediction, we call it True.
So S ∩ P = True Positive, and the number of people in it, |S ∩ P|, is the count of true positives.
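In Python the intersection is the `&` operator (the same as `S.intersection(P)`). A small sketch with a made-up sample:

```python
S = {"Sara", "Adam", "Lina"}  # hypothetical: people who have Corona
P = {"Sara", "Adam", "Omar"}  # hypothetical: people who tested positive

true_positives = S & P        # people who are in both S and P
print(sorted(true_positives))    # ['Adam', 'Sara']
print("Sara" in true_positives)  # True
print(len(true_positives))       # 2, the count |S ∩ P|
```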
Someone who does not have Corona and his test for Corona is negative.
The representation of this possibility is: H ∩ N
because H is the set of healthy people who don't have Corona and N is the set of people who tested negative.
Someone who doesn't have Corona and tested negative must be in both sets (H and N), which means the intersection.
Let's assume that the girl called Sara is in the N set, which means the PCR test predicted that Sara is in the H set (the set of healthy people).
Because Sara tested negative, we call this prediction "negative", and if she really is healthy (she is in the H set), we say the prediction is True -> True Negative.
- When the test predicts the opposite of what we are trying to find (no Corona), we call it negative.
- When the reality matches the prediction, we call it True.
So H ∩ N = True Negative
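The same idea for true negatives, this time with the `intersection()` method. The data is again hypothetical:

```python
H = {"Omar", "Mona", "Ali"}  # hypothetical: healthy people
N = {"Lina", "Mona", "Ali"}  # hypothetical: people who tested negative

true_negatives = H.intersection(N)  # healthy AND tested negative
print(sorted(true_negatives))       # ['Ali', 'Mona']
```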
Someone who has Corona but his test for Corona is negative.
The representation of this possibility is: S ∩ N
because S is the set of sick people who do have Corona and N is the set of people who tested negative.
Let's assume that the girl called Sara is in the N set, which means the PCR test predicted that Sara is in the H set (the set of healthy people).
Because Sara tested negative, we call this prediction "negative", and because she is actually sick, not healthy (she is in the S set), we say the prediction is False -> False Negative.
- When the test predicts the opposite of what we are trying to find (no Corona), we call it negative.
- When the reality does not match the prediction, we call it False.
So S ∩ N = False Negative
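False negatives are the sick people the test missed. A sketch with made-up data:

```python
S = {"Sara", "Adam", "Lina"}  # hypothetical: people who have Corona
N = {"Lina", "Mona", "Ali"}   # hypothetical: people who tested negative

false_negatives = S & N       # sick, yet the test said negative
print(sorted(false_negatives))  # ['Lina']
```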
Someone who does not have Corona but his test for Corona is positive.
The representation of this possibility is: H ∩ P
because H is the set of healthy people who don't have Corona and P is the set of people who tested positive.
Again, poor Sara is in the P set, which means the PCR test predicted that Sara is in the S set (the set of sick people).
Because Sara tested positive, we call this prediction "positive", and because she is actually healthy (she is in the H set) and not sick, we say the prediction is False -> False Positive.
- When the test predicts the thing we are trying to find (Corona), we call it positive.
- When the reality does not match the prediction, we call it False.
So H ∩ P = False Positive
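And the false positives, healthy people the test wrongly flagged, again with made-up data:

```python
H = {"Omar", "Mona", "Ali"}   # hypothetical: healthy people
P = {"Sara", "Adam", "Omar"}  # hypothetical: people who tested positive

false_positives = H & P       # healthy, yet the test said positive
print(sorted(false_positives))  # ['Omar']
```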
Note that to make our predictions accurate, we want to increase the True Positives and True Negatives and decrease the False Positives and False Negatives.
I know it is a bit of a headache, so here is another example you can think of:
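One common way to turn the four counts into a single number is accuracy: the share of the whole sample that was predicted correctly, (TP + TN) / (TP + TN + FP + FN). A minimal sketch with hypothetical counts; the function name `accuracy` is just chosen for this example.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct (TP + TN over everyone)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 2 true positives, 2 true negatives, 1 FP, 1 FN
print(round(accuracy(tp=2, tn=2, fp=1, fn=1), 3))  # 0.667
```

Raising TP and TN, or lowering FP and FN, pushes this number toward 1.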
Say we are making an app that uses a machine learning algorithm to predict whether a word is a bad (nasty) word.
Again, each word is either a bad word or not a bad word.
Because we are searching for bad words, if we predict that a word is bad we call the prediction positive, and if we predict the opposite ("not bad") we call the prediction negative.
If our prediction matches reality we call it True, and if it doesn't we call it False.
So
- Our sample X is the words in a sentence.
- G is the set of words that are not bad -> G = {x ∈ X: x is not a bad word}.
- B is the set of bad words -> B = {x ∈ X: x is a bad word}.
- P is the set of words predicted to be bad -> P = {x ∈ X: x is positive for bad words}.
- N is the set of words predicted to be not bad -> N = {x ∈ X: x is negative for bad words}.
Then:
- B ∩ P = True Positive
- B ∩ N = False Negative
- G ∩ P = False Positive
- G ∩ N = True Negative
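The same four intersections for the bad-word example, as a sketch. The sentence, the "truly bad" labels, and the words the model flagged are all invented for illustration; no real model is involved.

```python
# Hypothetical sentence and ground-truth labels
words = ["you", "are", "a", "dumb", "jerk"]
X = set(words)           # note: a set drops repeated words
B = {"dumb", "jerk"}     # words that really are bad
G = X - B                # words that are not bad

# Words a hypothetical model flagged as bad
P = {"dumb", "are"}
N = X - P                # words the model did not flag

print("TP:", sorted(B & P))  # TP: ['dumb']
print("FN:", sorted(B & N))  # FN: ['jerk']
print("FP:", sorted(G & P))  # FP: ['are']
print("TN:", sorted(G & N))  # TN: ['a', 'you']
```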
Bonus 🥳:
Let's represent the Confusion Matrix
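A minimal sketch that lays the four intersections out as a 2x2 table, with rows for the reality (S or H) and columns for the prediction (P or N). This is one common layout; some tools order the rows and columns differently. The sets are hypothetical.

```python
# Hypothetical sets from the Corona example
S = {"Sara", "Adam", "Lina"}  # have Corona
H = {"Omar", "Mona", "Ali"}   # healthy
P = {"Sara", "Adam", "Omar"}  # tested positive
N = {"Lina", "Mona", "Ali"}   # tested negative

# Rows: reality (S, H). Columns: prediction (P, N).
matrix = [
    [len(S & P), len(S & N)],  # sick:    True Positive,  False Negative
    [len(H & P), len(H & N)],  # healthy: False Positive, True Negative
]

rows = [
    ["", "Predicted P", "Predicted N"],
    ["Actual S", *matrix[0]],
    ["Actual H", *matrix[1]],
]
for row in rows:
    # Right-align every cell in a 13-character column
    print("".join(f"{cell!s:>13}" for cell in row))
```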