Abzal Seitkaziyev

Multinomial Probabilities

Binomial, Trinomial, and Multinomial probabilities

In the binomial distribution post, I reviewed examples where the outcome was either a success with probability p or a failure with the complementary probability (1-p). What if we want to know the probabilities when more than two outcomes are possible? For example, a game could end in a win, a loss, or a draw, which follows a trinomial distribution. Because there are multiple possible outcomes, this is called a multinomial probability distribution.

So, the binomial probability mass function (pmf) is:

P(x) = [n! / (x! (n-x)!)] * p^x * (1-p)^(n-x)

Because there are two possible outcomes, such as success and failure (k=2), with x1+x2=n and p1+p2=1, we can rewrite it as:

P(x1, x2) = [n! / (x1! x2!)] * p1^x1 * p2^x2

The multinomial pmf is the generalization of the binomial pmf, with x1+x2+...+xk=n and p1+p2+...+pk=1:

P(x1, ..., xk) = [n! / (x1! x2! ... xk!)] * p1^x1 * p2^x2 * ... * pk^xk

We can see that the binomial pmf is a special case of the multinomial pmf where k=2.
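As a quick check of that statement (my own illustration, using hypothetical numbers n = 10, p = 0.3), scipy returns the same value whether we evaluate the binomial pmf directly or the multinomial pmf with two categories:

from scipy.stats import binom, multinomial

# hypothetical example: n = 10 trials, success probability p = 0.3, x = 4 successes
n, p, x = 10, 0.3, 4

via_binom = binom.pmf(x, n, p)
via_multinomial = multinomial.pmf([x, n - x], n=n, p=[p, 1 - p])

print(via_binom, via_multinomial)  # both ≈ 0.2001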

Multinomial vs Hypergeometric probabilities

Let's do a practical example using Python.

We are launching a new product, and initial testing showed that out of 100 reviews the product received positive feedback 2/3 of the time, negative feedback 2/15 of the time, and the rest were neutral. If we pull 3 random reviews from the database for a product presentation, what is the probability that at least one will be positive? At least one negative? Exactly 1 positive, 1 negative, and 1 neutral?

Approximate solution

In practice, we are not going to use the same review twice for the presentation, so this is an experiment without replacement. Strictly speaking, we cannot use the multinomial distribution: each review we pull out changes the proportions of the remaining reviews, so the probabilities do not stay the same from one trial to the next. However, since we only pull 3 reviews out of 100, the probabilities do not change drastically between trials, so the multinomial distribution is a reasonable approximation.
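To see how small that effect is (a quick back-of-the-envelope check of my own, using the review counts from the exact solution below), removing a single positive review barely moves the proportion of positive reviews:

# 67 positive reviews out of 100 in the database
print(67 / 100)  # 0.67, before any review is drawn
print(66 / 99)   # ~0.667, after one positive review has been drawn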

from scipy.stats import multinomial

# counts and probabilities are in the order [positive, negative, neutral]
# exactly 1 positive, 1 negative, and 1 neutral
one = multinomial.pmf([1, 1, 1], n=3, p=[2/3, 2/15, 1/5])

# at least one negative (the middle count in each outcome is the number of negatives)
min_one_neg = \
multinomial.pmf([1, 1, 1], n=3, p=[2/3, 2/15, 1/5]) + \
multinomial.pmf([2, 1, 0], n=3, p=[2/3, 2/15, 1/5]) + \
multinomial.pmf([0, 1, 2], n=3, p=[2/3, 2/15, 1/5]) + \
multinomial.pmf([1, 2, 0], n=3, p=[2/3, 2/15, 1/5]) + \
multinomial.pmf([0, 2, 1], n=3, p=[2/3, 2/15, 1/5]) + \
multinomial.pmf([0, 3, 0], n=3, p=[2/3, 2/15, 1/5])

# at least one positive (note: p is reordered to [negative, positive, neutral],
# so the middle count is the number of positives)

min_one_pos = \
multinomial.pmf([1, 1, 1], n=3, p=[2/15, 2/3, 1/5]) + \
multinomial.pmf([2, 1, 0], n=3, p=[2/15, 2/3, 1/5]) + \
multinomial.pmf([0, 1, 2], n=3, p=[2/15, 2/3, 1/5]) + \
multinomial.pmf([1, 2, 0], n=3, p=[2/15, 2/3, 1/5]) + \
multinomial.pmf([0, 2, 1], n=3, p=[2/15, 2/3, 1/5]) + \
multinomial.pmf([0, 3, 0], n=3, p=[2/15, 2/3, 1/5])

one, min_one_neg, min_one_pos

Results:
(0.10666666666666669, 0.34903703703703703, 0.962962962962963)
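As a sanity check on the enumeration above (my own addition, using the same probabilities), the "at least one" values can also be obtained from the complement, i.e. one minus the probability of seeing none of that category in 3 independent draws:

# P(at least one negative) = 1 - P(no negatives in 3 trials)
print(1 - (1 - 2/15)**3)  # ≈ 0.3490
# P(at least one positive) = 1 - P(no positives in 3 trials)
print(1 - (1 - 2/3)**3)   # ≈ 0.9630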

Exact solution

To account for the fact that we sample without replacement, I will use the multivariate hypergeometric probability mass function. Despite the complicated name, it is a very intuitive distribution, based on counting outcomes with combinations. In case somebody is interested in more details, here is a link to a video with a good explanation of the concept.
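For reference, the pmf that the code below implements, for a population with K1, ..., Kk items of each category, N = K1 + ... + Kk items in total, and n items drawn without replacement, is:

P(x1, ..., xk) = [C(K1, x1) * C(K2, x2) * ... * C(Kk, xk)] / C(N, n)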

from math import factorial
def comb(n, x):
    # number of combinations: n choose x
    return factorial(n) / (factorial(x) * factorial(n - x))

# population: 67 positive, 13 negative, and 20 neutral reviews (out of 100)
# exactly 1 positive, 1 negative, and 1 neutral
one = (comb(67,1)*comb(13,1)*comb(20,1))/(comb(100,3))

# at least one negative
min_one_neg = \
(comb(67,1)*comb(13,1)*comb(20,1))/(comb(100,3))+\
(comb(67,0)*comb(13,1)*comb(20,2))/(comb(100,3))+\
(comb(67,2)*comb(13,1)*comb(20,0))/(comb(100,3))+\
(comb(67,0)*comb(13,2)*comb(20,1))/(comb(100,3))+\
(comb(67,1)*comb(13,2)*comb(20,0))/(comb(100,3))+\
(comb(67,0)*comb(13,3)*comb(20,0))/(comb(100,3))


# at least one positive
min_one_pos = \
(comb(67,1)*comb(13,1)*comb(20,1))/(comb(100,3))+\
(comb(67,1)*comb(13,2)*comb(20,0))/(comb(100,3))+\
(comb(67,1)*comb(13,0)*comb(20,2))/(comb(100,3))+\
(comb(67,2)*comb(13,0)*comb(20,1))/(comb(100,3))+\
(comb(67,2)*comb(13,1)*comb(20,0))/(comb(100,3))+\
(comb(67,3)*comb(13,0)*comb(20,0))/(comb(100,3))

one, min_one_neg, min_one_pos


Results:
(0.10773036487322202, 0.3444959802102659, 0.9662585034013607)
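If you prefer not to enumerate the combinations by hand, newer versions of scipy (1.6 and later) also ship scipy.stats.multivariate_hypergeom, which should reproduce these numbers directly; a minimal sketch:

from scipy.stats import multivariate_hypergeom

# population: 67 positive, 13 negative, 20 neutral reviews; draw 3 without replacement
counts = [67, 13, 20]

# exactly 1 positive, 1 negative, and 1 neutral
print(multivariate_hypergeom.pmf(x=[1, 1, 1], m=counts, n=3))  # ≈ 0.1077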

Conclusion

As expected, the approximate and exact results are quite close. Beyond the basic concepts, it is even more interesting to see how these distributions are applied in practical data science and machine learning. I will try to describe my findings on that topic in the next few posts.
