A **probability distribution** is a description of how likely a random variable or set of random variables is to take on each of its possible states. The way we describe probability distributions depends on whether the variables are discrete or continuous.

#### **3.3.1 Discrete Variables and Probability Mass Functions**

A probability distribution over discrete variables may be described using a **probability mass function (PMF)**. A probability mass function maps from a state of a random variable to the probability of that random variable taking on that state.

For example, the roll of a die is random, and its outcome is a discrete variable: it can only be 1, 2, 3, 4, 5 or 6, with no values in between.

We denote probability mass functions with **P** and write the probability that the random variable **X** takes the value *x* as P(X = x). Here **X** is the outcome of rolling the die and *x* is one of the numbers on its faces.

```
"""
With a fair 6-sided die, each number has a 1/6 ≈ 16.7% chance of landing on any roll, and we can check
this empirically by rolling many times. In this example we do 10000 rolls and verify that P(X = 6) ≈ 16.7%.
In short, a PMF tells you the chance of each value x. Play around with different x values, numbers of rolls
and sides, and check whether the resulting probabilities make sense.
"""
import random
from collections import defaultdict

import matplotlib.pyplot as plt
import tensorflow as tf


def roll(sides, rolls):
    """Returns a dictionary mapping each face to the number of times it was rolled.

    Arguments:
        sides (int) : Number of sides for the dice.
        rolls (int) : Number of rolls.
    Returns:
        a dictionary.
    """
    d = defaultdict(int)  # faces that never come up default to a count of 0
    for _ in range(rolls):
        d[random.randint(1, sides)] += 1  # the random process
    return d


def single_dice(x, sides, rolls):
    """Estimates and prints the probability of rolling x.

    Arguments:
        x (int) : is the number you want to calculate the probability for.
        sides (int) : Number of sides for the dice.
        rolls (int) : Number of rolls.
    Returns:
        a printout.
    """
    result = roll(sides, rolls)
    for i in range(1, sides + 1):
        plt.bar(i, result[i] / rolls)
    # float() turns the scalar tensor into a plain number so it prints cleanly
    percent = float(tf.divide(tf.multiply(result[x], 100), rolls))
    print("P(X = {}) = {:.2f}%".format(x, percent))


single_dice(x=6, sides=6, rolls=10000)
# Example output (varies from run to run):
# P(X = 6) = 16.43%
```

To be a **PMF** on a random variable x, a function **P** must satisfy the following properties:

- The domain of **P** must be the set of all possible states of x. In our example above the possible states of x are 1 to 6; try plugging in 7 for x and see what value you get.
- $\forall x \in \mathrm{x},\ 0 \le P(x) \le 1$. An impossible event has probability 0, and no state can be less probable than that. Likewise, an event that is guaranteed to happen has probability 1, and no state can have a greater chance of occurring. If you tried plugging in 7 in our example above, you would have seen that the probability of obtaining a 7 is zero: it is an impossible event because 7 is not in our set of states.
- $\sum_{x \in \mathrm{x}} P(x) = 1$. This is the **normalization** property, and it prevents us from obtaining probabilities greater than one. If you add up the individual probabilities of all faces of our die, they should sum to 1, or 100%.
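The properties above can be checked directly on the exact PMF of a fair die. The following is a minimal sketch using only the Python standard library; the helper names `fair_die_pmf`, `pmf_prob` and `is_valid_pmf` are made up for this illustration and do not come from the notebook.

```python
from fractions import Fraction


def fair_die_pmf(sides=6):
    """Exact PMF of a fair die as a dict {state: probability}."""
    return {face: Fraction(1, sides) for face in range(1, sides + 1)}


def pmf_prob(pmf, x):
    """P(X = x); states outside the domain get probability 0."""
    return pmf.get(x, Fraction(0))


def is_valid_pmf(pmf):
    """Check that every value lies in [0, 1] and that the values sum to 1."""
    in_range = all(0 <= p <= 1 for p in pmf.values())
    normalized = sum(pmf.values()) == 1
    return in_range and normalized


pmf = fair_die_pmf()
print(pmf_prob(pmf, 4))   # 1/6
print(pmf_prob(pmf, 7))   # 0, because 7 is not a possible state
print(is_valid_pmf(pmf))  # True
```

Exact fractions are used here instead of floats so that the sum-to-one check is not affected by rounding.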

Probability mass functions can act on many variables at the same time. Such a probability distribution over many variables is known as a **joint probability mass function**. P(x=x; y=y) denotes the probability that x=x and y=y simultaneously. When the two variables are independent, as with two separate dice, the joint probability factorizes: P(x=x; y=y) = P(x=x)P(y=y).

```
"""
In this example we roll two dice. There are ways to shorten the code, but the longer form makes it explicit
that each die is rolled 10000 times. We estimate the probability of rolling x = 4 on the first die and y = 1
on the second. Because the two dice are independent, the joint probability is simply the product of the
individual probabilities of x and y.
"""
# Relies on roll(), plt and tf from the previous example. color_b and color_o are the notebook's
# plot colors and are assumed to be defined in its setup cell.
def multi_dice(x, y, sides, rolls, plot=True):
    """Calculates the joint probability of two dice.

    Arguments:
        x (int) : is the number you want to calculate the probability for on the first die.
        y (int) : is the number you want to calculate the probability for on the second die.
        sides (int) : Number of sides for the dice.
        rolls (int) : Number of rolls.
        plot (bool) : Whether you want to plot the data or not.
    Returns:
        probabilities (float).
    """
    result1 = roll(sides, rolls)  # first result from the rolls
    result2 = roll(sides, rolls)  # second result from the rolls
    prob_x = tf.divide(result1[x], rolls)  # estimated probability of x
    prob_y = tf.divide(result2[y], rolls)  # estimated probability of y
    joint_prob = tf.multiply(prob_x, prob_y)  # joint probability of x and y (independent dice)
    if plot:
        plt.title("Dice 1 {} Rolls".format(rolls))
        for i in range(1, sides + 1):
            plt.bar(i, result1[i] / rolls, color=color_b)
        plt.show()
        plt.title("Dice 2 {} Rolls".format(rolls))
        for i in range(1, sides + 1):
            plt.bar(i, result2[i] / rolls, color=color_o)
        plt.show()
    return prob_x, prob_y, joint_prob


prob_x, prob_y, joint_prob = multi_dice(x=4, y=1, sides=6, rolls=10000, plot=True)
# Convert the scalar tensors to Python floats so the {:.4} format spec can be applied
print("P(x = {:.4}%), P(y = {:.4}%), P(x = {}; y = {}) = {:.4}%\n\n".format(
    float(tf.multiply(prob_x, 100)),
    float(tf.multiply(prob_y, 100)),
    4, 1, float(tf.multiply(joint_prob, 100))))
# Example output (varies from run to run):
# P(x = 16.9%), P(y = 16.39%), P(x = 4; y = 1) = 2.77%
```
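For comparison with the simulated estimate, the exact joint PMF of two fair dice can be written out as a table. This is a small sketch that assumes the two dice are independent, so each joint entry is the product of the two individual probabilities; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

sides = 6
p_x = {face: Fraction(1, sides) for face in range(1, sides + 1)}  # first die
p_y = dict(p_x)                                                   # second die

# Assumption: the dice are independent, so P(x; y) = P(x) * P(y)
joint = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

print(joint[(4, 1)])                             # 1/36
print("{:.2f}%".format(float(joint[(4, 1)]) * 100))  # 2.78%
print(sum(joint.values()))                       # 1, the joint PMF is normalized too
```

The simulated value of roughly 2.77% in the example above is an estimate of this exact 1/36.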

#### **3.3.2 Continuous Variables and Probability Density Functions**

When working with continuous random variables, we describe probability distributions using a **probability density function (PDF)**.

Let's play a game, shall we? Suppose I ask you to guess the integer I am thinking of between 1 and 10. Regardless of the number you pick, every option has the same probability (1/10), because there are 10 options and the probabilities must add up to 1.

But what if I ask you to guess the real number I am thinking of between 0 and 1? Now this gets tricky: I could be thinking of 0.2, 0.5, 0.0004, and so on, and the possibilities are endless. So we run into a problem: how do we describe the probability of each option when there are infinitely many of them? This is where the **PDF** comes to help. Instead of asking for the probability of one exact number, we ask for the probability that the value falls within a small interval around that number.

```
"""
In our guessing game, I said how difficult it would be to guess a real number I am thinking of between
0 and 1. Below we "guess" 500 values from a uniform distribution with minval 0 and maxval 1 and plot the
resulting distribution.
"""
import seaborn as sns

# Draw 500 random values from a uniform distribution on [0, 1)
continuous = tf.random.uniform([500], minval=0, maxval=1, dtype=tf.float32)
# Note: sns.distplot is deprecated in recent seaborn releases (sns.histplot / sns.displot replace it)
g = sns.distplot(continuous.numpy(), color=color_b)
plt.grid()
```
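To make the "close to a number" idea concrete, here is a small standard-library sketch that estimates how often a uniform guess on [0, 1) lands within a tolerance `eps` of a target. The function name `prob_near` and its parameters are hypothetical, chosen only for this illustration. For a target well inside the interval, the exact answer is the interval width `2 * eps`, which shrinks to zero as the tolerance shrinks.

```python
import random


def prob_near(target, eps, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(|x - target| <= eps) for x uniform on [0, 1)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = sum(1 for _ in range(n_samples) if abs(rng.random() - target) <= eps)
    return hits / n_samples


for eps in (0.1, 0.01, 0.001):
    estimate = prob_near(0.3, eps)
    print("eps={:<6} estimate={:.4f} exact={:.4f}".format(eps, estimate, 2 * eps))
```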

To be a probability density function, a function *p* must satisfy the following properties:

- The domain of p must be the set of all possible states of x.
- $\forall x \in \mathrm{x},\ p(x) \ge 0$. Note that we do not require $p(x) \le 1$.
- $\int p(x)\,dx = 1$.

A probability density function p(x) does not give the probability of a specific state directly; instead the probability of landing inside an infinitesimal region with volume $\delta x$ is given by $p(x)\,\delta x$.
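A quick way to see that a density may exceed 1 while still integrating to 1 is a uniform distribution on a short interval. The sketch below assumes a uniform distribution on [0, 0.5], whose density is 2 everywhere on that interval; `uniform_pdf` and `riemann_integral` are illustrative helpers, not notebook functions.

```python
def uniform_pdf(x, low=0.0, high=0.5):
    """Density of a uniform distribution on [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def riemann_integral(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


print(uniform_pdf(0.2))                         # 2.0, a density above 1
print(riemann_integral(uniform_pdf, 0.0, 0.5))  # approximately 1

# Probability of a small region of width delta starting at 0.2, two ways
delta = 1e-3
print(uniform_pdf(0.2) * delta)                          # p(x) * delta
print(riemann_integral(uniform_pdf, 0.2, 0.2 + delta))   # integral over the region
```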

```
"""
Below is a histogram of the same continuous random variable. Note that the y-axis values differ between
the seaborn distplot and this histogram because sns.distplot also draws a density estimate.
You can turn that off by setting `kde=False`, and you will get the same plot as the one below.
The point of this plot is that the probability of landing near 0.3 is not read off as p(0.3) directly;
it is the area of the highlighted region, that is, the density multiplied by the width delta x.
"""
# Flatten to 1-D so plt.hist treats the samples as a single dataset
values = tf.reshape(continuous, [-1]).numpy()
n, bins, patches = plt.hist(values, color=color_b)
patches[3].set_fc(color_o)  # highlight the fourth bin, roughly [0.3, 0.4)
plt.grid()
```

We can integrate the density function to find the actual probability mass of a set of points. Specifically, the probability that *x* lies in some set **S** is given by the integral of p(x) over that set. In the univariate case, the probability that *x* lies in the interval [a, b] is $\int_{[a,b]} p(x)\,dx$.
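As a rough illustration of this integral, the sketch below numerically integrates a normal density over [a, b] and compares the result with the closed-form value obtained from the normal CDF. It uses only the standard library; the mean 0 and standard deviation 3 are arbitrary choices, and the helper names are made up for this example.

```python
import math


def normal_pdf(x, loc=0.0, scale=1.0):
    """Density of a normal distribution with mean loc and standard deviation scale."""
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


def normal_cdf(x, loc=0.0, scale=1.0):
    """Cumulative distribution function of the same normal distribution."""
    return 0.5 * (1.0 + math.erf((x - loc) / (scale * math.sqrt(2.0))))


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


a, b = 1.0, 2.0
numeric = integrate(lambda x: normal_pdf(x, 0.0, 3.0), a, b)
exact = normal_cdf(b, 0.0, 3.0) - normal_cdf(a, 0.0, 3.0)
print("P({} <= x <= {}): numeric={:.6f}, exact={:.6f}".format(a, b, numeric, exact))
```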

**Tensorflow Probability Distribution Library**

From here onwards we will often use the TFP distributions module, which we will refer to as tfd (= tfp.distributions). So, before getting started, let me explain a few things about the module.

TF Probability uses distribution subclasses to represent stochastic random variables. Recall the first cause of uncertainty, inherent stochasticity: even if we knew the values of all of a variable's parameters, its outcome would still be random. We will see examples of these distributions in Section 9. In the previous example we drew random samples directly, but extracting and manipulating samples that way is less intuitive than using the tfp distributions library. We usually start by creating a distribution; when we draw samples from it, those samples become TensorFlow tensors that can be manipulated deterministically.

Some common methods in tfd:

- sample(sample_shape=(), seed=None): Draws samples of the specified shape.
- mean(): Calculates the mean.
- mode(): Calculates the mode.
- variance(): Calculates the variance.
- stddev(): Calculates the standard deviation.
- prob(value): Evaluates the probability density function (continuous distributions) or probability mass function (discrete distributions) at value.
- log_prob(value): Evaluates the log of the probability density/mass function at value.
- entropy(): Calculates the Shannon entropy in nats.
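To give a feel for what these methods compute, here is a plain-Python stand-in for a normal distribution that mirrors the method names listed above. It is a hypothetical sketch for illustration only: the class `NormalSketch` is not part of TFP, it works on Python floats rather than tensors, and its `sample` takes a count instead of a shape.

```python
import math
import random


class NormalSketch:
    """Hand-written normal distribution mirroring the tfd method names (not TFP itself)."""

    def __init__(self, loc, scale):
        self.loc = float(loc)      # mean
        self.scale = float(scale)  # standard deviation

    def sample(self, n, seed=None):
        rng = random.Random(seed)
        return [rng.gauss(self.loc, self.scale) for _ in range(n)]

    def mean(self):
        return self.loc

    def mode(self):
        return self.loc  # a normal density peaks at its mean

    def variance(self):
        return self.scale ** 2

    def stddev(self):
        return self.scale

    def log_prob(self, value):
        z = (value - self.loc) / self.scale
        return -0.5 * z * z - math.log(self.scale) - 0.5 * math.log(2.0 * math.pi)

    def prob(self, value):
        return math.exp(self.log_prob(value))

    def entropy(self):
        # Differential entropy of a normal distribution, in nats
        return 0.5 * math.log(2.0 * math.pi * math.e * self.scale ** 2)


dist = NormalSketch(loc=0.0, scale=3.0)
print("p(1.5)     = {:.4f}".format(dist.prob(1.5)))
print("log p(1.5) = {:.4f}".format(dist.log_prob(1.5)))
print("entropy    = {:.4f} nats".format(dist.entropy()))
print("5 samples  =", dist.sample(5, seed=0))
```

Computing `prob` from `log_prob` rather than the other way round is a deliberate choice in this sketch, since log densities are numerically better behaved far out in the tails.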

```
"""
Let's say we want to evaluate the density p(1.5) of a continuous distribution. In TensorFlow Probability,
"prob()" evaluates the probability mass function of a discrete distribution or the probability density
function of a continuous one. Keep in mind that a density value is not itself a probability; probabilities
come from integrating the density over an interval.
For tfp.distributions.Normal, "loc" is the mean and "scale" is the standard deviation. Don't worry if you
don't understand those yet; we will go through distributions in Section 9, and I recommend you come back
and go through these examples again after you finish Section 9.
Also, there's nothing special about these numbers. Play around with the scale, the x value and the x-axis
range to get a better understanding.
"""
import tensorflow_probability as tfp
tfd = tfp.distributions
# Create an x axis
samples = tf.range(-10, 10, 0.001)
# Create a Normal distribution with mean 0 and standard deviation 3
normal_distribution = tfd.Normal(loc=0., scale=3.)
# Evaluate the density at x = 1.5
pdf_x = normal_distribution.prob(1.5)
# Convert the tensors to NumPy values for plotting and printing
pdf_x_ = pdf_x.numpy()
samples_ = samples.numpy()
density_ = normal_distribution.prob(samples).numpy()
# Finally, plot the PDF over the x axis and mark p(1.5)
plt.plot(samples_, density_, color=color_b)
plt.fill_between(samples_, density_, color=color_b)
plt.bar(1.5, pdf_x_, color=color_o)
plt.grid()
print("Density at x = 1.5: p(1.5) = {:.4}".format(float(pdf_x_)))
# Output:
# Density at x = 1.5: p(1.5) = 0.1174
```

This is section three of the Chapter on Probability and Information Theory with Tensorflow 2.0 of the Book Deep Learning with Tensorflow 2.0.

You can read this section and the following topics:

03.00 - Probability and Information Theory

03.01 - Why Probability?

03.02 - Random Variables

03.03 - Probability Distributions

03.04 - Marginal Probability

03.05 - Conditional Probability

03.06 - The Chain Rule of Conditional Probabilities

03.07 - Independence and Conditional Independence

03.08 - Expectation, Variance and Covariance

03.09 - Common Probability Distributions

03.10 - Useful Properties of Common Functions

03.11 - Bayes' Rule

03.12 - Technical Details of Continuous Variables

03.13 - Information Theory

03.14 - Structured Probabilistic Models

at Deep Learning With TF 2.0: 03.00- Probability and Information Theory. You can get the code for this article and the rest of the chapter here. Links to the notebook in Google Colab and Jupyter Binder are at the end of the notebook.
