How Relative Frequency Is Important for Predictions (Pandas)

#datascience #analytics

What is relative frequency? How do you find it? And why does it matter?

When analyzing data, knowing how many times each value appears in a column is sometimes not enough to understand what the data means. That is where relative frequency comes in: it is the number of times a particular value occurs divided by the total number of observations.

For example, if you form a meaningful insight from the data in 9 out of 12 analyses, the relative frequency is 9/12 = 0.75, meaning it happens 75% of the time.
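To make the definition concrete, here is a minimal sketch in plain Python (no Pandas yet) that counts each value and divides by the number of observations. The helper name relative_frequencies and the sample outcomes are hypothetical, chosen to mirror the 9-out-of-12 example.

```python
from collections import Counter

def relative_frequencies(observations):
    """Return each value's count divided by the total number of observations."""
    total = len(observations)
    counts = Counter(observations)
    return {value: count / total for value, count in counts.items()}

# Hypothetical outcomes of 12 analyses: 9 produced an insight, 3 did not
outcomes = ["insight"] * 9 + ["no insight"] * 3
freqs = relative_frequencies(outcomes)
print(freqs)  # {'insight': 0.75, 'no insight': 0.25}
```

Because every observation is counted exactly once, the relative frequencies always add up to 1.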

How do you find it in Pandas?

Instead of using this:

df['Column'].value_counts()

Pass normalize=True instead:

df['Column'].value_counts(normalize=True)

The normalize parameter defaults to False, which returns raw counts. With normalize=True, each count is divided by the total, so the result holds proportions that sum to 1.
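Here is a short side-by-side comparison of the two calls, assuming a small made-up DataFrame with a column literally named "Column" as in the snippets above. It also shows that normalize=True gives the same numbers as dividing the raw counts by their sum.

```python
import pandas as pd

# Hypothetical column of categorical values
df = pd.DataFrame({"Column": ["a", "b", "a", "c", "a", "b", "a", "a"]})

counts = df["Column"].value_counts()                     # raw frequencies
proportions = df["Column"].value_counts(normalize=True)  # relative frequencies

print(counts)
print(proportions)

# Dividing the raw counts by their total gives the same proportions
manual = counts / counts.sum()
print(manual.tolist())
```

By default, value_counts also drops missing values (dropna=True), so the proportions are relative to the non-null entries only. Pass dropna=False if NaN should count as its own category and be included in the total.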

The importance of relative frequency

Relative frequency is an empirical estimate of probability: the share of past observations in which a value occurred approximates how likely that value is to occur again, provided the data is representative. Knowing these probabilities lets you make better-informed predictions.
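As a deliberately simple illustration, the sketch below treats relative frequencies as estimated probabilities, predicts the most frequent outcome, and scales the probabilities to an expected count. The weather data is hypothetical, and the approach assumes past observations reflect what comes next.

```python
import pandas as pd

# Hypothetical history of daily weather observations
history = pd.Series(["sunny", "rainy", "sunny", "cloudy", "sunny",
                     "rainy", "sunny", "sunny", "cloudy", "sunny"])

# Relative frequencies serve as empirical estimates of each outcome's probability
probs = history.value_counts(normalize=True)

# Predict the most frequent outcome, along with its estimated probability
prediction = probs.idxmax()
confidence = probs.max()
print(f"Predicted: {prediction} (estimated probability {confidence:.0%})")

# Expected number of each outcome over the next 30 days,
# assuming the past distribution holds in the future
expected_counts = probs * 30
print(expected_counts)
```

Raw counts alone would tell you that "sunny" appeared 6 times, but only the relative frequency (6 out of 10) can be compared across datasets of different sizes or scaled to a new time window.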

Note: You can visualize a probability distribution with a histogram. In a relative frequency histogram, each bar's height is the proportion of observations in that bin, so it estimates the probability that a value falls in that bin.
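One way to get the bar heights of a relative frequency histogram in Pandas is to bin the values with pd.cut and then call value_counts(normalize=True). The NumPy equivalent weights every observation by 1/n in np.histogram. The scores and bin edges below are made-up values for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data, e.g. exam scores
scores = pd.Series([55, 62, 67, 71, 74, 78, 81, 85, 88, 93])
bins = [50, 60, 70, 80, 90, 100]

# Bin the values, then take relative frequencies per bin.
# sort=False keeps the bins in ascending order instead of sorting by count.
heights = pd.cut(scores, bins=bins).value_counts(normalize=True, sort=False)
print(heights)

# The same bar heights via NumPy: weight each observation by 1/n
weights = np.ones(len(scores)) / len(scores)
rel_freq, edges = np.histogram(scores, bins=bins, weights=weights)
print(rel_freq)
```

pd.cut uses right-closed bins such as (60, 70] while np.histogram uses left-closed bins such as [60, 70), so the two can disagree for values that sit exactly on an edge; none of the sample scores does. To draw the bars, matplotlib's plt.hist accepts the same weights argument. Avoid density=True for this purpose: it scales the bars so their total area is 1, so a bar's height equals a probability only when the bins have width 1.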