Today I'm going to drop you a few relevant notes on normal distributions, so grab a coffee and follow me.
What is a distribution?
A distribution, or probability distribution, is a mathematical function that gives the probabilities of occurrence of the various possible outcomes of an experiment. Probability distributions are used to define different types of random variables so that we can make decisions based on these models.
In other words, it is a function that shows the possible values a variable can take and how often they occur.
Distributions can be either discrete or continuous.
If we are on a university campus and study the heights of the people there, we will probably see that most of the students fall in the same height range (around the mean); we'll also see a few students below that range and a few others above it. We can plot a histogram of those heights, and what we'll see is the distribution of the students!
We'll see something like this (height measured in cm)
That's the distribution, and what it tells us is exactly what we see: most of the students are located in the same height range; other heights certainly exist, but at a lower frequency, so it is less probable to find students in those other ranges.
...and by the way that is also a normal distribution!
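If you want to reproduce a plot like that yourself, here is a minimal sketch; the mean of 170 cm and standard deviation of 10 cm are made-up numbers, just for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical campus numbers: mean height 170 cm, standard deviation 10 cm
rng = np.random.default_rng(42)
heights = rng.normal(loc=170, scale=10, size=1000)

# Histogram of the simulated heights
plt.hist(heights, bins=30, edgecolor="black")
plt.xlabel("Height (cm)")
plt.ylabel("Number of students")
plt.title("Simulated student heights")
plt.show()
```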
Normal distributions
In plain words, normal distributions, or bell-shaped distributions, are those that look similar to the one just presented. In these distributions most of the values cluster around the mean: approximately 68% of the values fall within one standard deviation of the mean (we already covered the standard deviation in our first posts) and about 95% fall within two standard deviations, as you can see here:
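If you want to double-check those percentages, a quick computation with scipy confirms them (this uses the standard normal, but the result is the same for any mean and standard deviation):

```python
from scipy.stats import norm

# Share of values within 1 and 2 standard deviations of the mean
within_1_sd = norm.cdf(1) - norm.cdf(-1)
within_2_sd = norm.cdf(2) - norm.cdf(-2)

print(f"Within 1 stdev:  {within_1_sd:.4f}")   # ~0.6827
print(f"Within 2 stdevs: {within_2_sd:.4f}")   # ~0.9545
```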
The first key point here is that, if your data follows a normal distribution, we only need the mean and the standard deviation to draw the whole curve, or, even better, to get the actual probability density at any particular value.
We can use the following formula:
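For a normal distribution with mean μ and standard deviation σ, the density at a value x is:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
$$

Here's that formula written out in Python as a small sketch, checked against scipy's built-in implementation (the 170 cm / 10 cm campus numbers are again just an assumption for illustration):

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Density at 180 cm for our hypothetical campus (mean 170 cm, stdev 10 cm)
print(normal_pdf(180, mu=170, sigma=10))   # ~0.0242
print(norm.pdf(180, loc=170, scale=10))    # same value from scipy
```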
Other distributions
Other distributions exist as well, such as the Poisson distribution, which looks a bit like a normal distribution pushed toward the left with a longer tail on the right; the Skellam distribution, which looks like a normal one but with a pronounced peak at the centre; and of course the uniform distribution, where every value occurs with the same frequency, so its plot is flat all the way across.
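As a quick sketch, you can sample from each of these with numpy/scipy and compare their shapes; the parameter values below are arbitrary choices, not anything canonical:

```python
import numpy as np
from scipy.stats import skellam

rng = np.random.default_rng(0)

samples = {
    "normal": rng.normal(loc=0, scale=1, size=10_000),
    "poisson": rng.poisson(lam=3, size=10_000),
    "skellam": skellam.rvs(5, 5, size=10_000, random_state=0),
    "uniform": rng.uniform(low=-3, high=3, size=10_000),
}

# Summary statistics give a rough feel for each distribution's shape
for name, data in samples.items():
    print(f"{name:>8}: mean={data.mean():6.2f}  std={data.std():.2f}")
```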
Most of the time the type of data distribution won't matter that much in our experiments, because of the central limit theorem.
Central limit theorem and normal distributions
In the study of probability theory, the central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution (also known as a “bell curve”), as the sample size becomes larger, assuming that all samples are identical in size, and regardless of the population distribution shape.
Said another way, the CLT is a statistical theory stating that, given a sufficiently large sample size from a population with a finite level of variance, the mean of all samples from the same population will be approximately equal to the mean of the population. Furthermore, the sample means will follow an approximately normal distribution, with their variance approximately equal to the variance of the population divided by each sample's size.
That comes in very handy when it comes to statistical analysis of very large populations, as we can grab smaller groups of random samples and infer knowledge about the whole population!
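Here's a minimal sketch of the CLT in action: we draw many samples from a decidedly non-normal (uniform) population, and the distribution of the sample means still ends up looking like a bell curve (the population size, sample size and number of samples are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)

# Non-normal population: uniform between 0 and 1
population = rng.uniform(low=0, high=1, size=100_000)

# Draw 5,000 samples of size 50 and record each sample's mean
sample_means = [rng.choice(population, size=50).mean() for _ in range(5_000)]

plt.hist(sample_means, bins=40, edgecolor="black")
plt.xlabel("Sample mean")
plt.ylabel("Frequency")
plt.title("Distribution of sample means (CLT in action)")
plt.show()

# The mean of the sample means is close to the population mean (~0.5),
# and their variance is close to the population variance divided by the sample size
print(np.mean(sample_means), np.var(sample_means), population.var() / 50)
```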
Some resources and references:
https://towardsdatascience.com/understanding-the-normal-distribution-with-python-e70bb855b027
https://www.investopedia.com/terms/c/central_limit_theorem.asp