First step of statistical analysis
Before starting any statistical analysis, the first step is to determine whether the data you are dealing with is population data or sample data.
A population includes every element of the entire group that you want to draw conclusions about. Its size is denoted by N.
A sample is a subset of the population. Its size is denoted by n.
To summarize, the sample is the group of individuals who actually participate in your study, and the population is the broader group to which your results will apply. In research, a population does not always refer to people; it can be any set of items, such as products, events, or measurements.
Example: Take the average weight of human beings. Say there are an estimated 8 billion human beings in the world. Computing the average is simple in principle: add up everyone's weight and divide by the count. But how do we get the weights of 8 billion people?
This is where probability and the concepts of population and sample help: instead of measuring everyone, we measure a randomly chosen sample and use it to estimate the population average.
In our case, the population is the weights of all people in the world. The sample is picked randomly, for example 10 people from India, 10 from the USA, 10 from Europe, and so on, in equal proportion.
We can pick any number of elements from the population for the sample. In general, as long as the elements are picked at random, choosing more of them makes the sample more representative, so the sample average tends to be closer to the population average and the estimate becomes more reliable.
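The effect of sample size can be sketched with a small simulation. This is only an illustration under stated assumptions: the population below is 100,000 synthetic weights drawn from a normal distribution (mean 62 kg, standard deviation 12 kg, both invented for the example), and `sample_mean` is a hypothetical helper name.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 simulated weights in kg (not real data).
population = [random.gauss(62, 12) for _ in range(100_000)]
population_mean = statistics.fmean(population)


def sample_mean(data, n):
    """Mean of a simple random sample of size n, drawn without replacement."""
    return statistics.fmean(random.sample(data, n))


# Compare the estimate with the true population mean for growing sample sizes.
for n in (10, 100, 1_000, 10_000):
    estimate = sample_mean(population, n)
    error = abs(estimate - population_mean)
    print(f"n={n:>6}: sample mean={estimate:.2f} kg, error={error:.2f} kg")
```

The error will not shrink on every single run, but on average larger samples land closer to the population mean.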
The most important thing to consider here is that, when taking samples from different countries, we take them in equal proportions. The process of selecting a sample from the population is called sampling.
Types of sampling techniques
1. Simple Random Sampling
2. Stratified Sampling
Simple Random Sampling
In this method, every data point is picked at random and has an equal chance of being selected. Which points have already been selected does not make any remaining point more or less likely than the others to be picked.
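A minimal sketch of simple random sampling, using Python's standard `random` module. The function name `simple_random_sample` and the list of 100 placeholder people are assumptions made up for the example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every element has the same chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical population of 100 placeholder individuals.
people = [f"person_{i}" for i in range(1, 101)]

chosen = simple_random_sample(people, 10, seed=1)
print(chosen)
```

Because `random.Random.sample` draws without replacement, no individual can appear twice in the sample.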
Stratified Sampling
In this method, the population is divided into groups (strata), for example different countries. In our example, we have the weights of human beings in different countries, so we select 10 from India, 10 from the USA, 10 from Europe, and so on, i.e. in equal proportions, choosing people at random within each group.
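The idea can be sketched as follows: group the records by region, then draw the same number at random from each group. Everything here is illustrative: `stratified_sample` is a hypothetical helper, and the records are synthetic weights generated with invented regional averages, not real measurements.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_group, seed=None):
    """Draw per_group records at random from each group defined by key(record)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)

    sample = []
    for name in sorted(groups):  # sorted so the output order is stable
        # Raises ValueError if a group has fewer than per_group records.
        sample.extend(rng.sample(groups[name], per_group))
    return sample


# Hypothetical data: 500 (region, weight in kg) records per region.
data_rng = random.Random(0)
records = [
    (region, round(data_rng.gauss(mean_kg, 10), 1))
    for region, mean_kg in (("India", 60), ("USA", 80), ("Europe", 72))
    for _ in range(500)
]

sample = stratified_sample(records, key=lambda r: r[0], per_group=10, seed=1)
print(len(sample))  # 3 regions x 10 records each
```

This sketch takes the same number from every group, as in the example above. Another common design is to make each group's share of the sample match its share of the population.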
To summarize, if there are no sub-groups in a population, simple random sampling is a good choice. If there are sub-groups, stratified sampling is a good choice, and it usually gives more precise estimates when the sub-groups differ from one another.