In this article, we explore the methods used to measure the central tendency of data and their importance in data science.
What are measures of central tendency?
These are single numerical values that represent the center, or typical value, of a dataset; they are often loosely called averages. They are useful for summarizing a dataset with one representative number.
They describe the value around which the observations of a numerical variable tend to cluster.
The commonly used measures are:
- The Mean
Also known as the average, the mean is computed by dividing the sum of the values by the number of values that were summed.
The mean is one way of finding the most typical value in a set of data. It uses every value in the sample, so outliers can pull it away from where most of the data lie.
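A minimal plain-Python sketch of this definition follows; the function name `mean` and the sample scores are illustrative assumptions, not taken from any particular library or dataset. The second call adds one extreme value to show how it drags the mean upward.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty sequence is undefined")
    return sum(values) / len(values)


scores = [4, 5, 5, 6, 5]          # hypothetical sample
with_outlier = scores + [45]      # same sample plus one extreme value

print(mean(scores))        # 5.0
print(mean(with_outlier))  # 70 / 6, roughly 11.67
```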
- The Median
This is the middle value in a sorted set of values.
When the number of data values is even, there is no single middle value, so the median is computed as the mean of the two middle values.
The median splits the ordered values into two halves with an equal number of values. It is a good choice for datasets with outliers, since it is much less sensitive to extreme values than the mean.
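The same idea can be sketched in plain Python, covering both the odd and even cases. The helper name `median` and the inputs are hypothetical; the last call uses a small dataset with one outlier (45) to show that the median stays at the center of the data.

```python
def median(values):
    """Middle value of the sorted data; for an even count, the mean of the two middle values."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single natural middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))             # 3
print(median([7, 1, 3, 10]))         # (3 + 7) / 2 = 5.0
print(median([4, 5, 5, 6, 5, 45]))   # 5.0, the outlier barely matters
```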
- The Mode
It is the value that appears most frequently in a distribution. A dataset can have more than one mode.
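A minimal sketch using `collections.Counter` from the standard library; the helper name `modes` is hypothetical. It returns every value tied for the highest count instead of picking just one, and because it only counts occurrences, it also works on non-numeric data such as clothing sizes.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often (a dataset can have several modes)."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty sequence is undefined")
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([2, 3, 3, 5, 3, 8]))    # [3]
print(modes([1, 1, 2, 2, 9]))       # [1, 2], a bimodal dataset
print(modes(["M", "L", "M", "S"]))  # ['M']
```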
Example using Python
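A short sketch using Python's built-in `statistics` module (`mean`, `median`, and `multimode`, the last available since Python 3.8). The house prices, in thousands, are made-up values chosen for illustration.

```python
import statistics

# Hypothetical sample: house prices in thousands (illustrative values only).
prices = [120, 135, 135, 150, 160, 175, 900]

print("mean:  ", statistics.mean(prices))       # pulled up by the 900 listing
print("median:", statistics.median(prices))     # middle of the 7 sorted prices
print("modes: ", statistics.multimode(prices))  # all most-frequent values
```

The single 900 listing lifts the mean to about 253.6 while the median stays at 150, which is why the median is often preferred for skewed data such as prices.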
Importance of measures of central tendency in the field of data science
• Summarize large datasets with a single representative value, making them easier to understand.
• Detect outliers and anomalies, which can point to errors in the data that would otherwise distort an assessment.
• Compare datasets or time periods and communicate their differences and similarities, supporting better decision making.
• Draw inductive inferences: statistics computed from a sample are used to draw conclusions about the corresponding values in the larger population.
• Make predictions based on the typical or expected outcome. For example, in real estate, analyzing sales trends over a given time period can show which region customers prefer most.
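As a rough illustration of spotting anomalies and comparing time periods, the sketch below computes the mean and median of two weeks of daily sales. The figures are hypothetical, and the mean-versus-median gap is only a quick heuristic for noticing skew or outliers, not a formal outlier test.

```python
import statistics

# Hypothetical daily sales counts for two weeks (illustrative data only).
week_1 = [42, 38, 45, 40, 44, 39, 41]
week_2 = [43, 40, 47, 41, 46, 250, 44]

for label, sales in (("week 1", week_1), ("week 2", week_2)):
    avg = statistics.mean(sales)
    mid = statistics.median(sales)
    # A mean far above the median hints that a few extreme values are pulling it up.
    print(f"{label}: mean={avg:.1f} median={mid} gap={avg - mid:.1f}")
```

Week 1 shows almost no gap, while the single 250 in week 2 opens a gap of 29, flagging that day for a closer look.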
Applications
• A clothing store stocking the most commonly purchased sizes (the mode).
• Companies evaluating their average employee salary (the mean).
• Insurance providers evaluating the median age of their customers.
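Each of these applications maps onto one of the measures. The sketch below uses the standard `statistics` module on small, made-up records; the variable names and values are assumptions for illustration only.

```python
import statistics

# Hypothetical records; all values are made up for illustration.
sizes_sold = ["M", "L", "M", "S", "M", "XL", "L"]
salaries = [48_000, 52_000, 55_000, 61_000, 64_000]
customer_ages = [23, 31, 35, 42, 47, 58]

# Clothing store: the mode works on categorical data such as sizes.
print("most common size:", statistics.mode(sizes_sold))

# Employer: mean (average) salary.
print("mean salary:", statistics.mean(salaries))

# Insurer: median age; with an even count it averages the two middle ages.
print("median age:", statistics.median(customer_ages))
```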