In this scenario, we have a dataframe with several attributes, and we want the means of some of those attributes grouped by one or two key attributes.
For example, we might want the mean height in a population of males and females across different age groups.
Bear in mind that my dataframe is called population_df and has attributes like (for example) weight, height, and BMI, plus age and gender, which we will use to split the data during analysis.
Importing and Parsing
import pandas as pd
population_df = pd.read_csv('investigation_data.csv')
To view means relative to the age of the person:
population_df.groupby('age').mean(numeric_only=True)  # skip non-numeric columns such as gender
This shows the mean of each numeric attribute (weight, height, BMI) for all samples that share a given age. The age values form the index, which appears as the first column of the output.
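To make the single-column grouping concrete, here is a minimal sketch on a tiny made-up table. The name sample_df and all of its values are invented for illustration and stand in for the real CSV; the column names follow the attributes mentioned earlier.

```python
import pandas as pd

# Made-up sample data standing in for investigation_data.csv
sample_df = pd.DataFrame({
    "age": [25, 25, 30, 30],
    "gender": ["F", "M", "F", "M"],
    "height": [160.0, 180.0, 165.0, 175.0],
    "weight": [55.0, 80.0, 60.0, 78.0],
})

# numeric_only=True skips the text column 'gender', which cannot be averaged
means_by_age = sample_df.groupby("age").mean(numeric_only=True)
print(means_by_age)

# To get only one attribute, select it before taking the mean
height_by_age = sample_df.groupby("age")["height"].mean()
print(height_by_age)
```

Selecting the column first, as in the last step, is handy when only the mean height is needed rather than the mean of every attribute.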
To view means relative to age and then gender, pass a list of column names to groupby:
population_df.groupby(['age', 'gender']).mean()
This shows the mean of each attribute for every combination of age and gender, grouped first by age and then by gender.
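Grouping by two columns returns a result indexed by both of them (a MultiIndex), which can be a little awkward at first. The sketch below shows a few ways of reading such a result. As before, sample_df and its values are made up for illustration.

```python
import pandas as pd

# Made-up sample data; only the column names follow the article
sample_df = pd.DataFrame({
    "age": [25, 25, 25, 30, 30, 30],
    "gender": ["F", "F", "M", "F", "M", "M"],
    "height": [160.0, 164.0, 180.0, 165.0, 174.0, 176.0],
})

# One row per (age, gender) combination
means = sample_df.groupby(["age", "gender"]).mean()

# Look up a single group with an (age, gender) tuple
height_25_f = means.loc[(25, "F"), "height"]
print(height_25_f)

# Turn the two index levels back into ordinary columns
flat = means.reset_index()
print(flat)

# Or pivot gender into columns to compare groups side by side
wide = means["height"].unstack("gender")
print(wide)
```

Which form is best depends on what comes next: the flat table is easier to export, while the wide table makes the comparison between genders at each age easier to read.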