Using GroupBy to Investigate Data from a Certain Scope According to One or More Specific Attributes

#datascience #analytics #python #pandas

In this scenario, we have a dataframe made up of multiple attributes, and we want the means of some of those attributes grouped by one or two main attributes.

For example, suppose we want the mean height in a population of males and females across different age groups.

Bear in mind that my dataframe is called population_df and has attributes such as weight, height, and BMI, plus age and gender, which we will use to split the data during analysis.

Importing and Parsing

import pandas as pd
population_df = pd.read_csv('investigation_data.csv')

To view means relative to the age of the person:

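population_df.groupby('age').mean(numeric_only=True)

This returns one row per distinct age, with age as the index and every other numeric column (weight, height, BMI) holding the mean for the samples of that age. Passing numeric_only=True leaves out text columns such as gender, which recent pandas versions would otherwise refuse to average.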
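As a small self-contained sketch of the single-column grouping, the snippet below builds a tiny dataframe in memory instead of reading investigation_data.csv. The rows and values are made up purely for illustration; only the column names follow the ones described in this post.

```python
import pandas as pd

# Hypothetical sample rows standing in for investigation_data.csv
population_df = pd.DataFrame({
    "age": [25, 25, 30, 30],
    "gender": ["male", "female", "male", "female"],
    "height": [180.0, 165.0, 178.0, 162.0],
    "weight": [80.0, 60.0, 82.0, 58.0],
})

# One row per distinct age; numeric_only=True skips the text column 'gender'
means_by_age = population_df.groupby("age").mean(numeric_only=True)
print(means_by_age)
```

The result is indexed by age, so a single group can be looked up with means_by_age.loc[25].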

To view means relative to the age and then relative to the gender:

To group by more than one column, pass a list of column names to groupby:

population_df.groupby(['age', 'gender']).mean()

This shows the mean of each remaining attribute for every combination of age and gender, with age as the outer level of the index and gender as the inner level.
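For the mean-height example from the start of the post, one possible approach is to select the height column after grouping. The sketch below again uses made-up rows rather than the real CSV, and the names mean_height, height_table, and flat are just illustrative.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the post
population_df = pd.DataFrame({
    "age": [25, 25, 25, 30, 30, 30],
    "gender": ["male", "male", "female", "male", "female", "female"],
    "height": [180.0, 176.0, 165.0, 178.0, 162.0, 166.0],
})

# Mean height for every (age, gender) pair; the result is a Series
# with a two-level index: age (outer) and gender (inner)
mean_height = population_df.groupby(["age", "gender"])["height"].mean()

# Pivot gender into columns to get an age-by-gender table
height_table = mean_height.unstack("gender")

# Or turn the index levels back into ordinary columns
flat = mean_height.reset_index()

print(height_table)
```

Selecting a single column before calling mean() keeps the output focused on the attribute of interest, and unstack makes the two genders easy to compare side by side for each age.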
