Pandas - Aggregating and Grouping Data - 7 Days of Pandas

Welcome to the fifth article in the "7 Days of Pandas" series where we cover the pandas library in Python which is used for data manipulation.

In the first article of the series, we looked at how to read and write CSV files with Pandas. In this tutorial, we will look at some of the most common operations that we perform on a dataframe in Pandas.

In the second article, we looked at how to perform basic data manipulation.

In the third article, we looked at how to perform EDA (exploratory data analysis) with Pandas.

In the fourth article, we looked at how to handle missing values in a dataframe.

In this tutorial, we will look aggregate and group data in Pandas.

Aggregating and grouping data is a common task when working with datasets, and pandas provides a range of functions and methods to help you do this efficiently.

In this tutorial, we will cover the following topics:

Applying aggregate functions to pandas dataframe.
Grouping data in pandas dataframe (and applying aggregate functions to the grouped data).

Before we begin, let's first import pandas and create a sample dataframe that we will be using throughout this tutorial.

import pandas as pd

# employee data
data = {
    "Name": ["Tim", "Shaym", "Noor", "Esha", "Sam", "James", "Lily"],
    "Gender": ["M", "M", "F", "F", "M", "M", "F"],
    "Age": [26, 28, 27, 32, 24, 31, 33],
    "Department": ["Marketing", "Product", "Product", "HR", "Product", "HR", "Marketing"],
    "Salary": [60000, 70000, 82000, 55000, 58000, 55000, 65000]
}

# create pandas dataframe
df = pd.DataFrame(data)

# display the dataframe
df

	Name	Gender	Age	Department	Salary
0	Tim	M	26	Marketing	60000
1	Shaym	M	28	Product	70000
2	Noor	F	27	Product	82000
3	Esha	F	32	HR	55000
4	Sam	M	24	Product	58000
5	James	M	31	HR	55000
6	Lily	F	33	Marketing	65000

Applying aggregate functions

Pandas comes with a number of aggregate functions that you can apply to the entire dataframe or one or more columns in the dataframe.

For example, you can apply the sum() funciton to get the sum of values in each column.

df.sum()

Name                         TimShaymNoorEshaSamJamesLily
Gender                                            MMFFMMF
Age                                                   201
Department    MarketingProductProductHRProductHRMarketing
Salary                                             445000
dtype: object

Note that for object type sum() resulted in a concatenated string.

You can select which columns to apply the aggregate functions to.

For example, let's the mean value of the "Age" and the "Salary" columns.

df[["Age", "Salary"]].mean()

Age          28.714286
Salary    63571.428571
dtype: float64

Grouping data

To group a pandas DataFrame by one or more columns, you can use the pandas dataframe groupby() method. This method takes one or more column names as arguments and returns a groupby object that can be used to apply various operations to the grouped data.

Let's group the above data on the "Gender" column.

df.groupby("Gender")

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x10684b970>

We get a groupby object. You can now use this object to apply aggregations to the grouped data. For example, let's get the average "Age" value for each group.

df.groupby("Gender")["Age"].mean()

Gender
F    30.666667
M    27.250000
Name: Age, dtype: float64

We get the mean value of the "Age" column for each group (here, "Gender") in the data.

You can group the data of more than one columns as well. For example, let's group the data on "Gender" and "Department" and get the average "Age" in each group.

df.groupby(["Gender", "Department"])["Age"].mean()

Gender  Department
F       HR            32.0
        Marketing     33.0
        Product       27.0
M       HR            31.0
        Marketing     26.0
        Product       26.0
Name: Age, dtype: float64

You can also apply multiple aggregate functions to the grouped data using the .agg() function.

df.groupby(["Gender", "Department"]).agg(['mean', 'count'])

		Age		Salary
		mean	count	mean	count
Gender	Department
F	HR	32.0	1	55000.0	1
	Marketing	33.0	1	65000.0	1
	Product	27.0	1	82000.0	1
M	HR	31.0	1	55000.0	1
	Marketing	26.0	1	60000.0	1
	Product	26.0	2	64000.0	2

DEV Community

Pandas - Aggregating and Grouping Data - 7 Days of Pandas

Applying aggregate functions

Grouping data

Top comments (0)

Read next

Mastering the Spread Operator for Objects and Arrays in JavaScript

Can English Replace Java? The Future of Programming in Plain Language

Understanding Default Parameters in JavaScript Functions

Similarities of the CodeBehind Framework with the Microsoft WebForms