DEV Community

chanduthedev


Category type in pandas

The Python pandas library supports a data type called Category. When a dataframe column holds a limited set of repeated values, storing it as Category can reduce memory use and speed up processing. Let's look at the Category data type.

What is the Category data type in pandas?

  • Category is a data type for columns that take a fixed, limited number of string values, such as
    • Months (Jan, Feb)
    • Country names (India, Singapore)
    • Sizes (Small, Medium, Large)
  • Put simply, each distinct string is replaced by an integer code (conceptually Jan - 1, Feb - 2, and so on), and the strings themselves are stored only once.
  • Categories are similar to enum types in other programming languages such as C/C++ and Java.
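
To make the integer mapping concrete, here is a minimal sketch that builds a small categorical Series and inspects it through the `.cat` accessor. The `months` data is made up for illustration. Note that the codes pandas actually assigns start at 0 and, for categories inferred from strings, follow the sorted order of the labels rather than calendar order.

```python
import pandas as pd

# Hypothetical sample data, not taken from the accidents dataset
values = ["Jan", "Feb", "Jan", "Mar", "Feb", "Jan"]
months = pd.Series(values, dtype="category")

# The distinct labels are stored once
categories = list(months.cat.categories)

# Each row holds a small integer code that indexes into the categories
codes = months.cat.codes.tolist()

print(categories)  # ['Feb', 'Jan', 'Mar']
print(codes)       # [1, 0, 1, 2, 0, 1]
```

Looking up each code in the categories gives back the original strings, which is why no information is lost in the conversion.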

Advantages of using Category:

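  1. Saves a lot of memory, because each distinct string is stored once and every row holds only a small integer code
  2. Can increase processing speed, since operations work on integer codes instead of strings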

How to use Category in pandas dataframe:

- While reading the CSV file:

We can convert columns from object to category while reading the file, as shown below:

import pandas as pd

filename = "~/Downloads/US_Accidents_Dec20.csv"
# Convert the State and City columns to the category data type while reading the CSV file
us_accidents_dec20_cat = pd.read_csv(filename, dtype={'State': 'category', 'City': 'category'})
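
If you don't have the dataset at hand, the same `dtype` argument can be tried on an in-memory CSV. This is only a sketch: the `csv_text` contents and the `Severity` column are made-up stand-ins, not rows from the real file.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the real file
csv_text = (
    "State,City,Severity\n"
    "CA,Los Angeles,2\n"
    "TX,Austin,3\n"
    "CA,San Diego,2\n"
)

# Ask read_csv to parse State and City directly as category columns
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"State": "category", "City": "category"},
)

state_dtype = str(df["State"].dtype)
city_dtype = str(df["City"].dtype)
print(state_dtype, city_dtype)
```

Columns not listed in `dtype` (here `Severity`) keep whatever type pandas infers for them.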
- Converting column into category type:

We can also convert a column after the data frame has been loaded, as shown below:

# Load the CSV file into a data frame
filename = "~/Downloads/US_Accidents_Dec20.csv"
us_accidents_dec20 = pd.read_csv(filename)

# Normal column access
us_accidents_dec20['State']

# Convert to the category data type (returns a new Series; the data frame itself is unchanged)
us_accidents_dec20['State'].astype('category')
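
Because `astype` returns a new Series, the conversion only sticks if the result is assigned back to the column. A minimal sketch, using a small made-up data frame `df` in place of the accidents data:

```python
import pandas as pd

# Hypothetical sample data standing in for the accidents data frame
df = pd.DataFrame({"State": ["CA", "TX", "CA", "NY"]})

# Calling astype alone does not modify df; assign the result back to keep it
df["State"] = df["State"].astype("category")

after = str(df["State"].dtype)
print(after)  # category
```

The values in the column are unchanged; only the way they are stored differs.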

Memory comparison between Object vs Category data types:

  • Normal object column:
us_accidents_dec20['State'].memory_usage(deep=True) / 1e6

Result (in MB):
249.720047

  • Category column:
us_accidents_dec20['State'].astype('category').memory_usage(deep=True) / 1e6

Result (in MB):
4.23684

The memory used by the column drops from about 249.7 MB to about 4.2 MB, a reduction of roughly 59 times.
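
The effect can be reproduced without downloading the dataset. The sketch below uses a made-up column of five state codes repeated 20,000 times; the absolute numbers will differ from the results above and may vary by pandas version, so no expected output is shown.

```python
import pandas as pd

# Hypothetical low-cardinality column: 5 distinct values, 100,000 rows
col = pd.Series(["CA", "TX", "NY", "FL", "WA"] * 20_000, dtype=object)

# memory_usage returns bytes; dividing by 1e6 gives megabytes
object_mb = col.memory_usage(deep=True) / 1e6
category_mb = col.astype("category").memory_usage(deep=True) / 1e6

print(f"object:   {object_mb:.2f} MB")
print(f"category: {category_mb:.2f} MB")
```

Passing `deep=True` matters here: without it, pandas counts only the pointers in an object column, not the Python strings they refer to.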

Converting to the Category data type reduces memory use and can improve processing speed on large datasets, provided the column has relatively few distinct values.

Happy Learning!!

P.S.: This post uses the US accidents data (the Dec 2020 file, US_Accidents_Dec20.csv), which you can get from Kaggle.
