How to handle categorical variables with a large number of unique values

#preprocessing

I have a payment dataset with about 10,000 observations. It contains DateTime fields, an expense category field, a payee name field, and amount fields. I want to identify anomalous payments in this dataset.

The expense category and payee name fields have more than 100 and 400 unique values, respectively.
Is it therefore appropriate to create dummy variables for these categorical fields?
Or should I instead split the dataset into subsets, one per category value (e.g. a dataset containing only travelling expenses)?
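
To make the two options concrete, here is a minimal sketch on toy data. The column names (`expense_category`, `payee`, `amount`) and all values are made up for illustration and are not from my real dataset; it assumes pandas is available.

```python
import pandas as pd

# Toy stand-in for the payment data; column names and values are hypothetical.
payments = pd.DataFrame({
    "expense_category": ["Travel", "Travel", "Office", "Travel", "Office", "Meals"],
    "payee": ["Airline A", "Hotel B", "Stationer C", "Airline A", "Stationer C", "Cafe D"],
    "amount": [420.0, 180.0, 35.0, 4100.0, 40.0, 12.5],
})

# Option 1: dummy (one-hot) variables, one new column per unique value.
dummies = pd.get_dummies(payments, columns=["expense_category", "payee"])
n_dummy_columns = dummies.shape[1] - 1  # every column except "amount"

# Option 2: one subset of the data per expense category.
subsets = {name: group for name, group in payments.groupby("expense_category")}
travel = subsets["Travel"]  # only the travelling-expense rows
```

On the real data, option 1 would add more than 500 mostly-zero columns to roughly 10,000 rows, while option 2 would presumably leave some categories with very few rows each.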

I would appreciate any input.
