This week I decided to participate in a hackathon hosted by analyticsvidhya.com. Here is the link: https://datahack.analyticsvidhya.com/contest/black-friday/#About. The basic premise of the problem is to predict how much money a customer is going to spend this Black Friday based on features such as age, occupation, gender, and marital status.
Initial Thoughts
My first thought on looking at this data set is that I will probably need a linear regression model, given the type of information I have: most of the columns hold numerical values. There are approximately 170k nulls in the Product Category 2 column, 380k in the Product Category 3 column, and none in the other columns. For my first run-through, I always drop the columns with a large number of nulls. Next I look for relationships between the columns and the target: for example, do women spend more than men, or do married individuals spend more than unmarried ones?
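Here is a minimal sketch of that first pass in pandas. It is not the exact code I ran: the column names (`Gender`, `Marital_Status`, `Product_Category_2`, `Product_Category_3`, `Purchase`) are assumed, the six rows are made-up toy values standing in for the real training file, and the 0.3 null-share cutoff is an arbitrary choice for illustration.

```python
import pandas as pd

# Toy stand-in for the training data. Values are invented for illustration
# and the column names are assumptions, not taken from the contest files.
df = pd.DataFrame({
    "Gender": ["F", "M", "M", "F", "M", "F"],
    "Marital_Status": [0, 1, 0, 1, 1, 0],
    "Product_Category_1": [3, 1, 12, 8, 5, 1],
    "Product_Category_2": [None, 6.0, None, 14.0, None, 2.0],
    "Product_Category_3": [None, None, None, 17.0, None, None],
    "Purchase": [8000, 15000, 1400, 7900, 5300, 12000],
})

# Step 1: count the nulls in each column.
null_counts = df.isnull().sum()

# Step 2: drop columns whose share of nulls exceeds a cutoff.
# The 0.3 threshold is an assumption; tune it to the data.
null_share = df.isnull().mean()
sparse_cols = null_share[null_share > 0.3].index.tolist()
df_clean = df.drop(columns=sparse_cols)

# Step 3: compare the average target value across groups.
mean_by_gender = df_clean.groupby("Gender")["Purchase"].mean()
mean_by_marital = df_clean.groupby("Marital_Status")["Purchase"].mean()

print(null_counts)
print(sparse_cols)
print(mean_by_gender)
print(mean_by_marital)
```

On the real data you would replace the toy frame with something like `pd.read_csv(...)` on the downloaded training file. Group means are only a quick first look; they do not control for the other columns.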