Marjan Ferdousi

# Understanding The Problem

Imagine you’re a cat who is obsessed with potato chips and has no idea what data science is. You have a hooman friend who has a lot of data but is too lazy to do anything with it. You love potato chips so much that one day you decide to launch your own brand of tuna flavoured potato chips. You’re not sure whether the hoomans will like your tuna flavoured chips, how you should set the price, or what the demand will be in the future. So you’ve called your hooman friend for some advice, because he has a lot of data on this, and data can do magic.

Now you’ve started thinking about your other questions. How can you get a basic idea of the price? You start checking your data, where you see that a 16oz pack of Hay’s onion and sour cream flavoured chips costs \$3.66, and an 8oz pack of Tingles tomato salsa flavoured chips costs \$2. You notice that the data holds various information about each pack of chips, such as packet size, flavour and ingredients, and that prices are not limited to \$3.66 or \$2. Depending on features like size or ingredients, the price varies within a range. For example, if the first 5 samples of chips are priced at \$2.19, \$4.10, \$3.50, \$2.20 and \$2.50, nothing says the price of the 6th sample has to be one of these exact values. It could be \$1.99 or \$4.50, depending on how complex the flavour profile is and how big the pack is. You make a mental note that your hooman friend calls this kind of problem, where the answer is a number that can fall anywhere in a range, a REGRESSION problem.
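To make the idea a little more concrete, here is a minimal sketch of a regression in plain Python. It uses only the two prices mentioned above (16oz for \$3.66 and 8oz for \$2) and only one feature, the pack size. The helper names `fit_line` and `predict_price` and the 12oz pack are made up for illustration, and two samples are far too few for a real model, so treat the result as a toy and not as actual pricing advice.

```python
# Toy data taken from the two packs mentioned in the post:
# (pack size in oz, price in dollars). Flavour is ignored here.
sizes = [16.0, 8.0]
prices = [3.66, 2.00]


def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)


def predict_price(size_oz):
    """Predict a price for a pack size using the fitted straight line."""
    return intercept + slope * size_oz


# Hypothetical 12oz pack, a size that does not appear in the toy data.
predicted = predict_price(12.0)
print(f"Predicted price for a 12oz pack: ${predicted:.2f}")
```

The point is that the prediction is a number the model has never seen as a price before, which is exactly what separates regression from picking one of a few fixed labels.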

Hearing you meowing enthusiastically, your hooman friend decides to explain a special type of regression to you. He calls it TIME SERIES regression: you try to predict future values of something from its past values, ordered by time. You suddenly realize that your third problem is a time series problem, where you’re trying to predict the demand for potato chips next month using the demand data of this month, the previous month and so on. In other words, next month’s sales can be predicted from the sales records of this month and the months before it. You haven’t understood all the details of this kind of regression yet, but the hooman says he will explain it later.
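As a rough preview of what "predicting next month from this month" could look like, the sketch below pairs each month's sales with the following month's sales and fits a straight line through those pairs. The sales numbers are invented purely for illustration (they are not from the hooman's data), and `fit_line` and `forecast` are hypothetical names. Real time series work usually needs more care than this, which is presumably what the hooman will explain later.

```python
# Invented monthly sales figures, oldest first. Not real data.
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0]

# Build (this month, next month) pairs: the past value is the feature,
# the value one month later is the target.
this_month = monthly_sales[:-1]
next_month = monthly_sales[1:]


def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(this_month, next_month)

# Forecast the month after the last recorded one from the latest value.
forecast = intercept + slope * monthly_sales[-1]
print(f"Forecast for next month: {forecast:.1f} packs")
```

Because the made-up sales grow by the same amount every month, the fitted line simply learns "next month is this month plus ten". Real demand is rarely that tidy.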

Now the hooman thinks you are ready to start some real work with all this data. He believes you have understood how to identify your questions and which approach to take for each of your problems.

Marian

That was purr-fect, looking forward to the next part.

Marjan Ferdousi

Nyan~ The next part will come soon!

Azalea

As a cat person, this makes so much more sense. For real.

Matt Curcio

Great Job!
LOVE D.S. and Cats ;))

Miguel Manjarres

I liked the problem setup, it's a nice hook to the post