The "frame the question" approach
The idea is this: you are given a dataset, but more often than not it is not in a state that you can feed directly into a machine learning pipeline. You need to get it as close to that ideal state as possible. The way to do this is to ask questions of your dataset. Each question you answer transforms the dataset into something that can be used to get results. Which questions you ask depends on the results you want, that is, on your end goal.
Also, note that the questions don't necessarily have to come first. More often than not, exploring the data and combining different operations leads to a result that answers a question you had not yet asked. So it is hard to tell whether the chicken or the egg came first.
Eventually, the end goal is to make a dataset that is much more useful in the context of the problem that is set out to be solved.
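As a rough illustration of asking a question of a dataset, here is a minimal sketch with pandas. The rows and the column names (Name, Sport, Year, Medal) are made up for the example and are only an assumption about what Olympic data might look like. The question asked is: how many medals were won per sport in 2016?

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the real dataset.
athletes = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Sport": ["Swimming", "Swimming", "Judo", "Judo"],
        "Year": [2012, 2016, 2016, 2016],
        "Medal": ["Gold", None, "Silver", "Gold"],
    }
)

# Question: how many medals were won per sport in 2016?
# Step 1: keep only the 2016 rows where a medal was actually won.
medals_2016 = athletes[(athletes["Year"] == 2016) & athletes["Medal"].notna()]

# Step 2: count the remaining rows for each sport.
medals_per_sport = medals_2016.groupby("Sport").size()

print(medals_per_sport)
```

The answer is itself a smaller, more focused dataset, which is the point of the approach: each question reshapes the data a little closer to what the end goal needs.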
Dataset
The dataset used here is "120 years of Olympic history: athletes and results". Download this dataset if you want to follow along.
What will be covered?
- Read and select data
- Summary statistics
- Group and sort data
- Create DataFrame
- Combine data (merge, concat)
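To give a feel for the topics listed above, here is a small sketch that touches each of them once. It is not the code from the full blog: the CSV text, the column names, and the `regions` and `extra` tables are hypothetical stand-ins, so that the example runs without downloading anything.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; rows and column names are made up.
csv_text = """Name,Team,Sport,Age,Medal
A,Norway,Skiing,24,Gold
B,Norway,Skiing,31,
C,Japan,Judo,27,Silver
D,Japan,Judo,22,Gold
"""

# Read and select data: load the CSV, then pick rows and columns.
df = pd.read_csv(io.StringIO(csv_text))
judo = df.loc[df["Sport"] == "Judo", ["Name", "Age"]]

# Summary statistics: describe() gives an overview, mean() a single value.
age_summary = df["Age"].describe()
mean_age = df["Age"].mean()

# Group and sort data: medals per team, largest count first.
medals_by_team = (
    df.dropna(subset=["Medal"])
    .groupby("Team")["Medal"]
    .count()
    .sort_values(ascending=False)
)

# Create DataFrame: build a small lookup table from a dict.
regions = pd.DataFrame(
    {"Team": ["Norway", "Japan"], "Continent": ["Europe", "Asia"]}
)

# Combine data: merge adds columns by key, concat stacks rows.
merged = df.merge(regions, on="Team", how="left")
extra = pd.DataFrame(
    {"Name": ["E"], "Team": ["Japan"], "Sport": ["Judo"], "Age": [29], "Medal": ["Bronze"]}
)
stacked = pd.concat([df, extra], ignore_index=True)

print(medals_by_team)
print(merged.head())
```

With the real dataset you would pass the path of the downloaded CSV file to `pd.read_csv` instead of the in-memory text used here.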
Prerequisites
- Basic Python knowledge
- Jupyter notebook setup
- Install pandas (pip install pandas)
Check out the full blog here: Picture Pandas - Little Intro guide (Olympics data)