WanjohiChristopher

3. Data Preparation

For an introduction to statistics, check here.
For an introduction to mathematics for data science, check here.
In this section we perform:

1. Data cleaning (wrangling), e.g. filling missing values, dropping duplicates, standardizing columns
2. Data analysis (univariate and bivariate analysis)
3. Data preprocessing: encoding variables, then scaling and normalizing our data (scaling using StandardScaler and normalizing using MinMaxScaler)
4. Checking for outliers
5. Binning our features
6. Feature engineering
7. Checking correlation
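
Several of the steps above can be sketched in a few lines with pandas and scikit-learn. This is only a minimal illustration, not the notebook's actual code: the toy data, the column names, the bin edges, the 1.5 * IQR outlier rule and the `income_per_age` feature are all assumptions made up for the example.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data; column names and values are made up for illustration.
df = pd.DataFrame({
    " Age ": [25, 32, None, 32, 47],
    "Income": [30000, 52000, 41000, 52000, 150000],
    "City": ["Nairobi", "Mombasa", "Nairobi", "Mombasa", "Kisumu"],
})

# 1. Cleaning: standardize column names, drop duplicates, fill missing values.
df.columns = df.columns.str.strip().str.lower()
df = df.drop_duplicates().reset_index(drop=True)
df["age"] = df["age"].fillna(df["age"].median())

# 4. Outliers: flag incomes outside 1.5 * IQR (one common rule of thumb).
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["income"] < lower) | (df["income"] > upper)]

# 5. Binning: group ages into labelled ranges (bin edges are arbitrary here).
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 40, 100],
                         labels=["young", "mid", "senior"])

# 6. Feature engineering: a hypothetical derived ratio feature.
df["income_per_age"] = df["income"] / df["age"]

# 3. Preprocessing: one-hot encode the categorical column.
encoded = pd.get_dummies(df, columns=["city"])

# 3. Scaling (zero mean, unit variance) and normalizing (0 to 1 range).
num_cols = ["age", "income"]
standardized = StandardScaler().fit_transform(df[num_cols])
normalized = MinMaxScaler().fit_transform(df[num_cols])

# 7. Correlation between the numeric columns.
corr = df[num_cols].corr()
print(corr)
```

On real data, fit the scalers on the training split only and reuse them on the test split, so that no information leaks from the test set.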
Visualizing our features using the ggplot style in matplotlib
Visualizing using percentages on our bar charts
We can also filter columns that have the same data type
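
As a rough sketch of these three points, the snippet below switches matplotlib to its built-in `ggplot` style, writes the percentage on top of each bar, and splits a frame by data type with `select_dtypes`. The data and column names are hypothetical, and the `Agg` backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

plt.style.use("ggplot")  # ggplot look for every figure created afterwards

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "city": ["Nairobi", "Mombasa", "Nairobi", "Kisumu", "Nairobi"],
    "age": [25, 32, 41, 47, 29],
    "income": [30000.0, 52000.0, 41000.0, 150000.0, 38000.0],
})

# Filter columns that share a data type.
numeric_df = df.select_dtypes(include="number")
other_df = df.select_dtypes(exclude="number")

# Share of each category as a percentage.
shares = df["city"].value_counts(normalize=True) * 100

fig, ax = plt.subplots()
bars = ax.bar(list(shares.index), shares.values)

# Write the percentage just above each bar.
for bar, pct in zip(bars, shares.values):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
            f"{pct:.1f}%", ha="center", va="bottom")

ax.set_ylabel("Share of rows (%)")
ax.set_title("Rows per city")
```

Call `fig.savefig(...)` or, in a notebook, simply display `fig` to see the chart.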
Checking for imbalanced datasets using visuals, and using SMOTE or NearMiss to balance them, especially in classification problems. Accuracy is not a good metric for imbalanced datasets, because a model that always predicts the majority class can still score highly.
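
To see why accuracy misleads here, consider a small sketch with made-up labels (95 negatives, 5 positives) and a "model" that always predicts the majority class. The numbers are hypothetical and chosen only to make the effect obvious.

```python
from collections import Counter

from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical labels: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5

# A dummy "model" that always predicts the majority class.
y_pred = [0] * 100

# Check the class balance before trusting any metric.
class_counts = Counter(y_true)

accuracy = accuracy_score(y_true, y_pred)
# zero_division=0 avoids a warning when no positives are predicted.
recall = recall_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)

print(class_counts)
print(f"accuracy={accuracy:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Accuracy comes out high even though not a single positive case is found, which is why recall, precision or F1 are usually reported instead. For the balancing itself, the imbalanced-learn package provides `SMOTE` (over-sampling) and `NearMiss` (under-sampling), both used through `fit_resample(X, y)`; resample the training split only.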
Performing feature selection and checking feature importance. Check here:

https://www.kaggle.com/swet44/notebook1c
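
For a quick idea of what feature selection and importance can look like, here is a minimal sketch on synthetic data. It is not taken from the notebook above: the dataset comes from `make_classification`, and the choice of `SelectKBest` with `f_classif` and of a random forest for importances is just one possible combination.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 6 features, of which only 2 carry signal.
X, y = make_classification(n_samples=200, n_features=6, n_informative=2,
                           n_redundant=0, random_state=0)

# Feature selection: keep the 2 features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
selected_idx = selector.get_support(indices=True)

# Feature importance: impurity-based importances from a random forest.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_

print("selected feature indices:", selected_idx)
print("forest importances:", importances.round(3))
```

The importances sum to 1, so each value can be read as that feature's share of the forest's total impurity reduction.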

Cheers!! Happy learning!
