DEV Community



3. Data Preparation

For an introduction to statistics, check here.
For an introduction to mathematics for data science, check here.
In this section we perform cleaning (wrangling), e.g. filling missing values, dropping duplicates and standardizing columns; analysis (univariate and bivariate); and preprocessing, which covers encoding variables and scaling and normalizing our data (scaling is done with StandardScaler).
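The cleaning, encoding and scaling steps above can be sketched roughly as follows. This is only a minimal example on a made-up DataFrame: the column names and values are hypothetical, "standardizing columns" is read here as tidying column names, and filling with the median and one-hot encoding with `pd.get_dummies` are just one reasonable choice among several.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; names and values are made up for illustration.
df = pd.DataFrame({
    "Age ": [25, 32, np.nan, 32, 47],
    "City": ["Nairobi", "Mombasa", "Nairobi", "Mombasa", "Kisumu"],
    "Income": [50_000, 64_000, 58_000, 64_000, 120_000],
})

# Standardize column names: strip stray whitespace and lowercase them.
df.columns = df.columns.str.strip().str.lower()

# Drop exact duplicate rows, then fill the missing age with the median.
df = df.drop_duplicates().reset_index(drop=True)
df["age"] = df["age"].fillna(df["age"].median())

# Encode the categorical column as one-hot indicator columns.
encoded = pd.get_dummies(df, columns=["city"])

# Scale the numeric columns to zero mean and unit variance.
num_cols = ["age", "income"]
encoded[num_cols] = StandardScaler().fit_transform(encoded[num_cols])

print(encoded)
```

In a real project the scaler would normally be fitted on the training split only and then applied to the test split, so that no information leaks from the test data.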
4. Checking for outliers
5. Binning our features
6. Feature engineering
7. Checking correlation
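Steps 4 to 7 could look something like the sketch below. The data is invented, the 1.5 * IQR rule is only one common way to flag outliers, and the bin edges, labels and the derived `income_per_year_of_age` feature are hypothetical choices made for this example.

```python
import pandas as pd

# Hypothetical toy data; 400 is a deliberate outlier.
df = pd.DataFrame({
    "age": [22, 25, 31, 38, 45, 52, 60, 29],
    "income": [30, 35, 42, 50, 61, 70, 82, 400],
})

# 4. Outliers: flag values more than 1.5 * IQR outside the quartiles.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["income_outlier"] = ~df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 5. Binning: turn a continuous feature into labelled ranges.
df["age_group"] = pd.cut(
    df["age"], bins=[0, 30, 50, 120], labels=["young", "middle", "senior"]
)

# 6. Feature engineering: derive a new feature from existing ones.
df["income_per_year_of_age"] = df["income"] / df["age"]

# 7. Correlation: pairwise Pearson correlation of the numeric features.
corr = df[["age", "income", "income_per_year_of_age"]].corr()

print(df)
print(corr)
```

Whether a flagged row should be dropped, capped or kept depends on the problem; the flag column only makes the decision visible.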
Visualizing our features using the ggplot style in matplotlib
Showing percentages on our bar charts
We can also filter columns that share the same data type
Checking for imbalanced datasets using visuals, and using SMOTE or NearMiss to balance them, especially for classification problems. Accuracy is not a good metric for imbalanced data.
Performing feature selection and checking feature importance. Check here.
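As a rough illustration of the last few points, the sketch below uses a synthetic, deliberately imbalanced dataset (not data from this series) to show class percentages, filtering columns by data type, why accuracy misleads on imbalanced classes, and one way to read feature importance. The random forest is an assumption chosen for the example. Resampling with SMOTE or NearMiss, both provided by the separate imbalanced-learn package, is not shown here.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 90% of samples in class 0 (illustration only).
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3,
    weights=[0.9, 0.1], random_state=0,
)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(6)])

# Class balance as percentages (the same numbers you would put on a bar chart).
class_pct = pd.Series(y).value_counts(normalize=True) * 100

# Filter columns that share a data type, here all numeric columns.
numeric_cols = X.select_dtypes(include="number").columns

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# A "model" that always predicts the majority class still scores high accuracy.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
pred = baseline.predict(X_test)
acc = accuracy_score(y_test, pred)
bal_acc = balanced_accuracy_score(y_test, pred)  # mean recall over both classes

# Feature importance from a tree ensemble, sorted from most to least important.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(class_pct.round(1))
print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")
print(importances)
```

The baseline never finds a single minority-class sample, yet its accuracy looks good, which is why metrics such as balanced accuracy, recall, precision or F1 are usually preferred for imbalanced classification.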

Cheers!! Happy learning!
