DEV Community

WanjohiChristopher


3. Data Preparation

For an introduction to statistics, check here.
For an introduction to mathematics for data science, check here.
In this section we perform:

1. Data cleaning (wrangling), e.g. filling missing values, dropping duplicates, and standardizing column names.
2. Data analysis (univariate and bivariate analysis).
3. Data preprocessing: encoding variables, then scaling and normalizing the data (standardizing with StandardScaler and normalizing with MinMaxScaler).
4. Checking for outliers.
5. Binning our features.
6. Feature engineering.
7. Checking correlation.
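Several of the steps above can be sketched in a few lines of pandas and scikit-learn. This is only a minimal illustration: the toy DataFrame, its column names, the bin edges, and the 1.5 × IQR outlier rule are all assumptions made for the example, not choices taken from the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data: column names and values are made up for illustration.
raw = pd.DataFrame({
    " Age ": [25, 32, np.nan, 32, 47, 51],
    "Income": [30000, 42000, 38000, 42000, 500000, 61000],
    "City": ["Nairobi", "Mombasa", "Nairobi", "Mombasa", "Kisumu", "Nairobi"],
})

# Step 1: standardize column names, drop duplicate rows, fill missing values.
df = raw.copy()
df.columns = df.columns.str.strip().str.lower()
df = df.drop_duplicates().reset_index(drop=True)
df["age"] = df["age"].fillna(df["age"].median())

# Step 3: one-hot encode the categorical column, then scale the numeric ones.
encoded = pd.get_dummies(df, columns=["city"])
num_cols = ["age", "income"]
standardized = StandardScaler().fit_transform(df[num_cols])  # mean 0, unit variance
normalized = MinMaxScaler().fit_transform(df[num_cols])      # rescaled to [0, 1]

# Step 4: flag outliers with the 1.5 * IQR rule (one common convention).
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]

# Step 5: bin a numeric feature into labelled groups (bin edges are assumed).
df["age_group"] = pd.cut(
    df["age"], bins=[0, 30, 45, 120], labels=["young", "middle", "senior"]
)

# Step 7: pairwise correlation between the numeric features.
corr = df[num_cols].corr()
```

In a real project the scalers would normally be fitted on the training split only and then applied to the test split, so that no information leaks from the test data.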
Visualizing our features using the ggplot style in matplotlib.
Visualizing percentages on our bar charts.
We can also filter columns that share the same data type.
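As a rough sketch of these three points, the snippet below filters columns by data type, switches matplotlib to its built-in ggplot style, and writes a percentage label on top of each bar. The DataFrame and its column names are hypothetical, and the non-interactive Agg backend is used only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "city": ["Nairobi", "Mombasa", "Nairobi", "Kisumu", "Nairobi"],
    "age": [25, 32, 40, 47, 51],
    "income": [30000.0, 42000.0, 38000.0, 50000.0, 61000.0],
})

# Filter columns by data type.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()

# Use the ggplot style that ships with matplotlib.
plt.style.use("ggplot")

# Share of rows per category, as percentages.
shares = df["city"].value_counts(normalize=True) * 100
labels = [f"{value:.0f}%" for value in shares.tolist()]

fig, ax = plt.subplots()
bars = ax.bar(shares.index.tolist(), shares.tolist())

# Write the percentage above each bar.
for bar, label in zip(bars, labels):
    x_center = bar.get_x() + bar.get_width() / 2
    ax.annotate(label, (x_center, bar.get_height()), ha="center", va="bottom")

ax.set_ylabel("Share of rows (%)")
plt.close(fig)  # in a notebook, call plt.show() or fig.savefig(...) instead
```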
Checking for imbalanced datasets using visuals, and using SMOTE or NearMiss to balance them, especially in classification problems. Accuracy is not a good metric for evaluating models on imbalanced datasets.
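To see why accuracy misleads here, consider a baseline that always predicts the majority class. The sketch below uses made-up labels (95 negatives, 5 positives) and scikit-learn's `DummyClassifier`; the class ratio is an assumption chosen for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Hypothetical imbalanced labels: 95 negatives and 5 positives.
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((len(y), 1))  # features do not matter for this baseline

# Check the imbalance by looking at the share of each class.
classes, counts = np.unique(y, return_counts=True)
class_share = dict(zip(classes.tolist(), (counts / counts.sum()).tolist()))

# A "model" that always predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

accuracy = accuracy_score(y, y_pred)           # looks high
balanced = balanced_accuracy_score(y, y_pred)  # average recall over both classes
minority_recall = recall_score(y, y_pred)      # share of positives actually found
```

The baseline reaches high accuracy while finding none of the minority class, which is why recall, balanced accuracy, or F1 are usually more informative in this setting. For resampling, the imbalanced-learn package provides `SMOTE` (in `imblearn.over_sampling`) and `NearMiss` (in `imblearn.under_sampling`), both used through `fit_resample(X, y)`; resampling is normally applied to the training split only.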
Performing feature selection and checking feature importance. Check here:

https://www.kaggle.com/swet44/notebook1c
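Feature selection and feature importance can be approached in many ways. The sketch below shows two common options from scikit-learn, a univariate filter (`SelectKBest`) and tree-based importances from a random forest, on a synthetic dataset. It is not taken from the linked notebook; the dataset and all parameter values are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data (assumed for illustration): 8 features, 3 of them informative.
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

# Option 1: keep the k features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
X_selected = selector.transform(X)
kept_columns = selector.get_support(indices=True)

# Option 2: rank features by random forest importance.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_
ranking = importances.argsort()[::-1]  # feature indices, most important first
```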

Cheers! Happy learning.
