Exploratory Data Analysis
Exploratory data analysis (EDA) is an approach to summarizing the main characteristics of a data set through statistics and visualization, in order to understand it better before further analysis. It involves the following steps:
1. Data cleaning or wrangling: handling missing values, removing outliers, and transforming data
2. Data exploration: identifying patterns or relationships
3. Feature engineering: introducing new features or selecting relevant variables for analysis
4. Data visualization: presenting insights through charts or graphs
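The first three steps can be sketched with pandas. This is a minimal illustration, not a fixed recipe: the toy data, the column names, the median fill, the 1.5 * IQR outlier rule, and the BMI feature are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing height and one implausible height (320 cm).
df = pd.DataFrame({
    "height_cm": [160.0, 172.0, np.nan, 168.0, 181.0, 175.0, 320.0],
    "weight_kg": [55.0, 70.0, 62.0, 64.0, 85.0, 72.0, 68.0],
})

# Step 1, data cleaning: fill the missing height with the column median.
df["height_cm"] = df["height_cm"].fillna(df["height_cm"].median())

# Step 1, continued: drop rows whose height lies outside 1.5 * IQR of the quartiles.
q1, q3 = df["height_cm"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["height_cm"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[in_range].copy()

# Step 2, data exploration: summary statistics for every numeric column.
print(clean.describe())

# Step 3, feature engineering: derive a new column from the existing ones.
clean["bmi"] = clean["weight_kg"] / (clean["height_cm"] / 100) ** 2
print(clean.head())
```

Whether to fill or drop missing values, and which outlier rule to apply, depends on the data set. The choices above are only one reasonable option.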
The most common types of exploratory data analysis are:
Univariate analysis deals with a single variable. It can be summarized using descriptive statistics, histograms, or box plots.
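As a small sketch of univariate analysis, the descriptive statistics of one variable can be computed with pandas. The `scores` values below are made up for illustration.

```python
import pandas as pd

# Hypothetical single variable: ten exam scores.
scores = pd.Series([52, 61, 61, 68, 70, 73, 75, 79, 84, 98], name="score")

# describe() reports count, mean, standard deviation, min, quartiles, and max.
summary = scores.describe()
print(summary)

# Median and mode are not part of describe() for numeric data, so compute them separately.
print("median:", scores.median())
print("mode:", scores.mode().tolist())
```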
Multivariate analysis deals with multiple variables and aims to identify relationships between them. The main tool is correlation, which measures the extent to which variables are interdependent, for example rainfall and umbrella use. Note that correlation does not imply causation. In Python, we can explore these relationships with the pandas GroupBy method or a heat map.
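The rain and umbrella example can be sketched with pandas as follows. The data frame and its column names are invented for illustration. `DataFrame.corr` computes the Pearson correlation by default, and `groupby` compares group averages.

```python
import pandas as pd

# Hypothetical daily observations of weather and umbrella sales.
df = pd.DataFrame({
    "weather": ["rain", "rain", "dry", "dry", "rain", "dry"],
    "rainfall_mm": [12.0, 20.0, 0.0, 1.0, 15.0, 0.0],
    "umbrellas_sold": [30, 45, 4, 6, 38, 3],
})

# Pearson correlation matrix between the two numeric variables.
corr = df[["rainfall_mm", "umbrellas_sold"]].corr()
r = corr.loc["rainfall_mm", "umbrellas_sold"]
print(f"correlation: {r:.3f}")

# GroupBy: average umbrella sales for each weather category.
mean_by_weather = df.groupby("weather")["umbrellas_sold"].mean()
print(mean_by_weather)
```

A coefficient close to 1 only says that the two variables move together in this sample. It does not show that one causes the other.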
Data visualization techniques
The importance of data visualization is simple: it helps people see, interact with, and better understand data.
Matplotlib - a comprehensive library for creating static, animated, and interactive visualizations in Python
Seaborn - a data visualization library commonly used for data science and machine learning tasks
Histograms - graphs showing the frequency distribution of a variable
Heat maps - plots that show how a target variable varies across multiple variables by encoding its values as color
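A histogram and a heat map can be drawn with Matplotlib alone, as in the rough sketch below. The random data, the column names `a`, `b`, `c`, and the output file name are assumptions made for the example. The heat map here displays a correlation matrix, which is one common use rather than the only one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: two independent columns and a third that depends on the first.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=200),
    "b": rng.normal(size=200),
})
df["c"] = 2 * df["a"] + rng.normal(scale=0.5, size=200)

fig, (ax_hist, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: frequency distribution of column "a" over 20 bins.
counts, bin_edges, _ = ax_hist.hist(df["a"], bins=20)
ax_hist.set_title("Histogram of a")

# Heat map: correlation matrix with each value encoded as a color.
corr = df.corr()
image = ax_heat.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax_heat.set_xticks(range(len(corr.columns)))
ax_heat.set_xticklabels(corr.columns)
ax_heat.set_yticks(range(len(corr.columns)))
ax_heat.set_yticklabels(corr.columns)
ax_heat.set_title("Correlation heat map")
fig.colorbar(image, ax=ax_heat)

fig.tight_layout()
fig.savefig("eda_plots.png")  # hypothetical output file name
```

Seaborn provides a `heatmap` function that produces a similar plot in a single call, if that library is installed.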