Introduction
Think for a moment about the journey data takes, from gathering it to sharing the final report with the domain experts or company so that they can make informed decisions.
Data collection has its dos and don'ts, and to make sense of the data and derive insights from it, you need a method known as exploratory data analysis (EDA).
EDA helps you learn more about the data set and discover patterns, trends, and relationships between variables. It also helps you find missing values, outliers, and anomalies so that you end up with high-quality data.
We employ statistical functions and tools to complete all of this.
What is EDA?
According to Wikipedia, exploratory data analysis (EDA) is an approach of analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods.
With EDA, you can examine data, find patterns, anomalies, and outliers, and fill in missing values so that the data is ready for insights and for machine learning models (both supervised and unsupervised).
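As a rough illustration of that first look at a data set, here is a minimal sketch using pandas. The data frame and its column names (`age`, `income`) are made up for this example, and the 1.5 * IQR rule and median fill are just one common choice among several.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data set: the values are invented for illustration.
df = pd.DataFrame({
    "age": [23, 25, 31, 35, np.nan, 41, 29, 120],
    "income": [32000, 35000, 48000, 52000, 50000, np.nan, 45000, 47000],
})

# Summary statistics (count, mean, std, quartiles) for each numeric column.
summary = df.describe()

# Number of missing values per column.
missing = df.isna().sum()

# Flag outliers in "age" with the 1.5 * IQR rule.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

# Fill missing values with each column's median (one possible strategy).
filled = df.fillna(df.median())

print(summary)
print(missing)
print(outliers)
```

Whether an outlier like the age of 120 should be dropped, corrected, or kept depends on the domain, which is why understanding the domain comes first.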
Steps of performing EDA
1. Understanding Domain
2. Importing the Right Libraries and Reading the File
3. Cleaning & Preparing Data
4. Feature Engineering
5. Data Visualization & Interpretation
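Steps 2 to 5 above can be sketched in a few lines of pandas. This is only an illustrative outline: the CSV content, the column names, and the derived features (`revenue`, `order_month`) are all hypothetical, and the CSV is inlined so the example runs on its own.

```python
import io
import pandas as pd

# Step 2: read the file. A hypothetical CSV is inlined here;
# in practice you would pass a file path to pd.read_csv.
raw_csv = io.StringIO(
    "order_id,order_date,price,quantity\n"
    "1,2023-01-05,10.0,2\n"
    "2,2023-01-05,,1\n"
    "2,2023-01-05,,1\n"
    "3,2023-02-11,7.5,4\n"
)
df = pd.read_csv(raw_csv, parse_dates=["order_date"])

# Step 3: clean and prepare. Drop exact duplicate rows,
# then fill the missing price with the median price.
df = df.drop_duplicates()
df["price"] = df["price"].fillna(df["price"].median())

# Step 4: feature engineering. Derive new columns from existing ones.
df["revenue"] = df["price"] * df["quantity"]
df["order_month"] = df["order_date"].dt.month

# Step 5: aggregate into a shape that is easy to plot and interpret.
monthly_revenue = df.groupby("order_month")["revenue"].sum()
print(monthly_revenue)
```

From here, a bar chart of `monthly_revenue` would be a natural first visualization.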
Tools Used For EDA
There are several tools used to do EDA, including Tableau, Power BI, and Python libraries, among many others.
Python Libraries
- Numpy
- Pandas
- Matplotlib
- Seaborn
Others
- Tableau
- Power BI
Statistical Functions & Techniques for EDA
- Clustering and Dimension Reduction
- Univariate Visualization
- Bivariate Visualization
- Multivariate Visualization
- K-means Clustering
- Predictive Models
To better understand how these techniques work, read through IBM's explanation of EDA.
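To make a few of these techniques concrete, here is a small sketch of a univariate summary, a bivariate correlation, and a bare-bones k-means written with NumPy. The data is randomly generated for illustration, and the `kmeans` helper is a simplified version of Lloyd's algorithm written for this post, not a library function; for real work a tested implementation is the safer choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: two well-separated groups of 2-D points.
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
df = pd.DataFrame(np.vstack([group_a, group_b]), columns=["x", "y"])

# Univariate: summarize one variable at a time.
x_summary = df["x"].describe()

# Bivariate: Pearson correlation between two variables.
xy_corr = df["x"].corr(df["y"])


def kmeans(points, k, n_iter=20, seed=0):
    """Minimal k-means (Lloyd's algorithm); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids


labels, centroids = kmeans(df.to_numpy(), k=2)
print(x_summary)
print("correlation between x and y:", xy_corr)
print("cluster sizes:", np.bincount(labels))
```

The fixed number of iterations keeps the sketch short; a fuller version would stop once the assignments no longer change.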
Data Visualization in EDA
Graphical techniques:
- Box plot
- Histogram
- Multi-vari chart
- Run chart
- Pareto chart
- Scatter plot (2D/3D)
- Heat map
- Bar chart
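Three of the graphical techniques above (box plot, histogram, scatter plot) can be drawn with Matplotlib as in the sketch below. The height and weight values are randomly generated for illustration, and the output file name `eda_plots.png` is just a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical measurements: heights in cm and loosely related weights in kg.
heights = rng.normal(170, 10, size=200)
weights = 0.9 * heights - 90 + rng.normal(0, 5, size=200)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Box plot: median, quartiles, and potential outliers of one variable.
axes[0].boxplot(heights)
axes[0].set_title("Box plot")
axes[0].set_ylabel("Height (cm)")

# Histogram: shape of the distribution of one variable.
axes[1].hist(heights, bins=20)
axes[1].set_title("Histogram")
axes[1].set_xlabel("Height (cm)")

# Scatter plot: relationship between two variables.
axes[2].scatter(heights, weights, s=10)
axes[2].set_title("Scatter plot")
axes[2].set_xlabel("Height (cm)")
axes[2].set_ylabel("Weight (kg)")

fig.tight_layout()
fig.savefig("eda_plots.png")
```

In a notebook you can skip the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving to a file.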
Dimensionality reduction:
- Multidimensional scaling
- Principal component analysis (PCA)
- Multilinear PCA
- Nonlinear dimensionality reduction (NLDR)
- Iconography of correlations
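As an example of dimensionality reduction, PCA can be sketched with NumPy's singular value decomposition. The `pca` helper and the synthetic three-feature data below are written for this post as an illustration; they are not taken from any particular library.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance explained by each direction, as a share of the total variance.
    explained_variance = (S ** 2) / (len(X) - 1)
    explained_ratio = explained_variance[:n_components] / explained_variance.sum()
    scores = X_centered @ components.T
    return scores, components, explained_ratio


rng = np.random.default_rng(1)
t = rng.normal(size=300)
# Hypothetical data: three features that mostly vary along a single direction.
X = np.column_stack([t, 2 * t, -t]) + rng.normal(scale=0.1, size=(300, 3))

scores, components, explained_ratio = pca(X, n_components=2)
print("explained variance ratio:", explained_ratio)
```

Because the three features here are nearly multiples of each other, the first component should capture almost all of the variance, which is the kind of redundancy PCA helps to reveal.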
Typical quantitative techniques:
- Median polish
- Trimean
- Ordination
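Two of the quantitative techniques above are simple enough to sketch directly. The trimean is (Q1 + 2 * median + Q3) / 4, and median polish repeatedly sweeps row and column medians out of a two-way table. The helper functions and the small table below are illustrative only, and the quartiles use NumPy's default interpolation, which is one of several conventions.

```python
import numpy as np


def trimean(values):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = np.percentile(values, [25, 50, 75])
    return (q1 + 2 * q2 + q3) / 4


def median_polish(table, n_iter=10):
    """Split a 2-D table into overall + row effects + column effects + residuals."""
    residuals = np.asarray(table, dtype=float).copy()
    overall = 0.0
    row_effects = np.zeros(residuals.shape[0])
    col_effects = np.zeros(residuals.shape[1])
    for _ in range(n_iter):
        # Sweep the row medians out of the residuals.
        row_medians = np.median(residuals, axis=1)
        residuals -= row_medians[:, None]
        row_effects += row_medians
        # Move the median of the column effects into the overall term.
        shift = np.median(col_effects)
        col_effects -= shift
        overall += shift
        # Sweep the column medians out of the residuals.
        col_medians = np.median(residuals, axis=0)
        residuals -= col_medians[None, :]
        col_effects += col_medians
        # Move the median of the row effects into the overall term.
        shift = np.median(row_effects)
        row_effects -= shift
        overall += shift
    return overall, row_effects, col_effects, residuals


tm = trimean([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Hypothetical table that is exactly "row effect + column effect + constant".
table = np.array([[10, 15, 20],
                  [11, 16, 21],
                  [12, 17, 22]])
overall, row_effects, col_effects, residuals = median_polish(table)
print(tm, overall, row_effects, col_effects)
```

On real data the residuals will not be zero, and large residuals point at cells that the simple row-plus-column pattern does not explain.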
Conclusion
This article is only a brief overview, but it should give you a general idea of which tools to have, when to use them, and the basic steps to follow. EDA is essential for understanding your data and obtaining clean data for machine learning (ML) models. Beyond this, Google is your friend.
REMEMBER: "A picture is worth a thousand words"