What is data analysis?
Data analysis is a systematic process of applying systematic or logical techniques to describe and evaluate data. In this process, data is cleaned, analyzed, interpreted, and visualized using various techniques and business intelligence tools.
What is Exploratory Data Analysis (EDA)?
Exploratory Data Analysis (EDA is an important preprocess before one indulges in exploring data analysis. It is used by data enthusiasts to analyze and investigate data sets often employing data visualization methods.
The concept of exploratory data analysis is regarded as a vital data investigation procedure before the formal analysis. It seeks to determine how best to manipulate data sets to answer the questions you have, making it easier to discover patterns and anomalies, discover trends, and test hypotheses with summary statistics and visualizations. During the process of Exploratory Data Analysis (EDA), data scientists are able to understand the dataset key characteristics. It assists in formulating how to handle data during the analysis procedures like picking models.
Types of exploratory data analysis
There are four primary types of Exploratory Data Analysis. They include:
Univariate non-graphical.
It is the simplest form of data analysis. In this form of data analysis, the data being analyzed consists of just one variable and it deals with analyzing a single variable mainly to identify patterns, trends, and outliers in the data. Univariate analysis is mostly used to describe the data and identify any patterns within it.
Univariate graphical.
This method provides graphical presentation. Some Common types of univariate graphics include: Stem-and-leaf plots, Histograms and Box plots.
Multivariate nongraphical:
This entails simultaneously evaluating several variables to find patterns, trends, and correlations in the data Usually, cross-tabulation or statistical approaches are used to show the relationship between two or more data variables.
Multivariate graphical:
This method provides graphical presentation to show relationships between two or more types of data. A grouped bar plot or bar chart is the most commonly used visual, with each group representing one level of one of the variables and each bar inside a group indicating the levels of the other variable.
EDA Tools
The following are some of the tools used in exploratory data analysis:
Python–It is an interpreted, object-oriented programming language with dynamic semantics. Python and EDA are frequently used together to detect missing values in a data set, which is critical for deciding how to handle missing values for machine learning.
R - This is an open-source programming language and free software environment for statistical computing and graphics. The R programming language is commonly used by statisticians to create statistical observations and analyze data.
Top comments (1)
Great article, Kuria! BTW, Welcome to the DEV community. 🎉