DEV Community

Kipngeno Ruto.

Exploratory Data Analysis

Exploratory data analysis (EDA) is a critical step in the life cycle of every data science project. It lets you find patterns in the data and spot inconsistencies or data quality problems such as outliers, missing values, and incorrect data types. Most importantly, it aids in the preliminary selection of suitable models. It's like a detective conducting an investigation.

Exploratory data analysis falls into four categories:
1 Univariate non-graphical
Univariate non-graphical analysis examines a single variable or feature using summary statistics. Suppose you have a dataset containing the incomes of people in a specific country; a summary statistic such as the mean gives the average income of people residing in that country.
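
The income example can be sketched with Python's built-in statistics module. The income values below are made up for illustration and do not come from any real dataset. Note how a single very high income pulls the mean above the median, which is one reason to look at more than one summary statistic.

```python
import statistics

# Hypothetical annual incomes (made-up values for illustration only)
incomes = [32000, 45000, 38000, 51000, 45000, 120000, 41000]

# Summary statistics for a single variable
summary = {
    "count": len(incomes),
    "mean": statistics.mean(incomes),
    "median": statistics.median(incomes),
    "mode": statistics.mode(incomes),
    "stdev": statistics.stdev(incomes),  # sample standard deviation
    "min": min(incomes),
    "max": max(incomes),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:,.2f}")
```

With a real dataset, a library such as pandas would typically produce the same kind of summary in one call; the standard library is used here only to keep the sketch self-contained.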

2 Univariate graphical
Apart from summary statistics, we can also visualize the data for better understanding. This is the most common method, particularly when presenting findings to non-technical teams, because the information is easy to take in. The graphical univariate method visualizes only one variable or feature at a time. You can use graphical representations such as histograms, bar graphs, line graphs, and so on.
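
As a rough sketch of the idea, the snippet below draws a text-based histogram of a single variable using only the standard library. The income values and the bin width are assumptions chosen for illustration; in practice you would more likely use a plotting library such as matplotlib or seaborn.

```python
from collections import Counter

# Hypothetical annual incomes (made-up values for illustration only)
incomes = [32000, 45000, 38000, 51000, 45000, 120000, 41000, 36000, 47000, 39000]

BIN_WIDTH = 20000  # assumed bin width; choose one that suits your data

# Assign each value to the lower edge of its bin, then count values per bin
bin_counts = Counter((value // BIN_WIDTH) * BIN_WIDTH for value in incomes)

# Print one bar per bin, including empty bins, so gaps and outliers stand out
for lower in range(min(bin_counts), max(bin_counts) + BIN_WIDTH, BIN_WIDTH):
    count = bin_counts.get(lower, 0)
    print(f"{lower:>7,} - {lower + BIN_WIDTH - 1:>7,} | {'#' * count}")
```

Even in this crude form, the picture shows most incomes clustered in the lower bins and one isolated value far to the right, which is harder to see from a single number such as the mean.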

3 Multivariate non-graphical
Multivariate non-graphical analysis, as the name suggests, applies summary statistics to more than one variable. For example, suppose management wants to understand the factors contributing to customer churn. The data scientist would examine several features to see how each relates to churn in the organization and present the results as a table.
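
A minimal sketch of the churn example follows, relating one feature to the churn outcome and printing the result as a table. The customer records and the "contract" feature are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical customer records (made-up values for illustration only)
customers = [
    {"contract": "monthly", "churned": True},
    {"contract": "monthly", "churned": True},
    {"contract": "monthly", "churned": False},
    {"contract": "monthly", "churned": True},
    {"contract": "yearly", "churned": False},
    {"contract": "yearly", "churned": False},
    {"contract": "yearly", "churned": True},
    {"contract": "yearly", "churned": False},
]

# Count customers and churned customers per contract type
totals = defaultdict(int)
churned = defaultdict(int)
for row in customers:
    totals[row["contract"]] += 1
    churned[row["contract"]] += row["churned"]  # True counts as 1

# Churn rate per group: two variables (contract, churned) summarized together
churn_rate = {group: churned[group] / totals[group] for group in totals}

print(f"{'contract':<10}{'customers':>10}{'churn rate':>12}")
for group, rate in churn_rate.items():
    print(f"{group:<10}{totals[group]:>10}{rate:>12.0%}")
```

The same grouping can be repeated for other features to compare how strongly each one is associated with churn.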

4 Multivariate graphical
This method visualizes more than one variable to show their distributions and the relationships between them. In the churn example above, instead of presenting the results in a table, the data scientist can visualize them, which is usually easier to understand. Here too you can use various graphical methods, e.g. line graphs, bar graphs, etc.
