DEV Community

Sinasr6


Exploratory Data Analysis using Data Visualization Techniques

What is EDA and what does it have to do with data visualization?
EDA, or exploratory data analysis, is the statistical process of analyzing and understanding a dataset, ultimately summarizing its main features and the important information within it. Data visualization is one of the most common and effective tools for EDA. With it, we can present information and insights as charts and graphs, making them easier to understand for stakeholders and people without technical knowledge.

What techniques are viable here?
Data visualization offers a myriad of tools and methods for different purposes. One of the most commonly used is the pie chart. Pie charts are often used to display proportions or to make simple comparisons between the parts of a whole. The neat thing about them is that they are easy to understand and can conveniently present a few key takeaways. They cannot, however, convey much detail for those who need it.
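
As a rough illustration, here is a minimal pie chart sketch using matplotlib. The library choice, the ticket data, and the output file name are my own assumptions for the example and are not tied to any particular dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical data: number of support tickets per channel.
labels = ["Email", "Chat", "Phone", "Social"]
counts = [45, 30, 15, 10]

fig, ax = plt.subplots()
# autopct prints each slice's share of the total as a percentage.
ax.pie(counts, labels=labels, autopct="%1.0f%%", startangle=90)
ax.set_title("Support tickets by channel")
fig.savefig("pie_chart.png")
```

With only four slices the proportions are easy to read; with many small slices the chart quickly becomes hard to compare.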

Another easy-to-use tool is the bar graph. Bar graphs can show multiple categories on the x-axis and a measured value for comparison on the y-axis. The length of each bar makes the differences between categories easy to see. But as the number of categories grows, these graphs become crowded and harder to read.
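
A bar graph of this kind could be sketched as follows, again with matplotlib. The regions and sales figures are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical data: units sold per region.
categories = ["North", "South", "East", "West"]
sales = [120, 95, 140, 80]

fig, ax = plt.subplots()
bars = ax.bar(categories, sales)  # one bar per category, height = measured value
ax.set_xlabel("Region")
ax.set_ylabel("Units sold")
ax.set_title("Sales by region")
fig.savefig("bar_graph.png")
```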

A chart similar to the bar graph is the histogram, though it is used to display the distribution of a numeric variable across regular intervals (bins). Histograms are especially useful for showing how frequently values occur in each interval. They can also reveal unusual concentrations or gaps in the data.
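
Here is a small histogram sketch, assuming NumPy for generating sample data and matplotlib for plotting. The response times are randomly generated, so treat them as placeholder data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 500 response times in milliseconds.
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=200, scale=30, size=500)

fig, ax = plt.subplots()
# bins=20 splits the data range into 20 equal-width intervals.
bin_counts, bin_edges, _ = ax.hist(values, bins=20)
ax.set_xlabel("Response time (ms)")
ax.set_ylabel("Frequency")
fig.savefig("histogram.png")
```

The returned `bin_counts` holds how many values fell into each interval, which is the same information the bar heights show.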

Box plots are a very useful technique for visualizing distributions. Using the concept of quartiles, a box is drawn from the first quartile to the third quartile. A line inside the box marks the median, or second quartile. Finally, lines (whiskers) extend from the box toward the minimum and the maximum. In many plotting tools the whiskers stop at the most extreme values within 1.5 times the interquartile range of the box, and dots beyond them usually indicate outliers. Box plots also give a quick view of the skewness of the dataset.
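
To make the quartile idea concrete, the sketch below computes the quartiles by hand with NumPy and then draws the box plot with matplotlib. The delivery times are invented, and the 1.5 times IQR rule for outliers is matplotlib's default convention rather than the only possible one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: delivery times in minutes, with one unusually slow delivery.
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 21, 22, 45]

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
# Points beyond the fences are the ones drawn as separate dots.
outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(f"Q1={q1}, median={median}, Q3={q3}, outliers={outliers}")

fig, ax = plt.subplots()
result = ax.boxplot(values)  # default whiskers use the same 1.5 * IQR rule
ax.set_ylabel("Delivery time (min)")
fig.savefig("box_plot.png")
```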

When the dataset is large, it can be challenging to identify trends and patterns within it. Scatter plots come to our aid here. They are used to explore the relationship between two variables, with each observation plotted as a point against the horizontal and vertical axes. In these charts, the more tightly the points cluster around a line, the stronger the linear correlation between the two variables.
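
A scatter plot along these lines might look like the sketch below. The two variables are synthetic: `y` is built as a linear function of `x` plus random noise, which is purely an assumption to get a visible trend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: y depends linearly on x, plus random noise.
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + rng.normal(0, 2.0, size=200)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots()
ax.scatter(x, y, s=12)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title(f"Scatter plot (r = {r:.2f})")
fig.savefig("scatter_plot.png")
```

Increasing the noise term spreads the points away from the line and lowers the coefficient shown in the title.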

Speaking of relationships among variables, heat maps are an amazing technique with multiple applications. They use color to represent the values in a grid. In EDA the grid often holds the strength of the relationship between each pair of variables, making it easy to see at a glance which variables tend to move together.
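
One possible way to draw such a heat map is with matplotlib's `imshow`, colored by pairwise Pearson correlation. The four variables below are synthetic, and the relationships between them are baked in by construction for the sake of the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: four variables, two of them driven by temperature.
rng = np.random.default_rng(seed=2)
n = 300
temperature = rng.normal(20, 5, n)
ice_cream_sales = 3.0 * temperature + rng.normal(0, 5, n)
heating_cost = -2.0 * temperature + rng.normal(0, 5, n)
foot_traffic = rng.normal(100, 10, n)  # unrelated to the others

names = ["temperature", "ice_cream_sales", "heating_cost", "foot_traffic"]
data = np.vstack([temperature, ice_cream_sales, heating_cost, foot_traffic])
corr = np.corrcoef(data)  # each row of `data` is treated as one variable

fig, ax = plt.subplots()
# Fix the color scale to [-1, 1] so that the midpoint color means "no correlation".
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names, rotation=45, ha="right")
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
fig.colorbar(im, ax=ax, label="Pearson correlation")
fig.tight_layout()
fig.savefig("heat_map.png")
```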

Another tool for this very purpose is the correlation matrix. It shows the actual correlation coefficients in a table-like format and often uses color coding to help with visualization. Its applications include summarizing large datasets and finding patterns in them.
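
The sketch below shows one way to build a color-coded correlation matrix with the coefficient written in each cell. It uses `np.corrcoef` and matplotlib; the study-related variables are hypothetical and randomly generated.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: study habits and exam results for 200 students.
rng = np.random.default_rng(seed=3)
n = 200
hours_studied = rng.uniform(0, 10, n)
exam_score = 50 + 4 * hours_studied + rng.normal(0, 5, n)
hours_slept = rng.normal(7, 1, n)

names = ["hours_studied", "exam_score", "hours_slept"]
corr = np.corrcoef([hours_studied, exam_score, hours_slept])

fig, ax = plt.subplots()
ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names, rotation=45, ha="right")
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
# Write the numeric coefficient into every cell of the table.
for i in range(len(names)):
    for j in range(len(names)):
        ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center")
fig.tight_layout()
fig.savefig("correlation_matrix.png")
```

The matrix is symmetric with ones on the diagonal, so in practice only one triangle needs to be read.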

Why should I learn these?
In this article I covered a few of the most common and basic visualization methods that can assist EDA. These techniques are essential because they allow us to find patterns, clusters, outliers, trends, and more without having to look at raw numbers. They also help us communicate our findings and conclusions to others more effectively. Mastering visualization is therefore a must-have skill in the toolbox of anyone who works with data.
