DEV Community

Victor Midamba Jr

Exploratory Data Analysis using Data Visualization Techniques.

What is Exploratory Data Analysis?

Exploratory data analysis (EDA) is the initial investigation of a dataset, carried out before formal analysis, to spot patterns and anomalies, discover trends, and test hypotheses using summary statistics and visualizations. It gives us a feel for the data we will later analyze in depth and informs how we handle that data during analysis: choosing models, handling outliers, deciding which accuracy metrics to use, and so on. Visualization makes it easier to draw insights from large datasets.
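
The summary statistics mentioned above are often the first thing computed in EDA. As a minimal sketch, assuming a small hypothetical list of daily order counts, Python's standard `statistics` module covers the basics:

```python
import statistics

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Hypothetical sample: daily order counts, with one unusually large value.
orders = [12, 15, 11, 14, 13, 16, 12, 48]
print(summarize(orders))
```

Here the mean (17.625) sits well above the median (13.5), a quick hint that a single large value (48) is pulling the distribution to the right. Visualizations make that kind of anomaly easy to see.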

Data Visualization Techniques for Exploratory Data Analysis

1. Histograms

Histograms are two-dimensional plots in which the x-axis is divided into a range of numerical bins or time intervals, and the y-axis shows the frequency, that is, the count of values falling into each bin. Unlike bar graphs, which leave gaps between bars to show that they compare distinct groups, histograms have no gaps, because adjacent bins cover a continuous range. The shape of a histogram tells us whether a distribution is right (positively) skewed, left (negatively) skewed, bimodal, normal, or uniform. A histogram is best used to show how numerical data is distributed across a set of intervals.
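
To make the binning concrete, here is a minimal sketch in plain Python that counts values into equal-width bins and prints a text histogram. The `histogram_counts` helper and the `ages` data are hypothetical; in practice a plotting library such as matplotlib (`plt.hist(values, bins=5)`) does the binning and drawing for you.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land in bin n_bins; clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return counts, edges

# Hypothetical data: ages of survey respondents.
ages = [22, 25, 27, 29, 31, 33, 34, 35, 38, 41, 45, 52, 60, 71]
counts, edges = histogram_counts(ages, 5)
for i, c in enumerate(counts):
    print(f"{edges[i]:5.1f}-{edges[i + 1]:5.1f} | {'#' * c}")
```

With this data the bin counts come out as 5, 5, 1, 2, 1: most values cluster at the low end and a thin tail stretches toward higher ages, which is the right (positively) skewed shape described above.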


2. Bar Graphs

Bar charts are used to visualize the distribution or frequency of a categorical variable. Each category is represented by a bar whose length corresponds to the frequency or count of observations in that category. Bar charts help compare categories and identify dominant ones. In a vertical bar chart the categories sit on the x-axis and the counts on the y-axis; a horizontal bar chart swaps the axes, placing the categories on the y-axis, which helps when category labels are long.
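
Counting is all a basic bar chart needs. A minimal sketch, assuming hypothetical payment-method data, uses `collections.Counter` and prints a horizontal text bar chart with the categories down the vertical axis:

```python
from collections import Counter

# Hypothetical categorical data: the payment method used for each order.
payments = ["card", "cash", "card", "mobile", "card", "cash", "card", "mobile", "card"]
payment_counts = Counter(payments)

# Horizontal text bar chart: one row per category, bar length = count.
for category, n in payment_counts.most_common():
    print(f"{category:<7}| {'#' * n} {n}")
```

Here `card` (5 of 9 orders) stands out as the dominant category.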


3. Scatter Plots

Scatter plots are used to visualize the relationship between two numerical variables. Each observation is represented as a point, with the x-axis representing one variable and the y-axis the other. A scatter plot therefore shows at a glance the direction and strength of a relationship between two variables, and whether it looks linear, before that relationship is quantified with a statistic.
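
A scatter plot is often paired with a number that quantifies what the eye sees. A common choice (an addition here, not something the plot itself provides) is the Pearson correlation coefficient r, which ranges from -1 to 1. A minimal sketch with hypothetical hours-studied and exam-score data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired observations: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 73]
print(round(pearson_r(hours, scores), 3))
```

The result, about 0.995, matches the tight upward line these points would form on a scatter plot. On Python 3.10 and later, `statistics.correlation(hours, scores)` computes the same coefficient.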


4. Pie Charts

Pie charts are used to visualize the proportion or percentage distribution of different categories within a whole. Each category is represented by a slice of the pie, with the size of the slice corresponding to its proportion.
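
The slice sizes follow directly from each category's share of the total: a share p becomes an angle of 360 × p degrees. A minimal sketch, assuming a hypothetical monthly budget and a hypothetical `pie_slices` helper:

```python
def pie_slices(parts):
    """Map each category to (share of total, slice angle in degrees)."""
    total = sum(parts.values())
    return {name: (value / total, value / total * 360) for name, value in parts.items()}

# Hypothetical budget breakdown for one month.
spending = {"rent": 1200, "food": 450, "transport": 150, "other": 200}
slices = pie_slices(spending)

for category, (share, angle) in slices.items():
    print(f"{category:<10}{share:7.1%}{angle:7.1f} deg")
```

Rent takes 60% of this budget, so its slice spans 216 degrees; the slices always add up to a full 360 degrees.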


5. Box Plots

Box plots summarize the distribution of a numerical variable using quartiles, median, and outliers. They provide information about the spread, skewness, and presence of extreme values in the data. Box plots are particularly useful for comparing distributions across different groups or categories.
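
The numbers behind a box plot can be computed with the standard library. This minimal sketch uses `statistics.quantiles` with `method="inclusive"` and the common 1.5 × IQR rule for flagging outliers; both are assumptions, since plotting tools differ in their quartile conventions and whisker rules. The response-time data is hypothetical.

```python
import statistics

def box_plot_stats(values):
    """Quartiles, whisker limits, and outliers using the common 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    # Whiskers extend to the most extreme points still inside the fences.
    return {"q1": q1, "median": median, "q3": q3,
            "whiskers": (min(inside), max(inside)), "outliers": outliers}

# Hypothetical API response times in milliseconds.
response_ms = [120, 135, 128, 140, 150, 132, 138, 145, 410]
print(box_plot_stats(response_ms))
```

Here the quartiles are 132, 138, and 145, so the fences sit at 112.5 and 164.5, and the 410 ms response is the single point a box plot would draw as an outlier.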


6. Line Plots

Line plots are used to visualize the trend or pattern of a numerical variable over time or another continuous variable. They connect data points with lines, providing insights into the overall trend, seasonality, or fluctuations in the data.
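
A line plot's trend is often easier to see after smoothing out short-term fluctuations. One common technique (an addition here, not part of the original text) is a simple moving average, sketched below on hypothetical monthly sales; drawing the raw series and the smoothed one on the same axes, for example with matplotlib's `plt.plot`, shows noise and trend side by side.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive points (simple smoothing)."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Hypothetical monthly sales with a noisy upward trend.
monthly_sales = [100, 120, 95, 130, 125, 150, 140, 170]
trend = moving_average(monthly_sales, 3)
print([round(t, 1) for t in trend])
```

The raw series dips twice, but the 3-month averages (105, 115, 116.7, 135, 138.3, 153.3) rise steadily, exposing the upward trend.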


7. Correlation Plots (Heat Maps)

Heatmaps are graphical representations of data in which the values of a matrix are represented by colors. They are particularly useful for visualizing correlation matrices or displaying patterns in large datasets. Because patterns stand out as color, a heatmap is easier to read at a glance than a standard tabular analytics report.
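
A correlation heatmap is a correlation matrix with each cell colored by its value. The sketch below, assuming three hypothetical numeric columns, builds that matrix in plain Python and stands in for color with characters that get denser as |r| grows. Libraries such as pandas (`DataFrame.corr`) and seaborn (`heatmap`) handle both steps for real datasets.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return column names and the matrix of pairwise Pearson correlations."""
    names = list(columns)
    matrix = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix

SHADES = " .:*#"  # lightest to darkest: weaker to stronger |r|

def shade(r):
    """Pick a character whose density grows with the strength of r."""
    return SHADES[min(int(abs(r) * len(SHADES)), len(SHADES) - 1)]

# Hypothetical dataset with three numeric columns.
data = {
    "temp": [20, 22, 25, 27, 30, 32],
    "ice_cream": [200, 220, 260, 280, 310, 330],
    "umbrellas": [50, 48, 40, 35, 30, 28],
}
names, matrix = correlation_matrix(data)

print(" " * 10 + "".join(f"{name:>10}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:<10}" + "".join(f"{r:+.2f} {shade(r)}".rjust(10) for r in row))
```

Temperature correlates strongly and positively with ice cream sales and strongly and negatively with umbrella sales, so both cells get the darkest shade, while the diagonal (each column against itself) is 1.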


Conclusion

These are some of the data visualization techniques commonly used in EDA. The right choice depends on the nature of the data and on the research questions or objectives. Combining several of these techniques gives a more complete understanding of the dataset and surfaces insights that can drive further analysis or decision-making.

Top comments (0)