ptah

Exploratory Data Analysis-Data Visualization Techniques

Unlocking insights through the power of visualization

Hello there! This week we'll dive deep into the world of Exploratory Data Analysis (EDA) using data visualization techniques. Get ready to uncover hidden patterns in data, gain insights, and make informed decisions.
*What exactly is EDA (Exploratory Data Analysis)?*
EDA is the first and most crucial step in data analysis. It is where we take a detective's approach to unravel the story behind a dataset, and what better way to tell a story than through visuals?

POWER OF DATA VISUALIZATION

In the data world, visualizations act as storytellers. They bring data to life by making it accessible and understandable, especially to people outside the tech world. Through shapes, graphs, patterns, and colors, visualizations give us a better way to perceive information. Let's briefly look at some popular visualization techniques:

1. Histograms
Histograms are a great way to understand the distribution of numerical data. They group values into bins and show how many observations fall into each one, making it easy to spot trends and outliers.

Code example for generating a histogram:

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(1000)  # placeholder data; use your own dataset
plt.hist(data, bins=20, color='skyblue', edgecolor='black')
plt.title('Histogram Example')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()


2. Scatter plots
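Scatter plots are a good way to visualize the relationship between two numerical variables: whether they are positively correlated, negatively correlated, or not correlated at all. The strength of a linear relationship is often summarized by the correlation coefficient, which ranges from -1 to 1. A value such as 0.99 indicates a very strong positive correlation.

import matplotlib.pyplot as plt
import numpy as np

# Placeholder data; replace with your own two numerical variables
x = np.random.rand(50)
y = 2 * x + np.random.normal(0, 0.1, 50)
plt.scatter(x, y, color="green", marker="o")
plt.title("")   # insert your title
plt.xlabel("")  # x-axis label
plt.ylabel("")  # y-axis label
plt.show()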
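
To make the idea of positive, negative, and no correlation concrete, here is a minimal sketch that computes the Pearson correlation coefficient with NumPy's `np.corrcoef`. The data is synthetic and the helper name `pearson_r` is made up for illustration; with real data you would pass in your own two columns.

```python
import numpy as np

# Synthetic, illustrative data (an assumption, not a real dataset)
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=200)
y_pos = 3 * x + rng.normal(0, 1, size=200)    # rises as x rises
y_neg = -3 * x + rng.normal(0, 1, size=200)   # falls as x rises
y_none = rng.normal(0, 1, size=200)           # unrelated to x


def pearson_r(a, b):
    """Return the Pearson correlation coefficient of two 1-D arrays."""
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
    return float(np.corrcoef(a, b)[0, 1])


for label, y in [("positive", y_pos), ("negative", y_neg), ("none", y_none)]:
    print(f"{label:>8}: r = {pearson_r(x, y):+.2f}")
```

Values close to +1 or -1 point to a strong linear relationship, while values near 0 suggest little or no linear relationship. A scatter plot is still worth drawing, since the coefficient alone can miss curved patterns.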

3. Bar charts
Bar charts are an excellent way to compare categorical data. They let you quickly see which categories are most prevalent and spot trends or anomalies.

import matplotlib.pyplot as plt

# Placeholder categories and values; replace with your own
categories = ['Category A', 'Category B', 'Category C']
values = [10, 20, 15]
plt.bar(categories, values, color='purple')
plt.title('Bar Chart Example')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

4. Box plots
Box plots provide a visual summary of a distribution, displaying the median, quartiles, and outliers, so you can grasp the variability of the data at a glance.

import seaborn as sns
import matplotlib.pyplot as plt

# The axis labels suggest the iris dataset, so it is assumed here; swap in your own data
data = sns.load_dataset('iris')  # fetches the seaborn sample dataset
sns.boxplot(x='species', y='sepal_length', data=data)  # one box per species
plt.title('Box Plot Example')
plt.xlabel('Species')
plt.ylabel('Sepal Length')
plt.show()
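
To see what a box plot is actually drawing, the numbers behind it can be computed by hand. The sketch below assumes the common 1.5 x IQR rule for flagging outliers, which is the default convention in matplotlib and seaborn. The function name `box_plot_summary` and the sample values are made up for illustration.

```python
import numpy as np


def box_plot_summary(values):
    """Return the quartiles, IQR, and outliers a standard box plot displays."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Points beyond 1.5 * IQR from the box edges are treated as outliers
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "iqr": float(iqr),
        "outliers": outliers.tolist(),
    }


# Made-up sample with one obviously extreme value
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = box_plot_summary(sample)
print(summary)
```

The box spans `q1` to `q3`, the line inside it marks the median, and any values listed under `outliers` are drawn as individual points beyond the whiskers.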

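5. Heatmaps
Heatmaps are an excellent way of revealing patterns in large datasets. They display a table of values as a grid of colored cells, where the intensity of the color shows the magnitude of each value. A common use in EDA is plotting a correlation matrix, where color intensity shows the strength of the relationship between each pair of variables. The template here colors a pivoted table instead.

import seaborn as sns
import matplotlib.pyplot as plt

data = sns.load_dataset('')  # name of a seaborn sample dataset, or load your own DataFrame
# Fill in the columns to use as rows, columns, and cell values
pivot_data = data.pivot(index='', columns='', values='')
sns.heatmap(pivot_data, cmap='YlGnBu')
plt.title('Heatmap Example')
plt.xlabel('')  # x-axis label
plt.ylabel('')  # y-axis label
plt.show()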
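
Since heatmaps are often used to show correlations between variables, here is a minimal sketch of a correlation heatmap. It uses matplotlib's `imshow` with NumPy's `np.corrcoef` to keep dependencies small. The three variables `a`, `b`, and `c` are synthetic and exist only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, illustrative variables (assumptions, not real data)
rng = np.random.default_rng(seed=1)
n = 300
a = rng.normal(100, 20, n)
b = 5 * a + rng.normal(0, 30, n)   # built to track a closely
c = rng.normal(10, 3, n)           # unrelated to a and b
labels = ["a", "b", "c"]

# Each row is one variable, so corr is a 3x3 correlation matrix
corr = np.corrcoef(np.vstack([a, b, c]))

fig, ax = plt.subplots()
im = ax.imshow(corr, cmap="YlGnBu", vmin=-1, vmax=1)
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)

# Write each coefficient inside its cell
for i in range(len(labels)):
    for j in range(len(labels)):
        ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax, label="Pearson r")
ax.set_title("Correlation Heatmap (synthetic data)")
plt.show()
```

Fixing the color scale to the range -1 to 1 keeps the colors comparable across datasets. The diagonal is always 1 because every variable correlates perfectly with itself.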

Let's dive into some examples

Sales analysis
Imagine you work for a retail company and have been given the sales data for the past year. Your goals are to identify sales trends and areas for improvement. Here you can use line charts to visualize monthly sales and answer questions such as: Are there seasonal fluctuations? Which month had peak sales?
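
A line chart for this scenario might look like the sketch below. The monthly figures are made up purely for illustration; with real data you would load your own sales totals instead.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Made-up monthly sales totals, for illustration only
monthly_sales = [120, 115, 130, 140, 150, 160, 155, 150, 145, 170, 210, 260]

# Find the month with the highest sales
peak_index = max(range(len(monthly_sales)), key=lambda i: monthly_sales[i])
peak_month = months[peak_index]

positions = list(range(len(months)))
plt.plot(positions, monthly_sales, marker="o", color="teal")
plt.xticks(positions, months)

# Point an arrow at the peak so it stands out
plt.annotate(
    f"Peak: {peak_month}",
    xy=(peak_index, monthly_sales[peak_index]),
    xytext=(peak_index - 4, monthly_sales[peak_index]),
    arrowprops={"arrowstyle": "->"},
)
plt.title("Monthly Sales (illustrative data)")
plt.xlabel("Month")
plt.ylabel("Sales")
plt.show()
```

In this made-up series the line climbs toward the end of the year, which is the kind of seasonal pattern a line chart makes easy to spot.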

Medical research
Imagine you are a medical researcher analyzing patient data. You can use box plots to compare the distribution of a measurement, such as cholesterol levels, between different patient groups.
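
As a rough sketch of that comparison, the code below draws side-by-side box plots for two patient groups. The group names, sample sizes, and cholesterol values are all synthetic assumptions and do not come from any real study.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic cholesterol readings in mg/dL; groups and parameters are made up
rng = np.random.default_rng(seed=42)
groups = {
    "Control": rng.normal(190, 25, 80),
    "Treatment": rng.normal(170, 20, 80),
}

# Median of each group, the line drawn inside each box
medians = {name: float(np.median(vals)) for name, vals in groups.items()}
for name, value in medians.items():
    print(f"{name}: median = {value:.1f} mg/dL")

fig, ax = plt.subplots()
ax.boxplot(list(groups.values()))  # boxes are placed at x = 1, 2, ...
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(list(groups.keys()))
ax.set_title("Cholesterol by Patient Group (synthetic data)")
ax.set_xlabel("Patient group")
ax.set_ylabel("Cholesterol (mg/dL)")
plt.show()
```

Placing the boxes next to each other lets you compare medians, spread, and outliers between groups in a single view.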

Customer segmentation
Imagine you are part of a marketing team tasked with segmenting customers based on their behavior. Scatter plots can help you understand the relationship between variables such as purchase frequency and average spending per visit.

EDA doesn't stop there. Through EDA you can interpret data far better: ask questions, form hypotheses, and let the data guide you to new discoveries. It is also how you identify whether your data deserves further investigation. That's where the real adventure begins.

Wrapping up
Visualization is where EDA uncovers the hidden stories within your data. With these techniques you are well prepared to make informed decisions, solve complex problems, and share your insights.
Stay tuned as we continue this journey through data science.

