
Nechama Borisute


Presenting data

As a data scientist, I not only have to find the data I need, but also have to think about how to present it in a helpful, meaningful way, i.e. data visualization. Though these two fields are distinct, they often depend on each other: if I cannot present or visualize the data I have obtained in a helpful way, it can mean that the data is incomplete or even incorrect.

One database company published a whitepaper promoting its product. Here is an excerpt from that paper about time series data:

A line graph is the simplest way to represent time series data. It helps the viewer get a quick sense of how something has changed over time:

lineplot time series

This statement is hard to deny, but the placeholder graph could stand for a very broad range of data, not time series data in particular. Because the graph is so generic, I feel it actually conveys little sense of how things changed.

Presenting data is a skill quite distinct from isolating and compiling it from raw data sources. Just as I use dedicated Python libraries to build data sets, I feel one should turn to dedicated presentation resources when thinking about how to present data.

Here is a graph modeled after one of the famous charts presented by the statistician Hans Rosling in his TED talks:

Time series image

The format Rosling chose here manages to let you "get a sense" of a very large dataset very quickly and in an unprecedentedly dramatic fashion.
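For readers who want to try this format themselves, below is a rough sketch of a single still frame in that style. It assumes the chart in question is the bubble-chart layout Rosling is known for (income on a log scale against life expectancy, with bubble size for population and color for region). All numbers are made up for illustration and are not real country statistics; the names `bubbles`, `income`, `population`, `region` and `life_expectancy` are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up numbers for illustration only; these are NOT real country statistics.
rng = np.random.default_rng(1)
n = 40
bubbles = pd.DataFrame({
    "income": rng.lognormal(mean=9, sigma=1, size=n),        # income per person
    "population": rng.integers(1, 300, size=n) * 1_000_000,  # number of people
    "region": rng.choice(["A", "B", "C", "D"], size=n),      # placeholder regions
})
# Tie life expectancy loosely to log income so the cloud has a visible shape
bubbles["life_expectancy"] = 30 + 5 * np.log(bubbles["income"]) + rng.normal(0, 3, n)

# Four variables in one view: x, y, bubble size and color
fig, ax = plt.subplots(figsize=(8, 5))
sns.scatterplot(
    data=bubbles, x="income", y="life_expectancy",
    size="population", sizes=(20, 600), hue="region", alpha=0.6, ax=ax,
)
ax.set_xscale("log")  # income spans orders of magnitude, so use a log axis
ax.set(xlabel="Income per person (log scale)", ylabel="Life expectancy (years)")
```

The point of the sketch is the encoding, not the data: position, size and color together carry four variables in a single frame.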

One popular tool for data visualization is Seaborn, a Python library. Dedicated tools let me quickly try out different views and formats for my data, which gives me fast feedback on whether I am on the right track. One advantage of Seaborn is that its code is very succinct: you can create plots and grids in just a few lines of code, and sometimes a single line will suffice.

Here are a few plots made with Seaborn:

# import seaborn library into memory
import seaborn as sns
# load in a built-in dataset from seaborn to use for plots
df = sns.load_dataset('iris') 
# check dataset to see column names
df.info()
# plot a histogram of sepal length according to species
sns.histplot(data=df, x="sepal_length", hue="species", element="step");

histogram

# plot a pairplot for all features
sns.pairplot(df);

pairplot

# plot a scatterplot for petal information
sns.scatterplot(data=df, x="petal_width", y="petal_length", hue="species");

scatterplot

# plot a barplot of sepal information on x and y
sns.barplot(data=df, x="sepal_width", y="sepal_length");

barplot
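A bar plot is usually easier to read when the x-axis is a category, because `sns.barplot` by default draws the mean of each group with an error bar. The variation below is a sketch that puts species on the x-axis. To keep it runnable without downloading anything, it uses a few hand-typed rows shaped like the iris dataset; the name `sample` is made up for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# A few hand-typed rows shaped like the iris dataset (illustration only),
# so the snippet runs offline. With the full data, pass df instead.
sample = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
})

# One bar per species; bar height is the mean sepal length of that group
fig, ax = plt.subplots()
sns.barplot(data=sample, x="species", y="sepal_length", ax=ax)
ax.set(xlabel="Species", ylabel="Mean sepal length (cm)")
```

With the dataset loaded earlier, the same call becomes `sns.barplot(data=df, x="species", y="sepal_length")`.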

These are but a few examples of the awesome things Seaborn can help you accomplish. To learn more about Seaborn, check out the docs.
