DEV Community

Nechama Borisute


Presenting data

As a data scientist, I find myself not only tracking down the data I need, but also thinking about how to present it in a helpful, meaningful way, i.e., data visualization. Though these two fields are distinct, they often depend on each other: if I cannot present or visualize the data I have obtained in a helpful way, it can mean that the data is incomplete or even incorrect.

One database company has a whitepaper written to advance their product, and this is an excerpt from the paper about time series data:

A line graph is the simplest way to represent time series data. It helps the viewer get a quick sense of how something has changed over time:

lineplot time series

This statement is hard to deny, but the placeholder graph could represent almost any kind of data, not time series data in particular. Because the graph is so generic, I feel it actually fails to convey a sense of how things changed.

Presenting data is a skill very distinct from isolating and compiling it from the raw data sources. Just as I make use of dedicated Python libraries for building data sets, I feel one should go to dedicated presentation resources when thinking about how to present data.

Here is a graph, modeled after one of the famous charts presented by the statistician Hans Rosling in his TED talks:

Time series image

The format Rosling chose here manages to let you "get a sense" of a very large dataset very quickly and in an unprecedentedly dramatic fashion.
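A single frame of this kind of chart can be approximated in Seaborn. The sketch below assumes the chart in question is Rosling's bubble-chart format (one bubble per country, with position, size, and color each encoding a variable). All numbers and the region labels are made up for illustration and are not real country statistics.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up numbers for illustration only; these are NOT real country statistics
rng = np.random.default_rng(1)
n = 40
income = rng.lognormal(mean=9, sigma=1, size=n)  # hypothetical income per person
bubbles = pd.DataFrame({
    "income": income,
    "life_expectancy": 45 + 8 * np.log10(income) + rng.normal(0, 2, size=n),
    "population": rng.integers(1, 200, size=n),          # hypothetical, in millions
    "region": rng.choice(["A", "B", "C", "D"], size=n),  # hypothetical region labels
})

# One point per "country": position, bubble size, and color each carry a variable
fig, ax = plt.subplots(figsize=(8, 5))
sns.scatterplot(
    data=bubbles, x="income", y="life_expectancy",
    size="population", sizes=(20, 600), hue="region", alpha=0.6, ax=ax,
)
ax.set_xscale("log")  # income spans orders of magnitude, so a log axis spreads it out
ax.set(xlabel="Income per person (made up)", ylabel="Life expectancy (made up)")
```

Rosling's talks also animate the bubbles through time; this static sketch only shows how four variables fit into one frame.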

One popular product for data visualization is Seaborn, a Python library. Dedicated products let me quickly try out different views and formats for my data, which gives me fast feedback on whether I am on the right track. One advantage of Seaborn is that its code is very succinct: you can create plots and grids in just a few lines of code, and sometimes a single line will suffice.

Here are a few plots made with Seaborn:

# import seaborn library into memory
import seaborn as sns
# load in a built-in dataset from seaborn to use for plots
df = sns.load_dataset('iris') 
# check dataset to see column names
df.info()
# plot a histogram of sepal length according to species
sns.histplot(data=df, x="sepal_length", hue="species", element="step");

histogram

# plot a pairplot for all features
sns.pairplot(df);

pairplot

# plot a scatterplot for petal information
sns.scatterplot(data=df, x='petal_width', y='petal_length', hue='species');

scatterplot

# plot a barplot of sepal information on x and y
sns.barplot(data=df, x='sepal_width', y='sepal_length');

barplot
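A note on reading bar plots: `sns.barplot` draws one bar per distinct x value, with the bar height showing the mean of y by default, so it tends to be easiest to read when x is categorical. Here is a minimal sketch of that variant. The frame below is made up and only borrows the iris column names; it is not the real dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up measurements that only borrow the iris column names; not the real dataset
rng = np.random.default_rng(2)
toy = pd.DataFrame({
    "species": np.repeat(["setosa", "versicolor", "virginica"], 20),
    "sepal_length": np.concatenate([
        rng.normal(5.0, 0.3, 20),
        rng.normal(5.9, 0.5, 20),
        rng.normal(6.6, 0.6, 20),
    ]),
})

# One bar per species: the height is the mean sepal_length, the line on top is the error bar
fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(data=toy, x="species", y="sepal_length", ax=ax)
ax.set(xlabel="Species", ylabel="Mean sepal length (made up)")
```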

These are but a few examples of the awesome things Seaborn can help you accomplish. To learn about Seaborn, check out the docs.
