
Hanjin Liu


I unified all the plotting libraries of Python

Recently, I published a Python module, whitecanvas.

hanjinliu / whitecanvas

A type safe and backend independent plotting library for Python.

whitecanvas


A type safe and backend independent plotting library for Python, aiming at not the simplest, but the tidiest API.

Installation

pip install whitecanvas -U

Project Philosophy

Type safety

All methods should be designed to have clear signatures and should always return the same type of object, so that your program can be statically checked by the IDE.

Backend independency

Every plotting library has its own strengths and weaknesses. The same code should work on different backends, so that you can choose the best one for each purpose.

Currently supported backends are matplotlib, pyqtgraph, vispy, plotly and bokeh. If you want other backends, please feel free to open an issue.

API tidiness

Most (probably all) plotting libraries rely on a large number of arguments to configure the plot elements. They are usually hard to remember, forcing you to look up the documentation every…

Documentation

whitecanvas is a wrapper for most of the major plotting libraries, including matplotlib, pyqtgraph, vispy, plotly and bokeh, which means that you can select which backend to use depending on your purpose. However, backend independence is not the main goal of my project. What I really wanted to do was to overcome common problems of the existing plotting libraries. In this post, I'll discuss those problems and how whitecanvas solves them.

Programming is not only coding but type-checking

Python was developed as an interpreted language, but as people started to realize that types are needed for humans rather than for machines, "type annotation" was introduced as a part of the Python syntax. In line with this, the static type-checking capabilities of editors (such as VSCode and PyCharm) have improved rapidly. If you're writing well-typed Python scripts, powerful suggestions and auto-completion are available, helping you write what you want very smoothly, even without looking at the documentation. Type-checking is now an important aspect of programming.

However, most of the plotting libraries lack this perspective.

Type inconsistency and non-interpretability

Some of you may already have faced this problem.

import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1)
axes[0].plot  # <-- type-checking fails

This is because axes is not precisely typed, probably because plt.subplots returns an ndarray of Axes objects, but the dimensionality of the ndarray is unknown until runtime, so its elements cannot be inferred as Axes.

Similarly, plt.errorbar returns objects of different type depending on the arguments.

All the functions should be well-typed and return the same type.
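As a rough illustration of this principle (not the actual whitecanvas implementation), a container can annotate its `__getitem__` so that the element type is always known to the type checker. `Axes` and `AxesGrid` below are hypothetical names invented for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Axes:
    """Hypothetical stand-in for a single plotting area."""

    row: int
    col: int

    def plot(self, xs: List[float], ys: List[float]) -> int:
        # A real library would draw here; this toy returns the point count.
        if len(xs) != len(ys):
            raise ValueError("xs and ys must have the same length")
        return len(xs)


class AxesGrid:
    """Hypothetical grid whose indexing always returns an `Axes`."""

    def __init__(self, nrows: int, ncols: int) -> None:
        self._items = [[Axes(r, c) for c in range(ncols)] for r in range(nrows)]

    def __getitem__(self, key: Tuple[int, int]) -> Axes:
        # This return annotation is what lets an editor complete `.plot`.
        r, c = key
        return self._items[r][c]


grid = AxesGrid(2, 1)
n_points = grid[0, 0].plot([0.0, 1.0], [1.0, 2.0])
```

Because indexing always takes a `(row, col)` pair and always returns an `Axes`, the result type does not depend on the grid shape.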

Auto-completion doesn't work

It is hard to use most of the functions of matplotlib and bokeh without carefully reading the documentation, because of the *args and **kwargs. Things are even worse for pyqtgraph: because many methods are dynamically set by setattr, even the method names cannot be auto-completed.

We should avoid these implementations to improve the coding experience.
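To make the difference concrete, here is a small sketch with two made-up functions (`scatter_opaque` and `scatter_explicit` are hypothetical, not taken from any library). It uses `inspect.signature` as a stand-in for what tooling can discover from a signature; editors rely on static analysis rather than `inspect`, but the available information is similar.

```python
import inspect


def scatter_opaque(*args, **kwargs):
    """Hypothetical function: every option hides behind *args/**kwargs."""
    return args, kwargs


def scatter_explicit(x, y, *, color="blue", size=6.0, symbol="o"):
    """Hypothetical function: each option is a named keyword-only parameter."""
    return {"x": x, "y": y, "color": color, "size": size, "symbol": symbol}


def visible_options(func):
    """Return the parameter names that can be read off the signature."""
    params = inspect.signature(func).parameters.values()
    return [
        p.name
        for p in params
        if p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
    ]


opaque_names = visible_options(scatter_opaque)
explicit_names = visible_options(scatter_explicit)
print(opaque_names)
print(explicit_names)
```

The first list is empty, so there is nothing for an editor to suggest, while the second lists every option by name.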

High-level API is not what we need

Plotting functions can be customized in countless ways. If all the parameters for customization are placed in a single function, the function becomes huge. For example, the scatter function of plotly has 41 (!!) arguments if I'm not mistaken. To find out how to do something you want, you'll have to browse all of them. Is this what we wanted?

Another problem with high-level APIs is that functions are usually too "smart" or have mutually incompatible arguments. For example, the kdeplot function of seaborn automatically determines how to plot the input data based on the given information, and even supports cumulative plots for some reason.

We should design the functions carefully, for example by categorizing them with method chaining. Similarly, it is very difficult to remember how to modify each property (title text, color, font, x-axis range, etc.). Splitting the properties into namespaces should be a solution.

Data visualization with whitecanvas

Here, I'll show how to plot with whitecanvas. It's available on PyPI.

pip install whitecanvas -U
pip install "whitecanvas[matplotlib]" -U  # if you want to plot with matplotlib

Basics

new_canvas creates a Canvas on which you'll add layers that represent the data you want to visualize. You can specify which backend to use with a string of the form "(name of library):(name of lower backend (optional))". For example, the Qt backend of matplotlib can be specified as follows:

import numpy as np
from whitecanvas import new_canvas

canvas = new_canvas("matplotlib:qt")
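
For illustration only, a string in this "library:lower backend" form could be split as in the sketch below. `BackendSpec` and `parse_backend` are hypothetical helpers written for this post's description; the real parsing inside whitecanvas may differ.

```python
from typing import NamedTuple, Optional


class BackendSpec(NamedTuple):
    """Hypothetical parsed form of a backend string."""

    library: str
    lower: Optional[str]


def parse_backend(spec: str) -> BackendSpec:
    """Split "library:lower" into its parts; the lower backend is optional."""
    library, _, lower = spec.partition(":")
    library = library.strip()
    lower = lower.strip()
    if not library:
        raise ValueError(f"no library name in backend string {spec!r}")
    # An absent or empty lower backend is reported as None.
    return BackendSpec(library, lower if lower else None)


qt_spec = parse_backend("matplotlib:qt")
plain_spec = parse_backend("bokeh")
```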

The basic API is similar to that of matplotlib.

x = np.linspace(-np.pi, np.pi, 100)
line = canvas.add_line(np.cos(x*2), np.sin(x*3), color="red")
markers = canvas.add_markers(np.cos(x), np.sin(x), color="gray")
canvas.show()


Layers are stored in canvas.layers.

canvas.layers
# Out: LayerList([Line<'line'>, Markers<'markers'>])

Design properties such as the color can be set later. This feature is very important if you want to embed the canvas in a GUI application (especially when you use pyqtgraph or vispy).

# Since a `Line` is only a line, properties are under the layer itself.
line.color = "blue"
line.style = "--"
line.width = 3

# Since a `Markers` has both faces and edges, properties are under `face` and `edge`.
markers.face.color = "pink"
markers.edge.color = "purple"

# Properties specific to `Markers` are under the layer itself.
markers.size = 18
markers.symbol = "D"
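
One plausible way to support such late updates on any backend is to have each property setter forward the new value to a backend adapter. The sketch below is an assumption about the general pattern, not the whitecanvas source; `LineBackend`, `RecordingBackend` and `LineLayer` are hypothetical names.

```python
from __future__ import annotations

from typing import List, Protocol, Tuple


class LineBackend(Protocol):
    """Hypothetical interface that each backend adapter would implement."""

    def set_color(self, color: str) -> None: ...
    def set_width(self, width: float) -> None: ...


class RecordingBackend:
    """Toy backend that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, object]] = []

    def set_color(self, color: str) -> None:
        self.calls.append(("color", color))

    def set_width(self, width: float) -> None:
        self.calls.append(("width", width))


class LineLayer:
    """Hypothetical layer whose properties are pushed to the backend on set."""

    def __init__(self, backend: LineBackend, color: str = "red", width: float = 1.0) -> None:
        self._backend = backend
        self.color = color  # goes through the setter below
        self.width = width

    @property
    def color(self) -> str:
        return self._color

    @color.setter
    def color(self, value: str) -> None:
        self._color = value
        self._backend.set_color(value)

    @property
    def width(self) -> float:
        return self._width

    @width.setter
    def width(self, value: float) -> None:
        self._width = value
        self._backend.set_width(value)


backend = RecordingBackend()
layer = LineLayer(backend)
layer.color = "blue"  # forwarded to the backend immediately
layer.width = 3
```

Since the layer only talks to the `LineBackend` interface, swapping the plotting library would mean swapping the adapter, not the user code.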


Update axis labels and ranges

I also reorganized these properties into namespaces of Canvas.

  • canvas.x: Namespace of the x-axis. Same for the y-axis.
    • canvas.x.label: Namespace of the x-axis label.
    • canvas.x.ticks: Namespace of the x-axis ticks.
  • canvas.title: Namespace of the title.

The following example shows how to use these properties. Let's switch to the bokeh backend here.

canvas = new_canvas("bokeh")
canvas.add_bars([0, 1, 2], [5, 4, 6])  # bar plot for now

canvas.x.lim = (-1, 3)  # update the range
canvas.x.label.text = "X axis"  # update the text string
canvas.x.label.color = "red"  # update the color
canvas.y.ticks.set_labels([4, 5, 6], ["[4]", "[5]", "[6]"])  # override the tick labels
canvas.title.text = "Test namespaces"  # update the title text
canvas.title.size = 28  # update the font size of the title text
canvas.show()
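
The benefit of this layout for auto-completion can be seen in a small stand-alone model. This is only a sketch of the nesting idea using dataclasses; `TextNamespace`, `AxisNamespace` and `CanvasModel` are hypothetical and do not reflect how whitecanvas is actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class TextNamespace:
    """Hypothetical namespace for anything that is a piece of text."""

    text: str = ""
    color: str = "black"
    size: float = 12.0


@dataclass
class AxisNamespace:
    """Hypothetical namespace for one axis."""

    lim: Tuple[float, float] = (0.0, 1.0)
    label: TextNamespace = field(default_factory=TextNamespace)


@dataclass
class CanvasModel:
    """Hypothetical canvas that groups properties instead of flattening them."""

    x: AxisNamespace = field(default_factory=AxisNamespace)
    y: AxisNamespace = field(default_factory=AxisNamespace)
    title: TextNamespace = field(default_factory=TextNamespace)


model = CanvasModel()
model.x.lim = (-1, 3)
model.x.label.text = "X axis"
model.x.label.color = "red"
model.title.size = 28
```

Every attribute is typed, so typing `model.x.` in an editor lists only the handful of things that belong to an axis.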


Drawing complex plots

The more complex a plot is, the more cumbersome its API becomes. whitecanvas solves this problem with method chaining.

Here's an example of visualizing time-series data with a line, markers and errorbars. Let's try pyqtgraph this time.

# sample data
time = np.linspace(0, 5, 25)
observations = np.exp(-time/3) * np.cos(time*3)
errors = np.abs(observations) / 3 + 0.3

# only plot the observations
canvas = new_canvas("pyqtgraph")
canvas.add_line(time, observations)
canvas.show()

[Image: plot with only observations]

Now suppose we also want to add markers and errorbars. Even if we focus only on the colors, there are four to specify:

  • Color of the line.
  • Color of the marker faces.
  • Color of the marker edges.
  • Color of the errorbars.

If we also want to specify the line dash and the marker sizes, an API that takes everything in one function will immediately become "hopeless". On the other hand, if we plot each component one by one, the same arguments have to be repeated, which makes the code redundant and error-prone.

In whitecanvas, new components are added by with_* methods. In this case, we'll call with_markers to add markers and with_yerr to add errorbars, just as if you were drawing the graph by hand.

canvas = new_canvas("pyqtgraph")
layer = (
    canvas
    .add_line(time, observations)
    .with_markers()
    .with_yerr(errors, capsize=0.2)
)
canvas.show()

[Image: data with markers and errors]

Isn't it much better? Arguments are split into different methods, and the chained methods (with_markers and with_yerr) have the same signatures as the original methods (add_markers and add_errorbars) except for the arguments for the data coordinates.
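
The chaining pattern itself is easy to model without any plotting library. In the sketch below, each with_* method reuses the coordinates of the layer it is called on and returns a combined object; all class names are hypothetical and the real whitecanvas classes are certainly richer than this.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Markers:
    x: List[float]
    y: List[float]
    size: float = 6.0


@dataclass
class Errorbars:
    x: List[float]
    y: List[float]
    err: List[float]
    capsize: float = 0.0


@dataclass
class Line:
    x: List[float]
    y: List[float]
    width: float = 1.0

    def with_markers(self, size: float = 6.0) -> LineWithMarkers:
        # The data coordinates are taken from the line, not passed again.
        return LineWithMarkers(line=self, markers=Markers(self.x, self.y, size))


@dataclass
class LineWithMarkers:
    line: Line
    markers: Markers
    yerr: Optional[Errorbars] = None

    def with_yerr(self, err: List[float], capsize: float = 0.0) -> LineWithMarkers:
        if len(err) != len(self.line.y):
            raise ValueError("err must have one value per data point")
        bars = Errorbars(self.line.x, self.line.y, list(err), capsize)
        return replace(self, yerr=bars)


time_points = [0.0, 1.0, 2.0]
observed = [1.0, 0.5, 0.2]
combined = (
    Line(time_points, observed)
    .with_markers(size=8)
    .with_yerr([0.1, 0.1, 0.2], capsize=0.2)
)
```

Each method only needs the arguments that are specific to the component it adds, which is why the signatures stay short.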

Visualize DataFrames

seaborn and plotly.express are designed to make data visualization and exploration easier, and they are indeed very convenient for everyday data analysis tasks. In order to follow this idea and make it better, whitecanvas was designed as follows:

  • Do not stick to pandas. If one already has a polars.DataFrame or a dict, importing pandas is a waste of time, and pandas shouldn't even need to be installed.
  • Make a plotter using the input data and x/y column names instead of providing methods directly. I got this idea from seaborn.objects.Plot. Indeed, it's reasonable to make a new object, since it seems very rare that we have to change the x/y columns for a single DataFrame (if they change, there's no point in plotting them in the same canvas anymore).
  • Split plotters for categorical data and numeric data. Don't make functions too "smart"! We will never use violin plot for numeric × numeric data.
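
The first two points can be sketched with nothing but a dict of columns. `XYPlotter` and `split_by` are hypothetical names, and the table holds toy values rather than the real dataset; the sketch only shows the idea of binding the data and the x/y column names once and then grouping by another column.

```python
from __future__ import annotations

from typing import Dict, List, Mapping, Sequence, Tuple


class XYPlotter:
    """Hypothetical plotter bound to a table and its x/y column names."""

    def __init__(self, data: Mapping[str, Sequence], x: str, y: str) -> None:
        for name in (x, y):
            if name not in data:
                raise KeyError(f"column {name!r} not found")
        self._data = data
        self._x = x
        self._y = y

    def split_by(self, column: str) -> Dict[object, Tuple[List, List]]:
        """Group the x/y values by the entries of another column."""
        groups: Dict[object, Tuple[List, List]] = {}
        rows = zip(self._data[column], self._data[self._x], self._data[self._y])
        for key, xval, yval in rows:
            xs, ys = groups.setdefault(key, ([], []))
            xs.append(xval)
            ys.append(yval)
        return groups


# Toy values for illustration only.
table = {
    "sepal_length": [5.1, 7.0, 6.3],
    "sepal_width": [3.5, 3.2, 3.3],
    "species": ["setosa", "versicolor", "setosa"],
}
plotter = XYPlotter(table, x="sepal_length", y="sepal_width")
groups = plotter.split_by("species")
```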

As an example, we'll visualize the open dataset iris. We'll use seaborn to load the dataset.

Numeric × Numeric

from seaborn import load_dataset
from whitecanvas import new_canvas

# load iris as a pandas.DataFrame
df = load_dataset("iris")

canvas = new_canvas("matplotlib:qt")

(
    canvas
    .cat(df, x="sepal_length", y="sepal_width")
    .add_markers(color="species")
)
canvas.add_legend()
canvas.show()

[Image: iris]

Here, cat() is used to create a plotter for numeric × numeric data visualization. DataFrame df and the x/y columns are specified here. The add_markers() method then determines how the markers will be displayed. Here, colors are separated by "species," and a legend item can be added automatically with canvas.add_legend().

Categorical × Numeric

Next, let's plot data with a categorical x-axis. A column named "is_petal_wide" is added to show whether "petal_width" is large, and this is distinguished by the marker symbols. A nice thing about plotly and bokeh is that you can attach information to each data point and see it by hovering. I implemented the same functionality for matplotlib and pyqtgraph.

from seaborn import load_dataset
from whitecanvas import new_canvas

# load iris as a pandas.DataFrame and add a new column
df = load_dataset("iris")
df["is_petal_wide"] = df["petal_width"] > df["petal_width"].mean()

canvas = new_canvas("matplotlib:qt")

(
    canvas
    .cat_x(df, x="species", y="sepal_length")
    .add_stripplot(symbol="is_petal_wide")
)
canvas.add_legend()
canvas.show()

[Image: strip with tooltip]

cat_x() interprets the x-axis as a categorical axis. Because both cat_x() and cat_y() are implemented, we don't need arguments like orient in seaborn.

Aggregation

Plotters for aggregations, such as the mean of each category, are also implemented. The mean() method returns a new plotter with the aggregated data.

from seaborn import load_dataset
from whitecanvas import new_canvas

# load iris as pandas.DataFrame
df = load_dataset("iris")

canvas = new_canvas("matplotlib:qt")

plotter = canvas.cat_x(df, x="species", y="sepal_width")
plotter.add_stripplot(color="orange")  # all data
plotter.mean().add_line(width=3, color="red")  # mean
canvas.show()
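
The "returns a new plotter" behaviour can be modelled in a few lines. This is a simplified sketch with a hypothetical `CategoricalPlotter` class that stores plain lists; it is not the whitecanvas implementation.

```python
from __future__ import annotations

from statistics import fmean
from typing import Dict, List


class CategoricalPlotter:
    """Hypothetical plotter holding one category label per value."""

    def __init__(self, categories: List[str], values: List[float]) -> None:
        if len(categories) != len(values):
            raise ValueError("categories and values must have the same length")
        self.categories = list(categories)
        self.values = list(values)

    def mean(self) -> CategoricalPlotter:
        """Return a new plotter with one averaged value per category."""
        grouped: Dict[str, List[float]] = {}
        for cat, val in zip(self.categories, self.values):
            grouped.setdefault(cat, []).append(val)
        cats = list(grouped)  # keeps first-seen order
        return CategoricalPlotter(cats, [fmean(grouped[c]) for c in cats])


raw = CategoricalPlotter(["a", "a", "b"], [1.0, 3.0, 5.0])
averaged = raw.mean()
```

Because mean() does not modify the original object, the raw data and the aggregate can be drawn on the same canvas, as in the example above.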

[Image: strip with agg]

Method chaining

The idea of method chaining is also very helpful for keeping the API tidy. For example, a box plot is often combined with a strip plot that shows the outliers. Arguments specific to the outliers can be moved into the chained method.

from seaborn import load_dataset
from whitecanvas import new_canvas

# load dataset "tips"
df = load_dataset("tips")

canvas = new_canvas("matplotlib:qt")

(
    canvas
    .cat_x(df, x=["smoker", "sex"], y="tip")
    .add_boxplot(color="time")
    .with_outliers(size=9, symbol="o", update_whiskers=False)
)
canvas.add_legend()
canvas.show()
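
This post does not say how outliers are selected. A common convention, assumed here, is Tukey's rule: points farther than 1.5 times the interquartile range from the quartiles. The helper below is a stand-alone sketch of that rule with a hypothetical name, and the sample values are made up.

```python
from statistics import quantiles
from typing import List


def find_outliers(values: List[float], factor: float = 1.5) -> List[float]:
    """Return the values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low = q1 - factor * iqr
    high = q3 + factor * iqr
    return [v for v in values if v < low or v > high]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # made-up values
outliers = find_outliers(sample)
```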


And others ...

Multi-dimensional data

Methods for dealing with multi-dimensional data are under the dims namespace. Although only low-level functions are implemented at the moment, they work pretty nicely.

from skimage.data import brain
from whitecanvas import new_canvas

img = brain()  # 3D image

canvas = new_canvas("matplotlib:qt")

# Show 3D image along the "time" axis
canvas.dims.in_axes("time").add_image(img)

# Make and show a slider widget
canvas.dims.create_widget().show()

canvas.show()

[Image: 3D image]

Mouse event handling

Event handling is implemented in the canvas.events namespace. For example, canvas.events.mouse_moved lets you listen for mouse-move events. Please take a look at the source code if you are interested!
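
Since no callback code is shown here, the following is only a generic sketch of how an events namespace with connectable signals can work. The `connect` and `emit` method names and the `(x, y)` callback arguments are assumptions made for this sketch, not the documented whitecanvas API.

```python
from __future__ import annotations

from typing import Callable, List, Tuple

MouseCallback = Callable[[float, float], None]


class Signal:
    """Hypothetical minimal signal: stores callbacks and calls them on emit."""

    def __init__(self) -> None:
        self._callbacks: List[MouseCallback] = []

    def connect(self, callback: MouseCallback) -> MouseCallback:
        self._callbacks.append(callback)
        return callback  # returning it allows use as a decorator

    def emit(self, x: float, y: float) -> None:
        for callback in list(self._callbacks):
            callback(x, y)


class Events:
    """Hypothetical events namespace with one signal per event type."""

    def __init__(self) -> None:
        self.mouse_moved = Signal()


events = Events()
positions: List[Tuple[float, float]] = []


@events.mouse_moved.connect
def record_position(x: float, y: float) -> None:
    positions.append((x, y))


# A real backend would emit this when the cursor moves; here it is simulated.
events.mouse_moved.emit(1.0, 2.0)
```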

[Image: Mouse event]
