Recently, I published a Python module, whitecanvas.
hanjinliu / whitecanvas
A type safe and backend independent plotting library for Python.
whitecanvas is a wrapper for most of the major plotting libraries, including matplotlib, pyqtgraph, vispy, plotly and bokeh, which means that you can select which backend to use depending on your purpose. However, backend independence is not the main goal of my project. What I really wanted to do is to overcome common problems of the existing plotting libraries. In this post, I'll discuss the problems and how whitecanvas solves them.
Programming is not only coding but type-checking
Python was developed as an interpreted language, but as people came to realize that types are needed for humans rather than for machines, type annotations were introduced as part of the Python syntax. In line with this, the static type-checking capabilities of editors (such as VSCode and PyCharm) have improved rapidly. If you're writing well-typed Python scripts, powerful suggestions and auto-completion are available, helping you write what you want very smoothly, even without looking up the documentation. Type-checking is now an important aspect of programming.
However, most of the plotting libraries lack this perspective.
Type inconsistency and non-interpretability
Some of you may already have faced this problem.
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2, 1)
axes[0].plot # <-- type-checking fails
This is because axes is not typed, probably because plt.subplots returns an ndarray of Axes objects, but the dimension of the ndarray is unknown until runtime, so its elements cannot be inferred as Axes.
Similarly, plt.errorbar returns objects of different types depending on the arguments.
All the functions should be well-typed and return the same type.
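As a rough illustration of this point (not whitecanvas or matplotlib code; Axes, subplots_loose and subplots_grid are hypothetical stand-ins), compare a function whose return shape depends on its arguments with one that always returns the same annotated type:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Axes:
    """Hypothetical stand-in for a plotting axes object."""

    row: int
    col: int

    def plot(self, xs: list[float], ys: list[float]) -> int:
        # Return the number of points "drawn" so the example is checkable.
        return min(len(xs), len(ys))


def subplots_loose(nrows: int, ncols: int) -> Any:
    """The return shape varies with the arguments, so it is annotated as Any."""
    grid = [[Axes(r, c) for c in range(ncols)] for r in range(nrows)]
    if nrows == 1 and ncols == 1:
        return grid[0][0]  # a single Axes
    if nrows == 1 or ncols == 1:
        return [ax for row in grid for ax in row]  # a flat list
    return grid  # a nested list


def subplots_grid(nrows: int, ncols: int) -> list[list[Axes]]:
    """Always returns a nested list, so grid[0][0].plot can be inferred."""
    return [[Axes(r, c) for c in range(ncols)] for r in range(nrows)]


grid = subplots_grid(2, 1)
# A static checker can resolve this call to Axes.plot without running anything.
n_points = grid[0][0].plot([0.0, 1.0], [1.0, 2.0])
```

With the second form, the caller indexes twice even for a single column, which is slightly more verbose but keeps every element statically known.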
Auto-completion doesn't work
It is hard to use most of the functions of matplotlib and bokeh without carefully reading the documentation because of the *args and **kwargs. Things are even worse for pyqtgraph; because many methods are dynamically set by setattr, even the method names cannot be auto-completed.
We should avoid these implementations to improve the coding experience.
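The difference is easy to see with the standard inspect module. The two functions below are hypothetical examples (they are not taken from any of the libraries mentioned); the sketch only shows what tooling can and cannot learn from each signature:

```python
import inspect


def add_line_kwargs(x, y, **kwargs):
    """Opaque signature: the valid option names are invisible to tooling."""
    return {"x": x, "y": y, **kwargs}


def add_line_explicit(
    x: list[float],
    y: list[float],
    *,
    color: str = "black",
    width: float = 1.0,
    style: str = "-",
) -> dict:
    """Explicit keyword-only options: editors can list and check them."""
    return {"x": x, "y": y, "color": color, "width": width, "style": style}


# What an editor can discover from each signature.
opaque = list(inspect.signature(add_line_kwargs).parameters)
explicit = list(inspect.signature(add_line_explicit).parameters)

# A misspelled option passes silently through **kwargs ...
silently_accepted = add_line_kwargs([0.0], [1.0], colour="red")

# ... but is rejected immediately by the explicit signature.
try:
    add_line_explicit([0.0], [1.0], colour="red")
except TypeError:
    typo_caught = True
else:
    typo_caught = False
```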
High-level API is not what we need
Plotting functions can be customized in countless ways. If all the parameters for customization are placed in a single function, the function becomes huge. For example, the scatter function of plotly has 41 (!!) arguments, if I'm not mistaken. To find out how to do what you want, you have to browse all of them. Is this what we wanted?
Another problem of high-level APIs is that functions are usually too "smart" or have mutually incompatible arguments. For example, the kdeplot function of seaborn automatically determines how to plot the input data based on the given information, and even supports cumulative plots for some reason.
We should carefully design the functions, for example by categorizing them with method chaining. Similarly, it is very difficult to remember how to modify each property (title text, color, font, x-axis range, etc.). Splitting the properties into namespaces should be a solution.
Data visualization with whitecanvas
Here, I'll show how to plot with whitecanvas. It's available on PyPI.
pip install whitecanvas -U
pip install whitecanvas[matplotlib] -U # if you want to plot with matplotlib
Basics
new_canvas creates a Canvas, on which you'll add layers that represent the data you want to visualize. You can specify which backend to use in the form "(name of library):(name of lower backend (optional))". For example, the Qt backend of matplotlib can be specified as follows:
import numpy as np
from whitecanvas import new_canvas
canvas = new_canvas("matplotlib:qt")
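To make the "library:lower backend" format concrete, here is a minimal sketch of how such a string could be split. parse_backend is a hypothetical helper written for this post, not a whitecanvas function, and the library may handle the string differently:

```python
from typing import Optional


def parse_backend(spec: str) -> tuple[str, Optional[str]]:
    """Split "library:lower_backend" into its two parts.

    The part after the colon is optional, so "bokeh" gives ("bokeh", None).
    """
    library, sep, lower = spec.partition(":")
    if not library:
        raise ValueError(f"no library name in backend spec {spec!r}")
    return library, (lower if sep and lower else None)


with_lower = parse_backend("matplotlib:qt")
without_lower = parse_backend("bokeh")
```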
Basically, the API is similar to that of matplotlib.
x = np.linspace(-np.pi, np.pi, 100)
line = canvas.add_line(np.cos(x*2), np.sin(x*3), color="red")
markers = canvas.add_markers(np.cos(x), np.sin(x), color="gray")
canvas.show()
Layers are stored in canvas.layers.
canvas.layers
# Out: LayerList([Line<'line'>, Markers<'markers'>])
Visual properties such as the color can be set later. This feature is very important if you want to embed the canvas in a GUI application (especially when you use pyqtgraph or vispy).
# Since a `Line` is only a line, properties are under the layer itself.
line.color = "blue"
line.style = "--"
line.width = 3
# Since a `Markers` has both faces and edges, properties are under `face` and `edge`.
markers.face.color = "pink"
markers.edge.color = "purple"
# Properties specific to `Markers` are under the layer itself.
markers.size = 18
markers.symbol = "D"
Update axis labels and ranges
I also reorganized these properties into namespaces of Canvas.
- canvas.x: Namespace of the x-axis. Same for the y-axis.
  - canvas.x.label: Namespace of the x-axis label.
  - canvas.x.ticks: Namespace of the x-axis ticks.
- canvas.title: Namespace of the title.
The following example shows how to use these properties. Let's switch to the bokeh backend here.
canvas = new_canvas("bokeh")
canvas.add_bars([0, 1, 2], [5, 4, 6]) # bar plot for now
canvas.x.lim = (-1, 3) # update the range
canvas.x.label.text = "X axis" # update the text string
canvas.x.label.color = "red" # update the color
canvas.y.ticks.set_labels([4, 5, 6], ["[4]", "[5]", "[6]"]) # override the tick labels
canvas.title.text = "Test namespaces" # update the title text
canvas.title.size = 28 # update the font size of the title text
canvas.show()
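Namespaces like these play well with static checking because each level is just a typed attribute. The sketch below shows one possible way to build such a structure with dataclasses. It is not how whitecanvas is implemented, and all class names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class LabelNamespace:
    """Properties of an axis label."""

    text: str = ""
    color: str = "black"


@dataclass
class AxisNamespace:
    """Properties of one axis, with the label nested one level below."""

    lim: tuple[float, float] = (0.0, 1.0)
    label: LabelNamespace = field(default_factory=LabelNamespace)


@dataclass
class TitleNamespace:
    """Properties of the title."""

    text: str = ""
    size: float = 12.0


@dataclass
class SketchCanvas:
    """Hypothetical canvas exposing x, y and title namespaces."""

    x: AxisNamespace = field(default_factory=AxisNamespace)
    y: AxisNamespace = field(default_factory=AxisNamespace)
    title: TitleNamespace = field(default_factory=TitleNamespace)


sketch = SketchCanvas()
sketch.x.lim = (-1, 3)            # each step of the chain is auto-completable
sketch.x.label.text = "X axis"
sketch.x.label.color = "red"
sketch.title.text = "Test namespaces"
sketch.title.size = 28
```

Because every namespace is a separate small class, typing "sketch.x." in an editor lists only the properties that belong to the x-axis.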
Drawing complex plots
The more complex a plot is, the more cumbersome its API becomes. whitecanvas solves this problem by method chaining.
Here's an example of visualizing time-series data with a line, markers and errorbars. Let's try pyqtgraph this time.
# sample data
time = np.linspace(0, 5, 25)
observations = np.exp(-time/3) * np.cos(time*3)
errors = np.abs(observations) / 3 + 0.3
# only plot the observations
canvas = new_canvas("pyqtgraph")
canvas.add_line(time, observations)
canvas.show()
Even if we focus only on the colors, there are four:
- Color of the line.
- Color of the marker faces.
- Color of the marker edges.
- Color of the errorbars.
and if we also want to specify the line dash and the marker sizes, the API quickly becomes "hopeless". On the other hand, if we plot each component one by one, the arguments are duplicated, which makes the code redundant and error-prone.
In whitecanvas, new components are added by with_* methods. In this case, we'll call with_markers to add markers and with_yerr to add errorbars, just as if you were drawing the graph by hand.
canvas = new_canvas("pyqtgraph")
layer = (
    canvas
    .add_line(time, observations)
    .with_markers()
    .with_yerr(errors, capsize=0.2)
)
canvas.show()
Isn't it much better? Arguments are split into different methods, and the chained methods (with_markers and with_yerr) have the same signatures as the original methods (add_markers and add_errorbars) except for the arguments for the data coordinates.
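The pattern itself is simple to sketch. The class below is a hypothetical, backend-free illustration of with_* chaining, in which each method reuses the coordinates of the line and accepts only its own style arguments. It simply returns self; how whitecanvas implements this internally, and which type its methods return, is not shown here:

```python
from dataclasses import dataclass, field


@dataclass
class SketchLine:
    """Hypothetical layer holding line data plus attached components."""

    x: list[float]
    y: list[float]
    components: dict = field(default_factory=dict)

    def with_markers(self, *, symbol: str = "o", size: float = 6.0) -> "SketchLine":
        # Coordinates come from the line itself, so only style is passed.
        self.components["markers"] = {
            "x": self.x, "y": self.y, "symbol": symbol, "size": size,
        }
        return self

    def with_yerr(self, err: list[float], *, capsize: float = 0.0) -> "SketchLine":
        if len(err) != len(self.y):
            raise ValueError("err must have the same length as y")
        # Errorbars are centered on the y values of the line.
        self.components["yerr"] = {
            "x": self.x,
            "low": [v - e for v, e in zip(self.y, err)],
            "high": [v + e for v, e in zip(self.y, err)],
            "capsize": capsize,
        }
        return self


sketch_layer = (
    SketchLine([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
    .with_markers(symbol="D")
    .with_yerr([0.5, 0.5, 0.5], capsize=0.2)
)
```

Since every with_* method is annotated with its return type, the whole chain stays visible to the type checker.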
Visualize DataFrames
seaborn and plotly.express are designed to make data visualization and exploration easier, and they are indeed very convenient for daily data analysis tasks. To follow this idea and improve on it, whitecanvas was designed as follows:
- Do not stick to pandas. If one already has a polars.DataFrame or a dict, importing pandas is a waste of time, and it doesn't even need to be installed.
- Make a plotter using the input data and x/y column names instead of providing methods directly. I got this idea from seaborn.objects.Plot. Indeed, it's reasonable to make a new object since it seems very rare that we have to change the data of the x/y axes for a single DataFrame (if it changes, there's no point in plotting them in the same canvas anymore).
- Split plotters for categorical data and numeric data. Don't make functions too "smart"! We will never use a violin plot for numeric × numeric data.
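The first two points can be sketched without any DataFrame library. In the example below, XYPlotter is a hypothetical class (not part of whitecanvas) that binds a plain dict of columns to its x/y column names once, so that later methods never need the data or the column names again:

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class XYPlotter:
    """Hypothetical plotter bound to a table and its x/y column names."""

    data: Mapping[str, Sequence]
    x: str
    y: str

    def __post_init__(self) -> None:
        # Fail early if a column name is misspelled.
        for name in (self.x, self.y):
            if name not in self.data:
                raise KeyError(f"column {name!r} not found")

    def points(self) -> list[tuple]:
        """Return the (x, y) pairs taken from the bound columns."""
        return list(zip(self.data[self.x], self.data[self.y]))

    def points_by(self, column: str) -> dict:
        """Group the (x, y) pairs by the values of another column."""
        groups: dict = {}
        for key, point in zip(self.data[column], self.points()):
            groups.setdefault(key, []).append(point)
        return groups


# A plain dict is enough; the values here are made up for the example.
table = {
    "length": [5.0, 7.0, 6.5, 5.5],
    "width": [3.5, 3.0, 3.25, 4.0],
    "kind": ["a", "b", "b", "a"],
}
xy = XYPlotter(table, x="length", y="width")
by_kind = xy.points_by("kind")
```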
As an example, we'll visualize the open dataset iris. We'll use seaborn to load the dataset.
Numeric × Numeric
from seaborn import load_dataset
from whitecanvas import new_canvas
# load iris as a pandas.DataFrame
df = load_dataset("iris")
canvas = new_canvas("matplotlib:qt")
(
    canvas
    .cat(df, x="sepal_length", y="sepal_width")
    .add_markers(color="species")
)
canvas.add_legend()
canvas.show()
Here, cat() is used to create a plotter for numeric × numeric data visualization. The DataFrame df and the x/y columns are specified here. The add_markers() method then determines how the markers will be displayed. Here, colors are separated by "species", and legend items can be added automatically with canvas.add_legend().
Categorical × Numeric
Next, let's plot data with a categorical x-axis. A column named "is_petal_wide" is added to show whether "petal_width" is large, which is distinguished by the marker symbols. A nice thing about plotly and bokeh is that you can tag information on each data point and see it by hovering. I implemented the same functionality for matplotlib and pyqtgraph.
from seaborn import load_dataset
from whitecanvas import new_canvas
# load iris as a pandas.DataFrame and add a new column
df = load_dataset("iris")
df["is_petal_wide"] = df["petal_width"] > df["petal_width"].mean()
canvas = new_canvas("matplotlib:qt")
(
    canvas
    .cat_x(df, x="species", y="sepal_length")
    .add_stripplot(symbol="is_petal_wide")
)
canvas.add_legend()
canvas.show()
cat_x() interprets the x-axis as a categorical axis. Because both cat_x() and cat_y() are implemented, we don't need arguments like orient in seaborn.
Aggregation
Plotters for aggregations, such as the mean of each category, are also implemented. The mean() method returns a new plotter with aggregated data.
from seaborn import load_dataset
from whitecanvas import new_canvas
# load iris as pandas.DataFrame
df = load_dataset("iris")
canvas = new_canvas("matplotlib:qt")
plotter = canvas.cat_x(df, x="species", y="sepal_width")
plotter.add_stripplot(color="orange") # all data
plotter.mean().add_line(width=3, color="red") # mean
canvas.show()
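The "returns a new plotter" idea can be illustrated with a small sketch. CatPlotter below is hypothetical and far simpler than anything in whitecanvas; it only shows that aggregation can produce a fresh object and leave the original data untouched, which is why both the strip plot and the mean line can be drawn from the same starting plotter:

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class CatPlotter:
    """Hypothetical plotter holding one category and one value per row."""

    categories: tuple
    values: tuple

    def mean(self) -> "CatPlotter":
        """Return a new plotter with one mean value per category."""
        groups: dict = {}
        for cat, val in zip(self.categories, self.values):
            groups.setdefault(cat, []).append(val)
        # dict preserves insertion order, so categories keep first-seen order.
        return CatPlotter(
            tuple(groups),
            tuple(fmean(vals) for vals in groups.values()),
        )


raw = CatPlotter(("a", "a", "b", "b"), (3.0, 4.0, 2.0, 3.0))
aggregated = raw.mean()
```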
Method chaining
The idea of method chaining is also very helpful for keeping the API tidy. For example, a box plot is usually combined with a strip plot that shows the outliers. Arguments specific to the outliers can be split off into the chained method.
from seaborn import load_dataset
from whitecanvas import new_canvas
# load dataset "tips"
df = load_dataset("tips")
canvas = new_canvas("matplotlib:qt")
(
    canvas
    .cat_x(df, x=["smoker", "sex"], y="tip")
    .add_boxplot(color="time")
    .with_outliers(size=9, symbol="o", update_whiskers=False)
)
canvas.add_legend()
canvas.show()
And others ...
Multi-dimensional data
Methods for dealing with multi-dimensional data are under the dims namespace. Although only low-level functions are implemented at the moment, they work pretty nicely.
from skimage.data import brain
from whitecanvas import new_canvas
img = brain() # 3D image
canvas = new_canvas("matplotlib:qt")
# Show 3D image along the "time" axis
canvas.dims.in_axes("time").add_image(img)
# Make and show a slider widget
canvas.dims.create_widget().show()
canvas.show()
Mouse event handling
Event handling is implemented in the canvas.events namespace. canvas.events.mouse_moved can be used to listen to the mouse-moved event. Please take a look at the source code if you are interested!
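For readers unfamiliar with this style of event handling, here is a minimal, backend-free sketch of the callback pattern. Signal and Events are hypothetical classes written for this post; the actual whitecanvas API for registering callbacks may differ, so check the source before relying on any of these names:

```python
from typing import Callable


class Signal:
    """Minimal hypothetical signal: stores callbacks and calls them on emit."""

    def __init__(self) -> None:
        self._callbacks: list[Callable] = []

    def connect(self, callback: Callable) -> Callable:
        self._callbacks.append(callback)
        return callback  # returning it allows use as a decorator

    def emit(self, pos: tuple[float, float]) -> None:
        for callback in self._callbacks:
            callback(pos)


class Events:
    """Hypothetical events namespace with a single mouse_moved signal."""

    def __init__(self) -> None:
        self.mouse_moved = Signal()


events = Events()
trail: list[tuple[float, float]] = []


@events.mouse_moved.connect
def record_position(pos: tuple[float, float]) -> None:
    # Called every time the (simulated) mouse moves.
    trail.append(pos)


# In a real GUI the backend would emit these; here we emit them by hand.
events.mouse_moved.emit((0.5, 1.0))
events.mouse_moved.emit((0.6, 1.2))
```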