FigCanvas

Posted on May 27

How we test color palettes before they ship in scientific figures

#datascience #dataviz #research #a11y

The journals don't reject papers because the palette is wrong. They just give you another revision round, and the senior reviewer writes "figures are difficult to read in print." Same outcome, slower.

We work on FigCanvas, an AI tool for scientific figures, so we end up looking at a lot of bad palettes. After a few hundred of them we built a checklist for testing a palette before any plot using it ever ships. Most of it is engineering-style: deterministic, runnable, fast.

The four tests that catch ~90% of palette failures

1. Colorblind simulation, not "ask someone with deuteranopia"

The most common test people do is "send the figure to a friend." This is fine and you should still do it. But it's not a unit test — your friend isn't going to look at every iteration.

Run the palette through a deterministic simulator first. For each color in the palette, compute the deuteranopia and protanopia versions (these cover ~95% of the colorblind population). Then check pairwise distance in CIELAB. If any two colors collapse to a distance under ~10 in either simulation, the palette is broken — pick another.

A few tools that do this in code: colorspace in R has deutan() / protan(); in Python, colorblind or daltonize. You can also hand-roll the transform from the Brettel/Viénot matrices in about 20 lines.
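The check described above can be sketched in plain Python. This is a minimal illustration, not FigCanvas's implementation. It swaps in the single-matrix Machado et al. (2009) severity-1.0 approximation for the Brettel/Viénot transform because it is shorter to write down. The coefficients are quoted from memory of that paper and should be verified against a reference implementation before you rely on them. Distance is plain Euclidean distance in CIELAB (deltaE 76), and the function names are hypothetical.

```python
from itertools import combinations

# Machado et al. (2009) severity-1.0 matrices, applied to linear RGB.
# Assumption: used here in place of Brettel/Vienot; verify the coefficients.
CVD_MATRICES = {
    "deuteranopia": [
        (0.367322, 0.860646, -0.227968),
        (0.280085, 0.672501, 0.047413),
        (-0.011820, 0.042940, 0.968881),
    ],
    "protanopia": [
        (0.152286, 1.052583, -0.204868),
        (0.114503, 0.786281, 0.099216),
        (-0.003882, -0.048116, 1.051998),
    ],
}

# Linear sRGB -> CIE XYZ (D65).
RGB_TO_XYZ = [
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
]
WHITE = [sum(row) for row in RGB_TO_XYZ]  # XYZ of sRGB white


def hex_to_linear(hex_color):
    """'#rrggbb' -> linear-light RGB triple in [0, 1]."""
    h = hex_color.lstrip("#")
    srgb = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    return [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
            for c in srgb]


def matmul3(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def linear_to_lab(rgb):
    """Linear RGB -> CIELAB (L*, a*, b*)."""
    def f(t):
        return t ** (1 / 3) if t > 216 / 24389 else (24389 / 27 * t + 16) / 116
    fx, fy, fz = (f(v / w) for v, w in zip(matmul3(RGB_TO_XYZ, rgb), WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def simulate(rgb, kind):
    """Apply the CVD matrix, then clip the result back into gamut."""
    return [min(1.0, max(0.0, c)) for c in matmul3(CVD_MATRICES[kind], rgb)]


def delta_e76(lab1, lab2):
    return sum((a - b) ** 2 for a, b in zip(lab1, lab2)) ** 0.5


def failing_pairs(palette, threshold=10.0):
    """Return (kind, color_a, color_b, distance) for pairs closer than threshold."""
    failures = []
    for kind in CVD_MATRICES:
        labs = {c: linear_to_lab(simulate(hex_to_linear(c), kind)) for c in palette}
        for a, b in combinations(palette, 2):
            d = delta_e76(labs[a], labs[b])
            if d < threshold:
                failures.append((kind, a, b, round(d, 1)))
    return failures
```

Calling failing_pairs() on a list of hex strings returns an empty list when every pair stays above the threshold in both simulations, which makes it easy to drop into a test suite as a single assertion.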

2. Grayscale fallback

A surprising number of reviewers print papers. A larger number of journals still publish print editions. If two colors in your palette collapse to similar lightness, they will be indistinguishable when the journal's production team converts everything to grayscale for proofing.

Test it the dumb way: take a 1×N strip of swatches, desaturate, look. Or compute the L* channel for each and make sure the spread is >5 between adjacent categorical colors.
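The L* version of this test can be sketched as follows. It is an illustration under stated assumptions: the function names are hypothetical, "adjacent" is interpreted as neighbors after sorting by lightness (so every pair is covered), and the default threshold of 5 comes from the text above.

```python
def lightness(hex_color):
    """CIELAB L* (0-100) of an sRGB hex color such as '#rrggbb'."""
    h = hex_color.lstrip("#")
    srgb = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB transfer curve to get linear-light values.
    r, g, b = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
               for c in srgb]
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b  # relative luminance
    if y > 216 / 24389:
        return 116 * y ** (1 / 3) - 16
    return 24389 / 27 * y


def min_lightness_gap(palette):
    """Smallest L* difference between any two colors (needs >= 2 colors)."""
    values = sorted(lightness(c) for c in palette)
    return min(b - a for a, b in zip(values, values[1:]))


def passes_grayscale(palette, min_gap=5.0):
    """True if every pair of colors differs by more than min_gap in L*."""
    return min_lightness_gap(palette) > min_gap
```

Sorting first means the smallest neighbor gap is also the smallest gap over all pairs, so one pass is enough even for larger palettes.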

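viridis passes this because its lightness increases monotonically. Okabe-Ito is designed for colorblind safety rather than grayscale, and some of its colors (orange and sky blue, for example) sit at nearly the same lightness, so check the specific subset you use. Most "I built a custom palette for our lab" palettes fail.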

3. Perceptual uniformity (continuous only)

For continuous scales (heatmaps, density plots, anything with a colorbar), the palette needs to be perceptually uniform. Equal numerical steps should look like equal visual steps. Rainbow / jet famously aren't, which is why most major plotting libraries have moved away from them as the default.

viridis, magma, inferno, cividis, and plasma all pass. So does mako from seaborn. To check programmatically, sample the scale at equal intervals and confirm that consecutive samples are separated by roughly equal deltaE-2000 distances.
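A rough version of the uniformity check might look like the sketch below. To keep it short it uses plain Euclidean distance in CIELAB (deltaE 76) as a stand-in for deltaE-2000, so treat the numbers as a coarse screen rather than a final verdict. The function names and the max/min ratio summary are illustrative assumptions, and the input is a list of hex colors you have already sampled at equal intervals along the scale.

```python
# Linear sRGB -> CIE XYZ (D65).
RGB_TO_XYZ = [
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
]
WHITE = [sum(row) for row in RGB_TO_XYZ]  # XYZ of sRGB white


def hex_to_lab(hex_color):
    """'#rrggbb' -> CIELAB (L*, a*, b*)."""
    h = hex_color.lstrip("#")
    srgb = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    lin = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
           for c in srgb]
    xyz = [sum(m * v for m, v in zip(row, lin)) for row in RGB_TO_XYZ]

    def f(t):
        return t ** (1 / 3) if t > 216 / 24389 else (24389 / 27 * t + 16) / 116

    fx, fy, fz = (f(v / w) for v, w in zip(xyz, WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def step_sizes(ramp):
    """Color difference (deltaE 76) between each pair of consecutive samples."""
    labs = [hex_to_lab(c) for c in ramp]
    return [sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
            for a, b in zip(labs, labs[1:])]


def uniformity_ratio(ramp):
    """Largest step divided by smallest step; 1.0 means perfectly even steps."""
    steps = step_sizes(ramp)
    smallest = min(steps)
    return float("inf") if smallest == 0 else max(steps) / smallest
```

A ratio close to 1 suggests even steps, while a large ratio points at a band where the scale jumps or stalls. What counts as "too large" is a judgment call that depends on how many samples you take.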

4. Shape stress test

Palettes that look fine as 200×200 swatches in a grid often collapse when applied to actual chart shapes. A small scatter point shows far less color than a large bar. Lines on a noisy background read worse than filled areas.

We test every palette on five shapes before shipping:

bar chart (large filled regions)
scatter (small isolated points)
line plot (thin lines)
heatmap (large continuous gradient)
volcano plot (mixed sparse + dense)

This catches palettes where, e.g., the "light blue" you picked reads fine in a bar but vanishes as scatter points against the gridlines.
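For readers who want to try the five-shape check by hand, here is a minimal sketch. It assumes matplotlib and numpy are installed, uses synthetic data throughout, and the function name is hypothetical. It is not the FigCanvas preview pane. The heatmap panel interpolates between the palette colors, which is only meaningful for sequential palettes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this also runs in CI
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap


def render_stress_test(palette, seed=0):
    """Draw one palette (list of >= 2 hex colors) on five chart shapes."""
    if len(palette) < 2:
        raise ValueError("palette needs at least two colors")
    rng = np.random.default_rng(seed)
    n = len(palette)
    fig, axes = plt.subplots(1, 5, figsize=(20, 4))

    # 1. Bar chart: large filled regions.
    axes[0].bar(range(n), rng.uniform(1, 5, n), color=palette)
    axes[0].set_title("bar")

    # 2. Scatter: small isolated points over gridlines.
    for i, color in enumerate(palette):
        xy = rng.normal(loc=i, scale=1.0, size=(40, 2))
        axes[1].scatter(xy[:, 0], xy[:, 1], s=8, color=color)
    axes[1].grid(True)
    axes[1].set_title("scatter")

    # 3. Line plot: thin, slightly noisy lines.
    x = np.linspace(0, 10, 200)
    for i, color in enumerate(palette):
        y = np.sin(x + i) + rng.normal(0, 0.1, x.size)
        axes[2].plot(x, y, lw=1, color=color)
    axes[2].set_title("line")

    # 4. Heatmap: palette interpolated into a continuous colormap.
    cmap = LinearSegmentedColormap.from_list("candidate", palette)
    field = rng.normal(size=(30, 30)).cumsum(axis=0)
    axes[3].imshow(field, cmap=cmap, aspect="auto")
    axes[3].set_title("heatmap")

    # 5. Volcano-style plot: dense center, sparse significant tails.
    lfc = rng.normal(0, 1.5, 2000)  # synthetic log2 fold change
    neg_log_p = rng.exponential(1.0, 2000) * (1 + np.abs(lfc))
    hit = neg_log_p > 3
    group = np.where(hit & (lfc > 1), 2, np.where(hit & (lfc < -1), 0, 1))
    for g in range(3):
        mask = group == g
        axes[4].scatter(lfc[mask], neg_log_p[mask], s=4, color=palette[g % n])
    axes[4].set_title("volcano")

    fig.tight_layout()
    return fig
```

Saving the result with fig.savefig("stress_test.png", dpi=150) gives a single strip you can eyeball, or run through a grayscale or colorblind simulation as a whole image.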

What we ended up shipping

After enough manual testing we wrapped this into a small in-browser tool for researchers. It runs all four tests on any palette you paste in, including Okabe-Ito, viridis variants, and Nature/Science-style limited sets. The interesting part is the preview pane: it renders each candidate palette on all five shape types at once, so you can see at a glance which palette holds up under your actual chart type.

This is not a deep technical contribution. It's tooling around the boring testing step that everyone skips because it takes 20 minutes by hand. Compressed to 5 seconds, it suddenly becomes worth doing.

Reviewer time is the bottleneck

The reason any of this matters is that figure review costs reviewer time, which is the most expensive resource in the academic publication pipeline. A figure whose key categories collapse to nearly the same color in deuteranopia simulation can send a paper back for one more revision round. That's two weeks. For one color choice.

Palette testing is the cheapest leverage point in scientific figure work. It's also the one most authors skip because the existing tooling is annoying. Most of the engineering value of "AI for science" right now is just removing the annoyance from steps that, in principle, were always doable.

TL;DR

Run pairwise colorblind simulation, fail if any pair collapses
Desaturate, fail if L* spread is too narrow
For continuous palettes, check perceptual uniformity (deltaE-2000)
Render on all five chart shape types before shipping

Doing this once per palette saves us at least one revision round per paper we touch. Easy ROI.

DEV Community