adousa

Posted on May 23

Picking black or white text: a tiny trained model vs WCAG luminance

#webdev #javascript #a11y #machinelearning

If your UI lets users pick their own colors — tags, labels, calendar events, avatars generated from a username — you've eventually written this:

backgroundColor = anythingUserPicked;
textColor = isDark(backgroundColor) ? "white" : "black";

The textbook answer for the right side is the W3C's relative luminance formula, the one behind WCAG 2.0's contrast requirement (§1.4.3): convert sRGB to linear light, weight by 0.2126·R + 0.7152·G + 0.0722·B, threshold at 0.179. It's principled. It's the de facto standard. It also disagrees with actual humans on roughly one color in six in the labeled set below (83.1% accuracy).
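
As a reference point, the WCAG-based rule can be sketched in a few lines. The formula and constants follow the WCAG 2.0 definition of relative luminance; the function names (`srgbToLinear`, `relativeLuminance`, `wcagWantsWhiteText`) are made up for this illustration.

```javascript
// Convert one 0-255 sRGB channel to linear light (WCAG 2.0 definition).
function srgbToLinear(channel) {
  const v = channel / 255;
  return v <= 0.03928 ? v / 12.92 : Math.pow((v + 0.055) / 1.055, 2.4);
}

// Relative luminance in [0, 1]: 0 is black, 1 is white.
function relativeLuminance([r, g, b]) {
  return (
    0.2126 * srgbToLinear(r) +
    0.7152 * srgbToLinear(g) +
    0.0722 * srgbToLinear(b)
  );
}

// Hypothetical helper: true when white text is the luminance-based pick.
function wcagWantsWhiteText(rgb) {
  return relativeLuminance(rgb) <= 0.179;
}

console.log(wcagWantsWhiteText([0, 0, 0])); // true
console.log(wcagWantsWhiteText([255, 255, 255])); // false
```

The 0.179 cutoff is the luminance at which white and black text have the same contrast ratio against the background: solving 1.05 / (L + 0.05) = (L + 0.05) / 0.05 gives L ≈ 0.179.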

This post is about a four-number alternative — three coefficients and an intercept — that gets the same job done with higher agreement with human judgment, smaller bytecode, and one transcendental call instead of three.

I'm not claiming to replace WCAG. WCAG measures contrast for low-vision users; this picks text color for typical readers. Different problems, related answers, very different priorities. With that out of the way:

The training data

The model is a binary logistic regression fit on just over 600 hand-labeled colors. Each sample is an (r, g, b) triple paired with 0 (use white text) or 1 (use black text), assigned by a human looking at a swatch and deciding which text looked more readable. The dataset is roughly balanced — about half "white-text", half "black-text".

{ "backgroundColor": { "r": 84, "g": 179, "b": 73 }, "textColor": 1 }

It's a small dataset by ML standards, but the task is also small — the decision surface lives in a 3-dimensional space (RGB), and the boundary is genuinely close to a plane.

The model

Fit a plain logistic regression on the raw RGB triples (sklearn defaults, 90/10 train/test split). The result is four numbers:

const COEFFICIENTS = [0.027291, 0.0688366, 0.006275];
const INTERCEPT = -13.9369834;

And the runtime function is one line of arithmetic, one exp, and a compare:

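// r, g, b are raw 0-255 sRGB channel values.
function isLightText([r, g, b]) {
  // Linear score: negative favors white text, positive favors black text.
  const s = r * 0.027291 + g * 0.0688366 + b * 0.006275 - 13.9369834;
  // The sigmoid gives P(black text); at or below 0.5, white text wins.
  return 1 / (1 + Math.exp(-s)) <= 0.5;
}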

Returns true if white text wins on that background. ~80 bytes of logic. No string parsing, no branches, no gamma curve.
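
One observation, offered as a sketch and not as the package's actual code: the sigmoid is monotonic and equals 0.5 exactly when its input is 0, so comparing the probability to 0.5 is mathematically the same as checking the sign of the linear score. The names `WEIGHTS`, `BIAS`, `linearScore`, `blackTextProbability`, and `isLightTextBySign` are invented for this illustration.

```javascript
const WEIGHTS = [0.027291, 0.0688366, 0.006275]; // red, green, blue
const BIAS = -13.9369834;

// Linear score: negative favors white text, positive favors black text.
function linearScore([r, g, b]) {
  return r * WEIGHTS[0] + g * WEIGHTS[1] + b * WEIGHTS[2] + BIAS;
}

// Probability that black text is the better pick (label 1 in the dataset).
function blackTextProbability(rgb) {
  return 1 / (1 + Math.exp(-linearScore(rgb)));
}

// Hypothetical exp-free variant: same decision, sign test only.
function isLightTextBySign(rgb) {
  return linearScore(rgb) <= 0;
}

for (const rgb of [[0, 0, 0], [255, 255, 255], [11, 156, 213], [44, 62, 80]]) {
  console.log(rgb, blackTextProbability(rgb) <= 0.5, isLightTextBySign(rgb));
}
```

The sigmoid is still worth keeping when you want the probability itself, for example to flag low-confidence colors that land near 0.5.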

What the coefficients are telling you

Look at the relative weights: green ≈ 0.069, red ≈ 0.027, blue ≈ 0.006. That is the same ordering as WCAG's perceptual weights (0.2126·R + 0.7152·G + 0.0722·B). The model rediscovered, from scratch, that green contributes most to perceived brightness, red contributes moderately, and blue contributes least, without being told anything about vision science.

The ratio is different, though — the model says green matters ~2.5× more than red, where WCAG says ~3.4×. And it underweights blue more aggressively. That's where the disagreements live.
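
To put the two weight sets on the same footing, one option is to rescale each so its entries sum to 1. This is a rough comparison only: the model's weights apply to raw gamma-encoded 0-255 values while WCAG's apply to linearized channels, so the numbers are not strictly like for like. `normalizeWeights`, `modelShare`, and `wcagShare` are names invented here.

```javascript
// Scale a weight vector so its entries sum to 1.
function normalizeWeights(weights) {
  const total = weights.reduce((sum, w) => sum + w, 0);
  return weights.map((w) => w / total);
}

const modelShare = normalizeWeights([0.027291, 0.0688366, 0.006275]);
const wcagShare = normalizeWeights([0.2126, 0.7152, 0.0722]);

console.log("model R/G/B share:", modelShare.map((w) => w.toFixed(3)));
console.log("WCAG  R/G/B share:", wcagShare.map((w) => w.toFixed(3)));
console.log(
  "green/red ratio:",
  (modelShare[1] / modelShare[0]).toFixed(2),
  "vs",
  (wcagShare[1] / wcagShare[0]).toFixed(2)
);
```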

How it compares to WCAG on this dataset

Method	Accuracy on labeled set
WCAG relative luminance	83.1%
`isLightText`	92.0%

The two algorithms disagree on roughly 14.5% of colors (89 out of ~600). On those disagreements:

isLightText matched the human label 72 times
WCAG matched 17 times

In other words, when these two methods give you different answers, the trained model is right four times out of five.

A caveat I want to flag honestly: those numbers are on the same labeled set used to fit the coefficients. The model was trained on 90% of it, but the comparison above counts the whole set. The 14.5% disagreement rate and the 4:1 ratio on disagreements are more robust takeaways than the headline accuracy gap.

Where it disagrees — concrete cases

These are colors where the two algorithms pick different text colors. Open them in any color tool to judge for yourself.

Cases the trained model gets right:

Color	RGB	Human label	`isLightText`	WCAG
`#0b9cd5`	(11, 156, 213)	white	white ✓	black
`#1ba3f5`	(27, 163, 245)	white	white ✓	black
`#ec4a89`	(236, 74, 137)	white	white ✓	black
`#d15952`	(209, 89, 82)	white	white ✓	black
`#209165`	(32, 145, 101)	white	white ✓	black
`#8c7e06`	(140, 126, 6)	white	white ✓	black

These are exactly the colors that look unambiguously "dark enough for white text" to a human, but sit above WCAG's 0.179 threshold. Saturated mid-luminance blues, pinks, reds, greens, olives: colors with one dominant channel pushing them up the WCAG scale without actually making them feel light.

Cases WCAG gets right and the model misses:

Color	RGB	Human label	`isLightText`	WCAG
`#fb50e0`	(251, 80, 224)	black	white	black ✓
`#f9492d`	(249, 73, 45)	black	white	black ✓
`#e53af1`	(229, 58, 241)	black	white	black ✓

The model's failures are concentrated in the saturated magenta/pink/orange corner. It tends to read these as darker than they are. Honestly, opinions vary on these — #f9492d is the kind of color where reasonable people will argue for either text color.
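
The nine cases in the two tables above can be replayed with a small tally script. This is a sketch: the colors and human labels are copied from the tables, the WCAG side uses the standard relative luminance formula, and `CASES`, `modelPick`, `wcagPick`, and `tally` are hypothetical names.

```javascript
// label: 0 = humans picked white text, 1 = humans picked black text.
const CASES = [
  { hex: "#0b9cd5", rgb: [11, 156, 213], label: 0 },
  { hex: "#1ba3f5", rgb: [27, 163, 245], label: 0 },
  { hex: "#ec4a89", rgb: [236, 74, 137], label: 0 },
  { hex: "#d15952", rgb: [209, 89, 82], label: 0 },
  { hex: "#209165", rgb: [32, 145, 101], label: 0 },
  { hex: "#8c7e06", rgb: [140, 126, 6], label: 0 },
  { hex: "#fb50e0", rgb: [251, 80, 224], label: 1 },
  { hex: "#f9492d", rgb: [249, 73, 45], label: 1 },
  { hex: "#e53af1", rgb: [229, 58, 241], label: 1 },
];

// Trained model from the article: 0 = white text, 1 = black text.
function modelPick([r, g, b]) {
  const s = r * 0.027291 + g * 0.0688366 + b * 0.006275 - 13.9369834;
  return 1 / (1 + Math.exp(-s)) <= 0.5 ? 0 : 1;
}

// WCAG relative luminance thresholded at 0.179: 0 = white text, 1 = black text.
function wcagPick(rgb) {
  const [r, g, b] = rgb.map((c) => {
    const v = c / 255;
    return v <= 0.03928 ? v / 12.92 : Math.pow((v + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b <= 0.179 ? 0 : 1;
}

// Count disagreements and who matched the human label on each one.
function tally(cases) {
  const result = { disagreements: 0, modelRight: 0, wcagRight: 0 };
  for (const { rgb, label } of cases) {
    const m = modelPick(rgb);
    const w = wcagPick(rgb);
    if (m === w) continue;
    result.disagreements += 1;
    if (m === label) result.modelRight += 1;
    if (w === label) result.wcagRight += 1;
  }
  return result;
}

console.log(tally(CASES));
```

Running the same `tally` over the full labeled set from the repository would be the way to re-derive the 72 vs 17 split; that is not attempted here.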

A geometric way to think about it

A logistic regression on raw RGB is just a plane through 3D color space:

0.0273·R + 0.0688·G + 0.0063·B = 13.9370

Everything below the plane → white text. Everything above → black text. The probability score is just your signed distance from the plane (scaled by the length of the coefficient vector), squashed into (0, 1) by the sigmoid.
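
The plane picture can be made concrete by dividing the linear score by the length of the coefficient vector, which gives a signed distance in 0-255 RGB units. A minimal sketch, with `PLANE_NORMAL`, `PLANE_OFFSET`, `signedDistance`, and `probabilityFromDistance` as invented names:

```javascript
const PLANE_NORMAL = [0.027291, 0.0688366, 0.006275];
const PLANE_OFFSET = -13.9369834;
const NORMAL_LENGTH = Math.hypot(...PLANE_NORMAL);

// Signed distance from the decision plane, in 0-255 RGB units.
// Negative: white-text side. Positive: black-text side.
function signedDistance([r, g, b]) {
  const s =
    r * PLANE_NORMAL[0] + g * PLANE_NORMAL[1] + b * PLANE_NORMAL[2] + PLANE_OFFSET;
  return s / NORMAL_LENGTH;
}

// Undo the scaling and apply the sigmoid to recover P(black text).
function probabilityFromDistance(distance) {
  return 1 / (1 + Math.exp(-distance * NORMAL_LENGTH));
}

const d = signedDistance([11, 156, 213]); // #0b9cd5
console.log(d.toFixed(1), probabilityFromDistance(d).toFixed(3));
```

A distance close to zero marks a color the model is unsure about, whichever side it lands on.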

WCAG's luminance threshold is a surface through the same space, but a curved one because of the gamma decoding step (pow((v+0.055)/1.055, 2.4)). It's a more sophisticated boundary, which is why it dominates on textbook colors — but it was designed for contrast measurement, not for the binary text-color decision. Optimizing the wrong objective gets you a more elegant surface for a different question.

Side-by-side demo

If you want to feel where the two algorithms diverge, I built an interactive comparison:

→ demo: isLightText vs W3C luminance (adousa.github.io/is-light-text/compare-with-luminance.html)

Drag the picker around. Both decision boundaries are drawn directly on the saturation/brightness plane. As you change hue, you'll see them slide past each other — wide gaps in the cyan/orange ranges, near-overlap in true greys.

Performance

This is a UI primitive that gets called a lot — once per chip, badge, swatch, table cell, generated avatar, syntax-highlighted token. So the cost matters a little.

Per call:

Method	Multiplies	Adds	Branches	`exp`/`pow`	Notes
WCAG	~6	2	3	up to 3	each `pow` is `exp(2.4·log(x))`
`isLightText`	3	3	0	1	dot product + one sigmoid

Roughly 3× fewer transcendental calls. Both finish in microseconds, so this only matters if you're rendering tens of thousands of swatches in a tight loop — but if you are, isLightText is the right pick.

Limitations

It's not an accessibility tool. WCAG measures contrast ratios for low-vision users; this picks between two pre-chosen text colors for typical readers. If you need to pass an audit, use WCAG (or APCA).
It can be wrong. Saturated magentas and oranges (#f9492d, #fb50e0) sit in a region where humans themselves disagree.
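
For the audit case, the quantity WCAG actually checks is a contrast ratio between text and background, not a black-or-white pick. A minimal sketch of that check, assuming the WCAG 2.0 formulas and the 4.5:1 level AA ratio for normal-size text; `channelToLinear`, `luminanceOf`, `contrastRatio`, and `meetsAA` are hypothetical names:

```javascript
function channelToLinear(c) {
  const v = c / 255;
  return v <= 0.03928 ? v / 12.92 : Math.pow((v + 0.055) / 1.055, 2.4);
}

function luminanceOf([r, g, b]) {
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// WCAG contrast ratio, from 1 (identical colors) to 21 (black on white).
function contrastRatio(rgbA, rgbB) {
  const a = luminanceOf(rgbA);
  const b = luminanceOf(rgbB);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// Hypothetical helper: does the pair reach 4.5:1 (level AA, normal-size text)?
function meetsAA(textRgb, backgroundRgb) {
  return contrastRatio(textRgb, backgroundRgb) >= 4.5;
}

const bg = [11, 156, 213]; // #0b9cd5
console.log("white:", contrastRatio([255, 255, 255], bg).toFixed(2), meetsAA([255, 255, 255], bg));
console.log("black:", contrastRatio([0, 0, 0], bg).toFixed(2), meetsAA([0, 0, 0], bg));
```

On #0b9cd5 the human-preferred white text falls short of 4.5:1 under this formula while black clears it, which is the gap between "looks more readable" and "passes an audit" in one example.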

Try it

The model is published as a tiny package:

npm install black-or-white-text

import { isLightText } from "black-or-white-text";

isLightText("#000000"); // true  — use white text on black
isLightText("#ffffff"); // false — use black text on white
isLightText("#0b9cd5"); // true  — the case WCAG misses
isLightText([44, 62, 80]); // true  — RGB tuple input works too
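
If you would rather inline the formula than add a dependency, the hex-string input needs a parsing step first. The following is a sketch of one way to do it, not the package's implementation; `hexToRgb` and `pickTextColor` are names invented here, and only 6-digit hex is handled.

```javascript
// Hypothetical parser: "#rrggbb" or "rrggbb" to [r, g, b]. Shorthand "#rgb" is not handled.
function hexToRgb(hex) {
  const match = /^#?([0-9a-f]{6})$/i.exec(String(hex).trim());
  if (!match) throw new TypeError(`Expected a 6-digit hex color, got "${hex}"`);
  const n = parseInt(match[1], 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Accept a hex string or an [r, g, b] array, then apply the article's formula.
function pickTextColor(color) {
  const [r, g, b] = Array.isArray(color) ? color : hexToRgb(color);
  const s = r * 0.027291 + g * 0.0688366 + b * 0.006275 - 13.9369834;
  return 1 / (1 + Math.exp(-s)) <= 0.5 ? "white" : "black";
}

console.log(pickTextColor("#0b9cd5")); // "white"
console.log(pickTextColor("#ffffff")); // "black"
console.log(pickTextColor([44, 62, 80])); // "white"
```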

demo: adousa.github.io/is-light-text/compare-with-luminance.html
npm: npmjs.com/package/black-or-white-text
source + data: github.com/adousa/is-light-text

If you find a color where it picks wrong, file an issue with the hex and what you expected. That's exactly the kind of feedback that grows the labeled dataset and improves the next set of coefficients.
