You fixed the TLS fingerprint. You patched navigator.webdriver. Your User-Agent is perfect. And you're still getting blocked.
Chances are it's the Canvas.
What canvas fingerprinting actually is
Every browser renders graphics slightly differently. The GPU, the driver version, the OS font rendering engine, the antialiasing settings — all of these introduce tiny variations in how pixels end up on screen.
Canvas fingerprinting exploits this. The detection script draws something on an invisible canvas element — usually a mix of text, shapes and gradients — then reads back the pixel data with toDataURL() or getImageData(). The resulting string is hashed and becomes your fingerprint.
The variations are tiny. We're talking about differences at the level of individual pixel values, often invisible to the human eye. But they're consistent — the same browser on the same machine produces the same hash every time. And they're unique enough to identify you across sessions, across IPs, across proxies.
Your IP changes. Your canvas hash doesn't.
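The hashing step can be sketched outside the browser. Below is a minimal stand-in, assuming a stub byte array in place of real getImageData() output and FNV-1a in place of whatever hash function the detection script actually uses:

```javascript
// FNV-1a 32-bit hash over raw pixel bytes: a stand-in for the
// hashing step a detection script runs on canvas pixel data.
const fnv1a = (bytes) => {
  let h = 0x811c9dc5;
  for (const b of bytes) {
    h ^= b;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
};

// Stub pixel buffers standing in for two canvas readbacks.
const pixelsA = Uint8Array.from([255, 0, 0, 255, 254, 0, 0, 255]);
const pixelsB = Uint8Array.from([255, 0, 0, 255, 255, 0, 0, 255]); // one byte differs

console.log(fnv1a(pixelsA) === fnv1a(pixelsA)); // same pixels, same hash
console.log(fnv1a(pixelsA) === fnv1a(pixelsB)); // one pixel off, different hash
```

A single off-by-one pixel value, invisible on screen, is enough to produce a completely different hash. That is why driver-level rendering quirks make such a strong identifier.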
Why headless Chromium is obvious
Here's the specific problem with running Playwright or Puppeteer in headless mode: without a GPU pipeline, the browser falls back to software rendering.
Software rendering is deterministic and perfect. No GPU quirks, no driver variations, no antialiasing artifacts. Every headless Chromium instance on every machine produces an identical canvas output.
That's not how real browsers work. Real browsers are slightly imperfect in ways that are consistent per device. A perfectly identical canvas hash across thousands of sessions is a massive red flag.
The fix for this specific problem is --headless=new, Chromium's modern headless mode, which preserves the full GPU pipeline. One way to opt into it from Playwright is to launch with headless=False and pass the flag explicitly:

context = await playwright.chromium.launch_persistent_context(
    user_data_dir,
    headless=False,
    args=["--headless=new"],  # preserves GPU stack
)
But even with --headless=new, your canvas hash is still consistent across sessions. Which brings us to the noise problem.
Why random noise makes things worse
The obvious solution seems to be: add random noise to the canvas output on every render. Randomize the pixel values slightly so the hash changes.
This is wrong. And it's worse than doing nothing.
Here's why: real browsers produce a consistent canvas hash per device. The same machine always gives the same result. If your canvas hash changes on every page load, every request, every session — that's not how any real browser behaves. Detection systems don't just check what your hash is. They check whether it's stable.
A canvas hash that changes randomly is as obvious as navigator.webdriver = true. It's a different signal, but it's still a signal.
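A sketch of what the stability check might look like on the detection side. The function name and the hash log are hypothetical, purely to illustrate the logic:

```javascript
// Hypothetical server-side check: for one device (keyed by other
// signals), were the canvas hashes stable across observations?
// Per-call random noise produces a fresh hash on every read.
const isUnstableCanvas = (observedHashes) =>
  observedHashes.length > 2 && new Set(observedHashes).size === observedHashes.length;

// A real device: same hash every time.
console.log(isUnstableCanvas(["a1b2", "a1b2", "a1b2"])); // false

// Per-call random noise: every observation is unique.
console.log(isUnstableCanvas(["a1b2", "9f3e", "77c0", "04dd"])); // true
```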
The right approach: deterministic per-session noise
What you want is noise that is:
- Consistent within a session — the same hash for the same browser instance
- Different across sessions — different profile directories produce different hashes
- Realistic in magnitude — tiny variations, not wholesale pixel changes
The way to achieve this is a seeded pseudo-random number generator, where the seed is derived from something stable per profile — like the profile directory name.
import hashlib

def _session_seed(profile_dir_name: str) -> int:
    return int(hashlib.md5(profile_dir_name.encode()).hexdigest()[:8], 16) % (2**31)
Then use that seed to drive a simple LCG (Linear Congruential Generator) in JavaScript, and apply tiny pixel-level noise based on it:
// Seeded LCG — deterministic per session
let _seed = SESSION_SEED;
const _lcg = () => {
  _seed = (_seed * 1664525 + 1013904223) % 4294967296;
  return _seed / 4294967296;
};

// Apply noise without mutating the original canvas
const _applyNoise = (imgData) => {
  for (let i = 0; i < imgData.data.length; i += 4) {
    if (_lcg() < 0.05) {
      imgData.data[i] += (_lcg() > 0.5 ? 1 : -1);
    }
  }
};
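The property that makes this work can be verified in isolation. A small sketch, where makeLcg is a hypothetical factory wrapping the same LCG so two generators can be compared side by side:

```javascript
// Factory around the same LCG, so independent generators
// can be compared.
const makeLcg = (seed) => () => {
  seed = (seed * 1664525 + 1013904223) % 4294967296;
  return seed / 4294967296;
};

// Collect the first n outputs of a generator as a comparable string.
const run = (gen, n) => Array.from({ length: n }, gen).join(",");

console.log(run(makeLcg(12345), 5) === run(makeLcg(12345), 5)); // true: same seed, same noise
console.log(run(makeLcg(12345), 5) === run(makeLcg(99999), 5)); // false: new session, new noise
```

Same seed, identical noise pattern; different seed, different pattern. That is exactly the "stable within a session, different across sessions" behavior a real device exhibits.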
The key detail: modify a copy of the canvas, not the original. If you mutate the original canvas, you break legitimate rendering on the page. The correct approach intercepts toDataURL(), getImageData() and toBlob(), draws to a temporary off-screen canvas, applies noise there, and returns the result.
const _origToDataURL = HTMLCanvasElement.prototype.toDataURL;
const _origGetImageData = CanvasRenderingContext2D.prototype.getImageData;

HTMLCanvasElement.prototype.toDataURL = function() {
  const ctx = this.getContext('2d');
  if (ctx) {
    const off = document.createElement('canvas');
    off.width = this.width;
    off.height = this.height;
    const octx = off.getContext('2d');
    octx.drawImage(this, 0, 0);
    const img = _origGetImageData.call(octx, 0, 0, off.width, off.height);
    _applyNoise(img);
    octx.putImageData(img, 0, 0);
    return _origToDataURL.apply(off, arguments);
  }
  return _origToDataURL.apply(this, arguments);
};
The toString() problem
There's a secondary issue that most canvas spoofing implementations miss: Function.prototype.toString().
When you replace a native browser function with your own JavaScript wrapper, any script that calls .toString() on that function sees JavaScript source code instead of function toDataURL() { [native code] }. That's detectable.
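This is the check a detection script runs, sketched below. looksNative is a hypothetical name, and a plain function stands in for a patched canvas method:

```javascript
// Does the function's source look like a browser built-in?
const looksNative = (fn) =>
  /\{\s*\[native code\]\s*\}/.test(Function.prototype.toString.call(fn));

// A naive wrapper exposes its own JavaScript source.
const fakeToDataURL = function () { return "data:image/png;base64,"; };

console.log(looksNative(Math.max));      // true: real built-in
console.log(looksNative(fakeToDataURL)); // false: wrapper source is visible
```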
The fix is to maintain a registry of patched functions and override Function.prototype.toString to return the native code string for any function in that registry:
const _patchedFns = new WeakSet();

const _native = (fn, name) => {
  try {
    Object.defineProperty(fn, 'name', { value: name, configurable: true });
  } catch (_) {}
  _patchedFns.add(fn);
  return fn;
};

const _origFnToString = Function.prototype.toString;
Function.prototype.toString = _native(function() {
  if (_patchedFns.has(this)) {
    return `function ${this.name || ''}() { [native code] }`;
  }
  return _origFnToString.call(this);
}, 'toString');
Any function wrapped with _native() now looks indistinguishable from a browser built-in when inspected.
Canvas is one signal among many
Fixing canvas fingerprinting is necessary but not sufficient. Detection systems correlate multiple signals:
WebGL fingerprinting — same concept, different API. The GPU vendor string, renderer string, and the output of readPixels() all contribute to a fingerprint. The same deterministic noise approach applies.
AudioContext fingerprinting — the Web Audio API processes a signal through an oscillator and reads back the output. Again, tiny hardware-level variations create a unique hash. Tiny noise on getChannelData() output breaks this.
Font enumeration — document.fonts.check() and measureText() reveal which fonts are installed, which varies by OS. The browser's reported font list should match the OS implied by the User-Agent.
getBoundingClientRect() noise — font rendering affects element dimensions. Tiny noise on bounding rect values breaks font fingerprinting via layout measurement.
These signals are correlated. A Windows User-Agent with a Linux font list is suspicious. A Mac User-Agent with an NVIDIA GPU renderer is suspicious. Coherence across all signals matters as much as any individual fix.
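A deliberately simplified sketch of such a coherence check. The platform table and regexes are illustrative, not a real detection ruleset:

```javascript
// Illustrative cross-signal check: does the GPU renderer string
// make sense for the platform claimed in the User-Agent?
const plausibleRenderers = {
  mac: [/Apple/i, /AMD/i, /Intel/i],
  windows: [/NVIDIA/i, /AMD/i, /Intel/i],
  linux: [/NVIDIA/i, /AMD/i, /Intel/i, /Mesa/i, /llvmpipe/i],
};

const coherent = (platform, renderer) =>
  (plausibleRenderers[platform] || []).some((re) => re.test(renderer));

console.log(coherent("mac", "Apple M2"));           // true
console.log(coherent("mac", "NVIDIA GeForce RTX")); // false: a Mac UA plus NVIDIA is suspicious
```

A real system would run the same kind of consistency test across fonts, audio, timezone, and screen metrics, and score the combination rather than any single mismatch.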
The timing problem
One more thing that's easy to miss: your JavaScript wrappers add overhead. toDataURL() now does extra work — copying the canvas, applying noise, returning the result. That takes time.
Detection scripts measure how long canvas operations take. A toDataURL() call that takes 3x longer than expected is a signal.
The fix is to add a small amount of noise to performance.now() so timing measurements are slightly unpredictable:
let _lastNow = 0;
const _origPerfNow = performance.now.bind(performance);
performance.now = _native(function() {
  // Jitter the clock, but never let it run backwards: a
  // non-monotonic performance.now() is itself a detection signal.
  _lastNow = Math.max(_lastNow, _origPerfNow() + (_lcg() - 0.5) * 0.2);
  return _lastNow;
}, 'now');
±0.1ms of jitter is enough to mask the overhead of your wrappers without being detectable itself.
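The bound is easy to sanity-check. A self-contained reproduction of the jitter term, assuming the same LCG constants as above:

```javascript
// Reproduce the LCG locally so the check is self-contained.
let _seed = 42;
const _lcg = () => {
  _seed = (_seed * 1664525 + 1013904223) % 4294967296;
  return _seed / 4294967296;
};

// The term added to performance.now(): uniform in [-0.1, 0.1) ms.
const jitter = () => (_lcg() - 0.5) * 0.2;

let maxAbs = 0;
for (let i = 0; i < 100000; i++) {
  maxAbs = Math.max(maxAbs, Math.abs(jitter()));
}
console.log(maxAbs <= 0.1); // true: noise never exceeds 0.1 ms
```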
Next: mouse movement, typing speed, and why behavioral fingerprinting is harder to fake than canvas — and what that looks like in code.