SEN LLC

Posted on Jun 2

Try the Tech Radar #5 — Mutation Testing in 500 Lines of Vanilla JS (and Why High Coverage Lies)

#webdev #javascript #frontend #testing

Thoughtworks Technology Radar Vol 34 (April 2026) put Mutation testing in the Trial ring with a sharp note: in an era of LLM-generated tests, coverage numbers stopped meaning what they used to. Mutation testing is what tells you whether your tests actually assert anything. I built a 500-line vanilla JS playground that runs entirely in-browser — paste a function and its tests, see how many mutants the suite kills.

🌐 Demo: https://sen.ltd/portfolio/mutation-testing/
📦 GitHub: https://github.com/sen-ltd/mutation-testing

100% coverage with 0% mutation score

Consider function isAdult(age) { return age >= 18; } with these tests:

expect(isAdult(25)).toBe(true);
expect(isAdult(10)).toBe(false);

Coverage report: 100% of lines hit. Both branches of the comparison execute. Tooling green-lights the change.

Now flip >= to > and rerun:

isAdult(25) → still true (25 > 18 ✓)
isAdult(10) → still false (10 > 18 ✗)

The tests still pass. Coverage was a lie — the boundary value age = 18 was never asserted, so the comparison operator could change and nobody would notice. Mutation score: 0%.

That's the whole pitch. Mutation testing tells you which "tests that execute code but don't assert anything meaningful" hide in your suite.
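The survival is easy to reproduce by hand. The following is a minimal sketch, not the playground's code: isAdultMutant is the >= → > mutant written out manually, and weakSuitePasses and boundarySuitePasses are hypothetical helpers standing in for a test runner.

```javascript
// Original function and the hand-written ">= → >" mutant.
function isAdult(age) { return age >= 18; }
function isAdultMutant(age) { return age > 18; }

// Hypothetical helper: the two weak assertions, applied to any implementation.
function weakSuitePasses(fn) {
  return fn(25) === true && fn(10) === false;
}

// Hypothetical helper: the weak suite plus the missing boundary assertion.
function boundarySuitePasses(fn) {
  return weakSuitePasses(fn) && fn(18) === true;
}

console.log(weakSuitePasses(isAdult));           // true
console.log(weakSuitePasses(isAdultMutant));     // true  (mutant survives)
console.log(boundarySuitePasses(isAdult));       // true
console.log(boundarySuitePasses(isAdultMutant)); // false (mutant killed)
```

Only the assertion on 18 can tell the two implementations apart, which is why the weak suite cannot kill the mutant.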

Operator catalog (the source-string approach)

I went with regex-based source substitution rather than AST manipulation, because bundling a JS parser into a browser demo defeats the "zero deps" rule. 19 operators across 6 categories:

export const OPERATORS = [
  // arithmetic
  { id: "plus-to-minus",  re: /\+/g, replace: () => "-", desc: "+ → -" },
  { id: "mul-to-div",     re: /\*/g, replace: () => "/", desc: "* → /" },
  // comparison — the boundary catchers
  { id: "lt-to-le",       re: /(?<![<])<(?!=)/g, replace: () => "<=", desc: "< → <=" },
  { id: "gt-to-ge",       re: /(?<![>])>(?!=)/g, replace: () => ">=", desc: "> → >=" },
  { id: "ge-to-gt",       re: />=/g, replace: () => ">",   desc: ">= → >" },
  { id: "le-to-lt",       re: /<=/g, replace: () => "<",   desc: "<= → <" },
  // lookbehind keeps the == inside !== from matching (which would yield !!==)
  { id: "eqeq-to-neq",    re: /(?<![!=])===?/g, replace: () => "!==", desc: "== → !=" },
  // logical
  { id: "and-to-or",      re: /&&/g, replace: () => "||", desc: "&& → ||" },
  // constants
  { id: "true-to-false",  re: /\btrue\b/g, replace: () => "false", desc: "true → false" },
  { id: "zero-to-one",    re: /\b0\b/g, replace: () => "1", desc: "0 → 1" },
  // control flow
  { id: "negate-if",      re: /\bif\s*\(([^()]+)\)/g, replace: (_, c) => `if (!(${c}))`,
    desc: "if(x) → if(!x)" },
  // update operators
  { id: "inc-to-dec",     re: /\+\+/g, replace: () => "--", desc: "++ → --" },
  // ... 19 total
];

Three things had to be right:

Both directions of comparison. My first version had > → >= but not >= → >. Result: the isAdult-weak-tests demo generated zero mutants and looked broken. The diagnostic ability lives in the bidirectional operator pair.
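Lookbehind to protect compound operators. < → <= needs (?<![<])<(?!=) so it doesn't fire on the < of <= or on the second < of <<. The first < of << still matches this pattern, so a fully safe version also needs a (?![<=]) lookahead. Get this wrong and your mutant generator produces invalid JS.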
Word boundary for constants. 0 → 1 uses \b0\b. Without it, the 0 inside 10 or inside an identifier such as x0 is rewritten, so x0 = 1 mutates to x1 = 1 and the mutant is killed by a broken name rather than by a logic change.
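The guards can be checked directly against small inputs. This is a sketch under assumptions: matchIndexes is a hypothetical helper, LT_CATALOG and GT_CATALOG are the patterns from the catalog, and LT_STRICT and GT_STRICT are stricter variants I am assuming for illustration; they are not taken from the repository.

```javascript
// Hypothetical helper: the indexes at which a global regex matches.
function matchIndexes(re, src) {
  return [...src.matchAll(re)].map((m) => m.index);
}

const LT_NAIVE = /</g;                    // no guards at all
const LT_CATALOG = /(?<![<])<(?!=)/g;     // pattern from the catalog
const LT_STRICT = /(?<!<)<(?![<=])/g;     // assumption: also guards the first "<" of "<<"
const GT_CATALOG = /(?<![>])>(?!=)/g;     // pattern from the catalog
const GT_STRICT = /(?<![>=])>(?![>=])/g;  // assumption: also guards ">>" and "=>"

console.log(matchIndexes(LT_NAIVE, "a <= b"));   // [ 2 ]  would produce "a <== b"
console.log(matchIndexes(LT_CATALOG, "a <= b")); // []
console.log(matchIndexes(LT_CATALOG, "a < b"));  // [ 2 ]
console.log(matchIndexes(LT_CATALOG, "a << b")); // [ 2 ]  first "<" still matches
console.log(matchIndexes(LT_STRICT, "a << b"));  // []
console.log(matchIndexes(GT_CATALOG, "x => x")); // [ 3 ]  would produce "x =>= x"
console.log(matchIndexes(GT_STRICT, "x => x"));  // []

// Word boundary for constants: only the standalone 0 matches.
console.log(matchIndexes(/\b0\b/g, "x0 = 10 + 0")); // [ 10 ]
```

The stricter variants trade a slightly longer pattern for fewer mutants that die from syntax errors instead of from a real assertion.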

The string-and-comment safety net

The classic source-mutation trap: operators inside string literals and comments get mutated.

const greeting = "Hello, 1 + 2 = 3";

A naive mutator turns the literal "1 + 2" into "1 - 2". Tests that assert on the string fail — but for the wrong reason. The mutation didn't expose a logic gap; it broke a literal.

Solution: a one-pass tokenizer that builds a skip mask covering string contents and line comments:

function buildSkipMask(src) {
  const mask = new Array(src.length).fill(false);
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    if (c === "/" && src[i + 1] === "/") {
      while (i < src.length && src[i] !== "\n") { mask[i++] = true; }
      continue;
    }
    if (c === '"' || c === "'" || c === "`") {
      mask[i++] = true; // opening quote
      while (i < src.length && src[i] !== c) {
        if (src[i] === "\\") mask[i++] = true; // escape
        if (i < src.length) mask[i++] = true;
      }
      if (i < src.length) mask[i++] = true; // closing quote
      continue;
    }
    i++;
  }
  return mask;
}

When an operator's regex matches at match.index, skipMask[match.index] (the array returned by buildSkipMask) tells the generator whether to skip that match. The relevant test:

test("operators inside strings are not mutated", () => {
  const ms = generateMutants(`const s = "1 + 2";`);
  assert.equal(ms.filter((m) => m.operatorId === "plus-to-minus").length, 0);
  assert.equal(ms.filter((m) => m.operatorId === "one-to-zero").length, 0);
});
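How the mask and the operator regexes might fit together can be sketched as follows. Everything here is an assumption for illustration: generateMutantsSketch is a hypothetical stand-in for the real generateMutants, DEMO_OPERATORS is a reduced two-entry catalog, and the mask is built by hand for one literal rather than by buildSkipMask.

```javascript
// Assumption: a reduced two-operator catalog in the same shape as OPERATORS.
const DEMO_OPERATORS = [
  { id: "plus-to-minus", re: /\+/g, replace: () => "-" },
  { id: "one-to-zero", re: /\b1\b/g, replace: () => "0" },
];

// Hypothetical generator: one mutant per match whose start index is not masked.
function generateMutantsSketch(src, mask, operators = DEMO_OPERATORS) {
  const mutants = [];
  for (const op of operators) {
    for (const match of src.matchAll(op.re)) {
      if (mask[match.index]) continue; // inside a string or comment: skip
      const mutated =
        src.slice(0, match.index) +
        op.replace(...match) +
        src.slice(match.index + match[0].length);
      mutants.push({ operatorId: op.id, index: match.index, source: mutated });
    }
  }
  return mutants;
}

// Stand-in mask for this demo only: marks the single double-quoted literal.
const demoSrc = `const s = "1 + 2"; const n = 1 + 2;`;
const quoteOpen = demoSrc.indexOf('"');
const quoteClose = demoSrc.indexOf('"', quoteOpen + 1);
const demoMask = Array.from(demoSrc, (_, i) => i >= quoteOpen && i <= quoteClose);

for (const m of generateMutantsSketch(demoSrc, demoMask)) {
  console.log(m.operatorId, "→", m.source);
}
// plus-to-minus → const s = "1 + 2"; const n = 1 - 2;
// one-to-zero → const s = "1 + 2"; const n = 0 + 2;
```

Each mutant changes exactly one match, so a surviving mutant points at one specific position in the source.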

Block comments (/* */) aren't handled — they're rare enough in test code that I left the simpler implementation in place.
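If block comments ever matter, one extra branch would cover them. This is a sketch of a possible extension, not code from the repository; buildSkipMaskWithBlocks is a hypothetical name, and the approach would still be confused by a regex literal that contains /*.

```javascript
// Sketch: the skip mask with an added branch for /* ... */ block comments.
function buildSkipMaskWithBlocks(src) {
  const mask = new Array(src.length).fill(false);
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    if (c === "/" && src[i + 1] === "/") {
      // Line comment: mask up to (not including) the newline.
      while (i < src.length && src[i] !== "\n") mask[i++] = true;
      continue;
    }
    if (c === "/" && src[i + 1] === "*") {
      // Block comment: mask through the closing "*/", or to end of input if unterminated.
      const close = src.indexOf("*/", i + 2);
      const end = close === -1 ? src.length : close + 2;
      while (i < end) mask[i++] = true;
      continue;
    }
    if (c === '"' || c === "'" || c === "`") {
      mask[i++] = true; // opening quote
      while (i < src.length && src[i] !== c) {
        if (src[i] === "\\") mask[i++] = true; // escape character
        if (i < src.length) mask[i++] = true;
      }
      if (i < src.length) mask[i++] = true; // closing quote
      continue;
    }
    i++;
  }
  return mask;
}

const blockSrc = "a /* b + c */ + d";
const blockMask = buildSkipMaskWithBlocks(blockSrc);
console.log(blockMask[blockSrc.indexOf("+")]);     // true: the "+" inside the comment
console.log(blockMask[blockSrc.lastIndexOf("+")]); // false: the real operator
```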

In-browser test runner

Each mutant gets executed as original_helpers + mutated_source + tests inside new Function. Pass → survived; throw → killed.

const HELPERS = `
function expect(actual) {
  return {
    toBe(expected) {
      if (actual !== expected) throw new Error(\`expected \${expected}, got \${actual}\`);
    },
    toEqual(expected) {
      if (!deepEqual(actual, expected)) throw new Error(...);
    },
  };
}
`;

export function runOne(source, tests) {
  const body = HELPERS + "\n" + source + "\n" + tests;
  try {
    new Function(body)();
    return { passed: true };
  } catch (e) {
    return { passed: false, error: String(e?.message ?? e) };
  }
}
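To show the runner end to end, here is a self-contained sketch with the elided parts filled in by assumption. deepEqual, the toEqual error message, runOneSketch, and mutationScore are all hypothetical stand-ins; the original helper does not show them, and the real implementation may differ.

```javascript
// Assumption: a simplified deepEqual and error message, filled in for illustration.
const HELPERS_SKETCH = `
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
  const ka = Object.keys(a), kb = Object.keys(b);
  return ka.length === kb.length && ka.every((k) => deepEqual(a[k], b[k]));
}
function expect(actual) {
  return {
    toBe(expected) {
      if (actual !== expected) throw new Error("expected " + expected + ", got " + actual);
    },
    toEqual(expected) {
      if (!deepEqual(actual, expected)) throw new Error("values are not deeply equal");
    },
  };
}
`;

// Same shape as runOne: helpers + source + tests in one function body.
function runOneSketch(source, tests) {
  try {
    new Function(HELPERS_SKETCH + "\n" + source + "\n" + tests)();
    return { passed: true };
  } catch (e) {
    return { passed: false, error: String(e?.message ?? e) };
  }
}

// Mutation score: killed mutants divided by total mutants, as a percentage.
function mutationScore(mutantSources, tests) {
  const killed = mutantSources.filter((src) => !runOneSketch(src, tests).passed).length;
  return mutantSources.length === 0 ? 0 : (killed / mutantSources.length) * 100;
}

const geToGtMutant = "function isAdult(age) { return age > 18; }";
const weakTests = "expect(isAdult(25)).toBe(true); expect(isAdult(10)).toBe(false);";
const strongTests = "expect(isAdult(17)).toBe(false); expect(isAdult(18)).toBe(true);";
console.log(mutationScore([geToGtMutant], weakTests));   // 0
console.log(mutationScore([geToGtMutant], strongTests)); // 100
```

A mutant that fails to parse also throws inside new Function, so it counts as killed here; that is one reason invalid mutants inflate the score.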

Security caveat: new Function evaluates whatever the user types in the same realm as the page. Acceptable for a portfolio demo where the user pastes their own code; not acceptable for a product. Real mutation testers (Stryker, Pitest, cargo-mutants) isolate via worker processes with timeouts.

The teaching moment: weak vs strong tests

The four-preset setup is the demo's whole point. Same isAdult function, two test suites:

Weak (mutation score 0%):

expect(isAdult(25)).toBe(true);
expect(isAdult(10)).toBe(false);

The >= → > mutant survives because neither 25 nor 10 is the boundary value. Coverage report says 100%; mutation score says 0%. The tests touch the code but don't probe it.

Strong (mutation score 100%):

expect(isAdult(17)).toBe(false);   // boundary - 1
expect(isAdult(18)).toBe(true);    // ← the load-bearing assertion
expect(isAdult(19)).toBe(true);
expect(isAdult(0)).toBe(false);

Specifically asserting on age = 18 kills the >= → > mutation. The strong suite is what "good boundary tests" looks like, and mutation testing puts a number on the difference.

Coverage vs mutation, made visual

The sum_weak preset has one assertion:

function sum(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) {
    total = total + arr[i];
  }
  return total;
}
// expect(sum([1, 2, 3])).toBe(6);

Coverage: 100%. Mutation results:

+ → - in the body → 0-1-2-3=-6 → killed ✓
0 → 1 for total initial → 1+1+2+3=7 → killed ✓
< → <= in the loop condition → reads arr[arr.length], which is undefined → total becomes NaN → killed ✓
+ → - for i + 1 style positions → loop counter regresses → killed ✓

Single-assertion tests can have high mutation scores when the assertion is on a downstream value the mutation affects. The metric isn't "more asserts is better" — it's "do the asserts you have probe the behaviour you claim."
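The first three results can be confirmed by running hand-written mutants. A small sketch, assuming each mutant is produced by a plain string edit rather than by the playground's generator; compileSum and sumMutants are hypothetical names.

```javascript
// The sum function as source text, so mutants can be plain string edits.
const SUM_SRC = `
function sum(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) {
    total = total + arr[i];
  }
  return total;
}
return sum;`;

// Hypothetical helper: turns the source text into a callable function.
function compileSum(src) {
  return new Function(src)();
}

// Hand-written mutants, one edit each.
const sumMutants = {
  "original": SUM_SRC,
  "+ → - in body": SUM_SRC.replace("total + arr[i]", "total - arr[i]"),
  "0 → 1 on total": SUM_SRC.replace("total = 0", "total = 1"),
  "< → <= in loop": SUM_SRC.replace("i < arr.length", "i <= arr.length"),
};

for (const [name, src] of Object.entries(sumMutants)) {
  console.log(name, compileSum(src)([1, 2, 3]));
}
// original 6
// + → - in body -6
// 0 → 1 on total 7
// < → <= in loop NaN
```

None of the mutant results equals 6, so the single toBe(6) assertion kills all three.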

Operator categories

Category	Operators
arithmetic	`+ → -`, `- → +`, `* → /`, `/ → *`
comparison	`< → <=`, `> → >=`, `>= → >`, `<= → <`, `== → !=`, `!= → ==`
logical	`&& → ||`, `|| → &&`
constant	`true → false`, `false → true`, `0 → 1`, `1 → 0`
control	`if(x) → if(!x)`
update	`++ → --`, `-- → ++`

Production tools (Stryker, Pitest, cargo-mutants) ship 30–50 operators. The above is the diagnostic core; it catches the most common test-quality failures.

Known limitations

No AST analysis — source-string substitution flags some equivalent mutants (syntactically different, semantically identical) as survived
No block comments — /* */ regions aren't in the skip mask
No timeout / sandbox — infinite-loop mutants can hang the playground; real tools use worker processes
No test framework integration — the runner uses a tiny custom expect, not Jest / Mocha / Vitest

All of these are flagged in the in-page caveat and the README. The educational point still lands: code that is covered is not necessarily code that is asserted on.

Try it

Demo: https://sen.ltd/portfolio/mutation-testing/
GitHub: https://github.com/sen-ltd/mutation-testing

Try "isAdult — weak tests," hit Run, see the 0%. Switch to "isAdult — strong tests," hit Run again, see 100%. The boundary assertion is what changed.

Takeaways

High coverage isn't proof of strong tests — boundary-blind assertions execute the code without probing it.
Mutation testing puts a number on test sensitivity — what % of small breakages your suite catches.
Bidirectional comparison operators (both > → >= and >= → >) are essential for surfacing tests that miss off-by-one errors.
A tokenizer-based skip mask is the minimum needed to keep source-string mutation honest.
new Function-based execution is fine for demos, not for products — real mutation tools run mutants in isolated workers with timeouts.
Strong vs weak preset pairs make the metric viscerally legible in a way "here's mutation testing as a concept" never does.

This is OSS portfolio #251 from SEN LLC (Tokyo), the fifth entry in the "Try the Tech Radar" series. Previous: #250 Server-driven UI, #249 Schema → LLM Prompt, #248 Markdown → Typst, #247 TOON converter. The series wraps next with Semantic layer. https://sen.ltd/portfolio/
