SDET Code

Posted on Apr 7

Boundary Value Mutations: The Bug Category That's Easiest to Catch — and Hardest to Cover Completely

#testing #qa #python #career

Here is a fact that looks reassuring on the surface.

When we ran a baseline AI model through 195 benchmark sessions on the SDET Code challenge library, boundary bugs had the highest detection rate of any mutation category: 63.8%. Logic bugs came in at 47.5%. Validation bugs at 46.2%. Type bugs at 28.6%.

So boundary mutations are the easiest to catch. Good news, right?

Not exactly. Because 63.8% means 36.2% of boundary bugs survived — and boundary bugs are the ones that cause payment processing to accept invalid amounts, age verification gates to pass 17-year-olds, and shipping calculators to apply the wrong rate on orders just above the threshold.

The reason boundary bugs score highest is mechanical: they produce obviously wrong outputs on edge values. If a function should return True for inputs >= 18 but a mutation changes it to > 18, testing with the value 18 produces a clearly wrong result. A basic model can spot it.
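To make that concrete, here is a minimal sketch of the age check and its mutant. The function names `is_adult` and `is_adult_mutant` are made up for illustration; the only thing taken from the text is the `>= 18` rule and the `> 18` mutation.

```python
def is_adult(age: int) -> bool:
    """Original rule: 18 and over passes the gate."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Hypothetical mutant: >= has been changed to >."""
    return age > 18


# Values away from the boundary cannot tell the two versions apart.
for age in (10, 17, 19, 40):
    assert is_adult(age) == is_adult_mutant(age)

# Only the boundary value itself exposes the mutation.
assert is_adult(18) is True
assert is_adult_mutant(18) is False
```

Every input except 18 produces the same answer from both versions, which is why a test that never uses 18 cannot kill this mutant.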

The 36.2% that get missed are the subtle ones — boundaries embedded in multi-condition logic, thresholds defined by business rules rather than obvious numbers, or cases where the wrong boundary produces a wrong result that happens to look plausible.

This article covers how boundary mutations work, how to write tests that kill them systematically, and a technique that closes much of that 36.2% gap.

A Concrete Starting Point

Here is a shipping cost function with multiple boundaries:

def calculate_shipping_cost(weight_kg: float, distance_km: int) -> float:
    """
    Calculate shipping cost based on weight and distance.

    Weight tiers:
    - Up to and including 5 kg: base rate
    - Over 5 kg, up to and including 20 kg: medium rate
    - Over 20 kg: heavy rate

    Distance surcharge:
    - Distance > 500 km: add 15% surcharge
    """
    if weight_kg <= 5:
        base_cost = 8.00
    elif weight_kg <= 20:
        base_cost = 15.00
    else:
        base_cost = 25.00

    if distance_km > 500:
        return base_cost * 1.15
    return base_cost

This function has three explicit boundaries: 5, 20, and 500. A mutation testing system can inject at least four plausible mutations around them: three on the comparison operators and one that removes a branch entirely.

Mutation 1 — Change weight_kg <= 5 to weight_kg < 5:

if weight_kg < 5:       # mutation: <= becomes <
    base_cost = 8.00
elif weight_kg <= 20:
    base_cost = 15.00
else:
    base_cost = 25.00

A 5 kg package now costs $15 instead of $8. The function still returns a number. No exception is raised. Most test suites miss this because they test with 3 kg and 10 kg — values comfortably inside each tier — and never test exactly at 5.

Mutation 2 — Change weight_kg <= 20 to weight_kg < 20:

if weight_kg <= 5:
    base_cost = 8.00
elif weight_kg < 20:    # mutation: <= becomes <
    base_cost = 15.00
else:
    base_cost = 25.00

A 20 kg package now costs $25 instead of $15. Same pattern. Same miss.

Mutation 3 — Change distance_km > 500 to distance_km >= 500:

if distance_km >= 500:  # mutation: > becomes >=
    return base_cost * 1.15

A 500 km shipment now incurs the surcharge incorrectly. The output is wrong by 15%, but only for that exact value.

Mutation 4 — Remove the distance surcharge entirely:

# mutation: the distance check is removed, so the function always returns base_cost
return base_cost

This is the simplest mutation. It is also the most likely to be missed by a test suite that only checks base costs without verifying the surcharge applies.

Tests That Miss vs Tests That Kill

Here is a test suite that looks reasonable but misses three of the four mutations:

def test_light_package_short_distance():
    assert calculate_shipping_cost(3, 200) == 8.00

def test_medium_package_short_distance():
    assert calculate_shipping_cost(10, 200) == 15.00

def test_heavy_package_short_distance():
    assert calculate_shipping_cost(25, 200) == 25.00

def test_long_distance_surcharge():
    assert calculate_shipping_cost(10, 600) == 17.25

Kill ratio against our four mutations: 1 out of 4. The surcharge test catches Mutation 4 (remove surcharge). The rest survive.

The problem is obvious in hindsight: every weight test uses a value well inside the tier. Nothing touches a boundary.

Here is a suite that kills all four:

# Boundary triplets for weight tier 1 (boundary at 5)
def test_weight_just_below_first_tier():
    assert calculate_shipping_cost(4.9, 200) == 8.00

def test_weight_exactly_at_first_tier():
    assert calculate_shipping_cost(5.0, 200) == 8.00   # kills Mutation 1

def test_weight_just_above_first_tier():
    assert calculate_shipping_cost(5.1, 200) == 15.00

# Boundary triplets for weight tier 2 (boundary at 20)
def test_weight_just_below_second_tier():
    assert calculate_shipping_cost(19.9, 200) == 15.00

def test_weight_exactly_at_second_tier():
    assert calculate_shipping_cost(20.0, 200) == 15.00  # kills Mutation 2

def test_weight_just_above_second_tier():
    assert calculate_shipping_cost(20.1, 200) == 25.00

# Boundary triplets for distance surcharge (boundary at 500)
def test_distance_just_below_surcharge():
    assert calculate_shipping_cost(10, 499) == 15.00

def test_distance_exactly_at_boundary():
    assert calculate_shipping_cost(10, 500) == 15.00    # kills Mutation 3

def test_distance_just_above_surcharge():
    assert calculate_shipping_cost(10, 501) == 17.25   # kills Mutation 4

Kill ratio: 4 out of 4.
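The two kill ratios can be reproduced with a small harness. This is a sketch under assumptions, not how any particular mutation testing tool works: `make_variant`, `MUTANTS`, `passes`, and `killed` are hypothetical helpers, and the mutants are built by swapping operators by hand rather than by rewriting source code. The test cases are the inputs and expected values from the two suites above.

```python
import math
import operator
from typing import Callable, List, Tuple

Case = Tuple[float, int, float]  # (weight_kg, distance_km, expected_cost)


def make_variant(tier1_op=operator.le, tier2_op=operator.le,
                 dist_op=operator.gt, surcharge=True) -> Callable[[float, int], float]:
    """Build the shipping function, or a mutant of it, from its comparison operators."""
    def shipping_cost(weight_kg: float, distance_km: int) -> float:
        if tier1_op(weight_kg, 5):
            base_cost = 8.00
        elif tier2_op(weight_kg, 20):
            base_cost = 15.00
        else:
            base_cost = 25.00
        if surcharge and dist_op(distance_km, 500):
            return base_cost * 1.15
        return base_cost
    return shipping_cost


# Hand-built stand-ins for the four mutations described earlier.
MUTANTS = {
    "mutation 1 (<= 5 becomes < 5)": make_variant(tier1_op=operator.lt),
    "mutation 2 (<= 20 becomes < 20)": make_variant(tier2_op=operator.lt),
    "mutation 3 (> 500 becomes >= 500)": make_variant(dist_op=operator.ge),
    "mutation 4 (surcharge removed)": make_variant(surcharge=False),
}

WEAK_SUITE: List[Case] = [
    (3, 200, 8.00), (10, 200, 15.00), (25, 200, 25.00), (10, 600, 17.25),
]
TRIPLET_SUITE: List[Case] = [
    (4.9, 200, 8.00), (5.0, 200, 8.00), (5.1, 200, 15.00),
    (19.9, 200, 15.00), (20.0, 200, 15.00), (20.1, 200, 25.00),
    (10, 499, 15.00), (10, 500, 15.00), (10, 501, 17.25),
]


def passes(fn: Callable[[float, int], float], suite: List[Case]) -> bool:
    """True when every case in the suite passes against fn."""
    return all(math.isclose(fn(w, d), expected) for w, d, expected in suite)


def killed(suite: List[Case]) -> List[str]:
    """Names of the mutants that fail at least one case in the suite."""
    return [name for name, mutant in MUTANTS.items() if not passes(mutant, suite)]


for label, suite in (("weak", WEAK_SUITE), ("triplet", TRIPLET_SUITE)):
    assert passes(make_variant(), suite)  # the unmutated function must pass
    print(f"{label}: killed {len(killed(suite))} of {len(MUTANTS)}")
# weak: killed 1 of 4
# triplet: killed 4 of 4
```

The harness compares costs with `math.isclose` rather than `==` because the surcharge is a float multiplication; that is a choice made for this sketch, not something the original suites do.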

The Boundary Triplet Technique

The pattern in the second suite has a name. Call it the boundary triplet: for every boundary value N, test with N-1, N, and N+1.

Boundary at N:
  test(N - epsilon)  → should be in the lower tier
  test(N)            → should be in whichever tier the spec assigns N to (confirms the inclusive/exclusive rule)
  test(N + epsilon)  → should be in the upper tier

Where epsilon is the smallest meaningful step for the data type. For integers, that is 1. For floats, it is whatever precision the domain requires — for weights, 0.1 kg is usually sufficient.

The N test is the one that kills operator mutations. It is the difference between <= and <, between > and >=. Without it, that entire class of mutations is invisible to your test suite.

The N-1 and N+1 tests are what catch removal mutations and wrong-tier mutations. They verify that the correct behavior applies on either side of the line.

Three tests. One boundary. Every common operator mutation covered.
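The triplet is mechanical enough to generate. A minimal sketch, assuming a hypothetical helper called `boundary_triplet` and a stand-in function `tier_for_weight` that mirrors the first weight tier from the shipping example:

```python
from typing import List, Union

Number = Union[int, float]


def boundary_triplet(boundary: Number, epsilon: Number = 1) -> List[Number]:
    """Return the three probe values around a boundary: below, at, above."""
    return [boundary - epsilon, boundary, boundary + epsilon]


def tier_for_weight(weight_kg: float) -> str:
    """Stand-in for code under test: the first weight tier from the shipping example."""
    return "light" if weight_kg <= 5 else "heavier"


# Integer boundary: the smallest meaningful step is 1.
assert boundary_triplet(500) == [499, 500, 501]

# Float boundary: the step is whatever precision the domain needs, here 0.1 kg.
below, at, above = boundary_triplet(5.0, epsilon=0.1)
assert [tier_for_weight(w) for w in (below, at, above)] == ["light", "light", "heavier"]
```

The middle expectation is the one that encodes the inclusive/exclusive rule. If the spec said "under 5 kg", the expected list would be `["light", "heavier", "heavier"]` instead.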

A Harder Example: Multiple Interacting Boundaries

The calculate_shipping_cost example has independent boundaries. Each one can be tested in isolation. More realistic code has boundaries that interact.

def apply_tier_discount(order_total: float, membership_years: int) -> float:
    """
    Apply loyalty discount based on order total and membership length.

    Rules (the largest applicable discount wins):
    - Orders >= 100 AND membership >= 2 years: 10% discount
    - Orders >= 250 AND membership >= 1 year: 15% discount
    - Orders >= 500: 20% discount regardless of membership
    - Otherwise: no discount
    """
    if order_total >= 500:
        return order_total * 0.80

    if order_total >= 250 and membership_years >= 1:
        return order_total * 0.85

    if order_total >= 100 and membership_years >= 2:
        return order_total * 0.90

    return order_total

This function has five boundary values across two dimensions: 100, 250, 500 on order total, and 1, 2 on membership years. But the interactions matter. A mutation that changes membership_years >= 1 to membership_years > 1 only surfaces when membership_years is exactly 1 and order_total is at least 250 but below 500. Everywhere else, the mutant and the original return the same value.

Applying the boundary triplet naively gives you 15 test cases. That is correct but not sufficient here, because you also need to combine boundary values across dimensions.

The full strategy for multi-boundary functions:

Step 1 — List all boundary values per dimension:

order_total: 99, 100, 101, 249, 250, 251, 499, 500, 501
membership_years: 0, 1, 2, 3

Step 2 — For each condition, identify which dimension combination makes it active:

order_total >= 500 is independent — test triplet at 500 with any membership value
order_total >= 250 and membership_years >= 1 — test triplet at 250 with membership_years = 1, and triplet at 1 year with order_total = 300
order_total >= 100 and membership_years >= 2 — test triplet at 100 with membership_years = 2, and triplet at 2 years with order_total = 150

Step 3 — Write tests that hold one dimension at its boundary while varying the other:

# order_total boundary at 500 (independent)
def test_order_just_below_top_tier():
    assert apply_tier_discount(499, 5) == 499 * 0.85  # still gets 250+ discount

def test_order_exactly_top_tier():
    assert apply_tier_discount(500, 0) == 400.0       # 20% discount, no membership needed

def test_order_just_above_top_tier():
    assert apply_tier_discount(501, 0) == 501 * 0.80

# membership_years boundary at 1 (active when order is 250-499)
def test_membership_zero_years_mid_order():
    # 300 total, 0 years: misses the 250 rule (needs >= 1 year) and the 100 rule (needs >= 2 years)
    assert apply_tier_discount(300, 0) == 300.0       # no discount

def test_membership_exactly_one_year_mid_order():
    assert apply_tier_discount(300, 1) == 300 * 0.85  # kills >= vs > mutation on membership_years >= 1

def test_membership_two_years_mid_order():
    assert apply_tier_discount(300, 2) == 300 * 0.85

# order_total boundary at 250 (active when membership >= 1)
def test_order_just_below_250_with_membership():
    # 249 total, 1 year: below the 250 rule, and the 100 rule needs >= 2 years
    assert apply_tier_discount(249, 1) == 249.0       # no discount

def test_order_exactly_250_with_membership():
    assert apply_tier_discount(250, 1) == 250 * 0.85  # kills >= vs > mutation on order_total >= 250

def test_order_just_above_250_with_membership():
    assert apply_tier_discount(251, 1) == 251 * 0.85

This is more work than a simple triplet. But when you skip it, you leave mutations alive in the intersections — exactly the mutations that produce wrong discounts for customers at the edge of a loyalty tier.
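One way to see where a mutant hides is to cross the Step 1 values and compare the original with the mutant at every grid point. The sketch below does that for the membership_years >= 1 mutation. The mutant function and the `distinguishing_inputs` helper are hypothetical and written by hand for illustration; `apply_tier_discount` is copied from above so the block runs on its own.

```python
from itertools import product
from typing import Callable, List, Tuple


def apply_tier_discount(order_total: float, membership_years: int) -> float:
    """Copy of the function above so the sketch runs on its own."""
    if order_total >= 500:
        return order_total * 0.80
    if order_total >= 250 and membership_years >= 1:
        return order_total * 0.85
    if order_total >= 100 and membership_years >= 2:
        return order_total * 0.90
    return order_total


def mutant_membership_gt_1(order_total: float, membership_years: int) -> float:
    """Hypothetical mutant: membership_years >= 1 has become membership_years > 1."""
    if order_total >= 500:
        return order_total * 0.80
    if order_total >= 250 and membership_years > 1:
        return order_total * 0.85
    if order_total >= 100 and membership_years >= 2:
        return order_total * 0.90
    return order_total


# Boundary values per dimension, taken from Step 1.
ORDER_TOTALS = [99, 100, 101, 249, 250, 251, 499, 500, 501]
MEMBERSHIP_YEARS = [0, 1, 2, 3]

DiscountFn = Callable[[float, int], float]


def distinguishing_inputs(original: DiscountFn, mutant: DiscountFn) -> List[Tuple[float, int]]:
    """Grid points where the mutant's output differs from the original's."""
    return [
        (total, years)
        for total, years in product(ORDER_TOTALS, MEMBERSHIP_YEARS)
        if original(total, years) != mutant(total, years)
    ]


print(distinguishing_inputs(apply_tier_discount, mutant_membership_gt_1))
# [(250, 1), (251, 1), (499, 1)]
```

Out of 36 grid points, only three expose this mutant, and all three have membership_years equal to 1. A suite that varies order_total while holding membership at, say, 5 years never lands on any of them.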

Why AI Catches 63.8% But Misses 36.2%

The benchmark result makes sense once you understand the structure.

A model testing calculate_shipping_cost with inputs like [1, 5, 10, 20, 25] for weight — a reasonable spread — will hit the boundaries at 5 and 20 by chance. That is why straightforward boundary mutations get caught at a high rate. The output is clearly wrong when you test at the right value, and a good input set includes those values.

The 36.2% that survive are a different kind of boundary bug:

Business logic boundaries — The threshold is not a round number embedded in an obvious comparison. It is derived: a discount applies when days_since_last_purchase * spend_tier_multiplier > 90. The boundary at 90 is not visible in the function signature. A model generating inputs without domain knowledge will not probe it.

Interaction boundaries — The bug only manifests when two conditions are simultaneously at their edges. A model testing one dimension at a time will miss the intersection.

Implicit boundaries — A function processes discount_code: str and the boundary is between empty string and non-empty string, or between a code that existed pre-2024 and one that did not. The boundary is in the data model, not the numeric comparison.
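The first of these, the derived threshold, is easy to sketch. The function `qualifies_for_discount` below is a hypothetical wrapper around the expression from the text, and the multipliers 1.5 and 2.0 are illustrative values, not taken from any real pricing spec.

```python
def qualifies_for_discount(days_since_last_purchase: int,
                           spend_tier_multiplier: float) -> bool:
    """Hypothetical rule wrapped around the expression from the text."""
    return days_since_last_purchase * spend_tier_multiplier > 90


def boundary_days(spend_tier_multiplier: float) -> float:
    """Day count at which the product is exactly 90 for this multiplier."""
    return 90 / spend_tier_multiplier


# The boundary in days moves with the multiplier, so each tier needs its own triplet.
for multiplier in (1.5, 2.0):  # illustrative multipliers, not from a real spec
    n = int(boundary_days(multiplier))  # 60 for 1.5, 45 for 2.0
    results = [qualifies_for_discount(days, multiplier) for days in (n - 1, n, n + 1)]
    # At N the product is exactly 90, and 90 > 90 is False.
    assert results == [False, False, True]
```

Nothing in the signature mentions 60 or 45. The tester has to solve for the input that puts the derived quantity on the threshold, then build the triplet around that value.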

These are not exotic cases. They appear in real production code constantly. And they are what mutation testing practice teaches you to look for — not by memorizing a checklist, but by repeatedly encountering them and learning to ask "what is the boundary here, and where is it defined?"

Building the Habit

The boundary triplet is a mechanical technique. You can apply it as a checklist. But the goal is to internalize it until the question "what are the boundaries in this spec?" becomes automatic.

That takes practice on real problems, not just reading about the technique.

SDET Code has 670 challenges focused on mutation testing, including a dedicated set built around boundary value mutations across different domains — financial calculations, validation logic, tiered pricing, date range checks. Each challenge shows your kill ratio immediately, so you know whether your boundary triplets are landing.

The feedback loop is the point. You write the test, see the kill ratio, then look at which mutants survived. That is how you learn to identify the boundaries you missed.

Recap

Boundary mutations have the highest detection rate of any category because testing at obvious edge values catches the obvious mutations. The gap — the 36.2% — comes from boundaries embedded in business logic, boundaries that only activate when multiple conditions interact, and boundaries that are not numeric comparisons at all.

The boundary triplet (test at N-1, N, and N+1 for every threshold) kills the obvious operator mutations. Combining boundary values across dimensions closes most of the interaction gap. Understanding where business logic and the data model hide their thresholds closes the rest.

None of this is complicated in isolation. What takes practice is applying it consistently, across different problem shapes, until it becomes the default way you read a specification.

This is Part 2 of the "Mutation Testing for QA Engineers" series. Part 3 will cover logic mutations — wrong operators, inverted conditions, and the and/or swaps that are the hardest category to cover systematically.