Should Test Expected Values Come from the Same Place as the Implementation?

#testing #design #softwaretesting #qa

What Started This

Here are two tests that verify the same behavior:

test('calculates tax correctly', () => {
  const price = 1000;
  const tax = calculateTax(price);
  expect(tax).toBe(100);
});

import { TAX_RATE } from './config';

test('calculates tax correctly', () => {
  const price = 1000;
  const tax = calculateTax(price);
  expect(tax).toBe(price * TAX_RATE);
});

Both tests check the same thing, but the expected value comes from a completely different place. The first hardcodes 100; the second computes it from TAX_RATE, which is imported from config. Does this difference matter? Looking into it led me to a more fundamental question about where test expected values should come from. I am not confident I have arrived at the right answer, but I hope this can at least serve as food for thought.

The Concept of a Test Oracle

In 1978, William E. Howden introduced the concept of a "test oracle." He later articulated the idea more clearly in a 1981 publication:

The use of testing requires the existence of an external mechanism which can be used to check test output for correctness. This mechanism is referred to as the test oracle.

The word "external" seems to be the key here. The information used to judge correctness in a test must come from something other than the program itself.

This idea was subsequently developed in various directions: Elaine Weyuker's definition of "non-testable" programs, Cem Kaner's emphasis on the oracle problem as one of the defining challenges in software testing education, and James Bach and Michael Bolton's oracle consistency heuristics, among others.

Bolton in particular has argued that the purpose of an oracle is not to "prove correctness" but to "detect problems." This distinction seems especially relevant to the question at hand.

Tests That Import from Config

Let us take a closer look at the second version from the introduction — the one that imports from config:

import { TAX_RATE } from './config';

test('calculates tax correctly', () => {
  const price = 1000;
  const tax = calculateTax(price);
  expect(tax).toBe(price * TAX_RATE);
});

This test looks reasonable. It follows the DRY principle, and if the value changes, you only need to update it in one place.

However, when viewed through the lens of Howden's principle, this test may have a problem. The expected value comes from the same config as the implementation. If someone accidentally changes TAX_RATE to 0.01, the test will continue to pass. The test has become a mirror of the implementation — and a mirror reflects the same distortions as the original.
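The mirror effect can be reproduced in a few lines. The sketch below is not the real code under test: makeCalculateTax and mirrorCheckPasses are hypothetical helpers, and the rate is passed in as a parameter (instead of being imported from ./config) so that a wrong config value can be simulated.

```javascript
// Hypothetical stand-in for the implementation: the rate is a parameter
// so we can simulate different values of TAX_RATE in config.
function makeCalculateTax(taxRate) {
  return (price) => price * taxRate;
}

// Mirrors the config-importing test: the expected value is computed
// from the same rate the implementation uses.
function mirrorCheckPasses(taxRate) {
  const calculateTax = makeCalculateTax(taxRate);
  const price = 1000;
  return calculateTax(price) === price * taxRate;
}

console.log(mirrorCheckPasses(0.1));  // true
console.log(mirrorCheckPasses(0.01)); // true, although the rate is wrong
console.log(mirrorCheckPasses(5));    // true, even for an absurd rate
```

The check passes for every rate tried here, which suggests it says little about whether the rate itself is right; it mostly confirms that the function multiplies.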

Does Hardcoding Solve the Problem?

Then what about the first version — the one with the hardcoded 100?

test('calculates tax correctly', () => {
  const price = 1000;
  const tax = calculateTax(price);
  expect(tax).toBe(100); // assuming 10% tax rate
});

By hardcoding 100, the expected value becomes independent of the implementation. If TAX_RATE is accidentally changed, the test will fail and catch the problem.

But this introduces a different issue. Because 100 is hardcoded, the test will also fail when TAX_RATE is changed intentionally. That failure is not "problem detection"; it is maintenance overhead, since the test has to be updated by hand every time the rate legitimately changes.

Furthermore, if the reasoning behind 100 is simply "the config file says 0.1, so 1000 × 0.1 = 100," then you have just manually copied the config value. In that case the 100 is no more independent than the imported constant: it is the same config value with an extra copy step.

On the other hand, if the reasoning is "the law mandates a 10% tax rate," then the expected value is grounded in an external fact that is independent of the implementation — exactly the kind of "external mechanism" Howden described.

In other words, even for the same hardcoded 100, the meaning changes depending on whether you can explain the expected value without looking at the code. This seems to be the line that separates a duplicated hardcode from an independent oracle. There may be better criteria, but this is what I arrived at in my research.
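One way to make that distinction visible is to record where the expected value came from next to the value itself. The following is a minimal sketch under assumptions: the taxOracle object, its source text, and the helper names are hypothetical, and the 10% statutory rate is only the example used in this article, not a claim about any real jurisdiction.

```javascript
// Hypothetical oracle record: the expected value together with a
// justification that can be stated without reading the implementation.
const taxOracle = {
  price: 1000,
  expectedTax: 100,
  source: 'statutory tax rate of 10% (assumed for this example)',
};

// Hypothetical stand-in for the implementation under test.
function makeCalculateTax(taxRate) {
  return (price) => price * taxRate;
}

// Returns null on success, or a message that names the oracle's source.
function checkAgainstOracle(calculateTax, oracle) {
  const actual = calculateTax(oracle.price);
  if (actual === oracle.expectedTax) return null;
  return `expected ${oracle.expectedTax} (${oracle.source}), got ${actual}`;
}

console.log(checkAgainstOracle(makeCalculateTax(0.1), taxOracle));  // null
console.log(checkAgainstOracle(makeCalculateTax(0.01), taxOracle)); // a failure message
```

If the source field can only be filled in with "copied from config," that is a hint the value is a duplicate and not an independent oracle.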

Reconsidering the Relationship with DRY

To summarize the story so far: importing from config follows DRY, while hardcoding expected values appears to violate it. Following DRY undermines oracle independence; preserving oracle independence seems to violate DRY. The two principles appear to be in conflict.

However, when we go back to the original definition of DRY, it turns out they may not conflict at all.

Andy Hunt and Dave Thomas addressed this misconception explicitly in the 20th Anniversary Edition of The Pragmatic Programmer:

DRY is about the duplication of knowledge, of intent. It's about expressing the same thing in two different places, possibly in two totally different ways.

In a section titled "Not All Code Duplication is Knowledge Duplication," they explain that even if the code for validating age and validating quantity is identical, those validations represent different pieces of knowledge. That is a coincidence, not a DRY violation.
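That example might look like this in JavaScript. This is a paraphrase for illustration, not the book's own code, and the function names and exact rules are assumptions.

```javascript
// Two rules that currently share the same body but encode different knowledge.
function validateAge(value) {
  return Number.isInteger(value) && value >= 0;
}

function validateQuantity(value) {
  return Number.isInteger(value) && value >= 0;
}

console.log(validateAge(30));      // true
console.log(validateQuantity(-1)); // false
```

If the business later requires a quantity of at least 1, only validateQuantity changes. Merging the two into one shared function would have coupled rules that have no reason to change together.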

Let us apply this to test expected values. The production code TAX_RATE = 0.1 represents the knowledge "this system operates with a tax rate of 0.1." The test expect(tax).toBe(100) represents the knowledge "for a 10% tax rate, the tax on 1000 should be 100."

These two happen to rest on the same 10% rate, but they express different knowledge. The former is a system configuration; the latter is a verification of a business rule. If we follow the definition in The Pragmatic Programmer strictly, hardcoding 100 in a test is not a DRY violation; it is simply "different knowledge that happens to share the same value."

In other words, DRY and oracle independence may not actually be in conflict, once we return to DRY's original definition. The apparent conflict only arises when DRY is narrowly interpreted as "eliminate code duplication."

I cannot say with certainty that this interpretation is correct. But based on the original definition, calling a hardcoded test expected value a DRY violation does seem like a bit of a stretch.

When Sharing Config Values Becomes Dangerous

Let me return to the config-sharing problem. Even when hardcoding is justified, there is a more structural concern worth considering.

When every unit in a system references the same config constant, an incorrect value in that config propagates to every unit that depends on it. If the tests also reference the same config, then no one — not even the tests — catches the mistake. This risk exists regardless of whether the system is a monolith or a microservice architecture. It is an inherent risk of depending on a single source of truth.

Thinking about this at the unit level, as long as each unit's tests trust input values as "presumably correct" and use them as-is, an error in those values will propagate through every layer of tests without being caught. This may be a somewhat strong claim, but logically, it should hold.
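The propagation argument can be illustrated with a hypothetical two-layer system. Everything in this sketch is an assumption made for the example: the config object is passed in explicitly, and calculateTotal and configDerivedChecks are invented names.

```javascript
// Hypothetical two-layer system; both units read the same config object.
function calculateTax(config, price) {
  return price * config.TAX_RATE;
}

function calculateTotal(config, price) {
  return price + calculateTax(config, price);
}

// Each unit's check derives its expectation from the same config.
function configDerivedChecks(config) {
  const price = 1000;
  return {
    tax: calculateTax(config, price) === price * config.TAX_RATE,
    total: calculateTotal(config, price) === price + price * config.TAX_RATE,
  };
}

console.log(configDerivedChecks({ TAX_RATE: 0.1 }));  // { tax: true, total: true }
console.log(configDerivedChecks({ TAX_RATE: 0.01 })); // { tax: true, total: true }
```

Both layers report success under the wrong rate, because neither check contains any information that did not come from the config itself.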

Testing Properties Instead of Exact Values

That said, preparing independent hardcoded expected values for every config value is not practical. Tax rates have legal grounds, but what about a retry count? There is usually no external authority that dictates "this should be 3." For values like these, finding an independent oracle is genuinely difficult.

One approach that may help is testing properties and relationships rather than exact values:

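// Assumes these constants are exported from the same ./config module as TAX_RATE.
import { MAX_RETRY, RETRY_INTERVAL, TIMEOUT } from './config';

test('retry count is within a reasonable range', () => {
  expect(MAX_RETRY).toBeGreaterThan(0);
  expect(MAX_RETRY).toBeLessThanOrEqual(10);
});

// All retries together must fit inside the overall timeout.
test('retry count and timeout are consistent', () => {
  expect(MAX_RETRY * RETRY_INTERVAL).toBeLessThanOrEqual(TIMEOUT);
});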

Tests like these do not need to be updated when a value changes, as long as the new value still satisfies the constraint. They fail only when a change breaks the stated relationship, which is usually unintended. And because the constraints are grounded in design intent, which serves as an external justification, they seem consistent with Howden's principle.
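The same idea can be applied to the tax example. The sketch below is a hand-rolled property check, not a property-based testing library. The checkTaxProperties helper, the sample prices, and the chosen properties (tax is never negative, never exceeds the price, and never decreases as the price grows) are assumptions about design intent made for this illustration.

```javascript
// Hypothetical implementation; the rate is a parameter so different
// config values can be simulated.
function calculateTax(price, taxRate) {
  return price * taxRate;
}

// Checks relationships that should hold for any sane rate.
// Returns a list of violated properties (empty when all hold).
function checkTaxProperties(calc) {
  const prices = [0, 1, 50, 999, 1000, 25000]; // ascending sample inputs
  const failures = [];
  if (calc(0) !== 0) failures.push('tax on a zero price must be zero');
  for (let i = 0; i < prices.length; i++) {
    const tax = calc(prices[i]);
    if (tax < 0) failures.push(`tax is negative for price ${prices[i]}`);
    if (tax > prices[i]) failures.push(`tax exceeds price ${prices[i]}`);
    if (i > 0 && tax < calc(prices[i - 1])) {
      failures.push(`tax decreases between ${prices[i - 1]} and ${prices[i]}`);
    }
  }
  return failures;
}

console.log(checkTaxProperties((p) => calculateTax(p, 0.1)).length);      // 0
console.log(checkTaxProperties((p) => calculateTax(p, 0.01)).length);     // 0
console.log(checkTaxProperties((p) => calculateTax(p, -0.1)).length > 0); // true
console.log(checkTaxProperties((p) => calculateTax(p, 10)).length > 0);   // true
```

Note the limit this exposes: a rate of 0.01 satisfies every property. Property checks catch values that are structurally wrong, but not a plausible value that happens to be the wrong one.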

It Is a Tradeoff After All

Howden's principle is theoretically sound. The argument that all expected values should be independent of the implementation is logically correct. But applying independent verification to every single config value is not realistic.

So this is, I believe, a tradeoff.

What matters most is whether you are aware that you are making a tradeoff. There is a significant difference between deciding "this value should ideally be verified independently, but we are sharing it from config as a conscious cost-benefit decision" and simply sharing it without any thought.

Coming back to the original question: does it matter whether the expected value is hardcoded or imported from config? I believe it does — but not because one approach is universally better. What matters is whether the choice was made deliberately, with an understanding of what is gained and what is lost.

Perhaps the value of a test is determined not by its form, but by the quality of the judgment behind it. I may be off the mark, but that is the conclusion I reached through this research.