The Test That Lied to Me
A practical guide to writing unit tests that actually mean something
A quick note before we start: there is a lot of enthusiasm lately about AI generating unit tests automatically. I have tried it. The results were technically valid, consistently green, and almost completely useless — which, if you think about it, is a perfect description of most of the unit tests in the industry.
So here we are. Maybe this helps the AI write better tests. Maybe it just helps the engineers. Either way, I felt the need to write it down.
This guide is not about coverage percentages or which framework to use. It is about the decisions that determine whether your test suite is an asset or an alibi.
The examples in this guide use Java with JUnit 5 and Mockito. The principles apply everywhere.
Part 1: The Name Is the Specification
Before a test does anything, it declares what it expects. That declaration lives in the name.
A bad test name is a lost opportunity. Not just for documentation — for thinking. If you cannot write a clear name for a test, it is usually because you have not yet decided what you are actually testing.
The convention that works:
Should_ExpectedBehaviour_When_StateUnderTest
Not because the format is sacred, but because it forces two decisions: what should happen, and under what condition. Both need to be explicit.
```java
// ❌ Tells you nothing
@Test
public void testValidation() {
    // ...
}

// ✓ Tells you everything
@Test
public void Should_ReturnValidationError_When_AmountIsNegative() {
    // ...
}
```
The second name is a spec. If this test fails, you know exactly what broke before reading a single line of code. That matters at 11pm when something is on fire.
One more thing: consistency. A codebase where half the tests say should_ and half say test_ and half say verify_ is a codebase where nobody agreed on anything. Pick a convention and apply it. It costs nothing and saves more than you expect.
Part 2: One Test, One Reason to Fail
The most common mistake in test suites is not bad assertions. It is too many of them, testing too many things at once.
A test that can fail for three different reasons gives you almost no information when it does fail. You know something is wrong. You do not know what.
One condition, one test
Even if two different invalid inputs produce the same error, they belong in separate tests. The failure condition is not the same, even if the output is.
```java
// ❌ Two causes, one test
@Test
public void Should_ThrowException_When_Invalid() {
    assertThrows(NotFoundException.class, () -> victim.findOrder(unknownId));
    assertThrows(NotFoundException.class, () -> victim.findOrder(deletedId));
}

// ✓ One cause, one test
@Test
public void Should_ThrowNotFoundException_When_OrderDoesNotExist() {
    when(orderRepository.findById(unknownId)).thenReturn(Optional.empty());

    assertThrows(NotFoundException.class, () -> victim.findOrder(unknownId));
}

@Test
public void Should_ThrowNotFoundException_When_OrderIsDeleted() {
    when(orderRepository.findById(deletedId)).thenReturn(Optional.of(deletedOrder()));

    assertThrows(NotFoundException.class, () -> victim.findOrder(deletedId));
}
```
The tests are longer this way. That is the point. Diagnostic value is worth the extra lines.
Parameterization: the right use
Parameterized tests are useful when you are testing the same behaviour with different values of the same input. They are not useful when you are testing different behaviours and bundling them together to make the test file look shorter.
The rule: one axis of variation per parameterized test. If your test method takes a flag name as a parameter alongside the flag value, you are probably mixing two different concerns.
```java
// ❌ Mixed concerns — iterates over unrelated flags
static Stream<Arguments> flags() {
    return Stream.of(
            of("include_tax", true),
            of("include_tax", false),
            of("include_discount", true),
            of("include_discount", false)
    );
}

@ParameterizedTest
@MethodSource("flags")
void Should_HandleFlag_When_Set(String flag, boolean value) {
    // tests multiple unrelated flags in one method
}

// ✓ One concern, two values
static Stream<Arguments> taxFlagVariants() {
    return Stream.of(
            of(true, "with-tax.json"),
            of(false, "without-tax.json")
    );
}

@ParameterizedTest
@MethodSource("taxFlagVariants")
void Should_GenerateCorrectInvoice_When_TaxFlagSet(boolean includeTax, String expectedFile) {
    // tests one flag, both states
}
```
When the parameterized test fails, you want to know exactly which value caused it. A test that iterates over unrelated flags gives you a row number. A focused test gives you a reason.
Part 3: Structure Is Not Optional
A test has three jobs: set up the world, do the thing, check what happened. If you cannot identify which line belongs to which job, the test is already a problem.
Given, When, Then is not ceremony. It is the minimum structure required for a test to be readable by someone who did not write it — including future you.
```java
@Test
void Should_CalculateAggregatedData_When_ValidDataProvided() {
    // Given
    when(repository.returnDataToBeAggregated())
            .thenReturn(List.of(
                    new DataPoint("Sensor1", 100),
                    new DataPoint("Sensor2", 200)
            ));
    AggregatorService victim = new AggregatorService(repository);

    // When
    int result = victim.calculateAggregatedData();

    // Then
    assertEquals(300, result);
}
```
The comments (`// Given`, `// When`, `// Then`) are for illustration only. In real tests, the structure should be clear enough that the comments are unnecessary. If it is not, the test probably needs restructuring.
Naming conventions within the test
Two naming decisions that pay consistent dividends:
- Pick a consistent name for the class under test and use it everywhere. Common choices are `victim`, `underTest`, or `sut` (system under test). Which one you pick matters less than the fact that everyone on the team picks the same one — it makes the subject of the test immediately identifiable at a glance.
- Do the same for the output. `result` is a common choice. Whatever you pick, the consistency is what makes tests scannable.
```java
AggregatorService victim = new AggregatorService(repository);
int result = victim.calculateAggregatedData();
```
Keep variables where they matter
There is a tendency to extract every string into a named variable. In production code, this is usually right. In tests, it can be counterproductive.
If a value is self-explanatory, inline it. Extracting "ADMIN" into a variable called adminRoleName adds a line and removes information. The reader now has to look up what adminRoleName is.
```java
// ❌ Over-extracted
String adminRole = "ADMIN";
String userRole = "USER";
String userId1 = "usr-001";
String userId2 = "usr-002";
String userId3 = "usr-003";

insertUsers(
        createUser(userId1, adminRole),
        createUser(userId2, adminRole),
        createUser(userId3, userRole)
);

// ✓ Inlined — the values speak for themselves
insertUsers(
        createUser("usr-001", "ADMIN"),
        createUser("usr-002", "ADMIN"),
        createUser("usr-003", "USER")
);

List<User> result = requestUsersByRole("ADMIN");

assertThat(result)
        .extracting(User::getId)
        .containsOnly("usr-001", "usr-002");
```
Part 4: The Setup Trap
@BeforeEach is one of the most misused tools in unit testing. It exists to avoid repeating identical setup across tests. It is routinely used to hide setup that is not actually identical — just similar enough to seem like it should be shared.
The result is tests that look short but are not self-contained. To understand what a test actually does, you have to read the test and the setup method and remember how they interact. If a later test overrides one of the behaviours defined in setup, that is three places to look.
```java
// ❌ Hidden dependency — where does the mock behaviour come from?
@BeforeEach
void setup() {
    when(repo.findById(anyLong()))
            .thenReturn(Optional.of(new Entity()));
    victim = new MyService(repo);
}

@Test
void Should_ThrowException_When_NotFound() {
    // Silently overrides the behaviour defined in @BeforeEach
    when(repo.findById(anyLong()))
            .thenReturn(Optional.empty());

    assertThrows(NotFoundException.class, () -> victim.findById(99L));
}

// ✓ Self-contained — everything you need is right here
@Test
void Should_ReturnEntity_When_IdIsValid() {
    when(repo.findById(1L))
            .thenReturn(Optional.of(new Entity()));
    MyService victim = new MyService(repo);

    Entity result = victim.findById(1L);

    assertNotNull(result);
}

@Test
void Should_ThrowException_When_NotFound() {
    when(repo.findById(99L))
            .thenReturn(Optional.empty());
    MyService victim = new MyService(repo);

    assertThrows(NotFoundException.class, () -> victim.findById(99L));
}
```
The second version is longer. It is also the one that tells you, immediately, what each test needs to be true. There is no invisible context. There is no setup to remember.
The alternative to @BeforeEach is not duplication — it is helper functions used correctly. A helper that builds an object with sensible defaults and accepts explicit parameters for the values that matter keeps the test readable without hiding what is being tested.
```java
// ❌ Helper hides what matters
prepareData(
        createInvalidData(),
        createPartialData()
);

// ✓ Helper is explicit about what varies
prepareData(
        createOrderLine("item-a", price(100.0), quantity(2)),
        createOrderLine("item-b", price(0.0), quantity(1))
);
```
The difference is what the reader learns from looking at the test. The first version tells you the data is invalid and partial. The second tells you exactly why — one line has a zero price. That is the information that matters for understanding what the test is actually verifying.
`@BeforeEach` has legitimate uses: initialising mocks, setting up state that is genuinely shared across every test, preparing infrastructure. The problem is using it to define mock behaviour — which is almost never truly shared.
@BeforeAll has even fewer legitimate uses at the unit test level. Loading a config file once, compiling a regex, spinning up something expensive. Not setting up test data. If your unit test needs data set up once for all tests, that is usually a sign the tests are not as independent as they should be.
Test utilities — use sparingly, test thoroughly
There is a tension worth acknowledging here. On one hand, test utility classes tend to become bloated, overgeneralised, and quietly wrong. On the other hand, utility classes extracted from production code need to be tested — properly, not incidentally.
These are different problems.
On production utility classes: a utility method extracted for reuse is a first-class unit of logic. It deserves its own tests covering valid inputs, invalid inputs, and boundary conditions. The fact that it is called from another class does not mean it is implicitly tested — it means it is tested indirectly, which is not the same thing. Indirect coverage tells you something broke. Direct tests tell you what.
That said, direct unit tests alone are not enough. A utility method that works correctly in isolation can still behave unexpectedly in context — with real data, in combination with other logic, under conditions the unit tests did not anticipate. Test it directly and verify it in context.
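To make the distinction concrete, here is a minimal sketch of a production utility extracted for reuse. `SkuNormaliser` and its behaviour are hypothetical, invented for this example; the point is that a unit like this earns direct tests for its valid, invalid, and boundary inputs, independent of whoever calls it.

```java
// Hypothetical production utility — a first-class unit of logic that
// deserves direct tests, not just incidental coverage from its callers.
public final class SkuNormaliser {

    private SkuNormaliser() {
        // static utility, not meant to be instantiated
    }

    // Trims and upper-cases a SKU; rejects null or blank input outright.
    public static String normalise(String raw) {
        if (raw == null || raw.isBlank()) {
            throw new IllegalArgumentException("SKU must not be null or blank");
        }
        return raw.trim().toUpperCase();
    }
}
```

Direct tests would then pin down each case separately: `normalise(" abc-1 ")` returning `"ABC-1"`, and `null` and blank inputs each raising `IllegalArgumentException` in a test of their own.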
On test utility classes: the bar should be high. A helper that creates a test object with sensible defaults is useful. A helper that encodes business logic, makes assertions, or accumulates enough behaviour to need its own documentation is a liability. When a test utility class becomes complex enough that you have to understand it to understand a test, it has defeated its own purpose.
The rule of thumb: if the logic is genuinely shared across many tests and has no natural home elsewhere, a utility makes sense. If it exists mainly to save a few lines, inline it. Explicit setup in the test is almost always easier to read than a call to a helper that hides what is actually being prepared.
Part 5: Keep Tests Simple
A test that contains an if statement can be wrong in two different ways: the production code can be wrong, or the test logic can be wrong. At that point, who is testing the test?
Tests should verify behaviour, not implement it. The moment a test starts making decisions — branching on input, looping over results, switching on type — it has become code that needs its own tests. That is not a metaphor. That is the actual problem.
```java
// ❌ The test is doing too much thinking
@Test
void Should_ApplyDiscount_When_Eligible() {
    for (Customer customer : testCustomers) {
        if (customer.isEligible()) {
            assertEquals(0.9, victim.calculateDiscount(customer));
        } else {
            assertEquals(1.0, victim.calculateDiscount(customer));
        }
    }
}

// ✓ Two cases, two tests, no ambiguity
@Test
void Should_ApplyDiscount_When_CustomerIsEligible() {
    Customer customer = new Customer("alice", eligible(true));

    double result = victim.calculateDiscount(customer);

    assertEquals(0.9, result);
}

@Test
void Should_NotApplyDiscount_When_CustomerIsNotEligible() {
    Customer customer = new Customer("bob", eligible(false));

    double result = victim.calculateDiscount(customer);

    assertEquals(1.0, result);
}
```
When test logic grows complex, the instinct is to make it smarter. The right move is almost always to make it simpler — and split it.
Part 6: Mock the Behaviour, Not the Data
Mocking is one of the easier things to do wrong in unit testing. The tell is when you find yourself mocking a data object — a DTO, a plain record, a value container.
Mocking data objects produces fragile tests. They break when the object's structure changes, even when the logic being tested has not changed at all. Worse, they test almost nothing meaningful. A mock DTO that returns a hardcoded string is not telling you anything about your system.
What you actually want to mock is the behaviour of the things your code depends on — repositories, external services, anything that makes a decision or crosses a boundary.
```java
// ❌ Mocking data — fragile and meaningless
OrderDTO dto = Mockito.mock(OrderDTO.class);
when(dto.getProductId()).thenReturn("123");
when(dto.getQuantity()).thenReturn(2);
// If OrderDTO gains a new field, this test might break for no reason

// ✓ Mocking behaviour — tests how the service handles real scenarios
Order order = new Order("123", "Product A", 2);
when(orderRepository.findById("123"))
        .thenReturn(Optional.of(order));
```
The distinction matters most when thinking about what you are actually verifying. A test that mocks a DTO is verifying that Mockito works. A test that mocks a repository is verifying that your service behaves correctly when data is and is not found.
Verifying interactions
Not everything worth testing produces a return value. Sometimes the important thing is that a method was called, was called with the right arguments, or was never called at all.
```java
@Test
void Should_NotPersist_When_InputIsInvalid() {
    victim.processData("invalid");

    verify(repository, never()).save(anyString());
}

@Test
void Should_PersistParsedValue_When_InputIsValid() {
    ArgumentCaptor<String> captor = ArgumentCaptor.forClass(String.class);

    victim.processData("raw-value");

    verify(repository).save(captor.capture());
    assertEquals("parsed-value", captor.getValue());
}
```
The first test tells you nothing was saved. The second tells you exactly what was saved. Neither requires a return value. Both tell you something real about how the system behaves.
That said, if verifying an interaction is the only way to test something meaningful — a calculation, a transformation, a decision — it is worth pausing before reaching for verify. Code that can only be validated through its side effects is often code that is doing too much in one place. Extracting the logic into a dedicated class or utility makes it directly testable by its output, which is almost always cleaner.
Needing to spy on a method to confirm a calculation happened is not just a testing problem. It is usually a separation of concerns problem wearing a testing costume.
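A sketch of what that extraction can look like. The names here (`LateFeeCalculator`, the rate, the rounding rule) are invented for illustration: the calculation that used to be observable only through a `verify()` on the repository becomes a pure function, testable by its return value alone.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical extraction: the fee logic that was buried next to a save()
// call now lives in its own class with no dependencies and no side effects.
public class LateFeeCalculator {

    private static final BigDecimal DAILY_RATE = new BigDecimal("0.50");

    // Pure function: nothing to mock and nothing to verify. Assert the output.
    public BigDecimal feeFor(int daysOverdue) {
        if (daysOverdue <= 0) {
            return BigDecimal.ZERO;
        }
        return DAILY_RATE.multiply(BigDecimal.valueOf(daysOverdue))
                .setScale(2, RoundingMode.HALF_UP);
    }
}
```

The service keeps the orchestration (call the calculator, save the result), and the interesting logic is asserted directly: `feeFor(3)` returns `1.50`, `feeFor(0)` returns zero.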
Part 7: The Edge Cases Are the Point
Unit tests are the cheapest place to cover edge cases. Cheaper than component tests, cheaper than integration tests, infinitely cheaper than a production incident.
Yet most test suites are weighted toward the happy path. The valid input goes in, the correct output comes out. Green. Done. The test suite grows, coverage climbs, and the system still breaks on the first null it encounters.
Edge cases worth explicitly testing:
- Null inputs, empty strings, empty collections
- Boundary values: zero, negative numbers, maximum allowed values
- The case where a dependency returns nothing (`Optional.empty()`, an empty list)
- The case where a dependency throws
- Inputs that are technically valid but semantically unusual
```java
@Test
void Should_ThrowException_When_OrderLineIsNull() {
    assertThrows(IllegalArgumentException.class,
            () -> victim.calculateUnitPrice(null));
}

@Test
void Should_ThrowException_When_PriceIsNull() {
    OrderLine line = new OrderLine("item", null, quantity(2));

    assertThrows(IllegalArgumentException.class,
            () -> victim.calculateUnitPrice(line));
}

@Test
void Should_ReturnZeroUnitPrice_When_QuantityIsZero() {
    OrderLine line = new OrderLine("item", totalPrice(50.0), quantity(0));

    Money result = victim.calculateUnitPrice(line);

    assertEquals(Money.ZERO, result);
}
```
Edge cases are not extra credit. They are the specification for how your code behaves when the world does not cooperate. Which is most of the time.
Part 8: Tests That Lie
A passing test should mean something. If it passes when the feature is broken, it is not a test — it is a green light that gives false confidence, which is worse than no test at all.
Tests lie in predictable ways.
The assertion that does not assert
The most common form. The test runs, it calls the method, but the assertion is checking something that is always true regardless of what the method does.
```java
// ❌ This always passes — it has nothing to do with process()
@Test
void Should_ProcessOrder_When_Valid() {
    Order order = new Order("123", "Product", 2);

    victim.process(order);

    // We're asserting the id we assigned, not anything the method did
    assertEquals("123", order.getId());
}

// ✓ This actually tests something
@Test
void Should_PersistOrder_When_Valid() {
    Order order = new Order("123", "Product", 2);

    victim.process(order);

    verify(repository).save(order);
}
```
The test that survives deletion
Delete the implementation. Run the tests. If the same tests still pass, they were not testing the implementation.
This sounds extreme, but it is one of the most useful checks you can do on a test suite. Tests that survive the deletion of the thing they claim to test are guaranteed liars.
The complex test
Tests that contain if statements, switch cases, or loops are tests that can have bugs. A test with a bug is not a test — it is a liability that produces a false sense of coverage.
If the test logic is becoming complex, the answer is almost always to split it into simpler tests, not to make the existing test smarter.
Part 9: Write Code That Wants to Be Tested
Some code is hard to test because the test is poorly written. But some code is hard to test because the code itself is poorly designed — and the difficulty of testing is the most honest feedback you will get about that.
A class that requires a running database to do anything is a class that has not separated its concerns. A method that produces no output and mutates hidden state is a method that has made itself invisible to assertions. A function that does five things is a function that needs five different test setups to cover each path.
Testability is not a property you add after the fact. It emerges from design decisions made while writing the code.
A few patterns that consistently produce untestable code — and what to do instead:
Hidden dependencies. If a class creates its own collaborators internally, there is no way to replace them in tests.
```java
// ❌ No way to control what the repository does
public class OrderService {
    private final OrderRepository repository = new OrderRepository();
}

// ✓ Inject it — now tests can provide their own
public class OrderService {
    private final OrderRepository repository;

    public OrderService(OrderRepository repository) {
        this.repository = repository;
    }
}
```
Logic buried in private methods. Private methods are not directly testable. If a private method contains meaningful logic, that logic either gets tested indirectly through the public interface — which is fine — or it is complex enough that it should be extracted into its own class and tested directly.
```java
// ❌ Complex logic hidden where tests cannot reach it
private BigDecimal applyTieredPricing(BigDecimal base, int quantity) {
    // 30 lines of pricing logic
}

// ✓ Extracted — now testable on its own terms
public class TieredPricingCalculator {
    public BigDecimal calculate(BigDecimal base, int quantity) {
        // same logic, now directly testable
    }
}
```
Static method calls and global state. Static calls are hard to mock without special machinery, and global state makes tests order-dependent — not impossible to work around, but highly inadvisable. Both are usually avoidable.
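One common escape hatch, sketched here with invented names: instead of calling `LocalDate.now()` statically inside the method, inject a `java.time.Clock`. Production wires in the system clock; tests pass `Clock.fixed(...)` and the "current" date becomes just another controlled input.

```java
import java.time.Clock;
import java.time.LocalDate;

// Hypothetical service: the current date is a dependency, not a static call.
public class SubscriptionService {

    private final Clock clock;

    public SubscriptionService(Clock clock) {
        this.clock = clock;
    }

    // With Clock.systemDefaultZone() in production and Clock.fixed() in
    // tests, this method is fully deterministic under test.
    public boolean isExpired(LocalDate expiryDate) {
        return LocalDate.now(clock).isAfter(expiryDate);
    }
}
```

A test can then pin time with `Clock.fixed(Instant.parse("2024-06-15T00:00:00Z"), ZoneOffset.UTC)` and assert both sides of the expiry boundary.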
Methods that do too much. A method that validates input, fetches data, applies business rules, formats output, and persists the result cannot be tested cleanly. Each responsibility it sheds becomes a unit that can be tested independently.
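A minimal sketch of that shedding, with invented names. Each piece can now be covered with one small setup instead of one sprawling test that has to satisfy every step at once.

```java
import java.util.List;

// Hypothetical split of a "do-everything" method: validation and the
// business rule become separate units, each with its own focused tests.
class OrderQuantityValidator {
    void validate(List<Integer> quantities) {
        if (quantities == null || quantities.isEmpty()) {
            throw new IllegalArgumentException("quantities must not be empty");
        }
        if (quantities.stream().anyMatch(q -> q <= 0)) {
            throw new IllegalArgumentException("quantities must be positive");
        }
    }
}

class OrderTotalCalculator {
    int total(List<Integer> quantities) {
        return quantities.stream().mapToInt(Integer::intValue).sum();
    }
}
```

The original method shrinks to orchestration, and the remaining responsibilities (formatting, persistence) get the same treatment.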
The uncomfortable version of this principle: if writing the test feels like a struggle, read the production code before blaming the test. The test is probably trying to tell you something.
Part 10: The Test Suite As a Document
The best test suites are the best onboarding material. Not because someone planned them that way, but because tests that are named correctly, structured clearly, and focused on one thing each naturally become a readable description of what the system does.
When a new developer joins the team and wants to understand how order processing works, they have two options. They can read the production code, which tells them how. Or they can read the tests, which tell them what — what inputs are valid, what happens on the edge cases, what the system refuses to do and why.
That is the difference between a test suite written for coverage and a test suite written with intention.
A useful exercise: read through your test names without reading the test bodies. Can you reconstruct what the system does from the names alone? If not, the names are not doing their job.
Inheritance in tests works against this. When a test class extends a base class to inherit setup, the test is no longer self-contained. Understanding it requires reading two files, understanding how they relate, and tracking what the parent does and does not override. That is more archaeology than onboarding.
As discussed in Part 4, the same applies to test utility classes that grow large enough to have their own logic.
Part 11: What Not to Test
As important as knowing what to test is knowing what not to.
Do not test the web layer
A service does not return a 404. It throws a NotFoundException. What happens to that exception — how it gets mapped to an HTTP status code, how the response body is shaped, what headers are attached — is the web layer's responsibility, not the service's.
Unit tests live below that boundary. At the unit level, test the logic: services, domain classes, utility methods, calculations, decisions. Anything that can be verified in isolation, without starting a server or talking to a database, belongs here. The moment a test depends on another party — an HTTP layer, a real database, a message broker, an external service — it has crossed into component or integration territory. Keep the two separate and both become easier to reason about.
Do not bootstrap the application context
If your unit test has @SpringBootTest on it, it is not a unit test anymore. It is a component test that happens to live in the wrong folder.
Bootstrapping a full application context to test a single service method is overkill by definition. It is slow, it introduces dependencies that have nothing to do with what you are testing, and it blurs the line between levels of testing in ways that tend to get worse over time.
Unit tests should start fast and run in isolation. Mocks and stubs exist precisely so you do not need a running application to verify that a service behaves correctly. If you find yourself reaching for @SpringBootTest at the unit level, the question worth asking is not "how do I make this work" but "why does this feel necessary" — because the answer usually points to a design problem.
This applies beyond Spring. Any equivalent mechanism that boots a full application context — dependency injection containers, embedded servers, framework runners — has no place in a unit test.
Do not test the framework
If you are using a framework such as Spring Boot, do not test that @Autowired works or that the application can read the application.yml. Spring tests that. If you are using Jackson, do not test that it serializes an object. Jackson tests that. The job of your tests is to verify that your code, given correct inputs from the framework, produces the right outputs.
Testing framework behaviour wastes time, adds maintenance burden, and produces tests that break when you upgrade a dependency — not because your code changed, but because the framework's internals did.
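For instance, here is the shape of a test-worthy unit at that boundary, with invented names. Jackson's job ends once the DTO exists; your mapping logic is the part that deserves a unit test, and it needs no JSON at all.

```java
// Hypothetical mapper: the framework hands us a deserialized DTO; only the
// mapping below is our code, so only the mapping gets a unit test.
public class CustomerMapper {

    public record CustomerDto(String firstName, String lastName) {}

    public record Customer(String displayName) {}

    public Customer toDomain(CustomerDto dto) {
        return new Customer(dto.firstName() + " " + dto.lastName());
    }
}
```

A test constructs the DTO directly and asserts on the result: no ObjectMapper, no JSON fixture, no framework in sight.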
Do not write production code for tests
If the only reason a getter exists is to make a test easier to write, that getter should not exist. The test should find another way.
Production code written for tests is production code that does not serve production. It inflates the API, exposes internals that should stay internal, and misleads anyone who reads the class and wonders what that method is for.
The constraint is useful. If a class is difficult to test without special access, that is usually feedback about the design. A class that is hard to test without poking at its internals is often a class that is doing too much, or a class whose dependencies are not properly injected.
Test the observable behaviour. If that is not enough to verify the class is working, the class probably needs to be redesigned.
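As a sketch of what "observable behaviour" means in practice (names invented): this class keeps its counter private, with no getter added for tests, because the behaviour that matters is fully visible through the return value of `allow()`.

```java
// Hypothetical rate limiter: internal state stays internal. Tests observe
// the behaviour (allow() returning true, then false), not the counter.
public class RequestLimiter {

    private final int maxRequests;
    private int used; // no getter: not part of the observable contract

    public RequestLimiter(int maxRequests) {
        this.maxRequests = maxRequests;
    }

    public boolean allow() {
        if (used >= maxRequests) {
            return false;
        }
        used++;
        return true;
    }
}
```

A test calls `allow()` up to the limit and asserts the flip from `true` to `false`. If that is not enough to verify the class, the missing getter is not the problem; the design is.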
The Uncomfortable Part
Most of these guidelines are not hard to understand. They are hard to apply consistently when you are under pressure, when the ticket is already late, when the test suite has a hundred tests written the wrong way and adding one more wrong one is faster than fixing the pattern.
The longer version of this guide would be about that. About how a test suite degrades gradually, one convenience at a time, until the green board means almost nothing and everyone has quietly stopped trusting it.
The short version is this: a test that you do not trust is not a safety net. It is a ritual.
Writing tests that you actually believe is harder than writing tests that pass. It requires deciding what you are testing before you write the test. It requires resisting the urge to share setup that is not really shared. It requires accepting that ten focused tests are better than one test that checks everything.
None of that is complicated.
It just requires not taking shortcuts.
Quick Reference
| Guideline | The point |
|---|---|
| Name clearly | `Should_ExpectedBehaviour_When_StateUnderTest` — readable without reading the code |
| One reason to fail | Separate tests for separate failure causes, even when the outcome is the same |
| Given / When / Then | Three sections, always. If you cannot identify them, restructure. |
| Consistent naming | Pick a name for the class under test and the output, and use them everywhere |
| Self-contained setup | Avoid @BeforeEach for mock behaviour. Put setup where it belongs: in the test. |
| Mock behaviour, not data | Mock repositories, services, decisions — not DTOs or plain objects |
| Inline simple values | Do not extract constants that are already readable as literals |
| No logic in tests | No if, no loops, no switch. If the test needs to think, split it. |
| Cover edge cases | Null, empty, zero, boundary, missing. These are the interesting cases. |
| Do not test the web layer | Services throw exceptions. HTTP status codes are someone else's job. |
| Do not bootstrap the context | `@SpringBootTest` in a unit test is a component test in the wrong folder. |
| Do not test the framework | Test your logic. Assume Spring, Jackson, and JPA work. |
| No test-only production code | If a getter exists only for tests, the test is wrong, not the class. |