Patoliya Infotech

Posted on May 20

Advanced Mocking Strategies: Mastering Test Doubles & Behavior Verification

#javascript #testing #kotlin #software

Every experienced developer has been there: a test suite that passes on your machine but fails in CI, a green build that hides a broken integration, or a test so tightly coupled to implementation that it breaks every time you refactor. These aren't testing failures; they're mocking failures.

Mocking exists to solve a fundamental tension in software testing: the need to test a unit of code in complete isolation while that code inevitably depends on the outside world: databases, APIs, queues, third-party services, the clock itself. Done well, mocking gives you surgical precision. Done poorly, it gives you false confidence and a maintenance nightmare.

This article is not about the basics. You already know what a mock is. What we're going after here is the craft: the judgment to know when mocking improves your tests and when it quietly destroys them, how to tell the five types of test doubles apart, and how to write behavior-driven tests that survive aggressive refactoring.

If you're building production systems (payment processors, healthcare platforms, logistics engines, microservices that talk to each other), the strategies in this article will materially improve your test architecture.

Understanding Test Doubles: Beyond "Just Use a Mock"

Gerard Meszaros's taxonomy of test doubles, popularized by Martin Fowler, is nearly two decades old, but most developers still conflate the five types. This conflation leads to the wrong tool for the job. Let's fix that.

Dummy Objects

A dummy is the simplest test double. It's passed into a method to satisfy a parameter requirement but is never actually used in the execution path you're testing.

// Java / Mockito
// We need a Logger to construct PaymentProcessor, but we're testing charge()
// which doesn't log anything in the success path
Logger dummyLogger = mock(Logger.class);
PaymentProcessor processor = new PaymentProcessor(paymentGateway, dummyLogger);
processor.charge(order);
// dummyLogger is never called, so we don't verify or configure it

The key insight: if you find yourself configuring behavior on a dummy or verifying calls to it, you've misidentified it. It's now a mock or stub.

Stubs

A stub provides canned answers to method calls. It doesn't care how it's called, only that when asked, it returns what you've configured. Stubs are for state-based scenarios where you need to control what a dependency returns.

# Python / unittest.mock
from unittest.mock import MagicMock

user_repo = MagicMock()
user_repo.find_by_id.return_value = User(id=42, email="dev@example.com", tier="premium")

service = BillingService(user_repo)
invoice = service.generate_invoice(user_id=42)

assert invoice.discount_rate == 0.20  # premium tier discount

Notice: we're asserting on the state of invoice, not on whether find_by_id was called. That's the defining characteristic of stub usage: state verification, not interaction verification.

Fakes

Fakes are working implementations with simplified behavior. They're not configured per-test; they behave like the real thing but take shortcuts unsuitable for production.

// JavaScript: an in-memory fake for a UserRepository
class InMemoryUserRepository {
  constructor() {
    this.users = new Map();
  }

  async findById(id) {
    return this.users.get(id) ?? null;
  }

  async save(user) {
    this.users.set(user.id, user);
    return user;
  }
}

// Tests use the fake, no database, no network, but real logic flows
const repo = new InMemoryUserRepository();
await repo.save({ id: 1, email: "test@example.com" });
const found = await repo.findById(1);
expect(found.email).toBe("test@example.com");

Fakes shine in integration-style tests where you want realistic behavior across multiple operations without external dependencies. They're higher effort to build but pay dividends when you have many tests hitting the same abstraction.

Spies

A spy wraps a real object and records how it was used. Unlike mocks, spies let the real implementation run; they just observe.

// Jest
const emailService = {
  send: async (to, subject, body) => {
    // real implementation that would send email
    return { messageId: "real-id" };
  }
};

const sendSpy = jest.spyOn(emailService, 'send');

await notificationService.notifyUser(user, "Your order shipped");

expect(sendSpy).toHaveBeenCalledWith(
  "user@example.com",
  expect.stringContaining("shipped"),
  expect.any(String)
);

Spies are excellent for auditing: you want to verify that something happened, but you don't want to replace the real behavior. Use them when the real implementation is fast, side-effect-free (or the side effects are acceptable in tests), and you care about interaction patterns.
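
For comparison, here is a minimal Python sketch of the same idea using the wraps argument of unittest.mock. EmailService and notify_user are hypothetical names invented for this illustration, and the "real" implementation is deliberately trivial.

```python
from unittest.mock import ANY, MagicMock


class EmailService:
    """Hypothetical real collaborator; cheap and side-effect-free here."""

    def send(self, to, subject, body):
        return {"message_id": f"sent-to-{to}"}


def notify_user(email_service, address, text):
    """Hypothetical code under test: delegates to the email service."""
    return email_service.send(address, f"Update: {text}", text)


real_service = EmailService()
# wraps= forwards every call to the real object while recording it
spy = MagicMock(wraps=real_service)

result = notify_user(spy, "user@example.com", "Your order shipped")

# The real implementation ran, so we get its real return value
assert result == {"message_id": "sent-to-user@example.com"}
# ...and the spy recorded the interaction
spy.send.assert_called_once_with("user@example.com", ANY, "Your order shipped")
```

Because the wrapped object still does the work, a spy like this only makes sense when running the real send is acceptable inside a test.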

Mocks

Mocks are the full package: pre-programmed with expectations, configured with behavior, and verified after the fact. They fail loudly when those expectations aren't met.

// Mockito - strict mock with behavior expectations
@Test
void shouldSendConfirmationEmailAfterSuccessfulPayment() {
    // Arrange
    EmailService emailService = mock(EmailService.class);
    PaymentGateway gateway = mock(PaymentGateway.class);
    when(gateway.charge(any(ChargeRequest.class)))
        .thenReturn(ChargeResult.success("txn-123"));

    OrderService service = new OrderService(gateway, emailService);

    // Act
    service.placeOrder(order);

    // Assert - interaction verification
    verify(emailService).sendConfirmation(
        eq("customer@example.com"),
        argThat(receipt -> receipt.transactionId().equals("txn-123"))
    );
    verifyNoMoreInteractions(emailService);
}

The mock here asserts behavior: not just that the order was placed successfully, but that the system communicated correctly with the email subsystem. This is interaction-based testing, the subject of our next section.

State Verification vs. Behavior Verification

This distinction is the conceptual core of advanced mocking. Getting it wrong leads to tests that test the wrong thing.

State Verification

After exercising the system under test, you examine the resulting state of the objects involved.

# State verification: did the order end up in the right status?
def test_cancel_order_updates_status():
    order = Order(id=1, status="pending")
    order_service = OrderService(InMemoryOrderRepository())
    order_service.cancel(order)

    retrieved = order_service.find(1)
    assert retrieved.status == "cancelled"     # ← state check
    assert retrieved.cancelled_at is not None  # ← state check

State verification is more robust. It survives refactoring because it doesn't care how the cancellation happens, only that it did happen correctly.

Behavior Verification

After exercising the system, you verify which methods were called on collaborators, with which arguments, and in what sequence.

# Behavior verification: did the system tell the right things to the right collaborators?
def test_cancel_order_notifies_fulfillment():
    fulfillment = MagicMock()
    order_service = OrderService(repo, fulfillment_client=fulfillment)

    order_service.cancel(order)

    fulfillment.cancel_shipment.assert_called_once_with(order.shipment_id)

Behavior verification becomes essential when the observable outcome is the interaction. If your method's job is to coordinate between subsystems (trigger a webhook, publish an event, call a third-party API), there may be no state to assert on. The interaction is the behavior.

The Tradeoff

Behavior verification couples your tests to implementation. If you refactor cancel to batch cancellations instead of calling cancel_shipment individually, your test breaks even though the system's observable behavior is unchanged.

The practical rule: use state verification as your default. Escalate to behavior verification when the side effects are the requirement: when you're specifying that a notification must fire, an audit log must be written, or a downstream service must be informed.
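
A minimal sketch of that rule applied to one method, assuming a simplified, hypothetical OrderService whose cancel both changes state and informs a fulfillment client. The first test only looks at state; the second exists because the notification itself is a requirement.

```python
from unittest.mock import MagicMock


class Order:
    def __init__(self, order_id, shipment_id):
        self.id = order_id
        self.shipment_id = shipment_id
        self.status = "pending"


class OrderService:
    """Hypothetical service: cancelling updates state and informs fulfillment."""

    def __init__(self, fulfillment_client):
        self.fulfillment = fulfillment_client

    def cancel(self, order):
        order.status = "cancelled"
        self.fulfillment.cancel_shipment(order.shipment_id)


def test_cancel_marks_order_cancelled():
    # State verification: the default. The mock is only a stand-in here.
    order = Order(1, "ship-9")
    OrderService(MagicMock()).cancel(order)
    assert order.status == "cancelled"


def test_cancel_informs_fulfillment():
    # Behavior verification: the notification itself is the requirement.
    fulfillment = MagicMock()
    order = Order(1, "ship-9")
    OrderService(fulfillment).cancel(order)
    fulfillment.cancel_shipment.assert_called_once_with("ship-9")


test_cancel_marks_order_cancelled()
test_cancel_informs_fulfillment()
```

Only the second test would need to change if cancellations were later batched; the first keeps passing as long as the order ends up cancelled.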

Advanced Mocking Strategies

Partial Mocks: The Compromise You Should Question

A partial mock, sometimes called a spy, mocks some methods while leaving others real. Most frameworks support it, and most experienced developers use it rarely.

// Mockito partial mock (spy)
UserService realService = new UserService(realRepo);
UserService partialMock = spy(realService);

// Override just the email-sending part
doNothing().when(partialMock).sendWelcomeEmail(any());

partialMock.registerUser(newUser);

verify(partialMock).sendWelcomeEmail(newUser);

When are partial mocks appropriate? When testing legacy code that hasn't been refactored for testability, you want to isolate one expensive or side-effectful collaborator without rewriting the entire class. Treat them as a refactoring stepping stone, not a destination.
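
The same stepping-stone approach can be sketched in Python with patch.object, which replaces a single method on a live object and restores it afterwards. The UserService below is a hypothetical legacy class written for this example, not code from a real project.

```python
from unittest.mock import patch


class UserService:
    """Hypothetical legacy class that mixes registration with email sending."""

    def __init__(self):
        self.registered = []

    def register_user(self, email):
        self.registered.append(email)
        self.send_welcome_email(email)
        return len(self.registered)

    def send_welcome_email(self, email):
        raise RuntimeError("would contact a real SMTP server")


service = UserService()

# Replace only send_welcome_email; register_user stays real
with patch.object(service, "send_welcome_email") as fake_send:
    count = service.register_user("new@example.com")

assert count == 1
assert service.registered == ["new@example.com"]
fake_send.assert_called_once_with("new@example.com")
```

Once the email sending is extracted into its own injected collaborator, the patch disappears and the test can pass an ordinary test double instead.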

Strict vs. Loose Mocks

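Strict mocks fail when the test setup and the actual interactions drift apart. Mockito's STRICT_STUBS, for example, fails a test when you configure a stub that is never used, and reports a stubbed method that is called with arguments different from the ones you stubbed. Jest has no built-in equivalent; jest.fn() mocks are lenient by default.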

// Strict stubs - fail if configured interaction doesn't happen
@ExtendWith(MockitoExtension.class) // uses STRICT_STUBS by default
class PaymentServiceTest {
    @Mock PaymentGateway gateway;
    @InjectMocks PaymentService service; // system under test, constructed with the mock above

    @Test
    void shouldChargeCorrectAmount() {
        when(gateway.charge(eq(new Money(100, "USD")))).thenReturn(SUCCESS);
        // If gateway.charge() is never called, Mockito fails the test
        // This catches "dead stubs" - configured interactions that became irrelevant after refactoring
        service.processOrder(order);
        verify(gateway).charge(new Money(100, "USD"));
    }
}

Strict mocking is worth the friction on payment flows, security-critical paths, and integration boundaries. The noise it creates is signal - it tells you when your mock setup has drifted from the actual behavior.
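
Python's unittest.mock has no direct counterpart to STRICT_STUBS, but create_autospec gives a related kind of strictness: the mock rejects methods and call signatures that the real class does not have. The sketch below assumes a hypothetical PaymentGateway interface.

```python
from unittest.mock import create_autospec


class PaymentGateway:
    """Hypothetical interface owned by the application."""

    def charge(self, amount_cents, currency):
        raise NotImplementedError


gateway = create_autospec(PaymentGateway, instance=True)
gateway.charge.return_value = "SUCCESS"

# A call matching the real signature works as usual
assert gateway.charge(10_000, "USD") == "SUCCESS"

# A method the real class does not have is rejected
try:
    gateway.refund(10_000)
    unknown_method_rejected = False
except AttributeError:
    unknown_method_rejected = True

# A call with the wrong number of arguments is rejected too
try:
    gateway.charge(10_000)
    bad_signature_rejected = False
except TypeError:
    bad_signature_rejected = True

assert unknown_method_rejected and bad_signature_rejected
```

This does not detect unused stubs the way Mockito does, but it does catch the common case where a mock keeps accepting a call the real dependency stopped supporting.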

Deep Stubbing: Handle With Care

Deep stubbing lets you chain method calls on mocks without configuring each intermediate object.

// Jest deep mock chaining
const stripe = {
  customers: {
    retrieve: jest.fn().mockResolvedValue({
      subscriptions: {
        data: [{ status: 'active', plan: { amount: 2999 } }]
      }
    })
  }
};

// Now you can read (await stripe.customers.retrieve()).subscriptions.data[0].plan.amount

The problem with deep stubbing: it's a strong signal that your code violates the Law of Demeter. If your production code chains a.getB().getC().getD(), consider whether a should expose a higher-level method instead. Deep stubs work - but they often indicate a design smell worth addressing.
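
One way to address the smell, sketched in Python under the assumption that you can introduce a small wrapper of your own: BillingAccount and monthly_charge_label are hypothetical names, and the nested dictionary shape simply mirrors the example above.

```python
from unittest.mock import MagicMock


class BillingAccount:
    """Hypothetical wrapper that hides the vendor's nested response shape."""

    def __init__(self, client):
        self._client = client

    def active_plan_amount(self, customer_id):
        customer = self._client.retrieve_customer(customer_id)
        for sub in customer["subscriptions"]["data"]:
            if sub["status"] == "active":
                return sub["plan"]["amount"]
        return None


def monthly_charge_label(billing, customer_id):
    """Code under test talks to one method, not a four-level chain."""
    amount = billing.active_plan_amount(customer_id)
    return "no active plan" if amount is None else f"${amount / 100:.2f}/month"


# The stub is now a single flat return value
billing = MagicMock(spec=BillingAccount)
billing.active_plan_amount.return_value = 2999

label = monthly_charge_label(billing, "cus_123")
assert label == "$29.99/month"
```

The nested traversal still exists, but it lives in one place (the wrapper) where it can be covered by a handful of focused tests instead of being re-stubbed in every consumer test.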

Mocking Async Workflows

Modern systems are overwhelmingly async. Mocking async behavior requires explicit attention to Promise resolution, error paths, and sequencing.

// Jest - async mock with sequential responses
const repository = {
  findUser: jest.fn()
    .mockResolvedValueOnce({ id: 1, status: 'active' })   // first call
    .mockResolvedValueOnce(null)                            // second call
    .mockRejectedValueOnce(new DatabaseError('timeout'))   // third call (error path)
};

// Each call consumes the next queued response (assuming service.getUser delegates to repository.findUser)
await expect(service.getUser(1)).resolves.toEqual({ id: 1, status: 'active' });
await expect(service.getUser(1)).resolves.toBeNull();
await expect(service.getUser(1)).rejects.toThrow('timeout');

For event-driven systems, mock the event bus and verify that the right events are published with the right payload:

const eventBus = {
  publish: jest.fn(),
  subscribe: jest.fn()
};

const orderService = new OrderService(eventBus);
await orderService.confirmOrder(orderId);

expect(eventBus.publish).toHaveBeenCalledWith('order.confirmed', {
  orderId,
  timestamp: expect.any(Number)
});
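
The sequential-response pattern has a close equivalent in Python's AsyncMock, where side_effect accepts a list of results and exception instances. This is a sketch with a hypothetical UserService; the built-in TimeoutError stands in for a database-specific error type.

```python
import asyncio
from unittest.mock import AsyncMock


class UserService:
    """Hypothetical service that delegates to an async repository."""

    def __init__(self, repository):
        self.repository = repository

    async def get_user(self, user_id):
        return await self.repository.find_user(user_id)


async def scenario():
    repository = AsyncMock()
    # side_effect with a list: each await consumes the next item;
    # exception instances are raised instead of returned
    repository.find_user.side_effect = [
        {"id": 1, "status": "active"},
        None,
        TimeoutError("timeout"),
    ]
    service = UserService(repository)

    first = await service.get_user(1)
    second = await service.get_user(1)
    try:
        await service.get_user(1)
        third = "no error"
    except TimeoutError as exc:
        third = str(exc)

    repository.find_user.assert_awaited_with(1)
    return first, second, third, repository.find_user.await_count


outcome = asyncio.run(scenario())
assert outcome == ({"id": 1, "status": "active"}, None, "timeout", 3)
```

Asserting on await_count rather than call_count guards against a subtle bug: a coroutine that was called but never awaited.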

Mocking External Services and APIs

For payment gateways, shipping carriers, and communication platforms, you have two good options: HTTP-level mocking (intercept and respond at the network layer) or facade-level mocking (mock the wrapper you wrote around the third-party SDK).

# Python - mocking the Stripe client at the SDK level
from unittest.mock import patch, MagicMock

@patch('app.services.stripe.stripe.PaymentIntent.create')
def test_creates_payment_intent_with_correct_amount(mock_create):
    mock_create.return_value = MagicMock(
        id='pi_test_123',
        status='requires_payment_method',
        client_secret='pi_test_123_secret_abc'
    )

    result = payment_service.initiate_payment(amount=4999, currency='usd')

    mock_create.assert_called_once_with(
        amount=4999,
        currency='usd',
        automatic_payment_methods={'enabled': True}
    )
    assert result.intent_id == 'pi_test_123'

The critical practice: always wrap third-party clients behind an interface you own. This gives you a clean seam to mock without deep-stubbing through vendor SDKs. This is a foundational principle in custom software development - abstraction isn't just good architecture, it's good testability.
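
A minimal sketch of such an owned interface in Python, using typing.Protocol. PaymentGateway, FakeGateway, and PaymentIntent are illustrative names, not part of any vendor SDK; a production adapter implementing the same method would be the only place that imports the real client.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class PaymentIntent:
    intent_id: str
    status: str


class PaymentGateway(Protocol):
    """The seam the application owns; names here are illustrative."""

    def create_intent(self, amount: int, currency: str) -> PaymentIntent: ...


class FakeGateway:
    """Test double for the owned interface; no vendor SDK involved."""

    def __init__(self):
        self.requests = []

    def create_intent(self, amount: int, currency: str) -> PaymentIntent:
        self.requests.append((amount, currency))
        return PaymentIntent(intent_id="pi_test_123", status="requires_payment_method")


class PaymentService:
    def __init__(self, gateway: PaymentGateway):
        self.gateway = gateway

    def initiate_payment(self, amount: int, currency: str) -> PaymentIntent:
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self.gateway.create_intent(amount, currency)


gateway = FakeGateway()
result = PaymentService(gateway).initiate_payment(amount=4999, currency="usd")

assert result.intent_id == "pi_test_123"
assert gateway.requests == [(4999, "usd")]
```

Compared with patching a dotted path inside the SDK, this test keeps working when the vendor reorganizes its modules, because nothing in it refers to vendor code.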

Contract-Focused Mocking

In microservice architectures, mocks can diverge from the actual service contracts over time - your mock says the service returns { user_id: 42 } but the real service now returns { userId: 42 }. Consumer-driven contract testing (via Pact or similar) keeps mocks honest.

// Pact consumer test - the contract becomes a test artifact
const { Pact } = require('@pact-foundation/pact');

const provider = new Pact({ consumer: 'OrderService', provider: 'UserService' });

await provider.addInteraction({
  state: 'user 42 exists',
  uponReceiving: 'a request for user 42',
  withRequest: { method: 'GET', path: '/users/42' },
  willRespondWith: {
    status: 200,
    body: { id: 42, email: 'dev@example.com', tier: 'premium' }
  }
});

// Your tests run against this mock, but the contract is shared with the provider team
// Provider tests verify they can satisfy the contract

Contract testing is the right answer to the question: "How do I know my mocks are realistic?" This matters enormously for microservice communication and distributed system testing. It's one of the more powerful advanced patterns available to teams doing continuous delivery.
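
Pact automates this across service boundaries. The underlying principle, one shared set of expectations checked against every implementation, can also be sketched in-process. The following is not Pact; it is a simplified illustration with a hypothetical repository fake and a hand-written contract check.

```python
class InMemoryUserRepository:
    """Fake used by consumer tests."""

    def __init__(self):
        self._users = {}

    def save(self, user):
        self._users[user["id"]] = dict(user)
        return user

    def find_by_id(self, user_id):
        user = self._users.get(user_id)
        return dict(user) if user else None


def check_user_repository_contract(make_repo):
    """Expectations every implementation, fake or real, must satisfy."""
    repo = make_repo()
    assert repo.find_by_id(42) is None  # unknown ids yield None, not an error

    repo.save({"id": 42, "email": "dev@example.com", "tier": "premium"})
    found = repo.find_by_id(42)
    assert set(found) == {"id", "email", "tier"}  # field names are part of the contract
    assert found["tier"] == "premium"
    return True


# Run against the fake here; the same function would be run against the
# real repository in an integration suite.
fake_honours_contract = check_user_repository_contract(InMemoryUserRepository)
```

If the real implementation ever renames a field, the shared check fails on that side, which signals that the fake (and every test built on it) needs to follow.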

Dependency Isolation Strategies

Before you can mock a dependency, you need a seam: a place in the code where you can substitute a test double for the real thing. The three main strategies:

Constructor injection (cleanest, most testable):

public class NotificationService {
    private final EmailClient emailClient;
    private final SmsClient smsClient;

    public NotificationService(EmailClient emailClient, SmsClient smsClient) {
        this.emailClient = emailClient;
        this.smsClient = smsClient;
    }
}

Method injection (useful for per-call variation):

def process_payment(order, gateway=None):
    gateway = gateway or StripeGateway()
    return gateway.charge(order.total)

Interface-based abstraction (enforces the contract):

interface MessageQueue {
  publish(topic: string, payload: unknown): Promise<void>;
  subscribe(topic: string, handler: (msg: unknown) => void): void;
}

class OrderProcessor {
  constructor(private queue: MessageQueue) {}
  // Now you can mock MessageQueue freely in tests
}

Common Anti-Patterns That Quietly Destroy Your Test Suite

Over-Mocking: The Test That Tests Nothing

When you mock everything (repositories, services, validators, factories), you're no longer testing behavior. You're testing that your code calls its dependencies exactly the way the test scripted them. This is brittle and meaningless.

// ❌ Over-mocked: this test will pass even if the business logic is completely wrong
@Test
void overMockedAntiPattern() {
    when(validator.validate(order)).thenReturn(true);
    when(pricer.calculateTotal(order)).thenReturn(money);
    when(inventoryChecker.isAvailable(order)).thenReturn(true);
    when(paymentProcessor.charge(money)).thenReturn(receipt);
    when(orderRepo.save(any())).thenReturn(savedOrder);
    when(emailService.send(any())).thenReturn(true);

    service.placeOrder(order);

    verify(validator).validate(order);
    verify(pricer).calculateTotal(order);
    // ... etc
    // What did we actually test? Just that the calls happened. Not correctness.
}

The fix: mock only the external boundary - things outside your process (databases, HTTP calls, queues). Let the internal logic run for real.
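
A sketch of that fix in Python, with a hypothetical CheckoutService: the pricing logic runs for real and only the payment gateway, the one collaborator outside the process, is replaced.

```python
from unittest.mock import MagicMock


def calculate_total(items):
    """Real pricing logic: quantity times unit price, in cents."""
    return sum(item["quantity"] * item["unit_price"] for item in items)


class CheckoutService:
    """Hypothetical service; only the payment gateway is outside the process."""

    def __init__(self, gateway):
        self.gateway = gateway

    def place_order(self, items):
        if not items:
            raise ValueError("order has no items")
        total = calculate_total(items)  # runs for real in the test
        receipt = self.gateway.charge(total)
        return {"total": total, "receipt": receipt}


gateway = MagicMock()
gateway.charge.return_value = "rcpt-1"

order = CheckoutService(gateway).place_order(
    [{"quantity": 2, "unit_price": 1500}, {"quantity": 1, "unit_price": 999}]
)

# A pricing bug would now fail the test, because pricing was not mocked
assert order["total"] == 3999
gateway.charge.assert_called_once_with(3999)
```

Unlike the over-mocked version, this test has one stubbed return value and two assertions, and it fails if the arithmetic is wrong.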

Testing Implementation Details

The most common cause of brittle tests is asserting on how something is done rather than what it accomplishes.

// ❌ Brittle - breaks on every refactor
expect(userService.hashPassword).toHaveBeenCalledWith('rawpassword', { rounds: 12 });

// ✅ Resilient - tests the outcome
const savedUser = await userRepo.findByEmail('test@example.com');
expect(savedUser.password).not.toBe('rawpassword');
expect(await bcrypt.compare('rawpassword', savedUser.password)).toBe(true);

Implementation details are the "how." Your tests should own the "what" and the "whether." This is the single most impactful principle in building test suites that survive feature development. It's especially relevant in software testing and QA practices, where test maintainability is as important as coverage.

Brittle Interaction Assertions

verifyNoMoreInteractions() and verifyNoInteractions() (the replacement for the deprecated verifyZeroInteractions()) are useful in narrow security contexts (audit logging, rate limiting) but catastrophic as general-purpose assertions.

// ❌ This breaks whenever you add legitimate logging, metrics, or tracing
verifyNoMoreInteractions(emailService);

// ✅ Only verify what the test is specifically about
verify(emailService).sendOrderConfirmation(eq(order.getId()), any());
// Don't assert on what wasn't called unless it's a security/compliance requirement

Mocking Third-Party Libraries Unnecessarily

When you mock java.util.List or moment.js or lodash, you're not testing anything real. Mock the boundary between your code and the world, not the utilities your code uses internally.

# ❌ Patching datetime.datetime.now directly is fragile and unnecessary
# (on CPython it raises TypeError, because datetime is an immutable built-in type)
with patch('datetime.datetime.now') as mock_now:
    mock_now.return_value = datetime(2024, 1, 15, 10, 0, 0)
    # ...

# ✅ Inject a clock abstraction - testable and flexible
class SystemClock:
    def now(self): return datetime.now()

class TestClock:
    def __init__(self, fixed_time): self.fixed_time = fixed_time
    def now(self): return self.fixed_time

# Pass clock into your service - mock the abstraction you own
service = AuditService(clock=TestClock(datetime(2024, 1, 15)))

Designing Better Tests: The Architecture of Reliable Suites

The Testing Pyramid Revisited

Mocking strategy should align with where in the pyramid a test lives. Unit tests mock aggressively (all external dependencies). Integration tests mock sparingly (only truly external systems). End-to-end tests mock minimally or not at all.

Misaligning these levels is how teams end up with 90% mocked "integration tests" that provide no real integration confidence.

Making Tests Intention-Revealing

A test should read like a specification. The mock setup should stay out of the way so the reader sees the intent first.

// MockK (Kotlin) - intention-revealing structure
@Test
fun `new premium customers receive welcome discount`() {
    // Arrange: focus on business context, not technical setup
    val customer = aCustomer(tier = PREMIUM, isNew = true)
    every { customerRepo.find(customer.id) } returns customer

    // Act: single business operation
    val discount = discountService.calculateWelcomeDiscount(customer.id)

    // Assert: business outcome
    assertThat(discount.percentage).isEqualTo(20)
    assertThat(discount.expiresIn).isEqualTo(Duration.ofDays(30))
}

// Builder helper keeps test intent clear
fun aCustomer(tier: Tier = STANDARD, isNew: Boolean = false) =
    Customer(id = UUID.randomUUID(), tier = tier, registeredAt = 
        if (isNew) Instant.now() else Instant.now().minus(Duration.ofDays(365)))

Reducing Test Coupling with Object Mothers and Builders

When ten tests all create the same Order object with 15 fields, one change to Order breaks all ten. Object Mothers and Test Builders centralize fixture creation and absorb these changes.

// TypeScript - test builder pattern
class OrderBuilder {
  private order: Partial<Order> = {
    id: 'order-1',
    status: 'pending',
    items: [],
    total: new Money(0, 'USD'),
    createdAt: new Date()
  };

  withTotal(amount: number, currency = 'USD') {
    this.order.total = new Money(amount, currency);
    return this;
  }

  withItem(product: Product, quantity: number) {
    this.order.items = [...(this.order.items || []), { product, quantity }];
    return this;
  }

  build(): Order {
    return this.order as Order;
  }
}

// Tests become readable and stable
const order = new OrderBuilder()
  .withTotal(9999, 'USD')
  .withItem(laptopProduct, 1)
  .build();

Framework-Specific Patterns

Mockito (Java)

Mockito's ArgumentCaptor is invaluable for verifying complex objects passed to mocks:

@Test
void capturesCorrectAuditEvent() {
    ArgumentCaptor<AuditEvent> captor = ArgumentCaptor.forClass(AuditEvent.class);

    service.deleteUser(userId, adminId);

    verify(auditLogger).log(captor.capture());
    AuditEvent event = captor.getValue();
    assertThat(event.action()).isEqualTo("USER_DELETED");
    assertThat(event.performedBy()).isEqualTo(adminId);
    assertThat(event.timestamp()).isCloseTo(Instant.now(), within(1, SECONDS));
}

Jest (JavaScript/TypeScript)

Jest's jest.fn() with implementations is ideal for callbacks and higher-order function testing:

test('retries failed requests up to 3 times', async () => {
  let callCount = 0;
  const unstableApi = jest.fn().mockImplementation(async () => {
    callCount++;
    if (callCount < 3) throw new NetworkError('Connection refused');
    return { data: 'success' };
  });

  const result = await withRetry(unstableApi, { maxAttempts: 3 });

  expect(unstableApi).toHaveBeenCalledTimes(3);
  expect(result.data).toBe('success');
});

unittest.mock (Python)

Python's patch.object is cleaner than patch strings when you have direct access to the class:

from unittest.mock import ANY, call, patch

def test_sends_sms_to_all_emergency_contacts():
    patient = Patient(
        id=1,
        emergency_contacts=["+1-555-0100", "+1-555-0200"]
    )

    with patch.object(SmsGateway, 'send') as mock_send:
        alert_service.send_critical_alert(patient)

        mock_send.assert_has_calls([
            call("+1-555-0100", message=ANY),
            call("+1-555-0200", message=ANY)
        ], any_order=False)  # order matters - primary contact first

Sinon (JavaScript)

Sinon's sandbox pattern keeps test state isolated and teardown automatic:

const sinon = require('sinon');

describe('InventoryService', () => {
  let sandbox;

  beforeEach(() => { sandbox = sinon.createSandbox(); });
  afterEach(() => { sandbox.restore(); });

  it('reserves stock atomically', async () => {
    const dbStub = sandbox.stub(db, 'transaction').callsFake(async (fn) => fn(db));
    const lockStub = sandbox.stub(db, 'acquireLock').resolves(true);

    await inventory.reserve(productId, quantity);

    sinon.assert.callOrder(lockStub, dbStub); // the lock must be acquired before the transaction starts
  });
});

MockK (Kotlin)

MockK's coEvery and coVerify are built for Kotlin coroutines:

@Test
fun `processes queue messages concurrently`() = runBlocking {
    coEvery { messageProcessor.process(any()) } coAnswers {
        delay(10) // simulate async processing
        ProcessResult.SUCCESS
    }

    val messages = (1..10).map { Message(id = it, payload = "data-$it") }
    queueConsumer.processAll(messages)

    coVerify(exactly = 10) { messageProcessor.process(any()) }
}

Real-World Scenarios

Payment Gateway Integration

Payment testing is where brittle mocks cause real damage - a mock that doesn't reflect actual Stripe error codes will leave you with production bugs your tests never caught.

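import pytest
from uuid import uuid4

# Model the actual Stripe error taxonomy
# (StripeCardError and StripeAPIError are assumed to be exception classes defined in your own codebase)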
class MockStripeGateway:
    def __init__(self, scenario='success'):
        self.scenario = scenario
        self.charges = []

    def charge(self, amount, currency, source):
        if self.scenario == 'card_declined':
            raise StripeCardError('card_declined', 'Your card was declined.')
        if self.scenario == 'insufficient_funds':
            raise StripeCardError('insufficient_funds', 'Your card has insufficient funds.')
        if self.scenario == 'network_error':
            raise StripeAPIError('Could not connect to Stripe.')

        charge = {'id': f'ch_{uuid4().hex[:16]}', 'amount': amount, 'status': 'succeeded'}
        self.charges.append(charge)
        return charge

# Test the full error handling matrix
@pytest.mark.parametrize('scenario,expected_error', [
    ('card_declined', 'Payment declined. Please use a different card.'),
    ('insufficient_funds', 'Insufficient funds. Please use a different card.'),
    ('network_error', 'Payment service temporarily unavailable. Please try again.'),
])
def test_payment_error_messages(scenario, expected_error):
    gateway = MockStripeGateway(scenario=scenario)
    service = CheckoutService(gateway)
    result = service.checkout(cart)
    assert result.error_message == expected_error

This approach - modeling the real error taxonomy in your fake - is far more valuable than a simple side_effect=Exception. It's what separates tests that find bugs from tests that just maintain coverage numbers.

Email Service Mocking

For email services, mock at the transport level but test at the content level:

@Test
void passwordResetEmailContainsSecureToken() {
    InMemoryEmailTransport transport = new InMemoryEmailTransport();
    EmailService emailService = new EmailService(transport, templateEngine);

    service.requestPasswordReset("user@example.com");

    SentEmail email = transport.findEmailTo("user@example.com").orElseThrow();
    assertThat(email.subject()).isEqualTo("Reset your password");
    assertThat(email.htmlBody()).contains("https://app.example.com/reset/");
    assertThat(email.htmlBody()).doesNotContain("password"); // never include password in email

    // Extract and validate the reset token format
    String resetUrl = extractResetUrl(email.htmlBody());
    assertThat(resetUrl).matches("https://app\\.example\\.com/reset/[a-f0-9]{64}");
}

Database Abstraction Testing

For repositories, integration tests against a real test database (H2, SQLite, or Testcontainers) are almost always better than heavily mocked unit tests. But when you must test business logic that touches persistence, the in-memory fake pattern excels.

// Integration test with Testcontainers - more valuable than mocking
describe('UserRepository (integration)', () => {
  let container: StartedPostgreSqlContainer;
  let repo: UserRepository;

  beforeAll(async () => {
    container = await new PostgreSqlContainer().start();
    const pool = createPool(container.getConnectionUri());
    await runMigrations(pool);
    repo = new UserRepository(pool);
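  });

  afterAll(async () => {
    await container.stop(); // release the container once the suite finishes
  });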

  it('finds users by tier with pagination', async () => {
    await seedUsers(repo, [
      { tier: 'premium', count: 15 },
      { tier: 'standard', count: 8 }
    ]);

    const result = await repo.findByTier('premium', { page: 1, pageSize: 10 });

    expect(result.items).toHaveLength(10);
    expect(result.total).toBe(15);
    expect(result.items.every(u => u.tier === 'premium')).toBe(true);
  });
});

Real database tests catch query bugs, index issues, and transaction semantics that mocks never will. This is particularly important in database and cloud transformation work where data integrity is non-negotiable.

Queue/Event System Testing

Event-driven systems need tests that verify both publication and consumption contracts:

// MockK - testing event publication
@Test
fun `order cancellation publishes correct domain event`() {
    val eventBus = mockk<EventBus>(relaxed = true)
    val service = OrderService(orderRepo, eventBus)

    service.cancel(orderId, reason = "Customer request")

    val slot = slot<OrderCancelledEvent>()
    verify { eventBus.publish(capture(slot)) }

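    // Read through a named value so the right-hand orderId still refers to the test's own orderId
    val event = slot.captured
    assertThat(event.orderId).isEqualTo(orderId)
    assertThat(event.reason).isEqualTo("Customer request")
    assertThat(event.cancelledAt).isNotNull()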
}

For event consumption, test the handler directly with real event objects - don't mock the event itself:

// Test the handler logic, not the subscription mechanism
test('order.cancelled handler cancels related shipments', async () => {
  const shipmentService = { cancelShipment: jest.fn().mockResolvedValue(undefined) };
  const handler = new OrderCancelledHandler(shipmentService);

  // Use a real event object - test the handler contract
  await handler.handle(new OrderCancelledEvent({
    orderId: 'ord-123',
    shipmentId: 'ship-456',
    cancelledAt: new Date()
  }));

  expect(shipmentService.cancelShipment).toHaveBeenCalledWith('ship-456');
});

Microservice Communication

For microservice architectures, WireMock and similar HTTP-level stubs let you test the full serialization/deserialization pipeline:

// WireMock - test HTTP client behavior at the protocol level
@WireMockTest
class InventoryClientTest {

    @Test
    void retriesOn503WithBackoff(WireMockRuntimeInfo wmRuntimeInfo) {
        stubFor(get("/inventory/sku-123")
            .inScenario("retry")
            .whenScenarioStateIs(STARTED)
            .willReturn(serviceUnavailable())
            .willSetStateTo("first-retry"));

        stubFor(get("/inventory/sku-123")
            .inScenario("retry")
            .whenScenarioStateIs("first-retry")
            .willReturn(ok().withBody("""
                {"sku": "sku-123", "quantity": 42, "available": true}
            """)));

        InventoryClient client = new InventoryClient(wmRuntimeInfo.getHttpBaseUrl());
        InventoryStatus status = client.getStatus("sku-123");

        assertThat(status.quantity()).isEqualTo(42);
        verify(2, getRequestedFor(urlEqualTo("/inventory/sku-123")));
    }
}

This catches serialization mismatches, retry logic bugs, and timeout handling, none of which SDK-level mocking can surface.

Conclusion: The Principles That Outlast Frameworks

Mocking frameworks change. Mockito gets updated. Jest gets rewritten. The underlying principles don't.

When mocks are powerful:

When testing code that coordinates between subsystems and the coordination is the behavior
When external dependencies have real latency, costs, or side effects (payment processors, email providers, SMS gateways)
When you need to test error paths that would be difficult to reproduce with real infrastructure
When you're specifying a contract between services and need that contract enforced

When mocks become dangerous:

When they replace real integration tests entirely, giving false confidence in distributed system behavior
When they're coupled to implementation details that change frequently
When the mock no longer reflects the real dependency's actual behavior
When you're using them to paper over poor design instead of fixing the underlying coupling

The principles that sustain healthy test architectures:

Test behavior, not implementation. Write tests that describe what your system does, not how it does it. If a refactoring improves the code but breaks five tests, those tests are measuring the wrong thing.

Mock the boundary, not the interior. Your mock seams should align with your system boundaries: the edge of your process, the network interface, the file system, the clock. Mocking within your own code is usually a sign that the code needs to be restructured, not that it needs a better mock.

Make mocks honest. Use contract tests, realistic fake implementations, and careful attention to the actual behavior of the things you're replacing. A mock that doesn't reflect reality is worse than no mock; it's active misinformation.

Balance isolation with realism. Pure unit tests with total isolation are fast and precise. Integration tests with real infrastructure are slow but trustworthy. A healthy test suite needs both, at the right ratio, for the right purposes. This balance is at the heart of software quality assurance in production-grade systems.

The goal of all of this (the test doubles, the behavior verification, the anti-pattern avoidance) is a test suite that gives you genuine confidence to ship. One that catches real bugs, not phantom ones. One that stays green when you refactor and turns red when you break something. That's what the craft is for.