Gabriel Anhaia
I Replaced 2,000 Lines of Go Mocks With 200 Lines of Fakes


A team I work with had a Go service with roughly 90 test files. Good coverage numbers. Green CI. The kind of metrics that make engineering managers smile in sprint reviews.

Then they changed the signature of one repository method. Added a context.Context parameter. The diff touched 6 lines of production code and hundreds of lines of mock setup. Dozens of tests broke. Not because the behavior was wrong. Because the mock expectations no longer matched the exact call sequence.

They spent a full afternoon updating .EXPECT() chains, .Return() values, and argument matchers. By the end, every test was green again. Nothing about the system's behavior had changed, and the tests verified exactly what they verified before. The mocks just needed to be told about it, in excruciating detail.

When your test suite is a mirror of your implementation rather than a specification of your behavior, this is what you get.

The Mock Tax

Tools like mockgen and mockery generate mock implementations of your interfaces. You get .EXPECT(), .Times(), .Return(), .InOrder(). You can specify exactly which methods get called, with which arguments, in which sequence, returning which values.

The problem is that you will specify all of that. And your tests will break when any of it changes, even when the behavior stays the same.

A typical test using gomock for an order service looks like this:

func TestPlaceOrder_WithMocks(t *testing.T) {
    ctrl := gomock.NewController(t)
    defer ctrl.Finish()

    mockUsers := NewMockUserRepository(ctrl)
    mockOrders := NewMockOrderRepository(ctrl)
    mockNotifier := NewMockNotifier(ctrl)

    mockUsers.EXPECT().
        FindByID(gomock.Any(), "user-1").
        Return(&User{
            ID: "user-1", Name: "Alice",
        }, nil).
        Times(1)

Every dependency gets its own .EXPECT() chain specifying the exact call, arguments, return value, and call count:

    mockOrders.EXPECT().
        NextID(gomock.Any()).
        Return("ord-99", nil).
        Times(1)

    mockOrders.EXPECT().
        Save(gomock.Any(), gomock.Any()).
        DoAndReturn(
            func(
                _ context.Context,
                o Order,
            ) error {
                if o.UserID != "user-1" {
                    t.Errorf(
                        "user = %s, want user-1",
                        o.UserID,
                    )
                }
                return nil
            }).
        Times(1)

    mockNotifier.EXPECT().
        OrderPlaced(gomock.Any(), gomock.Any()).
        Return(nil).
        Times(1)

After all that ceremony, the actual test is five lines:

    svc := NewOrderService(
        mockUsers, mockOrders, mockNotifier,
    )

    _, err := svc.PlaceOrder(
        context.Background(),
        "user-1",
        []Item{{ProductID: "p1", Qty: 2}},
    )
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }
}

Count the lines. Around 50, and you are testing one happy path. The test specifies which methods get called, how many times, and, once the expectations are wrapped in gomock.InOrder (as many teams do), in what order. Change the internal implementation so that NextID is called before FindByID instead of after? With ordered expectations, the test fails. Add a logging call inside PlaceOrder? If the logger is an interface, you need another mock expectation or the test fails.

The test is not describing what PlaceOrder should do. It is describing how PlaceOrder does it, step by step.

The Fake Alternative

A fake is a working, in-memory implementation of your port interface. It stores real data. It enforces real constraints. It skips the infrastructure.

type FakeOrderRepository struct {
    orders map[string]Order
    nextID int
    mu     sync.Mutex
}

func NewFakeOrderRepo() *FakeOrderRepository {
    return &FakeOrderRepository{
        orders: make(map[string]Order),
    }
}

Each method mirrors the real contract with in-memory storage:

func (r *FakeOrderRepository) NextID(
    _ context.Context,
) (string, error) {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.nextID++
    return fmt.Sprintf("ord-%d", r.nextID), nil
}

func (r *FakeOrderRepository) Save(
    _ context.Context, o Order,
) error {
    r.mu.Lock()
    defer r.mu.Unlock()
    if o.ID == "" {
        return errors.New("order ID required")
    }
    r.orders[o.ID] = o
    return nil
}

func (r *FakeOrderRepository) FindByID(
    _ context.Context, id string,
) (Order, error) {
    r.mu.Lock()
    defer r.mu.Unlock()
    o, ok := r.orders[id]
    if !ok {
        return Order{}, ErrOrderNotFound
    }
    return o, nil
}

Forty lines, and it covers everything. It handles ID generation, enforces a non-empty ID constraint, and returns a real ErrOrderNotFound when the order does not exist.

Now compare the same test:

func TestPlaceOrder_WithFakes(t *testing.T) {
    users := NewFakeUserRepo()
    users.Add(User{ID: "user-1", Name: "Alice"})

    orders := NewFakeOrderRepo()
    notifier := &SpyNotifier{}

    svc := NewOrderService(users, orders, notifier)

    got, err := svc.PlaceOrder(
        context.Background(),
        "user-1",
        []Item{{ProductID: "p1", Qty: 2}},
    )
    if err != nil {
        t.Fatalf("unexpected error: %v", err)
    }

Now verify the outcome, not the journey:

    stored, err := orders.FindByID(
        context.Background(), got.ID,
    )
    if err != nil {
        t.Fatalf("order not persisted: %v", err)
    }
    if stored.UserID != "user-1" {
        t.Errorf(
            "user = %s, want user-1",
            stored.UserID,
        )
    }
    if len(notifier.Placed) != 1 {
        t.Errorf(
            "notifications = %d, want 1",
            len(notifier.Placed),
        )
    }
}

Around 30 lines. But the difference is not the count alone. Read what this test says: place an order for user-1, then verify the order is stored with the right user, and that one notification was sent. It says nothing about call order or how many times FindByID ran internally. Argument matchers do not appear.

If you refactor PlaceOrder to call NextID first, or to do two reads instead of one, or to cache the user lookup, this test does not break. It only breaks when the behavior changes.

What Mocks Catch That Fakes Miss

Nothing, in practice. The argument for mocks is that they verify interaction protocols — that your code calls dependencies in the expected sequence. The theory is that this catches bugs where the right outcome happens for the wrong reason.

In reality, the interaction protocol changes every time you refactor, and the "right sequence" is rarely part of your actual contract. Your order service's contract is: given a valid user and items, persist an order and send a notification. Whether it reads the user before or after generating the ID is an implementation detail that your callers and your users do not care about.

If you genuinely need to verify call ordering — say, you must acquire a lock before writing — that constraint belongs in the fake or in the production code itself, not in a test's .InOrder() chain.

What Fakes Catch That Mocks Miss

Fakes exercise the contract. Because they store and return real data, they catch a class of bugs that mocks silently pass.

Consider a test where PlaceOrder saves an order and then immediately reads it back for confirmation. With mocks, you return a canned response from FindByID — whatever you hardcoded in the .Return(). The test passes even if Save was never called, or if Save stored the order under the wrong ID. The mock does not care. You told it what to return, and it returned it.

With a fake, FindByID reads from the same map that Save wrote to. If Save is broken, FindByID returns ErrOrderNotFound. If Save stores the order under the wrong key, FindByID cannot find it. The fake catches real integration bugs within the unit test, at zero infrastructure cost.

The Line Count

The numbers break down like this. A team with a service layer behind three port interfaces (repository, notifier, external API client), each with 3-5 methods, ends up with:

Generated mocks: Each interface produces 100-200 lines of generated code (the MockXxx struct, the EXPECT() recorder, per-method matchers). Across three interfaces, that is 300-600 lines of generated mock code. Then each test function adds 20-40 lines of .EXPECT() setup. With 40-50 test functions, mock setup alone accounts for 800-2,000 lines.

Hand-written fakes: Each interface gets a fake of 20-50 lines (struct, constructor, method implementations with in-memory storage). Across three interfaces, that is 60-150 lines. A spy for the notifier adds another 10-15 lines. Total fake infrastructure: roughly 100-200 lines.

The fakes are reusable across every test — no per-test setup ceremony, no regeneration step in CI, no extra code-gen tool pinned in your go.mod.

Fakes Belong Next to the Port

Put the fake in the same package as the port interface, in a _test.go file or a testing.go file with a build tag:

internal/
  order/
    port.go           // OrderRepository interface
    service.go        // business logic
    service_test.go   // tests using the fake
    fake_repo_test.go // FakeOrderRepository

The fake is a first-class test artifact. When the port interface changes, the fake fails to compile. You fix it in one place, and every test that uses it keeps working. Compare that to mocks: when the interface changes, you regenerate, and then every test that set up .EXPECT() for the old signature needs a manual update.

When Mocks Still Make Sense

There are two cases where generated mocks earn their weight.

Third-party interfaces you do not own. If you are writing an adapter for cloud.google.com/go/storage and need to test error handling for ObjectHandle.NewReader, you cannot write a fake ObjectHandle without reimplementing half of GCS. A mock that returns a specific error for a specific call is the pragmatic choice.

Verifying that a method is NOT called. If your service must skip notification when the order amount is below a threshold, a spy (a minimal struct that records calls) is the clearest way to assert "zero calls to OrderPlaced." But notice: a spy is not a generated mock. It is 5 lines of code.

type SpyNotifier struct {
    Placed []Order
}

func (s *SpyNotifier) OrderPlaced(
    _ context.Context, o Order,
) error {
    s.Placed = append(s.Placed, o)
    return nil
}

A spy. It records what happened. It does not prescribe what should happen. Five lines, no framework, no generation step.

The Refactoring Test

Try this on your own codebase. Pick a service method. Refactor its internals without changing the behavior — reorder two independent calls, extract a helper function, cache a lookup. Run the tests.

If tests break, they are testing implementation, not behavior. Mocks make this almost inevitable. Fakes make it almost impossible.

Your test suite should be a safety net for refactoring, not a barrier to it.


If this resonated

The fake-over-mock pattern falls naturally out of hexagonal architecture, where every external dependency hides behind a port interface. Small interfaces, explicit contracts, in-memory fakes for testing, real adapters for production. The architecture makes the testing strategy obvious.

I wrote about this in depth in Hexagonal Architecture in Go — including conformance tests that run the same suite against both the fake and the real adapter, so you know the fake actually behaves like the database it replaces.

If you're writing Go services and the test setup takes longer than the test itself, the problem is not the tests. It is the boundary design. Fix the boundaries, and the mocks disappear.

Thinking in Go — the 2-book series on Go programming and hexagonal architecture
