JF Meyers

Your MediatR Command Just Lost an Email. Here's Why.

You ship a CheckoutCommand. It writes an Order, charges the card, then calls _mediator.Publish(new OrderPlaced(...)) which fires off SendConfirmationEmailHandler and NotifyWarehouseHandler.

A week later, support pings you: a customer was charged, but there is no order in their history and no confirmation email ever arrived. The warehouse has no record either. You dig into the logs.

[18:42:13] INF  Order 4f2a... created
[18:42:14] INF  Payment captured
[18:42:15] ERR  SMTP timeout (System.Net.Sockets.SocketException)
[18:42:15] ERR  Unhandled: OrderPlaced handler failed
[18:42:15] WRN  Transaction rolled back

The SMTP call threw inside the same using var tx = ... block as your DB writes. The transaction rolled back. The payment provider does not care about your transaction — the card was already charged. You now owe the customer either a refund or a manual fix, and the only reason you know is because they emailed you.
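To make the failure concrete, here is a minimal, dependency-free sketch of the same shape. `FakeDb`, `FakePaymentGateway` and `Checkout` are hypothetical stand-ins invented for illustration; they are not MediatR types or a real payment SDK, and the real handler would look different in detail.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a database with one open transaction.
public sealed class FakeDb
{
    private readonly List<string> _committed = new();
    private readonly List<string> _pending = new();

    public IReadOnlyList<string> Committed => _committed;

    public void Add(string row) => _pending.Add(row);
    public void Commit() { _committed.AddRange(_pending); _pending.Clear(); }
    public void Rollback() => _pending.Clear();
}

// Hypothetical stand-in for the payment provider.
public sealed class FakePaymentGateway
{
    public int Charges { get; private set; }

    // An external call: no database rollback can undo it.
    public void Charge() => Charges++;
}

public static class Checkout
{
    // sendEmail plays the role of the in-process OrderPlaced handler.
    public static bool Run(FakeDb db, FakePaymentGateway gateway, Action sendEmail)
    {
        try
        {
            db.Add("order");   // buffered inside the transaction
            gateway.Charge();  // irreversible side effect
            sendEmail();       // runs inline; an exception skips the commit
            db.Commit();
            return true;
        }
        catch (Exception)
        {
            db.Rollback();     // the order disappears, the charge does not
            return false;
        }
    }
}
```

Calling `Checkout.Run(db, gateway, () => throw new TimeoutException("SMTP timeout"))` returns `false`, leaves `db.Committed` empty, and leaves `gateway.Charges` at 1, which is the state the logs above describe.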

This is not a MediatR bug. MediatR did exactly what it promises. The bug is that the pattern you picked doesn't match the job. Let me show you what does.

MediatR is not a mediator for CQRS

Read the MediatR source. It's about 400 lines. What it actually does:

// Pseudocode of MediatR.Send (pipeline behaviors omitted)
var handler = serviceProvider.GetRequiredService<IRequestHandler<TRequest, TResponse>>();
return await handler.Handle(request, cancellationToken);

That's the whole value proposition. A dictionary lookup plus a method call, wrapped in a pipeline. In-process. Same call chain. Same transaction. Same exception path.
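A toy dispatcher shows how little machinery "lookup plus method call" needs. This is a sketch for illustration only, not MediatR's source: `IHandler`, `TinyDispatcher`, `Ping` and `PingHandler` are made-up names, and a plain dictionary stands in for the DI container.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Illustrative handler contract (not MediatR's IRequestHandler).
public interface IHandler<TRequest, TResponse>
{
    Task<TResponse> Handle(TRequest request, CancellationToken ct);
}

public sealed class TinyDispatcher
{
    // One handler per request type, keyed by the request's Type.
    private readonly Dictionary<Type, object> _handlers = new();

    public void Register<TRequest, TResponse>(IHandler<TRequest, TResponse> handler)
        => _handlers[typeof(TRequest)] = handler;

    public Task<TResponse> Send<TRequest, TResponse>(TRequest request, CancellationToken ct = default)
    {
        // Lookup, cast, call. Nothing is persisted, retried, or deferred,
        // and any exception propagates straight back to the caller.
        var handler = (IHandler<TRequest, TResponse>)_handlers[typeof(TRequest)];
        return handler.Handle(request, ct);
    }
}

public sealed record Ping(string Text);

public sealed class PingHandler : IHandler<Ping, string>
{
    public Task<string> Handle(Ping request, CancellationToken ct)
        => Task.FromResult("pong: " + request.Text);
}
```

After `dispatcher.Register<Ping, string>(new PingHandler())`, awaiting `dispatcher.Send<Ping, string>(new Ping("hi"))` yields `"pong: hi"`. If the process dies before `Handle` finishes, nothing remembers that the request existed.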

The things you assumed you were getting — the things the term "CQRS bus" implies — MediatR does not do:

  • Persist the command so it survives a pod kill
  • Retry transient failures with backoff
  • Dispatch the side effect after the DB transaction commits
  • Schedule work for later
  • Cross module or service boundaries
  • Dead-letter poison messages
  • Propagate tenant / user / trace context across async hops

Every single one of those is exactly what you need the moment your command does anything other than "write one row and return". SMTP, webhooks, CreateUser → ProvisionTenant, anything cross-module — MediatR shrugs.

So you bolt on Hangfire. Then a BackgroundService that polls a pending_email table you invented. Then a retry wrapper. Six months in, you've reinvented half a message bus, badly, with no tests for the failure paths.

What you actually want is an outbox

The pattern that fixes the email-lost story is the transactional outbox. It has exactly three properties:

  1. Atomic with the data. If the row commits, the side effect runs. If the row rolls back, the side effect doesn't.
  2. Durable. A crash mid-handler doesn't lose messages.
  3. Async from the caller. The HTTP request returns the moment the DB commits. The side effect runs after.

The mechanics are boring and well-known: your handler writes the side-effect message into an outbox table as part of the same SaveChanges call as the business row. A worker reads the outbox after commit and dispatches. A crash means the message is still on disk — it gets picked up on restart.

                   ┌──────────────────────────┐
   HTTP request ──▶│ SaveChanges (one commit) │
                   │  • order row             │
                   │  • outbox row            │
                   └──────────┬───────────────┘
                              │
                              ▼
                   ┌──────────────────────────┐
                   │ Dispatcher (async)       │
                   │  • read outbox           │
                   │  • send email            │
                   │  • DELETE outbox row     │
                   └──────────────────────────┘

You can build this. I've built it. It's ~200 lines of plumbing, one polling loop, a couple of indexes, and a lot of edge cases (what if the dispatcher crashes mid-send? what if the handler is non-idempotent? what about ordering?). Or you can use something that ships it.
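As a rough picture of that plumbing, here is a minimal in-memory sketch of the two halves: one commit that stores the business row and the outbox row together, and a dispatcher pass that deletes an outbox row only after a successful send. `InMemoryStore`, `OutboxRow` and `OutboxDispatcher` are hypothetical names; a real implementation would use database transactions, row locking and a polling loop, all of which are left out here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record OutboxRow(Guid Id, string Payload);

// In-memory stand-in for a database; Commit models one atomic transaction.
public sealed class InMemoryStore
{
    public List<string> Orders { get; } = new();
    public List<OutboxRow> Outbox { get; } = new();

    // Both rows are stored together, mimicking a single SaveChanges.
    public void Commit(string order, string message)
    {
        Orders.Add(order);
        Outbox.Add(new OutboxRow(Guid.NewGuid(), message));
    }
}

public sealed class OutboxDispatcher
{
    private readonly InMemoryStore _store;
    private readonly Action<string> _send;

    public OutboxDispatcher(InMemoryStore store, Action<string> send)
    {
        _store = store;
        _send = send;
    }

    // One polling pass. Returns how many messages were delivered.
    public int RunOnce()
    {
        var delivered = 0;
        foreach (var row in _store.Outbox.ToList())
        {
            try
            {
                _send(row.Payload);
                _store.Outbox.Remove(row); // delete only after a successful send
                delivered++;
            }
            catch (Exception)
            {
                // Leave the row in place so a later pass retries it.
            }
        }
        return delivered;
    }
}
```

Because the row is deleted after the send, a crash between the two steps causes the message to be sent again on the next pass. That is at-least-once delivery, which is why the idempotency question above matters.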

Enter Wolverine

Wolverine is an MIT-licensed message bus by Jeremy D. Miller (Marten, StructureMap). In Granit it's wrapped in two packages — the bus itself, and a PostgreSQL transport that doubles as the outbox table.

No RabbitMQ pod. No Kafka. No extra broker in your deployment diagram. The same database that holds your orders also holds your outbox, and the same Postgres connection is the transport.

Here's the handler that replaced the buggy MediatR one above:

public class PlaceOrderHandler
{
    public static IEnumerable<object> Handle(
        PlaceOrderCommand command,
        OrdersDbContext db,
        IClock clock)
    {
        var order = Order.Create(command.CartId, clock.Now);
        db.Orders.Add(order);

        // Local — in-process, same transaction
        yield return new OrderPlacedEvent(order.Id);

        // Distributed — written to the outbox, dispatched post-commit
        yield return new SendOrderConfirmationEto(order.Id, command.Email);
        yield return new NotifyWarehouseEto(order.Id, command.ShippingAddress);
    }
}

Three things worth pausing on:

  • No IRequestHandler<T> interface. Wolverine discovers handlers by convention from any assembly marked with [assembly: WolverineHandlerModule]. One attribute, zero ceremony.
  • yield return is the fan-out mechanism. Every emitted message is written into the outbox as part of the same SaveChanges. Atomic by construction — no if (emailSent && !dbCommitted) bug is reachable.
  • Two suffixes, two guarantees. *Event (in-process, best-effort, dies with the request) vs *Eto (durable integration event, crosses module boundaries, survives crashes). Granit enforces this split with architecture tests so nobody accidentally uses the wrong one.
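An architecture test for the suffix rule could look roughly like the sketch below. The marker interfaces `ILocalEvent` and `IIntegrationEvent` are assumptions made for this example; how Granit actually identifies the two kinds of message is not shown in this article. `WarehouseNotice` is a deliberately misnamed type so the check has something to catch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical markers: the real rule may be defined differently.
public interface ILocalEvent { }
public interface IIntegrationEvent { }

public sealed record OrderPlacedEvent(Guid OrderId) : ILocalEvent;
public sealed record SendOrderConfirmationEto(Guid OrderId, string Email) : IIntegrationEvent;

// Deliberately misnamed: durable message without the Eto suffix.
public sealed record WarehouseNotice(Guid OrderId) : IIntegrationEvent;

public static class NamingRules
{
    // Returns the names of concrete types whose suffix contradicts their marker.
    public static List<string> Violations(IEnumerable<Type> types) =>
        types
            .Where(t => !t.IsInterface)
            .Where(t =>
                (typeof(IIntegrationEvent).IsAssignableFrom(t)
                    && !t.Name.EndsWith("Eto", StringComparison.Ordinal)) ||
                (typeof(ILocalEvent).IsAssignableFrom(t)
                    && !t.Name.EndsWith("Event", StringComparison.Ordinal)))
            .Select(t => t.Name)
            .ToList();
}
```

In a test project, the input would typically be every type in the module's assembly, with the assertion that `Violations(...)` comes back empty.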

The HTTP request returns the moment COMMIT lands. Everything past that point runs asynchronously with Wolverine's configured backoff (5s → 30s → 5min, then dead-letter). A pod kill between commit and email send doesn't lose the email — the message is on disk.
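The retry schedule can be read as a function from "how many times has this message failed" to "wait this long" or "give up". The sketch below models only that reading; it is not Wolverine's configuration API, and `RetrySchedule` is a hypothetical name. It assumes dead-lettering happens on the fourth failure, after all three delays have been used.

```csharp
using System;

public static class RetrySchedule
{
    // Delays taken from the schedule quoted above: 5s, 30s, 5min.
    private static readonly TimeSpan[] Delays =
    {
        TimeSpan.FromSeconds(5),
        TimeSpan.FromSeconds(30),
        TimeSpan.FromMinutes(5),
    };

    // failedAttempts is 1-based: 1 means the first attempt just failed.
    // Returns the wait before the next attempt, or null to dead-letter.
    public static TimeSpan? NextDelay(int failedAttempts)
    {
        if (failedAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(failedAttempts));

        return failedAttempts <= Delays.Length
            ? Delays[failedAttempts - 1]
            : (TimeSpan?)null;
    }
}
```

Under that assumption, a message that keeps failing waits 5 seconds, then 30 seconds, then 5 minutes, and is moved to the dead-letter queue when the attempt after the last wait also fails.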

The thing nobody mentions: context propagation

A command published at 14:00 from an HTTP request runs in some worker thread at 14:12. The handler still needs to know:

  • Who sent it, so AuditedEntity.CreatedBy is correct
  • Which tenant, so query filters apply
  • Which trace, so the Grafana span tree stays connected

Wolverine stuffs all three into the message envelope (X-Tenant-Id, X-User-Id, traceparent) and rehydrates them into ICurrentTenant, ICurrentUserService, and Activity.Current at handler entry. Your CreatedBy column stays right even when the handler runs 12 hours after the user logged out. Your traces stay connected.
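Stripped of framework detail, the round trip is "copy context into headers when sending, rebuild it before the handler runs". The sketch below uses the header names from the paragraph above; `RequestContext` and `EnvelopeContext` are hypothetical types for illustration and do not mirror Wolverine's or Granit's actual classes.

```csharp
using System;
using System.Collections.Generic;

public sealed record RequestContext(string TenantId, string UserId, string TraceParent);

public static class EnvelopeContext
{
    // Header names as listed in the text above.
    public const string Tenant = "X-Tenant-Id";
    public const string User = "X-User-Id";
    public const string Trace = "traceparent";

    // Sender side: copy the ambient context into the message headers.
    public static Dictionary<string, string> Capture(RequestContext ctx) => new()
    {
        [Tenant] = ctx.TenantId,
        [User] = ctx.UserId,
        [Trace] = ctx.TraceParent,
    };

    // Receiver side: rebuild the context before the handler body runs.
    public static RequestContext Restore(IReadOnlyDictionary<string, string> headers)
    {
        if (!headers.TryGetValue(Tenant, out var tenant) ||
            !headers.TryGetValue(User, out var user) ||
            !headers.TryGetValue(Trace, out var trace))
        {
            throw new InvalidOperationException("Envelope is missing context headers.");
        }

        return new RequestContext(tenant, user, trace);
    }
}
```

Failing loudly on a missing header is a deliberate choice in this sketch: a handler that silently runs with no tenant is the kind of bug that query filters exist to prevent.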

With MediatR, you write this wiring yourself. Every single request type. For every new async path. And you re-test it every time.

When is MediatR still the right call?

This isn't a hit piece. MediatR is excellent at one thing: in-process synchronous dispatch. There are codebases where that's the whole story:

  • Small monoliths where every command is "validate → write → return", no side effects
  • Libraries that want internal CQRS without pushing infrastructure on consumers
  • Teams that genuinely don't need durability and never will

For everything else — anything that sends an email, hits a third party, fans out across modules, or must survive a redeploy — you want a real bus. Granit's Granit.Wolverine module is what that looks like when the framework owns the boilerplate.

Takeaways

  • The customer's missing email wasn't a MediatR bug. It was a pattern mismatch. MediatR is a function dispatcher; you needed an outbox.
  • The outbox pattern is three properties — atomic, durable, async — that together eliminate the "DB committed, side effect didn't" failure mode.
  • Wolverine ships the outbox, retries, DLQ, scheduling and validation as middleware. PostgreSQL is the transport; no broker needed.
  • *Event vs *Eto makes the durability boundary visible in the type name. Architecture tests enforce the split.
  • Tenant, user, and trace context travel with the message. Audit and observability stay correct across async hops.

If you want the deep dive — sequence diagrams, the full comparison against MediatR+MassTransit and hand-rolled outboxes, and the handler visibility rules — the original long-form is on the Granit blog.

Granit is Apache-2.0 open source. Repo, docs and ADRs at granit-fx.dev.
