CQRS in NestJS: When It's Worth the Complexity

#ddd #node #cqrs #nestjs

The person most responsible for the industry-standard explanation of CQRS is also the person telling you not to use it. Martin Fowler's bliki entry on CQRS, the page half of the backend world links to when they explain the pattern, says plainly: "you should be very cautious about using CQRS." Not a hedge. Not a "consider the tradeoffs." A direct warning, from the guy whose writeup made the term mainstream.

That's not a contradiction. It's the actual state of the pattern: genuinely useful in a narrow set of cases, and a productivity tax everywhere else. The hard part isn't understanding CQRS. It's telling which side of that line your service is on.

What CQRS Separates

CQRS traces back further than most people assume. Bertrand Meyer's Command-Query Separation principle, from his work on Eiffel, says a method should either change state or return data, never both. That rule applies to individual methods. CQRS moves the idea up a layer: instead of separating commands from queries method by method, you split the write model from the read model. Udi Dahan was applying this to service-oriented systems as early as 2008, formalizing it as "Clarified CQRS" in 2009. Greg Young popularized the term and the modern shape of the pattern around 2010, usually alongside domain-driven design.

The core move: commands change state and return nothing meaningful. Queries return data and change nothing. Once you accept that split, you're free to give commands and queries completely different models, different validation rules, even different storage, because nothing has to reconcile them into a single source-of-truth schema that serves both jobs at once.
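A minimal sketch of that split at both levels, in plain TypeScript with no framework. Every name here (`Inventory`, `ReserveStockCommand`, `StockLevelView`, the low-stock threshold of 5) is made up for illustration and is not taken from any library.

```typescript
// Method-level separation (Meyer): each method either mutates or answers.
class Inventory {
  private readonly stock = new Map<string, number>();

  // Command: changes state, returns nothing.
  adjust(sku: string, delta: number): void {
    const next = this.available(sku) + delta;
    if (next < 0) throw new Error(`insufficient stock for ${sku}`);
    this.stock.set(sku, next);
  }

  // Query: returns data, changes nothing.
  available(sku: string): number {
    return this.stock.get(sku) ?? 0;
  }
}

// Model-level separation (CQRS): the write side and the read side get
// their own types. The command carries intent; the view carries whatever
// a screen needs, including fields that are never validated on write.
interface ReserveStockCommand {
  readonly sku: string;
  readonly quantity: number;
}

interface StockLevelView {
  readonly sku: string;
  readonly available: number;
  readonly lowStock: boolean; // derived for display only (threshold is assumed)
}

// Command handler: enforces the rules, returns nothing.
function handleReserveStock(cmd: ReserveStockCommand, inventory: Inventory): void {
  if (cmd.quantity <= 0) throw new Error("quantity must be positive");
  inventory.adjust(cmd.sku, -cmd.quantity);
}

// Query handler: builds the read shape, mutates nothing.
function toStockLevelView(sku: string, inventory: Inventory): StockLevelView {
  const available = inventory.available(sku);
  return { sku, available, lowStock: available < 5 };
}
```

In this toy both sides still read the same in-memory map. The point is only that the command type and the view type are already free to diverge.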

The Read/Write Skew That Justifies It

Fowler names two legitimate reasons to reach for CQRS: a domain complex enough that a unified model would need to compromise for both directions, or a real performance requirement to scale reads and writes independently. Everything else is optional complexity dressed up as architecture.

The skew that matters isn't "we have more reads than writes." Almost every system does. It's a structural mismatch: your write side needs rich validation, business rules, and multi-step invariants, while your read side needs to answer questions your write model was never shaped to answer efficiently. A booking system that validates availability, conflicts, and cancellation policy on write, but needs to answer "show me every booking across three time zones grouped by room" on read, has structural skew. A blog with a posts table holding title, body, and published_at doesn't, no matter how many more reads than writes it serves.
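To make the booking example concrete, here is a rough sketch of what the two sides might look like. The types and function names (`Booking`, `placeBooking`, `scheduleByRoom`) are hypothetical, and a fixed UTC offset in minutes stands in for real time-zone handling, which a production system would need to do properly.

```typescript
// Write side: a booking is accepted only if it passes the invariants.
interface Booking {
  readonly id: string;
  readonly roomId: string;
  readonly startsAt: number; // epoch milliseconds, UTC
  readonly endsAt: number;   // epoch milliseconds, UTC
}

function overlaps(a: Booking, b: Booking): boolean {
  return a.roomId === b.roomId && a.startsAt < b.endsAt && b.startsAt < a.endsAt;
}

// Command-side rule: reject inverted ranges and double-booked rooms.
function placeBooking(existing: readonly Booking[], candidate: Booking): Booking[] {
  if (candidate.endsAt <= candidate.startsAt) {
    throw new Error("booking must end after it starts");
  }
  const conflict = existing.find((b) => overlaps(b, candidate));
  if (conflict) {
    throw new Error(`room ${candidate.roomId} conflicts with booking ${conflict.id}`);
  }
  return [...existing, candidate];
}

// Read side: the calendar wants rows grouped by room with local start
// times, a shape the write model has no reason to store.
interface RoomScheduleRow {
  readonly roomId: string;
  readonly bookings: { id: string; localStart: string }[];
}

function scheduleByRoom(
  bookings: readonly Booking[],
  utcOffsetMinutes: number, // simplification: real zones are not fixed offsets
): RoomScheduleRow[] {
  const byRoom = new Map<string, RoomScheduleRow>();
  const sorted = bookings.slice().sort((x, y) => x.startsAt - y.startsAt);
  for (const b of sorted) {
    const row = byRoom.get(b.roomId) ?? { roomId: b.roomId, bookings: [] };
    const shifted = new Date(b.startsAt + utcOffsetMinutes * 60_000);
    // Take the HH:MM part of the shifted ISO timestamp.
    row.bookings.push({ id: b.id, localStart: shifted.toISOString().slice(11, 16) });
    byRoom.set(b.roomId, row);
  }
  return Array.from(byRoom.values());
}
```

Nothing in `scheduleByRoom` cares about conflict rules, and nothing in `placeBooking` cares about grouping or local time. When the two functions share this little, the skew is structural.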

If your honest answer to "why can't the same model serve both" is "it could, we just don't want to write two queries," that's not skew. That's an excuse.

What the NestJS CQRS Module Gives You

@nestjs/cqrs gives you three buses: CommandBus, QueryBus, and EventBus. All three are built on RxJS Observables, so you can subscribe to the whole stream of commands, queries, or events flowing through your application, not just the individual handler results.

src/scheduling/commands/create-swap-request.handler.ts

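import { CommandHandler, ICommandHandler } from '@nestjs/cqrs';
// Local import paths below are assumed for illustration.
import { CreateSwapRequestCommand } from './create-swap-request.command';
import { ShiftRepository } from '../shift.repository';

@CommandHandler(CreateSwapRequestCommand)
export class CreateSwapRequestHandler
  implements ICommandHandler<CreateSwapRequestCommand>
{
  constructor(private readonly shifts: ShiftRepository) {}

  async execute(command: CreateSwapRequestCommand): Promise<string> {
    // Load the aggregate; its domain method enforces the swap rules.
    const shift = await this.shifts.findById(command.shiftId);
    shift.requestSwap(command.requestedBy, command.reason);
    await this.shifts.save(shift);
    // Return only the identifier, not a renderable view of the shift.
    return shift.id;
  }
}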

The command handler owns validation and business rules. It doesn't return the shift, a DTO, or anything the frontend would render, just enough to confirm the write happened. A matching query handler pulls from whatever shape is fastest to read, which does not have to be the same tables the command touched.

One detail that catches people off guard: CommandBus, QueryBus, and EventBus are singletons, so combining them with NestJS's request-scoped providers takes extra care. The library handles this by spinning up a new instance of a request-scoped handler for each command, query, or event it processes, rather than sharing one instance across requests. It works, but it's not obvious from the decorator syntax that anything special is happening underneath.

Event handlers run asynchronously and are expected to handle their own exceptions. An event handler that throws doesn't crash the request that published the event. The exception gets caught, wrapped into an UnhandledExceptionInfo, and pushed onto a separate UnhandledExceptionBus stream.

If nothing is listening to that stream, the failure disappears silently. First time I saw this, I assumed a failing event handler would at least log something by default. It doesn't. You have to wire that up yourself.
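The mechanism is easier to see in a toy model. The sketch below is not the @nestjs/cqrs implementation and uses none of its types: `ToyEventBus`, `UnhandledFailure`, and `onUnhandledFailure` are invented names, and handlers run synchronously here for simplicity. It only mirrors the behavior described above: a throwing handler never reaches the publisher, and the error goes to a side channel that may have no listeners.

```typescript
// Toy model only; names and shapes are invented for this sketch.
interface UnhandledFailure {
  readonly cause: unknown;    // what the handler threw
  readonly eventName: string; // which event was being handled
}

type Listener<T> = (value: T) => void;

class ToyEventBus {
  private readonly handlers = new Map<string, Listener<unknown>[]>();
  private readonly failureListeners: Listener<UnhandledFailure>[] = [];

  on(eventName: string, handler: Listener<unknown>): void {
    const list = this.handlers.get(eventName) ?? [];
    list.push(handler);
    this.handlers.set(eventName, list);
  }

  onUnhandledFailure(listener: Listener<UnhandledFailure>): void {
    this.failureListeners.push(listener);
  }

  publish(eventName: string, payload: unknown): void {
    for (const handler of this.handlers.get(eventName) ?? []) {
      try {
        handler(payload);
      } catch (cause) {
        // The publisher never sees the error; it is rerouted to listeners.
        // With zero listeners this loop does nothing and the error is gone.
        for (const listener of this.failureListeners) {
          listener({ cause, eventName });
        }
      }
    }
  }
}
```

In a real NestJS application the equivalent of `onUnhandledFailure` is subscribing to the UnhandledExceptionBus stream once at startup, and at minimum logging or alerting on whatever arrives there.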

CQRS Is Not Event Sourcing

These two get bundled together constantly, partly because they show up in the same tutorials and partly because event sourcing naturally produces a write side and a read side that already look like CQRS. But they're independent decisions.

Event sourcing means your source of truth is an append-only log of events, and current state is a projection computed by replaying them. CQRS means your read and write models are separate. You can event-source without CQRS (replay events into the same model you write through). You can do CQRS without event sourcing (a normal relational write model, plus a denormalized read table kept in sync by a projector). NestJS's CQRS module happens to ship with AggregateRoot and event-emitting building blocks that make it easy to slide into event sourcing, but nothing about CommandBus and QueryBus requires it.
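A small sketch of the two decisions made independently, in framework-free TypeScript. The class and type names (`EventSourcedAccount`, `OrderStore`, `CustomerSpendRow`) are illustrative assumptions, and the in-memory maps stand in for real tables.

```typescript
// (1) Event sourcing without CQRS: state is a fold over an append-only
// log, and reads go through the same model that accepts writes.
type AccountEvent =
  | { readonly kind: "Deposited"; readonly amount: number }
  | { readonly kind: "Withdrawn"; readonly amount: number };

class EventSourcedAccount {
  private readonly log: AccountEvent[] = [];

  deposit(amount: number): void {
    this.log.push({ kind: "Deposited", amount });
  }

  withdraw(amount: number): void {
    if (amount > this.balance()) throw new Error("insufficient funds");
    this.log.push({ kind: "Withdrawn", amount });
  }

  // Current state is recomputed by replaying every event.
  balance(): number {
    return this.log.reduce(
      (total, e) => (e.kind === "Deposited" ? total + e.amount : total - e.amount),
      0,
    );
  }
}

// (2) CQRS without event sourcing: an ordinary mutable write model, plus
// a denormalized read table that a projector keeps in sync.
interface OrderRecord { id: string; customerId: string; totalCents: number }
interface CustomerSpendRow { customerId: string; orderCount: number; totalCents: number }

class OrderStore {
  private readonly orders = new Map<string, OrderRecord>();      // write model
  readonly spendByCustomer = new Map<string, CustomerSpendRow>(); // read model

  // Command: store the row, then project it into the read table.
  placeOrder(order: OrderRecord): void {
    if (this.orders.has(order.id)) throw new Error(`duplicate order ${order.id}`);
    this.orders.set(order.id, order);
    this.project(order);
  }

  private project(order: OrderRecord): void {
    const row = this.spendByCustomer.get(order.customerId)
      ?? { customerId: order.customerId, orderCount: 0, totalCents: 0 };
    this.spendByCustomer.set(order.customerId, {
      customerId: row.customerId,
      orderCount: row.orderCount + 1,
      totalCents: row.totalCents + order.totalCents,
    });
  }
}
```

The first class has an event log and no separate read model. The second has a separate read model and no event log. Neither needs the other.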

Conflating the two is how projects end up with the operational cost of an event store (replay logic, event versioning, snapshotting) when all they actually needed was a read-optimized table.

The Failure Mode: Applying CQRS to Everything

The most common way teams burn the investment is treating CQRS as a system-wide default instead of a targeted tool. Fowler's own framing: many systems fit a CRUD mental model just fine, and forcing CQRS onto them is "a significant mental leap for all concerned" with no matching payoff.

The failure compounds in a microservices context. If you don't identify your aggregate boundaries correctly first, and then apply CQRS on top of that mistake, you end up splitting a single aggregate's data across services, because the read side "needs" a view that spans what should have been one consistency boundary. That's not CQRS creating the mess; it's CQRS making an existing modeling mistake more expensive to carry.

The rule that scales: apply CQRS to the specific bounded contexts where the skew is real, never to a whole application by default. A platform can have three services where CQRS earns its keep and twelve where a repository and a DTO are all anyone needed.

The Eventual Consistency Tax You're Paying

Splitting the models means the read side can lag the write side. How much depends on your projection mechanism (synchronous in the same transaction, asynchronous via a message queue, or somewhere in between), and once the projection is asynchronous the lag is never exactly zero.

The concrete version of this problem: a user places an order, and it can take anywhere from a couple of seconds to a couple of minutes before that order shows up in their order history, depending on how the projection pipeline is built. That's not a bug. It's the actual cost of the pattern, and it has to be a product decision, not something the team discovers by accident when a support ticket comes in asking why an order "disappeared."

If your UI can't tolerate that lag anywhere in the flow, either the projection needs to be synchronous for that specific path, defeating some of the scaling benefit, or CQRS is the wrong tool for that particular read.
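The lag, and the per-path escape hatch, can be sketched in a few lines. This is a toy under stated assumptions: `OrderService`, `drainQueue`, and the `projectNow` flag are invented for illustration, and an in-memory array stands in for a real message queue and its consumer.

```typescript
interface OrderPlaced {
  readonly orderId: string;
  readonly customerId: string;
}

class OrderService {
  private readonly written: OrderPlaced[] = [];               // write model
  private readonly pending: OrderPlaced[] = [];               // stand-in for a queue
  private readonly historyView = new Map<string, string[]>(); // read model

  // Command. `projectNow` picks the synchronous path for flows that
  // cannot tolerate lag; everything else is queued for later.
  placeOrder(order: OrderPlaced, projectNow = false): void {
    this.written.push(order);
    if (projectNow) {
      this.apply(order);
    } else {
      this.pending.push(order);
    }
  }

  // Stand-in for the asynchronous projector catching up.
  drainQueue(): void {
    let next = this.pending.shift();
    while (next !== undefined) {
      this.apply(next);
      next = this.pending.shift();
    }
  }

  // Query: reads only the view, so the answer can be stale.
  orderHistory(customerId: string): readonly string[] {
    return this.historyView.get(customerId) ?? [];
  }

  private apply(order: OrderPlaced): void {
    const ids = this.historyView.get(order.customerId) ?? [];
    this.historyView.set(order.customerId, [...ids, order.orderId]);
  }
}
```

Between `placeOrder` and `drainQueue`, the order exists on the write side and is invisible to `orderHistory`. That window is the thing the product has to sign off on. The `projectNow` path closes it for one flow at the price of doing the projection work inside the write.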

Where I'd Reach for This

On the healthcare platform I work on now, the scheduling side has exactly this shape. Writing a shift change means checking coverage rules, skill-level requirements, and conflict windows across a rotation, which is real domain complexity on the command side. The read side mostly needs to answer "show me the week" in a dozen different groupings (by provider, by department, by shift type), fast enough for a calendar UI that gets hit constantly. That's read/write skew, not a preference.

Credentialing, on the same platform, mostly doesn't have that shape. A verification record has a status, an expiration date, and an audit trail. The queries against it are close enough to the write shape that splitting the models would add a synchronization problem without buying anything back. One platform, two very different answers to "should this use CQRS," and the difference is the skew, not the domain's importance.

That's the actual decision procedure: not "is this domain important" or "are we doing microservices," but "would the read model and the write model genuinely want to be different shapes, and is the eventual-consistency cost something the product can absorb." If both answers are yes, the complexity Fowler warns about is the complexity you're supposed to be paying for.

Originally published at andriiboyko.com.