Jay Freestone

Posted on May 23 • Originally published at jayfreestone.com

You might not need… the repository pattern

#architecture #backend #database

This post is mostly about CRUD-heavy backend applications in TypeScript, Go, and Rust, especially ones using modern typed query builders or lightweight ORMs. I’m not arguing that repositories are useless. I’m arguing that unless a repository protects a real aggregate boundary or hides genuinely meaningful persistence complexity, it usually becomes a worse interface over your database.

The repository pattern originates from Martin Fowler's Patterns of Enterprise Application Architecture, with Domain-Driven Design and 'layered' architectures (hexagonal, clean, onion, etc.) expanding on it from there.

There are legitimate reasons it gained in popularity:

It provides a clean way to separate IO from business logic.
Testing business logic becomes a lot easier/faster, since there's a clean seam to swap in a test-double and keep everything in-memory.
It theoretically makes it easier to switch out the backing store if you change database/provider.
It fits well into the OOP world of Java, C# and friends, where ORMs map to entities.

What is a repository?

Let's define the strict, traditional version of a repository:

A repository operates on aggregate roots. These are the invariant boundaries of your domain model. In traditional DDD, only one aggregate should be committed per operation, with everything else becoming eventually consistent (which helps to minimize locking). The C# documentation suggests relaxing this if strong consistency is important.
A repository returns aggregates, fully hydrated. Vernon is explicit on this: repositories are not Data-Access-Objects. He spends an entire section distinguishing them. A DAO is expressed in terms of database tables and provides CRUD over them, while a repository operates on aggregates.
Since a repository is (theoretically) persistence-ignorant, it should in no way orchestrate a 'unit of work', i.e. a transaction boundary. If you're following the one-aggregate-per-operation rule, this becomes a lot easier. If not, you probably need to cheat and pass an open transaction through AsyncLocalStorage (ALS) in Node, or an equivalent of thread-local storage in your language. Nest CLS is a great example of this working really well. If you're using Go, I guess you can pretend you're not funneling it through the ctx grab-bag on every method.
The querying interface depends on context. Evans permits dedicated query methods (findCancelledOrders), and suggests specifications when things become unwieldy. Vernon's strictest version of a repository is just add, save, fromId, but this is only practical when splitting reads from writes (CQRS).

Vernon also distinguishes between persistence-oriented repositories (where you have to explicitly 'save' or update objects) and collection-oriented ones (which auto-track dirty objects). In TS/Go/Rust you're probably implementing a persistence-oriented one.
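To make the transaction-passing 'cheat' concrete, here is a minimal sketch using Node's AsyncLocalStorage. The Transaction shape and the withTransaction and currentTransaction helpers are hypothetical names for illustration; a real version would open, commit, and roll back an actual database transaction rather than collect strings.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical stand-in for a real database transaction handle.
interface Transaction {
  id: number;
  statements: string[];
}

const txStorage = new AsyncLocalStorage<Transaction>();
let nextTxId = 1;

// Runs `work` with an ambient transaction that any code called
// (directly or indirectly) from inside it can look up.
async function withTransaction<T>(
  work: () => Promise<T>,
): Promise<{ result: T; tx: Transaction }> {
  const tx: Transaction = { id: nextTxId++, statements: [] };
  const result = await txStorage.run(tx, work);
  // A real implementation would COMMIT here and ROLLBACK on error.
  return { result, tx };
}

function currentTransaction(): Transaction {
  const tx = txStorage.getStore();
  if (!tx) throw new Error("No active transaction");
  return tx;
}

interface Supplier {
  id: string;
  name: string;
}

// The repository signature stays free of any `tx` parameter.
class SupplierRepository {
  async save(supplier: Supplier): Promise<void> {
    currentTransaction().statements.push(`UPSERT supplier ${supplier.id}`);
  }
}
```

The trade-off is that the transaction becomes an implicit dependency: calling save outside withTransaction only fails at runtime, which is the price of keeping tx out of the interface.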

Repositories in the wild

If you follow a strict domain model, and propagate changes to other aggregates through events, you can adhere to the above criteria. You're probably writing Java, C#, or JS with Nest (which really, really wants to be Java).

Your repositories might look like this:

interface SupplierRepository {
  save(supplier: Supplier): Promise<void>
  getById(id: string): Promise<Supplier | null>
}

The problem is that most applications of the repository pattern in the wild aren't this.

They're this:

interface SupplierRepository {
  create(supplierData: SupplierPojo): Promise<void>
  update(supplierData: SupplierPojo): Promise<void>
  publish(id: string): Promise<void>
  list(criteria: Conditions, pagination: Pagination): Promise<Supplier[]>
  get(id: string): Promise<Supplier | null>
  getWithProduct(id: string): Promise<{ supplier: Supplier; product: Product } | null>
  findActiveById(id: string): Promise<Supplier | null>
  // etc.
}

In fact, I've seen all kinds:

// I'm not making it up, I have genuinely seen this stuff. 
interface SupplierRepository {
  // Leak of tx!
  create(tx: Transaction, supplier: Supplier): Promise<void>
  // Leak of tx... and no longer concerned with an aggregate.
  createLinkToSupplier(tx: Transaction, id: string, supplierId: string): Promise<SupplierLink>
  // This one doesn't take a tx, because it's a convenience
  // method which also calls the analytics service (and then
  // writes the data).
  createLinkToSupplierCommitAndEmitEvent(id: string): Promise<SupplierLink>
}

This is the cursed offspring of a repository and a DAO wearing DDD clothing. The explicit transactions, methods that aren't aggregate-scoped, and side-effect-ridden convenience helpers are exactly what Vernon warns against. If your 'repository' needs to take a Transaction parameter, you've lost your abstraction.

The interface bloat (findActiveById, getWithProduct, list(criteria, pagination)) usually means you've conflated commands (which legitimately want aggregate-shaped objects) with queries (which want view-appropriate projections). The textbook answer here is CQRS: split the repository in two, with the write side handling aggregates and a separate query model handling reads.

But CQRS only solves part of the problem. Even on the write side, you'll have legitimate criteria queries: 'find pending orders to cancel', 'users with outstanding invoices to remind' etc. These aren't display queries, they're locating aggregates that need business logic run against them. Even if you adopt CQRS you'll likely end up with extra criteria-finding methods on the write-side repo.
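As a rough sketch of that split (all names here are hypothetical, and the in-memory Map only stands in for a database so the example runs), the write side stays aggregate-shaped while the read side returns a view-shaped projection. Note that the write side still ends up with a criteria finder:

```typescript
interface Order {
  id: string;
  customerId: string;
  status: "pending" | "shipped" | "canceled";
  placedAt: Date;
}

// Write side: aggregate-shaped, small surface.
interface OrderRepository {
  getById(id: string): Promise<Order | null>;
  save(order: Order): Promise<void>;
  // A write-side criteria query: it locates aggregates to act on.
  findPendingPlacedBefore(cutoff: Date): Promise<Order[]>;
}

// Read side: a projection for display, no aggregate involved.
interface OrderSummaryRow {
  orderId: string;
  status: Order["status"];
}

interface OrderQueries {
  listSummariesForCustomer(customerId: string): Promise<OrderSummaryRow[]>;
}

// In-memory stand-ins sharing one store; real versions would issue SQL.
class InMemoryOrderRepository implements OrderRepository {
  constructor(private store: Map<string, Order>) {}

  async getById(id: string): Promise<Order | null> {
    return this.store.get(id) ?? null;
  }

  async save(order: Order): Promise<void> {
    this.store.set(order.id, { ...order });
  }

  async findPendingPlacedBefore(cutoff: Date): Promise<Order[]> {
    return Array.from(this.store.values()).filter(
      (o) => o.status === "pending" && o.placedAt.getTime() < cutoff.getTime(),
    );
  }
}

class InMemoryOrderQueries implements OrderQueries {
  constructor(private store: Map<string, Order>) {}

  async listSummariesForCustomer(customerId: string): Promise<OrderSummaryRow[]> {
    return Array.from(this.store.values())
      .filter((o) => o.customerId === customerId)
      .map((o) => ({ orderId: o.id, status: o.status }));
  }
}
```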

If any bit of this sounds/looks familiar, I'm here to tell you that you don't need a repository (or, you don't have one) and that is totally ok.

You might not have a domain model

Most of us writing Rust, Go and TypeScript are not really writing 'object-oriented' software in the traditional sense.

While many of us dream of the beautiful ubiquitous language from the blue book, most of us don't really have a true domain model in code. We may have a shared language between product, UXD and engineering, but when the chips are down it's essentially just data. Data we rip out, transform, and put back into place. It doesn't have much of an in-memory lifecycle.

As Casey Muratori says, OOP makes more sense when something has a real lifecycle, like a server. It's a thing, it's not just data briefly masquerading as an entity in memory before it's re-serialized.

Are you enforcing invariants at the aggregate root level? Or even creating aggregates instead of POJOs?

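class HasShippedError extends Error {}

interface Shipping {
  hasBeenShipped(): boolean
}

// Persisted shape of an order (assumed for this example).
interface OrderData {
  shipping: Shipping
  canceledAt: Date | null
}

class Order {
  shipping!: Shipping
  canceledAt: Date | null = null

  // Constructor etc...

  // Enforce the invariant: a shipped order can't be canceled.
  cancel() {
    if (this.shipping.hasBeenShipped()) {
      throw new HasShippedError()
    }

    this.canceledAt = new Date()
  }

  // Rebuild an aggregate from stored data, bypassing the constructor.
  static hydrate(order: OrderData): Order {
    const instance = Object.create(Order.prototype)
    Object.assign(instance, order)
    return instance
  }
}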

Probably not, and again, that's ok!

The interesting cases are invariants that can't sit inside an aggregate at all. The traditional example is the unique-email case: throw new EmailAlreadyRegisteredToUser(). It can't be checked from inside the User aggregate, because the rule spans all users. DDD has a specific term for this: set-based invariants, and there's no good solution. You either push it into a service, enforce it at the persistence layer, or accept eventual consistency and handle the error later.

The point is that aggregate-as-invariant-boundary doesn't extend to relationships between aggregates, and most of the invariants people actually care about in enterprise OLTP applications are exactly this set-based kind.
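Here is a minimal sketch of the 'enforce it at the persistence layer' option for the unique-email case, assuming a UNIQUE constraint on the email column. The Db interface, registerUser, and fakeDb are hypothetical; the one real detail is that Postgres reports a unique violation with SQLSTATE 23505, which many drivers surface as a code property on the error.

```typescript
class EmailAlreadyRegisteredToUser extends Error {
  constructor(email: string) {
    super(`Email already registered: ${email}`);
    this.name = "EmailAlreadyRegisteredToUser";
  }
}

// Assumed error shape; check how your driver exposes the SQLSTATE.
interface DbError {
  code?: string;
}

// SQLSTATE for unique_violation in Postgres.
const UNIQUE_VIOLATION = "23505";

// Hypothetical minimal database handle.
interface Db {
  insertUser(user: { id: string; email: string }): Promise<void>;
}

// Let the database enforce the set-wide rule, then translate the
// low-level failure into a domain error.
async function registerUser(
  db: Db,
  user: { id: string; email: string },
): Promise<void> {
  try {
    await db.insertUser(user);
  } catch (err) {
    if ((err as DbError)?.code === UNIQUE_VIOLATION) {
      throw new EmailAlreadyRegisteredToUser(user.email);
    }
    throw err;
  }
}

// In-memory fake that mimics a UNIQUE(email) constraint so the sketch runs.
function fakeDb(): Db {
  const emails = new Set<string>();
  return {
    async insertUser(user) {
      if (emails.has(user.email)) {
        throw Object.assign(new Error("duplicate key"), { code: UNIQUE_VIOLATION });
      }
      emails.add(user.email);
    },
  };
}
```

Checking with a SELECT first and then inserting would leave a race between the two statements, which is why the constraint itself does the enforcing here.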

You might be reinventing the ORM

Many repositories I see end up reinventing the modern ORM.

The ORM as it's referred to in classic programming books (e.g. Hibernate and friends) is a very different beast from today's ORM. Modern ORMs like Drizzle (and to some extent Prisma) are more akin to typed query builders.

They don't map to entities, they give you a freeform typed canvas to build queries from. What you do with that is up to you. It's beautiful:

import { and, desc, eq, gt, sql } from 'drizzle-orm';

// `db`, the `orders`/`customers`/`orderItems` tables and `thirtyDaysAgo`
// are assumed to come from your own schema and setup.
const recentOrders = await db
  .select({
    orderId: orders.id,
    placedAt: orders.placedAt,
    customerName: customers.name,
    itemCount: sql<number>`count(${orderItems.id})`,
  })
  .from(orders)
  .innerJoin(customers, eq(orders.customerId, customers.id))
  .leftJoin(orderItems, eq(orderItems.orderId, orders.id))
  .where(and(
    eq(orders.status, 'completed'),
    gt(orders.placedAt, thirtyDaysAgo),
  ))
  .groupBy(orders.id, customers.name)
  .orderBy(desc(orders.placedAt))
  .limit(20);

This leads to small, targeted queries and performant, targeted updates.

If you're writing a repository method which has filtering, pagination, or god forbid a specification, you're probably just reinventing a worse version of the syntax your ORM provides you:

interface FindOrdersOptions {
  customerId?: string;
  status?: OrderStatus | OrderStatus[];
  placedAfter?: Date;
  placedBefore?: Date;
  minTotal?: number;
  includeCanceled?: boolean;
  // Do we use TS trickery here to strengthen the return type when this is `true`?
  includeItems?: boolean;
  sortBy?: 'placedAt' | 'total' | 'customerName';
  sortDirection?: 'asc' | 'desc';
  limit?: number;
  offset?: number;
}

interface OrderRepository {
  find(opts: FindOrdersOptions): Promise<Order[]>
}

Even if you exercise discipline, you're probably over-fetching data just for the sake of working with a discrete 'entity'.
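To show what the 'TS trickery' in the FindOrdersOptions comment tends to look like, here is a trimmed-down sketch. The types, the rows array, and the find function are hypothetical and stripped to a single flag; the point is how much type machinery one option already drags in.

```typescript
interface Order {
  id: string;
  total: number;
}

interface OrderItem {
  sku: string;
  quantity: number;
}

type OrderWithItems = Order & { items: OrderItem[] };

interface FindOrdersOptions {
  customerId?: string;
  includeItems?: boolean;
}

// The return type depends on whether `includeItems` is literally `true`.
type FindResult<O extends FindOrdersOptions> =
  O extends { includeItems: true } ? OrderWithItems[] : Order[];

// In-memory data standing in for the database.
const rows: OrderWithItems[] = [
  { id: "o1", total: 40, items: [{ sku: "a", quantity: 2 }] },
];

async function find<O extends FindOrdersOptions>(
  opts: O,
): Promise<FindResult<O>> {
  const result = opts.includeItems
    ? rows
    : rows.map(({ id, total }) => ({ id, total }));
  // The compiler can't check the branch against the conditional type,
  // so the implementation has to fall back to a cast.
  return result as unknown as FindResult<O>;
}
```

Calling find({ includeItems: true }) is typed as returning OrderWithItems[], while find({}) returns Order[]. Every additional flag multiplies the conditional types and the casts, which is roughly the work a typed query builder's select already does for you.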

If you do have a domain model...

If you do have a traditional domain model, a desire for strong consistency leaves you with a lot of little repositories which need to be coordinated (losing the aggregate-root-as-invariant idea):

class PlaceOrderCommand {
  constructor(
    private orderRepository: OrderRepository,
    private inventoryRepository: InventoryRepository,
  ) {}

  @UnitOfWork()
  async execute() {
    // Business logic...
    await this.orderRepository.save(order);
    await this.inventoryRepository.save(inventoryItem)
  }
}

You'll also either have to do this:

const order = await orderRepository.getWithInventory('123');

Or this:

const order = await orderRepository.getById('123');
const inventory = await inventoryRepository.getByOrder('123');

Or you'll have to forever fetch inventory alongside an order. In which case, is Order the unit of consistency (aggregate root) for Inventory? Maybe?

If you have a language which supports lazy loading (masquerading as sync property calls) inside a transactional boundary, then you can kind of fake fetching sub-entities since order.inventory can arrive late.

For many languages though, you can't (async/await sneaks in). Even if you can, and even though I admire Vlad, I feel like this really breaks the 'clean' and IO-less domain model concept.

You can't just 'swap' your database

The few times when I have had to swap the persistence layer, it has never been as clean as swapping out the guts of the repository. Persistence layers have drastically different characteristics, such as:

Transaction handling.
Performance.
Key constraints (or lack thereof).

As Mike Acton says, if your data changes then your entire problem changes. I love the idealistic view of carving up the problem domain, but realistically, unless you're swapping MySQL for Postgres, it is never this straightforward.

If the move changes joins, transactional guarantees, indexing strategies, consistency, latency, or bulk-access patterns, then good luck.

Perhaps if you model tiny aggregates (as recommended) and accept eventual consistency, then this is more feasible. But I have to say, as someone who had to rip out DynamoDB in favor of Postgres, no level of abstraction or layered architecture is going to save you.

If you don't follow the grain of the persistence layer in some way, then even if you get it to work, you will be doing it the wrong way. Follow the grain.

You should be running tests against your real database

One reason people keep repositories around is testability: 'we can stub the repository and keep the unit tests fast'. This has aged badly.

I'm a huge fan of integration tests, even if their definition is nebulous.

Modern DBs are fast enough to run your real test suite against them. I'm not going to rehash the argument here, but running against an in-memory hashmap provides zero confidence anything works in your CRUD app. Your entire app is making sure you extract, transform and store back the right data. Have something more complex? It goes in a unit test and shouldn't be I/O-bound anyway.

This is even more compelling thanks to things like PGLite, but just spinning up Postgres in a container is more than fast enough today, and provides a huge amount of confidence in the code you've written.

So, do I need a repository?

Maybe, but make sure you're actually getting value out of it.

You probably don't have invariant-enforcing aggregates in the DDD sense, and you probably don't have a Unit of Work sitting above your repos.

You can still do DI if you want, and you can (and should) extract data-layer helpers. But the repository pattern quickly degenerates into a thin and leaky wrapper unless you really commit to it.
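One possible shape for such data-layer helpers, sketched under assumptions: the Queryable interface, the suppliers table, and its columns are hypothetical, and the SQL uses Postgres-style placeholders. The helpers are plain functions that take whatever can run a query, so the caller decides whether that is a connection pool or an open transaction.

```typescript
// Minimal query interface that both a pool and an open transaction
// could satisfy (hypothetical; adapt to your driver or query builder).
interface Queryable {
  query<T>(sql: string, params: unknown[]): Promise<T[]>;
}

interface SupplierRow {
  id: string;
  name: string;
  publishedAt: Date | null;
}

// Targeted read: fetch only what this use case needs.
async function getSupplier(
  db: Queryable,
  id: string,
): Promise<SupplierRow | null> {
  const rows = await db.query<SupplierRow>(
    'SELECT id, name, published_at AS "publishedAt" FROM suppliers WHERE id = $1',
    [id],
  );
  return rows[0] ?? null;
}

// Targeted write: update one column rather than saving a whole entity.
async function publishSupplier(
  db: Queryable,
  id: string,
  now: Date,
): Promise<void> {
  await db.query("UPDATE suppliers SET published_at = $2 WHERE id = $1", [id, now]);
}
```

Because the transaction is just an argument, nothing pretends to be persistence-ignorant, and composing several helpers inside one transaction needs no ambient context.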

You might argue that it doesn’t matter whether you call it a repository, or whether it fits some formal definition, as long as it’s useful. Fair enough. But the abstraction is pointless unless it protects a real domain boundary, improves testability in a way your integration tests don't, or hides meaningful persistence complexity.

As Evans says:

In general, don't fight your frameworks. Seek ways to keep the fundamentals of domain-driven design and let go of the specifics when the framework is antagonistic. Look for affinities between the concepts of domain-driven design and the concepts in the framework.

Domain-Driven Design, Eric Evans

If your language, framework or setup doesn't fit the pattern, don't adopt it.
