You might not need… the repository pattern

Most codebases implement a cargo-cult ‘repository pattern’ that Evans and Vernon would shudder at. What actually gets shipped is usually a grab bag of methods that provides no real abstraction and no clear value.

The repository pattern originates from Fowler’s Patterns of Enterprise Application Architecture, with Domain-Driven Design and ‘layered’ (hexagonal, clean, onion, etc.) architecture expanding on it from there.

There are legitimate reasons it gained in popularity:

  • It provides a clean way to separate IO from business logic.
  • Testing business logic becomes a lot easier/faster, since there’s a clean seam to swap in a test-double and keep everything in-memory.
  • It theoretically makes it easier to switch out the backing store if you change database/provider.
  • It fits well into the OOP world of Java, C# and friends, where ORMs map to entities.

What is a repository?

Let’s define the strict, traditional version of a repository:

  • A repository operates on aggregate roots. These are the invariant boundaries of your domain model. In traditional DDD, only one aggregate should be committed per operation, with everything else becoming eventually consistent (which helps to minimize locking). The C# documentation suggests relaxing this if strong consistency is important.
  • A repository returns aggregates, fully hydrated. Vernon is explicit on this: repositories are not Data-Access-Objects. He spends an entire section distinguishing them. A DAO is expressed in terms of database tables and provides CRUD over them, while a repository operates on aggregates.
  • Since a repository is (theoretically) persistence-ignorant, it should in no way orchestrate a ‘unit of work’, i.e. a transaction boundary. If you’re following the one-aggregate-per-operation rule, this becomes a lot easier. If not, you probably need to cheat and pass an open transaction through AsyncLocalStorage (ALS) or an equivalent such as thread-local storage. Nest CLS is a great example of this working really well. If you’re using Go, I guess you can pretend you’re not funneling it through the ctx grab-bag on every method.
  • The querying interface depends on context. Evans permits dedicated query methods (findCancelledOrders), and suggests specifications when things become unwieldy. Vernon’s strictest version of a repository is just add, save, fromId, but this is only practical when splitting reads from writes (CQRS).
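The ‘ambient transaction’ trick from the unit-of-work point above can be sketched roughly like this, using Node’s AsyncLocalStorage. This is only an illustration under assumptions: `Tx`, `withTransaction`, `currentTx` and the statement-recording `SupplierRepository` are hypothetical names invented for the example, and a real version would begin, commit and roll back a transaction on your actual database client.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical stand-in for a real database transaction handle.
interface Tx {
  id: number;
  statements: string[];
}

const txStorage = new AsyncLocalStorage<Tx>();
let nextTxId = 1;

// Runs `work` with an ambient transaction. A real implementation would
// BEGIN here, COMMIT on success and ROLLBACK on error.
async function withTransaction<T>(work: () => Promise<T>): Promise<T> {
  const tx: Tx = { id: nextTxId++, statements: [] };
  return txStorage.run(tx, work);
}

// Repositories look up the ambient transaction instead of taking it as a parameter.
function currentTx(): Tx {
  const tx = txStorage.getStore();
  if (!tx) throw new Error("No active transaction");
  return tx;
}

interface Supplier {
  id: string;
  name: string;
}

class SupplierRepository {
  // Note the signature: no Transaction parameter leaks into the interface.
  async save(supplier: Supplier): Promise<void> {
    currentTx().statements.push(`UPSERT supplier ${supplier.id}`);
  }
}
```

The trade-off is that the dependency on a transaction becomes invisible: calling `save` outside `withTransaction` only fails at runtime.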

Repositories in the wild

If you follow a strict domain model, and propagate changes to other aggregates through events, you can adhere to the above criteria. You’re probably writing Java, C#, or JS with Nest (which really, really wants to be Java).

Your repositories might look like this:

interface SupplierRepository {
  save(supplier: Supplier): Promise<void>
  getById(id: string): Promise<Supplier | null>
}

The problem is that most applications of the repository pattern in the wild aren’t this.

They’re this:

interface SupplierRepository {
  create(supplierData: SupplierPojo): Promise<void>
  update(supplierData: SupplierPojo): Promise<void>
  publish(id: string): Promise<void>
  list(criteria: Conditions, pagination: Pagination): Promise<Supplier[]>
  get(id: string): Promise<Supplier | null>
  getWithProduct(id: string): Promise<{ supplier: Supplier; product: Product } | null>
  findActiveById(id: string): Promise<Supplier | null>
  // etc.
}

In fact, I’ve seen all kinds:

// I'm not making it up, I have genuinely seen this stuff. 
interface SupplierRepository {
  // Leak of tx!
  create(tx: Transaction, supplier: Supplier): Promise<void>
  // Leak of tx... and no longer concerned with an aggregate.
  createLinkToSupplier(tx: Transaction, id: string, supplierId: string): Promise<SupplierLink>
  // This one doesn't take a tx, because it's a convenience
  // method which also calls the analytics service (and then
  // writes the data).
  createLinkToSupplierCommitAndEmitEvent(id: string): Promise<SupplierLink>
}

This is the cursed offspring of a repository and a DAO wearing DDD clothing. The explicit transactions, methods that aren’t aggregate-scoped, and side-effect-ridden convenience helpers are exactly what Vernon warns against. If your ‘repository’ needs to take a Transaction parameter, you’ve lost your abstraction.

The interface bloat (findActiveById, getWithProduct, list(criteria, pagination)) usually means you’ve conflated commands (which legitimately want aggregate-shaped objects) with queries (which want view-appropriate projections). The textbook answer here is CQRS: split the repository in two, with the write side handling aggregates and a separate query model handling reads.
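As a rough sketch of that split, the write side keeps the narrow aggregate interface while reads get their own projection-shaped interface. Everything here is hypothetical: the `Supplier` fields, `SupplierListItem`, `SupplierQueries` and the `InMemorySuppliers` class, which exists only so the example runs. A real read side would more likely be a hand-written query against the database.

```typescript
// Write side: aggregate in, aggregate out. Nothing view-shaped.
interface Supplier {
  id: string;
  name: string;
  active: boolean;
  products: { id: string; title: string }[];
}

interface SupplierRepository {
  save(supplier: Supplier): Promise<void>;
  getById(id: string): Promise<Supplier | null>;
}

// Read side: flat projections shaped for one specific screen.
interface SupplierListItem {
  id: string;
  name: string;
  productCount: number;
}

interface SupplierQueries {
  listActive(limit: number, offset: number): Promise<SupplierListItem[]>;
}

// In-memory stand-in so the sketch is runnable. Both sides share one Map
// here purely for brevity.
class InMemorySuppliers implements SupplierRepository, SupplierQueries {
  private rows = new Map<string, Supplier>();

  async save(supplier: Supplier): Promise<void> {
    this.rows.set(supplier.id, supplier);
  }

  async getById(id: string): Promise<Supplier | null> {
    return this.rows.get(id) ?? null;
  }

  async listActive(limit: number, offset: number): Promise<SupplierListItem[]> {
    return Array.from(this.rows.values())
      .filter((s) => s.active)
      .slice(offset, offset + limit)
      .map((s) => ({ id: s.id, name: s.name, productCount: s.products.length }));
  }
}
```

Command handlers would depend only on `SupplierRepository`, and list screens only on `SupplierQueries`, so neither interface has to grow to serve the other.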

But CQRS only solves part of the problem. Even on the write side, you’ll have legitimate criteria queries: ‘find pending orders to cancel’, ‘users with outstanding invoices to remind’ etc. These aren’t display queries, they’re locating aggregates that need business logic run against them. Even if you adopt CQRS you’ll likely end up with extra criteria-finding methods on the write-side repo.

If any bit of this sounds/looks familiar, I’m here to tell you that you don’t need a repository (or, you don’t have one) and that is totally ok.

You might not have a domain model

Most of us writing Rust, Go and TypeScript are not really writing ‘object-oriented’ software in the traditional sense.

While many of us dream of the beautiful ubiquitous language from the blue book, most of us don’t really have a true domain model in code. We may have a shared language between product, UXD and engineering, but when the chips are down it’s essentially just data. Data we rip out, transform, and put back into place. It doesn’t have much of an in-memory lifecycle.

As Casey Muratori says, OOP makes more sense when something has a real lifecycle, like a server. It’s a thing, it’s not just data briefly masquerading as an entity in memory before it’s re-serialized.

Are you enforcing invariants at the aggregate root level? Or even creating aggregates instead of POJOs?

class Order {
  // Constructor etc...
 
  cancel() {
    if (this.shipping.hasBeenShipped()) {
      throw new HasShippedError()
    }
    
    this.canceledAt = new Date()
  }
  
  static hydrate(order: OrderData): Order {
    // Bypass the constructor
    const instance = Object.create(Order.prototype);
    Object.assign(instance, order);
    return instance
  }
}

Probably not, and again, that’s ok!
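If what you have is data rather than aggregates, the same ‘can’t cancel a shipped order’ rule can live in a plain function over a plain record. A minimal sketch, assuming a hypothetical `OrderRow` shape with a `shippedAt` column; `cancelOrder` and `CancelResult` are likewise invented for illustration:

```typescript
// Hypothetical row shape, roughly what a query would hand back.
interface OrderRow {
  id: string;
  shippedAt: Date | null;
  canceledAt: Date | null;
}

type CancelResult =
  | { ok: true; order: OrderRow }
  | { ok: false; reason: "already_shipped" };

// Pure function: data in, data out. No hydration step, no in-memory lifecycle.
function cancelOrder(order: OrderRow, now: Date): CancelResult {
  if (order.shippedAt !== null) {
    return { ok: false, reason: "already_shipped" };
  }
  // Return a new record rather than mutating the input.
  return { ok: true, order: { ...order, canceledAt: now } };
}
```

The rule is still unit-testable without any I/O; it just isn’t attached to an object that has to be rebuilt from the database first.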

You might be reinventing the ORM

Many repositories I see end up reinventing the modern ORM.

The ORM as it’s referred to in classic programming books (e.g. Hibernate and friends) is a very different beast from today’s ORM. Modern ORMs like Drizzle (and to some extent Prisma) are more akin to typed query builders.

They don’t map to entities, they give you a freeform typed canvas to build queries from. What you do with that is up to you. It’s beautiful:

const recentOrders = await db
  .select({
    orderId: orders.id,
    placedAt: orders.placedAt,
    customerName: customers.name,
    itemCount: sql<number>`count(${orderItems.id})`,
  })
  .from(orders)
  .innerJoin(customers, eq(orders.customerId, customers.id))
  .leftJoin(orderItems, eq(orderItems.orderId, orders.id))
  .where(and(
    eq(orders.status, 'completed'),
    gt(orders.placedAt, thirtyDaysAgo),
  ))
  .groupBy(orders.id, customers.name)
  .orderBy(desc(orders.placedAt))
  .limit(20);

This leads to small, targeted queries and performant, targeted updates.

If you’re writing a repository method which has filtering, pagination, or god forbid a specification, you’re probably just reinventing a worse version of the syntax your ORM provides you:

interface FindOrdersOptions {
  customerId?: string;
  status?: OrderStatus | OrderStatus[];
  placedAfter?: Date;
  placedBefore?: Date;
  minTotal?: number;
  includeCanceled?: boolean;
  // Do we use TS trickery here to strengthen the return type when this is `true`?
  includeItems?: boolean;
  sortBy?: 'placedAt' | 'total' | 'customerName';
  sortDirection?: 'asc' | 'desc';
  limit?: number;
  offset?: number;
}

interface OrderRepository {
  find(opts: FindOrdersOptions): Promise<Order[]>
}

Even if you exercise discipline, you’re probably over-fetching data just for the sake of working with a discrete ‘entity’.

If you do have a domain model…

If you do have a traditional domain model, a desire for strong consistency leaves you with a lot of little repositories which need to be coordinated (losing the aggregate-root-as-invariant idea):

class PlaceOrderCommand {
  constructor(
    private orderRepository: OrderRepository,
    private inventoryRepository: InventoryRepository,
  ) {}
  
  @UnitOfWork()
  async execute() {
    // Business logic...
    await this.orderRepository.save(order);
    await this.inventoryRepository.save(inventoryItem);
  }
}

You’ll also either have to do this:

const order = await orderRepository.getWithInventory('123');

Or this:

const order = await orderRepository.getById('123');
const inventory = await inventoryRepository.getByOrder('123');

Or you’ll have to forever fetch inventory alongside an order. In which case, is Order the unit of consistency (aggregate root) for Inventory? Maybe?

You can’t just ‘swap’ your database

The few times when I have had to swap the persistence layer, it has never been as clean as swapping out the guts of the repository. Persistence layers have drastically different characteristics, such as:

  • Transaction handling.
  • Performance.
  • Key constraints (or lack thereof).

As Mike Acton says, if your data changes then your entire problem changes. I love the idealistic view of carving up the problem domain, but realistically, unless you’re swapping MySQL for Postgres, it is never this straightforward.

If the move changes joins, transactional guarantees, indexing strategies, consistency, latency, or bulk-access patterns, then good luck.

You should be running tests against your real database

One reason people keep repositories around is testability: ‘we can stub the repository and keep the unit tests fast’. This has aged badly.

I’m a huge fan of integration tests, even if their definition is nebulous.

Modern DBs are fast enough to run your real test suite against them. I’m not going to rehash the argument here, but running against an in-memory hashmap provides zero confidence anything works in your CRUD app. Your entire app is making sure you extract, transform and store back the right data. Have something more complex? It goes in a unit test and shouldn’t be I/O-bound anyway.

This is even more compelling thanks to things like PGlite, but just spinning up Postgres in a container is more than fast enough today, and provides a huge amount of confidence in the code you’ve written.

So, do I need a repository?

Maybe, but make sure you’re actually getting value out of it.

You probably don’t have invariant-enforcing aggregates in the DDD sense, and you probably don’t have a Unit of Work sitting above your repos.

You can still do DI if you want, and you can (and should) extract data-layer helpers. But the repository pattern quickly degenerates into a thin and leaky wrapper unless you really commit to it.
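One way ‘DI plus data-layer helpers’ might look without a repository is plain functions that take the database handle as an argument. This is a sketch under assumptions: the `Queryable` interface is a hypothetical minimal client shape with `$1`-style placeholders rather than any particular library’s API, and the table, column and function names are made up.

```typescript
// Hypothetical minimal shape of a SQL client. A connection pool and an open
// transaction could both satisfy it, so the caller decides the transaction scope.
interface Queryable {
  query<T>(sql: string, params: unknown[]): Promise<T[]>;
}

interface PendingOrder {
  id: string;
  placedAt: Date;
}

// A targeted helper: one query, one purpose, only the columns it needs.
async function findPendingOrdersOlderThan(
  db: Queryable,
  cutoff: Date,
): Promise<PendingOrder[]> {
  return db.query<PendingOrder>(
    `SELECT id, placed_at AS "placedAt"
       FROM orders
      WHERE status = 'pending' AND placed_at < $1
      ORDER BY placed_at`,
    [cutoff],
  );
}

// A targeted update that touches a single column on a single row.
async function markOrderCanceled(
  db: Queryable,
  orderId: string,
  now: Date,
): Promise<void> {
  await db.query("UPDATE orders SET canceled_at = $2 WHERE id = $1", [orderId, now]);
}
```

Helpers like these make no claim to being aggregate-scoped or persistence-ignorant, and they are best exercised against the real database rather than a stub.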

You might argue that it doesn’t matter whether you call it a repository, or whether it fits some formal definition, as long as it’s useful. Fair enough. But the abstraction is pointless unless it protects a real domain boundary, improves testability in a way your integration tests don’t, or hides meaningful persistence complexity.

As Evans says:

In general, don’t fight your frameworks. Seek ways to keep the fundamentals of domain-driven design and let go of the specifics when the framework is antagonistic. Look for affinities between the concepts of domain-driven design and the concepts in the framework.

Domain-Driven Design, Eric Evans

If your language, framework or setup doesn’t fit the pattern, don’t adopt it.