The Repository Pattern in Python
We've all inherited it: a critical 500-line function with raw SQL strings wedged between error handling, business logic, and an API call. Your instinct is to refactor it and separate concerns; that's the right call! However, the pattern most tutorials teach for this job just creates a different kind of mess.
I'm talking about the repository pattern: an approach to separate your business layer (business logic) from your data layer (persistence and retrieval from a database).
Understanding what drives this pattern - and where the standard implementation goes wrong - will permanently change how you structure database logic. The end result is maintainable, testable, and readable code.
Martin Fowler describes this as mediating interaction, "… between the domain and data mapping layers using a collection-like interface." Let's break down what that actually means, and how it's been so misinterpreted.
The (short) theory
The fundamentals are exactly the same between good and bad approaches.
1) The "collection-like interface" is this: a class that defines database method signatures. In Python, we use Protocol from the typing module - this gives us structural subtyping (essentially implicit interfaces, like Go's):
from typing import Protocol
from datetime import datetime

# User is the application's user model, defined elsewhere.

class UserRepository(Protocol):
    async def get_user_birthday_by_id(self, user_id: int) -> datetime: ...
    async def create_user(self, user: User) -> None: ...
    async def delete_user(self, user: User) -> None: ...
2) The implementation is a class with a single Unit of Work (database session) attribute.
from datetime import datetime

from sqlalchemy import select

from unit_of_work import UnitOfWork

class UserStore:  # implements UserRepository
    def __init__(self, uow: UnitOfWork) -> None:
        self._uow = uow

    async def get_user_birthday_by_id(self, user_id: int) -> datetime:
        result = await self._uow.session.execute(
            select(User.birth_date).where(User.id == user_id)
        )
        return result.scalar_one()  # raises NoResultFound if missing

    async def create_user(self, user: User) -> None:
        self._uow.session.add(user)

    async def delete_user(self, user: User) -> None:
        await self._uow.session.delete(user)
Notice: UserStore never inherits from UserRepository. Python's Protocol uses structural subtyping - if the class has the right methods with the right signatures, it satisfies the protocol automatically. No inheritance required. This is the same behavior as Go's implicit interface satisfaction.
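One thing the snippet above imports but never shows is the UnitOfWork itself. Here is a minimal sketch of what such a class might look like - an async context manager that owns one session and decides whether to commit or roll back. The session_factory parameter and the commit-on-success behavior are my assumptions, not the article's actual implementation:

```python
class UnitOfWork:
    """Hypothetical Unit of Work sketch - the text only shows `uow.session`
    being used, so the shape here is an assumption."""

    def __init__(self, session_factory) -> None:
        self._session_factory = session_factory
        self.session = None

    async def __aenter__(self) -> "UnitOfWork":
        self.session = self._session_factory()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        if exc_type is None:
            await self.session.commit()  # all repository writes land together
        else:
            await self.session.rollback()
        await self.session.close()
```

The payoff of this shape is that one UnitOfWork can be shared by several repositories within a single request, so all of their writes commit or roll back as one atomic unit.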
The Bad Way
To create a code version of the Pacific garbage patch, follow outdated tutorials and define a single, giant protocol of database methods per table.
# Lives in: src/everything/user_repository_and_motorcycle_parts_slash_comic_book_store.py

class UserRepository(Protocol):
    async def get_user_birthday_by_id(self, user_id: int) -> datetime: ...
    async def create_user(self, user: User) -> None: ...
    async def delete_user(self, user: User) -> None: ...
    # ... 50 more methods for every niche edge case
This becomes a dumping ground for every method any service in your application needs.
This is a producer-defined interface - the repository declares everything it can do, and every consumer must accept the whole thing. Any test that touches this needs to mock all 50+ methods.
Let's see how this approach scales, starting fresh.
Two engineers - Sarah and Mike - start developing separate features that work with user data. Sarah needs an add_or_upgrade_user_subscription_tier database method to upgrade paying users (or add them if they're on a Free account).
class UserRepository(Protocol):
    async def add_or_upgrade_user_subscription_tier(
        self, user_id: int, tier: Tier
    ) -> None: ...
Mike now needs a method that adds time, in days, to a user's subscription. It's somewhat related, but he doesn't have a choice where to put it - it goes in the hole.
He can either:
1) Widen the existing method: Expand Sarah's query to accommodate his work, forcing all of its callers to follow the unrelated contract he shoved into it
class UserRepository(Protocol):
    async def add_or_upgrade_user_subscription_tier_or_time(
        self, user_id: int, tier: Tier | None = None, extend_days: int | None = None
    ) -> None: ...
2) Add a near-duplicate method: break the interface segregation principle, giving every service that works with users one more single-use, near-identical method it will never call
class UserRepository(Protocol):
    async def add_or_upgrade_user_subscription_tier(
        self, user_id: int, tier: Tier
    ) -> None: ...

    async def add_or_upgrade_user_subscription_time(
        self, user_id: int, extend_days: int
    ) -> None: ...
Neither is good. The interface grows either way, becoming a Pandora's box of hundreds of tangentially related queries. The implementation of this interface is even worse: thousands of lines of unoptimized ORM code in a single file, with the unavoidable blast radius of each subsequent change climbing.
"The bigger the interface, the weaker the abstraction." - Rob Pike
Eventually, we have a god object that is injected into all parts of your code that have to touch user data. Testing your single method becomes a game of ensuring the other 50+ are mocked.
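You can watch this cost appear mechanically. With @runtime_checkable, Python will tell you that a fake implementing only the one method your test cares about does not satisfy the bloated protocol - you're on the hook for every method. A toy demonstration, with a three-method protocol standing in for the fifty-method one:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class UserRepository(Protocol):
    # Trimmed to 3 methods; imagine 50 in the real producer-defined interface.
    async def get_user_birthday_by_id(self, user_id): ...
    async def create_user(self, user): ...
    async def delete_user(self, user): ...

class MinimalFake:
    """Only stubs the one method our test actually exercises."""
    async def get_user_birthday_by_id(self, user_id):
        return None

class FullFake(MinimalFake):
    """To satisfy the protocol, we must stub everything else too."""
    async def create_user(self, user): ...
    async def delete_user(self, user): ...

print(isinstance(MinimalFake(), UserRepository))  # False - methods missing
print(isinstance(FullFake(), UserRepository))     # True
```

(Note: isinstance on a runtime_checkable protocol only checks that the methods exist, not their signatures - a static checker like mypy reports the same gap at type-check time.)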
So, how can solely changing the location of these methods transform this approach into the gold standard for maintainable database logic? By spreading it back out.
The good way
Martin Fowler didn't say the "collection-like interfaces" need to be Python files that are thousands of lines long. So let's make them smaller and more focused. Instead of defining a 1000-point Swiss army knife for your application, we let each service define exactly what it needs: just a screwdriver; a hammer and 3 nails; a butter knife and some tweezers.
We switch from producer-defined interfaces to consumer-defined interfaces, making specialized, mini-repositories per feature.
In Python, the Service defines the Protocol, and the Repository satisfies it structurally - without inheritance.
# src/features/notifications/service.py

from typing import Protocol
from datetime import datetime, timedelta

class NotificationStore(Protocol):
    """Service-owned protocol: only the db methods this feature needs."""

    async def get_last_notified(self, user_id: int) -> datetime: ...
    async def mark_as_notified(self, user_id: int) -> None: ...

class NotificationService(Protocol):
    async def notify_user_by_id(self, user_id: int) -> None: ...

class Service:
    def __init__(
        self, store: NotificationStore, notifier: NotificationService
    ) -> None:
        self._store = store
        self._notifier = notifier

    async def notify_user(self, user_id: int) -> None:
        last = await self._store.get_last_notified(user_id)
        if (datetime.utcnow() - last) < timedelta(hours=24):
            return
        await self._notifier.notify_user_by_id(user_id)
        await self._store.mark_as_notified(user_id)
The implementation lives in the same feature folder, satisfying the protocol of the service.
# src/features/notifications/notification_database.py

from datetime import datetime

from sqlalchemy import select, update

from unit_of_work import UnitOfWork

# UserModel is the SQLAlchemy user table model, defined elsewhere.

class PostgresStore:
    """Satisfies NotificationStore without inheriting from it."""

    def __init__(self, uow: UnitOfWork) -> None:
        self._uow = uow

    async def get_last_notified(self, user_id: int) -> datetime:
        result = await self._uow.session.execute(
            select(UserModel.last_notified_at).where(UserModel.id == user_id)
        )
        return result.scalar_one()  # raises NoResultFound if missing

    async def mark_as_notified(self, user_id: int) -> None:
        await self._uow.session.execute(
            update(UserModel)
            .where(UserModel.id == user_id)
            .values(last_notified_at=datetime.utcnow())
        )
This gives us clean, modular interfaces; readable and maintainable; testability; and a great separation of concerns between features - at the cost of a Protocol.
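That testability claim is easy to demonstrate. Because the protocol has only two methods, a hand-rolled in-memory fake is a few lines - no mocking library, no 50 stubs. The sketch below inlines a copy of the Service logic from above so it runs standalone:

```python
import asyncio
from datetime import datetime, timedelta

class Service:
    # Same logic as the notifications Service above, inlined for the demo.
    def __init__(self, store, notifier) -> None:
        self._store = store
        self._notifier = notifier

    async def notify_user(self, user_id: int) -> None:
        last = await self._store.get_last_notified(user_id)
        if (datetime.utcnow() - last) < timedelta(hours=24):
            return
        await self._notifier.notify_user_by_id(user_id)
        await self._store.mark_as_notified(user_id)

class FakeStore:
    """Satisfies NotificationStore structurally - two tiny methods."""
    def __init__(self, last: datetime) -> None:
        self.last = last
        self.marked: list[int] = []

    async def get_last_notified(self, user_id: int) -> datetime:
        return self.last

    async def mark_as_notified(self, user_id: int) -> None:
        self.marked.append(user_id)

class FakeNotifier:
    def __init__(self) -> None:
        self.sent: list[int] = []

    async def notify_user_by_id(self, user_id: int) -> None:
        self.sent.append(user_id)

# Last notified 2 days ago -> the user should be notified again.
store = FakeStore(datetime.utcnow() - timedelta(days=2))
notifier = FakeNotifier()
asyncio.run(Service(store, notifier).notify_user(42))
print(notifier.sent)  # [42]
print(store.marked)   # [42]
```

Compare that with mocking a 50-method god repository: here the fakes are shorter than the test setup would otherwise be.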
The drawback of this approach is repetition. If 5 services need a get_user() method, are we going to implement it 5 times?
Let's look at the nuances of this approach: how those 5 services probably aren't using the same get_user() method you think they are, and why "a little copying is better than a little dependency" - Go Proverbs.
Nuance (with pushback)
A different get_user() method per feature seems crazy - I'm with you; what happened to Don't Repeat Yourself (DRY)?
Upfront: application-wide repositories are okay and can absolutely be the right move. But they are often not needed.
Let's see who's calling this get_user() method:
- Authentication service
- Notification service
- Payment service
Each of these callers needs different parts of a user's data for radically different purposes; a user in the context of billing is fundamentally a different entity than a user in the context of authentication.
A) The authentication service wants the user's username & password.
B) The payment service just needs the user's payment info.
C) The notification service only needs an email and the last time they were notified.
Making a global get_user() method that returns a god User object that has these + 50 more attributes - just to satisfy all callers - sounds eerily similar to the interface explosion problem we just solved.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class User:
    id: int
    username: str
    email: str
    password_hash: str
    plan_id: str
    subscription_status: str
    stripe_customer_id: str
    trial_ends_at: datetime | None
    last_login_at: datetime | None
    # ... 50 more
Now every test that touches user data has to construct this entire object, even if the feature only cares about two fields.
I challenge you to always start local:
# src/features/notification_service/repository.py

from typing import Protocol
from dataclasses import dataclass
from datetime import datetime

# A user specialized for the service (defined before the protocol
# so the return annotation below can resolve it)
@dataclass(frozen=True, slots=True)
class NotificationUser:
    id: int
    contact_method: str
    contact_address: str
    last_notified_at: datetime
    has_pending_notif: bool

class MikesNotificationStore(Protocol):
    async def get_user_last_utc_time_notified(
        self, user_id: int
    ) -> datetime: ...

    async def get_all_premium_users_with_pending_notifications(
        self,
    ) -> list[NotificationUser]: ...
Allow queries to be "repeated". Forcing generic CRUD methods to work with 5 disparate features increases bad coupling, maintenance cost, and lines of code - as the User object blows up to accommodate working with everything. Specific, business-case queries are the way to go.
Lean into high-affinity coupling: components that change together belong together in the same package, not sparsely connected by a generic method. Individual implementations should be able to naturally evolve with their service - at little risk to the rest of the application. Requirements will change; the question is whether your architecture will fight or accommodate that.
A note on Go
Python's Protocol and Go's implicit interfaces work the same way conceptually: the consumer defines what it needs, and any struct/class that has the right methods satisfies it - no inheritance or explicit declaration required.
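A quick self-contained demonstration of that equivalence on the Python side. The function below is annotated against a consumer-defined protocol, and an unrelated class satisfies it purely by shape (the Greeter protocol and both classes are hypothetical toys, not from the article):

```python
import asyncio
from typing import Protocol

class Greeter(Protocol):
    # Consumer-defined: the caller declares the one method it needs.
    async def greet(self, name: str) -> str: ...

class EnglishGreeter:
    """Never mentions Greeter, yet satisfies it structurally."""
    async def greet(self, name: str) -> str:
        return f"Hello, {name}!"

async def welcome(greeter: Greeter, name: str) -> str:
    # mypy accepts EnglishGreeter here with zero inheritance - the same
    # way Go accepts any struct whose method set matches an interface.
    return await greeter.greet(name)

print(asyncio.run(welcome(EnglishGreeter(), "Ada")))  # Hello, Ada!
```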
Conclusion
Bad code is rarely a skill issue; it's a pattern issue. The wrong abstraction taught confidently in a tutorial does more damage than no abstraction at all. Knowing the benefits of the right solution matters just as much as knowing the pitfalls of the wrong one; you now have a solid grasp on both.
It wasn't Mike who caused the pileup with his added method; it was starting with the wrong pattern. Bad decisions compound. So set the standard: make a consumer-defined interface today so you aren't fighting someone else's 3000-line producer-defined interface six months from now.
Next time you're adding a database method to a shared repository, setting up a new feature, or want to refactor that 500-line Jenga block, pause for a second. Ask yourself: "does every function in my codebase need launch_user_into_space_query(), or just the one I'm working with?"
I'm Ivan, follow me for more content like this! :)