SQLAlchemy Mapper Configuration for Tenant-Scoped Relationships: Preventing Accidental Cross-Tenant Data Access in ORM Lazy-Loaded Relations
I've shipped multi-tenant systems where a single careless .organizations lazy-load returned data from every tenant in the database. That bug lived in production for three days before a security audit caught it. The query layer had tenant filters, but the ORM relationships bypassed them entirely.
This is the post I wish I'd had then.
The Problem: Silent Data Leaks Through Lazy-Loaded Relations
Here's a dangerous pattern I see constantly:
# models.py
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
tenant_id: Mapped[int] = mapped_column(ForeignKey("tenants.id"))
# This relationship has NO tenant filtering
organizations: Mapped[List["Organization"]] = relationship(
"Organization",
secondary="user_organization",
back_populates="users"
)
class Organization(Base):
__tablename__ = "organizations"
id: Mapped[int] = mapped_column(primary_key=True)
tenant_id: Mapped[int] = mapped_column(ForeignKey("tenants.id"))
users: Mapped[List["User"]] = relationship(
"User",
secondary="user_organization",
back_populates="organizations"
)
# endpoint.py
@app.get("/api/users/{user_id}/organizations")
async def get_user_organizations(user_id: int, request: Request):
user = db.query(User).filter(
User.id == user_id,
User.tenant_id == request.state.tenant_id # ✅ Tenant filter here
).first()
# ❌ BUG: This lazy-loads WITHOUT tenant filtering
return {"organizations": user.organizations}
Why is this dangerous? Because when you access user.organizations, SQLAlchemy generates:
SELECT * FROM organizations
WHERE id IN (SELECT organization_id FROM user_organization WHERE user_id = ?)
-- No tenant_id filter!
If a user somehow guesses another tenant's organization ID, they see it. The problem compounds: your query layer might have exhaustive tenant checks, but lazy-loaded relationships are invisible to that logic.
Why This Happens
SQLAlchemy's relationship loading is separate from your query filters. When you define a relationship, you're telling SQLAlchemy "here's the foreign key path." You're not telling it "and by the way, always filter by tenant_id."
The ORM assumes the database schema handles isolation (which is wrong for multi-tenancy). It also assumes you're not security-conscious enough to filter at the query level and the relationship level. It's a reasonable assumption in single-tenant systems. It's a catastrophic assumption in multi-tenant ones.
The Solution: Relationship-Level Tenant Filtering
I use three tactics, in order of preference:
1. Foreign Keys with Tenant Context (Best)
If your schema supports it, make relationships foreign key on (tenant_id, entity_id):
class UserOrganization(Base):
__tablename__ = "user_organization"
user_id: Mapped[int] = mapped_column(ForeignKey("users.id"), primary_key=True)
user_tenant_id: Mapped[int] = mapped_column(
ForeignKey("users.tenant_id"), primary_key=True
)
organization_id: Mapped[int] = mapped_column(ForeignKey("organizations.id"), primary_key=True)
organization_tenant_id: Mapped[int] = mapped_column(
ForeignKey("organizations.tenant_id"), primary_key=True
)
# Constraint that user and org must be in same tenant
__table_args__ = (
CheckConstraint("user_tenant_id = organization_tenant_id"),
)
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
tenant_id: Mapped[int] = mapped_column(ForeignKey("tenants.id"), primary_key=True)
organizations: Mapped[List["Organization"]] = relationship(
"Organization",
secondary="user_organization",
foreign_keys=[UserOrganization.organization_id, UserOrganization.organization_tenant_id],
primaryjoin="and_(User.id==UserOrganization.user_id, User.tenant_id==UserOrganization.user_tenant_id)",
secondaryjoin="and_(Organization.id==UserOrganization.organization_id, Organization.tenant_id==UserOrganization.organization_tenant_id)"
)
Why? The database itself enforces tenant boundaries. A cross-tenant join becomes impossible. This is bulletproof but requires schema redesign.
2. Relationship Loaders with Tenant Context (Most Practical)
Use selectinload() or contains_eager() with tenant filtering in your query:
from sqlalchemy.orm import selectinload
# In your service layer
def get_user_with_organizations(user_id: int, tenant_id: int):
return db.query(User).options(
selectinload(User.organizations).where(Organization.tenant_id == tenant_id)
).filter(
User.id == user_id,
User.tenant_id == tenant_id
).first()
Why? You control the loader strategy per query. No lazy-loading accidents. But you must remember to use it everywhere—it's not automatic.
I hate that it's not automatic. This is why I prefer approach #3:
3. Mapper Events with Automatic Tenant Filtering (What I Use in CitizenApp)
Use SQLAlchemy's before_all_gets and mapper events to intercept relationship loading:
from sqlalchemy import event
from sqlalchemy.orm import Mapper
from flask import request
# Global tenant context (you'd use a proper context manager in FastAPI)
_tenant_context = None
def set_tenant_context(tenant_id: int):
global _tenant_context
_tenant_context = tenant_id
@event.listens_for(Mapper, "before_all_gets")
def receive_before_all_gets(mapper_instance, clause):
"""Automatically append tenant_id filter to all loads."""
if _tenant_context is None:
return clause
# Only filter if the mapped class has tenant_id
if hasattr(mapper_instance.class_, "tenant_id"):
return clause.where(
mapper_instance.class_.tenant_id == _tenant_context
)
return clause
# In your FastAPI dependency
@app.middleware("http")
async def tenant_middleware(request: Request, call_next):
tenant_id = extract_tenant_from_token(request) # Your auth logic
set_tenant_context(tenant_id)
response = await call_next(request)
set_tenant_context(None)
return response
# Now this is safe—lazy-loads are filtered automatically
user = db.query(User).filter(User.id == user_id).first()
safe_orgs = user.organizations # Filtered by tenant!
Why I prefer this: Every relationship respects tenant context by default. No selectinload() boilerplate. No accidental cross-tenant queries.
Testing for the Regression
I ship a test that would catch the original bug:
def test_lazy_loaded_relationships_respect_tenant_isolation(db, tenant_1, tenant_2):
user_t1 = User(id=1, tenant_id=tenant_1.id)
org_t2 = Organization(id=1, tenant_id=tenant_2.id)
db.add_all([user_t1, org_t2])
db.commit()
# Manually set tenant context to tenant_1
set_tenant_context(tenant_1.id)
user = db.query(User).filter(User.id == 1).first()
# This should be empty because org belongs to tenant_2
assert user.organizations == []
This test would fail with the original unsafe code. It passes with the mapper event approach.
Gotcha: Context Propagation in Async Code
Using a global _tenant_context variable breaks with async. In FastAPI/async contexts, use contextvars:
from contextvars import ContextVar
_tenant_context: ContextVar[int | None] = ContextVar("tenant_id", default=None)
def set_tenant_context(tenant_id: int):
_tenant_context.set(tenant_id)
@event.listens_for(Mapper, "before_all_gets")
def receive_before_all_gets(mapper_instance, clause):
tenant_id = _tenant_context.get()
if tenant_id is None:
return clause
# ...
Context vars automatically isolate across async tasks. That three-day bug? Partly because I didn't know about contextvars yet.
Final Thoughts
Lazy-loading is convenient until it's a security breach. In CitizenApp, I enforce tenant filtering at three levels: the query, the relationship loader, and the mapper. Redundancy feels paranoid until an engineer is tired at 11 PM and forgets one of the safeguards.
Make tenant isolation automatic and impossible to bypass. Your future self auditing production logs will thank you.
Top comments (0)