When people say "we built it multi-tenant", they usually mean one specific thing: a WHERE tenant_id = ? in their queries. But multi-tenancy is not a single decision. It's a stack of decisions, one per layer of your system: database, application code, compute, network, identity. You can mix and match. Most real production SaaS does.
This article walks through every layer where you can isolate tenants, the patterns that exist at each layer, and the trade-offs that matter once you have actual paying customers. Code examples use FastAPI + PostgreSQL because that's where most of these patterns are easiest to demonstrate, but the ideas are framework-agnostic.
What "tenant" actually means
A tenant is a unit of data ownership and configuration — usually a customer organization, sometimes an individual user, sometimes a workspace. Multi-tenancy means one logical system serves many tenants, and the system is responsible for keeping their data, behavior, and (sometimes) performance separated.
The opposite — single-tenancy — gives each customer their own deployment. Easy isolation, expensive everything else. Most SaaS lives somewhere between "everything shared" and "everything dedicated", which is exactly why this article exists.
The mental model: think in layers, not in patterns
The most useful framing I've found comes from the AWS SaaS Lens, which classifies any resource in your stack into one of three isolation models:
| Model | What it means | Trade-off |
|---|---|---|
| Silo | Each tenant gets a dedicated copy of the resource | Strong isolation, high cost & ops overhead |
| Pool | All tenants share one copy of the resource | Cheap and agile, but noisy-neighbor & blast-radius risk |
| Bridge | Hybrid — some tenants or some resources are siloed, the rest pooled | Flexibility, but more complex to operate |
The key insight: silo/pool/bridge is a choice you make per layer, not for the whole system. Microsoft's tenancy-models guide makes the same point — your UI tier might be fully pooled while your data tier is siloed per tenant, or vice versa.
Let's walk through the layers from the bottom up.
Layer 1: Database
This is where 90% of multi-tenancy discussions actually happen, because data leakage is the failure mode that ends companies. There are three canonical patterns.
1.1 Database-per-tenant (silo)
Each tenant gets a completely separate database — separate connection string, separate backups, separate everything. The application routes the request to the right DB based on the tenant identifier (usually from a subdomain, header, or JWT claim).
┌──────────────────────────────────┐
│ FastAPI App │
└──────┬──────────┬──────────┬─────┘
│ │ │
▼ ▼ ▼
┌──────┐ ┌──────┐ ┌──────┐
│ DB A │ │ DB B │ │ DB C │
└──────┘ └──────┘ └──────┘
Pros
- Strongest isolation. A bug in a query cannot leak data across tenants — they're in different databases.
- Per-tenant backup, restore, encryption keys, and even DB versions are possible.
- Easy to "lift out" a tenant for migration, deletion (GDPR), or moving to dedicated hardware.
- Noisy-neighbor problems disappear at the DB level.
Cons
- Schema migrations have to run against every tenant DB. With 5,000 tenants this becomes its own ops problem.
- Connection pools multiply. You can't keep 5,000 idle pools open; you'll need a connection router like PgBouncer or a per-request connection model.
- Cross-tenant analytics need a separate aggregation pipeline (data warehouse, replicated reporting DB).
- Per-DB fixed overhead (storage minimums, RDS instance pricing) makes the long tail of free/small tenants expensive.
Sweet spot: regulated industries (healthcare, finance), a small number of large enterprise tenants, or when tenants demand "their data lives in their own database" as a contractual term.
A FastAPI sketch for dynamic DB routing:
from fastapi import FastAPI, Depends, Header, HTTPException
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncEngine, AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

app = FastAPI()
_engines: dict[str, AsyncEngine] = {}

def get_dsn_for_tenant(tenant_id: str) -> str:
    # In real life, look this up from a "tenants" control-plane DB.
    # Validate the id first -- it is interpolated into the DSN.
    if not tenant_id.isalnum():
        raise HTTPException(status_code=400, detail="invalid tenant id")
    return f"postgresql+asyncpg://app:pass@db-host/tenant_{tenant_id}"

async def get_session(x_tenant_id: str = Header(...)) -> AsyncSession:
    if x_tenant_id not in _engines:
        _engines[x_tenant_id] = create_async_engine(
            get_dsn_for_tenant(x_tenant_id),
            pool_size=2,  # keep pools small per tenant
            max_overflow=2,
        )
    Session = sessionmaker(_engines[x_tenant_id], class_=AsyncSession)
    async with Session() as s:
        yield s

@app.get("/invoices")
async def list_invoices(session: AsyncSession = Depends(get_session)):
    # Raw SQL strings must be wrapped in text() with SQLAlchemy 2.0
    result = await session.execute(text("SELECT id, amount FROM invoices"))
    return result.mappings().all()
Caveat: caching engines in a process-local dict works for a handful of tenants. Beyond that, use an LRU and consider a serverless DB (Aurora Serverless, Neon, PlanetScale) so idle tenants cost nothing.
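A sketch of what that bounded cache might look like, reusing get_dsn_for_tenant from above. The cap of 50 and the engine_for name are illustrative, not a prescription:

from collections import OrderedDict

from sqlalchemy.ext.asyncio import AsyncEngine, create_async_engine

MAX_ENGINES = 50  # illustrative cap; tune to your connection budget

_engines: OrderedDict[str, AsyncEngine] = OrderedDict()

async def engine_for(tenant_id: str) -> AsyncEngine:
    if tenant_id in _engines:
        _engines.move_to_end(tenant_id)  # mark as most recently used
        return _engines[tenant_id]
    if len(_engines) >= MAX_ENGINES:
        # Evict the least recently used engine and close its pool
        _, old = _engines.popitem(last=False)
        await old.dispose()
    engine = create_async_engine(get_dsn_for_tenant(tenant_id), pool_size=2)
    _engines[tenant_id] = engine
    return engine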
1.2 Schema-per-tenant (bridge)
One physical database, but each tenant gets their own PostgreSQL schema (or MySQL "database", which is the same concept under a different name). Tables are duplicated per tenant: tenant_a.invoices, tenant_b.invoices, etc.
Pros
- Stronger isolation than a shared schema — no chance of writing a WHERE clause wrong and leaking rows.
- Migrations still touch one physical DB, though they have to be applied to each schema.
- search_path lets you "switch tenants" cheaply on a single connection.
Cons
- The system-catalog tables (pg_class, pg_attribute) become huge once you have thousands of schemas. Queries against catalogs slow down, and tools like pg_dump get unhappy.
- Migrations across N schemas are still N migrations. They tend to be transactional per schema, which becomes painful at scale.
- ORM tooling (especially SQLAlchemy with Alembic, or Django) needs explicit configuration to discover and migrate per-tenant schemas.
Sweet spot: medium-sized SaaS with tens to low hundreds of tenants and per-tenant data that isn't well-suited to a single shared table (e.g., heavily customized per-tenant tables).
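Provisioning a new tenant is then a CREATE SCHEMA plus your DDL, typically run from an onboarding job. A minimal sketch (tenant_acme is a placeholder; note the per-tenant tables need no tenant_id column, because the schema is the boundary):

-- Run once per new tenant, from your provisioning job
CREATE SCHEMA tenant_acme;
SET search_path TO tenant_acme;

CREATE TABLE invoices (
    id UUID PRIMARY KEY,
    amount NUMERIC(12, 2) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);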
Setting search path per request in FastAPI:
from contextvars import ContextVar

from sqlalchemy import event
from sqlalchemy.ext.asyncio import create_async_engine

current_tenant_var: ContextVar[str] = ContextVar("current_tenant")
engine = create_async_engine("postgresql+asyncpg://app:pass@db-host/saas")

@event.listens_for(engine.sync_engine, "checkout")
def set_search_path(dbapi_conn, conn_record, conn_proxy):
    # Pull current tenant from a contextvar set by middleware.
    # The value must be an allowlisted schema name -- it is
    # interpolated directly into SQL below.
    tenant = current_tenant_var.get()
    cur = dbapi_conn.cursor()
    cur.execute(f'SET search_path TO "{tenant}", public')
    cur.close()
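The contextvar needs to be populated before any connection is checked out for the request. A minimal middleware sketch, assuming the tenant is resolved from a header and that KNOWN_TENANT_SCHEMAS is an allowlist loaded from your control plane (both are placeholders):

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
KNOWN_TENANT_SCHEMAS = {"tenant_acme", "tenant_globex"}  # placeholder allowlist

@app.middleware("http")
async def set_tenant_context(request: Request, call_next):
    tenant = request.headers.get("X-Tenant-Id", "")
    if tenant not in KNOWN_TENANT_SCHEMAS:
        # Reject unknown tenants before any SQL is built from the name
        return JSONResponse(status_code=400, content={"detail": "unknown tenant"})
    token = current_tenant_var.set(tenant)
    try:
        return await call_next(request)
    finally:
        # Always reset so the context can't bleed into another request
        current_tenant_var.reset(token)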
1.3 Shared schema with tenant_id (pool)
One database, one set of tables, and a tenant_id column on every tenant-owned table. Every query filters by it.
CREATE TABLE invoices (
id UUID PRIMARY KEY,
tenant_id UUID NOT NULL,
amount NUMERIC(12, 2) NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_invoices_tenant ON invoices(tenant_id);
Pros
- Cheapest by far. One DB, one schema, one set of indexes.
- Migrations are a single ALTER TABLE.
- Cross-tenant analytics is just a query.
Cons
- Isolation is now entirely the application's responsibility. One missing WHERE tenant_id = ? and a bug becomes a security incident. As the Crunchy Data blog puts it, this approach forces you to redo the same filtering pattern in every query.
- Per-tenant indexes don't exist; large tenants and small tenants share the same B-trees. Noisy-neighbor at the storage level.
- "Delete this tenant's data" becomes a multi-table cascade exercise.
This is the default starting point for most SaaS, and the place most people quietly graduate from once their first enterprise customer asks "where exactly is my data?".
1.4 The upgrade path: PostgreSQL Row-Level Security (RLS)
Shared-schema isolation gets safer when you stop trusting the application to add the filter and let the database enforce it. PostgreSQL's Row-Level Security lets you write a policy once and have it appended to every query automatically.
-- 1. Enable RLS
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
-- 2. Define a policy keyed on a session variable
CREATE POLICY tenant_isolation ON invoices
USING (tenant_id = current_setting('app.current_tenant')::uuid);
-- 3. Application sets the variable per request
SET LOCAL app.current_tenant = '4f3a...';
Now SELECT * FROM invoices only ever returns rows for the current tenant, even if a developer forgets the WHERE clause. The database becomes the last line of defense, not the first.
The FastAPI integration looks like this:
from fastapi import FastAPI, Depends, HTTPException, Request
from sqlalchemy import text
from sqlalchemy.ext.asyncio import async_sessionmaker, AsyncSession, create_async_engine

app = FastAPI()
engine = create_async_engine("postgresql+asyncpg://app_user:pass@db-host/saas")
AsyncSessionLocal = async_sessionmaker(engine)

@app.middleware("http")
async def extract_tenant(request: Request, call_next):
    # Pull from JWT, subdomain, header — whatever fits your auth
    request.state.tenant_id = request.headers.get("X-Tenant-Id")
    return await call_next(request)

async def get_tenant_session(request: Request) -> AsyncSession:
    if request.state.tenant_id is None:
        raise HTTPException(status_code=400, detail="missing tenant")
    async with AsyncSessionLocal() as session:
        # set_config(..., true) is the parameterizable equivalent of
        # SET LOCAL (which cannot take bind parameters): scoped to the
        # transaction — perfect for a request
        await session.execute(
            text("SELECT set_config('app.current_tenant', :tid, true)"),
            {"tid": request.state.tenant_id},
        )
        yield session

@app.get("/invoices")
async def list_invoices(session: AsyncSession = Depends(get_tenant_session)):
    rows = await session.execute(text("SELECT id, amount FROM invoices"))
    # RLS already filtered. No WHERE clause needed.
    return rows.mappings().all()
Two important details:
- Use a database role that cannot bypass RLS. Superusers and table owners ignore policies by default. Create a dedicated application role and grant only what it needs (see the SQL sketch after this list).
- SET LOCAL is transaction-scoped (as is set_config(..., true)). Make sure your session/transaction lifecycle aligns with a single request, or you'll leak the tenant context across requests on a pooled connection.
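A sketch of that role setup (the role name and grants are illustrative). FORCE ROW LEVEL SECURITY additionally applies policies to the table owner, closing that loophole too:

-- Application role that cannot bypass policies
CREATE ROLE app_user LOGIN NOBYPASSRLS PASSWORD 'change-me';
GRANT SELECT, INSERT, UPDATE, DELETE ON invoices TO app_user;

-- Apply policies even to the table owner
ALTER TABLE invoices FORCE ROW LEVEL SECURITY;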
AWS has a detailed walkthrough of this pattern that's worth reading before rolling your own.
Layer 2: Application / Compute
Database isolation is what most people think about. Application-layer isolation is what protects you when a single tenant's traffic, a memory leak, or a bad deploy threatens to take down everyone else.
2.1 Fully pooled application
One application deployment, one process pool, all tenants share the same instances. Tenant is resolved per request from a header, subdomain, or token.
This is the default and it's fine — until it isn't:
- A single tenant pushing 10× traffic monopolizes worker processes.
- An OOM kill takes everyone down.
- A bad deploy hits all tenants simultaneously.
Mitigations: per-tenant rate limiting at the gateway, circuit breakers around tenant-specific code paths, and, most importantly, progressive deploys (canary, blue-green) so one bad release doesn't break every customer at once.
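As a sketch of the first mitigation, here is a naive in-process token bucket keyed by tenant. Real systems do this at the gateway or in Redis, and the rate numbers here are purely illustrative:

import time

from fastapi import HTTPException, Request

RATE = 10.0   # tokens added per second, per tenant (illustrative)
BURST = 20.0  # bucket capacity (illustrative)

_buckets: dict[str, tuple[float, float]] = {}  # tenant -> (tokens, last_ts)

def check_rate_limit(request: Request) -> None:
    tenant = request.headers.get("X-Tenant-Id", "anonymous")
    tokens, last = _buckets.get(tenant, (BURST, time.monotonic()))
    now = time.monotonic()
    tokens = min(BURST, tokens + (now - last) * RATE)  # refill since last call
    if tokens < 1.0:
        raise HTTPException(status_code=429, detail="tenant rate limit exceeded")
    _buckets[tenant] = (tokens - 1.0, now)

Attach it to hot routes with dependencies=[Depends(check_rate_limit)].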
2.2 Container/pod per tenant (silo)
Each tenant gets their own application container, often behind a shared ingress that routes by subdomain or path. In Kubernetes this typically maps to one Deployment + Service per tenant, sometimes in a dedicated namespace.
Pros
- Per-tenant resource limits (CPU, memory) are easy.
- A crash, memory leak, or runaway loop is contained to one tenant.
- You can pin specific tenants to specific node pools, regions, or even hardware classes.
- Per-tenant version pinning becomes possible — useful for customers who veto your weekly deploys.
Cons
- Per-pod baseline cost (sidecars, JVM warmup, language runtime memory). At ~150MB per Python pod, 1,000 tenants is 150GB of RAM doing nothing.
- Deploy pipelines must roll across N deployments.
- Service discovery and routing become a real piece of infrastructure rather than a single Service.
Sweet spot: when tenants have very different load profiles, or when you need to silo certain enterprise tenants while keeping smaller ones pooled (a textbook bridge model).
2.3 The hybrid pattern most companies actually land on
The AWS SaaS Tenant Isolation Strategies whitepaper calls this the bridge model: the web/API tier is pooled (one shared deployment serving everyone), but the data tier — or specific microservices that are particularly sensitive or particularly heavy — runs siloed per tenant. Most production SaaS ends up here, even if they don't name it that way.
A common shape:
- Frontend: pooled (CDN + static assets)
- API gateway: pooled
- Application services: pooled, but with per-tenant rate limits
- Background job workers: siloed for heavy/enterprise tenants (so one tenant's giant import doesn't starve everyone else's queue)
- Database: shared schema with RLS for small tenants, dedicated DB for enterprise tier
Layer 3: Infrastructure & Orchestration
This is the level above your application: clusters, accounts, VPCs.
3.1 Cluster per tenant
Each tenant gets a dedicated Kubernetes cluster (or equivalent). This is the most aggressive silo. It's expensive, slow to provision, and overkill for most workloads — but it's the only way to get true isolation from container-escape vulnerabilities or to satisfy regulators who don't accept namespace isolation as sufficient.
3.2 Namespace per tenant
Kubernetes namespaces give you logical isolation with NetworkPolicies, ResourceQuotas, and RBAC, while sharing the underlying nodes. This is the workhorse pattern for cloud-native multi-tenancy.
What namespace isolation actually buys you:
- NetworkPolicy prevents pods in tenant A's namespace from talking to pods in tenant B's.
- ResourceQuota / LimitRange caps how much CPU and memory each namespace can consume.
- RBAC restricts which users/service accounts can act inside each namespace.
What it does not buy you:
- Kernel-level isolation. A container-escape exploit affects every tenant on the same node.
- Per-tenant control-plane resources (CRDs, cluster-scoped objects).
- True noisy-neighbor protection at the storage or network IO layer (you need taints, node pools, or virtual clusters for that).
3.3 Account/subscription per tenant
The nuclear option: each tenant gets their own AWS account, Azure subscription, or GCP project. Used by some enterprise B2B SaaS, especially those who deploy into the customer's cloud (BYOC — "bring your own cloud"). Strongest isolation, fully delegated billing per tenant, but a serious automation effort to onboard and manage.
Layer 4: Storage, Cache, and Queues
Often forgotten until something goes wrong.
| Resource | Pool pattern | Silo pattern |
|---|---|---|
| Object storage (S3, etc.) | Shared bucket with tenant_id/... key prefix | Bucket per tenant |
| Cache (Redis) | Shared instance, keys prefixed with tenant:{id}: | Database number per tenant, or Redis instance per tenant |
| Queues | Shared topic with tenant in message metadata | Queue/topic per tenant |
| Search (Elasticsearch) | Single index with tenant_id filter | Index per tenant |
The pool patterns are cheaper and easier to operate; the silo patterns matter when you need per-tenant retention policies, encryption keys, or noisy-neighbor isolation. Two specific things bite people:
- Cache poisoning across tenants. If your cache key derivation forgets the tenant prefix even once, you'll serve tenant A's data to tenant B from cache. Centralize cache key construction in one helper (see the sketch after this list).
- Queue starvation. A single shared queue means one tenant's huge batch job will delay every other tenant's messages. Either give heavy tenants their own queues, or use per-tenant fair-share scheduling.
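For the first pitfall, a centralized helper might look like this. A minimal sketch, reusing the request-scoped contextvar pattern from the search_path example:

from contextvars import ContextVar

# Same request-scoped contextvar idea as in the search_path example
current_tenant_var: ContextVar[str] = ContextVar("current_tenant")

def tenant_key(*parts: str) -> str:
    # Every cache key goes through here, so the tenant prefix
    # can never be forgotten at individual call sites
    tenant = current_tenant_var.get()  # raises LookupError if unset -- good
    return ":".join(["tenant", tenant, *parts])

# e.g. redis.get(tenant_key("invoices", "summary"))
# -> "tenant:4f3a...:invoices:summary"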
Layer 5: Identity & Auth
The identity layer determines who a request belongs to before any of the other layers can isolate anything. There are two common shapes:
- Shared identity provider, tenant claim in the token. One IdP (Auth0, Cognito, Keycloak, etc.), and every issued token carries a tenant_id claim. Cheap, but every tenant uses the same login UI by default.
- Tenant-specific identity provider / federation. Each tenant connects their own SAML or OIDC provider. Required for B2B enterprise — large customers want their own SSO, MFA policies, and user lifecycle to govern your app.
The auth layer is also where you do the first security check: validate the tenant claim against the resource being accessed. The database layer (RLS) is your second check. Defense in depth.
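A sketch of that first check as a FastAPI dependency. The claim extraction is schematic: decode_jwt stands in for whatever token verifier you actually use:

from fastapi import Depends, FastAPI, HTTPException, Request

app = FastAPI()

async def require_tenant(request: Request) -> str:
    # decode_jwt is a hypothetical stand-in for your real token verifier
    claims = decode_jwt(request.headers.get("Authorization", ""))
    token_tenant = claims.get("tenant_id")
    # The tenant the URL addresses, e.g. /t/{tenant}/invoices
    path_tenant = request.path_params.get("tenant")
    if not token_tenant or token_tenant != path_tenant:
        raise HTTPException(status_code=403, detail="tenant mismatch")
    return token_tenant

@app.get("/t/{tenant}/invoices")
async def list_invoices(tenant: str = Depends(require_tenant)):
    ...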
Putting it together: tier-based isolation
In practice, you don't pick one model. You pick a default and then add exceptions:
Pooled Siloed
────── ──────
Free tier ████████
Pro tier ████████
Enterprise tier ████ → → ████████
Free and Pro tenants share the pool. Enterprise tenants get dedicated databases (and maybe dedicated workers), priced accordingly. This is the tier-based isolation pattern in AWS's whitepaper — and it's how almost every mid-sized SaaS actually charges enterprise customers more.
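In application code, tier-based isolation usually reduces to a control-plane lookup at request time. A sketch, assuming a tenants metadata table with tier and dedicated_dsn columns and an asyncpg connection to the control-plane DB (all assumptions):

POOLED_DSN = "postgresql+asyncpg://app:pass@pool-db/saas"  # shared cluster

async def dsn_for_tenant(tenant_id: str, control_plane) -> str:
    # control_plane: asyncpg connection to your metadata DB (assumed)
    row = await control_plane.fetchrow(
        "SELECT tier, dedicated_dsn FROM tenants WHERE id = $1", tenant_id
    )
    if row is None:
        raise ValueError(f"unknown tenant {tenant_id}")
    # Enterprise tenants get their silo; everyone else shares the pool + RLS
    return row["dedicated_dsn"] if row["tier"] == "enterprise" else POOLED_DSN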
Decision matrix
A pragmatic shortcut when you're starting fresh:
| If you... | Start with |
|---|---|
| Are pre-revenue, < 50 tenants, B2C | Pool everything. Shared schema with tenant_id. |
| Are early B2B, signing your first paid customers | Shared schema + PostgreSQL RLS. Two layers of defense for the cost of one. |
| Have regulated workloads (HIPAA, PCI, data residency) | Database-per-tenant for affected data, pool the rest. |
| Have one or two whale customers killing your shared cluster | Silo those tenants' workers and databases; bridge model. |
| Are deploying into customer clouds | Account-per-tenant, full silo. Build heavy automation. |
The single most common mistake is over-isolating early. Database-per-tenant from day one means you'll spend the first year of your company writing migration tooling instead of features. Start pooled, harden with RLS, and silo specific tenants on demand.
Common pitfalls
A non-exhaustive list of footguns I've seen (and stepped on):
- Trusting the tenant_id in the request body. It should always come from the verified token, never from the request payload.
- A "god mode" admin connection that bypasses RLS, intended for background jobs, accidentally ending up serving a user request.
- Forgetting that connection pooling reuses the same Postgres session. SET (not SET LOCAL) leaks tenant context to the next request on that connection. Always use SET LOCAL.
- Per-tenant migrations that run for hours. When you have 500 tenants and a migration takes 20 seconds each, that's almost three hours of staggered downtime.
- Tenant-scoped caches that aren't scoped. Memoization decorators with no tenant in the cache key are a classic data-leak vector.
- No "delete tenant" runbook. GDPR/CCPA requests will come. Decide early whether your storage architecture can answer them in days or weeks.
TL;DR
- Multi-tenancy is not one decision — it's a decision per layer (database, application, compute, network, identity).
- AWS's silo / pool / bridge model is the most useful vocabulary for talking about isolation per layer.
- For most SaaS, the right starting point is a shared schema with PostgreSQL Row-Level Security, plus a plan to silo specific tenants when they grow into it.
- Per-tier isolation (free pooled, enterprise siloed) is the architecture that aligns with how you actually price the product.
- Defense in depth: validate tenant at the auth layer, enforce it at the database layer, monitor it everywhere in between.
References
- AWS Well-Architected Framework — SaaS Lens: Silo, Pool, and Bridge Models
- AWS Whitepaper: SaaS Tenant Isolation Strategies
- AWS Whitepaper: Multi-Tenant SaaS Storage Strategies
- AWS Guidance: Multi-Tenant Architectures on AWS
- AWS Database Blog: Multi-tenant data isolation with PostgreSQL Row Level Security
- Microsoft Learn: Tenancy models for a multitenant solution
- PostgreSQL Documentation: Row Security Policies
- Crunchy Data Blog: Row Level Security for Tenants in Postgres
- Rafay: What is Multi-Tenancy? Multi-Tenant Architecture
- Logto Blog: Multi-tenancy implementation with PostgreSQL
- Alasco Tech: Django friends with Postgres Row-level Security
- Just After Midnight: Tenant isolation in SaaS — pool, silo and bridge models explained
Top comments (1)
Solid breakdown. One quibble on the RLS-as-starting-point recommendation: it's the right answer for the security model, but it's also where most teams hit a perf wall they didn't expect. Every policy gets re-evaluated per row, and once you nest auth.jwt() calls inside EXISTS subqueries the query plan falls apart. The fix is wrapping auth.jwt() in a STABLE function, or pinning the tenant id into a session GUC at connection time and reading that in the policy. Worth a footnote; the cost-vs-isolation table doesn't capture this one.