Joud Awad

Posted on Jun 26

51/60 Days System Design Questions

#abotwrotethis #systemdesign #database #software

You're building a B2B SaaS product. 50 business customers. Each one wants their data isolated. Some are on free plans. A few are paying $50k/year and demanding SLA guarantees.

Your current setup:
→ One database. One schema. A tenant_id column on every table.
→ One app server handling all traffic.
→ A free-tier customer running a badly written bulk export just hammered your DB for 40 seconds, and a paying enterprise customer's checkout flow timed out.

Your investors are not happy. Neither is that enterprise customer.

The engineering question: how do you isolate tenants without rebuilding the whole product?

A) Keep one shared DB — add row-level security + query budgets per tenant to enforce limits.

B) Schema-per-tenant — every customer gets their own schema in the same Postgres instance, migrations run per-schema.

C) Database-per-tenant (silo model) — each enterprise customer gets a dedicated DB. Free tier stays pooled.

D) Middleware bridge — route requests to tenant-specific DB clusters based on a tenant registry, free tier stays on shared pool.

One of these is a band-aid. One will collapse under 500 tenants. One is how Notion, Salesforce, and most serious B2B companies at scale actually operate.

Pick one — A, B, C, or D — and tell me why. Full breakdown in the comments (including the noisy neighbor failure mode nobody warns you about).

If your team is building multi-tenant systems, share this. The wrong isolation model is a rewrite waiting to happen.

Drop your answer 👇

#30DaysOfSystemDesign #SystemDesign #BackendEngineering #SoftwareArchitecture

Top comments (4)

Joud Awad • Jun 26

Why D wins (Middleware bridge — hybrid model):

This is the pattern Salesforce, HubSpot, and most mature B2B SaaS companies eventually converge on.

The architecture:
→ Free/small tenants → shared pool DB (cost-efficient, acceptable risk)
→ Mid-tier tenants → schema-per-tenant in a shared Postgres cluster
→ Enterprise tenants → dedicated DB cluster (full isolation, SLA-able)

A tenant registry (Redis or a lightweight service) maps tenant_id → connection config at request time. Your app server doesn't care which model a tenant uses — it asks the registry and gets a connection string back.
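Here is a minimal Python sketch of that lookup. It assumes an in-memory dict stands in for Redis or the registry service. The `Tier` names, the `ConnectionConfig` fields, and the `schema_tenant_<id>` naming are hypothetical and chosen only for illustration. They are not taken from any particular product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Tier(Enum):
    POOLED = "pooled"        # free/small: shared DB, shared tables, tenant_id column
    SCHEMA = "schema"        # mid-tier: own schema in a shared Postgres cluster
    DEDICATED = "dedicated"  # enterprise: own database cluster


@dataclass(frozen=True)
class ConnectionConfig:
    dsn: str             # connection string the app server should use
    search_path: str     # schema to target for this tenant
    tenant_filter: bool  # True if queries must still filter on tenant_id


class TenantRegistry:
    """In-memory stand-in for the tenant registry (hypothetical API)."""

    def __init__(self, shared_pool_dsn: str, schema_cluster_dsn: str) -> None:
        self._shared_pool_dsn = shared_pool_dsn
        self._schema_cluster_dsn = schema_cluster_dsn
        self._tiers: Dict[str, Tier] = {}
        self._dedicated_dsns: Dict[str, str] = {}

    def register(self, tenant_id: str, tier: Tier,
                 dedicated_dsn: Optional[str] = None) -> None:
        """Assign (or reassign) a tenant to an isolation tier."""
        if not tenant_id.isalnum():
            # Tenant ids end up in schema names, so keep them to safe characters.
            raise ValueError(f"unsafe tenant id: {tenant_id!r}")
        if tier is Tier.DEDICATED:
            if not dedicated_dsn:
                raise ValueError("dedicated tenants need their own DSN")
            self._dedicated_dsns[tenant_id] = dedicated_dsn
        self._tiers[tenant_id] = tier

    def resolve(self, tenant_id: str) -> ConnectionConfig:
        """Called per request: tenant_id in, connection config out."""
        # Unknown tenants default to the shared pool (a policy assumption).
        tier = self._tiers.get(tenant_id, Tier.POOLED)
        if tier is Tier.DEDICATED:
            return ConnectionConfig(self._dedicated_dsns[tenant_id], "public", False)
        if tier is Tier.SCHEMA:
            return ConnectionConfig(self._schema_cluster_dsn,
                                    f"schema_tenant_{tenant_id}", False)
        return ConnectionConfig(self._shared_pool_dsn, "public", True)
```

On each request, the app calls `resolve(tenant_id)` and connects with whatever comes back. Once a tenant's data has been copied to its new home, promoting it up the ladder could then be a single `register` call with the new tier.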

The payoff:
→ The noisy neighbor is contained. A runaway free-tier export can only hurt other free-tier tenants.
→ Enterprise customers can't touch each other. Period.
→ You can migrate a tenant up the isolation ladder as they grow — without touching app code.
→ GDPR deletion is one DB drop for enterprise tenants, not a cascade delete across shared tables.
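To make the deletion point concrete, the sketch below shows how the erase step differs by tier. It assumes Postgres, the same hypothetical three tiers, and psycopg-style `%s` placeholders. It only builds (SQL, params) pairs and does not run them. The `tenant_db_<id>` and `schema_tenant_<id>` names are illustrative. Real erasure also has to cover backups, replicas, and logs, and this sketch does not address those.

```python
from enum import Enum
from typing import List, Tuple


class Tier(Enum):
    POOLED = "pooled"
    SCHEMA = "schema"
    DEDICATED = "dedicated"


Statement = Tuple[str, Tuple[str, ...]]  # (SQL text, bound parameters)


def deletion_plan(tenant_id: str, tier: Tier,
                  pooled_tables: List[str]) -> List[Statement]:
    """Return the statements that erase one tenant's data (sketch, not executed).

    pooled_tables is assumed to be a trusted, hard-coded list of shared tables.
    """
    if not tenant_id.isalnum():
        # Identifiers can't be bound as parameters, so restrict what gets interpolated.
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    if tier is Tier.DEDICATED:
        # The whole database belongs to the tenant: one drop.
        # (Must be run from another database, with no open connections to this one.)
        return [(f"DROP DATABASE tenant_db_{tenant_id};", ())]
    if tier is Tier.SCHEMA:
        # Every table the tenant owns lives in one schema.
        return [(f"DROP SCHEMA schema_tenant_{tenant_id} CASCADE;", ())]
    # Pooled: row deletes in every shared table, ordered child tables first
    # so foreign keys are satisfied without relying on ON DELETE CASCADE.
    return [(f"DELETE FROM {table} WHERE tenant_id = %s;", (tenant_id,))
            for table in pooled_tables]
```

The contrast is the post's point. A siloed tenant needs one statement. A pooled tenant needs one delete per shared table, plus a correct ordering that you have to maintain as the schema evolves.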

Joud Awad • Jun 26

Why A fails (Row-level security + query budgets):

RLS works. Query budgets can work. But every tenant still shares the same DB process. If one tenant opens 500 connections, triggers a table lock, or fills the WAL, the damage bleeds into every other tenant on the instance. You can throttle CPU, but you can't throttle the I/O blast radius once it's in the buffer pool.
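For reference, a per-tenant query budget of the kind option A relies on might look like this app-side token bucket, where tokens are seconds of estimated DB time. The class name and the up-front cost estimate are assumptions for illustration. The sketch also shows the limitation described above. The budget only decides whether a query is admitted. Once a query is admitted, its locks, WAL writes, and buffer-pool churn land on the shared instance regardless.

```python
import time
from typing import Callable, Dict


class TenantQueryBudget:
    """Per-tenant token bucket of DB time (hypothetical, app-side only).

    capacity: max query-seconds a tenant can spend in a burst.
    refill_per_sec: query-seconds granted back per wall-clock second.
    """

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens: Dict[str, float] = {}
        self._last: Dict[str, float] = {}

    def _refill(self, tenant_id: str) -> float:
        # New tenants start with a full bucket; others regain time since last check.
        now = self._clock()
        tokens = self._tokens.get(tenant_id, self.capacity)
        last = self._last.get(tenant_id, now)
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        self._tokens[tenant_id] = tokens
        self._last[tenant_id] = now
        return tokens

    def try_admit(self, tenant_id: str, estimated_seconds: float) -> bool:
        """Admit a query only if the tenant has budget left for its estimated cost.

        This gates admission only: it cannot undo a lock or WAL burst
        caused by a query that was already admitted.
        """
        tokens = self._refill(tenant_id)
        if tokens < estimated_seconds:
            return False
        self._tokens[tenant_id] = tokens - estimated_seconds
        return True
```

Postgres's `statement_timeout` setting can add a server-side cap per statement. Even so, neither mechanism isolates what a running query does to its neighbors.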

This is the band-aid answer. Fine at 5 tenants. Falls apart at 50.

Joud Awad • Jun 26

Why B is dangerous (Schema-per-tenant):

Schema-per-tenant sounds clean. It is, until:
→ 300 tenants need a migration. That's 300 sequential DDL operations on one Postgres instance. Migration windows become a multi-hour risk event.
→ At 500 tenants you're burning connection slots faster than pooling through PgBouncer can win them back.
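→ One long-running query in schema_tenant_247 still runs on the same instance as every other schema. It competes for the same CPU, I/O, and shared buffers, and a long-open transaction holds back VACUUM for the whole database. The noisy neighbor hasn't gone away; you just can't see it as clearly.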
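The sketch below illustrates why per-schema migrations turn into a risk event. The runner applies one DDL change to each tenant schema in turn. The `execute` callback is a hypothetical hook; in practice it might set `search_path` and run the DDL in a transaction. Here it is stubbed so the failure mode is visible: a failure partway through leaves the fleet on mixed schema versions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MigrationReport:
    applied: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)


def migrate_all_schemas(schemas: List[str], ddl: str,
                        execute: Callable[[str, str], None],
                        stop_on_error: bool = True) -> MigrationReport:
    """Apply one DDL change to every tenant schema, strictly one after another."""
    report = MigrationReport()
    for i, schema in enumerate(schemas):
        try:
            # One DDL run per schema, each taking its own locks on the shared instance.
            execute(schema, ddl)
            report.applied.append(schema)
        except Exception:
            report.failed.append(schema)
            if stop_on_error:
                # Everything after the failure stays on the old schema version.
                report.skipped.extend(schemas[i + 1:])
                break
    return report
```

Suppose there are 300 schemas and a lock timeout hits schema_tenant_247. The report then shows 246 schemas migrated, 1 failed, and 53 never attempted. The app must tolerate both schema versions until the rest catch up.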

Schema-per-tenant is a local maximum. You hit Postgres scaling walls before you hit the point where it pays off.

Joud Awad • Jun 26

Why C is too expensive too early (Full silo):

The silo model is the correct isolation model for enterprise customers. But applying it to all tenants — including a free tier with 1,000 accounts — means 1,000 Postgres instances. That's $40k/month in RDS costs before you've made a dollar.
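The cost argument is linear arithmetic, sketched below. The $40 per instance is simply what the post's figure implies ($40k / 1,000 instances). The $500 shared-pool figure and the flat per-instance price for siloed tenants are hypothetical placeholders. Real enterprise clusters will likely cost more.

```python
def full_silo_cost(tenants: int, cost_per_instance: float) -> float:
    """Monthly DB bill when every tenant gets its own instance."""
    # Grows with every sign-up, paying or not.
    return tenants * cost_per_instance


def hybrid_cost(tenants: int, silo_fraction: float, silo_cost: float,
                shared_pool_cost: float) -> float:
    """Monthly DB bill when only the top slice of tenants is siloed."""
    siloed = round(tenants * silo_fraction)
    # Scales with the number of siloed tenants; the shared pool is one line item.
    return siloed * silo_cost + shared_pool_cost


# Implied by the post: $40k/month across 1,000 instances.
print(full_silo_cost(1_000, 40.0))            # 40000.0
# Hypothetical: silo the top 5% at the same price, pool the rest on a $500/month instance.
print(hybrid_cost(1_000, 0.05, 40.0, 500.0))  # 2500.0
```

The shape matters more than the exact numbers. Full silo scales with total sign-ups. The hybrid scales with the number of tenants paying enough to justify their own database.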

Full silo is the right endgame for the top 5% of your customer base. Applying it to everyone is overengineering.