akilesh thuniki
How I Built Multi-Tenant SaaS on AWS (So You Don't Have To)

It was 2:17 AM when my phone lit up with a Slack alert.

Two enterprise customers were seeing each other’s data.

Not all of it — just enough to trigger panic. The kind of bug that doesn’t just wake you up; it makes you question every infrastructure decision you’ve ever made.

That night is why SaaSInfraLab exists.

I was tired of rebuilding the same fragile multi-tenant infrastructure for every new SaaS project and hoping I didn’t miss something critical at 2 AM again.


The Problem: Multi-Tenancy Breaks in Subtle, Expensive Ways

Multi-tenant SaaS sounds straightforward until you’re running real workloads at scale.

Here’s what broke for me repeatedly:

  • Manual tenant onboarding took 2–3 hours per customer
  • Namespace misconfigurations exposed data across tenants
  • Terraform modules were copy-pasted between projects and drifted over time
  • CI/CD pipelines were brittle and hard to reason about
  • AWS costs grew with no per-tenant visibility

At around 40–50 tenants, everything slowed down.

One bad Helm change could impact everyone.
One missed IAM permission could block a deployment.
One rushed fix could leak data.

The problem isn’t Kubernetes or AWS — it’s the lack of structure and repeatability.


The Solution: A Production-Ready, GitOps-Driven SaaS Stack

Instead of patching the same problems again, I stepped back and designed a system with one rule:

Tenant isolation must exist at every layer.

High-Level Approach

I built a modular infrastructure stack with:

  • AWS EKS as the compute foundation
  • Terraform for deterministic infrastructure
  • GitOps (ArgoCD) as the control plane
  • PostgreSQL schema isolation for data
  • Namespaces, quotas, RBAC, and network policies by default

Everything is defined once, versioned, and reused.

No click-ops. No snowflakes.
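In practice, "defined once and reused" means a tenant's entire footprint is one module invocation. A rough Terraform sketch of the idea (the module name and inputs here are illustrative, not the actual SaaSInfraLab code):

```hcl
# Illustrative sketch: one module call per tenant, everything parameterized.
# Module path and variable names are hypothetical.
module "tenant_acme" {
  source = "./modules/tenant"

  tenant_id    = "acme"
  namespace    = "tenant-acme"
  cpu_quota    = "4"
  memory_quota = "8Gi"
  db_schema    = "tenant_acme"
}
```

Adding a tenant becomes a diff in version control, which is exactly what makes the GitOps layer possible.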

Core Design Decisions (and Why)

Kubernetes Namespaces per tenant
This gives clean workload isolation, quota enforcement, and blast-radius control.
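As a sketch, the per-tenant baseline boils down to a namespace plus a quota (names and limits below are illustrative, not the actual SaaSInfraLab manifests):

```yaml
# Illustrative per-tenant baseline: a labeled namespace and a hard quota.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-acme
  labels:
    tenant: acme
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-acme-quota
  namespace: tenant-acme
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```

The quota is what turns a noisy tenant into a contained problem instead of a cluster-wide one.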

PostgreSQL schemas instead of separate databases
Lower cost, simpler operations, and safe isolation when paired with strict search paths.

// tenantId must be allow-listed first — it is interpolated into SQL
await client.query(`SET search_path TO tenant_${tenantId}`);
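Because the tenant ID is interpolated into SQL, "strict search paths" only stay safe if the ID is validated first. A minimal sketch of that guard (the helper name and the allowed ID format are my assumptions, not from the actual codebase):

```javascript
// Hypothetical helper: builds a SET search_path statement only for
// tenant IDs matching a strict allow-list pattern, so the string
// interpolation above cannot be abused for SQL injection.
function searchPathFor(tenantId) {
  if (!/^[a-z0-9_]{1,48}$/.test(tenantId)) {
    throw new Error(`invalid tenant id: ${tenantId}`);
  }
  return `SET search_path TO tenant_${tenantId}`;
}
```

Then the query becomes `await client.query(searchPathFor(tenantId));`, and a malicious ID like `acme; DROP TABLE users` fails loudly instead of reaching PostgreSQL.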

GitOps for all deployments
ArgoCD watches tenant definitions and applies changes automatically. No manual deploys, no surprises.
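Conceptually, each tenant maps to one ArgoCD Application pointing at its slice of the config repo. A hedged sketch (repo URL, project, and paths are placeholders):

```yaml
# Illustrative ArgoCD Application: one per tenant, auto-synced.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tenant-acme
  namespace: argocd
spec:
  project: tenants
  source:
    repoURL: https://github.com/your-org/tenant-config   # placeholder
    targetRevision: main
    path: tenants/acme
  destination:
    server: https://kubernetes.default.svc
    namespace: tenant-acme
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift in the cluster
```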

IRSA + RBAC everywhere
Every pod gets only the AWS permissions it needs — nothing more.
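With IRSA, the binding lives on the service account: pods assume a narrowly scoped IAM role instead of inheriting node permissions. A sketch (the role ARN and names are placeholders):

```yaml
# Illustrative IRSA wiring: the annotation binds this service account's
# pods to one tenant-scoped IAM role. ARN and names are hypothetical.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: acme-api
  namespace: tenant-acme
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/tenant-acme-api
```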

CI/CD Flow

  • CI (GitHub Actions): build images, run tests, push to ECR
  • CD (ArgoCD): syncs manifests, runs per-tenant migrations, deploys safely

Adding a tenant is a config change — not a weekend task.
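That "config change" is roughly a single committed file that the pipeline fans out from. A hypothetical tenant definition (field names are illustrative, not the actual SaaSInfraLab schema):

```yaml
# Illustrative tenant definition: committing a file like this is the
# entire onboarding step; CI/CD and GitOps reconcile everything else.
tenant:
  id: acme
  tier: enterprise
  region: us-east-1
  db_schema: tenant_acme
  quotas:
    cpu: "4"
    memory: 8Gi
```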


Lessons Learned & What I’d Do Differently

If I were starting again:

  • I’d add cost attribution from day one
  • I’d document network policies earlier
  • I’d automate tenant-isolation tests sooner

The biggest takeaway?
Tenant isolation isn’t a single feature.
It’s defense in depth: IAM, network, compute, data, and deployment workflows all working together.

That’s what SaaSInfraLab tries to encode.


Try It Yourself

The entire stack is open source.

Clone it, define your tenants, and deploy a real multi-tenant SaaS foundation in under 30 minutes.

GitHub: https://github.com/SaaSInfraLab

Questions? I’m happy to discuss design decisions or help troubleshoot edge cases.

What’s been your worst infrastructure deployment incident — and how did you prevent it from happening again?
