DEV Community

Tim Kang
When Terraform Stops Scaling for Multi-Tenant Kubernetes: A Database-Driven Approach

So here's the thing. I've been running a multi-tenant Kubernetes platform for a while now, and for the longest time Terraform felt like the right tool for the job.

You know the pattern. You create a "tenant template" with all the namespaces, RBAC, network policies, deployments, ingresses, whatever. Then you replicate it for each tenant. Terraform modules keep everything nice and consistent. It works great.

Until it doesn't.

What happens after 100+ tenants

Once I crossed about 100 tenants, things started falling apart. Plan and apply got slower. State operations became expensive. Drift checking took forever. The whole system felt like it was optimizing for correctness at the cost of actually being usable.

What used to be a quick "spin up a new tenant" turned into this heavy, slow process. Tenant replication became a bottleneck instead of a solved problem.

GitOps didn't really help either

Naturally I looked at GitOps next. Git is great for auditing, no question. But here's the thing: GitOps reconciliation isn't built for instant replication when something happens at runtime.

Like, my actual requirement was: "someone creates a tenant in the database, provision everything in Kubernetes immediately, then keep it in sync."

But GitOps adds this whole loop. Commit, sync, reconcile. It's not slow exactly, but it's not instant either. And for my use case, I needed something closer to instant.

I also tried some Terraform-based operators and controllers. None of them really fit. They were still trying to manage everything through Terraform state, which was exactly the problem I was trying to escape.

The mental shift that changed everything

Here's what finally clicked for me.

The template is configuration. Version that.

But the replicated tenant instances? Those are just derived data. Why am I versioning every single one of them?

Think about it. If I'm creating a hundred near-identical environments, do I really need each one represented as its own set of git objects or terraform resources? That's a lot of overhead for something that's basically just "apply this template with these variables."

For this kind of fast replication problem, a database is honestly the simplest source of truth. Create a row, render templates, apply resources. Update or delete the row, and the resources get updated or cleaned up. Done.
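The core of that loop is just a diff between the rows you want and what's currently applied. Here's a minimal sketch of the idea in Go — this is an illustration only, not Lynq's actual implementation, and the `Tenant` schema is made up:

```go
package main

import "fmt"

// Tenant mirrors a row in a hypothetical tenants table.
type Tenant struct {
	Name string
	Tier string
}

// diff compares desired rows (from the database) against what is
// currently applied, and returns which tenants to create, update,
// or delete. The real operator would then render templates and
// apply/remove the corresponding Kubernetes resources.
func diff(desired, current map[string]Tenant) (create, update, del []string) {
	for name, d := range desired {
		c, ok := current[name]
		if !ok {
			create = append(create, name)
		} else if c != d {
			update = append(update, name)
		}
	}
	for name := range current {
		if _, ok := desired[name]; !ok {
			del = append(del, name)
		}
	}
	return create, update, del
}

func main() {
	desired := map[string]Tenant{
		"acme":    {Name: "acme", Tier: "pro"},
		"initech": {Name: "initech", Tier: "free"}, // new row
	}
	current := map[string]Tenant{
		"acme":   {Name: "acme", Tier: "free"}, // tier changed in DB
		"globex": {Name: "globex", Tier: "pro"}, // row deleted
	}
	create, update, del := diff(desired, current)
	fmt.Println(create, update, del) // prints: [initech] [acme] [globex]
}
```

Run the loop on a timer or on a change-data-capture event and you get "insert a row, resources appear" behavior without any state file in the middle.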

So I built lynq

Lynq is a Kubernetes operator that syncs resources directly from your database. You define templates, point it at your MySQL database (Postgres and others coming), and it just works.

Insert a tenant row and resources appear. Update the row and they update. Delete it and they clean up.

Some things it does:

  • database-driven automation from MySQL (PostgreSQL and more planned)
  • Go templates with Sprig, so you get 200+ helper functions to work with
  • server-side apply for proper Kubernetes-native ownership
  • DAG-based dependencies so you can control creation order
  • lifecycle policies for handling creation, deletion, and conflicts

This matches how most products actually work internally anyway. Your users, orgs, and tenants exist as database rows first. Lynq just connects that business truth directly to your infrastructure.

Check it out: https://lynq.sh and https://github.com/k8s-lynq/lynq

If you want to try it without setting anything up, I recently put together a Killercoda scenario where you can play with it in your browser: https://killercoda.com/lynq-operator/course/killercoda/lynq-quickstart

What do you even call this

I'm honestly not sure what to call this pattern. It's not GitOps. It's not traditional IaC either, because we're intentionally not treating each tenant instance as something to declare and version individually.

I've been thinking of it as something like:

  • TemplateOps for the git-versioned templates
  • DataOps for the database-driven replication and lifecycle
  • Kubernetes applies via server-side apply and lifecycle policies

Maybe "DataOps for Kubernetes"? Feels close but also kind of vague. Open to better ideas.

Looking for feedback

If you're dealing with multi-tenant Kubernetes, I'd love to hear from you.

Does this "version templates, not replicas" thing match your pain points? What would you call this pattern? And what's the hardest part of tenant provisioning for you right now: speed, safety, drift, cleanup, or permissions?

Seriously, check out Lynq and tell me what's missing or what you'd change.
