PecRodrigues

Why tenant isolation should not live only in application code

Most SaaS applications start with a simple and familiar multi-tenant model:

  • one shared backend
  • one shared runtime
  • one shared process
  • tenant separation handled mostly by application code
  • a tenant_id column or some equivalent logical boundary

This model works well in many cases, especially at the beginning.

  • It is simple.
  • It is cheap.
  • It is fast to build.
  • It fits naturally into most web frameworks.

But as a SaaS product grows, especially when it becomes white-label, extensible, or customer-specific, this approach can become fragile.

  • A bug in one tenant can affect others.
  • A bad customization can break the whole service.
  • A plugin or custom app can access more than it should.
  • A single shared process can increase the blast radius of failures.

At some point, tenant isolation stops being only an application concern. It becomes an operational concern too.

Logical multi-tenancy is not enough for every case

Logical multi-tenancy usually means that the application is responsible for keeping tenants separated. For example:

single backend process
 └── all tenants
     ├── tenant A
     ├── tenant B
     └── tenant C

The application checks the current tenant, filters data, applies permissions, and makes sure one customer does not access another customer's resources.

That is a valid architecture. But it depends heavily on the correctness of the application code.

If the code forgets a tenant filter, exposes the wrong file path, shares state incorrectly, or loads unsafe custom behavior, the runtime itself does not provide many additional boundaries.
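To make the failure mode concrete, here is a toy sqlite3 sketch of the logical model; `fetch_invoices` and the `invoices` table are hypothetical names, not from any particular framework. The entire tenant boundary is one `WHERE` clause:

```python
import sqlite3

def fetch_invoices(conn, tenant_id):
    # The whole tenant boundary lives in this WHERE clause.
    # Forget it in any one query and every tenant's rows leak.
    return conn.execute(
        "SELECT id, amount FROM invoices WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER, tenant_id TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "tenant_a", 100), (2, "tenant_b", 250), (3, "tenant_a", 40)],
)

print(fetch_invoices(conn, "tenant_a"))  # [(1, 100), (3, 40)]
```

Nothing in the runtime enforces that filter; only code review and tests do.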

The tenants are logically separated, but operationally close.

The blast radius problem

In infrastructure and security discussions, "blast radius" means the amount of damage a failure or compromise can cause. In a traditional shared-runtime SaaS setup, the blast radius can be large:

one process fails
 └── every tenant may be affected

This is not always a problem. For small systems, internal tools, or products with limited customization, a shared process can be perfectly reasonable.

But for SaaS platforms with white-label clients, tenant-specific apps, plugins, custom domains, or stronger isolation needs, the model starts to feel risky.

The question becomes:

Should tenant isolation live only inside the application code?

I do not think it always should.

Runtime-level tenant isolation

An alternative approach is to move part of tenant isolation into the runtime layer.

Instead of running all tenants inside the same process, each tenant can have stronger runtime boundaries. For example:

runtime supervisor
 ├── tenant A process
 │   └── isolated app workers
 ├── tenant B process
 │   └── isolated app workers
 └── tenant C process
     └── isolated app workers

This does not mean replacing application-level security.

You still need authentication, authorization, input validation, database permissions, and good software design.

But the runtime can help reduce the consequences of mistakes. A tenant can have:

  • its own process
  • its own restricted system user
  • its own filesystem boundary
  • its own domain configuration
  • its own network access rules
  • isolated app workers inside the tenant

This creates a second layer of protection.
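As a rough illustration of the blast-radius difference, here is a toy sketch using subprocesses (this is not Ehecoatl's actual mechanism, just the general idea of process-level containment). A crashing "tenant app" is reported via its exit code instead of taking the whole service down:

```python
import subprocess
import sys

def run_tenant(tenant_id, code):
    # Each tenant's code runs in its own OS process; a crash is
    # contained to that process and surfaces as a non-zero exit code.
    proc = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return "ok" if proc.returncode == 0 else "failed"

# Toy "tenant apps": tenant_b ships a bad customization that raises.
tenants = {
    "tenant_a": "print('hello from A')",
    "tenant_b": "raise RuntimeError('bad customization')",
    "tenant_c": "print('hello from C')",
}

results = {tid: run_tenant(tid, code) for tid, code in tenants.items()}
print(results)  # {'tenant_a': 'ok', 'tenant_b': 'failed', 'tenant_c': 'ok'}
```

In a shared-process model, that same `RuntimeError` could have been an unhandled exception in the one process serving everyone.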

The application still knows about tenants, but the operating environment does too.

What this is not

This is not the same as saying:

"Everyone should stop using shared multi-tenancy."

That would be wrong.

Shared multi-tenancy is useful, efficient, and appropriate for many SaaS products.

This is also not a claim that process-level isolation replaces containers, VMs, Kubernetes, or cloud-native infrastructure.

A well-configured container or VM setup can provide stronger isolation boundaries.

The goal here is different:

reduce operational risk and provide stronger default boundaries for multi-tenant SaaS applications without forcing every project to start with a full container orchestration stack.

It is a middle layer between simple application-only tenancy and heavier infrastructure isolation.

Why this matters for white-label SaaS

White-label SaaS products often need more than basic tenant separation.

Each customer may need:

  • a custom domain
  • different branding
  • custom configuration
  • specific integrations
  • isolated extensions or apps
  • safer failure boundaries
  • controlled access to external resources

In these cases, the tenant is not just a row in the database.

The tenant starts to look more like an operating environment.
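One way to picture "tenant as operating environment" is a descriptor that carries more than an id. Everything in this `TenantEnvironment` class is a hypothetical illustration, not a real Ehecoatl structure:

```python
from dataclasses import dataclass, field

@dataclass
class TenantEnvironment:
    # Hypothetical descriptor: in this view a tenant is closer to an
    # operating environment than to a row in a table.
    tenant_id: str
    domain: str
    branding: dict = field(default_factory=dict)
    integrations: list = field(default_factory=list)
    data_dir: str = ""      # filesystem boundary
    system_user: str = ""   # restricted OS user

acme = TenantEnvironment(
    tenant_id="acme",
    domain="app.acme-example.com",
    branding={"logo": "acme.svg", "primary_color": "#0055aa"},
    integrations=["stripe", "mailgun"],
    data_dir="/srv/tenants/acme",
    system_user="tenant-acme",
)
```

Half of those fields describe the operating system, not the database, which is exactly the shift this section is about.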

That is the kind of scenario where runtime-level isolation becomes interesting.

What I am building

I am building an open-source project called Ehecoatl.

Ehecoatl is a backend runtime for multi-tenant and white-label SaaS applications.

The goal is to make tenant isolation more structural, not only logical.

As an early proof of concept, the Ehecoatl website, the documentation site, and the newsletter system are already running on top of this runtime.

(That does not mean the project is production-ready for every use case yet, but it helps validate the core idea in a real environment: serving public pages, routing domains, loading app configuration, and operating small isolated app contexts.)

Instead of treating multi-tenancy only as something the application framework must handle, Ehecoatl treats it as a runtime and operational concern too.

The current direction includes:

  • tenant process supervision
  • filesystem isolation
  • restricted system users
  • tenant and app configuration scanning
  • multi-domain routing
  • isolated app workers
  • safer defaults for SaaS and white-label environments
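Multi-domain routing, for instance, can be reduced to a Host-header lookup; `resolve_tenant` and the example domains below are illustrative, not Ehecoatl's API:

```python
def resolve_tenant(host, domain_map):
    # Hypothetical router: normalize the Host header (case, port)
    # and dispatch to the matching tenant, returning None for
    # unknown domains instead of guessing a default tenant.
    return domain_map.get(host.lower().split(":")[0])

domains = {
    "app.acme-example.com": "tenant_acme",
    "portal.globex-example.com": "tenant_globex",
}

print(resolve_tenant("APP.Acme-Example.com:443", domains))  # tenant_acme
print(resolve_tenant("unknown.example.com", domains))       # None
```

Returning `None` for an unmapped domain matters: silently falling back to some default tenant is itself a cross-tenant bug.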

The experience I am aiming for is:

less friction to provision, operate, and monitor tenants; more safety to experiment with customizations without increasing the blast radius too much.

A simplified mental model

Instead of thinking about a SaaS app like this:

application
 └── tenants separated only by app logic

Ehecoatl tries to move toward this:

runtime
 ├── tenant environment
 │   └── app workers
 ├── tenant environment
 │   └── app workers
 └── tenant environment
     └── app workers

The runtime becomes responsible for part of the isolation model.

The application still handles business logic.

The infrastructure still handles machine-level security.

But the tenant boundary becomes more explicit.

Why not just use containers?

Containers are a very reasonable answer.

For many teams, they are the right answer.

But they also introduce operational complexity, especially for small teams, solo builders, agencies, and early-stage SaaS products.

The idea behind Ehecoatl is not to compete directly with container orchestration.

It is to provide a simpler runtime model for cases where:

  • full orchestration feels too heavy
  • app-only isolation feels too weak
  • tenants need stronger operational boundaries
  • white-label environments need easier provisioning
  • custom apps or extensions need safer execution boundaries
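As a sketch of what "easier provisioning" could mean at the filesystem level (a hypothetical layout; a real runtime would also create the restricted system user and tighten permissions, which requires root):

```python
import json
import pathlib
import tempfile

def provision_tenant(root, tenant_id, domain):
    # Hypothetical provisioning step: give the tenant its own
    # directory tree and a config file the runtime can later scan.
    tenant_dir = pathlib.Path(root) / tenant_id
    (tenant_dir / "apps").mkdir(parents=True)
    (tenant_dir / "tenant.json").write_text(
        json.dumps({"tenant_id": tenant_id, "domain": domain})
    )
    return tenant_dir

root = tempfile.mkdtemp()
acme_dir = provision_tenant(root, "acme", "app.acme-example.com")
print(sorted(p.name for p in acme_dir.iterdir()))  # ['apps', 'tenant.json']
```

The point of the sketch is that provisioning becomes a filesystem operation the supervisor can automate, not a manual infrastructure task.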

There is a trade-off.

This approach is not perfect isolation.

But it can be a practical improvement over putting every tenant inside one shared process with only logical checks.

Current status

Ehecoatl is still early, but it is not only a design document.

The project is already being used to serve:

  • the main Ehecoatl website
  • the documentation site
  • the newsletter system

It is my first proof of concept and experimentation environment.

It allows me to test the runtime with real routing, real domains, real static and dynamic pages, app configuration loading, and operational behavior outside of a purely local demo.

There is still a lot to harden before I would describe it as generally production-ready, especially around security review, documentation, installation flow, and broader testing.

But the core runtime is already serving the project around it.

Open questions

I would love feedback from other developers and SaaS builders.

Some questions I am thinking about:

  1. Would you use process-level tenant isolation in a multi-tenant SaaS project?
  2. In which scenarios does this model make sense?
  3. At what point would you prefer containers, VMs, or Kubernetes instead?
  4. What would you expect from a multi-tenant runtime before trusting it?
  5. Should this kind of tool be a standalone runtime, a framework companion, or a deployment layer?
  6. Where do you think this approach can fail?

Final thought

Application-level tenant checks are necessary.

But for some SaaS products, they are not enough.

As soon as tenants start needing custom behavior, custom domains, isolated apps, or stronger operational guarantees, it becomes worth asking:

what should the runtime know about tenancy?

That is the question I am exploring with Ehecoatl.
