Istvan Szabo

Posted on Apr 18

Multi-site schemas: lessons from evolving one large system over time

#database #systemdesign #backend #architecture

Multi-site schemas: lessons from evolving one large system over time

I worked on a large multi-site system where the data model had to evolve as the product grew.

It started out simple, but as always, with progress the demand also grew to something that allowed flexibility.

The progression was:

site_id
many-to-many relationships
scope_id

Each stage solved the limitations of the previous one.

Looking back, I would skip directly to the final model.

1. Start simple: `site_id`

The original model was:

products
- id
- site_id
- name

Queries were simple:

WHERE site_id = ?

In practice, the active site was usually resolved from the request domain.

Examples:

us.example.com → site 1
eu.example.com → site 2

So each request enters the system with a current site_id, and that drives filtering, permissions, and configuration.

This model is easy to understand, easy to query, and operationally simple.

Where it stopped being enough

Once entities needed selective sharing across sites, site_id became restrictive.

Typical workaround:

site_id = 0

Meaning global.

That handles:

one site
all sites

But not:

site A + B only
region-specific sharing
custom groups of sites

2. Add relationships: many-to-many

The next idea was relationship tables.

products
- id
- name

product_sites
- product_id
- site_id

Now entities could belong to any set of sites.

That solved the sharing problem.

Why I turned away from this approach

Only some data benefits from many-to-many visibility.

Examples:

Good candidates:

products
templates
assets

Poor candidates:

orders
logs
analytics events
transactions

So the system would be using two patterns:

WHERE site_id = ?

and

JOIN product_sites ...

That inconsistency spreads over time.

Different repositories, query builders, exports, reports, and jobs all need to know which model applies.

3. The real requirement was visibility

The actual question most features were asking was not:

Which site owns this?

It was:

Which sites can access this?

That is a visibility problem.

Visibility is naturally represented as a reusable set.

4. Introduce `scope_id`

Instead of attaching sites directly to every entity, entities reference a scope.

- id
- scope_id
- name

scopes
- id
- hash

scope_members
- scope_id
- site_id

With:

scope_id = 0

Reserved for global visibility.

What a scope represents

Examples:

0 → visible everywhere
scope 12 → site [1]
scope 27 → sites [1,2,5]

The entity only stores visibility.

No direct ownership rules. No per-entity join tables.

5. Why this improved the system

The biggest benefit was consistency.

Reads normalize to one pattern:

WHERE scope_id IN (0, ...)

Where the allowed scopes for the active site are resolved once and cached.

That means:

one read model
no switching between joins and direct filters
no special handling for global records
simpler shared infrastructure code

6. Writes become explicit

Visibility is assigned intentionally:

scope_id = Scope::forSite(1);

scope_id = Scope::forSites([1,2,5]);

scope_id = 0;

That centralizes the logic into one helper layer.

7. Why duplication becomes expensive

A common workaround for multi-region systems is cloning records into each site.

That seems fine initially.

Later it creates ongoing problems:

global identifiers
syncing edits across copies
partial overrides
having to worry about the source of truth
larger data size that could directly affect performance
rollout coordination

At scale, managing copies becomes harder than managing access.

A shared entity with scoped visibility avoids most of that.

One identity, multiple sites.

8. What I would do now

If I were building this again, I would start with scope_id.

It is close enough to site_id in simplicity:

scope_id = Scope::forSite(1)

But it avoids the later refactor when sharing becomes necessary.

Single-site behavior still works naturally.

Shared visibility is already built in.

Global visibility already exists.

There is little benefit in starting with a narrower model if the broader one carries almost no real penalty.

Final thought

The strongest argument for scope_id is not just flexibility.

It is that it handles:

isolated sites
shared entities
global entities
future growth

using one consistent model.

If a design solves today’s requirements and tomorrow’s likely requirements without meaningful tradeoffs, it is usually the better starting point.

DEV Community

Multi-site schemas: lessons from evolving one large system over time

Multi-site schemas: lessons from evolving one large system over time

1. Start simple: `site_id`

Where it stopped being enough

2. Add relationships: many-to-many

Why I turned away from this approach

3. The real requirement was visibility

4. Introduce `scope_id`

What a scope represents

5. Why this improved the system

6. Writes become explicit

7. Why duplication becomes expensive

8. What I would do now

Final thought

Top comments (0)

Multi-site schemas: lessons from evolving one large system over time

1. Start simple: site_id

Where it stopped being enough

2. Add relationships: many-to-many

Why I turned away from this approach

3. The real requirement was visibility

4. Introduce scope_id

What a scope represents

5. Why this improved the system

6. Writes become explicit

7. Why duplication becomes expensive

8. What I would do now

Final thought

1. Start simple: `site_id`

4. Introduce `scope_id`