kanaria007

Posted on Feb 16 • Originally published at zenn.dev

Chapter 2 — RML-1 (Closed World): Build a Room Where Failure Is Safe

#architecture #distributedsystems #microservices #sre

The Worlds of Distributed Systems — Chapter 2

“As long as nothing has left the room, you can retry forever.”

RML-1 (Closed World) is the least flashy of the three worlds, but it’s the one that quietly determines whether everything else becomes painful.

In this chapter, we’ll reframe RML-1 as:

a room where you can fail safely, and
a design discipline for deciding:
- what must stay inside RML-1, and
- where you must “graduate” into RML-2 / RML-3.

1) What is the Closed World?

If we summarize RML-1 in one line:

A temporary world that is not yet observable from the outside.

1.1 Conditions for RML-1

A process belongs to RML-1 when it satisfies (roughly) these conditions:

No writes to external databases
No calls to external APIs that cause real effects
No notifications to humans (email / Slack / push / etc.)
No logs that later become inputs to business decisions or audits

In other words:

If this process fails, it leaves no external trace of having happened.
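The four conditions can be written down as a simple predicate. This is a minimal sketch: `ProcessProfile`, its field names, and `isClosedWorld` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical description of what a process does; field names are illustrative.
type ProcessProfile = {
  writesExternalDb: boolean;
  callsEffectfulApi: boolean;
  notifiesHumans: boolean;
  writesBusinessOrAuditLogs: boolean;
};

// A process counts as RML-1 only if it violates none of the four conditions.
function isClosedWorld(p: ProcessProfile): boolean {
  return (
    !p.writesExternalDb &&
    !p.callsEffectfulApi &&
    !p.notifiesHumans &&
    !p.writesBusinessOrAuditLogs
  );
}

// Example profile: browser-side form validation touches nothing external.
const formValidation: ProcessProfile = {
  writesExternalDb: false,
  callsEffectfulApi: false,
  notifiesHumans: false,
  writesBusinessOrAuditLogs: false,
};
```

A single `true` in the profile is enough to take the process out of the room.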

1.2 Typical examples

Form validation in the browser
Building a query from search conditions (not executed yet)
Scoring/inference in a dry run where results are discarded
Simulation that reads production data but never writes

flowchart LR
  subgraph ClosedWorld["RML-1: Closed World"]
    A[Memory] --> B[Temporary files]
    B --> A
  end
  ClosedWorld -->|If OK| OUT["External world (RML-2/3)"]

You can experiment inside the room forever without changing the outside world.

1.3 “Sandbox” is not the same as RML-1

A common confusion:

“Isn’t this just a sandbox environment?”

The key distinction:

Sandbox / staging = isolated environment
- separate cluster/account/URL
- can still perform DB writes and external integrations inside that environment
RML-1 = isolated state / behavior
- even in the same environment (sometimes even in production), you can run in a mode where outcomes are not externally observable

A useful phrase:

Sandbox is about where. RML-1 is about what leaks out.

This is why:

staging can still contain RML-2/RML-3 behavior, and
production can still host an RML-1 “mode” (dry-run preview, simulation).
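The "where" versus "what leaks out" distinction can be sketched as two independent fields. `Environment`, `RunContext`, and `mayLeakEffects` are hypothetical names used only for this illustration.

```typescript
type Environment = "staging" | "production";
type World = "RML1" | "RML2" | "RML3";

// Where the code runs and what it may leak are independent axes.
type RunContext = { environment: Environment; world: World };

function mayLeakEffects(ctx: RunContext): boolean {
  // Only the world decides; the environment is deliberately ignored.
  return ctx.world !== "RML1";
}

const stagingWithRealEmail: RunContext = { environment: "staging", world: "RML2" };
const productionPreview: RunContext = { environment: "production", world: "RML1" };
```

Under this sketch, `stagingWithRealEmail` can leak effects while `productionPreview` cannot, which is the opposite of what the environment names alone would suggest.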

2) Why you should deliberately carve out RML-1

RML-1 isn’t academic: it creates concrete operational advantages.

2.1 A place where you can “try messy things safely”

If you can keep work inside RML-1, you can do:

experiments on real data
algorithm comparisons
the offline side of A/B testing

…with a healthy loop:

“Try quickly.”
“If it’s bad, erase it completely.”

If you don’t design RML-1 explicitly, a small PoC tends to leak into RML-2/3 immediately, and teams fall into the trap of:

“Production data is scary, so we’ll only test with toy data” → models and logic never mature.

2.2 You make the boundary visible

When RML-1 is explicit, the boundary becomes discussable:

beginning a transaction
publishing to a queue
sending an HTTP request

You start to talk about exits as:

“This is the doorway from RML-1 → RML-2.”

In design reviews, it enables a simple and powerful question:

“Are we still inside the room here?”

2.3 It changes how you think about staging and testing

Many “staging” setups are fake safety:

they read production DBs directly
they connect to real external services
they send real emails/notifications

That’s not RML-1. That’s already RML-2/3 behavior.

Once you think in worlds, you can do better:

embed RML-1 inside production (dry-run APIs)
forbid RML-2/3 effects even in staging
enforce “world boundaries” in code, not only in infrastructure.

3) RML-1 design patterns

Now let’s get practical: patterns for keeping computation inside the room.

3.1 Read-only + Dry Run

Idea

Read production-like data, but never write and never notify.

Use cases

evaluate a new algorithm on real data
“If we apply this rule, what changes?” preview
dry-run a batch job (compute + discard; optional non-business debug logs)

Implementation sketch

use a DB role with read-only privileges
give external clients a no-op mode

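type World = "RML1" | "RML2" | "RML3";

// Hypothetical placeholders so the sketch type-checks; substitute your own types.
type Input = { query: string };
type SimulationResult = { score: number };
interface DbClient {
  fetchProductionData(input: Input): Promise<unknown>;
}
interface Notifier {
  send(message: string): Promise<void>;
}
// Declared only: the algorithm under evaluation is yours to supply.
declare function runNewAlgorithm(raw: unknown): SimulationResult;

type Env = {
  world: World;
  db: DbClient;       // may be read-only
  notifier: Notifier; // no-op in RML-1
};

async function simulate(env: Env, input: Input): Promise<SimulationResult> {
  // Refuse to run outside the closed world.
  if (env.world !== "RML1") {
    throw new Error("simulate() is RML-1 only");
  }

  const raw = await env.db.fetchProductionData(input);
  const result = runNewAlgorithm(raw);

  // In RML-1 we return results but do not persist or notify.
  return result;
}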

Key habit

make “world” explicit in code
in RML-1, db.update(...) / notifier.send(...) should either throw or be a no-op.
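One way to make this habit hold is to hand RML-1 code clients that cannot cause effects. The sketch below is illustrative only: the `Notifier` and `DbWriter` interfaces, `noopNotifier`, `forbiddenDb`, and `clientsFor` are hypothetical, and the methods are synchronous for brevity.

```typescript
type World = "RML1" | "RML2" | "RML3";

// Hypothetical minimal interfaces; a real client would be async and richer.
interface Notifier {
  send(to: string, message: string): void;
}
interface DbWriter {
  update(table: string, id: string, payload: unknown): void;
}

// No-op variant: records nothing and contacts nobody.
const noopNotifier: Notifier = {
  send(): void {},
};

// Throwing variant: any write attempt inside RML-1 fails loudly.
const forbiddenDb: DbWriter = {
  update(table: string): void {
    throw new Error(`db.update(${table}) is not allowed in RML-1`);
  },
};

// Pick the clients once, at the edge, based on the declared world.
function clientsFor(
  world: World,
  real: { notifier: Notifier; db: DbWriter },
): { notifier: Notifier; db: DbWriter } {
  return world === "RML1" ? { notifier: noopNotifier, db: forbiddenDb } : real;
}
```

Throwing tends to suit writes, where a silent skip could hide a bug, while a no-op tends to suit notifications, where the calling code should keep running unchanged. That split is a design choice, not a rule.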

3.2 Consolidate exits into one place (Effect Dispatcher)

If effects can happen anywhere, you can’t reason about RML-1.

Anti-patterns

random HTTP calls scattered across the codebase
“a library casually sends an email”
business logic directly publishes to queues

Pattern

Collect all external effects as values, and execute them only at a single commit point.

type Effect =  
  | { type: "SendEmail"; to: string; subject: string; body: string }  
  | { type: "UpdateDb"; table: string; id: string; payload: unknown }  
  | { type: "EmitEvent"; topic: string; payload: unknown };  

type World = "RML1" | "RML2" | "RML3";  

// Minimal placeholders to keep the example self-contained.  
type Input = { userEmail: string };  
type Result = { ok: true };  

type CommitResult =  
  | { mode: "DRY_RUN"; planned: Effect[] }  
  | { mode: "EXECUTED" };  

async function runBusinessLogic(input: Input): Promise<{ result: Result; effects: Effect[] }> {  
  const effects: Effect[] = [];  

  // compute/validate (keep this RML-1-like)  
  const result: Result = { ok: true };  

  // Propose effects as data (do not execute here).  
  effects.push({  
    type: "SendEmail",  
    to: input.userEmail,  
    subject: "Notice",  
    body: "...",  
  });  

  return { result, effects };  
}  

async function commit(world: World, effects: Effect[]): Promise<CommitResult> {  
  if (world === "RML1") {  
    // Dry-run: do not execute. Return the plan for preview/UI.  
    return { mode: "DRY_RUN", planned: effects };  
  }  

  // In RML-2/3, effects are executed here and only here.  
  for (const e of effects) {  
    // dispatch by type  
  }  

  return { mode: "EXECUTED" };  
}

Why this matters:

runBusinessLogic() stays inside the room
commit() becomes the world boundary and the responsibility boundary

In reviews you can ask:

“Inside this function, we only accumulate effects, right?”
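The `// dispatch by type` step inside commit() could be filled in roughly as follows. This is a sketch under assumptions: `EffectHandlers` and `dispatch` are hypothetical, the handlers are injected by the caller, and they are synchronous here for brevity.

```typescript
type Effect =
  | { type: "SendEmail"; to: string; subject: string; body: string }
  | { type: "UpdateDb"; table: string; id: string; payload: unknown }
  | { type: "EmitEvent"; topic: string; payload: unknown };

// Hypothetical handler table, injected by the caller: one function per effect type.
// Synchronous for brevity; real handlers would likely return promises.
type EffectHandlers = {
  [K in Effect["type"]]: (e: Extract<Effect, { type: K }>) => void;
};

function dispatch(effects: Effect[], handlers: EffectHandlers): void {
  for (const e of effects) {
    switch (e.type) {
      case "SendEmail":
        handlers.SendEmail(e);
        break;
      case "UpdateDb":
        handlers.UpdateDb(e);
        break;
      case "EmitEvent":
        handlers.EmitEvent(e);
        break;
      default: {
        // Exhaustiveness check: adding an Effect variant without a case
        // makes this assignment fail to type-check.
        const unreachable: never = e;
        throw new Error(`Unhandled effect: ${JSON.stringify(unreachable)}`);
      }
    }
  }
}
```

Because the handlers arrive as a parameter, a test can pass recording handlers and confirm that nothing outside `dispatch` ever touches a real client.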

3.3 RML-1 inside production: safe “cheat mode”

A powerful step is:

Embed RML-1 features inside production.

Examples:

users see a preview that isn’t committed yet
admins simulate “what would happen” before applying changes

UI-wise it’s often:

“Save” button = exit to RML-2/3
“Simulate” button = stay in RML-1 and return

Another example is safe shadow validation:

run new payment logic on real traffic
keep results in RML-1
production behavior remains old logic until explicitly promoted

You can explain “shadow deployments” cleanly as a world concept.
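A minimal sketch of shadow validation, assuming both versions of the logic are pure functions of the input. `Decision`, `ShadowReport`, and `withShadow` are hypothetical names, not part of any library.

```typescript
type Decision = { approved: boolean };
type ShadowReport = {
  matched: boolean;
  live: Decision;
  shadow: Decision | { error: string };
};

// Runs the candidate next to the live logic. Only the live decision drives
// behavior; the comparison is returned as data and never persisted here.
function withShadow<I>(
  live: (input: I) => Decision,
  candidate: (input: I) => Decision,
): (input: I) => { decision: Decision; report: ShadowReport } {
  return (input) => {
    const liveDecision = live(input);
    let shadow: Decision | { error: string };
    try {
      shadow = candidate(input);
    } catch (err) {
      // A failing candidate must never affect production behavior.
      shadow = { error: err instanceof Error ? err.message : String(err) };
    }
    const matched = "approved" in shadow && shadow.approved === liveDecision.approved;
    return { decision: liveDecision, report: { matched, live: liveDecision, shadow } };
  };
}
```

What happens to `report` afterwards matters: holding it in process memory stays inside the room, while writing it to a store that other services or people read is already an exit.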

3.4 Sidebar: idempotency and side-effect gradients

Advanced readers may ask:

“If writes are perfectly idempotent, can they be treated like RML-1?”

For this series, we intentionally draw a strict line:

RML-1 assumes zero externally observable side effects.

Because in practice:

proving idempotency is hard
“we thought it was safe” often means “we missed how it’s observed”

However, there is a practical gradient:

in-process memory cache updates
temporary files nobody else can observe
single-process internal state

These can be treated as “internal implementation detail” inside RML-1.

But the moment you touch something observable by others:

shared caches (Redis)
shared temp tables referenced by other services
logs used for operations / audits

…you should treat it as RML-2/3, even if the write is idempotent.

A simple rule:

RML-1: only unobservable side effects
RML-2+: any side effect observable by another service or human
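The rule can be expressed as a small classifier. `Observer`, `SideEffect`, and `minimumWorld` are hypothetical names for this sketch, and deciding who can actually observe a target is still a human judgment.

```typescript
type Observer = "self" | "other-service" | "human";

// Illustrative description of one side effect and who can see it.
type SideEffect = { target: string; observableBy: Observer[] };

// One effect visible to anyone other than the process itself
// is enough to leave RML-1, idempotent or not.
function minimumWorld(effects: SideEffect[]): "RML-1" | "RML-2+" {
  const leaks = effects.some((e) => e.observableBy.some((o) => o !== "self"));
  return leaks ? "RML-2+" : "RML-1";
}

const inProcessCache: SideEffect = { target: "in-process cache", observableBy: ["self"] };
const sharedRedis: SideEffect = { target: "shared Redis cache", observableBy: ["self", "other-service"] };
```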

4) Common pitfalls that break RML-1

4.1 “Logging is harmless” (it often isn’t)

A classic trap:

“It’s just logs, so it’s still RML-1.”

But logs come in two categories:

purely technical logs (latency, internal state)
- often fine in RML-1
business-meaning / audit logs
- if they will be referenced later, you’re effectively writing history

If “writing logs” becomes “writing history,” that’s RML-3 behavior.

Guideline:

limit RML-1 logs to those that are never used for business decisions
treat audit/transaction logs as RML-3 artifacts
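One way to keep the two log categories apart is to give them different paths: technical lines are written directly, while audit records are only proposed as data for the commit point described in section 3.2. In this sketch, `AuditRecord`, the `WriteAudit` effect variant, and `createRml1Logger` are all hypothetical.

```typescript
type AuditRecord = { actor: string; action: string; occurredAt: string };

// Audit records are history, so RML-1 code only proposes them as effects.
type AuditEffect = { type: "WriteAudit"; record: AuditRecord };

function createRml1Logger(debugSink: (line: string) => void) {
  const proposed: AuditEffect[] = [];
  return {
    proposed,
    // Technical diagnostics: acceptable in RML-1 as long as nobody bases decisions on them.
    debug(message: string): void {
      debugSink(`[debug] ${message}`);
    },
    // Never writes anything. The caller hands `proposed` to the commit point.
    audit(record: AuditRecord): void {
      proposed.push({ type: "WriteAudit", record });
    },
  };
}
```

In a dry run the `proposed` list is simply returned or dropped, so no history is written.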

4.2 “It’s staging, so it’s safe” (worlds don’t change)

Staging can still be RML-2/3 if:

it connects to real payment gateways
it sends real emails
it hits real partners

Staging is an environment; the world is defined by observability and effect leakage.

4.3 “Let’s add a notification to our simulation feature”

This is the temptation:

“Since we computed it, let’s email the result to someone.”

The moment you do that:

the world graduates to RML-2/3
humans act on it, and you’ve created irreversible reality

Countermeasure:

decide up front: RML-1 features must not send external notifications
if sharing is needed, make it a two-step flow:
- copy results out
- share via an explicit RML-2/3 UI path

5) Detecting “we thought it was RML-1, but it wasn’t”

The scariest state is:

everyone believes it’s RML-1
but it actually behaves like RML-2/3.

5.1 Checklist

If any of these are “yes,” be suspicious:

Does the result leave logs that change someone’s decision-making later?
Does it write to a DB referenced by other services?
Have you ever received a question about data produced by this flow?
Do you need to explain this behavior via ToS/SLA/compliance?

If you get even one yes:

“Are we sure this isn’t at least RML-2?”
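The checklist is small enough to keep next to the code. A sketch, with `Rml1Checklist` and `suspiciousAnswers` as hypothetical names:

```typescript
// One boolean per checklist question; true means "yes".
type Rml1Checklist = {
  logsInfluenceLaterDecisions: boolean;
  writesToSharedDb: boolean;
  receivedQuestionsAboutItsData: boolean;
  needsTosSlaOrComplianceExplanation: boolean;
};

// Returns the questions answered "yes". A non-empty result means the flow
// should be reviewed as a possible RML-2 (or higher) flow.
function suspiciousAnswers(c: Rml1Checklist): (keyof Rml1Checklist)[] {
  const keys = Object.keys(c) as (keyof Rml1Checklist)[];
  return keys.filter((k) => c[k]);
}
```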

5.2 A pragmatic approach

Perfect boundaries are hard. Start from extremes:

expand “clearly RML-1-safe” areas
label “clearly not RML-1” areas

Then fill the middle over time.

6) How to use this at work: label RML-1 in your project

6.1 Label features

Try labels like:

RML-1 strict (no observable effects)
RML-1 read-only (production reads allowed, no writes)
RML-2+ (effects exist)

Example:

Feature	RML label	Notes
Scoring simulation	RML-1 read-only	production reads, no writes
Auto assignment for A/B	RML-2+	writes to user attributes
CSV export preview	RML-1 strict	view-only, no file creation

This makes a new kind of proposal easy:

“Let’s keep this RML-1 strict and move effects into a separate flow.”
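The labels can also live in code as a typed registry, so that a missing or misspelled label fails to compile. This sketch reuses the rows of the example table; `RmlLabel`, `FeatureEntry`, and `featuresWithEffects` are hypothetical names.

```typescript
type RmlLabel = "RML-1 strict" | "RML-1 read-only" | "RML-2+";
type FeatureEntry = { feature: string; label: RmlLabel; notes: string };

// The rows of the example table, as data.
const features: FeatureEntry[] = [
  { feature: "Scoring simulation", label: "RML-1 read-only", notes: "production reads, no writes" },
  { feature: "Auto assignment for A/B", label: "RML-2+", notes: "writes to user attributes" },
  { feature: "CSV export preview", label: "RML-1 strict", notes: "view-only, no file creation" },
];

// Features that leave the room and therefore deserve an explicit exit review.
function featuresWithEffects(entries: FeatureEntry[]): string[] {
  return entries.filter((f) => f.label === "RML-2+").map((f) => f.feature);
}
```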

6.2 Create an “RML-1-only” module boundary

Even a soft code convention helps:

app/
  world_rml1/
    scoring/
    validation/
    simulation/
  world_rml2/
    saga/
    effect_dispatcher/
  world_rml3/
    ledger/
    incident/

You don’t need perfection. You need a shared boundary.
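A convention like this can be backed by a small check, for example in a lint step or a unit test. The rule below, that code under `world_rml1/` must not import from `world_rml2/` or `world_rml3/`, is an assumption layered on the directory layout above; `worldOf` and `violatesBoundary` are hypothetical helpers.

```typescript
// Directory name -> world number, following the layout above.
const WORLD_RANK = new Map<string, number>([
  ["world_rml1", 1],
  ["world_rml2", 2],
  ["world_rml3", 3],
]);

// Finds the first path segment that names a world, if any.
function worldOf(path: string): number | undefined {
  for (const segment of path.split("/")) {
    const rank = WORLD_RANK.get(segment);
    if (rank !== undefined) return rank;
  }
  return undefined;
}

// Assumed rule: RML-1 modules may not import from a higher world.
// Paths outside any world directory are not judged by this check.
function violatesBoundary(importerPath: string, importedPath: string): boolean {
  const from = worldOf(importerPath);
  const to = worldOf(importedPath);
  return from === 1 && to !== undefined && to > 1;
}
```

Imports in the other direction, from the effect dispatcher into RML-1 code, stay allowed, which matches the idea that exits call into the room and not the reverse.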

7) Summary: build the room first

RML-1 = a temporary world not observable from outside
a well-designed RML-1 enables safe PoCs on real data and makes exits visible
key patterns:
- read-only + dry-run
- consolidate exits (effect dispatcher)
- embed RML-1 modes inside production (preview/simulation/shadow)
key pitfalls:
- logs that are actually history
- staging ≠ RML-1
- mixing notification/writes into simulations
a practical rule:
- RML-1 allows only unobservable side effects

RML-1 isn’t just “testing convenience.”

It’s the foundational work of agreeing:
Where does the world start to change?

In the next chapter, we’ll step outside the room:

Chapter 3 — RML-2 (Dialog World): rollback as conversation between services and humans.
