kanaria007

Posted on • Originally published at zenn.dev

Chapter 3 — RML-2 (Dialog World): Rollback as a Conversation

The Worlds of Distributed Systems — Chapter 3

“Can we roll this back by ourselves?”
Or do we need to talk to someone who now shares the outcome?

In Chapter 2, we defined RML-1 (Closed World) as:

A room where nothing has escaped yet—so failure is safe.

In this chapter, we step outside that room and enter:

RML-2 — Dialog World
(RML = Rollback Maturity Levels: a simple map for how “rollback” changes as the blast radius grows.)

The key idea is simple:

Rollback is not a rewind button. It’s part of an ongoing conversation.


1) What is RML-2? — A world where “someone else exists”

If we define RML-2 in one line:

RML-2 is a world where there is someone else who shares the outcome.

1.1 The boundary between RML-1 and RML-2

RML-1 was “not observable from outside,” roughly meaning:

  • no external DB writes
  • no external API calls that cause real effects
  • no human notifications
  • no logs that later become inputs to business decisions or audits

The moment you cross into RML-2, the situation changes:

  • you write a record to a DB that other services read
  • you call another microservice (RPC / messaging)
  • you notify users (email / push)
  • you update a state that appears on dashboards or internal tooling

In short:

The moment your action becomes observable by someone else, you are in RML-2.

1.2 The boundary between RML-2 and RML-3

RML-2 is not yet the History World (RML-3).

A rough split:

  • RML-2

    • inside your company / product boundary
    • multiple services and humans can still settle outcomes via “conversation”
    • failures are often recoverable via retries, status checks, and compensation
  • RML-3

    • money, legal responsibility, social trust
    • other organizations / regulation / reality-level events
    • you can’t “make it not happen”—you can only layer corrections on top

A simple mental picture:

        Responsibility weight / “history weight”

 high ─────────────────────┐  RML-3: History World
      |                    |   - money, legal, social trust
      |         ┌──────────┘
      |         |  RML-2: Dialog World
      |         |   - multi-party conversation
      |   ┌─────┘
      |   |  RML-1: Closed World
      |   |   - inside one room/process
      +────────────────────────→  blast radius / scope of impact

RML-2 is the middle zone: the conversation zone.


2) See distributed systems as conversations

A useful metaphor for RML-2:

A distributed system is a network of conversations.

2.1 Not Q&A, but multi-turn dialog

If you only look at a single HTTP call, it feels like one round trip:

  • client → server (request)
  • server → client (response)

But real operations are almost always multi-turn:

  1. ask the inventory service
  2. request authorization from payment
  3. dispatch shipping
  4. notify the user via email

When any step fails, you often need the continuation of the conversation:

  • cancel something you already requested
  • compensate with another operation
  • explain to the user and provide next steps

So, for RML-2:

Rollback is not “restart the flow.”
It’s “continue the dialog until the situation settles.”
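
The multi-turn flow above can be sketched as a list of steps, each paired with the turn that undoes it. This is a minimal illustration, not a workflow engine: `run_dialog`, the step names, and the transcript format are all hypothetical.

```python
from typing import Callable, List, Tuple

# (step name, action, compensation) -- all names here are illustrative.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_dialog(steps: List[Step]) -> List[str]:
    """Run steps in order; if one fails, compensate completed steps in reverse.

    Returns a transcript of the conversation so far.
    """
    transcript: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            transcript.append(f"{name}: failed")
            # Continue the dialog: undo what was already requested, newest first.
            for prev_name, prev_compensate in reversed(completed):
                prev_compensate()
                transcript.append(f"{prev_name}: compensated")
            return transcript
        transcript.append(f"{name}: ok")
        completed.append((name, compensate))
    return transcript
```

Note that the failure branch does not restart anything. It appends more turns to the same conversation, which is the point of the section.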

2.2 Three kinds of “participants”

In RML-2, the “other side” typically comes in three forms:

  • Services (microservices / external APIs)

    • connected via HTTP/gRPC/messaging
    • governed by SLOs, retry policies, error contracts
  • Humans (users / operators)

    • connected via UI, email, Slack, phone
    • their actions (read/didn’t read, clicked/didn’t click) matter
  • Schedulers (queues / batch jobs / workflow engines)

    • delay and re-execution are normal
    • “I’ll come back later” is expected

Treating them all as “dialog participants” makes the system easier to reason about.

2.3 Timeouts and “silence” are also responses

Conversations include silence.

Distributed systems constantly face this:

  • you sent a request, but no response arrives (timeout)

Now you must decide:

  • did the request never arrive?
  • did it arrive, but the response got lost?
  • did the other side process it and just not reply?

This ambiguity is why RML-2 needs explicit “silence reactions”:

  • retry with backoff
  • check status / poll
  • cancel / start compensation

Worldview summary:

Silence is also part of the dialog.

(Later chapters will encode these as concrete patterns like retry-with-backoff, check-status, and start-compensation.)
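
As a preview, the three silence reactions can be combined into one small loop. This is a sketch under assumptions: `send` returns `None` on timeout, `check_status` returns the string `"done"` once the other side has processed the request, and resending is safe (for example because the request is idempotent). None of these names come from a real library.

```python
import random
import time
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    CONFIRMED = "confirmed"      # the other side acknowledged the request
    COMPENSATED = "compensated"  # we gave up and started compensation


def settle_after_silence(
    send: Callable[[], Optional[str]],
    check_status: Callable[[], Optional[str]],
    compensate: Callable[[], None],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome:
    """Retry with backoff, check status after each silence, compensate at the end."""
    for attempt in range(max_attempts):
        if send() is not None:
            return Outcome.CONFIRMED
        # Silence: the request may still have been processed, so ask before resending.
        if check_status() == "done":
            return Outcome.CONFIRMED
        # Exponential backoff with a little jitter before the next turn.
        sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    # Still silent: compensation must be safe whether or not the request landed.
    compensate()
    return Outcome.COMPENSATED
```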


3) The basic story of RML-2: Promise → Perform → Reconcile

In RML-2, most operations follow a three-step story:

  1. Promise — “Here’s what I’m asking for, under your rules.”
  2. Perform — “Someone actually did something (maybe).”
  3. Reconcile — “We align our views of what happened.”

3.1 Promise: you enter the other side’s assumptions

Calling an API is essentially:

“I want this outcome, under your rules and constraints.”

  • request formats
  • quotas / rate limits / size limits
  • error codes and retry semantics

If you provide an API, you’re promising things too:

  • what you guarantee on success vs failure
  • what is safe to retry
  • what the caller should do next (retry? give up? check status?)
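
One way to make the "what should the caller do next" promise concrete is to publish it as data. The sketch below assumes three made-up error codes; a real API would define its own, and the fallback for unknown codes is a design choice rather than a rule.

```python
from enum import Enum


class NextStep(Enum):
    RETRY = "retry"
    CHECK_STATUS = "check_status"
    GIVE_UP = "give_up"


# Hypothetical error codes, for illustration only.
ERROR_CONTRACT = {
    "RATE_LIMITED": NextStep.RETRY,        # nothing happened; safe to ask again later
    "TIMEOUT": NextStep.CHECK_STATUS,      # outcome unknown; ask before acting
    "INVALID_REQUEST": NextStep.GIVE_UP,   # retrying the same request cannot succeed
}


def next_step(error_code: str) -> NextStep:
    """Look up the promised reaction; treat unknown codes as 'outcome unknown'."""
    return ERROR_CONTRACT.get(error_code, NextStep.CHECK_STATUS)
```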

3.2 Perform: the moment one side moves ahead

This is where classic distributed weirdness happens:

  • the server processed it, but the response was lost
  • the server crashed mid-flight
  • the client thinks it succeeded, while the server thinks it didn’t happen (or vice versa)

The important question becomes:

What does each participant believe the world looks like right now?

RML-2 failures are often belief gaps, not just exceptions.

3.3 Reconcile: not rewind—align histories

Reconciliation closes the belief gap:

  • ask again to confirm status
  • retry with idempotency keys
  • cancel the request
  • run compensation actions

Over time, you push the system toward:

“A state where all participants share a consistent story.”

This convergence is the practical side of what people call eventual consistency:

You can’t instantly align everyone’s world.
But through dialog (retry, check, compensate), you converge.

RML-2 is the worldview that treats eventual consistency as designed conversation.
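
"Retry with idempotency keys" is the reconcile move that is easiest to show in code. In this sketch, `ReservationService` is a hypothetical callee that remembers the result for each key, so a repeated request returns the earlier answer instead of performing the work twice.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReservationService:
    """Hypothetical callee that deduplicates requests by idempotency key."""

    results: Dict[str, str] = field(default_factory=dict)
    performed: int = 0  # how many times real work was done

    def reserve(self, idempotency_key: str, item: str) -> str:
        # A repeated key means "I am asking again", not "do it again".
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.performed += 1
        result = f"reserved:{item}:{self.performed}"
        self.results[idempotency_key] = result
        return result
```

A caller whose first response was lost can resend with the same key and receive the original result, which closes the belief gap without creating a second reservation.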


4) Talk about “how far rollback goes” in world terms

The key RML-2 design question:

How far do you intend this operation to be rollbackable—realistically?

4.1 Three categories of “rollback scope”

Roughly, RML-2 operations fall into three categories:

  1. Rollbackable by yourself

    • nothing meaningful has changed outside your boundary, or
    • your “undo” completes entirely within your own service
  2. Requires dialog with the other side

    • the other side has state that changed
    • you need cancel/correct/retry as additional dialog turns
  3. Already escalated into RML-3

    • the effect has reached “history”
    • rollback becomes refund/correction/explanation

RML-2 design is often:

How wide can you make category (1),
and how well-designed is category (2)?
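
The three categories can be written down as a small classifier. The two boolean inputs are a deliberate simplification chosen for this sketch; real systems will need more nuance than two flags.

```python
from enum import Enum


class RollbackScope(Enum):
    SELF = 1      # category (1): rollbackable by yourself
    DIALOG = 2    # category (2): requires dialog with the other side
    HISTORY = 3   # category (3): already escalated into RML-3


def classify(changed_outside: bool, other_side_can_undo: bool) -> RollbackScope:
    """Classify an operation by how far its rollback has to reach."""
    if not changed_outside:
        return RollbackScope.SELF
    if other_side_can_undo:
        return RollbackScope.DIALOG
    # State changed elsewhere and nobody can take it back: correct forward only.
    return RollbackScope.HISTORY
```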

4.2 Example: payment authorization vs capture

A common split:

  • Authorize (reserve funds) → still closer to RML-2 (often cancellable)
  • Capture (actually charge) → closer to RML-3 (refund/correction territory)

Making that boundary explicit helps you decide:

  • which APIs are still dialog/compensation scope
  • which APIs must be treated as history-entry points
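
A tiny state machine makes the boundary visible: the same "undo" request means different things on each side of capture. The state names and the `void` / `refund` vocabulary are assumptions for this sketch and do not mirror any particular payment provider's API.

```python
from enum import Enum
from typing import List


class PaymentState(Enum):
    AUTHORIZED = "authorized"
    VOIDED = "voided"
    CAPTURED = "captured"
    REFUNDED = "refunded"


class Payment:
    """Hypothetical payment whose undo path depends on how far it has progressed."""

    def __init__(self) -> None:
        self.state = PaymentState.AUTHORIZED
        self.ledger: List[str] = ["authorize"]

    def capture(self) -> None:
        if self.state is not PaymentState.AUTHORIZED:
            raise ValueError(f"cannot capture from {self.state.value}")
        self.state = PaymentState.CAPTURED
        self.ledger.append("capture")

    def undo(self) -> str:
        if self.state is PaymentState.AUTHORIZED:
            # Dialog scope: the reservation is cancelled and no charge occurred.
            self.state = PaymentState.VOIDED
            self.ledger.append("void")
            return "void"
        if self.state is PaymentState.CAPTURED:
            # History scope: the charge stays on record; a correction is layered on top.
            self.state = PaymentState.REFUNDED
            self.ledger.append("refund")
            return "refund"
        raise ValueError(f"nothing left to undo from {self.state.value}")
```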

5) Failure patterns you should expect (worldview-level)

Implementation details come later (Part II), but the worldview-level traps are worth naming now.

5.1 “Only one side rolled back” (split history)

  • caller rolls back locally: “it never happened”
  • callee processed it: “it happened”

Now you have two incompatible timelines.

RML framing:

  • both participants are in RML-2
  • local rollback alone created a dialog-wide inconsistency

From there, local rewinds can’t fix it. Only continued dialog (status checks, compensation, reconciliation) can.
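
Detecting a split history is the first turn of that continued dialog. A minimal sketch, assuming each side can export a mapping from request ID to its own status (the function name and status strings are illustrative):

```python
from typing import Dict, List


def find_split_histories(caller: Dict[str, str], callee: Dict[str, str]) -> List[str]:
    """Return the request IDs on which the two participants disagree, sorted."""
    request_ids = set(caller) | set(callee)
    return sorted(
        request_id
        for request_id in request_ids
        # A request one side has never heard of also counts as a disagreement.
        if caller.get(request_id, "unknown") != callee.get(request_id, "unknown")
    )
```

Every ID this returns needs a follow-up turn (status check, compensation, or human review); the comparison itself fixes nothing.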

5.2 Depending on an API that isn’t designed for rollback

  • you assume compensation exists (“we’ll cancel if needed”)
  • the other service provides no cancel/undo path

This is a worldview mismatch:

“We treated this as RML-2, but the other side behaves like RML-3.”

If an action is effectively irreversible, you must treat it as history-bound.


6) Humans are also RML-2 participants

RML-2 isn’t just service-to-service.

6.1 User dialog

  • UI status displays
  • “processing / completed” emails
  • chatbot responses

These are dialog turns.

If you get them wrong, the two sides end up on different timelines:

  • the user believes the old timeline
  • the system believes the new timeline

That’s an RML-2 split history, but with a human participant.

6.2 Operator dialog

Customer support and operators are often:

Human reconcilers in the Dialog World.

They:

  • notice inconsistencies
  • trigger manual corrections
  • coordinate with users

So RML-2 design should explicitly include:

  • where automation stops
  • where humans take over
  • what “handoff” artifacts exist (logs, case IDs, reason codes)
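
The handoff point can be made explicit as a record that automation produces when it stops. The field names below are assumptions for illustration; the idea is only that the human reconciler receives a case ID, a reason code, and pointers to the evidence.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class HandoffCase:
    """Hypothetical artifact passed from automation to a human operator."""

    case_id: str
    reason_code: str
    request_ids: List[str]
    log_refs: List[str] = field(default_factory=list)


def hand_off_if_exhausted(
    attempts: int, max_auto_attempts: int, case: HandoffCase
) -> Optional[dict]:
    """Return the handoff record once automation has used up its attempts."""
    if attempts < max_auto_attempts:
        return None  # automation still owns the conversation
    return asdict(case)
```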

7) RML-2 checklist (worldview edition)

7.1 Map the participants and the dialog

  • [ ] Who are the participants (services / humans / schedulers)?
  • [ ] How many turns can the conversation last?
  • [ ] If something fails mid-way, how do we continue the dialog to settle it?
  • [ ] What do we do on silence (timeouts)?

7.2 Define rollback scope

  • [ ] Which actions are rollbackable by yourself?
  • [ ] Which actions require dialog (cancel/correct/retry)?
  • [ ] Where is the boundary where this becomes RML-3?

7.3 Reconciliation ownership

  • [ ] How do we detect “only one side rolled back”?
  • [ ] Who is the reconciler (automated or human)?
  • [ ] What is the explicit dialog flow once inconsistency is detected?

Summary — RML-2 is not “transactions,” it’s conversation

What this chapter is trying to fix is a common misunderstanding:

RML-2 rollback is not a rewind operation.
It’s a conversation that continues until the world settles.

  • RML-1: inside your room
  • RML-2: multi-party dialog and reconciliation
  • RML-3: irreversible history and forward-only correction

If you treat RML-2 as “a simpler distributed transaction,” you’ll miss the real work.
If you treat it as “a conversation among participants,” design becomes clearer—especially around silence, retries, compensation, and eventual consistency.

Next:

Chapter 4 — RML-3 (History World): when you can’t delete history, only layer corrections.
