kanaria007

Posted on • Originally published at zenn.dev

Chapter 7 — API & Client Design: Exposing RML Labels Across the Boundary

The Worlds of Distributed Systems — Chapter 7

In Chapters 5–6, we made RML-2 concrete:

  • Attach world / severity / action to failures
  • Connect them to observability and on-call policy
  • Treat sagas as retryable conversations, where retry-with-backoff, start-compensation, and escalate-history are state machine triggers
  • Treat idempotency keys as the lifeline of retries

This chapter brings that worldview to the most important interface in real systems:

The boundary between API and client.
“What world is burning behind this HTTP 500?”

We’ll design APIs and clients so they can share RML semantics and enforce correct behavior—rather than leaving everything to guesswork.


1) HTTP status codes don’t tell you which world you’re in

A typical API error looks like this:

HTTP/1.1 500 Internal Server Error
Content-Type: application/json

{ "message": "Internal Server Error" }

From the client’s perspective, this tells you almost nothing:

  • Was it invalid input (RML-1-ish)?
  • Was it a transient downstream failure (RML-2 retryable)?
  • Did we hit a history-bound rule (RML-3 escalation / no retries)?

In RML terms:

“Which world is this error from: RML-1 / RML-2 / RML-3?”
“What should I do next?”

This chapter’s goal:

  • Make API/client boundaries RML-aware
  • Let clients interpret Action Hints
  • Prevent the worst outcome: automation-by-guessing

2) Put RML in API responses: body vs headers vs status

Think of an API response as three layers:

  • HTTP status
  • HTTP headers
  • JSON body

Each has a natural role.

2.1 JSON body: the core error object

The body is what humans and clients read first.

A practical shape:

{
  "code": "PAYMENT_ALREADY_SETTLED",
  "message": "This payment is already settled and cannot be canceled.",
  "world": "RML3",
  "action": "escalate-history",
  "details": {
    "paymentId": "pay_123",
    "settledAt": "2025-12-01T10:23:45Z"
  }
}
  • code: stable branching key (logging, UI mapping, analytics)
  • message: human-facing message (UI/logs)
  • world / action: the RML metadata
  • details: domain-specific context

Public APIs note: you may choose to expose fewer fields in the body (e.g., omit world/action) and put RML only in headers. Internal APIs can usually include everything.
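
The error object above can be pinned down as a type plus a runtime guard, so clients never branch on unchecked JSON. This is a minimal sketch: the names `RmlErrorBody` and `isRmlErrorBody` are illustrative (not from any library), and `world` / `action` are optional here on the assumption that public APIs may omit them from the body.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

// Hypothetical shape of the error body described above.
interface RmlErrorBody {
  code: string;                      // stable branching key
  message: string;                   // human-facing text
  world?: World;                     // may be omitted on public APIs
  action?: Action;                   // may be omitted on public APIs
  details?: Record<string, unknown>; // domain-specific context
}

const KNOWN_WORLDS: readonly string[] = ["RML1", "RML2", "RML3"];
const KNOWN_ACTIONS: readonly string[] = [
  "retry-local",
  "retry-with-backoff",
  "start-compensation",
  "escalate-history",
  "abort",
];

// Runtime guard: parsed JSON is untrusted until it passes this check.
function isRmlErrorBody(v: unknown): v is RmlErrorBody {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  if (typeof o.code !== "string" || typeof o.message !== "string") return false;
  if (o.world !== undefined) {
    if (typeof o.world !== "string" || KNOWN_WORLDS.indexOf(o.world) === -1) return false;
  }
  if (o.action !== undefined) {
    if (typeof o.action !== "string" || KNOWN_ACTIONS.indexOf(o.action) === -1) return false;
  }
  if (o.details !== undefined && (typeof o.details !== "object" || o.details === null)) {
    return false;
  }
  return true;
}
```

An unknown `world` or `action` value fails the guard rather than being passed through, which keeps a typo on the server from turning into invented client behavior.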

2.2 HTTP headers: RML metadata + retry control

Headers are ideal for machine-readable policy and cross-cutting behavior:

HTTP/1.1 409 Conflict
Content-Type: application/json
X-RML-World: RML3
X-RML-Action: escalate-history
Retry-After: 0
X-Idempotency-Key: saga:12345:charge
X-Request-ID: req-abcde
  • X-RML-World: RML1 / RML2 / RML3
  • X-RML-Action:

    • retry-local
    • retry-with-backoff
    • start-compensation
    • escalate-history
    • abort
  • Retry-After: suggested delay (especially for retry-with-backoff)

  • X-Idempotency-Key: the retry lifeline (when relevant)

  • X-Request-ID: ties UI and support workflows to traces

Now clients can behave consistently:

  • RML3 + escalate-history → never auto-retry; show support path / ticket ID
  • RML2 + retry-with-backoff → retry using the same idempotency key
  • RML1 + abort → show validation feedback immediately

2.3 HTTP status: keep it coarse

Status codes are a coarse bucket. That’s fine.

A useful mapping (not universal, but practical):

| world | action | status (typical) | example |
|-------|--------|------------------|---------|
| RML1 | abort | 400 / 422 | validation failure |
| RML2 | retry-with-backoff | 502 / 503 | transient downstream outage |
| RML2 | start-compensation | 409 / 500 | saga step failed mid-flow |
| RML3 | escalate-history | 409 / 422 / 403 | settled payment; forbidden reversal |

Don’t try to encode everything in status codes.
Encode the meaning in X-RML-World and X-RML-Action.
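
The mapping above can also live in code, for example as a contract-test helper that checks a response's status against its RML labels. This is only a sketch: `STATUS_TABLE`, `typicalStatuses`, and `statusMatchesLabels` are hypothetical names, and the table holds only the four rows listed above.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

// Typical statuses per (world, action) pair, copied from the mapping above.
const STATUS_TABLE: Record<string, readonly number[]> = {
  "RML1:abort": [400, 422],
  "RML2:retry-with-backoff": [502, 503],
  "RML2:start-compensation": [409, 500],
  "RML3:escalate-history": [409, 422, 403],
};

// Returns the typical statuses for a label pair, or an empty list if the pair is not in the table.
function typicalStatuses(world: World, action: Action): readonly number[] {
  return STATUS_TABLE[`${world}:${action}`] ?? [];
}

// True when the HTTP status is one of the typical statuses for the given labels.
function statusMatchesLabels(httpStatus: number, world: World, action: Action): boolean {
  return typicalStatuses(world, action).indexOf(httpStatus) !== -1;
}
```

A check like this only flags surprising combinations. Because the mapping is not universal, a mismatch is a prompt for review rather than proof of a bug.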


3) Internal API vs BFF vs Public API: different exposure levels

3.1 Internal microservice APIs

Assumption: callers understand RML.

So you can return full metadata:

{
  "code": "STOCK_RESERVE_FAILED",
  "message": "Failed to reserve stock.",
  "world": "RML2",
  "action": "start-compensation",
  "details": { "sku": "SKU-1234", "warehouse": "TOKYO-1" }
}

BFFs, orchestrators, and gateways can interpret this directly.

3.2 BFF (Backend for Frontend) as translator

The BFF sits between:

  • RML-aware backend world
  • UI that must be understandable to humans

Typical responsibilities:

  1. Receive world/action from downstream
  2. Translate to a UI-level error model:
     • which field to highlight
     • whether to auto-retry
     • whether to show a “contact support” path
  3. Optionally soften messaging for end users

Example translation:

Downstream returns:

{
  "code": "PAYMENT_ALREADY_SETTLED",
  "world": "RML3",
  "action": "escalate-history",
  "message": "Payment already settled; cannot cancel."
}

BFF outputs to UI:

  • End-user message: “This payment can’t be canceled here. Please contact support.”
  • Internal behavior: enqueue incident/case + attach requestId
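
The translation step could look roughly like the sketch below. The `UiError` shape and the function name `toUiError` are assumptions for illustration, and so is the fallback for `start-compensation` (shown as a plain "try later" message). The generic user-facing strings are placeholders that a real BFF would localize and tailor per error `code`.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

interface DownstreamError {
  code: string;
  message: string;
  world: World;
  action: Action;
}

// Hypothetical UI-level error model; field names are illustrative.
interface UiError {
  userMessage: string;      // softened, jargon-free text
  autoRetry: boolean;       // should the UI keep a spinner and retry?
  showSupportPath: boolean; // show "contact support" plus the request ID
  openCase: boolean;        // BFF should enqueue an incident/case
  requestId: string;
}

function toUiError(err: DownstreamError, requestId: string): UiError {
  const base = { requestId, autoRetry: false, showSupportPath: false, openCase: false };

  // History-bound failures: never retry, route to humans.
  if (err.world === "RML3" || err.action === "escalate-history") {
    return {
      ...base,
      userMessage: "This can’t be completed here. Please contact support.",
      showSupportPath: true,
      openCase: true,
    };
  }
  // Retryable failures: the UI may keep retrying quietly.
  if (err.action === "retry-local" || err.action === "retry-with-backoff") {
    return { ...base, userMessage: "Temporary issue. Retrying…", autoRetry: true };
  }
  // Validation / local issues: ask the user to fix the input.
  if (err.world === "RML1") {
    return { ...base, userMessage: "Please fix and resubmit." };
  }
  // Assumption: anything else (e.g. start-compensation) surfaces as "try later".
  return { ...base, userMessage: "Please try later." };
}
```

Note that the downstream `message` and the RML labels never reach the user. Only the request ID crosses over, so support can tie the case back to traces.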

3.3 Public APIs

For external developers, you must decide what becomes part of the external contract.

A common approach:

  • Body: code/message/details only
  • Headers: X-RML-World, X-RML-Action, Retry-After, X-Request-ID

External clients can still follow simple rules:

  • RML2 + retry-with-backoff → retry safely
  • RML3 + escalate-history → do not retry; escalate to human flow
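
One way to apply this split at the edge is a small function that takes the full internal error and emits a reduced body plus RML headers. This is a sketch under assumptions: `toPublicResponse` and its types are hypothetical names, and headers are modeled as a plain record instead of any specific framework's response object.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

interface InternalError {
  code: string;
  message: string;
  world: World;
  action: Action;
  details?: Record<string, unknown>;
}

interface PublicErrorBody {
  code: string;
  message: string;
  details?: Record<string, unknown>;
}

interface PublicResponse {
  headers: Record<string, string>;
  body: PublicErrorBody;
}

function toPublicResponse(
  err: InternalError,
  requestId: string,
  retryAfterSec?: number
): PublicResponse {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    "X-RML-World": err.world,
    "X-RML-Action": err.action,
    "X-Request-ID": requestId,
  };
  // Only advertise Retry-After when the hint really is a backoff retry.
  if (err.action === "retry-with-backoff" && retryAfterSec !== undefined) {
    headers["Retry-After"] = String(retryAfterSec);
  }
  // The public body deliberately drops world/action.
  const body: PublicErrorBody = { code: err.code, message: err.message };
  if (err.details !== undefined) body.details = err.details;
  return { headers, body };
}
```

Keeping the split in one function means the external contract (which fields are public) is decided in a single place instead of per endpoint.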

4) The RML-aware client (SDK) pattern

If you expose X-RML-* but nobody reads it, nothing changes.

So the core move is:

Provide an RML-aware client wrapper and standardize its use.

4.1 Parse RML from headers

type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

type RmlMeta = { world?: World; action?: Action };

function asWorld(v: string | null): World | undefined {
  if (v === "RML1" || v === "RML2" || v === "RML3") return v;
  return undefined;
}

function asAction(v: string | null): Action | undefined {
  if (
    v === "retry-local" ||
    v === "retry-with-backoff" ||
    v === "start-compensation" ||
    v === "escalate-history" ||
    v === "abort"
  ) return v;
  return undefined;
}

function parseRml(res: Response): RmlMeta {
  return {
    world: asWorld(res.headers.get("X-RML-World")),
    action: asAction(res.headers.get("X-RML-Action")),
  };
}

4.2 Enforce Action Hints

function sleep(ms: number) {
  return new Promise<void>((r) => setTimeout(r, ms));
}

function backoffWithJitter(baseMs: number, attempt: number) {
  const capped = Math.min(baseMs * 2 ** attempt, 10_000);
  const jitter = Math.random() * (capped * 0.2); // ~ +0–20%
  return capped + jitter;
}

export async function fetchWithRmlHandling(
  input: RequestInfo,
  init?: RequestInit,
  opts?: { maxRetries?: number; baseDelayMs?: number }
): Promise<Response> {
  const maxRetries = opts?.maxRetries ?? 3;
  const baseDelayMs = opts?.baseDelayMs ?? 500;

  let attempt = 0;

  while (true) {
    const res = await fetch(input, init);
    if (res.ok) return res;

    const { world, action } = parseRml(res);

    // If no RML metadata: behave like a normal fetch client.
    if (!world || !action) return res;

    // Never auto-retry history-bound failures.
    if (action === "escalate-history" || world === "RML3") return res;

    if (action === "retry-local" || action === "retry-with-backoff") {
      if (attempt >= maxRetries) return res;
      attempt++;

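      // Retry-After is either delta-seconds or an HTTP-date; this sketch honors only a
      // positive seconds value and falls back to computed backoff for anything else.
      const retryAfterHeader = res.headers.get("Retry-After");
      const retryAfterSec = retryAfterHeader ? Number(retryAfterHeader) : NaN;
      const retryAfterMs =
        Number.isFinite(retryAfterSec) && retryAfterSec > 0 ? retryAfterSec * 1000 : 0;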

      const delay =
        action === "retry-with-backoff"
          ? (retryAfterMs || backoffWithJitter(baseDelayMs, attempt))
          : 0;

      if (delay > 0) await sleep(delay);
      continue;
    }

    // For saga compensation triggers, return the response to caller (or hook your saga runner here).
    if (action === "start-compensation") return res;

    // Default: do not invent behavior.
    return res;
  }
}

Key point: action hints only matter if you enforce them via shared code.
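
Retries are only safe if every attempt carries the same idempotency key. One possible sketch is a helper that attaches a key once and refuses to overwrite an existing one. `withIdempotencyKey` is a hypothetical name, headers are modeled as a plain record, and the key format is whatever the caller's `makeKey` produces.

```typescript
// Returns a copy of `headers` that carries an idempotency key.
// An existing key (matched case-insensitively) is kept, so every retry
// of the same logical operation reuses it.
function withIdempotencyKey(
  headers: Record<string, string>,
  makeKey: () => string
): Record<string, string> {
  const existing = Object.keys(headers).find(
    (k) => k.toLowerCase() === "x-idempotency-key"
  );
  if (existing !== undefined) return { ...headers };
  return { ...headers, "X-Idempotency-Key": makeKey() };
}
```

Called once before the retry loop, the resulting headers can be passed as `init.headers`. Since the wrapper above reuses the same `init` on every attempt, each retry then sends the identical key.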


5) UX by world: don’t show “RML3 error” to users

RML metadata is primarily for clients/ops—not end users.

A healthy UX mapping:

  • RML-1 (validation / local issues)

    • highlight fields, inline errors
    • “Please fix and resubmit”
  • RML-2 (transient dialog failures)

    • “Temporary issue. Retrying…”
    • keep spinner, auto-retry with backoff
    • after N retries: “Please try later”
  • RML-3 (history-bound)

    • “This can’t be completed here. Contact support.”
    • show requestId / case ID
    • quietly create an incident/case on the backend

6) Gateway governance: centralize retry policies and detect bad clients

If you emit RML headers, gateways can enforce org-level safety.

6.1 Centralized retry rules

  • If X-RML-World=RML2 and X-RML-Action=retry-with-backoff:

    • gateway retries a limited number of times
  • If X-RML-World=RML3:

    • gateway forbids retries
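
The two rules above fit in a single pure decision function that a gateway plugin could call per upstream response. This is a sketch only: `gatewayDecision` is a hypothetical name, the default of 2 gateway retries is an arbitrary assumption, and real gateways express this in their own configuration or plugin language.

```typescript
type GatewayDecision = "retry" | "pass-through";

function gatewayDecision(
  world: string | undefined,
  action: string | undefined,
  attemptsSoFar: number,
  maxGatewayRetries: number = 2 // assumed default; tune per organization
): GatewayDecision {
  // RML3 is never retried, whatever the action header says.
  if (world === "RML3") return "pass-through";

  // Limited retries for transient RML2 failures only.
  if (
    world === "RML2" &&
    action === "retry-with-backoff" &&
    attemptsSoFar < maxGatewayRetries
  ) {
    return "retry";
  }

  // Missing or unknown labels: do not invent behavior.
  return "pass-through";
}
```

If both the gateway and the client retry, attempts multiply. It is usually worth deciding which layer owns the retry budget for a given route.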

6.2 Detect clients ignoring Action Hints

If you see:

  • API responds RML3 + escalate-history
  • same client continues calling at high frequency

That’s a sign they ignore the contract. Gateways can:

  • rate limit
  • block
  • notify developers
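
Detection can be as simple as counting repeat calls after an escalation within a time window. The sketch below keeps state in memory and is keyed by a caller-chosen string (for example client ID plus operation). `createHintViolationTracker`, the window, and the threshold are all assumptions, and a real gateway would likely use shared storage.

```typescript
function createHintViolationTracker(windowMs: number, threshold: number) {
  // Keys that have already received RML3 + escalate-history.
  const escalated = new Set<string>();
  // Timestamps (ms) of calls made after the escalation, per key.
  const repeats = new Map<string, number[]>();

  return {
    // Call when the API answered `key` with RML3 + escalate-history.
    recordEscalation(key: string): void {
      escalated.add(key);
    },

    // Call on each incoming request. Returns true when the client has made
    // at least `threshold` calls within `windowMs` after being told to escalate.
    recordRequest(key: string, nowMs: number): boolean {
      if (!escalated.has(key)) return false;
      const recent = (repeats.get(key) ?? []).filter((t) => nowMs - t < windowMs);
      recent.push(nowMs);
      repeats.set(key, recent);
      return recent.length >= threshold;
    },
  };
}
```

A `true` result is the signal to rate limit, block, or notify. Which of those applies is a policy choice outside this sketch.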

7) GraphQL / gRPC: where to carry world/action

RML is not REST-specific.

7.1 GraphQL (use extensions)

{
  "data": null,
  "errors": [
    {
      "message": "Payment already settled; cannot cancel.",
      "path": ["cancelPayment"],
      "extensions": {
        "code": "PAYMENT_ALREADY_SETTLED",
        "world": "RML3",
        "action": "escalate-history",
        "requestId": "req-abcde"
      }
    }
  ]
}
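
On the client, the labels can be read back out of `extensions` with the same validation used for headers. A minimal sketch, assuming the extension keys shown above (`world`, `action`); `GraphQLErrorLike`, `readRmlFromGraphQLError`, and `anyHistoryBound` are illustrative names, not part of any GraphQL client library.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

// Just the parts of a GraphQL error that this sketch needs.
interface GraphQLErrorLike {
  message: string;
  path?: (string | number)[];
  extensions?: Record<string, unknown>;
}

const WORLDS: readonly World[] = ["RML1", "RML2", "RML3"];
const ACTIONS: readonly Action[] = [
  "retry-local",
  "retry-with-backoff",
  "start-compensation",
  "escalate-history",
  "abort",
];

// Reads world/action from one error; unknown or missing values become undefined.
function readRmlFromGraphQLError(
  err: GraphQLErrorLike
): { world: World | undefined; action: Action | undefined } {
  const ext = err.extensions ?? {};
  return {
    world: WORLDS.find((w) => w === ext["world"]),
    action: ACTIONS.find((a) => a === ext["action"]),
  };
}

// A response may carry several errors; if any is history-bound, do not auto-retry.
function anyHistoryBound(errors: readonly GraphQLErrorLike[]): boolean {
  return errors.some((e) => {
    const { world, action } = readRmlFromGraphQLError(e);
    return world === "RML3" || action === "escalate-history";
  });
}
```

Treating the whole response as non-retryable when any single error is history-bound is a conservative assumption. Partial-success handling would need a per-field policy.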

7.2 gRPC (metadata + rich status details)

Use:

  • gRPC status codes (FAILED_PRECONDITION, UNAVAILABLE, etc.)
  • metadata/trailers: x-rml-world, x-rml-action
  • optional structured error details message

The idea is the same: clients must be able to route behavior based on world/action.


8) Anti-patterns

8.1 You defined X-RML-*, but nobody reads it

Result: you changed nothing. Fix it by:

  • providing an RML-aware SDK wrapper
  • enforcing “no raw fetch/http client” via linting or code review rules

8.2 “World” semantics drift across services

If one service uses RML3 to mean “non-retryable policy violation” and another uses RML3 to mean “temporary restriction,” clients become confused and unsafe.

Fix it by:

  • having a single org-level RML meaning table
  • documenting endpoint guarantees like “This endpoint never returns RML3”
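
A shared meaning table can be made executable so that drift shows up in tests instead of in production. The sketch below is hypothetical throughout: the pairs come from the status mapping in section 2.3, placing `retry-local` under RML1 is an assumption this chapter does not confirm, and `withinGuarantee` models an endpoint guarantee as a list of worlds the endpoint is allowed to return.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

// Hypothetical org-level table: which actions each world may carry.
const ALLOWED_ACTIONS: Record<World, readonly Action[]> = {
  RML1: ["abort", "retry-local"], // retry-local under RML1 is an assumption
  RML2: ["retry-with-backoff", "start-compensation"],
  RML3: ["escalate-history"],
};

// True when a service's (world, action) pair agrees with the shared table.
function isConsistentLabel(world: World, action: Action): boolean {
  return ALLOWED_ACTIONS[world].indexOf(action) !== -1;
}

// True when `world` is among the worlds an endpoint has promised to return.
// Example guarantee: "this endpoint never returns RML3" => ["RML1", "RML2"].
function withinGuarantee(allowedWorlds: readonly World[], world: World): boolean {
  return allowedWorlds.indexOf(world) !== -1;
}
```

Running both checks in each service's contract tests gives every team the same definition to fail against.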

8.3 Exposing RML jargon directly to end users

“RML3 error occurred” is meaningless to users.

Fix it by:

  • translating at the BFF/UI layer
  • showing meaningful messages + a support path + request ID

9) Checklist

API side

  • [ ] JSON body includes code, message, and optional details
  • [ ] Headers carry X-RML-World, X-RML-Action, Retry-After, X-Request-ID
  • [ ] RML3 + escalate-history returns a non-retryable status (often 409/422/403)
  • [ ] Idempotency is contractually supported where retries exist

Client / BFF side

  • [ ] Use an RML-aware shared client wrapper
  • [ ] Behavior branches on world/action (retry / compensate / escalate)
  • [ ] UX differs per world (validation vs retry vs support)
  • [ ] Public API docs explain RML headers clearly

Gateway / SRE side

  • [ ] Central retry policy exists for RML2 + retry-*
  • [ ] Retry forbidden for RML3
  • [ ] Detection exists for clients ignoring Action Hints

Closing — carrying the worldview across boundaries makes responsibility easier

The core takeaway:

  • HTTP status codes alone can’t express RML semantics
  • Add world/action in headers (and optionally body)
  • Build an RML-aware client wrapper so Action Hints are enforced
  • Translate RML semantics into UX, gateway policy, and incident workflows
  • Apply the same idea across REST, GraphQL, and gRPC

At the end of Part II, we now have the full “Dialog World” stack:

  • Chapter 5: Failure design + observability + governance
  • Chapter 6: Sagas and compensations as retryable conversations
  • Chapter 7: API/client boundary carrying the worldview

Next, Part III moves the focus fully into RML-3 (History World)—where distributed systems meet legal responsibility, money, and social trust.
