kanaria007

Posted on Mar 5 • Originally published at zenn.dev

Chapter 7 — API & Client Design: Exposing RML Labels Across the Boundary

#distributedsystems #api #sre #microservices

The Worlds of Distributed Systems — Chapter 7

In Chapters 5–6, we made RML-2 concrete:

Attach world / severity / action to failures
Connect them to observability and on-call policy
Treat sagas as retryable conversations, where retry-with-backoff, start-compensation, and escalate-history are state machine triggers
Treat idempotency keys as the lifeline of retries

This chapter brings that worldview to the most important interface in real systems:

The boundary between API and client.
“What world is burning behind this HTTP 500?”

We’ll design APIs and clients so they can share RML semantics and enforce correct behavior—rather than leaving everything to guesswork.

1) HTTP status codes don’t tell you which world you’re in

A typical API error looks like this:

HTTP/1.1 500 Internal Server Error
Content-Type: application/json

{ "message": "Internal Server Error" }

From the client’s perspective, this tells you almost nothing:

Was it invalid input (RML-1-ish)?
Was it a transient downstream failure (RML-2 retryable)?
Did we hit a history-bound rule (RML-3 escalation / no retries)?

In RML terms:

“Which world is this error from: RML-1 / RML-2 / RML-3?”
“What should I do next?”

This chapter’s goal:

Make API/client boundaries RML-aware
Let clients interpret Action Hints
Prevent the worst outcome: automation-by-guessing

2) Put RML in API responses: body vs headers vs status

Think of an API response as three layers:

HTTP status
HTTP headers
JSON body

Each has a natural role.

2.1 JSON body: the core error object

The body is what humans and clients read first.

A practical shape:

{
  "code": "PAYMENT_ALREADY_SETTLED",
  "message": "This payment is already settled and cannot be canceled.",
  "world": "RML3",
  "action": "escalate-history",
  "details": {
    "paymentId": "pay_123",
    "settledAt": "2025-12-01T10:23:45Z"
  }
}

code: stable branching key (logging, UI mapping, analytics)
message: human-facing message (UI/logs)
world / action: the RML metadata
details: domain-specific context
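One way to keep this shape honest on the client side is a small type plus a runtime guard. The sketch below assumes the field names from the example above; `RmlErrorBody` and `isRmlErrorBody` are illustrative names, not part of any library.

```typescript
// Illustrative names; the fields mirror the JSON example above.
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

interface RmlErrorBody {
  code: string;
  message: string;
  world?: World;
  action?: Action;
  details?: Record<string, unknown>;
}

const WORLDS: readonly string[] = ["RML1", "RML2", "RML3"];
const ACTIONS: readonly string[] = [
  "retry-local",
  "retry-with-backoff",
  "start-compensation",
  "escalate-history",
  "abort",
];

// Narrow an unknown JSON value to RmlErrorBody.
// Unknown world/action strings are rejected rather than guessed at.
function isRmlErrorBody(v: unknown): v is RmlErrorBody {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  if (typeof o.code !== "string" || typeof o.message !== "string") return false;
  if (o.world !== undefined && !(typeof o.world === "string" && WORLDS.indexOf(o.world) !== -1)) return false;
  if (o.action !== undefined && !(typeof o.action === "string" && ACTIONS.indexOf(o.action) !== -1)) return false;
  if (o.details !== undefined && (typeof o.details !== "object" || o.details === null)) return false;
  return true;
}
```

A body like `{ "message": "Internal Server Error" }` fails the guard because it has no `code`, which is exactly the case a client should treat as "no RML information available".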

Public APIs note: you may choose to expose fewer fields in the body (e.g., omit world/action) and put RML only in headers. Internal APIs can usually include everything.

2.2 HTTP headers: RML metadata + retry control

Headers are ideal for machine-readable policy and cross-cutting behavior:

HTTP/1.1 409 Conflict
Content-Type: application/json
X-RML-World: RML3
X-RML-Action: escalate-history
Retry-After: 0
X-Idempotency-Key: saga:12345:charge
X-Request-ID: req-abcde

X-RML-World: RML1 / RML2 / RML3
X-RML-Action:
- retry-local
- retry-with-backoff
- start-compensation
- escalate-history
- abort
Retry-After: suggested delay in seconds (meaningful for retry-with-backoff; clients should ignore it for non-retryable actions such as escalate-history)
X-Idempotency-Key: the retry lifeline (when relevant)
X-Request-ID: ties UI and support workflows to traces

Now clients can behave consistently:

RML3 + escalate-history → never auto-retry, show support path / ticket ID
RML2 + retry-with-backoff → retry using the same idempotency key
RML1 + abort → show validation feedback immediately
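Why "the same idempotency key" matters is easiest to see from the server side. The following is a minimal sketch, assuming an in-memory store and a made-up `IdempotentCharges` class; a real service would persist keys durably and scope them per caller.

```typescript
// Hypothetical in-memory store, for illustration only.
type ChargeResult = { chargeId: string };

class IdempotentCharges {
  // Object.create(null) avoids collisions with built-in keys such as "constructor".
  private results: Record<string, ChargeResult> = Object.create(null);
  private executions = 0;

  charge(idempotencyKey: string): ChargeResult {
    const previous = this.results[idempotencyKey];
    if (previous !== undefined) return previous; // retry: replay the stored result

    this.executions += 1; // first time: perform the side effect once
    const result: ChargeResult = { chargeId: "ch_" + this.executions };
    this.results[idempotencyKey] = result;
    return result;
  }

  executedCount(): number {
    return this.executions;
  }
}

const charges = new IdempotentCharges();
const first = charges.charge("saga:12345:charge");
const retried = charges.charge("saga:12345:charge"); // same key, e.g. after a 503
```

Here `first` and `retried` refer to the same stored result and the side effect runs once. A client that generated a fresh key per attempt would lose that protection.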

2.3 HTTP status: keep it coarse

Status codes are a coarse bucket. That’s fine.

A useful mapping (not universal, but practical):

| world | action | status (typical) | example |
| --- | --- | --- | --- |
| RML1 | abort | 400 / 422 | validation failure |
| RML2 | retry-with-backoff | 502 / 503 | transient downstream outage |
| RML2 | start-compensation | 409 / 500 | saga step failed mid-flow |
| RML3 | escalate-history | 409 / 422 / 403 | settled payment; forbidden reversal |

Don’t try to encode everything in status codes.
Encode the meaning in X-RML-World and X-RML-Action.
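On the server side, this can be as small as a lookup. The sketch below picks one status per (world, action) pair; the specific picks and the 500 fallback are assumptions for illustration, and `statusFor` is a hypothetical helper name.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

// One status per pair. Which of the "typical" statuses to use is a per-service choice.
const STATUS_BY_PAIR: Record<string, number> = {
  "RML1/abort": 422,
  "RML2/retry-with-backoff": 503,
  "RML2/start-compensation": 409,
  "RML3/escalate-history": 409,
};

// Returns a coarse status; the precise meaning travels in X-RML-World / X-RML-Action.
function statusFor(world: World, action: Action): number {
  const key = world + "/" + action;
  // Assumption: pairs outside the table fall back to a plain 500.
  return Object.prototype.hasOwnProperty.call(STATUS_BY_PAIR, key)
    ? STATUS_BY_PAIR[key]
    : 500;
}
```

Keeping the lookup this small is deliberate: the status only has to land in the right bucket, and clients branch on the headers.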

3) Internal API vs BFF vs Public API: different exposure levels

3.1 Internal microservice APIs

Assumption: callers understand RML.

So you can return full metadata:

{
  "code": "STOCK_RESERVE_FAILED",
  "message": "Failed to reserve stock.",
  "world": "RML2",
  "action": "start-compensation",
  "details": { "sku": "SKU-1234", "warehouse": "TOKYO-1" }
}

BFFs, orchestrators, and gateways can interpret this directly.

3.2 BFF (Backend for Frontend) as translator

The BFF sits between:

RML-aware backend world
UI that must be understandable to humans

Typical responsibilities:

Receive world/action from downstream
Translate to a UI-level error model:
- which field to highlight
- whether to auto-retry
- whether to show a “contact support” path
Optionally soften messaging for end users

Example translation:

Downstream returns:

{
  "code": "PAYMENT_ALREADY_SETTLED",
  "world": "RML3",
  "action": "escalate-history",
  "message": "Payment already settled; cannot cancel."
}

BFF outputs to UI:

End-user message: “This payment can’t be canceled here. Please contact support.”
Internal behavior: enqueue incident/case + attach requestId
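The translation step can be sketched as a pure function. The `UiError` shape, the `toUiError` name, and the message strings below are assumptions for illustration; the branching follows the world/action rules described in this chapter.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

interface DownstreamError {
  code: string;
  message: string;
  world: World;
  action: Action;
}

// Hypothetical UI-level error model.
interface UiError {
  userMessage: string;
  autoRetry: boolean;
  showSupportPath: boolean;
  openCase: boolean; // internal behavior: enqueue incident/case
  requestId: string;
}

function toUiError(err: DownstreamError, requestId: string): UiError {
  // History-bound: never retry, route to humans, open a case internally.
  if (err.world === "RML3" || err.action === "escalate-history") {
    return {
      userMessage: "This can't be completed here. Please contact support.",
      autoRetry: false,
      showSupportPath: true,
      openCase: true,
      requestId,
    };
  }
  // Transient: keep the user waiting while the client retries.
  if (err.world === "RML2" && (err.action === "retry-local" || err.action === "retry-with-backoff")) {
    return { userMessage: "Temporary issue. Retrying...", autoRetry: true, showSupportPath: false, openCase: false, requestId };
  }
  // Validation: the downstream message is the feedback the user needs.
  if (err.world === "RML1") {
    return { userMessage: err.message, autoRetry: false, showSupportPath: false, openCase: false, requestId };
  }
  // Anything else (e.g. start-compensation): soften the message, do not retry from the UI.
  return { userMessage: "Please try later.", autoRetry: false, showSupportPath: false, openCase: false, requestId };
}
```

Note that no RML label appears in `userMessage`; the labels decide behavior, and the text stays in the user's vocabulary.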

3.3 Public APIs

For external developers, you must decide what becomes part of the external contract.

A common approach:

Body: code/message/details only
Headers: X-RML-World, X-RML-Action, Retry-After, X-Request-ID

External clients can still follow simple rules:

RML2 + retry-with-backoff → retry safely
RML3 + escalate-history → do not retry; escalate to human flow
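A gateway or edge service can apply this split mechanically. Below is a sketch under the assumption that the internal error carries world/action in its body; `toPublicError` and the `PublicError` shape are hypothetical names.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

interface InternalError {
  code: string;
  message: string;
  world: World;
  action: Action;
  details?: Record<string, unknown>;
}

interface PublicError {
  headers: Record<string, string>;
  body: { code: string; message: string; details?: Record<string, unknown> };
}

// Move RML labels out of the body and into headers for the public contract.
function toPublicError(err: InternalError, requestId: string, retryAfterSec?: number): PublicError {
  const headers: Record<string, string> = {
    "X-RML-World": err.world,
    "X-RML-Action": err.action,
    "X-Request-ID": requestId,
  };
  // Retry-After is only attached when the action invites a delayed retry.
  if (err.action === "retry-with-backoff" && retryAfterSec !== undefined) {
    headers["Retry-After"] = String(retryAfterSec);
  }
  const body: PublicError["body"] = { code: err.code, message: err.message };
  if (err.details !== undefined) body.details = err.details;
  return { headers, body };
}
```

Whatever ends up in these headers is part of the external contract, so it is worth versioning and documenting like any other public field.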

4) The RML-aware client (SDK) pattern

If you expose X-RML-* but nobody reads it, nothing changes.

So the core move is:

Provide an RML-aware client wrapper and standardize its use.

4.1 Parse RML from headers

type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

type RmlMeta = { world?: World; action?: Action };

function asWorld(v: string | null): World | undefined {
  if (v === "RML1" || v === "RML2" || v === "RML3") return v;
  return undefined;
}

function asAction(v: string | null): Action | undefined {
  if (
    v === "retry-local" ||
    v === "retry-with-backoff" ||
    v === "start-compensation" ||
    v === "escalate-history" ||
    v === "abort"
  ) return v;
  return undefined;
}

function parseRml(res: Response): RmlMeta {
  return {
    world: asWorld(res.headers.get("X-RML-World")),
    action: asAction(res.headers.get("X-RML-Action")),
  };
}

4.2 Enforce Action Hints

function sleep(ms: number) {
  return new Promise<void>((r) => setTimeout(r, ms));
}

function backoffWithJitter(baseMs: number, attempt: number) {
  const capped = Math.min(baseMs * 2 ** attempt, 10_000);
  const jitter = Math.random() * (capped * 0.2); // ~ +0–20%
  return capped + jitter;
}

export async function fetchWithRmlHandling(
  input: RequestInfo,
  init?: RequestInit,
  opts?: { maxRetries?: number; baseDelayMs?: number }
): Promise<Response> {
  const maxRetries = opts?.maxRetries ?? 3;
  const baseDelayMs = opts?.baseDelayMs ?? 500;

  let attempt = 0;

  while (true) {
    const res = await fetch(input, init);
    if (res.ok) return res;

    const { world, action } = parseRml(res);

    // If no RML metadata: behave like a normal fetch client.
    if (!world || !action) return res;

    // Never auto-retry history-bound failures.
    if (action === "escalate-history" || world === "RML3") return res;

    if (action === "retry-local" || action === "retry-with-backoff") {
      if (attempt >= maxRetries) return res;
      attempt++;

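      // Retry-After may be delay-seconds or an HTTP-date; only the seconds form is handled here.
      // Anything else (NaN, negative, non-finite) falls back to computed backoff.
      const retryAfterHeader = res.headers.get("Retry-After");
      const retryAfterSec = retryAfterHeader ? Number(retryAfterHeader) : NaN;
      const retryAfterMs =
        Number.isFinite(retryAfterSec) && retryAfterSec > 0 ? retryAfterSec * 1000 : 0;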

      const delay =
        action === "retry-with-backoff"
          ? (retryAfterMs || backoffWithJitter(baseDelayMs, attempt))
          : 0;

      if (delay > 0) await sleep(delay);
      continue;
    }

    // For saga compensation triggers, return the response to caller (or hook your saga runner here).
    if (action === "start-compensation") return res;

    // Default: do not invent behavior.
    return res;
  }
}

Key point: action hints only matter if you enforce them via shared code.
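The wrapper above reads Retry-After as a number of seconds. The header may also carry an HTTP-date, so a shared client may want a parser that covers both forms. The following is a sketch; `parseRetryAfterMs` is a hypothetical helper, and taking `nowMs` as a parameter is a choice made here to keep it testable.

```typescript
// Returns a delay in milliseconds, or 0 when the header is absent or unusable.
function parseRetryAfterMs(header: string | null, nowMs: number): number {
  if (header === null) return 0;
  const value = header.trim();

  // Form 1: delay-seconds, a non-negative integer.
  if (/^\d+$/.test(value)) return Number(value) * 1000;

  // Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const at = Date.parse(value);
  if (isNaN(at)) return 0;

  // A date in the past means "no need to wait".
  return Math.max(0, at - nowMs);
}

// Example: 30 seconds before the date named in the header.
const now = Date.UTC(2015, 9, 21, 7, 27, 30);
const waitMs = parseRetryAfterMs("Wed, 21 Oct 2015 07:28:00 GMT", now);
```

A return value of 0 lets the caller fall back to its own backoff, which matches how the wrapper treats a missing header.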

5) UX by world: don’t show “RML3 error” to users

RML metadata is primarily for clients/ops—not end users.

A healthy UX mapping:

RML-1 (validation / local issues)
- highlight fields, inline errors
- “Please fix and resubmit”
RML-2 (transient dialog failures)
- “Temporary issue. Retrying…”
- keep spinner, auto-retry with backoff
- after N retries: “Please try later”
RML-3 (history-bound)
- “This can’t be completed here. Contact support.”
- show requestId / case ID
- quietly create an incident/case on the backend

6) Gateway governance: centralize retry policies and detect bad clients

If you emit RML headers, gateways can enforce org-level safety.

6.1 Centralized retry rules

If X-RML-World=RML2 and X-RML-Action=retry-with-backoff:
- gateway retries a limited number of times
If X-RML-World=RML3:
- gateway forbids retries
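Expressed as code, the gateway rule is a short decision function. This is a sketch only; `gatewayMayRetry` and its parameters are illustrative names, and real gateways would express the same rule in their own configuration language.

```typescript
// Decide whether the gateway itself may retry a failed upstream call.
// world/action come from the X-RML-World / X-RML-Action response headers.
function gatewayMayRetry(
  world: string | undefined,
  action: string | undefined,
  attemptsSoFar: number,
  maxGatewayRetries: number
): boolean {
  // History-bound failures are never retried, whatever the action says.
  if (world === "RML3") return false;

  // Transient failures get a small, bounded retry budget.
  if (world === "RML2" && action === "retry-with-backoff") {
    return attemptsSoFar < maxGatewayRetries;
  }

  // Missing or unrecognized labels: do not invent behavior.
  return false;
}
```

If both the gateway and the SDK retry, attempts multiply, so the two budgets are worth setting together.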

6.2 Detect clients ignoring Action Hints

If you see:

API responds RML3 + escalate-history
same client continues calling at high frequency

That’s a sign they ignore the contract. Gateways can:

rate limit
block
notify developers
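Detection can start as a sliding-window counter over escalation responses. The sketch below assumes the gateway can build a key from client identity plus operation (for example client ID and idempotency key); `HintViolationDetector`, the window, and the threshold are all hypothetical.

```typescript
class HintViolationDetector {
  // key -> timestamps (ms) of recent RML3 + escalate-history responses.
  // Keys are never evicted in this sketch; a real gateway would expire them.
  private seen: Record<string, number[]> = Object.create(null);
  private windowMs: number;
  private threshold: number;

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Call once per RML3 + escalate-history response sent to a client.
  // Returns true when that client has reached the threshold inside the window,
  // i.e. it keeps calling an operation it was told to escalate.
  recordEscalation(key: string, nowMs: number): boolean {
    const recent = (this.seen[key] || []).filter((t) => nowMs - t < this.windowMs);
    recent.push(nowMs);
    this.seen[key] = recent;
    return recent.length >= this.threshold;
  }
}

const detector = new HintViolationDetector(60_000, 3);
const a = detector.recordEscalation("client-1|cancel:pay_123", 0);
const b = detector.recordEscalation("client-1|cancel:pay_123", 1_000);
const c = detector.recordEscalation("client-1|cancel:pay_123", 2_000);
```

With a threshold of 3, `a` and `b` are `false` and `c` is `true`; what happens next (rate limit, block, notify) is a policy decision outside this sketch.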

7) GraphQL / gRPC: where to carry `world/action`

RML is not REST-specific.

7.1 GraphQL (use `extensions`)

{
  "data": null,
  "errors": [
    {
      "message": "Payment already settled; cannot cancel.",
      "path": ["cancelPayment"],
      "extensions": {
        "code": "PAYMENT_ALREADY_SETTLED",
        "world": "RML3",
        "action": "escalate-history",
        "requestId": "req-abcde"
      }
    }
  ]
}
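On the client, the same routing works once world/action are read from `extensions`. A minimal sketch follows, assuming the extension keys shown above; `rmlFromExtensions` and `mayAutoRetry` are hypothetical helpers, and the "every error must allow it" rule is a conservative assumption for responses that carry several errors.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

interface GraphQLErrorLike {
  message: string;
  extensions?: Record<string, unknown>;
}

const WORLDS: readonly string[] = ["RML1", "RML2", "RML3"];
const ACTIONS: readonly string[] = [
  "retry-local", "retry-with-backoff", "start-compensation", "escalate-history", "abort",
];

// Read world/action from extensions; unknown values become undefined.
function rmlFromExtensions(err: GraphQLErrorLike): { world?: World; action?: Action } {
  const ext: Record<string, unknown> = err.extensions || {};
  const w = ext["world"];
  const a = ext["action"];
  return {
    world: typeof w === "string" && WORLDS.indexOf(w) !== -1 ? (w as World) : undefined,
    action: typeof a === "string" && ACTIONS.indexOf(a) !== -1 ? (a as Action) : undefined,
  };
}

// Retry only if every error in the response explicitly allows it.
function mayAutoRetry(errors: GraphQLErrorLike[]): boolean {
  if (errors.length === 0) return false;
  return errors.every((e) => {
    const { world, action } = rmlFromExtensions(e);
    return world === "RML2" && (action === "retry-local" || action === "retry-with-backoff");
  });
}
```

For the response above, `mayAutoRetry` returns `false`, because the single error is RML3 + escalate-history.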

7.2 gRPC (metadata + rich status details)

Use:

gRPC status codes (FAILED_PRECONDITION, UNAVAILABLE, etc.)
metadata/trailers: x-rml-world, x-rml-action
optional structured error details message

The idea is the same: clients must be able to route behavior based on world/action.
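A server-side sketch of that pairing is shown below. The status names are standard gRPC codes, but which RML pair maps to which code is an assumption of this example, as are the `toGrpcError` helper and the plain-record representation of trailers; no gRPC library API is used here.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

// Assumed pairing of RML labels with gRPC status names.
const GRPC_STATUS_BY_PAIR: Record<string, string> = {
  "RML1/abort": "INVALID_ARGUMENT",
  "RML2/retry-with-backoff": "UNAVAILABLE",
  "RML3/escalate-history": "FAILED_PRECONDITION",
};

interface GrpcErrorSketch {
  status: string;                   // coarse bucket, like the HTTP status
  trailers: Record<string, string>; // lowercase keys, as listed above
}

function toGrpcError(world: World, action: Action): GrpcErrorSketch {
  const key = world + "/" + action;
  // Assumption: pairs outside the table fall back to UNKNOWN.
  const status = Object.prototype.hasOwnProperty.call(GRPC_STATUS_BY_PAIR, key)
    ? GRPC_STATUS_BY_PAIR[key]
    : "UNKNOWN";
  return {
    status,
    trailers: { "x-rml-world": world, "x-rml-action": action },
  };
}
```

As with HTTP, the status stays coarse and the trailers carry the meaning, so a client that sees `UNAVAILABLE` still checks `x-rml-action` before retrying.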

8) Anti-patterns

8.1 You defined `X-RML-*`, but nobody reads it

Result: you changed nothing. Fix it by:

providing an RML-aware SDK wrapper
enforcing “no raw fetch/http client” via linting or code review rules

8.2 “World” semantics drift across services

If one service uses RML3 to mean “non-retryable policy violation” and another uses RML3 to mean “temporary restriction,” clients become confused and unsafe.

Fix it by:

having a single org-level RML meaning table
documenting endpoint guarantees like “This endpoint never returns RML3”
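A meaning table is easier to enforce when it exists as data that contract tests can check. The sketch below takes its allowed pairs from the status mapping in section 2.3; `retry-local` is left out because that table does not place it in a world. The endpoint name and the `contractViolation` helper are hypothetical.

```typescript
type World = "RML1" | "RML2" | "RML3";
type Action =
  | "retry-local"
  | "retry-with-backoff"
  | "start-compensation"
  | "escalate-history"
  | "abort";

// Org-level table: which actions each world may carry. Extend per organization.
const ALLOWED_ACTIONS: Record<World, Action[]> = {
  RML1: ["abort"],
  RML2: ["retry-with-backoff", "start-compensation"],
  RML3: ["escalate-history"],
};

// Hypothetical endpoint guarantees, e.g. "This endpoint never returns RML3".
const NEVER_RETURNS: Record<string, World[]> = {
  "GET /stock": ["RML3"],
};

// Returns a description of the violation, or null when the response is within contract.
function contractViolation(endpoint: string, world: World, action: Action): string | null {
  if (ALLOWED_ACTIONS[world].indexOf(action) === -1) {
    return `${world} must not carry ${action}`;
  }
  const banned = Object.prototype.hasOwnProperty.call(NEVER_RETURNS, endpoint)
    ? NEVER_RETURNS[endpoint]
    : [];
  if (banned.indexOf(world) !== -1) {
    return `${endpoint} promised never to return ${world}`;
  }
  return null;
}
```

Running a check like this in each service's test suite catches drift before clients see it, for example a service that starts pairing RML3 with `retry-with-backoff`.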

8.3 Exposing RML jargon directly to end users

“RML3 error occurred” is meaningless to users.

Fix it by:

translating at the BFF/UI layer
showing meaningful messages + a support path + request ID

9) Checklist

API side

[ ] JSON body includes code, message, and optional details
[ ] Headers carry X-RML-World, X-RML-Action, Retry-After, X-Request-ID
[ ] RML3 + escalate-history returns a non-retryable status (often 409/422/403)
[ ] Idempotency is contractually supported where retries exist

Client / BFF side

[ ] Use an RML-aware shared client wrapper
[ ] Behavior branches on world/action (retry / compensate / escalate)
[ ] UX differs per world (validation vs retry vs support)
[ ] Public API docs explain RML headers clearly

Gateway / SRE side

[ ] Central retry policy exists for RML2 + retry-*
[ ] Retry forbidden for RML3
[ ] Detection exists for clients ignoring Action Hints

Closing — carrying the worldview across boundaries makes responsibility easier

The core takeaway:

HTTP status codes alone can’t express RML semantics
Add world/action in headers (and optionally body)
Build an RML-aware client wrapper so Action Hints are enforced
Translate RML semantics into UX, gateway policy, and incident workflows
Apply the same idea across REST, GraphQL, and gRPC

At the end of Part II, we now have the full “Dialog World” stack:

Chapter 5: Failure design + observability + governance
Chapter 6: Sagas and compensations as retryable conversations
Chapter 7: API/client boundary carrying the worldview

Next, Part III moves the focus fully into RML-3 (History World)—where distributed systems meet legal responsibility, money, and social trust.
