Gabriel Anhaia

Posted on May 24

Design a Feature Flag Service: 100k SDK Clients and the SSE Protocol Reframe

#systemdesign #interview #distributedsystems #scalability

"Design a feature flag service" sounds soft. It's the kind of prompt candidates think they can wing because they've used LaunchDarkly and Unleash. Then the interviewer asks the follow-up: "Your SDK ships in 100,000 production processes. How do they know when a flag changes?"

The naive answer hits the load balancer first. The right answer reframes the protocol before drawing a single box.

The interviewer's hidden question: how do you scale READS?

The shape of the system is asymmetric. Writes are rare: a product manager toggles a flag a few times a day. Reads are everywhere. Every request your fleet handles asks "is this flag on for this user?" at least once, sometimes dozens of times.

100,000 SDK clients is not your user count. It's your process count. If your app runs on 5,000 pods and each pod is a separate SDK instance, you're at 5,000. If your mobile app has a million daily actives and each device runs the SDK, you're at a million. The interviewer says "100k" to anchor you somewhere realistic for a mid-size SaaS. Push them on it. Ask whether the SDKs live in your backend fleet or on end-user devices. The protocol changes.

The real question under the prompt: given that flag config is small (kilobytes) and changes rarely, how do you make every SDK in the world see the new value in under a second without melting your origin?

If you start sketching MySQL and a GET /flags/:key endpoint, you've answered the wrong question. The interviewer wants you to notice that this is a read-distribution problem with a publish-subscribe shape, not a CRUD app.

Naive design: REST polling, and why it dies at 100k clients

The first instinct is GET /api/flags. The SDK polls every 30 seconds. Toggle latency is bounded by the poll interval. Done.

Walk the math on the whiteboard. 100,000 clients, 30-second poll, that's 3,333 requests per second on average. Sustainable. Except every SDK starts at process boot and processes boot in waves: a deploy of 5,000 pods finishes in 90 seconds and you've stacked thousands of polls on top of each other. Now your p99 latency on the flag endpoint spikes and your SDKs time out at startup, which means your app boots without flags, which means defaults everywhere, which means a silent incident.

Lower the poll interval to 5 seconds and you're at 20,000 req/s sustained. You can cache aggressively at the edge, but you've also turned a flag toggle into a 5-second worst-case propagation. Not interview-grade.

Raise the interval to 5 minutes and your incident-response story collapses. The SRE flips the kill switch and waits five minutes for the bad code path to stop firing. The PM who shipped the broken experiment is already on a call.

The pattern: polling forces a tradeoff between propagation latency and origin load that gets worse linearly with client count. There is no value of pollInterval that wins. The protocol itself is wrong.
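The arithmetic is easy to tabulate. This is a back-of-the-envelope sketch: `poll_tradeoff` is a hypothetical helper, and it assumes polls are spread evenly across the interval, which the deploy-wave scenario above shows is optimistic.

```python
def poll_tradeoff(clients: int, interval_s: float) -> dict:
    """Average origin load and worst-case toggle delay for fixed-interval polling."""
    return {
        "interval_s": interval_s,
        "avg_rps": clients / interval_s,           # load grows linearly with clients
        "worst_case_propagation_s": interval_s,    # a toggle can land just after a poll
    }


if __name__ == "__main__":
    for interval in (5, 30, 300):
        row = poll_tradeoff(100_000, interval)
        print(f"{row['interval_s']:>4}s poll -> {row['avg_rps']:>8,.0f} req/s, "
              f"up to {row['worst_case_propagation_s']}s to see a toggle")
```

Every row trades one bad number for another: shrink the interval and the request rate climbs, grow it and the kill switch gets slower.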

The reframe: SSE push from server, clients hold the connection

Server-Sent Events. One long-lived HTTP connection per SDK, opened at startup, held open by the server, written to only when a flag changes. The flag toggle becomes O(N) writes across N open sockets instead of O(N) polls per interval.

Why SSE over WebSocket: unidirectional fits the problem (server tells client, client never tells server), it's plain HTTP so most corporate proxies pass it through without an upgrade handshake, browsers and most HTTP libraries support it natively, and reconnect-with-Last-Event-ID is part of the spec. WebSocket is fine if you also need client-to-server messages, but for flag distribution you don't.

Here's the wire-level protocol an SDK should implement. Real SSE, not pseudo-code:

import json
import random
import time
from typing import Iterator

import requests

class FlagStream:
    def __init__(self, base_url: str, sdk_key: str, env: str):
        self.base_url = base_url
        self.sdk_key = sdk_key
        self.env = env
        self.last_event_id: str | None = None
        self.flags: dict[str, dict] = {}

    def stream(self) -> Iterator[dict]:
        # exponential backoff with jitter on disconnect; flags service is best-effort
        backoff = 1.0
        while True:
            try:
                headers = {
                    "Authorization": f"Bearer {self.sdk_key}",
                    "Accept": "text/event-stream",
                    "Cache-Control": "no-cache",
                }
                if self.last_event_id:
                    headers["Last-Event-ID"] = self.last_event_id

                url = f"{self.base_url}/sdk/stream?env={self.env}"
                # stream=True is the whole point: don't buffer the response.
                # The 45s read timeout is 3x the server's 15s keepalive, so a
                # half-open connection gets detected instead of hanging forever.
                with requests.get(url, headers=headers, stream=True,
                                  timeout=(5, 45)) as resp:
                    resp.raise_for_status()
                    resp.encoding = "utf-8"  # SSE streams are always UTF-8
                    backoff = 1.0  # reset on successful connect
                    yield from self._parse_events(resp)
            except (requests.RequestException, ConnectionError):
                # keep last-known flags; never block app startup on this
                pass
            # Sleep after errors and after clean server closes. The jitter keeps
            # a fleet that dropped together from reconnecting in lockstep.
            time.sleep(random.uniform(0, min(backoff, 30)))
            backoff = min(backoff * 2, 30)

    def _parse_events(self, resp) -> Iterator[dict]:
        event_type = "message"
        data_buffer = []
        event_id = None

        for raw in resp.iter_lines(decode_unicode=True):
            if raw == "":
                # blank line = dispatch the event
                if data_buffer:
                    payload = json.loads("\n".join(data_buffer))
                    if event_id:
                        self.last_event_id = event_id
                    yield {"type": event_type, "data": payload}
                event_type = "message"
                data_buffer = []
                event_id = None
                continue

            if raw.startswith(":"):
                continue  # SSE comment / keepalive
            field, _, value = raw.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the SSE spec strips exactly one leading space
            if field == "event":
                event_type = value
            elif field == "data":
                data_buffer.append(value)
            elif field == "id":
                event_id = value

The server side is symmetrically simple. On connect, send the full flag snapshot as a put event. On every subsequent change, send a patch event with just the diff.

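# server-side SSE handler (FastAPI / Starlette)
# flag_store and flag_pubsub are application-level objects assumed to exist
import asyncio
import json

from fastapi import FastAPI, Header, Request
from sse_starlette.sse import EventSourceResponse

app = FastAPI()

@app.get("/sdk/stream")
async def stream(
    request: Request,
    env: str,
    # Last-Event-ID arrives as a request header, not a query parameter
    last_event_id: str | None = Header(default=None),
):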
    async def event_generator():
        # 1) snapshot — bring the client to current state
        snapshot, version = await flag_store.get_snapshot(env)
        yield {
            "event": "put",
            "id": str(version),
            "data": json.dumps({"flags": snapshot, "version": version}),
        }

        # 2) live patches — pubsub fan-out from the write path
        async for change in flag_pubsub.subscribe(env, since=version):
            if await request.is_disconnected():
                break
            yield {
                "event": "patch",
                "id": str(change.version),
                "data": json.dumps(change.diff),
            }

    return EventSourceResponse(
        event_generator(),
        ping=15,  # send :keepalive every 15s for proxy timeouts
    )
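On the SDK side, the put and patch events have to be folded into the in-memory flag set. Here is a minimal sketch of that step. The patch payload shape (`version`, `set`, `delete`) is an assumption made for illustration, since the handler above only says a patch carries "just the diff", and `FlagState` and `apply_event` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FlagState:
    version: int = 0
    flags: dict[str, dict] = field(default_factory=dict)


def apply_event(state: FlagState, event: dict) -> bool:
    """Apply one parsed SSE event; return True if the state changed."""
    data = event["data"]
    if event["type"] == "put":
        # Full snapshot: replace everything, whatever we held before.
        state.flags = dict(data["flags"])
        state.version = data["version"]
        return True
    if event["type"] == "patch":
        # Skip patches at or below the version we already hold,
        # e.g. duplicates replayed after a reconnect.
        if data["version"] <= state.version:
            return False
        for key, cfg in data.get("set", {}).items():
            state.flags[key] = cfg
        for key in data.get("delete", []):
            state.flags.pop(key, None)
        state.version = data["version"]
        return True
    return False  # unknown event types are ignored
```

The version check is what makes reconnects safe: a replayed or out-of-order patch is a no-op instead of a regression.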

Capacity changes character. A well-tuned Linux box with a non-blocking server and a raised file-descriptor limit can hold 200k+ mostly idle SSE connections. Five such gateway nodes give 1,000,000 connection slots, roughly 10x headroom over the 100k fleet. The hard part stops being throughput and becomes connection lifecycle: graceful drain on deploy, half-open detection, idle-killer proxies in front of you.
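The sizing is simple enough to script. This is a sketch with hypothetical helper names. The per-node capacity is an input you'd measure on your own gateway build, and `spare_nodes` is an assumed allowance for drains and node loss.

```python
import math


def headroom_factor(clients: int, per_node_capacity: int, nodes: int) -> float:
    """How many times over the fleet fits into the available connection slots."""
    return nodes * per_node_capacity / clients


def gateway_nodes(clients: int, per_node_capacity: int, headroom: float,
                  spare_nodes: int = 1) -> int:
    """Nodes needed to hold clients * headroom connections, plus spares for drains."""
    needed = math.ceil(clients * headroom / per_node_capacity)
    return max(needed, 1) + spare_nodes


if __name__ == "__main__":
    print(headroom_factor(100_000, 200_000, 5))   # slots per connected client
    print(gateway_nodes(100_000, 50_000, 2.0))    # a more conservative per-node figure
```

Sizing for a drained or dead node matters more than the raw ratio: when one gateway goes away, all of its clients reconnect to the survivors at once.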

Edge caching: flag evaluations at the CDN edge for read-anywhere clients

For browser SDKs and mobile SDKs, even SSE is too chatty. Every cold start opens a connection, downloads the snapshot, then holds an idle socket. On a flaky mobile network you'd rather not.

The reframe again: push the evaluation to the edge. Flag config is small enough that it fits in a Cloudflare Worker, a Fastly Compute@Edge function, or a Lambda@Edge handler. The SDK calls one HTTP endpoint, the edge worker has the flag rules cached locally, and the answer comes back from the nearest PoP in roughly 30ms.

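// Cloudflare Worker: evaluates a single flag at the edge
// flag config is hydrated from KV (Cloudflare's edge KV store)
// stickyBucket() and matches() are assumed helpers, not shown here

export default {
  async fetch(req, env) {
    // the evaluation context travels in the body, so this must be a POST
    if (req.method !== "POST") {
      return new Response("Method Not Allowed", { status: 405 });
    }
    const url = new URL(req.url);
    const flagKey = url.pathname.split("/").pop();
    let ctx; // { userId, attrs }
    try {
      ctx = await req.json();
    } catch {
      return new Response("Bad Request", { status: 400 });
    }

    // a hot key is served from the local PoP; a cold read is slower
    const cfg = await env.FLAGS_KV.get(`flag:${flagKey}`, "json");
    if (!cfg) {
      return new Response(JSON.stringify({ value: null, reason: "UNKNOWN" }),
        { status: 404, headers: { "Content-Type": "application/json" } });
    }

    const result = evaluate(cfg, ctx);

    // Expose the bucket so a cache layer can key on (flagKey, bucket) rather
    // than userId and keep the key space small. POST responses are not cached
    // by default, so this response is marked no-store; bucket-keyed caching
    // needs an explicit cache key and is not shown here.
    const bucket = stickyBucket(ctx.userId, cfg.salt);
    const headers = new Headers({
      "Content-Type": "application/json",
      "Cache-Control": "no-store",
      "X-Flag-Bucket": String(bucket),
    });
    return new Response(JSON.stringify(result), { headers });
  },
};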

function evaluate(cfg, ctx) {
  // 1) kill switch — fast path
  if (!cfg.enabled) {
    return { value: cfg.offVariation, reason: "OFF" };
  }

  // 2) targeting rules — explicit user/segment overrides
  for (const rule of cfg.rules ?? []) {
    if (matches(rule.clauses, ctx.attrs)) {
      return { value: rule.variation, reason: "TARGET_MATCH" };
    }
  }

  // 3) percentage rollout — bucket the user, compare to threshold
  const bucket = stickyBucket(ctx.userId, cfg.salt);
  for (const variant of cfg.rollout ?? []) {
    if (bucket < variant.cumulativeWeight) {
      return { value: variant.value, reason: "ROLLOUT" };
    }
  }
  return { value: cfg.fallthrough, reason: "FALLTHROUGH" };
}

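Edge cache invalidation runs off the same pubsub channel the SSE gateway uses. When a flag changes, a consumer writes the new config to KV, and requests read it once the write has propagated. Propagation is dominated by KV replication time. Cloudflare KV is eventually consistent: a write is usually visible quickly in the location where it was made, but Cloudflare's documentation says it can take 60 seconds or more to reach other locations. Budget for that on this path, and keep sub-second kill switches on the SSE path.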

The gotcha: edge caching only works for flag values that don't depend on per-request secrets. If the flag rules reference ctx.attrs.email to do regex matching, you can't cache the response; the cache key would explode. Restrict edge evaluation to flags with bucket-based rollouts and named-segment matches; route attribute-heavy evaluations back to origin.
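That routing decision can be made once, at write time, by inspecting the flag's rules. The sketch below assumes a config shape where each rule has a list of clauses and each clause has an `op` field. Both the shape and the `segmentMatch` operator name are assumptions for illustration, not something the evaluator above defines.

```python
EDGE_SAFE_OPS = {"segmentMatch"}  # assumption: named-segment membership only


def is_edge_cacheable(cfg: dict) -> bool:
    """True if every targeting clause is a named-segment match.

    Percentage rollouts are always fine: they depend only on the bucket.
    """
    for rule in cfg.get("rules", []):
        for clause in rule.get("clauses", []):
            if clause.get("op") not in EDGE_SAFE_OPS:
                return False
    return True


if __name__ == "__main__":
    rollout_only = {"enabled": True, "rules": [], "rollout": [{"value": True}]}
    email_regex = {"rules": [{"clauses": [{"attribute": "email", "op": "regex"}]}]}
    print(is_edge_cacheable(rollout_only), is_edge_cacheable(email_regex))
```

The write path can stamp the result onto the flag so the edge worker only has to read a boolean before deciding whether to evaluate locally or forward to origin.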

Flag-config distribution: durable store, snapshot to object storage, push diffs over pubsub

Behind the gateway and the edge sits the source of truth. The write path is low-traffic: a dashboard call writes a new flag version to a relational store (Postgres, simple), bumps a monotonic version counter, and publishes a change event to Redis pubsub or NATS.

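The read path is everything. Gateway pods don't read Postgres on every SDK request; they hold the flag set in process memory and subscribe to the same pubsub channel. On boot, a gateway reads a snapshot from S3 (refreshed by a background job every 30s) so a cold restart of the entire fleet doesn't herd against Postgres. That snapshot can be up to 30 seconds stale, so the gateway then has to catch up from the snapshot's version to the current one. Plain Redis pub/sub doesn't replay missed messages, so the catch-up needs either a replayable channel (Redis Streams, NATS JetStream) or a one-off version check against the store.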
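Inside one gateway process, the in-memory flag set and the fan-out to open connections can be sketched like this. `Broadcaster` is a hypothetical class, the per-connection queue size is an arbitrary assumption, and the wiring to the real pubsub client and to the SSE handler is left out.

```python
import asyncio


class Broadcaster:
    """Holds the current flag set in memory and fans changes out to connection queues."""

    def __init__(self, snapshot: dict, version: int):
        self.flags = dict(snapshot)
        self.version = version
        self._subscribers: set[asyncio.Queue] = set()

    def connect(self) -> tuple[dict, int, asyncio.Queue]:
        # Each SSE connection gets the current snapshot plus its own bounded queue.
        queue: asyncio.Queue = asyncio.Queue(maxsize=100)
        self._subscribers.add(queue)
        return dict(self.flags), self.version, queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, key: str, cfg: dict, version: int) -> int:
        """Apply one change from the pubsub channel; return how many connections got it."""
        if version <= self.version:
            return 0  # stale or duplicate event
        self.flags[key] = cfg
        self.version = version
        delivered = 0
        for queue in list(self._subscribers):
            try:
                queue.put_nowait({"version": version, "set": {key: cfg}})
                delivered += 1
            except asyncio.QueueFull:
                # Slow consumer: drop it and let the SDK reconnect for a fresh snapshot.
                self._subscribers.discard(queue)
        return delivered
```

Bounded queues are the design choice worth defending: one stuck client can't grow memory without limit, and the cost of dropping it is a reconnect that starts from a full snapshot.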

S3 plus CloudFront is also the SDK-side fallback channel. If the SSE connection won't establish (corporate proxy strips the connection, mobile network is hostile, the gateway is down), the SDK falls back to a 60-second polled GET of flags-{env}-{version}.json from a public CloudFront URL. Slower, lossier, but the application boots with real flag values instead of compiled-in defaults.

The pattern to name in the interview: write to a durable store, snapshot to object storage for cold-start, push diffs over pubsub for hot fan-out, fall back to polled snapshots for hostile networks. One write path and three independent read paths, ranked by cost and latency.
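On the SDK, "ranked paths" boils down to trying sources in order at startup. Here is a minimal sketch. The source names and the snapshot shape (`version` plus `flags`) are assumptions, and the fetch functions are injected so the real stream and CDN clients stay out of the picture.

```python
from typing import Callable, Optional

Snapshot = dict  # assumed shape: {"version": int, "flags": {...}}


def load_initial_flags(
    sources: list[tuple[str, Callable[[], Optional[Snapshot]]]],
) -> tuple[str, Snapshot]:
    """Try each source in order; return (source_name, snapshot) for the first that works."""
    for name, fetch in sources:
        try:
            snapshot = fetch()
        except Exception:
            continue  # this path is unavailable; fall through to the next one
        if snapshot is not None:
            return name, snapshot
    # Nothing reachable: empty flag set, so callers get their own defaults.
    return "defaults", {"version": 0, "flags": {}}
```

A caller would pass something like `[("stream", open_stream_snapshot), ("cdn", fetch_cdn_snapshot)]`, where both function names are placeholders, and log which source won so that a fleet quietly living on the fallback path shows up in monitoring.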

SDK-side caching with TTL + fallback (offline-safe)

Every flag evaluation must answer in microseconds. The SDK keeps the full flag set in memory and serves evaluations locally. The SSE stream keeps the in-memory copy fresh.

When the connection drops, the SDK keeps serving the last-known values. That's the offline-safe property. No timeout, no fallback default unless the SDK has never received a snapshot. Add a lastSyncedAt field on the SDK that your monitoring scrapes; a process that's been disconnected from the flag service for 10 minutes is a real signal, but it shouldn't crash the request path.

Compiled-in defaults belong in the application code, not the SDK. The contract is: if the SDK has no value for this key, return the default that the calling code supplied. The application owner decides what "off" looks like for that specific flag, not the flag platform.
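The contract fits in a few lines. This sketch stores already-resolved values rather than rules to stay short, which a real SDK would not do, and `LocalFlags` with its method names is hypothetical. The `last_synced_at` attribute is the `lastSyncedAt` field mentioned above.

```python
import time
from typing import Any, Optional


class LocalFlags:
    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self.last_synced_at: Optional[float] = None  # None until the first snapshot

    def replace_all(self, values: dict[str, Any]) -> None:
        """Called by the stream handler whenever a snapshot or patch has been applied."""
        self._values = dict(values)
        self.last_synced_at = time.time()

    def variation(self, key: str, default: Any) -> Any:
        # Unknown key, or no snapshot yet: the caller's default wins.
        # A dropped connection changes nothing here; last-known values keep serving.
        return self._values.get(key, default)

    def seconds_since_sync(self) -> Optional[float]:
        """Staleness for monitoring to scrape; never consulted on the request path."""
        if self.last_synced_at is None:
            return None
        return time.time() - self.last_synced_at
```

Call sites then read as `flags.variation("new-checkout", default=False)`, with the default living next to the code it protects.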

Targeting and rules engine: boolean predicates, rollout percentages, sticky bucketing

Three primitives, in order: kill switch, targeting rules, percentage rollout. Already shown in the edge evaluator. Worth saying out loud in the interview because it answers "what's a flag actually evaluating?"

Sticky bucketing is the load-bearing piece. When you say "5% rollout", a given user must always land in the same bucket. Otherwise the user flips between treatment and control across requests, ruins your experiment, and corrupts your analytics.

# the bucketing hash — identical implementation on every SDK,
# every edge worker, every backend evaluator. one source of truth.
import hashlib

def sticky_bucket(user_id: str, flag_salt: str, total_buckets: int = 10_000) -> int:
    # SHA-1, not for security — for stable, uniform distribution across languages.
    # Every SDK ships the same impl. If you swap algorithms you re-bucket
    # every user mid-experiment, which is a silent data corruption bug.
    h = hashlib.sha1(f"{flag_salt}.{user_id}".encode("utf-8")).digest()
    # take 4 bytes, big-endian, mod the bucket count
    n = int.from_bytes(h[:4], "big")
    return n % total_buckets

Notice the salt. Without it, the same user lands in bucket 4,217 for every flag, so a user in the 5% rollout of flag A is also in the 5% rollout of flag B, C, and D. Correlation across experiments destroys your stats. The salt is usually the flag key itself plus a per-flag random string set at flag creation.

Why total_buckets=10_000: lets you express rollout in basis points (0.01% granularity), enough precision for ramp schedules and small canary groups.
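Both properties, stable ramps and decorrelated flags, can be checked empirically. The sketch below repeats the bucketing function so it runs standalone. The salts, the synthetic user IDs, and the `in_rollout` and `overlap` helpers are made up for illustration.

```python
import hashlib


def sticky_bucket(user_id: str, flag_salt: str, total_buckets: int = 10_000) -> int:
    h = hashlib.sha1(f"{flag_salt}.{user_id}".encode("utf-8")).digest()
    return int.from_bytes(h[:4], "big") % total_buckets


def in_rollout(user_id: str, flag_salt: str, basis_points: int) -> bool:
    """basis_points=500 means 5.00%: buckets 0..499 out of 10,000 are in."""
    return sticky_bucket(user_id, flag_salt) < basis_points


def overlap(salt_a: str, salt_b: str, basis_points: int, users: list[str]) -> float:
    """Of the users inside flag A's rollout, the fraction also inside flag B's."""
    in_a = [u for u in users if in_rollout(u, salt_a, basis_points)]
    in_both = [u for u in in_a if in_rollout(u, salt_b, basis_points)]
    return len(in_both) / len(in_a)


users = [f"user-{i}" for i in range(50_000)]

# Ramping 5% -> 10% only adds users; nobody already in treatment falls out.
at_5 = {u for u in users if in_rollout(u, "checkout-v2.k9", 500)}
at_10 = {u for u in users if in_rollout(u, "checkout-v2.k9", 1_000)}

if __name__ == "__main__":
    print("5% is a subset of 10%:", at_5 <= at_10)
    print("same salt overlap:     ", overlap("shared", "shared", 500, users))
    print("different salt overlap:", overlap("checkout-v2.k9", "search-v3.p4", 500, users))
```

With a shared salt the overlap is total, and with distinct salts it should land near the rollout percentage itself, which is what independent experiments need.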

The 90-second answer that wins the round

When the interviewer drops the prompt, talk for 90 seconds before drawing anything:

"Feature flags are a read-heavy, write-rare distribution problem. The naive GET /flags design fails at 100k SDK clients because polling forces a bad tradeoff between propagation latency and origin load. I'd reframe to a push protocol: SSE from a stateless gateway tier, with each SDK holding one long-lived connection that receives a full snapshot on connect and patches on every flag change. The gateways subscribe to a pubsub channel (Redis or NATS) that the write path publishes to. The source of truth is Postgres for writes and S3-plus-CDN for cold-start snapshots, so a fleet restart doesn't herd. For browser and mobile SDKs I'd push evaluation to the edge via CloudFlare Workers or Lambda@Edge, with flag config replicated to edge KV; that gives 30ms response from the nearest PoP. SDKs cache locally and evaluate in microseconds. Sticky bucketing uses a salted SHA-1 hash with the same implementation in every SDK, edge worker, and backend evaluator. The whole system is offline-safe because the SDK keeps serving last-known values when the stream drops."

That's the answer. Now they ask follow-ups and you draw boxes.

The gotcha: sticky bucketing requires a deterministic hash, and every SDK must agree

The single failure mode that sinks real flag platforms: hash drift across SDKs.

Your Node SDK uses MurmurHash3 because someone copy-pasted it from another open-source SDK. Your Go SDK uses FNV-1a because Go's stdlib has it. Your Python SDK uses MD5 because that's what the first engineer reached for. The same user gets bucketed three different ways. You roll out a flag to 10% and, if every user touches all three SDKs, about 27% of users see the new variant somewhere (1 - 0.9^3 ≈ 0.271), because each SDK picks its own independent 10%.
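The 27% figure is what you get when three SDKs each pick an independent 10%: 1 - 0.9^3 ≈ 0.271. A quick simulation shows the same thing. MurmurHash3 and FNV-1a aren't in Python's standard library, so this sketch uses three `hashlib` algorithms as stand-ins for "three SDKs that disagree". The flag salt and user IDs are synthetic.

```python
import hashlib


def bucket_with(algo: str, user_id: str, salt: str, total: int = 10_000) -> int:
    """Same bucketing recipe, but with a pluggable hash algorithm."""
    digest = hashlib.new(algo, f"{salt}.{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % total


def exposed_fraction(algos: list[str], basis_points: int, users: int = 50_000) -> float:
    """Fraction of users who get the variant from at least one SDK."""
    exposed = 0
    for i in range(users):
        uid = f"user-{i}"
        if any(bucket_with(a, uid, "new-checkout") < basis_points for a in algos):
            exposed += 1
    return exposed / users


if __name__ == "__main__":
    print("one shared hash:", exposed_fraction(["sha1"], 1_000))
    print("three hashes:   ", exposed_fraction(["sha1", "sha256", "blake2b"], 1_000))
    print("closed form:    ", 1 - 0.9 ** 3)
```

The single-hash run stays near the configured 10%, while the three-hash run drifts toward the closed-form value.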

Worse: the bug is invisible. Aggregate counts look right (10% of evaluations across the fleet return the new variant), but per-user consistency is gone. An experiment that should detect a 2% conversion lift sees noise. A canary that should affect 5% of traffic affects different 5%-slices in different SDKs.

The fix is governance, not code: one bucketing spec, written down, with test vectors. Every SDK ships a test_sticky_bucket.py (or _test.go, etc.) with at least 20 (user_id, flag_salt) -> expected_bucket pairs. CI fails if any SDK disagrees with the canonical vectors. When you change the algorithm, you bump a bucketingVersion field on every flag and run the old and new algorithms in parallel during the cutover.
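One way to set up the vector check, sketched under assumptions: the canonical implementation is the SHA-1 bucketing shown earlier, vectors are generated from it once and committed to a shared file, and each SDK's CI runs its own implementation against them. `make_vectors` and `mismatches` are hypothetical names, and no expected values are hard-coded here because they should come from the canonical implementation, not from a blog post.

```python
import hashlib
from typing import Callable

Bucketer = Callable[[str, str], int]


def canonical_bucket(user_id: str, flag_salt: str) -> int:
    h = hashlib.sha1(f"{flag_salt}.{user_id}".encode("utf-8")).digest()
    return int.from_bytes(h[:4], "big") % 10_000


def make_vectors(n: int = 20) -> list[dict]:
    """Generate (user_id, flag_salt) -> expected_bucket rows from the canonical impl."""
    rows = []
    for i in range(n):
        user_id, salt = f"user-{i}", f"flag-{i % 4}.s{i}"
        rows.append({"user_id": user_id, "flag_salt": salt,
                     "expected_bucket": canonical_bucket(user_id, salt)})
    return rows


def mismatches(impl: Bucketer, vectors: list[dict]) -> list[dict]:
    """Rows where an SDK disagrees with the spec; CI fails if this is non-empty."""
    return [v for v in vectors
            if impl(v["user_id"], v["flag_salt"]) != v["expected_bucket"]]
```

In practice the vectors would live as JSON in a shared repo, with awkward inputs added by hand (non-ASCII user IDs, empty strings, very long keys), since string encoding is where cross-language implementations tend to diverge.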

If the interviewer is sharp they'll ask about this. Bring it up unprompted and you've shown you've actually shipped one of these systems.

If this was useful

This pattern (protocol reframe, push beats poll, edge evaluation, deterministic bucketing) is one of fifteen full system designs walked end-to-end in System Design Pocket Guide: Interviews. The feature-flag design lives next to the rate-limiter, the URL shortener, and the notification system; each one structured around the 90-second answer and the follow-up questions that actually decide the round.

What's the gnarliest follow-up you've been asked on this kind of design? Drop it in the comments and I'll work through it.