Rizwan Saleem

Posted on Jun 4

Building a developer-friendly feature flag system: architecture, best practices, and a practical imp

#frontend #typescript #webdev

Building a developer-friendly feature flag system: architecture, best practices, and a practical implementation

Feature flags are a powerful tool for shipping faster, reducing risk, and enabling safer experimentation. A well-designed system lets engineers toggle features in production without redeploys, runs efficiently at scale, and remains operable under load. This guide walks you through designing and implementing a robust, developer-friendly feature flag system from the ground up, with concrete code examples, operational considerations, and a practical rollout plan.

1) Define the goals and scope

Decide what your feature flag system should achieve. Common goals:

Fine-grained control: flags at per-user, per-team, or per-request granularity.
Safe rollouts: gradual, percentage-based rollouts with quick rollback.
Observability: telemetry to verify impact and catch issues early.
Performance: minimal latency overhead, ideally sub-millisecond in the common path.
Ergonomics: simple APIs, meaningful defaults, and good IDE/editor support.
Auditability: who changed flags, when, and why.
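
To make the ergonomics goal concrete, here is a minimal sketch of a typed flag registry. The names `flagDefaults`, `overrides`, and `getFlag` are hypothetical and only illustrate the idea: declaring defaults in one place lets the compiler infer both the valid keys and each flag's value type.

```typescript
// Hypothetical registry: these keys and defaults are illustrative only.
const flagDefaults = {
  new_dashboard: false,
  pricing_experiment: 'control',
  max_upload_mb: 25,
};

type FlagDefaults = typeof flagDefaults;

// Overrides would normally come from the flag store; a plain object stands in here.
const overrides: Partial<FlagDefaults> = {
  new_dashboard: true,
};

// Returns the override when present, otherwise the declared default.
// The return type follows the key, so editors can autocomplete keys
// and a misspelled key fails at compile time.
function getFlag<K extends keyof FlagDefaults>(key: K): FlagDefaults[K] {
  const merged: FlagDefaults = { ...flagDefaults, ...overrides };
  return merged[key];
}

const dashboardOn: boolean = getFlag('new_dashboard'); // true (overridden)
const pricingVariant: string = getFlag('pricing_experiment'); // 'control' (default)
```

The trade-off is that flag keys are known at compile time, which suits flags owned by the codebase more than flags created ad hoc in an admin UI.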

Scope decisions to make upfront:

Where flags live (in-app, in a separate service, or both)?
How flags are evaluated (server-side, client-side, or hybrid)?
How to store flag definitions and values (config files, database, or specialized feature flag service)?
Safety nets (fallback values, circuit breakers, and rate limits).

2) Choose an architecture

A practical, scalable architecture balances simplicity with control:

Flag definitions service (central source of truth)
- Stores metadata: flag key, type, default value, allowed targets (roles, tenancy), rollout strategy, and auditing.
Flag evaluation service (or embedded library)
- Determines the value for a given user/context, applying rollout rules and targeting.
Telemetry and observability
- Emit events when flags are evaluated or toggled; collect metrics like activation rate, error rate, and latency.
Caching layer
- Cache evaluated values to avoid repeated lookups; use TTLs aligned with how often you expect changes.
Governance and rollout tooling
- UI or API for safe rollout, pause/kill switch, and an approval workflow for high-risk flags.
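
As a rough illustration of how the telemetry and kill-switch components could wrap evaluation, here is a small sketch. The `EvaluationEvent` shape and the `instrument` helper are assumptions made for this example, not part of any particular library.

```typescript
// Hypothetical event shape; a real system would ship this to a metrics pipeline.
interface EvaluationEvent {
  flagKey: string;
  value: unknown;
  durationMs: number;
  killed: boolean;
}

type Evaluate = (flagKey: string, userId: string) => unknown;

// Wraps an evaluate function so every call emits an event. When the global
// kill switch is on, the evaluator is skipped and the safe value is returned.
function instrument(
  evaluate: Evaluate,
  emit: (e: EvaluationEvent) => void,
  isKilled: () => boolean,
  safeValue: unknown = false
): Evaluate {
  return (flagKey, userId) => {
    const start = Date.now();
    const killed = isKilled();
    const value = killed ? safeValue : evaluate(flagKey, userId);
    emit({ flagKey, value, durationMs: Date.now() - start, killed });
    return value;
  };
}

// Usage with an in-memory sink and a toggleable kill switch.
const emitted: EvaluationEvent[] = [];
let killSwitch = false;
const evaluateWithTelemetry = instrument(
  () => true, // stand-in evaluator that enables everything
  (e) => { emitted.push(e); },
  () => killSwitch
);

const valueBeforeKill = evaluateWithTelemetry('new_dashboard', 'user-1'); // true
killSwitch = true;
const valueAfterKill = evaluateWithTelemetry('new_dashboard', 'user-1'); // false
```

Keeping the wrapper separate from the evaluator means the evaluation logic stays pure and easy to unit test.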

For many teams, a two-layer approach works well:

A flag definitions store (e.g., a lightweight database or config service).
An evaluator library embedded in the app, with a small caching layer and an optional remote refresh.

If you want a microservice approach, keep the evaluator stateless and rely on a fast, cacheable response to minimize risk.
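
The small caching layer mentioned above could look something like the following sketch. `TtlCache` is a hypothetical helper written for this article; the injectable clock is only there so the expiry behavior can be tested without waiting.

```typescript
// Minimal TTL cache sketch. The clock is injectable so tests can control time.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or calls `load` and caches the result
  // when the entry is missing or expired.
  get(key: string, load: () => V): V {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Call this from the admin path so changes take effect before the TTL lapses.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

// Usage with a fake clock and a counter that records trips to the source.
let fakeTime = 0;
let loads = 0;
const flagCache = new TtlCache<boolean>(60000, () => fakeTime);
const loadFlag = () => { loads++; return true; };

flagCache.get('new_dashboard', loadFlag); // miss: loads from the source
flagCache.get('new_dashboard', loadFlag); // hit: served from the cache
fakeTime = 60001;
flagCache.get('new_dashboard', loadFlag); // expired: loads again
```

Pairing a TTL with explicit invalidation gives you a bounded staleness window even if an invalidation message is lost.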

3) Define data models

A clean data model makes it easy to reason about flags and their rollout. Key entities:

Flag
- id/key: string
- type: boolean, multivariate (e.g., user segment), percentage
- defaultValue: depends on type
- description: string
- createdAt, updatedAt
Targeting
- segment: e.g., user_id, account_id, region, role
- weight or percentage: for percentage rollouts
- constraints: e.g., user consent, experimental group
Rollout
- strategy: boolean (on/off), percentage, multivariate (different values per segment)
- rolloutRate: 0-100
- seed: integer for deterministic bucketing
AuditLog
- who changed, when, oldValue -> newValue, rationale
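
To show how the AuditLog entity might be used, here is a sketch that records an entry every time a flag value changes. The field names and the `setFlagValue` helper are assumptions for illustration; a production system would persist entries rather than hold them in memory.

```typescript
// Hypothetical audit entry mirroring the AuditLog fields listed above.
interface AuditEntry {
  flagKey: string;
  changedBy: string;
  changedAt: string; // ISO 8601 timestamp
  oldValue: unknown;
  newValue: unknown;
  rationale: string;
}

const auditLog: AuditEntry[] = [];
const flagValues = new Map<string, unknown>();

// Applies a change and appends an audit entry in the same step, so a value
// cannot change without a record of who changed it and why.
function setFlagValue(
  flagKey: string,
  newValue: unknown,
  changedBy: string,
  rationale: string
): AuditEntry {
  if (rationale.trim() === '') {
    throw new Error('A rationale is required for flag changes');
  }
  const entry: AuditEntry = {
    flagKey,
    changedBy,
    changedAt: new Date().toISOString(),
    oldValue: flagValues.get(flagKey),
    newValue,
    rationale,
  };
  flagValues.set(flagKey, newValue);
  auditLog.push(entry);
  return entry;
}

setFlagValue('new_dashboard', false, 'alice', 'Initial definition');
setFlagValue('new_dashboard', true, 'bob', 'Enable for launch');
```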

Example JSON shape for a boolean flag with percentage rollout:
{
  "key": "new_dashboard",
  "type": "boolean",
  "defaultValue": false,
  "rollout": {
    "strategy": "percent",
    "percentage": 30,
    "seed": 12345
  },
  "targets": [
    {"segment": "region:eu", "value": true},
    {"segment": "segment:beta", "value": true}
  ],
  "description": "Gradual rollout of the new dashboard UI"
}
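
Because definitions like this are usually edited by hand or through an admin API, it can help to validate them before they reach the store. The following is a minimal sketch; `validateFlagDefinition` and its rules are assumptions chosen for illustration, not an exhaustive schema.

```typescript
// Returns a list of problems; an empty list means the definition looks sane.
// The rules below are illustrative assumptions, not a complete schema.
function validateFlagDefinition(raw: string): string[] {
  let def: any;
  try {
    def = JSON.parse(raw);
  } catch (e) {
    return ['not valid JSON'];
  }
  if (typeof def !== 'object' || def === null || Array.isArray(def)) {
    return ['definition must be a JSON object'];
  }

  const problems: string[] = [];
  if (typeof def.key !== 'string' || def.key === '') {
    problems.push('key must be a non-empty string');
  }
  if (!('defaultValue' in def)) {
    problems.push('defaultValue is required');
  }

  const rollout = def.rollout;
  if (rollout && rollout.strategy === 'percent') {
    const p = rollout.percentage;
    if (typeof p !== 'number' || p < 0 || p > 100) {
      problems.push('rollout.percentage must be a number between 0 and 100');
    }
    if (rollout.seed !== undefined && !Number.isInteger(rollout.seed)) {
      problems.push('rollout.seed must be an integer');
    }
  }

  const targets: any[] = Array.isArray(def.targets) ? def.targets : [];
  for (const t of targets) {
    if (!t || typeof t.segment !== 'string') {
      problems.push('each target needs a string segment');
    }
  }
  return problems;
}

const exampleDefinition = JSON.stringify({
  key: 'new_dashboard',
  type: 'boolean',
  defaultValue: false,
  rollout: { strategy: 'percent', percentage: 30, seed: 12345 },
  targets: [{ segment: 'region:eu', value: true }],
});

const noProblems = validateFlagDefinition(exampleDefinition); // []
const badPercentage = validateFlagDefinition(
  JSON.stringify({ key: 'x', defaultValue: false, rollout: { strategy: 'percent', percentage: 130 } })
); // one problem: percentage out of range
```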

4) Evaluation strategies

Common evaluation strategies:

Boolean flags
- Simple on/off, possibly with a per-user override.
Percentage rollout
- Use a deterministic hash of a user identifier to assign a bucket (0-99) and enable for buckets below the threshold.
Multivariate flags
- Assign different values (A/B tests, feature variations) based on hash buckets or segments.
Targeted flags
- Apply to specific users, accounts, regions, or roles.

Deterministic bucketing example (pseudo-logic):

bucket = hash(user_id + flag_seed) mod 100
enabled if bucket < rollout.percentage
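
Here is one way the pseudo-logic could be written in TypeScript. It is a sketch that assumes a 32-bit FNV-1a hash; any stable hash with a reasonable distribution would do. The helper names `fnv1a`, `bucketFor`, and `isEnabled` are made up for this example, and a `:` separator is added between the user ID and the seed so that different pairs cannot concatenate to the same string.

```typescript
// 32-bit FNV-1a over UTF-16 code units. Math.imul keeps the multiplication
// within 32-bit integer arithmetic.
function fnv1a(str: string): number {
  let h = 0x811c9dc5; // FNV offset basis (2166136261)
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime (16777619)
  }
  return h >>> 0; // convert to an unsigned 32-bit value
}

// Maps a user to a stable bucket in the range 0-99 for a given seed.
function bucketFor(userId: string, seed: number): number {
  return fnv1a(`${userId}:${seed}`) % 100;
}

// A user is enabled when their bucket falls below the rollout percentage.
function isEnabled(userId: string, seed: number, percentage: number): boolean {
  return bucketFor(userId, seed) < percentage;
}

const firstCheck = isEnabled('user-42', 12345, 30);
const secondCheck = isEnabled('user-42', 12345, 30); // always equals firstCheck
```

Using a per-flag seed means a user who lands in an early bucket for one flag is not automatically an early adopter for every other flag.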

For client-side flags, keep evaluation deterministic and respect privacy constraints. For server-side evaluation, you can leverage more context but still benefit from deterministic bucketing for reproducibility.
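
Deterministic bucketing also extends to multivariate flags. The sketch below assigns variants by weight instead of a single threshold. The `Variant` shape and `pickVariant` function are hypothetical, and the weights are assumed to sum to 100.

```typescript
// 32-bit FNV-1a hash, used here as one possible stable hash.
function fnv1a(str: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Hypothetical variant definition; weights are assumed to sum to 100.
interface Variant {
  name: string;
  weight: number;
}

// Walks the variants in order and returns the first one whose cumulative
// weight exceeds the user's bucket (0-99).
function pickVariant(userId: string, seed: number, variants: Variant[]): string {
  if (variants.length === 0) throw new Error('at least one variant is required');
  const bucket = fnv1a(`${userId}:${seed}`) % 100;
  let upperBound = 0;
  for (const v of variants) {
    upperBound += v.weight;
    if (bucket < upperBound) return v.name;
  }
  // Weights summed to less than 100: fall back to the last variant.
  return variants[variants.length - 1].name;
}

const pricingVariants: Variant[] = [
  { name: 'control', weight: 50 },
  { name: 'variantA', weight: 25 },
  { name: 'variantB', weight: 25 },
];

const assigned = pickVariant('user-42', 7, pricingVariants);
```

Because variants are resolved by cumulative weight in a fixed order, reordering the list reshuffles assignments, so treat the order as part of the flag definition.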

5) Implementation plan: a minimal, practical approach

We’ll implement a lightweight feature flag system in a Node.js/TypeScript environment, with:

A simple in-memory flag store (extendable to a database)
An evaluator library
A small client function to fetch flag values with caching
A basic REST endpoint for reading flag values (for demonstration)

You can adapt the patterns to your stack (Go, Python, Java, etc.).

Code: flag definitions (TypeScript)

Define types
Implement a FlagStore
Implement an Evaluator
Add caching and a refresh endpoint

Implementation outline (files and snippets):

1) types.ts
// Shared types for the flag store and evaluator. They are exported so that
// store.ts, evaluator.ts, and app.ts can import them.
export type FlagType = 'boolean' | 'percent' | 'multivariate';
export type Target = { segment: string; value: any };
export type Rollout = {
  strategy: 'boolean' | 'percent' | 'multivariate';
  percentage?: number;
  seed?: number;
  multivariateValues?: { [bucket: string]: any };
};

export interface Flag {
  key: string;
  type: FlagType;
  defaultValue: any;
  description?: string;
  rollout: Rollout;
  targets?: Target[];
  createdAt?: string;
  updatedAt?: string;
}

2) store.ts
import { Flag } from './types';

// In-memory flag store keyed by flag key; swap for a database-backed store later.
export class FlagStore {
  private flags: Map<string, Flag> = new Map();

  constructor(initial?: Flag[]) {
    if (initial) for (const f of initial) this.flags.set(f.key, f);
  }

  get(key: string): Flag | undefined {
    return this.flags.get(key);
  }

  set(flag: Flag): void {
    this.flags.set(flag.key, flag);
  }

  all(): Flag[] {
    return Array.from(this.flags.values());
  }
}

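3) evaluator.ts
import { Target } from './types';
import { FlagStore } from './store';

// 32-bit FNV-1a hash. Math.imul keeps the multiplication in 32-bit integer
// arithmetic; a plain `*` drifts into imprecise floating point.
function hashString(str: string): number {
  let h = 0x811c9dc5; // 2166136261, the FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // 16777619, the FNV prime
  }
  return h >>> 0; // unsigned 32-bit result
}

type Context = { userId?: string; [k: string]: any };

// A segment is either "attribute:value" (e.g. "region:eu") or a bare
// attribute name, which matches when that attribute is truthy in the context.
function matchesTarget(t: Target, context: Context): boolean {
  const sep = t.segment.indexOf(':');
  if (sep === -1) return Boolean(context[t.segment]);
  const actual = context[t.segment.slice(0, sep)];
  return actual !== undefined && String(actual) === t.segment.slice(sep + 1);
}

export class Evaluator {
  constructor(private store: FlagStore) {}

  evaluate(flagKey: string, context: Context): any {
    const flag = this.store.get(flagKey);
    if (!flag) return undefined;

    // Explicit targets win over rollout rules (simple example).
    for (const t of flag.targets ?? []) {
      if (matchesTarget(t, context)) return t.value ?? flag.defaultValue;
    }

    const seed = String(flag.rollout.seed ?? 0);
    switch (flag.rollout.strategy) {
      case 'boolean':
        // Naive on/off for the example: any non-zero percentage counts as on.
        return !!flag.rollout.percentage;
      case 'percent': {
        // Without a stable identifier there is nothing to bucket on.
        if (!context.userId) return flag.defaultValue;
        const bucket = hashString(String(context.userId) + seed) % 100;
        return bucket < (flag.rollout.percentage ?? 0);
      }
      case 'multivariate': {
        // Deterministic bucketing into variants; keys split the range evenly.
        const keys = Object.keys(flag.rollout.multivariateValues ?? {});
        if (keys.length === 0) return flag.defaultValue;
        const bucket = hashString(String(context.userId || '') + seed) % 100;
        const idx = Math.floor((bucket / 100) * keys.length);
        return flag.rollout.multivariateValues?.[keys[idx]];
      }
      default:
        return flag.defaultValue;
    }
  }
}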

4) example usage (app.ts)
import express from 'express';
import { Flag } from './types';
import { FlagStore } from './store';
import { Evaluator } from './evaluator';

const app = express();
const port = 3000;

const initialFlags: Flag[] = [
  {
    key: 'new_dashboard',
    type: 'boolean',
    defaultValue: false,
    description: 'Gradual rollout of the new dashboard',
    rollout: { strategy: 'percent', percentage: 25, seed: 42 },
  },
  {
    key: 'pricing_experiment',
    type: 'multivariate',
    defaultValue: 'control',
    description: 'Pricing experiment with variants A/B',
    rollout: {
      strategy: 'multivariate',
      seed: 7,
      multivariateValues: { control: 0.99, variantA: 1.29, variantB: 0.99 },
    },
  },
];

const store = new FlagStore(initialFlags);
const evaluator = new Evaluator(store);

app.get('/flag/:key', (req, res) => {
  const key = req.params.key;
  if (!store.get(key)) {
    res.status(404).json({ key, error: 'unknown flag' });
    return;
  }
  // Query values can also be arrays or objects; only accept a plain string.
  const userId = typeof req.query.userId === 'string' ? req.query.userId : undefined;
  const value = evaluator.evaluate(key, { userId });
  res.json({ key, value });
});

app.listen(port, () => {
  console.log(`Flag service listening at http://localhost:${port}`);
});

6) Integrating with your app

Server-side evaluation
- Call the evaluator in your request path where feature behavior changes. Cache results per request/context to avoid repeated computation.
Client-side evaluation
- Expose a small API endpoint or embed a lightweight library that fetches flag values and caches them locally. Be mindful of privacy and data usage.
Caching strategy
- Short TTLs (e.g., 5-15 minutes) for flags that may change often, longer TTLs for stable flags. Invalidate on admin changes.
Fallbacks
- Always have sensible defaults. If flag data is temporarily unavailable, use the flag’s defaultValue.

7) Observability and governance

Telemetry
- Track flag evaluations, latency, and outcomes. Correlate with feature usage metrics to measure impact.
Audit trails
- Log changes to flag definitions: who changed what, when, and why. Integrate with your SSO and an issue tracker.
Safety nets
- Pause switch: a global emergency flag to disable all risky features instantly.
- Rollback path: maintain old code paths that can be re-enabled quickly if a rollout goes awry.

8) Testing strategies

Unit tests
- Test evaluation logic with various rollout configurations and contexts.
Integration tests
- Validate end-to-end behavior across services; simulate admin changes and ensure clients reflect updates.
Chaos and resilience
- Simulate partial outages (flag store unreachable) and verify graceful fallbacks.
A/B testing guardrails
- Ensure randomization is deterministic per user, and that exposure is within planned bounds.
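
For the resilience case, it helps to have a fallback path you can actually exercise in a test. The sketch below simulates a flag store that becomes unreachable. `withFallback` and the `Lookup` signature are hypothetical; serving the last known value before the configured default is one possible design choice, not the only reasonable one.

```typescript
// Hypothetical lookup signature: returns the flag value, or throws when the
// backing store is unreachable.
type Lookup = (flagKey: string) => boolean;

// Wraps a lookup so failures degrade to the last known value, then to the
// configured default, and finally to `false` when nothing is known.
function withFallback(lookup: Lookup, defaults: { [key: string]: boolean }): Lookup {
  const lastKnown = new Map<string, boolean>();
  return (flagKey) => {
    try {
      const value = lookup(flagKey);
      lastKnown.set(flagKey, value);
      return value;
    } catch (e) {
      const remembered = lastKnown.get(flagKey);
      if (remembered !== undefined) return remembered;
      return defaults[flagKey] ?? false;
    }
  };
}

// Simulated outage: the store answers while `storeUp` is true, then fails.
let storeUp = true;
const flakyLookup: Lookup = (flagKey) => {
  if (!storeUp) throw new Error('flag store unreachable');
  return flagKey === 'new_dashboard';
};
const safeLookup = withFallback(flakyLookup, { beta_search: true });

const whileUp = safeLookup('new_dashboard');        // true, fetched from the store
storeUp = false;
const duringOutage = safeLookup('new_dashboard');   // true, last known value
const defaulted = safeLookup('beta_search');        // true, configured default
const unknownFlag = safeLookup('unknown_flag');     // false, nothing known
```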

Example test idea:

Given a flag with a 30% rollout, evaluation returns true exactly when the user's bucket is below 30; verify that repeated evaluations for the same set of userIds return the same results.
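
A sketch of that test idea follows, with no test framework so it stays self-contained. It assumes FNV-1a bucketing with a `:` separator; the helper names and the 10,000 synthetic user IDs are made up for the example. It computes three properties you would typically assert on: determinism, exposure near the configured percentage, and a monotonic ramp.

```typescript
// 32-bit FNV-1a hash, used here as one possible stable hash.
function fnv1a(str: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

function isEnabled(userId: string, seed: number, percentage: number): boolean {
  return fnv1a(`${userId}:${seed}`) % 100 < percentage;
}

// Synthetic population for the checks below.
const userIds: string[] = [];
for (let i = 0; i < 10000; i++) userIds.push(`user-${i}`);

const firstRun = userIds.map((id) => isEnabled(id, 12345, 30));
const secondRun = userIds.map((id) => isEnabled(id, 12345, 30));

// Determinism: the same user and seed always give the same answer.
const deterministic = firstRun.every((v, i) => v === secondRun[i]);

// Exposure: the enabled share should sit near the configured 30%.
const exposure = firstRun.filter(Boolean).length / userIds.length;

// Monotonic ramp: anyone enabled at 5% is still enabled at 30%.
const monotonic = userIds.every(
  (id) => !isEnabled(id, 12345, 5) || isEnabled(id, 12345, 30)
);
```

In a real suite you would assert that `deterministic` and `monotonic` are true and that `exposure` falls within a tolerance you choose, for example a few percentage points around 0.30.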

9) Rollout plan
Start small
- Roll out one low-risk flag to 5% of users, monitor metrics, and collect feedback.
Increase gradually
- Expand to 20%, then 50%, while watching latency and error rates.
Operational readiness
- Automate flag changes via CI/CD hooks and provide a rollback UI for non-engineers.
Documentation
- Create a developer portal with flag keys, allowed targeting, and recommended defaults.

10) Best practices and gotchas

Prefer deterministic bucketing over random per call to ensure reproducibility.
Keep flags independent; avoid deep nesting of rollout logic that becomes hard to reason about.
Document every flag with purpose, owners, and expected impact.
Respect user privacy; don’t rely on sensitive data for targeting unless necessary and compliant.
Plan for deprecation: flag lifecycle management to remove stale flags safely.
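
Deprecation is easier to act on when stale flags are surfaced automatically. Here is a minimal sketch that reports flags which have been fully on or fully off for longer than a threshold. The `FlagMeta` shape, the `findStaleFlags` helper, and the 30-day threshold are all assumptions for illustration.

```typescript
// Hypothetical metadata kept per flag for lifecycle reporting.
interface FlagMeta {
  key: string;
  owner: string;
  percentage: number; // current rollout percentage, 0-100
  updatedAt: string;  // ISO 8601 timestamp of the last change
}

// A flag is treated as stale when it has been fully on or fully off for
// longer than `maxAgeDays`. Tune the threshold to your release cadence.
function findStaleFlags(flags: FlagMeta[], asOf: Date, maxAgeDays: number): FlagMeta[] {
  const maxAgeMs = maxAgeDays * 24 * 60 * 60 * 1000;
  return flags.filter((f) => {
    const settled = f.percentage === 0 || f.percentage === 100;
    const ageMs = asOf.getTime() - new Date(f.updatedAt).getTime();
    return settled && ageMs > maxAgeMs;
  });
}

const inventory: FlagMeta[] = [
  { key: 'new_dashboard', owner: 'web-team', percentage: 100, updatedAt: '2024-01-01T00:00:00Z' },
  { key: 'pricing_experiment', owner: 'growth', percentage: 50, updatedAt: '2024-01-01T00:00:00Z' },
  { key: 'beta_search', owner: 'search', percentage: 0, updatedAt: '2024-05-20T00:00:00Z' },
];

// Only new_dashboard qualifies: pricing_experiment is still mid-rollout and
// beta_search changed 15 days before the reference date.
const stale = findStaleFlags(inventory, new Date('2024-06-04T00:00:00Z'), 30);
```

Recording an owner with each flag gives the report somewhere to send the cleanup request.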

Illustration: a simple mental model

Think of a flag as a switchboard with layered rules. Each time a feature path is chosen, the system consults:
- Is there a direct targeting rule for this user/context?
- If not, does the rollout bucket enable this user?
- If still unclear, fall back to the default behavior.

This layered approach helps keep behavior predictable and auditable.
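
One way to make the layers visible is to return the reason alongside the value. The sketch below is deliberately simplified: `SimpleFlag`, `Decision`, and `decide` are hypothetical names, targeting is reduced to a list of user IDs, and the bucket is passed in rather than computed.

```typescript
type Reason = 'target' | 'rollout' | 'default';

interface Decision {
  value: boolean;
  reason: Reason;
}

// Simplified flag shape for this illustration only.
interface SimpleFlag {
  defaultValue: boolean;
  targetedUserIds: string[]; // direct targeting rule
  rolloutPercentage: number; // 0-100
}

// Resolves a flag in three layers and records which layer decided.
function decide(flag: SimpleFlag, userId: string | undefined, bucket: number | undefined): Decision {
  // Layer 1: direct targeting for this user.
  if (userId !== undefined && flag.targetedUserIds.indexOf(userId) !== -1) {
    return { value: true, reason: 'target' };
  }
  // Layer 2: rollout bucket below the configured percentage.
  if (bucket !== undefined && bucket < flag.rolloutPercentage) {
    return { value: true, reason: 'rollout' };
  }
  // Layer 3: nothing matched, so use the default.
  return { value: flag.defaultValue, reason: 'default' };
}

const dashboardFlag: SimpleFlag = {
  defaultValue: false,
  targetedUserIds: ['qa-user'],
  rolloutPercentage: 30,
};

const targeted = decide(dashboardFlag, 'qa-user', 95);    // { value: true, reason: 'target' }
const rolledOut = decide(dashboardFlag, 'user-1', 12);    // { value: true, reason: 'rollout' }
const fallback = decide(dashboardFlag, undefined, undefined); // { value: false, reason: 'default' }
```

Logging the reason with each evaluation makes it much easier to answer "why did this user see the feature?" after the fact.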

Rizwan Saleem | https://rizwansaleem.co