DEV Community

Rizwan Saleem

Building a developer-friendly feature flag system: architecture, best practices, and a practical implementation

Feature flags are a powerful tool for shipping faster, reducing risk, and enabling safer experimentation. A well-designed system lets engineers toggle features in production without redeploys, runs efficiently at scale, and remains operable under load. This guide walks you through designing and implementing a robust, developer-friendly feature flag system from the ground up, with concrete code examples, operational considerations, and a practical rollout plan.

1) Define the goals and scope

Decide what your feature flag system should achieve. Common goals:

  • Fine-grained control: flags at per-user, per-team, or per-request granularity.
  • Safe rollouts: gradual, percentage-based rollouts with quick rollback.
  • Observability: telemetry to verify impact and catch issues early.
  • Performance: minimal latency overhead, ideally sub-millisecond in the common path.
  • Ergonomics: simple APIs, meaningful defaults, and good IDE/editor support.
  • Auditability: who changed flags, when, and why.

Scope decisions to make upfront:

  • Where flags live (in-app, in a separate service, or both)?
  • How flags are evaluated (server-side, client-side, or hybrid)?
  • How to store flag definitions and values (config files, database, or specialized feature flag service)?
  • Safety nets (fallback values, circuit breakers, and rate limits).

2) Choose an architecture

A practical, scalable architecture balances simplicity with control:

  • Flag definitions service (central source of truth)
    • Stores metadata: flag key, type, default value, allowed targets (roles, tenancy), rollout strategy, and auditing.
  • Flag evaluation service (or embedded evaluator library)
    • Determines the value for a given user/context, applying rollout rules and targeting.
  • Telemetry and observability
    • Emit events when flags are evaluated or toggled; collect metrics like activation rate, error rate, and latency.
  • Caching layer
    • Cache evaluated values to avoid repeated lookups; use TTLs aligned with how often you expect changes.
  • Governance and rollout tooling
    • UI or API for safe rollout, pause/kill switch, and an approval workflow for high-risk flags.
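Evaluation, the kill switch, and telemetry all meet at the call site where a flag is read. A minimal sketch of that seam, assuming a hypothetical `RuleEvaluator` function for the targeting and rollout rules and a hypothetical `onEvaluation` callback as the telemetry hook (neither is a specific library's API):

```typescript
// Hypothetical types for illustration; not a specific library's API.
type Context = { userId?: string; [attr: string]: unknown };
type EvaluationEvent = { flagKey: string; value: unknown; killed: boolean; durationMs: number };
type RuleEvaluator = (flagKey: string, ctx: Context) => unknown;

class GuardedEvaluator {
  private killSwitchOn = false;

  constructor(
    private readonly evaluate: RuleEvaluator,
    private readonly onEvaluation: (event: EvaluationEvent) => void,
  ) {}

  // Governance tooling (UI or API) flips this to force every flag to its fallback.
  setKillSwitch(on: boolean): void {
    this.killSwitchOn = on;
  }

  get(flagKey: string, ctx: Context, fallback: unknown): unknown {
    const start = Date.now();
    const killed = this.killSwitchOn;
    // Kill switch wins; otherwise use the rules, and the fallback for unknown flags.
    const value = killed ? fallback : (this.evaluate(flagKey, ctx) ?? fallback);
    // One telemetry event per evaluation feeds activation-rate and latency metrics.
    this.onEvaluation({ flagKey, value, killed, durationMs: Date.now() - start });
    return value;
  }
}
```

Keeping the kill switch in the same wrapper as telemetry means every emergency pause shows up in the evaluation events (`killed: true`), which makes it easy to confirm the switch actually took effect.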

For many teams, a two-layer approach works well:

  • A flag definitions store (e.g., a lightweight database or config service).
  • An evaluator library embedded in the app, with a small caching layer and an optional remote refresh.
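In the two-layer setup, the hot path reads a local snapshot and a refresh replaces it from the definitions store. A minimal sketch, assuming a hypothetical `fetchDefinitions` function (an HTTP call, database query, or config read) and a deliberately reduced `FlagDefinition` shape:

```typescript
// Reduced, hypothetical definition shape; a real one would be the full Flag type.
type FlagDefinition = { key: string; defaultValue: unknown };

// Stand-in for the definitions store: HTTP call, database query, or config read.
type FetchDefinitions = () => Promise<FlagDefinition[]>;

class FlagSnapshot {
  private flags = new Map<string, FlagDefinition>();

  constructor(private readonly fetchDefinitions: FetchDefinitions) {}

  // Swap in a complete new snapshot; on failure keep serving the last good one.
  async refresh(): Promise<boolean> {
    try {
      const defs = await this.fetchDefinitions();
      this.flags = new Map(defs.map((d): [string, FlagDefinition] => [d.key, d]));
      return true;
    } catch {
      return false;
    }
  }

  // Hot path: synchronous in-memory lookup, no network round trip.
  get(key: string): FlagDefinition | undefined {
    return this.flags.get(key);
  }
}
```

Calling `refresh()` on a timer gives the optional remote refresh. Because a failed refresh keeps the previous snapshot, an outage of the definitions store degrades to slightly stale flags rather than failed requests.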

If you want a microservice approach, keep the evaluator stateless and make its responses fast and cacheable, so a slow or failed call degrades to cached or default values instead of stalling requests.

3) Define data models

A clean data model makes it easy to reason about flags and their rollout. Key entities:

  • Flag
    • id/key: string
    • type: boolean, percent, or multivariate (one of several values, e.g., a variant per segment)
    • defaultValue: depends on type
    • description: string
    • createdAt, updatedAt
  • Targeting
    • segment: e.g., user_id, account_id, region, role
    • weight or percentage: for percentage rollouts
    • constraints: e.g., user consent, experimental group
  • Rollout
    • strategy: boolean (on/off), percentage, multivariate (different values per segment)
    • percentage: 0-100
    • seed: integer for deterministic bucketing
  • AuditLog
    • who changed, when, oldValue -> newValue, rationale
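Routing every write through a single method is a simple way to guarantee the audit trail is complete. A sketch under that assumption; `AuditedFlagValues` is a hypothetical in-memory class, and a real system would persist the log and take `who` from your SSO identity:

```typescript
// Mirrors the AuditLog entity: who, when, oldValue -> newValue, rationale.
type AuditEntry = {
  flagKey: string;
  who: string;
  when: string; // ISO-8601 timestamp
  oldValue: unknown;
  newValue: unknown;
  rationale: string;
};

class AuditedFlagValues {
  private values = new Map<string, unknown>();
  readonly log: AuditEntry[] = [];

  // The only write path, so no change can skip the audit trail.
  set(flagKey: string, newValue: unknown, who: string, rationale: string, now: Date = new Date()): void {
    // Illustrative policy: refuse changes that don't say why.
    if (!rationale.trim()) throw new Error('a rationale is required for every flag change');
    const oldValue = this.values.get(flagKey);
    this.values.set(flagKey, newValue);
    this.log.push({ flagKey, who, when: now.toISOString(), oldValue, newValue, rationale });
  }

  get(flagKey: string): unknown {
    return this.values.get(flagKey);
  }
}
```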

Example JSON shape for a boolean flag with percentage rollout:
```json
{
  "key": "new_dashboard",
  "type": "boolean",
  "defaultValue": false,
  "rollout": {
    "strategy": "percent",
    "percentage": 30,
    "seed": 12345
  },
  "targets": [
    { "segment": "region:eu", "value": true },
    { "segment": "segment:beta", "value": true }
  ],
  "description": "Gradual rollout of the new dashboard UI"
}
```
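Definitions like this usually arrive as JSON from an admin API or a config file, so it pays to validate them before they reach the evaluator. A minimal sketch; `validateFlag` is hypothetical, and the rules it enforces (snake_case keys, a 0-100 percentage, an integer seed, a boolean default for boolean flags) are illustrative choices:

```typescript
// Loosely typed input: everything is checked rather than trusted.
type RawFlag = {
  key?: unknown;
  type?: unknown;
  defaultValue?: unknown;
  rollout?: { strategy?: unknown; percentage?: unknown; seed?: unknown };
};

// Returns a list of problems; an empty list means the definition is usable.
function validateFlag(raw: RawFlag): string[] {
  const errors: string[] = [];
  if (typeof raw.key !== 'string' || !/^[a-z0-9_]+$/.test(raw.key)) {
    errors.push('key must be a lowercase snake_case string');
  }
  if (!['boolean', 'percent', 'multivariate'].includes(String(raw.type))) {
    errors.push('type must be boolean, percent, or multivariate');
  }
  const r = raw.rollout;
  if (!r) {
    errors.push('rollout is required');
  } else if (r.strategy === 'percent') {
    const p = r.percentage;
    if (typeof p !== 'number' || p < 0 || p > 100) errors.push('rollout.percentage must be a number in 0-100');
    if (r.seed !== undefined && (typeof r.seed !== 'number' || !Number.isInteger(r.seed))) {
      errors.push('rollout.seed must be an integer');
    }
  }
  if (raw.type === 'boolean' && typeof raw.defaultValue !== 'boolean') {
    errors.push('defaultValue must be a boolean for boolean flags');
  }
  return errors;
}
```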

4) Evaluation strategies

Common evaluation strategies:

  • Boolean flags
    • Simple on/off, possibly with a per-user override.
  • Percentage rollout
    • Use a deterministic hash of a user identifier to assign a bucket (0-99) and enable for buckets below the threshold.
  • Multivariate flags
    • Assign different values (A/B tests, feature variations) based on hash buckets or segments.
  • Targeted flags
    • Apply to specific users, accounts, regions, or roles.
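The evaluator in this guide splits buckets evenly across variants. When variants need unequal exposure (the "weight or percentage" field in the targeting model), cumulative weights are a common approach. A sketch, assuming the 0-99 bucket has already been computed by deterministic hashing; `WeightedVariant` and `pickVariant` are hypothetical names:

```typescript
// Hypothetical weighted-variant type; weights are percentages summing to 100.
type WeightedVariant = { variant: string; weight: number };

// Walk the cumulative weights until the bucket falls inside a slice.
// With weights [50, 25, 25]: buckets 0-49 -> first, 50-74 -> second, 75-99 -> third.
function pickVariant(bucket: number, variants: WeightedVariant[]): string | undefined {
  let cumulative = 0;
  for (const { variant, weight } of variants) {
    cumulative += weight;
    if (bucket < cumulative) return variant;
  }
  // Weights summing to less than 100 leave some buckets unassigned;
  // the caller should serve the flag's defaultValue to those users.
  return undefined;
}
```

Changing weights mid-experiment moves the slice boundaries, so users near a boundary switch variants; weights are best fixed for the duration of an experiment.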

Deterministic bucketing example (pseudo-logic):

  • bucket = hash(user_id + flag_seed) mod 100
  • enabled if bucket < rollout.percentage
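A concrete version of this pseudo-logic, using 32-bit FNV-1a (the constants the evaluator in section 5 uses) as the hash. The `seed:userId` input format is an assumption: because the seed is a number with no `:`, different (seed, userId) pairs can't concatenate to the same string.

```typescript
// 32-bit FNV-1a over UTF-16 code units. Math.imul keeps the multiply in
// 32-bit integer arithmetic; `>>> 0` makes the result unsigned.
function hashString(str: string): number {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime (16777619)
  }
  return h >>> 0;
}

// bucket = hash(seed, userId) mod 100, always in [0, 100).
function bucket(userId: string, seed: number): number {
  return hashString(`${seed}:${userId}`) % 100;
}

// enabled if bucket < rollout percentage
function isEnabled(userId: string, seed: number, percentage: number): boolean {
  return bucket(userId, seed) < percentage;
}
```

A useful property of this scheme: raising the percentage only adds users. Anyone whose bucket is below 30 is also below 50, so nobody flips back off as a rollout grows.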

For client-side flags, keep evaluation deterministic and respect privacy constraints. For server-side evaluation, you can leverage more context but still benefit from deterministic bucketing for reproducibility.

5) Implementation plan: a minimal, practical approach

We’ll implement a lightweight feature flag system in a Node.js/TypeScript environment, with:

  • A simple in-memory flag store (extendable to a database)
  • An evaluator library
  • A small client function to fetch flag values with caching
  • A basic REST endpoint for reading flag values (for demonstration)

You can adapt the patterns to your stack (Go, Python, Java, etc.).

Code: flag definitions (TypeScript)

  • Define types
  • Implement a FlagStore
  • Implement an Evaluator
  • Add caching and a refresh endpoint

Implementation outline (files and snippets):

1) types.ts
```typescript
// types.ts
export type FlagType = 'boolean' | 'percent' | 'multivariate';

// segment uses "attribute:value" form, e.g. "region:eu"
export type Target = { segment: string; value: any };

export type Rollout = {
  strategy: 'boolean' | 'percent' | 'multivariate';
  percentage?: number; // 0-100
  seed?: number; // salt for deterministic bucketing
  multivariateValues?: Record<string, any>; // variant name -> value
};

export interface Flag {
  key: string;
  type: FlagType;
  defaultValue: any;
  description?: string;
  rollout: Rollout;
  targets?: Target[];
  createdAt?: string; // ISO-8601
  updatedAt?: string;
}
```

2) store.ts
```typescript
// store.ts
import { Flag } from './types';

// In-memory store keyed by flag key; swap for a database-backed version later.
export class FlagStore {
  private flags: Map<string, Flag> = new Map();

  constructor(initial?: Flag[]) {
    if (initial) for (const f of initial) this.flags.set(f.key, f);
  }

  get(key: string): Flag | undefined {
    return this.flags.get(key);
  }

  set(flag: Flag): void {
    this.flags.set(flag.key, flag);
  }

  all(): Flag[] {
    return Array.from(this.flags.values());
  }
}
```

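3) evaluator.ts
```typescript
// evaluator.ts
import { Flag } from './types';
import { FlagStore } from './store';

export type EvalContext = { userId?: string; [attr: string]: any };

// 32-bit FNV-1a. Math.imul keeps the multiply in 32-bit integer arithmetic;
// a plain `*` overflows into floating point and loses the low bits.
function hashString(str: string): number {
  let h = 0x811c9dc5; // offset basis (2166136261)
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619); // FNV prime
  }
  return h >>> 0; // unsigned result
}

// Deterministic bucket in [0, 100); the ':' keeps seed and userId apart.
function bucketFor(userId: string, seed: number): number {
  return hashString(`${seed}:${userId}`) % 100;
}

export class Evaluator {
  constructor(private store: FlagStore) {}

  evaluate(flagKey: string, context: EvalContext): any {
    const flag: Flag | undefined = this.store.get(flagKey);
    if (!flag) return undefined; // unknown flag: the caller picks a fallback

    // Explicit targets win. "region:eu" matches context.region === 'eu';
    // a segment without ':' matches when that context field is truthy.
    for (const t of flag.targets ?? []) {
      const [attr, expected] = t.segment.split(':');
      const actual = context[attr];
      const matches =
        expected === undefined ? Boolean(actual) : actual !== undefined && String(actual) === expected;
      if (matches) return t.value ?? flag.defaultValue;
    }

    const seed = flag.rollout.seed ?? 0;
    switch (flag.rollout.strategy) {
      case 'boolean':
        // Simplified on/off: on for everyone when percentage > 0.
        return (flag.rollout.percentage ?? 0) > 0;
      case 'percent': {
        if (!context.userId) return flag.defaultValue;
        return bucketFor(String(context.userId), seed) < (flag.rollout.percentage ?? 0);
      }
      case 'multivariate': {
        const variants = flag.rollout.multivariateValues ?? {};
        const names = Object.keys(variants);
        if (!context.userId || names.length === 0) return flag.defaultValue;
        // Split the 100 buckets evenly across the variants.
        const bucket = bucketFor(String(context.userId), seed);
        const idx = Math.min(Math.floor((bucket / 100) * names.length), names.length - 1);
        return variants[names[idx]];
      }
      default:
        return flag.defaultValue;
    }
  }
}
```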

4) example usage (app.ts)
```typescript
// app.ts
import express from 'express';
import { Flag } from './types';
import { FlagStore } from './store';
import { Evaluator } from './evaluator';

const app = express();
const port = 3000;

const initialFlags: Flag[] = [
  {
    key: 'new_dashboard',
    type: 'boolean',
    defaultValue: false,
    description: 'Gradual rollout of the new dashboard',
    rollout: { strategy: 'percent', percentage: 25, seed: 42 },
  },
  {
    key: 'pricing_experiment',
    type: 'multivariate',
    defaultValue: 'control',
    description: 'Pricing experiment with variants A/B',
    rollout: {
      strategy: 'multivariate',
      seed: 7,
      multivariateValues: { control: 0.99, variantA: 1.29, variantB: 0.99 },
    },
  },
];

const store = new FlagStore(initialFlags);
const evaluator = new Evaluator(store);

// GET /flag/new_dashboard?userId=u123 -> { "key": "new_dashboard", "value": true | false }
app.get('/flag/:key', (req, res) => {
  const key = req.params.key;
  const userId = typeof req.query.userId === 'string' ? req.query.userId : undefined;
  const value = evaluator.evaluate(key, { userId });
  res.json({ key, value });
});

app.listen(port, () => {
  console.log(`Flag service listening at http://localhost:${port}`);
});
```
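The implementation plan also lists a small client function that fetches flag values with caching, which the files above don't include. A minimal sketch, assuming a hypothetical `fetchValue` function that wraps the `GET /flag/:key` endpoint; the TTL, the stale-on-error behavior, and the injectable clock are design choices made here, not part of the original:

```typescript
// Hypothetical fetcher, e.g. a wrapper around GET /flag/:key?userId=...
type FetchValue = (key: string, userId: string) => Promise<unknown>;

class CachedFlagClient {
  private cache = new Map<string, { value: unknown; expiresAt: number }>();

  constructor(
    private readonly fetchValue: FetchValue,
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  async get(key: string, userId: string, fallback: unknown): Promise<unknown> {
    const cacheKey = `${key}:${userId}`;
    const hit = this.cache.get(cacheKey);
    if (hit && hit.expiresAt > this.now()) return hit.value; // fresh cache hit
    try {
      const value = await this.fetchValue(key, userId);
      this.cache.set(cacheKey, { value, expiresAt: this.now() + this.ttlMs });
      return value;
    } catch {
      // Service unreachable: serve the stale value if there is one, else the fallback.
      return hit ? hit.value : fallback;
    }
  }

  // Call after an admin change so the next read refetches this flag.
  invalidate(key: string): void {
    for (const cacheKey of Array.from(this.cache.keys())) {
      if (cacheKey.startsWith(`${key}:`)) this.cache.delete(cacheKey);
    }
  }
}
```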

6) Integrating with your app

  • Server-side evaluation
    • Call the evaluator in your request path where feature behavior changes. Cache results per request/context to avoid repeated computation.
  • Client-side evaluation
    • Expose a small API endpoint or embed a lightweight library that fetches flag values and caches them locally. Be mindful of privacy and data usage.
  • Caching strategy
    • Short TTLs (e.g., 5-15 minutes) for flags that may change often, longer TTLs for stable flags. Invalidate on admin changes.
  • Fallbacks
    • Always have sensible defaults. If flag data is temporarily unavailable, fall back to a default defined in code or to the last known defaultValue.

7) Observability and governance

  • Telemetry
    • Track flag evaluations, latency, and outcomes. Correlate with feature usage metrics to measure impact.
  • Audit trails
    • Log changes to flag definitions: who changed what, when, and why. Integrate with your SSO and an issue tracker.
  • Safety nets
    • Pause switch: a global emergency flag to disable all risky features instantly.
    • Rollback path: maintain old code paths that can be re-enabled quickly if a rollout goes awry.

8) Testing strategies

  • Unit tests
    • Test evaluation logic with various rollout configurations and contexts.
  • Integration tests
    • Validate end-to-end behavior across services; simulate admin changes and ensure clients reflect updates.
  • Chaos and resilience
    • Simulate partial outages (flag store unreachable) and verify graceful fallbacks.
  • A/B testing guardrails
    • Ensure randomization is deterministic per user, and that exposure is within planned bounds.

Example test idea:

  • Given a userId, a flag with a 30% rollout returns true exactly when the user's bucket is below 30; verify that several userIds produce the same result on repeated evaluation.
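The chaos bullet (flag store unreachable) also translates into a small unit test. A sketch, assuming a hypothetical `FlagLookup` interface and a hypothetical `safeEvaluate` wrapper that returns a default defined in code whenever the lookup throws or the flag is missing:

```typescript
// Hypothetical store interface; `get` may throw when the backing store is down.
interface FlagLookup {
  get(key: string): { value: unknown } | undefined;
}

// A flag lookup must never take down the request path: on any error or a
// missing flag, return the caller's code-level default.
function safeEvaluate(store: FlagLookup, key: string, codeDefault: unknown): unknown {
  try {
    return store.get(key)?.value ?? codeDefault;
  } catch {
    return codeDefault;
  }
}

// A store that simulates an outage.
const unreachableStore: FlagLookup = {
  get() {
    throw new Error('flag store unreachable');
  },
};

// Test: an outage yields the default instead of an exception.
const result = safeEvaluate(unreachableStore, 'new_dashboard', false);
if (result !== false) throw new Error(`expected fallback false, got ${String(result)}`);
```

Note the `??`: a stored value of `false` is returned as-is, and only a missing flag falls through to the default.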

9) Rollout plan

  • Start small
    • Roll out one low-risk flag to 5% of users, monitor metrics, and collect feedback.
  • Increase gradually
    • Expand to 20%, then 50%, while watching latency and error rates.
  • Operational readiness
    • Automate flag changes via CI/CD hooks and provide a rollback UI for non-engineers.
  • Documentation
    • Create a developer portal with flag keys, allowed targeting, and recommended defaults.

10) Best practices and gotchas

  • Prefer deterministic bucketing over per-call randomness, so a user sees the same variant on every request and results are reproducible.

  • Keep flags independent; avoid deep nesting of rollout logic that becomes hard to reason about.

  • Document every flag with purpose, owners, and expected impact.

  • Respect user privacy; don’t rely on sensitive data for targeting unless necessary and compliant.

  • Plan for deprecation: flag lifecycle management to remove stale flags safely.
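Lifecycle management starts with finding flags that nobody touches anymore. A sketch, assuming hypothetical inputs: flag metadata with an `updatedAt` timestamp, plus a map of last-evaluation times such as the telemetry in section 7 could provide:

```typescript
// Hypothetical metadata shape; owner is who gets asked to remove the flag.
type FlagMeta = { key: string; owner: string; updatedAt: string };

// A flag is stale when it has been neither changed nor evaluated
// within the last `maxIdleDays` days.
function findStaleFlags(
  flags: FlagMeta[],
  lastEvaluatedAt: Map<string, string>,
  now: Date,
  maxIdleDays: number,
): FlagMeta[] {
  const cutoff = now.getTime() - maxIdleDays * 24 * 60 * 60 * 1000;
  return flags.filter((f) => {
    const seen = lastEvaluatedAt.get(f.key);
    // Most recent activity: last change or last evaluation, whichever is later.
    const lastActivity = Math.max(Date.parse(f.updatedAt), seen ? Date.parse(seen) : 0);
    return lastActivity < cutoff;
  });
}
```

This only catches idle flags. A flag that has sat at 100% for months is still evaluated constantly but is equally ready for removal, so a fuller check would also look at how long the rollout has been unchanged.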

Illustration: a simple mental model

  • Think of a flag as a switchboard. Each time a feature path is chosen, the system consults:
    • Is there a direct targeting rule for this user/context?
    • If not, does the rollout bucket enable this user?
    • If still unclear, fall back to the default behavior.

This layered approach helps keep behavior predictable and auditable.


Rizwan Saleem | https://rizwansaleem.co
