Building a developer-friendly feature flag system: architecture, best practices, and a practical implementation
Feature flags are a powerful tool for shipping faster, reducing risk, and enabling safer experimentation. A well-designed system lets engineers toggle features in production without redeploys, runs efficiently at scale, and remains operable under load. This guide walks you through designing and implementing a robust, developer-friendly feature flag system from the ground up, with concrete code examples, operational considerations, and a practical rollout plan.
1) Define the goals and scope
Decide what your feature flag system should achieve. Common goals:
- Fine-grained control: flags at per-user, per-team, or per-request granularity.
- Safe rollouts: gradual, percentage-based rollouts with quick rollback.
- Observability: telemetry to verify impact and catch issues early.
- Performance: minimal latency overhead, ideally sub-millisecond in the common path.
- Ergonomics: simple APIs, meaningful defaults, and good IDE/editor support.
- Auditability: who changed flags, when, and why.
Scope decisions to make upfront:
- Where flags live (in-app, in a separate service, or both)?
- How flags are evaluated (server-side, client-side, or hybrid)?
- How to store flag definitions and values (config files, database, or specialized feature flag service)?
- Safety nets (fallback values, circuit breakers, and rate limits).
2) Choose an architecture
A practical, scalable architecture balances simplicity with control:
- Flag definitions service (central source of truth)
- Stores metadata: flag key, type, default value, allowed targets (roles, tenancy), rollout strategy, and auditing.
- Flag evaluation service (or embedded library)
- Determines the value for a given user/context, applying rollout rules and targeting.
- Telemetry and observability
- Emit events when flags are evaluated or toggled; collect metrics like activation rate, error rate, and latency.
- Caching layer
- Cache evaluated values to avoid repeated lookups; use TTLs aligned with how often you expect changes.
- Governance and rollout tooling
- UI or API for safe rollout, pause/kill switch, and an approval workflow for high-risk flags.
For many teams, a two-layer approach works well:
- A flag definitions store (e.g., a lightweight database or config service).
- An evaluator library embedded in the app, with a small caching layer and an optional remote refresh.
If you want a microservice approach, keep the evaluator stateless and rely on a fast, cacheable response to minimize risk.
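The caching layer described above can be sketched as a small TTL cache that sits in front of whatever loads flag values. This is a minimal illustration under stated assumptions, not a prescribed design: `TtlFlagCache`, `Loader`, and the injected clock are hypothetical names, the loader is synchronous to keep the sketch short (a remote loader would normally be asynchronous), and values are limited to booleans.

```typescript
// TtlFlagCache, Loader, and the injected clock are hypothetical names used
// only for this sketch.
type Loader = (flagKey: string, userId: string) => boolean; // may throw if the store is down

interface CacheEntry {
  value: boolean;
  expiresAt: number; // epoch milliseconds
}

class TtlFlagCache {
  // flagKey -> userId -> cached evaluation
  private entries = new Map<string, Map<string, CacheEntry>>();
  private load: Loader;
  private ttlMs: number;
  private now: () => number;

  constructor(load: Loader, ttlMs: number, now: () => number = () => Date.now()) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(flagKey: string, userId: string, fallback: boolean): boolean {
    const hit = this.entries.get(flagKey)?.get(userId);
    if (hit && hit.expiresAt > this.now()) return hit.value; // still fresh

    try {
      const value = this.load(flagKey, userId);
      const perUser = this.entries.get(flagKey) ?? new Map<string, CacheEntry>();
      perUser.set(userId, { value, expiresAt: this.now() + this.ttlMs });
      this.entries.set(flagKey, perUser);
      return value;
    } catch {
      // Store unreachable: serve the stale value if there is one,
      // otherwise the caller-supplied fallback.
      return hit ? hit.value : fallback;
    }
  }

  // Call after an admin change so the next read reloads the flag.
  invalidate(flagKey: string): void {
    this.entries.delete(flagKey);
  }
}
```

Serving a stale value when the loader fails is one possible policy; a team that prefers strict defaults during outages could return `fallback` unconditionally in the `catch` branch.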
3) Define data models
A clean data model makes it easy to reason about flags and their rollout. Key entities:
- Flag
- id/key: string
- type: boolean, multivariate (e.g., user segment), percentage
- defaultValue: depends on type
- description: string
- createdAt, updatedAt
- Targeting
- segment: e.g., user_id, account_id, region, role
- weight or percentage: for percentage rollouts
- constraints: e.g., user consent, experimental group
- Rollout
- strategy: boolean (on/off), percentage, multivariate (different values per segment)
- rolloutRate: 0-100
- seed: integer for deterministic bucketing
- AuditLog
- who changed, when, oldValue -> newValue, rationale
Example JSON shape for a boolean flag with percentage rollout:
{
  "key": "new_dashboard",
  "type": "boolean",
  "defaultValue": false,
  "rollout": {
    "strategy": "percent",
    "percentage": 30,
    "seed": 12345
  },
  "targets": [
    {"segment": "region:eu", "value": true},
    {"segment": "segment:beta", "value": true}
  ],
  "description": "Gradual rollout of the new dashboard UI"
}
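Because definitions like the one above are usually edited by hand or through an admin API, it can help to validate them before they reach the evaluator. The following is a rough sketch only: `validateFlagDefinition` is a hypothetical helper, and the rules it enforces (non-empty key, percentage in 0-100, integer seed) are assumptions drawn from the field descriptions in this section.

```typescript
// Hypothetical validator for the JSON shape shown above. Field names follow
// that example; the rules themselves are assumptions.
function validateFlagDefinition(input: unknown): string[] {
  if (typeof input !== 'object' || input === null) return ['flag must be an object'];
  const flag = input as Record<string, unknown>;
  const errors: string[] = [];

  if (typeof flag.key !== 'string' || flag.key.length === 0) {
    errors.push('key must be a non-empty string');
  }
  if (!('defaultValue' in flag)) {
    errors.push('defaultValue is required');
  }

  const rollout = flag.rollout;
  if (typeof rollout !== 'object' || rollout === null) {
    errors.push('rollout is required');
    return errors;
  }
  const r = rollout as Record<string, unknown>;
  if (r.strategy === 'percent') {
    const p = r.percentage;
    if (typeof p !== 'number' || !Number.isFinite(p) || p < 0 || p > 100) {
      errors.push('rollout.percentage must be a number between 0 and 100');
    }
    if (r.seed !== undefined && !Number.isInteger(r.seed)) {
      errors.push('rollout.seed must be an integer');
    }
  }
  return errors;
}

// An empty array means the definition passed every check.
const problems = validateFlagDefinition({
  key: 'new_dashboard',
  defaultValue: false,
  rollout: { strategy: 'percent', percentage: 130, seed: 12345 },
});
console.log(problems);
```

Returning a list of messages rather than throwing on the first problem lets an admin UI show every issue at once.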
4) Evaluation strategies
Common evaluation strategies:
- Boolean flags
- Simple on/off, possibly with a per-user override.
- Percentage rollout
- Use a deterministic hash of a user identifier to assign a bucket (0-99) and enable for buckets below the threshold.
- Multivariate flags
- Assign different values (A/B tests, feature variations) based on hash buckets or segments.
- Targeted flags
- Apply to specific users, accounts, regions, or roles.
Deterministic bucketing example (pseudo-logic):
- bucket = hash(user_id + flag_seed) mod 100
- enabled if bucket < rollout.percentage
For client-side flags, keep evaluation deterministic and respect privacy constraints. For server-side evaluation, you can leverage more context but still benefit from deterministic bucketing for reproducibility.
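The pseudo-logic above leaves the choice of hash open. As one possible instantiation, the sketch below uses 32-bit FNV-1a (an assumption, not something the text prescribes) and shows a useful consequence of bucketing: raising the percentage only adds users, so nobody who already has the feature loses it. The names `fnv1a32`, `bucketFor`, and `isEnabled` are illustrative.

```typescript
// 32-bit FNV-1a over UTF-16 code units. Math.imul performs a true 32-bit
// multiply, which a plain `*` cannot guarantee in JavaScript.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime
  }
  return hash >>> 0; // reinterpret as unsigned
}

// Maps a user to a stable bucket in the range 0-99 for a given seed.
function bucketFor(userId: string, seed: number): number {
  return fnv1a32(`${seed}:${userId}`) % 100;
}

function isEnabled(userId: string, seed: number, percentage: number): boolean {
  return bucketFor(userId, seed) < percentage;
}

// A user enabled at 10% stays enabled at 30%, because the bucket is fixed
// and only the threshold moves.
const userId = 'user-42';
const at10 = isEnabled(userId, 12345, 10);
const at30 = isEnabled(userId, 12345, 30);
console.log({ bucket: bucketFor(userId, 12345), at10, at30 });
```

FNV-1a is small and fast, but its low bits mix less thoroughly than those of stronger hashes, so a system that depends on precise exposure percentages may want to measure the distribution on real identifiers before relying on it.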
5) Implementation plan: a minimal, practical approach
We’ll implement a lightweight feature flag system in a Node.js/TypeScript environment, with:
- A simple in-memory flag store (extendable to a database)
- An evaluator library
- A small client function to fetch flag values with caching
- A basic REST endpoint for reading flag values (for demonstration)
You can adapt the patterns to your stack (Go, Python, Java, etc.).
Code: flag definitions (TypeScript)
- Define types
- Implement a FlagStore
- Implement an Evaluator
- Add caching and a refresh endpoint
Implementation outline (files and snippets):
1) types.ts
// Exported so that store.ts and evaluator.ts can import them.
export type FlagType = 'boolean' | 'percent' | 'multivariate';
export type Target = { segment: string; value: any };
export type Rollout = {
  strategy: 'boolean' | 'percent' | 'multivariate';
  percentage?: number; // 0-100, used by the 'percent' strategy
  seed?: number; // salt for deterministic bucketing
  multivariateValues?: { [variant: string]: any };
};
export interface Flag {
  key: string;
  type: FlagType;
  defaultValue: any;
  description?: string;
  rollout: Rollout;
  targets?: Target[];
  createdAt?: string;
  updatedAt?: string;
}
2) store.ts
import { Flag } from './types';

// In-memory store keyed by flag key; swap for a database-backed store later.
export class FlagStore {
  private flags: Map<string, Flag> = new Map();

  constructor(initial?: Flag[]) {
    if (initial) for (const f of initial) this.flags.set(f.key, f);
  }

  get(key: string): Flag | undefined {
    return this.flags.get(key);
  }

  set(flag: Flag): void {
    this.flags.set(flag.key, flag);
  }

  all(): Flag[] {
    return Array.from(this.flags.values());
  }
}
3) evaluator.ts
import { Flag } from './types';
import { FlagStore } from './store';

// 32-bit FNV-1a over UTF-16 code units. Math.imul keeps the multiply in
// 32-bit integer range; a plain `*` would overflow into imprecise floats.
function hashString(str: string): number {
  let h = 2166136261;
  for (let i = 0; i < str.length; i++) {
    h = Math.imul(h ^ str.charCodeAt(i), 16777619);
  }
  return h >>> 0; // force an unsigned result
}

export class Evaluator {
  constructor(private store: FlagStore) {}

  evaluate(flagKey: string, context: { userId?: string; [k: string]: any }): any {
    const flag: Flag | undefined = this.store.get(flagKey);
    if (!flag) return undefined;

    // Explicit targets win over rollout rules. Simplified matching: a target
    // applies when the context has a truthy property named after its segment.
    for (const t of flag.targets ?? []) {
      if (context[t.segment]) return t.value ?? flag.defaultValue;
    }

    const seed = String(flag.rollout.seed ?? 0);
    switch (flag.rollout.strategy) {
      case 'boolean':
        // Plain on/off flag: the stored default is the value.
        return flag.defaultValue;
      case 'percent': {
        if (!context.userId) return flag.defaultValue;
        const bucket = hashString(String(context.userId) + seed) % 100;
        return bucket < (flag.rollout.percentage ?? 0);
      }
      case 'multivariate': {
        const variants = Object.keys(flag.rollout.multivariateValues ?? {});
        // Without a user or any variants there is nothing to bucket.
        if (!context.userId || variants.length === 0) return flag.defaultValue;
        const bucket = hashString(String(context.userId) + seed) % 100;
        // Split the 100 buckets evenly across the variants.
        const idx = Math.min(Math.floor((bucket / 100) * variants.length), variants.length - 1);
        return flag.rollout.multivariateValues?.[variants[idx]];
      }
      default:
        return flag.defaultValue;
    }
  }
}
4) example usage (app.ts)
import express from 'express';
import { Flag } from './types';
import { FlagStore } from './store';
import { Evaluator } from './evaluator';

const app = express();
const port = 3000;

const initialFlags: Flag[] = [
  {
    key: 'new_dashboard',
    type: 'boolean',
    defaultValue: false,
    description: 'Gradual rollout of the new dashboard',
    rollout: { strategy: 'percent', percentage: 25, seed: 42 },
  },
  {
    key: 'pricing_experiment',
    type: 'multivariate',
    defaultValue: 0.99, // control price, so callers always receive a number
    description: 'Pricing experiment with variants A/B',
    rollout: {
      strategy: 'multivariate',
      seed: 7,
      multivariateValues: { control: 0.99, variantA: 1.29, variantB: 0.99 },
    },
  },
];

const store = new FlagStore(initialFlags);
const evaluator = new Evaluator(store);

app.get('/flag/:key', (req, res) => {
  const key = req.params.key;
  if (!store.get(key)) {
    res.status(404).json({ error: `Unknown flag: ${key}` });
    return;
  }
  // Query values can be arrays or objects; accept only a plain string.
  const userId = typeof req.query.userId === 'string' ? req.query.userId : undefined;
  const value = evaluator.evaluate(key, { userId });
  res.json({ key, value });
});

app.listen(port, () => {
  console.log(`Flag service listening at http://localhost:${port}`);
});
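The service above only reads flags. A write path that records who changed a flag, when, and why (the AuditLog entity from the data model) might be sketched as follows. Everything here is illustrative: `AuditedFlagValues` and `AuditEntry` are hypothetical names, the log is kept in memory, and requiring a non-empty rationale is an assumed policy.

```typescript
// Hypothetical audit-logged value store; not part of the files above.
interface AuditEntry {
  flagKey: string;
  changedBy: string;
  changedAt: string; // ISO 8601 timestamp
  oldValue: unknown;
  newValue: unknown;
  rationale: string;
}

class AuditedFlagValues {
  private values = new Map<string, unknown>();
  private log: AuditEntry[] = [];

  // Applies the change and appends an audit entry describing it.
  set(flagKey: string, newValue: unknown, changedBy: string, rationale: string): AuditEntry {
    if (rationale.trim() === '') {
      throw new Error('A rationale is required for every flag change');
    }
    const entry: AuditEntry = {
      flagKey,
      changedBy,
      changedAt: new Date().toISOString(),
      oldValue: this.values.get(flagKey), // undefined on first write
      newValue,
      rationale,
    };
    this.values.set(flagKey, newValue);
    this.log.push(entry);
    return entry;
  }

  get(flagKey: string): unknown {
    return this.values.get(flagKey);
  }

  // Returns the changes for one flag, oldest first.
  history(flagKey: string): AuditEntry[] {
    return this.log.filter((e) => e.flagKey === flagKey);
  }
}

const audited = new AuditedFlagValues();
audited.set('new_dashboard', false, 'alice', 'Initial definition');
audited.set('new_dashboard', true, 'bob', 'Enable after successful canary');
console.log(audited.history('new_dashboard'));
```

In a real deployment the entries would go to durable storage, and `changedBy` would come from the authenticated session rather than a caller-supplied string.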
6) Integrating with your app
- Server-side evaluation
- Call the evaluator in your request path where feature behavior changes. Cache results per request/context to avoid repeated computation.
- Client-side evaluation
- Expose a small API endpoint or embed a lightweight library that fetches flag values and caches them locally. Be mindful of privacy and data usage.
- Caching strategy
- Short TTLs (e.g., 5-15 minutes) for flags that may change often, longer TTLs for stable flags. Invalidate on admin changes.
- Fallbacks
- Always have sensible defaults. If flag data is temporarily unavailable, use the flag’s defaultValue.
7) Observability and governance
- Telemetry
- Track flag evaluations, latency, and outcomes. Correlate with feature usage metrics to measure impact.
- Audit trails
- Log changes to flag definitions: who changed what, when, and why. Integrate with your SSO and an issue tracker.
- Safety nets
- Pause switch: a global emergency flag to disable all risky features instantly.
- Rollback path: maintain old code paths that can be re-enabled quickly if a rollout goes awry.
8) Testing strategies
- Unit tests
- Test evaluation logic with various rollout configurations and contexts.
- Integration tests
- Validate end-to-end behavior across services; simulate admin changes and ensure clients reflect updates.
- Chaos and resilience
- Simulate partial outages (flag store unreachable) and verify graceful fallbacks.
- A/B testing guardrails
- Ensure randomization is deterministic per user, and that exposure is within planned bounds.
Example test idea:
- Given a userId, a flag with a 30% rollout yields true exactly when the user's bucket is below 30; verify that repeated evaluations for several userIds produce the same results.
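The guardrails above can be turned into a small, framework-free check that works against any evaluator. This is a sketch under assumptions: `checkRollout` and `RolloutReport` are hypothetical names, and `toyIsEnabled` is a deliberately simple stand-in (a character-sum bucket) that exists only so the example runs on its own; in practice you would pass in your real evaluation function.

```typescript
type IsEnabled = (userId: string, percentage: number) => boolean;

interface RolloutReport {
  deterministic: boolean; // same input always gave the same answer
  edgesHold: boolean; // 0% enabled nobody and 100% enabled everybody
  exposure: number; // observed share of enabled users, 0 to 1
}

function checkRollout(isEnabled: IsEnabled, userIds: string[], percentage: number): RolloutReport {
  let deterministic = true;
  let edgesHold = true;
  let enabledCount = 0;

  for (const id of userIds) {
    const first = isEnabled(id, percentage);
    if (isEnabled(id, percentage) !== first) deterministic = false;
    if (isEnabled(id, 0) || !isEnabled(id, 100)) edgesHold = false;
    if (first) enabledCount++;
  }

  const exposure = userIds.length === 0 ? 0 : enabledCount / userIds.length;
  return { deterministic, edgesHold, exposure };
}

// Stand-in evaluator for the demo: a toy character-sum bucket, not a
// production-quality hash.
function toyIsEnabled(userId: string, percentage: number): boolean {
  let sum = 0;
  for (let i = 0; i < userId.length; i++) sum += userId.charCodeAt(i);
  return sum % 100 < percentage;
}

const sampleUsers = Array.from({ length: 500 }, (_, i) => `user-${i}`);
const report = checkRollout(toyIsEnabled, sampleUsers, 30);
// A test would typically also compare report.exposure against the planned
// percentage with a tolerance chosen for the sample size.
console.log(report);
```

The same function ports directly to a test runner such as Jest or Vitest by asserting on the returned report.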
9) Rollout plan
- Start small
- Roll out one low-risk flag to 5% of users, monitor metrics, and collect feedback.
- Increase gradually
- Expand to 20%, then 50%, while watching latency and error rates.
- Operational readiness
- Automate flag changes via CI/CD hooks and provide a rollback UI for non-engineers.
- Documentation
- Create a developer portal with flag keys, allowed targeting, and recommended defaults.
10) Best practices and gotchas
- Prefer deterministic bucketing over random per call to ensure reproducibility.
- Keep flags independent; avoid deep nesting of rollout logic that becomes hard to reason about.
- Document every flag with purpose, owners, and expected impact.
- Respect user privacy; don’t rely on sensitive data for targeting unless necessary and compliant.
- Plan for deprecation: flag lifecycle management to remove stale flags safely.
Illustration: a simple mental model
- Think of flag evaluation as a switchboard with layered rules. Each time a feature path is chosen, the system asks, in order:
- Is there a direct targeting rule for this user/context?
- If not, does the rollout bucket enable this user?
- If neither applies, fall back to the default behavior.
This layered approach helps keep behavior predictable and auditable.
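The three layers can be written as one small function that also reports which layer made the decision, which helps when auditing or debugging. This is a sketch rather than the evaluator from section 5: `decide`, `Decision`, and `LayeredFlag` are hypothetical names, targeting is simplified to a per-user override map, and the bucketing function is injected so the logic stays independent of any particular hash.

```typescript
type Decision = { value: boolean; reason: 'target' | 'rollout' | 'default' };

interface LayeredFlag {
  defaultValue: boolean;
  targets: Map<string, boolean>; // userId -> forced value
  rolloutPercentage?: number; // 0-100; omit to skip the rollout layer
}

function decide(
  flag: LayeredFlag,
  userId: string | undefined,
  bucketOf: (userId: string) => number, // must return a stable value in 0-99
): Decision {
  if (userId !== undefined) {
    // Layer 1: an explicit targeting rule wins.
    const forced = flag.targets.get(userId);
    if (forced !== undefined) return { value: forced, reason: 'target' };

    // Layer 2: the rollout bucket decides.
    if (flag.rolloutPercentage !== undefined) {
      return { value: bucketOf(userId) < flag.rolloutPercentage, reason: 'rollout' };
    }
  }
  // Layer 3: nothing else applied, so use the default.
  return { value: flag.defaultValue, reason: 'default' };
}

const dashboardFlag: LayeredFlag = {
  defaultValue: false,
  targets: new Map([['vip-user', true]]),
  rolloutPercentage: 30,
};
// Fixed buckets keep the demo predictable; a real caller would pass a hash.
const demoBucket = (id: string): number => (id === 'early-user' ? 5 : 80);

console.log(decide(dashboardFlag, 'vip-user', demoBucket));
console.log(decide(dashboardFlag, 'early-user', demoBucket));
console.log(decide(dashboardFlag, undefined, demoBucket));
```

Returning the reason alongside the value is a design choice, not a requirement; it makes telemetry events self-explanatory at the cost of a slightly larger return type.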
If you’d like, I can tailor this to your stack (Go, Python, Java, or a frontend-heavy app), add a small UI for managing flags, or wire up a real database-backed store with migrations. Which environment are you targeting, and do you want a minimal prototype or a production-grade feature flag service?
Rizwan Saleem | https://rizwansaleem.co