Young Gao

Posted on Mar 21

Feature Flags from Scratch: Build a Runtime Toggle System in TypeScript (2026)

#typescript #devops #architecture #tutorial


Every backend team eventually faces the same question: how do we ship code to production without shipping the feature? The answer, almost universally, is feature flags. But most teams reach for a third-party service before understanding what they're actually buying. In this installment of Production Backend Patterns, we'll build a feature flag system from scratch in TypeScript — starting with a simple boolean toggle and ending with a production-grade system that handles targeting rules, percentage rollouts, A/B testing, and lifecycle management.

Why Feature Flags Matter

Feature flags decouple deployment from release. That single idea unlocks a cascade of operational benefits:

Continuous deployment without risk. Merge to main, deploy to production, but keep the feature hidden behind a flag. If something breaks, kill the flag — no rollback required.
Gradual rollouts. Ship to 1% of users, watch your error rate, then ramp to 10%, 50%, 100%.
A/B testing. Route users into cohorts and measure which variant performs better.
Kill switches. Wrap expensive operations in flags so you can shed load during incidents.
Trunk-based development. Long-lived feature branches become unnecessary when incomplete work hides behind flags.

The cost of not having flags is steep: you either ship everything at once and pray, or you maintain complex branching strategies that slow your entire team down.

The Core Abstraction

Before writing any code, let's define what a feature flag system needs to do. At its core, it answers one question: is this flag on for this context?

interface FlagContext {
  userId?: string;
  email?: string;
  country?: string;
  plan?: string;
  attributes?: Record<string, string | number | boolean>;
}

interface FlagDefinition {
  key: string;
  enabled: boolean;
  description?: string;
  rules?: TargetingRule[];
  percentageRollout?: number;
  variants?: Record<string, VariantConfig>;
  createdAt: Date;
  updatedAt: Date;
  owner?: string;
  tags?: string[];
}

interface TargetingRule {
  attribute: string;
  operator: 'eq' | 'neq' | 'in' | 'nin' | 'gt' | 'lt' | 'contains';
  value: string | number | boolean | Array<string | number>;
  result: boolean;
}

interface VariantConfig {
  weight: number;
  payload?: Record<string, unknown>;
}

These interfaces are the contract. Everything else — storage, evaluation, lifecycle — builds on top of them.

Building the In-Memory Flag Store

We start with the simplest useful implementation: an in-memory store with synchronous evaluation.

class FlagStore {
  private flags: Map<string, FlagDefinition> = new Map();

  register(flag: Omit<FlagDefinition, 'createdAt' | 'updatedAt'>): void {
    const now = new Date();
    this.flags.set(flag.key, {
      ...flag,
      createdAt: now,
      updatedAt: now,
    });
  }

  getAll(): Map<string, FlagDefinition> {
    return new Map(this.flags);
  }

  isEnabled(key: string, context: FlagContext = {}): boolean {
    const flag = this.flags.get(key);
    if (!flag) return false;
    if (!flag.enabled) return false;

    // Check targeting rules first — they override everything
    if (flag.rules && flag.rules.length > 0) {
      const ruleResult = this.evaluateRules(flag.rules, context);
      if (ruleResult !== null) return ruleResult;
    }

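    // Fall back to percentage rollout. Without a userId there is nothing
    // stable to hash, so anonymous contexts stay off instead of failing open
    if (flag.percentageRollout !== undefined) {
      if (!context.userId) return false;
      return this.evaluatePercentage(key, context.userId, flag.percentageRollout);
    }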

    return flag.enabled;
  }

  private evaluateRules(rules: TargetingRule[], context: FlagContext): boolean | null {
    for (const rule of rules) {
      const contextValue = this.resolveAttribute(rule.attribute, context);
      if (contextValue === undefined) continue;

      if (this.matchesRule(contextValue, rule)) {
        return rule.result;
      }
    }
    return null; // No rule matched
  }

  private resolveAttribute(
    attribute: string,
    context: FlagContext
  ): string | number | boolean | undefined {
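    // Top-level context fields win; never return the nested `attributes` bag itself
    if (attribute !== 'attributes' && attribute in context) {
      return (context as Record<string, unknown>)[attribute] as
        string | number | boolean | undefined;
    }
    return context.attributes?.[attribute];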
  }

  private matchesRule(
    contextValue: string | number | boolean,
    rule: TargetingRule
  ): boolean {
    switch (rule.operator) {
      case 'eq':
        return contextValue === rule.value;
      case 'neq':
        return contextValue !== rule.value;
      case 'in':
        return Array.isArray(rule.value) && rule.value.includes(contextValue as never);
      case 'nin':
        return Array.isArray(rule.value) && !rule.value.includes(contextValue as never);
      case 'gt':
        return typeof contextValue === 'number' && contextValue > (rule.value as number);
      case 'lt':
        return typeof contextValue === 'number' && contextValue < (rule.value as number);
      case 'contains':
        return typeof contextValue === 'string' && contextValue.includes(rule.value as string);
      default:
        return false;
    }
  }

  private evaluatePercentage(flagKey: string, userId: string, percentage: number): boolean {
    const hash = this.hashUserFlag(flagKey, userId);
    return (hash % 100) < percentage;
  }

  private hashUserFlag(flagKey: string, userId: string): number {
    const str = `${flagKey}:${userId}`;
    let hash = 0;
    for (let i = 0; i < str.length; i++) {
      const char = str.charCodeAt(i);
      hash = ((hash << 5) - hash) + char;
      hash |= 0; // Truncate to a signed 32-bit integer
    }
    return Math.abs(hash);
  }
}

A few design decisions worth calling out:

Deterministic hashing for percentage rollouts. We hash the combination of flag key and user ID, not just the user ID. This means a user who is in the 5% for new-checkout won't necessarily be in the 5% for dark-mode. Each flag gets independent randomization, but the same user always gets the same result for the same flag — no flickering.

Rules take priority over percentages. If a targeting rule matches, the percentage rollout is ignored. This lets you force-enable a flag for your QA team while rolling out to 10% of everyone else.

Unknown flags default to off. This is critical. If someone queries a flag that doesn't exist, the answer is always false. Never fail open with feature flags.
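
To make the hashing decision concrete, here is a minimal standalone sketch of the bucketing logic. It copies the same string hash used by hashUserFlag; the helper names `bucket` and `rolloutShare` are ours and not part of the store. This simple hash is not guaranteed to spread users perfectly evenly, so it is worth checking the observed share for your own ID format.

```typescript
// Standalone copy of the hash used by FlagStore.hashUserFlag.
// `bucket` is a hypothetical helper name introduced for illustration.
function bucket(flagKey: string, userId: string): number {
  const str = `${flagKey}:${userId}`;
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash) + str.charCodeAt(i);
    hash |= 0; // Truncate to a signed 32-bit integer
  }
  return Math.abs(hash) % 100; // Bucket in the range 0..99
}

// Same user + same flag always lands in the same bucket (no flickering).
const first = bucket('new-checkout', 'user-42');
const second = bucket('new-checkout', 'user-42');
console.log(first === second); // true

// Rough distribution check: the share of synthetic users that fall
// under a given rollout percentage for one flag.
function rolloutShare(flagKey: string, percentage: number, users = 10_000): number {
  let enabled = 0;
  for (let i = 0; i < users; i++) {
    if (bucket(flagKey, `user-${i}`) < percentage) enabled++;
  }
  return enabled / users;
}

console.log(rolloutShare('new-checkout', 10));
```

If the printed share drifts far from the configured percentage, swapping in a stronger hash is a reasonable next step.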

Adding Persistence

An in-memory store is fine for tests, but production needs persistence. We'll define a storage interface and implement two backends.

interface FlagStorage {
  load(): Promise<Map<string, FlagDefinition>>;
  save(flags: Map<string, FlagDefinition>): Promise<void>;
  watch?(onChange: (flags: Map<string, FlagDefinition>) => void): void;
}

import { promises as fs } from 'fs';
import * as fsSync from 'fs';

class FileFlagStorage implements FlagStorage {
  constructor(private filePath: string) {}

  async load(): Promise<Map<string, FlagDefinition>> {
    try {
      const data = await fs.readFile(this.filePath, 'utf-8');
      const parsed = JSON.parse(data, (key, value) => {
        if (key === 'createdAt' || key === 'updatedAt') return new Date(value);
        return value;
      });
      return new Map(Object.entries(parsed));
    } catch {
      return new Map();
    }
  }

  async save(flags: Map<string, FlagDefinition>): Promise<void> {
    const obj = Object.fromEntries(flags);
    await fs.writeFile(this.filePath, JSON.stringify(obj, null, 2));
  }

  watch(onChange: (flags: Map<string, FlagDefinition>) => void): void {
    const watcher = fsSync.watch(this.filePath, async () => {
      const flags = await this.load();
      onChange(flags);
    });
    process.on('exit', () => watcher.close());
  }
}

For database-backed storage, the pattern is the same — swap file reads for queries:

import { Pool } from 'pg';

class PostgresFlagStorage implements FlagStorage {
  constructor(private pool: Pool) {}

  async load(): Promise<Map<string, FlagDefinition>> {
    const result = await this.pool.query(
      'SELECT key, definition FROM feature_flags'
    );
    const flags = new Map<string, FlagDefinition>();
    for (const row of result.rows) {
      flags.set(row.key, {
        ...row.definition,
        createdAt: new Date(row.definition.createdAt),
        updatedAt: new Date(row.definition.updatedAt),
      });
    }
    return flags;
  }

  async save(flags: Map<string, FlagDefinition>): Promise<void> {
    const client = await this.pool.connect();
    try {
      await client.query('BEGIN');
      for (const [key, def] of flags) {
        await client.query(
          `INSERT INTO feature_flags (key, definition)
           VALUES ($1, $2)
           ON CONFLICT (key) DO UPDATE SET definition = $2`,
          [key, JSON.stringify(def)]
        );
      }
      await client.query('COMMIT');
    } catch (err) {
      await client.query('ROLLBACK');
      throw err;
    } finally {
      client.release();
    }
  }
}

Now wire the persistent store into the flag system:

class PersistentFlagStore extends FlagStore {
  private storage: FlagStorage;

  constructor(storage: FlagStorage) {
    super();
    this.storage = storage;
    this.storage.watch?.((flags) => this.reload(flags));
  }

  async initialize(): Promise<void> {
    const flags = await this.storage.load();
    this.reload(flags);
  }

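  // Note: register() stamps createdAt/updatedAt with the current time, so
  // timestamps loaded from storage are overwritten on every reload
  private reload(flags: Map<string, FlagDefinition>): void {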
    for (const [, def] of flags) {
      this.register(def);
    }
  }

  async persist(): Promise<void> {
    await this.storage.save(this.getAll());
  }
}

The watch mechanism is important. In production, you want flag changes to propagate without restarting services. File watchers work for single-node setups. For distributed systems, use Postgres LISTEN/NOTIFY, Redis pub/sub, or poll on an interval.
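
For the polling option, a minimal sketch might look like the following. The `PollingWatcher` class, its 30-second default interval, and the trimmed-down types are assumptions made for illustration; in the article's code the storage would be any FlagStorage implementation.

```typescript
// Minimal stand-ins so the sketch is self-contained; in the article these
// correspond to the FlagDefinition and FlagStorage interfaces defined earlier.
interface FlagDefinition { key: string; enabled: boolean; updatedAt: Date; }
type FlagMap = Map<string, FlagDefinition>;

interface LoadableStorage {
  load(): Promise<FlagMap>;
}

// Hypothetical helper: polls `storage.load()` and calls `onChange` only when
// the serialized snapshot differs from the previous one.
class PollingWatcher {
  private timer: ReturnType<typeof setInterval> | undefined;
  private lastSnapshot = '';

  constructor(
    private storage: LoadableStorage,
    private onChange: (flags: FlagMap) => void,
    private intervalMs: number = 30_000
  ) {}

  // One poll cycle, exposed separately so it can be driven directly in tests.
  async pollOnce(): Promise<boolean> {
    const flags = await this.storage.load();
    const snapshot = JSON.stringify([...flags.entries()]);
    if (snapshot === this.lastSnapshot) return false;
    this.lastSnapshot = snapshot;
    this.onChange(flags);
    return true;
  }

  start(): void {
    if (this.timer !== undefined) return;
    this.timer = setInterval(() => {
      // Keep serving the last known flags if a poll fails.
      this.pollOnce().catch((err) => console.error('Flag poll failed:', err));
    }, this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }
}
```

Comparing serialized snapshots is crude but avoids reloading the in-memory store when nothing changed; a version column or ETag would be cheaper for large flag sets.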

A/B Testing with Variants

Boolean flags are a starting point. Real A/B tests need multiple variants with configurable weights.

class FlagStore {
  // ... previous methods ...

  getVariant(key: string, context: FlagContext): string | null {
    const flag = this.flags.get(key);
    if (!flag || !flag.enabled || !flag.variants) return null;

    const userId = context.userId;
    if (!userId) return null;

    const hash = this.hashUserFlag(key, userId) % 100;
    const entries = Object.entries(flag.variants);

    let cumulative = 0;
    for (const [variantName, config] of entries) {
      cumulative += config.weight;
      if (hash < cumulative) return variantName;
    }

    return entries[0]?.[0] ?? null;
  }

  getVariantPayload(key: string, context: FlagContext): Record<string, unknown> | null {
    const variantName = this.getVariant(key, context);
    if (!variantName) return null;

    const flag = this.flags.get(key);
    return flag?.variants?.[variantName]?.payload ?? null;
  }
}

Usage in application code:

// Register an A/B test
flags.register({
  key: 'checkout-flow',
  enabled: true,
  description: 'Test new streamlined checkout vs current',
  variants: {
    control: { weight: 50, payload: { steps: 3, showProgress: false } },
    streamlined: { weight: 50, payload: { steps: 1, showProgress: true } },
  },
});

// In the request handler
const variant = flags.getVariant('checkout-flow', { userId: req.user.id });
const payload = flags.getVariantPayload('checkout-flow', { userId: req.user.id });

analytics.track('checkout_started', {
  userId: req.user.id,
  variant,
  ...payload,
});

The variant weights must sum to 100. Add a validation check in register to enforce this — silent misconfiguration is a recipe for debugging nightmares.
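
One way that check could look is sketched below. The `assertValidVariants` helper is hypothetical, and requiring integer weights is our assumption: it matches the integer buckets produced by `hash % 100`, but the article does not mandate it.

```typescript
interface VariantConfig {
  weight: number;
  payload?: Record<string, unknown>;
}

// Hypothetical helper: throws if a flag's variant weights are invalid.
function assertValidVariants(
  flagKey: string,
  variants: Record<string, VariantConfig>
): void {
  const entries = Object.entries(variants);
  if (entries.length === 0) {
    throw new Error(`Flag "${flagKey}" has an empty variants object`);
  }

  let total = 0;
  for (const [name, config] of entries) {
    // Assumption: weights are whole percentages, matching the 0..99 buckets.
    if (!Number.isInteger(config.weight) || config.weight < 0) {
      throw new Error(
        `Flag "${flagKey}": variant "${name}" needs a non-negative integer weight`
      );
    }
    total += config.weight;
  }

  if (total !== 100) {
    throw new Error(
      `Flag "${flagKey}": variant weights sum to ${total}, expected 100`
    );
  }
}
```

Calling this at the top of register whenever flag.variants is set would reject a bad definition at startup rather than at evaluation time.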

Flag Lifecycle Management

Flags accumulate. Every team that adopts feature flags eventually drowns in stale ones. The flag system itself should track lifecycle state and nudge developers toward cleanup.

type FlagPhase = 'development' | 'testing' | 'rolling_out' | 'fully_released' | 'archived';

interface FlagMetadata {
  phase: FlagPhase;
  jiraTicket?: string;
  owner: string;
  reviewDate: Date;
  maxLifespanDays: number;
}

class FlagLifecycleManager {
  constructor(private store: FlagStore) {}

  getStaleFlags(thresholdDays: number = 30): FlagDefinition[] {
    const now = Date.now();
    const stale: FlagDefinition[] = [];

    for (const [, flag] of this.store.getAll()) {
      const age = (now - flag.updatedAt.getTime()) / (1000 * 60 * 60 * 24);
      if (age > thresholdDays) {
        stale.push(flag);
      }
    }

    return stale;
  }

  getFullyReleasedFlags(): FlagDefinition[] {
    const results: FlagDefinition[] = [];
    for (const [, flag] of this.store.getAll()) {
      if (
        flag.enabled &&
        !flag.rules?.length &&
        flag.percentageRollout === undefined &&
        !flag.variants
      ) {
        results.push(flag);
      }
    }
    return results;
  }

  generateReport(): FlagReport {
    const all = Array.from(this.store.getAll().values());
    return {
      total: all.length,
      enabled: all.filter((f) => f.enabled).length,
      disabled: all.filter((f) => !f.enabled).length,
      stale: this.getStaleFlags().length,
      fullyReleased: this.getFullyReleasedFlags().length,
      byTag: this.groupByTag(all),
    };
  }

  private groupByTag(flags: FlagDefinition[]): Record<string, number> {
    const groups: Record<string, number> = {};
    for (const flag of flags) {
      for (const tag of flag.tags ?? []) {
        groups[tag] = (groups[tag] ?? 0) + 1;
      }
    }
    return groups;
  }
}

interface FlagReport {
  total: number;
  enabled: number;
  disabled: number;
  stale: number;
  fullyReleased: number;
  byTag: Record<string, number>;
}

Cleanup Strategies for Stale Flags

Finding stale flags is only half the battle. You need a system that makes removal frictionless.

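Strategy 1: Expiry dates. When registering a flag, set a reviewDate. Run a daily job that alerts owners when their flags are overdue. Because FlagDefinition does not carry a reviewDate, the job below uses time since the last update as a proxy for an overdue review.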

async function flagCleanupJob(
  manager: FlagLifecycleManager,
  notify: (owner: string, message: string) => Promise<void>
): Promise<void> {
  const stale = manager.getStaleFlags(30);

  for (const flag of stale) {
    if (flag.owner) {
      await notify(
        flag.owner,
        `Flag "${flag.key}" hasn't been updated in 30+ days. ` +
        `Remove it or update its review date.`
      );
    }
  }

  const released = manager.getFullyReleasedFlags();
  for (const flag of released) {
    if (flag.owner) {
      await notify(
        flag.owner,
        `Flag "${flag.key}" is fully released with no conditions. ` +
        `Consider removing the flag and hardcoding the behavior.`
      );
    }
  }
}
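
If you do store review dates, a check against them could be sketched as follows. This assumes review metadata is kept in a map keyed by flag key, using the owner and reviewDate fields from FlagMetadata; the `ReviewInfo` and `OverdueFlag` types and the `getOverdueFlags` helper are hypothetical names.

```typescript
// Assumption: review metadata is kept per flag key, using the owner and
// reviewDate fields from the FlagMetadata interface shown earlier.
interface ReviewInfo {
  owner: string;
  reviewDate: Date;
}

interface OverdueFlag {
  key: string;
  owner: string;
  daysOverdue: number;
}

const MS_PER_DAY = 1000 * 60 * 60 * 24;

// Hypothetical helper: lists flags whose reviewDate has passed,
// most overdue first.
function getOverdueFlags(
  metadata: Map<string, ReviewInfo>,
  now: Date = new Date()
): OverdueFlag[] {
  const overdue: OverdueFlag[] = [];
  for (const [key, info] of metadata) {
    const diffMs = now.getTime() - info.reviewDate.getTime();
    if (diffMs > 0) {
      overdue.push({
        key,
        owner: info.owner,
        daysOverdue: Math.floor(diffMs / MS_PER_DAY),
      });
    }
  }
  return overdue.sort((a, b) => b.daysOverdue - a.daysOverdue);
}
```

The result can be fed to the same notify callback used in flagCleanupJob.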

Strategy 2: Code references. Scan your codebase for flag key usage. If a flag exists in the store but no code references it, it's dead.

import { execFileSync } from 'child_process';

function findUnreferencedFlags(
  store: FlagStore,
  codebasePath: string
): string[] {
  const unreferenced: string[] = [];

  for (const [key] of store.getAll()) {
    try {
      // Pass arguments as an array so flag keys and paths are never
      // interpreted by a shell; -F treats the key as a literal string
      const result = execFileSync(
        'grep',
        ['-rlF', '--include=*.ts', '--', key, `${codebasePath}/src`],
        { encoding: 'utf-8' }
      );
      if (!result.trim()) unreferenced.push(key);
    } catch {
      // grep exits with code 1 when nothing matches (and 2 on errors);
      // both end up here, so treat the result as a candidate, not proof
      unreferenced.push(key);
    }
  }

  return unreferenced;
}

Strategy 3: Usage tracking. Instrument the flag evaluation to count how often each flag is checked. Flags with zero evaluations over a week are candidates for removal.

class InstrumentedFlagStore extends FlagStore {
  private evaluationCounts: Map<string, number> = new Map();
  private lastReset: Date = new Date();

  isEnabled(key: string, context: FlagContext = {}): boolean {
    this.evaluationCounts.set(key, (this.evaluationCounts.get(key) ?? 0) + 1);
    return super.isEnabled(key, context);
  }

  getUsageStats(): { key: string; count: number }[] {
    return Array.from(this.evaluationCounts.entries())
      .map(([key, count]) => ({ key, count }))
      .sort((a, b) => b.count - a.count);
  }

  getUnusedFlags(): string[] {
    const evaluated = new Set(this.evaluationCounts.keys());
    const unused: string[] = [];
    for (const [key] of this.getAll()) {
      if (!evaluated.has(key)) unused.push(key);
    }
    return unused;
  }

  resetStats(): void {
    this.evaluationCounts.clear();
    this.lastReset = new Date();
  }
}

Integrating with CI/CD

Feature flags should be part of your deployment pipeline, not a runtime afterthought.

Validate flag definitions in CI. Add a build step that loads your flag configuration and checks for common mistakes:

// scripts/validate-flags.ts
import { readFileSync } from 'fs';

interface ValidationError {
  flag: string;
  message: string;
}

function validateFlags(configPath: string): ValidationError[] {
  const errors: ValidationError[] = [];
  const raw = JSON.parse(readFileSync(configPath, 'utf-8'));

  for (const [key, def] of Object.entries<any>(raw)) {
    if (!key.match(/^[a-z][a-z0-9-]*$/)) {
      errors.push({ flag: key, message: 'Flag key must be lowercase kebab-case' });
    }

    if (!def.owner) {
      errors.push({ flag: key, message: 'Flag must have an owner' });
    }

    if (!def.description) {
      errors.push({ flag: key, message: 'Flag must have a description' });
    }

    if (def.variants) {
      const totalWeight = Object.values<any>(def.variants)
        .reduce((sum: number, v: any) => sum + v.weight, 0);
      if (totalWeight !== 100) {
        errors.push({
          flag: key,
          message: `Variant weights sum to ${totalWeight}, expected 100`,
        });
      }
    }

    if (def.percentageRollout !== undefined) {
      if (def.percentageRollout < 0 || def.percentageRollout > 100) {
        errors.push({ flag: key, message: 'Percentage must be between 0 and 100' });
      }
    }
  }

  return errors;
}

const errors = validateFlags(process.argv[2] ?? 'flags.json');
if (errors.length > 0) {
  console.error('Flag validation failed:');
  errors.forEach((e) => console.error(`  [${e.flag}] ${e.message}`));
  process.exit(1);
}
console.log('All flags valid.');

Deploy flags before code. Your CI pipeline should update flag definitions in the store before deploying the application code that references them. This prevents a window where code checks a flag that doesn't exist yet.

A typical pipeline looks like:

Run flag validation in CI
Deploy flag configuration to the flag store
Deploy application code
Run smoke tests with flags in their expected state
If rollback is needed, revert both flag config and application code

Track flag changes in version control. Even if your flags live in a database at runtime, keep a canonical flags.json in your repository. Treat it as infrastructure-as-code. Every flag change goes through a PR, gets reviewed, and has an audit trail.
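
If the repository copy is the source of truth, it helps to detect when the runtime store has drifted from it. The sketch below is one possible shape for that check; `detectDrift`, `FlagSnapshot`, and `DriftReport` are hypothetical names, and only enabled and percentageRollout are compared.

```typescript
// Trimmed view of a flag: just the fields this sketch compares.
interface FlagSnapshot {
  enabled: boolean;
  percentageRollout?: number;
}

interface DriftReport {
  missingAtRuntime: string[]; // In flags.json but not in the runtime store
  unknownAtRuntime: string[]; // In the runtime store but not in flags.json
  changed: string[];          // Present in both, but the values differ
}

// Hypothetical helper: compares the canonical config from the repository
// with what the runtime store reports, both keyed by flag key.
function detectDrift(
  canonical: Record<string, FlagSnapshot>,
  runtime: Record<string, FlagSnapshot>
): DriftReport {
  const report: DriftReport = {
    missingAtRuntime: [],
    unknownAtRuntime: [],
    changed: [],
  };

  for (const [key, expected] of Object.entries(canonical)) {
    const actual = runtime[key];
    if (actual === undefined) {
      report.missingAtRuntime.push(key);
    } else if (
      actual.enabled !== expected.enabled ||
      actual.percentageRollout !== expected.percentageRollout
    ) {
      report.changed.push(key);
    }
  }

  for (const key of Object.keys(runtime)) {
    if (canonical[key] === undefined) report.unknownAtRuntime.push(key);
  }

  return report;
}
```

Running this on a schedule and alerting on a non-empty report is one way to catch flag changes that bypassed the PR process.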

Monitoring Flag Usage

In production, you need visibility into how flags affect your system. Expose metrics that answer three questions: which flags are active, how often are they evaluated, and what's the split between on and off?

interface FlagMetrics {
  record(flagKey: string, result: boolean, variant?: string): void;
  getMetrics(flagKey: string): FlagMetricsSummary;
}

interface FlagMetricsSummary {
  totalEvaluations: number;
  enabledCount: number;
  disabledCount: number;
  variantDistribution: Record<string, number>;
  lastEvaluated: Date;
}

class PrometheusFlagMetrics implements FlagMetrics {
  private data: Map<string, FlagMetricsSummary> = new Map();

  record(flagKey: string, result: boolean, variant?: string): void {
    let summary = this.data.get(flagKey);
    if (!summary) {
      summary = {
        totalEvaluations: 0,
        enabledCount: 0,
        disabledCount: 0,
        variantDistribution: {},
        lastEvaluated: new Date(),
      };
      this.data.set(flagKey, summary);
    }

    summary.totalEvaluations++;
    if (result) summary.enabledCount++;
    else summary.disabledCount++;
    if (variant) {
      summary.variantDistribution[variant] =
        (summary.variantDistribution[variant] ?? 0) + 1;
    }
    summary.lastEvaluated = new Date();
  }

  getMetrics(flagKey: string): FlagMetricsSummary {
    return this.data.get(flagKey) ?? {
      totalEvaluations: 0,
      enabledCount: 0,
      disabledCount: 0,
      variantDistribution: {},
      lastEvaluated: new Date(0),
    };
  }

  toPrometheusFormat(): string {
    const lines: string[] = [];
    for (const [key, summary] of this.data) {
      const safeKey = key.replace(/-/g, '_');
      lines.push(`feature_flag_evaluations_total{flag="${safeKey}"} ${summary.totalEvaluations}`);
      lines.push(`feature_flag_enabled_total{flag="${safeKey}"} ${summary.enabledCount}`);
      lines.push(`feature_flag_disabled_total{flag="${safeKey}"} ${summary.disabledCount}`);
      for (const [variant, count] of Object.entries(summary.variantDistribution)) {
        lines.push(`feature_flag_variant_total{flag="${safeKey}",variant="${variant}"} ${count}`);
      }
    }
    return lines.join('\n');
  }
}

Wire the metrics into your flag store by wrapping evaluation calls:

class ObservableFlagStore extends FlagStore {
  constructor(private metrics: FlagMetrics) {
    super();
  }

  isEnabled(key: string, context: FlagContext = {}): boolean {
    const result = super.isEnabled(key, context);
    this.metrics.record(key, result);
    return result;
  }

  getVariant(key: string, context: FlagContext): string | null {
    const variant = super.getVariant(key, context);
    this.metrics.record(key, variant !== null, variant ?? undefined);
    return variant;
  }
}

Set up alerts on two conditions: a flag that was actively evaluated yesterday but has zero evaluations today (possible regression), and a flag whose enabled/disabled ratio shifts dramatically (possible misconfiguration).
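
Both alert conditions can be sketched as a comparison of two daily snapshots. This assumes you capture per-day counts (the counters in PrometheusFlagMetrics are cumulative, so you would diff them or reset them daily). The `detectFlagAnomalies` helper, its types, and the 0.3 default threshold are illustrative assumptions.

```typescript
interface DailyCounts {
  totalEvaluations: number;
  enabledCount: number;
}

interface FlagAlert {
  flag: string;
  reason: 'went_silent' | 'ratio_shift';
}

// Hypothetical helper: compares two daily snapshots of per-flag counters.
// The 0.3 default threshold is an arbitrary starting point, not a recommendation.
function detectFlagAnomalies(
  yesterday: Map<string, DailyCounts>,
  today: Map<string, DailyCounts>,
  ratioThreshold: number = 0.3
): FlagAlert[] {
  const alerts: FlagAlert[] = [];

  for (const [flag, prev] of yesterday) {
    if (prev.totalEvaluations === 0) continue; // Nothing to compare against

    const curr = today.get(flag);
    if (!curr || curr.totalEvaluations === 0) {
      // Evaluated yesterday, silent today: possible regression
      alerts.push({ flag, reason: 'went_silent' });
      continue;
    }

    const prevRatio = prev.enabledCount / prev.totalEvaluations;
    const currRatio = curr.enabledCount / curr.totalEvaluations;
    if (Math.abs(currRatio - prevRatio) > ratioThreshold) {
      // Enabled share moved sharply: possible misconfiguration
      alerts.push({ flag, reason: 'ratio_shift' });
    }
  }

  return alerts;
}
```

In a Prometheus setup the same conditions would more likely be expressed as alerting rules over the exported counters; the function above just makes the logic explicit.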

Putting It All Together

Here's how these pieces compose in a real application:

import express from 'express';

async function main() {
  const storage = new FileFlagStorage('./flags.json');
  const metrics = new PrometheusFlagMetrics();
  const store = new ObservableFlagStore(metrics);

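  // Load flags from disk and copy them into the store that serves requests
  const persisted = new PersistentFlagStore(storage);
  await persisted.initialize();
  for (const def of persisted.getAll().values()) {
    store.register(def);
  }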

  // Register flags
  store.register({
    key: 'new-pricing-page',
    enabled: true,
    description: 'Redesigned pricing page with annual toggle',
    owner: 'growth-team',
    tags: ['growth', 'pricing'],
    percentageRollout: 25,
    rules: [
      { attribute: 'email', operator: 'contains', value: '@company.com', result: true },
    ],
  });

  const app = express();

  app.get('/pricing', (req, res) => {
    // Assumes auth middleware populates req.user and the Request type is augmented to match
    const context: FlagContext = {
      userId: req.user?.id,
      email: req.user?.email,
      plan: req.user?.plan,
    };

    if (store.isEnabled('new-pricing-page', context)) {
      return res.render('pricing-v2');
    }
    return res.render('pricing');
  });

  // Expose metrics endpoint for Prometheus scraping
  app.get('/metrics', (_req, res) => {
    res.type('text/plain').send(metrics.toPrometheusFormat());
  });

  // Lifecycle management endpoint
  app.get('/admin/flags/report', (_req, res) => {
    const manager = new FlagLifecycleManager(store);
    res.json(manager.generateReport());
  });

  app.listen(3000);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

When to Build vs. Buy

Build your own when: you have fewer than 20 flags, your team is small, you want to understand the internals, or you're in an environment where third-party SaaS is restricted.

Buy (LaunchDarkly, Unleash, Flagsmith) when: you need a management UI for non-engineers, you're operating at scale across many services, you need audit logs and approval workflows out of the box, or your team's time is better spent on product features.

The system we built here is not a toy — it handles the core mechanics correctly. But a production SaaS offering adds edge SDKs, streaming updates, experimentation statistics, and an admin dashboard. Know what you're trading off.

Key Takeaways

Feature flags are infrastructure, not a nice-to-have. Treat them with the same rigor as your database schema or API contracts.

Start simple. A boolean flag store with deterministic hashing covers 80% of use cases.
Plan for cleanup from day one. Every flag should have an owner and a review date. Automate stale flag detection.
Make evaluation fast. Flags are checked on every request. Keep the hot path in memory, sync from persistence in the background.
Monitor everything. You can't manage what you can't measure. Track evaluation counts, variant splits, and flag age.
Version control your flags. Even if the runtime store is a database, the source of truth should live in your repository.

The complete code from this article gives you a foundation. Fork it, adapt the storage backend to your stack, add the validation step to your CI pipeline, and start shipping features with confidence.

This is part of the Production Backend Patterns series. Each article takes a common backend concern and builds it from first principles in TypeScript.
