Most Teams Use Feature Flags Wrong
They wire up LaunchDarkly or Unleash, use it for two A/B tests, then forget about it.
Meanwhile, their production code is full of if (isNewCheckoutEnabled) blocks that nobody remembers how to toggle.
Feature flags are not primarily an experimentation tool. They're a reliability tool.
The Real Value
Feature flags let you separate deploy from release. You ship code to production switched off, then turn it on gradually for real users.
When things break, you flip the switch back in seconds. No rollback, no redeploy, no PR reverts.
The Four Reliability Patterns
1. Kill Switches
Every risky new feature ships behind a kill switch:
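// Kill switch: route to the new flow only while the flag is on for this user
if (featureFlags.isEnabled('new_payment_flow', userId)) {
  return newPaymentFlow();
}
// Flag off (or flipped during an incident): fall back to the proven path
return legacyPaymentFlow();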
When the new flow has a bug, you don't roll back. You flip the flag.
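To make the flip concrete, here is a minimal sketch of a kill switch in TypeScript. `KillSwitchStore`, `killSwitches`, and `runPaymentFlow` are hypothetical names for illustration, not the API of LaunchDarkly or Unleash, and the sketch leaves out per-user targeting to stay short.

```typescript
// Hypothetical in-memory store; a real system would sync this from a flag service.
class KillSwitchStore {
  private flags = new Map<string, boolean>();

  set(flagKey: string, enabled: boolean): void {
    this.flags.set(flagKey, enabled);
  }

  // Unknown flags resolve to the fallback, so a missing flag means "legacy path".
  isEnabled(flagKey: string, fallback: boolean = false): boolean {
    return this.flags.get(flagKey) ?? fallback;
  }
}

const killSwitches = new KillSwitchStore();

function runPaymentFlow(): string {
  if (killSwitches.isEnabled("new_payment_flow")) {
    return "new"; // stands in for newPaymentFlow()
  }
  return "legacy"; // stands in for legacyPaymentFlow()
}

killSwitches.set("new_payment_flow", true); // release the feature
const duringRelease = runPaymentFlow();
killSwitches.set("new_payment_flow", false); // incident: flip the flag back
const afterFlip = runPaymentFlow();
```

The point of the design is that the legacy path is the default: if the flag is missing or off, users land on the code that already worked.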
2. Gradual Rollouts
new_search_algorithm:
  rollout_percentage: 1  # Start at 1% of users
  rules:
    - if: "user.tier == 'internal'"
      enabled: true  # Internal users always see it
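Release to 1% of users, watch your metrics, then step to 5%, 25%, 50%, and 100%, watching at each stage. A full rollout takes 2-4 hours, but every step exposes only a slice of users instead of betting everything on a single risky release.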
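One common way to implement a percentage rollout is to hash the flag key and user ID into a stable bucket, so a user who saw the feature at 5% still sees it at 25%. The sketch below assumes that approach; `fnv1a32`, `rolloutBucket`, `isInRollout`, and `RolloutUser` are hypothetical names, and real flag services may bucket differently.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units: small, deterministic, no dependencies.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash >>> 0;
}

// Map (flag, user) to a stable bucket in the range 0..99.
function rolloutBucket(flagKey: string, userId: string): number {
  return fnv1a32(`${flagKey}:${userId}`) % 100;
}

interface RolloutUser {
  id: string;
  tier: string;
}

// Internal users always see the feature; everyone else is admitted by percentage.
function isInRollout(flagKey: string, user: RolloutUser, percentage: number): boolean {
  if (user.tier === "internal") {
    return true;
  }
  return rolloutBucket(flagKey, user.id) < percentage;
}
```

Including the flag key in the hash input means each flag gets its own ordering of users, so the same 1% of users are not the guinea pigs for every rollout.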
3. Circuit Breakers
external_recommendations_service:
  enabled: true
  automatic_disable_if:
    error_rate_above: 5%
    for_minutes: 5
If a downstream service starts failing, the flag auto-disables that feature. Your product degrades gracefully instead of crashing.
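A rough sketch of the auto-disable logic, under stated assumptions: the error rate is measured over a sliding window (one reading of `for_minutes`), the flag stays off until someone resets it, and a minimum sample count guards against tripping on a handful of calls. `AutoDisableFlag`, `CallSample`, and `minSamples` are hypothetical and not part of any vendor's config.

```typescript
interface CallSample {
  at: number; // timestamp in milliseconds
  failed: boolean;
}

class AutoDisableFlag {
  private samples: CallSample[] = [];
  private enabled = true;
  private readonly maxErrorRate: number;
  private readonly windowMs: number;
  private readonly minSamples: number;

  constructor(maxErrorRate: number, windowMs: number, minSamples: number = 20) {
    this.maxErrorRate = maxErrorRate; // 0.05 means 5%
    this.windowMs = windowMs; // 5 * 60 * 1000 means 5 minutes
    this.minSamples = minSamples; // hypothetical guard against tiny samples
  }

  // Record one downstream call and trip the breaker if the window looks unhealthy.
  record(failed: boolean, now: number): void {
    this.samples.push({ at: now, failed });
    this.samples = this.samples.filter((s) => now - s.at <= this.windowMs);
    const failures = this.samples.filter((s) => s.failed).length;
    const errorRate = failures / this.samples.length;
    if (this.samples.length >= this.minSamples && errorRate > this.maxErrorRate) {
      this.enabled = false; // stays off until reset() is called
    }
  }

  isEnabled(): boolean {
    return this.enabled;
  }

  reset(): void {
    this.samples = [];
    this.enabled = true;
  }
}
```

Callers check `isEnabled()` before hitting the downstream service and render a fallback (for example, no recommendations) when it returns false.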
4. Load Shedding
expensive_realtime_dashboard:
  enabled_when:
    cpu_utilization_below: 70%
    active_users_below: 50000
Under load, disable non-critical features to preserve the critical path.
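The rule above can be read as a predicate over current load. This sketch assumes both thresholds must hold at once (the config does not say whether they combine with AND or OR) and that CPU utilization is expressed as a fraction; `LoadSnapshot`, `LoadShedRule`, and `allowedUnderLoad` are hypothetical names.

```typescript
interface LoadSnapshot {
  cpuUtilization: number; // fraction between 0 and 1
  activeUsers: number;
}

interface LoadShedRule {
  cpuUtilizationBelow: number;
  activeUsersBelow: number;
}

// The feature stays on only while every threshold is satisfied.
function allowedUnderLoad(rule: LoadShedRule, snapshot: LoadSnapshot): boolean {
  return (
    snapshot.cpuUtilization < rule.cpuUtilizationBelow &&
    snapshot.activeUsers < rule.activeUsersBelow
  );
}

const dashboardRule: LoadShedRule = { cpuUtilizationBelow: 0.7, activeUsersBelow: 50000 };
```

In practice you would probably want some hysteresis (for example, re-enable only once load drops well below the threshold) so the feature does not flap on and off near the boundary.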
The Anti-Pattern: Permanent Flags
After a feature is 100% rolled out, the flag should be deleted within 2 weeks. Every flag left in the codebase is technical debt.
Flag hygiene rules:
- Every flag has an expiration date (90 days max)
- Every flag has an owner in CODEOWNERS
- CI fails if a flag is older than 180 days, a hard backstop for flags that slip past their expiration date
- Monthly flag cleanup is part of standard operations
We track "flag count" as a reliability metric. If it grows unbounded, we're doing it wrong.
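The hygiene rules lend themselves to an automated check. Below is a minimal sketch of one, assuming flags are described by a small metadata record; `FlagRecord` and `findHygieneViolations` are hypothetical, and the shape of your flag metadata will differ.

```typescript
interface FlagRecord {
  key: string;
  owner: string; // team or person, empty if unassigned
  createdAt: string; // ISO date, e.g. "2024-01-01"
  expiresAt: string; // ISO date
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Return one human-readable message per broken rule.
function findHygieneViolations(flagRecords: FlagRecord[], today: Date): string[] {
  const violations: string[] = [];
  for (const flag of flagRecords) {
    const created = Date.parse(flag.createdAt);
    const expires = Date.parse(flag.expiresAt);
    const lifetimeDays = (expires - created) / DAY_MS;
    const ageDays = (today.getTime() - created) / DAY_MS;
    if (flag.owner.trim() === "") {
      violations.push(`${flag.key}: no owner`);
    }
    if (lifetimeDays > 90) {
      violations.push(`${flag.key}: expiration is more than 90 days after creation`);
    }
    if (ageDays > 180) {
      violations.push(`${flag.key}: older than 180 days`);
    }
  }
  return violations;
}
```

A CI job would load the flag metadata, call this function, print the messages, and exit non-zero when the list is not empty.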
The Architecture
A solid feature flag system has three parts:
1. Definition store
- Source of truth for all flags
- Versioned in Git or a managed service (LaunchDarkly, Unleash, GrowthBook)
- Audit log for every change
2. Client SDK
- In-app flag evaluation
- Falls back to defaults if the service is unreachable
- Caches decisions for a short TTL such as 60 seconds (the TTL also bounds how long a flag flip takes to reach every instance)
- Emits telemetry for flag usage
3. Admin interface
- Change flags without deploying code
- See current state across environments
- Role-based access (not everyone can flip prod flags)
- Approval workflow for high-risk flags
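The client SDK responsibilities (evaluate in-app, cache, fall back when the service is unreachable) can be sketched as follows. This is an illustration under assumptions, not any vendor's SDK: `CachingFlagClient` and `FlagFetcher` are hypothetical, the fetcher is synchronous for brevity, and preferring the last known value over the coded default during an outage is one possible design choice.

```typescript
// Synchronous for brevity; a real fetcher would be async and may throw on network errors.
type FlagFetcher = (flagKey: string) => boolean;

interface CachedDecision {
  value: boolean;
  fetchedAt: number; // milliseconds
}

class CachingFlagClient {
  private cache = new Map<string, CachedDecision>();
  private readonly fetcher: FlagFetcher;
  private readonly defaults: Record<string, boolean>;
  private readonly ttlMs: number;

  constructor(fetcher: FlagFetcher, defaults: Record<string, boolean>, ttlMs: number = 60000) {
    this.fetcher = fetcher;
    this.defaults = defaults;
    this.ttlMs = ttlMs;
  }

  isEnabled(flagKey: string, now: number = Date.now()): boolean {
    const cached = this.cache.get(flagKey);
    if (cached !== undefined && now - cached.fetchedAt < this.ttlMs) {
      return cached.value; // fresh cache hit, no call to the flag service
    }
    try {
      const value = this.fetcher(flagKey);
      this.cache.set(flagKey, { value, fetchedAt: now });
      return value;
    } catch {
      // Service unreachable: use the last known value, else the coded default, else off.
      if (cached !== undefined) {
        return cached.value;
      }
      return this.defaults[flagKey] ?? false;
    }
  }
}
```

Passing `now` explicitly keeps the cache logic easy to test; production callers would omit it and let it default to the current time.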
Evaluating at the Right Layer
Flags can live at multiple layers:
- CDN edge: use for marketing experiments
- Load balancer: use for blue/green deploys
- App server: use for feature experiments
- Database: use for schema migrations
The deeper the layer, the slower the rollout. CDN flags flip in seconds. Database flags take minutes to propagate.
The Reliability Metric
Track: mean time to mitigate (MTTM).
If your team can mitigate an incident in under 30 seconds via a feature flag flip, that's a win. If you have to redeploy to mitigate, your reliability is bottlenecked by deploy time.
Good teams: MTTM under 60 seconds
Great teams: MTTM under 15 seconds
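For reference, MTTM is just an average over incidents. The sketch below assumes the clock starts at detection and stops at mitigation (the article does not pin down the start point); `IncidentRecord` and `meanTimeToMitigateSeconds` are hypothetical names.

```typescript
interface IncidentRecord {
  detectedAt: number; // milliseconds since epoch
  mitigatedAt: number; // milliseconds since epoch
}

// Mean time to mitigate, in seconds. Returns NaN when there are no incidents.
function meanTimeToMitigateSeconds(incidents: IncidentRecord[]): number {
  if (incidents.length === 0) {
    return NaN;
  }
  const totalMs = incidents.reduce(
    (sum, incident) => sum + (incident.mitigatedAt - incident.detectedAt),
    0,
  );
  return totalMs / incidents.length / 1000;
}
```

Two incidents mitigated in 10 and 20 seconds give an MTTM of 15 seconds.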
Common Gotchas
- Stale flags skew A/B results: clean them up after experiments
- Flags without defaults cause prod outages: every flag must have a safe fallback
- Flag flips mid-request cause weird bugs: evaluate at request start, cache for the request lifetime
- Nested flags (flags inside flags) are impossible to reason about: avoid them
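The mid-request gotcha is usually handled by taking a snapshot of the flags a request needs when it starts. A minimal sketch, assuming a lookup function into whatever live store you use; `snapshotFlags` and `FlagLookup` are hypothetical names.

```typescript
type FlagLookup = (flagKey: string) => boolean;

// Evaluate the listed flags once and hand the request a read-only view.
function snapshotFlags(flagKeys: string[], lookup: FlagLookup): ReadonlyMap<string, boolean> {
  const snapshot = new Map<string, boolean>();
  for (const flagKey of flagKeys) {
    snapshot.set(flagKey, lookup(flagKey));
  }
  return snapshot;
}

// The live store can change at any time, for example when someone flips a flag.
const liveFlags = new Map<string, boolean>([["new_checkout", true]]);

// At request start: capture the values this request will use throughout.
const requestFlags = snapshotFlags(["new_checkout"], (flagKey) => liveFlags.get(flagKey) ?? false);

// A flip that lands mid-request changes the live store but not the snapshot.
liveFlags.set("new_checkout", false);
const seenByRequest = requestFlags.get("new_checkout") ?? false;
```

The request finishes on the code path it started on, and the next request picks up the flipped value.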
A Reliability-First Flag Strategy
Start simple:
- Every new feature ships behind a kill switch
- Gradual rollouts for anything touching the critical path
- Circuit breakers for external dependencies
- Flag cleanup is a monthly ritual
- Track MTTM and optimize it
Feature flags are the most underrated reliability tool in modern engineering. Treat them that way.
Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com