Flaggy

Posted on May 19 • Originally published at flaggy.io

Feature flag best practices for engineering teams


The patterns that make feature flags useful long-term, and the habits that keep them from becoming a maintenance burden. Drawn from common failure modes.

Feature flags are simple in concept. The failure modes are also simple, and predictable. Most teams hit the same three problems: flags that never get cleaned up, targeting rules nobody understands, and flag state that’s invisible to the people who need it.

Here are the practices that prevent each of those.

Name flags by intent, not implementation

A flag named “New dashboard v2” tells you what it wraps but not why it exists or when it ends. A flag named “Release dashboard redesign” communicates both that it’s temporary and what it controls.

A useful naming convention uses a prefix word to encode the flag’s lifecycle. In Flaggy, names are entered as plain text and the key is generated automatically — “Release dashboard redesign” becomes the key release-dashboard-redesign in your code.
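Here is a minimal sketch of what that name-to-key conversion involves. `toFlagKey` is a hypothetical helper, and its rules are assumptions rather than Flaggy's documented behavior. It lowercases the name, strips accents, and collapses every run of non-alphanumeric characters into a single hyphen.

```typescript
// Hypothetical helper: approximates turning a plain-text flag name into a key.
// Flaggy's actual conversion rules may differ.
function toFlagKey(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .normalize("NFKD")               // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/[^a-z0-9]+/g, "-")     // each run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "");        // no leading or trailing hyphens
}

console.log(toFlagKey("Release dashboard redesign"));        // release-dashboard-redesign
console.log(toFlagKey("Experiment checkout button color"));  // experiment-checkout-button-color
console.log(toFlagKey("Kill switch: payments integration")); // kill-switch-payments-integration
```

A single deterministic rule keeps keys predictable, so the prefix word in the name also ends up as the prefix of the key in code.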

Useful prefixes:

Release — enables a new feature during a gradual release. Intended to be removed once fully shipped. e.g. “Release dashboard redesign”
Experiment — an A/B test. Has a defined end date. e.g. “Experiment checkout button color”
Kill switch — a permanent circuit breaker. Not expected to be removed. e.g. “Kill switch payments integration”
Ops — a configuration flag for operational tuning. May be permanent. e.g. “Ops cache TTL override”

The prefix communicates lifespan at a glance. When you’re reviewing a list of 50 flags, you can immediately separate the ones that should be cleaned up from the ones that are load-bearing.
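That triage can be automated. The sketch below assumes the generated keys keep the prefix (for example `kill-switch-payments-integration`). The prefix table and the `lifecycleOf` and `groupByLifecycle` helpers are hypothetical.

```typescript
// Lifecycle buckets derived from the naming convention.
type Lifecycle = "temporary" | "long-lived" | "unknown";

// Hypothetical prefix table, assuming keys are generated from prefixed names.
const PREFIX_LIFECYCLE: ReadonlyArray<readonly [string, Lifecycle]> = [
  ["release-", "temporary"],      // remove once fully shipped
  ["experiment-", "temporary"],   // remove after the test's end date
  ["kill-switch-", "long-lived"], // permanent circuit breaker
  ["ops-", "long-lived"],         // operational tuning, may be permanent
];

function lifecycleOf(key: string): Lifecycle {
  for (const [prefix, lifecycle] of PREFIX_LIFECYCLE) {
    if (key.startsWith(prefix)) return lifecycle;
  }
  return "unknown"; // no recognized prefix: needs a human to decide
}

function groupByLifecycle(keys: string[]): Record<Lifecycle, string[]> {
  const groups: Record<Lifecycle, string[]> = { temporary: [], "long-lived": [], unknown: [] };
  for (const key of keys) groups[lifecycleOf(key)].push(key);
  return groups;
}

const groups = groupByLifecycle([
  "release-dashboard-redesign",
  "kill-switch-payments-integration",
  "experiment-checkout-button-color",
  "ops-cache-ttl-override",
  "new-dashboard-v2",
]);
console.log(groups.temporary.join(", ")); // release-dashboard-redesign, experiment-checkout-button-color
console.log(groups.unknown.join(", "));   // new-dashboard-v2
```

The `unknown` bucket matters as much as the other two. A flag like `new-dashboard-v2` gives no hint about whether it can be deleted.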

Create the cleanup ticket at flag creation

The most reliable way to remove a flag is to plan for it before you merge. When you create a flag in the dashboard, immediately create a ticket in your issue tracker: “Remove flag release-feature-name after full rollout.”

Without this step, the flag is created with every intention of cleaning it up, but the rollout ends, the team moves on, and the ticket never gets written. Flags accumulate. After a year you have 150 flags and nobody knows which ones are active.

The sequence is: create flag → create cleanup ticket → merge code → run rollout → remove flag → close ticket.
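You can check this sequence mechanically if you can export each flag's key and creation date, and if you record the cleanup ticket ID next to the flag. Both of those are assumptions about your setup. The `FlagRecord` shape, the 90-day threshold, and `findCleanupGaps` are hypothetical.

```typescript
// Hypothetical record shape joined from your flag tool and issue tracker.
interface FlagRecord {
  key: string;
  createdAt: Date;
  cleanupTicket?: string; // issue ID; undefined if the ticket step was skipped
  removed: boolean;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Only temporary flags (by the prefix convention) are expected to go away.
const isTemporary = (key: string) =>
  key.startsWith("release-") || key.startsWith("experiment-");

// Live temporary flags that never got a ticket, and those older than maxAgeDays.
function findCleanupGaps(flags: FlagRecord[], now: Date, maxAgeDays: number) {
  const live = flags.filter((f) => !f.removed && isTemporary(f.key));
  return {
    missingTicket: live.filter((f) => !f.cleanupTicket).map((f) => f.key),
    overdue: live
      .filter((f) => (now.getTime() - f.createdAt.getTime()) / DAY_MS > maxAgeDays)
      .map((f) => f.key),
  };
}

// Example run with made-up data.
const report = findCleanupGaps(
  [
    { key: "release-dashboard-redesign", createdAt: new Date("2025-01-10T00:00:00Z"), cleanupTicket: "ENG-101", removed: false },
    { key: "experiment-checkout-button-color", createdAt: new Date("2025-05-20T00:00:00Z"), removed: false },
    { key: "kill-switch-payments-integration", createdAt: new Date("2024-03-01T00:00:00Z"), removed: false },
  ],
  new Date("2025-06-01T00:00:00Z"),
  90,
);
console.log(report.missingTicket.join(", ")); // experiment-checkout-button-color
console.log(report.overdue.join(", "));       // release-dashboard-redesign
```

The kill switch is excluded on purpose. It is old, but it is not supposed to be removed.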

Set flag context consistently

The user context you pass to the SDK determines what targeting rules you can write. If different parts of the app pass different context attributes, you end up with targeting rules that silently don’t apply everywhere.

Define a canonical context object and build it in one place:

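import { differenceInDays } from 'date-fns';

// Minimal shapes; adapt to your own user model and your SDK's context type.
interface User {
  id: string; email: string; subscriptionPlan: string;
  teamId: string; countryCode: string; createdAt: Date;
}
type FlagContext = { key: string } & Record<string, string | number>;

// Build the flag context in exactly one place.
function buildFlagContext(user: User): FlagContext {
  return {
    key: user.id, // stable per-user identifier
    email: user.email,
    plan: user.subscriptionPlan,
    teamId: user.teamId,
    accountAgeDays: differenceInDays(new Date(), user.createdAt),
    country: user.countryCode,
  };
}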

Use this everywhere. When you add a new attribute, add it here and it becomes available for targeting across the entire app.
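A toy evaluator shows the silent failure described at the start of this section. The rule shape and the `matches` function below are hypothetical and do not reflect any particular SDK's API. In this sketch, a rule on an attribute the caller did not send evaluates to false and raises no error.

```typescript
// Hypothetical shapes for illustration; not any real SDK's rule format.
type EvalContext = Record<string, string | number | undefined>;

interface Rule {
  attribute: string;
  op: "eq" | "gt";
  value: string | number;
}

// A rule on a missing attribute fails to match. Nothing throws,
// so the gap goes unnoticed.
function matches(rule: Rule, ctx: EvalContext): boolean {
  const actual = ctx[rule.attribute];
  if (actual === undefined) return false;
  if (rule.op === "eq") return actual === rule.value;
  // "gt" only makes sense for numbers
  return typeof actual === "number" && typeof rule.value === "number" && actual > rule.value;
}

const teamPlan: Rule = { attribute: "plan", op: "eq", value: "team" };

// Same user, with context built ad hoc in two different places.
const fromCheckout: EvalContext = { key: "user-42", plan: "team" };
const fromSettings: EvalContext = { key: "user-42" }; // forgot `plan`

console.log(matches(teamPlan, fromCheckout)); // true
console.log(matches(teamPlan, fromSettings)); // false
```

A single `buildFlagContext` removes the second kind of call site entirely.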

Keep flag logic at the boundary, not deep in components

Flag checks embedded deep in component trees or utility functions are hard to find and easy to forget. Prefer pushing flag evaluation to the boundary of a feature — the entry point that decides which version to render.

Harder to clean up:

// buried in a utility function
function formatPrice(amount: number) {
  if (flags.isEnabled('new-pricing-display')) {
    return newFormat(amount);
  }
  return legacyFormat(amount);
}

Easier to clean up:

// at the component boundary
function PricingSection({ user }: { user: User }) {
  // evaluate once, with the canonical context, then branch
  const showNewDisplay = flags.isEnabled('new-pricing-display', buildFlagContext(user));
  return showNewDisplay ? <NewPricingDisplay /> : <LegacyPricingDisplay />;
}

The second version makes the flag visible at a high level. When you remove it, you delete the condition and the LegacyPricingDisplay import — the diff is obvious.

Use segments for reusable targeting logic

If you find yourself writing the same targeting rule on multiple flags, such as “users where plan = team AND accountAgeDays > 30”, define it as a segment once and target the segment.

Benefits:

Rules are consistent. If your definition of “beta users” changes, you update it in one place.
Auditing is easier. You can see which flags target a segment.
Onboarding new team members is simpler — they see named segments, not repeated rule logic.
Segments with clear names also make flag state legible to non-engineers. “This flag is enabled for the Beta Users segment” is understandable to a PM in a way that a list of attribute conditions isn’t.
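The sketch below shows the "define once, target by name" idea. All shapes and names are hypothetical. A segment is a named predicate over the context, and each flag lists the segments it targets instead of repeating the rule.

```typescript
// Hypothetical context shape for this sketch.
type Ctx = { key: string; plan?: string; accountAgeDays?: number };

const segments: Record<string, (ctx: Ctx) => boolean> = {
  // The one place "beta users" is defined; edit here and every flag follows.
  "Beta Users": (ctx) => ctx.plan === "team" && (ctx.accountAgeDays ?? 0) > 30,
};

const flagTargets: Record<string, string[]> = {
  "release-dashboard-redesign": ["Beta Users"],
  "experiment-checkout-button-color": ["Beta Users"],
};

// A flag is on for a context if any of its segments matches.
function isTargeted(flagKey: string, ctx: Ctx): boolean {
  return (flagTargets[flagKey] ?? []).some((segmentName) => {
    const segment = segments[segmentName];
    return segment !== undefined && segment(ctx);
  });
}

// Auditing: which flags target a given segment?
function flagsTargeting(segmentName: string): string[] {
  return Object.entries(flagTargets)
    .filter(([, names]) => names.includes(segmentName))
    .map(([key]) => key);
}

console.log(isTargeted("release-dashboard-redesign", { key: "u1", plan: "team", accountAgeDays: 45 })); // true
console.log(isTargeted("release-dashboard-redesign", { key: "u2", plan: "team", accountAgeDays: 10 })); // false
console.log(flagsTargeting("Beta Users").join(", ")); // release-dashboard-redesign, experiment-checkout-button-color
```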

Make flag state visible to the right people

The support engineer who gets a ticket about a broken feature needs to check flag state. The PM managing a rollout wants to see current coverage. The on-call engineer debugging an incident needs to know which flags are currently active.

This is the hidden argument for unlimited seats: restricting dashboard access to the engineers who created the flags means everyone else is guessing. “Is this behind a flag?” becomes a Slack thread instead of a 30-second dashboard check.

Every person who touches your product in some capacity — PM, design, support, on-call — should be able to read flag state without asking an engineer.

Test the off state

When you add a flag, write a test for both branches — not just the new behavior you’re building.

describe('checkout flow', () => {
  it('shows new flow when flag is enabled', () => {
    mockFlag('new-checkout-flow', true);
    // ...
  });

  it('shows legacy flow when flag is disabled', () => {
    mockFlag('new-checkout-flow', false);
    // ...
  });
});

The off state breaks silently. The old code path doesn’t get touched during development, gets skipped in manual testing, and only fails in production when a flag gets rolled back. Test both.
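The `mockFlag` helper above depends on your test setup. One way to get the same effect without module mocking is to pass the flag client into the code under test and use an in-memory fake in tests. `FlagReader`, `FakeFlags`, and `checkoutFlow` are hypothetical names. Defaulting unset flags to off is a choice made in this sketch.

```typescript
// Hypothetical minimal interface; your SDK's client will differ.
interface FlagReader {
  isEnabled(key: string): boolean;
}

// In-memory stand-in for tests. Unset flags read as off,
// i.e. the legacy path that a rollback returns to.
class FakeFlags implements FlagReader {
  private values = new Map<string, boolean>();

  set(key: string, value: boolean): this {
    this.values.set(key, value);
    return this;
  }

  isEnabled(key: string): boolean {
    return this.values.get(key) ?? false;
  }
}

// Code under test receives the client, so each test can pin either state.
function checkoutFlow(flags: FlagReader): "new" | "legacy" {
  return flags.isEnabled("new-checkout-flow") ? "new" : "legacy";
}

// Exercise both branches with the same loop.
for (const [enabled, expected] of [[true, "new"], [false, "legacy"]] as const) {
  const actual = checkoutFlow(new FakeFlags().set("new-checkout-flow", enabled));
  console.log(`enabled=${enabled}: ${actual === expected ? "ok" : `expected ${expected}, got ${actual}`}`);
}
```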

Define a rollout pace

“We’ll do a gradual rollout” is not a plan. Before you flip the flag, decide:

What percentage you’ll start at (common: 1–5%)
What metric signals it’s safe to expand (error rate, p99 latency, specific event count)
How long you’ll hold at each stage before expanding
What triggers a rollback
Write this down somewhere. It doesn’t need to be formal — a comment in the cleanup ticket is enough. The goal is that whoever is on-call at 2am knows the answer to “is this flag supposed to be at 10%?” without having to find the person who created it.

Audit logs are not optional

You will have a production incident caused by a flag change. Someone will toggle something at the wrong time, or a targeting rule will match more users than expected, or the SDK will behave differently on an edge case.

When that happens, the first question is: what changed, when, and who changed it? An audit log that captures every flag toggle, rule edit, and segment update is how you answer that in minutes instead of hours.
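Given such a log, answering "what changed, when, and who" comes down to filtering by time window. The sketch assumes a hypothetical `AuditEntry` shape (not Flaggy's actual log format) and uses made-up entries.

```typescript
// Hypothetical entry shape; a real tool's audit log format will differ.
interface AuditEntry {
  at: Date;
  actor: string;
  target: string; // flag key or segment name
  kind: "toggle" | "rule-edit" | "segment-update";
  summary: string;
}

// Everything that changed in [from, to], newest first.
function changesBetween(log: AuditEntry[], from: Date, to: Date): AuditEntry[] {
  return log
    .filter((e) => e.at.getTime() >= from.getTime() && e.at.getTime() <= to.getTime())
    .sort((a, b) => b.at.getTime() - a.at.getTime());
}

const auditLog: AuditEntry[] = [
  { at: new Date("2025-05-19T12:10:00Z"), actor: "dana", target: "ops-cache-ttl-override", kind: "toggle", summary: "on" },
  { at: new Date("2025-05-19T13:40:00Z"), actor: "sam", target: "Beta Users", kind: "segment-update", summary: "accountAgeDays > 30 -> > 7" },
  { at: new Date("2025-05-19T13:55:00Z"), actor: "lee", target: "release-dashboard-redesign", kind: "rule-edit", summary: "5% -> 25%" },
];

// Incident noticed at 14:05 UTC: look back one hour.
const incidentAt = new Date("2025-05-19T14:05:00Z");
const recent = changesBetween(auditLog, new Date(incidentAt.getTime() - 60 * 60 * 1000), incidentAt);

// Prints the 13:55 and 13:40 changes; the 12:10 toggle is outside the window.
for (const e of recent) {
  console.log(`${e.at.toISOString()} ${e.actor} ${e.kind} ${e.target}: ${e.summary}`);
}
```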

This is not a nice-to-have — it’s incident response infrastructure. Flaggy includes audit logging on all plans, including the Free tier (7-day retention). The Team plan extends this to 90 days — worth considering if your incident review windows run longer.

If you're looking for a feature flag tool built around these practices — with automatic key generation from plain-text names, flag lifecycle visibility, and a clean dashboard — check out Flaggy.
