DEV Community: Flaggy

Feature flag driven development

Flaggy — Mon, 08 Jun 2026 04:39:00 +0000

A workflow where every change ships behind a flag. How feature flag driven development works, why it pairs with trunk-based development, and the habits that keep it from becoming flag debt.

Feature flag driven development is a workflow where new code ships to production behind a flag by default, and the decision to expose it to users is made afterward, from a dashboard, rather than at deploy time. The unit of work isn’t “merge and it’s live” — it’s “merge it off, then turn it on when you’re ready.” Done consistently, it changes how a team thinks about risk, releases, and the difference between shipping code and releasing a feature.

This builds directly on the basics. If you’re not yet sure what a flag is or how one evaluates, start with what are feature flags and come back — this article is about the development practice built on top of that primitive.

What “flag driven” actually changes

In a conventional workflow, a feature goes live the moment it is merged to main and deployed. That couples two things that don’t need to be coupled: the act of putting code in production and the act of showing it to users.

Feature flag driven development decouples them. Every non-trivial change lands wrapped in a flag, defaulted off:

import { flaggy } from '@flaggy.io/sdk-js';

const client = flaggy({ apiKey: process.env.FLAGGY_API_KEY });
await client.initialize();

if (client.isEnabled('release-new-search', { key: user.id })) {
  return renderNewSearch();
}
return renderLegacySearch();

The code is deployed. It’s running in production. But until someone flips release-new-search on in the dashboard, no user sees it. Release becomes a deliberate, reversible action — a dashboard toggle, picked up by clients on their next background refresh (about a minute) — instead of a deploy.

That one shift cascades into everything else this article covers.

It only works with trunk-based development

Flag driven development and trunk-based development are two halves of the same practice. You can’t really do one without the other.

The problem flags solve is “how do I keep merging to main without releasing half-finished work?” The answer is: merge it behind an off flag. That answer only matters if you’re actually merging to main frequently — which is the definition of trunk-based development. Long-lived feature branches don’t need flags to hide unfinished work; the branch already hides it. The cost is the merge hell you get when that branch is finally integrated.

So the loop is:

Work in small increments on short-lived branches (or directly on main).
Each increment merges behind a flag that’s off in production.
The pipeline deploys every merge continuously.
The feature comes together in main, invisible, until it’s complete and you turn the flag on.

You get the integration benefits of trunk-based development (no divergent branches, no big-bang merges) without exposing work in progress. The flag is what makes “commit incomplete work to main” safe.

The development loop, step by step

Here’s what a single feature looks like under this model.

Create the flag first. Before you write the gated code, create the flag in the dashboard. In Flaggy you enter a plain-text name like “Release new search” and the key release-new-search is generated for you. Naming it by intent — Release, Experiment, Kill switch, Ops — encodes its lifespan up front, a habit covered in feature flag best practices.
Create the cleanup ticket immediately. “Remove release-new-search after full release.” The single most reliable way to avoid flag debt is to write the removal ticket at creation, before the rollout makes everyone forget. The sequence is: create flag → create cleanup ticket → merge code → roll out → remove flag → close ticket.
Wrap the feature at its boundary. Put the flag check at the entry point of the feature — the component or route that decides which version renders — not buried deep in a utility function. A boundary check is easy to find and easy to delete later. A check three layers down in a shared helper is how flags become permanent by accident.
Build both branches, test both branches. Write a test for the on state and the off state. The off path is the one that breaks silently: nobody touches it during development, manual testing skips it, and it only fails when you roll the flag back in production.

describe('search', () => {
  it('renders new search when flag is on', () => {
    mockFlag('release-new-search', true);
    // ...
  });
  it('renders legacy search when flag is off', () => {
    mockFlag('release-new-search', false);
    // ...
  });
});

Merge and deploy off. The code goes to production with the flag off. Nothing changes for users. You can do this dozens of times across a multi-week feature.
Release deliberately. When the feature is complete, turn it on — for internal staff first, then a percentage, then everyone — watching your metrics at each stage. This is a percentage rollout, and it gives you canary-style gradual exposure without touching a load balancer.
Remove the flag. Once it’s fully shipped and stable, delete the flag and the legacy branch. Close the cleanup ticket. If you skip this step often enough, you don’t have flag driven development — you have a codebase full of dead conditionals.

What you get for it

Deploy and release become separate decisions. The pipeline runs constantly; releases happen on a product schedule, by people who aren’t necessarily the ones who deployed. A PM can own the “turn it on” moment without filing a deploy request.

Rollback is a toggle, not a redeploy. A bad release flips off and clients pick up the change on their next refresh — roughly a minute — without rebuilding and redeploying the previous version. For changes you expect might need disabling — a third-party integration, a risky migration path — a permanent kill switch is the same pattern with no intent to remove it.

Testing in production becomes safe. You can enable a feature for your own account, or one friendly customer, and exercise it against real production data and load before anyone else sees it. The flag’s targeting rules decide exactly who’s included.

Incomplete work stops blocking releases. Because everything unfinished is off, you can cut a release at any time. There’s no “wait, don’t deploy, my half-built feature is on main” — it’s on main, but it’s off.

The failure mode: flag debt

The honest downside of putting everything behind a flag is that you generate flags faster than in any other workflow, and if you don’t remove them, you drown. After a year of undisciplined flag driven development you have 200 flags, half of them permanently on, and nobody is sure which ones are load-bearing.

The discipline that prevents this is entirely about lifecycle, and it’s the subject of feature flag management. The short version:

Name by intent and lifespan, so Release flags are visibly different from Ops flags at a glance.
Create the cleanup ticket at creation, every time.
Treat a stale release flag as a bug, not a backlog item. A release flag that’s been at 100% for a month is dead code waiting to be deleted.
Use an audit log so every toggle and rule change is attributable when an incident review asks “what changed?”

Flag driven development without removal discipline isn’t a different practice; it’s the same practice with the bill unpaid.

A note on what flags can and can’t do

Two things worth being precise about, because they shape how you design the workflow:

Flags are booleans. A flag is on or off. There’s no “variant A / variant B / variant C” multivariate value. For an A/B test you use a boolean plus a percentage rollout — half the users get true, half get false, and you compare a metric across the two. That covers experiment-driven development cleanly; it just means you model variants as separate flags rather than one flag with many values.

Releases aren’t instant. SDKs evaluate flags locally against a ruleset they refresh by background polling — about every minute by default. So a dashboard toggle reaches users on their next refresh, not the same second you click it. The dashboard action needs no deploy and no code change, which is the real win; just don’t design a workflow that assumes a flag change is visible everywhere within seconds. Evaluation itself is an in-memory lookup, so flags add no latency to your app — see how feature flags work for the mechanics.
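The polling model described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Flaggy SDK’s internals: `PollingFlagStore`, `fetchRuleset`, and the 60-second default are hypothetical names and values chosen to mirror the “about a minute” refresh.

```typescript
// Hypothetical sketch of local evaluation with background polling.
// These names are illustrative and are not the Flaggy SDK's API.
type Ruleset = Record<string, boolean>;

class PollingFlagStore {
  private rules: Ruleset = {};
  private timer: ReturnType<typeof setInterval> | undefined;
  private readonly fetchRuleset: () => Promise<Ruleset>;
  private readonly intervalMs: number;

  constructor(fetchRuleset: () => Promise<Ruleset>, intervalMs = 60_000) {
    this.fetchRuleset = fetchRuleset;
    this.intervalMs = intervalMs; // assumed default: about a minute
  }

  // Load the ruleset once, then keep refreshing it in the background.
  async start(): Promise<void> {
    await this.refresh();
    this.timer = setInterval(() => {
      // If a refresh fails, keep serving the last known ruleset.
      this.refresh().catch(() => undefined);
    }, this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }

  // Replace the in-memory ruleset with the latest copy from the server.
  async refresh(): Promise<void> {
    this.rules = await this.fetchRuleset();
  }

  // Evaluation is a plain in-memory lookup; unknown flags default to off.
  isEnabled(flagKey: string): boolean {
    return this.rules[flagKey] ?? false;
  }
}
```

The point of the sketch is the gap between `refresh()` calls: a dashboard toggle changes what the server returns, but `isEnabled` keeps answering from the old ruleset until the next refresh lands.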

Getting started without boiling the ocean

You don’t adopt flag driven development by flagging your entire backlog on day one. Start with the next risky thing you ship:

Pick a feature you’d normally be nervous to deploy.
Put it behind a release flag, off.
Merge and deploy it dark. Confirm nothing changed for users.
Turn it on for yourself, then 5%, then everyone, watching error rates.
When it’s stable, delete the flag.

Do that three or four times and the workflow stops feeling like overhead and starts feeling like the default safe way to ship. From there it generalizes: kill switches around integrations, ops flags for tuning, experiments for product questions. Framework-specific setup is covered in our guides for JavaScript, React, and Angular.

Flaggy is built for this workflow — local-evaluation SDKs that add no latency, targeting and segments, percentage rollouts, analytics, and a full audit log on a flat $99/month Team plan with unlimited seats, so everyone who owns a release can actually flip the flag.

Feature flag driven development FAQ

Is feature flag driven development the same as trunk-based development? They’re complementary, not identical. Trunk-based development is about merging small changes to main frequently; flag driven development is what makes that safe by hiding unfinished work behind off flags. In practice you do both together.
Doesn’t flagging everything create a mess? Only if you don’t remove flags. The practice depends on a removal discipline — cleanup tickets at creation, treating stale release flags as bugs. See feature flag management.
Can I A/B test with this approach? Yes, with a boolean flag and a percentage rollout: half your users get the feature, half don’t, and you compare a metric. There are no multivariate flags, so each variant you test is modeled as its own flag.
How fast does turning a flag on take effect? Clients pick up a change on their next background refresh — about a minute by default — because SDKs poll for the current ruleset rather than holding an open connection. The dashboard change itself is immediate and needs no deploy.
Do I need a vendor, or can I use a config file? For a handful of permanent flags, a config file works. The moment you want targeting, percentage rollouts, an audit trail, and non-engineers flipping flags safely, a dedicated feature flag tool earns its place.

Feature flag management: the complete playbook

Flaggy — Sun, 07 Jun 2026 09:48:00 +0000

How to manage feature flags at scale: naming conventions, the flag lifecycle, cleanup discipline, audit trails, access control, and avoiding technical debt.

Creating a feature flag takes thirty seconds. Managing hundreds of them over years is where teams actually struggle. The cost of feature flags isn’t evaluating them — it’s the ones nobody removes, the targeting rules nobody understands, and the flag state nobody can see. Feature flag management is the set of practices that keep a growing flag system from turning into a liability.

If you’re still deciding whether to adopt flags at all, start with what are feature flags. This guide assumes you have flags and now need to govern them.

The flag lifecycle

Every flag has a lifespan, and most management problems come from ignoring it. A healthy flag moves through clear stages:

1. Created: with an owner, a category, and (for temporary flags) a removal plan.
2. Rolling out: enabled for internal users, then a percentage, then everyone.
3. Fully rolled out: at 100% for all users. For a temporary flag, this is the signal to start removing it.
4. Removed: both the dashboard flag and the dead else branch in code are deleted.

Permanent flags (kill switches, ops toggles) skip step 4 and live indefinitely. The whole discipline of flag management is making sure temporary flags actually reach step 4 instead of lingering at step 3 forever.

Name flags by intent, not implementation

A flag named “New dashboard v2” tells you what it wraps but not why it exists or when it ends. A useful feature flag naming convention uses a prefix that encodes the flag’s category — and therefore its expected lifespan:

release: gates a new feature during rollout. Temporary; remove after full release. e.g. release-dashboard-redesign
experiment: an A/B test with a defined end date. e.g. experiment-checkout-button
kill-switch: a permanent circuit breaker. Not expected to be removed. e.g. kill-switch-payments
ops: operational tuning, may be permanent. e.g. ops-cache-ttl

When you scan a list of 80 flags, the prefix instantly separates the temporary from the load-bearing. In Flaggy, you enter names as plain text (“Release dashboard redesign”) and the key release-dashboard-redesign is generated automatically for your code. We go deeper on naming in feature flag best practices.
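Because the prefix is machine-readable, tooling can use it too. The sketch below is an assumption-laden illustration: `toKey` only guesses at how a plain-text name might become a key (Flaggy’s actual generation rules are not documented here), and `categoryOf` and `isTemporary` are hypothetical helpers.

```typescript
type FlagCategory = 'release' | 'experiment' | 'kill-switch' | 'ops';

const CATEGORIES: FlagCategory[] = ['release', 'experiment', 'kill-switch', 'ops'];

// Assumed slug rule: lowercase, non-alphanumeric runs become single hyphens.
function toKey(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Returns the category encoded in the key's prefix, or undefined if there is none.
function categoryOf(key: string): FlagCategory | undefined {
  return CATEGORIES.find((prefix) => key.startsWith(prefix + '-'));
}

// Release and experiment flags are expected to be removed; the rest may be permanent.
function isTemporary(key: string): boolean {
  const category = categoryOf(key);
  return category === 'release' || category === 'experiment';
}
```

A key with no recognized prefix returns `undefined` from `categoryOf`, which is a useful signal in itself: that flag was created outside the convention and needs an owner to classify it.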

Create the cleanup ticket at creation time

The single most effective management habit: file the removal ticket before you merge the code that introduces the flag. “Remove flag release-feature-name after full release.” A release flag’s lifespan is “rollout plus one release cycle” — make that explicit while the context is fresh, because nobody remembers to clean up a flag six months later when they’ve moved on to other work.

Fight flag debt with scheduled reviews

Even with good habits, some flags slip through. Run a recurring review — quarterly works for most teams — that pulls every flag unchanged in the last 90 days:

Fully rolled out and temporary - Delete it, and remove the dead branch in code.
A forgotten experiment - The test is over. Delete it.
Permanent (kill switch / ops) - Confirm it still has an owner, and leave it.

Use your platform’s analytics to see which flags are still actually evaluated in production. A flag that hasn’t been read by any SDK in months is almost always safe to remove.
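The review itself is easy to script once you can export a flag list. A minimal sketch follows, assuming a hypothetical `FlagRecord` shape with a last-changed timestamp and a rollout percentage; your platform’s export will look different.

```typescript
// Hypothetical export shape; field names are assumptions.
interface FlagRecord {
  key: string;
  lastChangedAt: Date;
  rolloutPercent: number; // 0 to 100
}

type ReviewAction = 'delete' | 'confirm-owner' | 'investigate';

const DAY_MS = 24 * 60 * 60 * 1000;

// Flags unchanged for longer than maxAgeDays are candidates for review.
function staleFlags(flags: FlagRecord[], now: Date, maxAgeDays = 90): FlagRecord[] {
  const cutoff = now.getTime() - maxAgeDays * DAY_MS;
  return flags.filter((flag) => flag.lastChangedAt.getTime() < cutoff);
}

// Maps a stale flag to the action suggested by its category prefix.
function reviewAction(flag: FlagRecord): ReviewAction {
  if (flag.key.startsWith('kill-switch-') || flag.key.startsWith('ops-')) return 'confirm-owner';
  if (flag.key.startsWith('experiment-')) return 'delete';
  if (flag.key.startsWith('release-') && flag.rolloutPercent === 100) return 'delete';
  return 'investigate'; // e.g. a release flag stuck at a partial rollout
}
```

The `investigate` bucket matters: a release flag that has sat at 25% for three months is neither dead code nor a permanent flag, and someone needs to decide which way it goes.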

Keep an audit trail

When a flag flips in production and something breaks, the first question is “who changed what, and when?” Without a record you’re guessing. An audit log that captures every flag change — value, targeting rule, who made it, timestamp — turns an incident postmortem from speculation into a lookup. Once more than one person can touch flags, this stops being optional.
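As a rough illustration of what “a lookup” means here, the sketch below filters an audit log down to the changes made inside an incident window. The `AuditEntry` shape is an assumption for the example, not Flaggy’s export format.

```typescript
// Hypothetical audit entry; field names are assumptions.
interface AuditEntry {
  flagKey: string;
  actor: string;
  change: string; // e.g. "enabled", "rollout 5% -> 25%"
  at: Date;
}

// Returns entries inside [from, to], newest first, without mutating the input.
function changesBetween(log: AuditEntry[], from: Date, to: Date): AuditEntry[] {
  return log
    .filter((entry) => entry.at.getTime() >= from.getTime() && entry.at.getTime() <= to.getTime())
    .sort((a, b) => b.at.getTime() - a.at.getTime());
}
```

In a postmortem you would call this with the window just before the first alert and read the result top to bottom.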

Control who can change what

A junior engineer toggling a release flag in staging is fine. The same person flipping a kill switch on the payment system in production is not. Mature flag management means role-based access: who can create flags, who can edit production targeting, who can only view. Tie changes to identities (not a shared key) so the audit log is meaningful.
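A role check of this kind can be very small. The roles, actions, and policy below are illustrative assumptions, not Flaggy’s permission model; the sketch only shows the shape of “who can do what, where”.

```typescript
// Hypothetical roles and actions for illustration.
type Role = 'viewer' | 'editor' | 'admin';
type Environment = 'staging' | 'production';
type Action = 'view' | 'toggle' | 'edit-targeting' | 'create';

// Assumed policy: everyone can view, editors can change staging only,
// and only admins can change production.
function canPerform(role: Role, action: Action, env: Environment): boolean {
  if (action === 'view') return true;
  if (role === 'admin') return true;
  if (role === 'editor') return env !== 'production';
  return false;
}
```

Under this policy `canPerform('editor', 'toggle', 'staging')` is allowed while the same toggle in production is not, which matches the junior-engineer example above.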

Centralize, don’t scatter

Flags spread across environment variables, hard-coded config files, and three different tools are impossible to govern — there’s no single list to review, no shared audit trail, no consistent naming. The foundation of feature flag management is a single flag management dashboard where every flag, rule, and change lives. Everything else in this playbook — reviews, audit, access control — depends on having one place to look.

A management checklist

Every flag has an owner and a category prefix
Temporary flags have a removal ticket filed at creation
Stale flags are reviewed on a fixed schedule
Every change is recorded in an audit log
Production changes are gated by role-based access
All flags live in one dashboard, not scattered across configs

Flaggy is built around exactly this — centralized flag management, segmentation, usage analytics, and a complete audit log — on a flat $99/month plan with unlimited seats and no per-user fees.

How to use feature flags in JavaScript and TypeScript

Flaggy — Wed, 20 May 2026 02:17:00 +0000

A practical guide to implementing feature flags in JS/TS: what they are, when to use them, and how to avoid the traps that make them painful to manage.

A feature flag is a conditional: if (flagEnabled) { show new thing } else { show old thing }. That’s the entire concept. The value comes from where the condition is evaluated and who controls it: not the developer at deploy time, but a dashboard at runtime.

This guide covers the core patterns, when each one applies, and what separates a well-managed flag from one that quietly rots in your codebase.

The basic pattern

A feature flag SDK gives you a function. You call it with a flag key and, optionally, a user context, and it returns a boolean. (Some SDKs also return a variant for multivariate flags; Flaggy’s flags are boolean only.)

import { flaggy } from '@flaggy.io/sdk-js';

const client = flaggy({ apiKey: process.env.FLAGGY_API_KEY });
await client.initialize();

if (client.isEnabled('release-checkout-flow', { key: user.id })) {
  return <NewCheckout />;
}
return <LegacyCheckout />;

The SDK downloads your ruleset on initialization and evaluates locally. No network call happens at the point of evaluation — it’s a dictionary lookup against rules in memory. This is why MAU-based pricing from some vendors doesn’t make technical sense: the vendor’s infrastructure isn’t involved when a flag evaluates.

When to use a feature flag

Release control is the most common case. You merge a half-finished feature to main behind a flag, deploy continuously, and flip the flag when it’s ready. No long-lived feature branches, no merge conflicts.

Percentage rollouts let you ship to 5% of users, watch your error rate, and expand if it looks clean. This is a canary deployment without the infrastructure complexity of actually routing traffic.

User targeting lets you ship to internal users, beta testers, or a specific customer segment first. You define a segment in the dashboard — “users where plan = enterprise” — and enable the flag for that segment.

Kill switches are flags you never intend to remove. A circuit breaker on a third-party integration, a way to disable a feature if it’s causing support volume. Having this in a dashboard is faster than a deploy.

The initialization pattern

Initialize the SDK once at app startup and share the client. Don’t initialize per-request or per-component — it defeats the local evaluation model.

// flaggy.ts — initialize once, export everywhere
import { flaggy } from '@flaggy.io/sdk-js';

export const flags = flaggy({
  apiKey: process.env.FLAGGY_API_KEY!,
});

await flags.initialize();

// any component or module
import { flags } from './flaggy';

const showFeature = flags.isEnabled('my-feature', { key: currentUser.id });

Targeting rules

Most SDKs let you pass a context object — any key/value pairs you want to use for targeting. The dashboard lets you define rules against those attributes without a code change.

Common context attributes:

flags.isEnabled('experiment-dark-mode', {
  key: user.id,
  email: user.email,
  plan: user.plan,           // target by billing tier
  accountAge: daysSinceSignup,
  country: user.countryCode,
});

You define the rule once in the dashboard: “users where plan = team AND accountAge > 30”. No deploy required to change targeting.
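To make the rule concrete, here is a minimal sketch of how an AND rule could be evaluated against a context object. The `Condition` type and `matchesAll` helper are hypothetical and only illustrate the idea; they are not how any particular SDK represents rules.

```typescript
type FlagContextValue = string | number | undefined;
type TargetingContext = Record<string, FlagContextValue>;

// Two illustrative operators: equality and numeric greater-than.
type Condition =
  | { attribute: string; op: 'eq'; value: string | number }
  | { attribute: string; op: 'gt'; value: number };

function matches(condition: Condition, context: TargetingContext): boolean {
  const actual = context[condition.attribute];
  if (actual === undefined) return false; // a missing attribute never matches
  if (condition.op === 'eq') return actual === condition.value;
  return typeof actual === 'number' && actual > condition.value;
}

// AND semantics: every condition must match.
function matchesAll(conditions: Condition[], context: TargetingContext): boolean {
  return conditions.every((condition) => matches(condition, context));
}

// "users where plan = team AND accountAge > 30"
const teamOver30Days: Condition[] = [
  { attribute: 'plan', op: 'eq', value: 'team' },
  { attribute: 'accountAge', op: 'gt', value: 30 },
];
```

Note the missing-attribute case: if one call site forgets to pass `accountAge`, the rule quietly evaluates to false for that user rather than raising an error.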

The cleanup problem

The real cost of feature flags isn’t evaluating them — it’s the ones you forget to remove. A codebase with 200 permanent flags is harder to reason about than one with 20.

A few habits that help:

Add a ticket at flag creation. When you create a flag, immediately create a ticket to remove it after the rollout is done. The flag has a lifespan of “rollout + one release cycle.”

Name flags by lifecycle prefix. Use a prefix that encodes intent: release-new-dashboard, experiment-checkout-button, kill-switch-payment-provider, ops-cache-ttl. It makes lifespan visible at a glance — you can immediately separate flags that need cleanup from ones that are permanent.

Review old flags quarterly. Pull a list of flags that haven’t changed in 90 days. Most of them are either fully rolled out (safe to remove) or forgotten experiments.

Server-side vs client-side evaluation

Some teams evaluate flags server-side to avoid any flag state being visible in the client bundle. The tradeoff:

Client-side: zero latency at evaluation, no server dependency, flag rules are technically visible in the bundle (usually fine for most use cases)
Server-side: rules stay private, adds one lookup per request (fast if in-memory, latency concern if remote)

For most product flags (rollouts, A/B tests, kill switches), client-side evaluation is fine. For flags that gate access control or pricing, server-side is worth the overhead.

What to look for in a flag platform

You need: a dashboard to manage flags without a code change, targeting rules, audit history (who changed what and when), and an SDK that evaluates locally.

What you don’t need to pay for: MAU metering. Modern SDKs evaluate locally, so the vendor’s servers aren’t involved in evaluations. If you’re paying per MAU, you’re paying for a proxy metric that doesn’t reflect actual vendor cost.

Flaggy’s Team plan is $99/month flat — unlimited seats, no MAU fees, full flag analytics and audit log included.

Feature flag best practices for engineering teams

Flaggy — Tue, 19 May 2026 04:40:00 +0000

The patterns that make feature flags useful long-term, and the habits that keep them from becoming a maintenance burden. Drawn from common failure modes.

Feature flags are simple in concept. The failure modes are also simple, and predictable. Most teams hit the same three problems: flags that never get cleaned up, targeting rules nobody understands, and flag state that’s invisible to the people who need it.

Here are the practices that prevent each of those.

Name flags by intent, not implementation

A flag named “New dashboard v2” tells you what it wraps but not why it exists or when it ends. A flag named “Release dashboard redesign” communicates both that it’s temporary and what it controls.

A useful naming convention uses a prefix word to encode the flag’s lifecycle. In Flaggy, names are entered as plain text and the key is generated automatically — “Release dashboard redesign” becomes the key release-dashboard-redesign in your code.

Useful prefixes:

Release — enables a new feature during a gradual release. Intended to be removed once fully shipped. e.g. “Release dashboard redesign”
Experiment — an A/B test. Has a defined end date. e.g. “Experiment checkout button color”
Kill switch — a permanent circuit breaker. Not expected to be removed. e.g. “Kill switch payments integration”
Ops — a configuration flag for operational tuning. May be permanent. e.g. “Ops cache TTL override”

The prefix communicates lifespan at a glance. When you’re reviewing a list of 50 flags, you can immediately separate the ones that should be cleaned up from the ones that are load-bearing.

Create the cleanup ticket at flag creation

The most reliable way to remove a flag is to plan for it before you merge. When you create a flag in the dashboard, immediately create a ticket in your issue tracker: “Remove flag release-feature-name after full rollout.”

Without this step, the flag gets created with full intention to clean it up, but the rollout ends, the team moves on, and the ticket never gets written. Flags accumulate. After a year you have 150 flags and nobody knows which ones are active.

The sequence is: create flag → create cleanup ticket → merge code → run rollout → remove flag → close ticket.

Set flag context consistently

The user context you pass to the SDK determines what targeting rules you can write. If different parts of the app pass different context attributes, you end up with targeting rules that silently don’t apply everywhere.

Define a canonical context object and build it in one place:

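// Assumes differenceInDays from date-fns; User and FlagContext are app-defined types.
import { differenceInDays } from 'date-fns';

function buildFlagContext(user: User): FlagContext {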
  return {
    key: user.id,
    email: user.email,
    plan: user.subscriptionPlan,
    teamId: user.teamId,
    accountAgeDays: differenceInDays(new Date(), user.createdAt),
    country: user.countryCode,
  };
}

Use this everywhere. When you add a new attribute, add it here and it becomes available for targeting across the entire app.

Keep flag logic at the boundary, not deep in components

Flag checks embedded deep in component trees or utility functions are hard to find and easy to forget. Prefer pushing flag evaluation to the boundary of a feature — the entry point that decides which version to render.

Harder to clean up:

// buried in a utility function
function formatPrice(amount: number) {
  if (flags.isEnabled('new-pricing-display')) {
    return newFormat(amount);
  }
  return legacyFormat(amount);
}

Easier to clean up:

// at the component boundary
function PricingSection({ user }) {
  const useNewDisplay = flags.isEnabled('new-pricing-display', { key: user.id });
  return useNewDisplay ? <NewPricingDisplay /> : <LegacyPricingDisplay />;
}

The second version makes the flag visible at a high level. When you remove it, you delete the condition and the LegacyPricingDisplay import — the diff is obvious.

Use segments for reusable targeting logic

If you find yourself writing the same targeting rule on multiple flags — “users where plan = team AND accountAge > 30 days” — define it as a segment once and target the segment.

Benefits:

Rules are consistent. If your definition of “beta users” changes, you update it in one place.
Auditing is easier. You can see which flags target a segment.
Onboarding new team members is simpler: they see named segments, not repeated rule logic.

Segments with clear names also make flag state legible to non-engineers. “This flag is enabled for the Beta Users segment” is understandable to a PM in a way that a list of attribute conditions isn’t.
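The reuse and auditing benefits can be sketched as data. Everything below is hypothetical: the segment name, the flag-to-segment mapping, and the helper functions exist only to show one definition being shared by several flags.

```typescript
interface SegmentContext {
  plan?: string;
  accountAgeDays?: number;
}

type Segment = (context: SegmentContext) => boolean;

// Defined once; every flag that targets it picks up future changes.
const segments: Record<string, Segment> = {
  'established-team-users': (context) =>
    context.plan === 'team' && (context.accountAgeDays ?? 0) > 30,
};

// Hypothetical mapping of flags to the segments they target.
const flagSegments: Record<string, string[]> = {
  'release-dashboard-redesign': ['established-team-users'],
  'experiment-checkout-button': ['established-team-users'],
  'ops-cache-ttl': [],
};

// A flag applies if the user is in any of its segments.
function isTargeted(flagKey: string, context: SegmentContext): boolean {
  const names = flagSegments[flagKey] ?? [];
  return names.some((name) => segments[name]?.(context) ?? false);
}

// Audit helper: which flags depend on a given segment?
function flagsTargeting(segmentName: string): string[] {
  return Object.entries(flagSegments)
    .filter(([, names]) => names.includes(segmentName))
    .map(([flagKey]) => flagKey);
}
```

`flagsTargeting` is the auditing benefit in miniature: before changing a segment’s definition, you can list every flag the change will affect.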

Make flag state visible to the right people

The support engineer who gets a ticket about a broken feature needs to check flag state. The PM managing a rollout wants to see current coverage. The on-call engineer debugging an incident needs to know which flags are currently active.

This is the hidden argument for unlimited seats: restricting dashboard access to the engineers who created the flags means everyone else is guessing. “Is this behind a flag?” becomes a Slack thread instead of a 30-second dashboard check.

Every person who touches your product in some capacity — PM, design, support, on-call — should be able to read flag state without asking an engineer.

Test the off state

When you add a flag, write a test for both branches — not just the new behavior you’re building.

describe('checkout flow', () => {
  it('shows new flow when flag is enabled', () => {
    mockFlag('new-checkout-flow', true);
    // ...
  });

  it('shows legacy flow when flag is disabled', () => {
    mockFlag('new-checkout-flow', false);
    // ...
  });
});

The off state breaks silently. The old code path doesn’t get touched during development, gets skipped in manual testing, and only fails in production when a flag gets rolled back. Test both.
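The tests above call `mockFlag` without defining it. One way such a helper could look is sketched below; `fakeFlags`, `resetFlags`, and `renderCheckout` are hypothetical stand-ins, and in a real suite you would inject the fake in place of your shared flag client.

```typescript
// Hypothetical test double for a flag client.
const overrides = new Map<string, boolean>();

function mockFlag(key: string, value: boolean): void {
  overrides.set(key, value);
}

// Call between tests so state does not leak from one case to the next.
function resetFlags(): void {
  overrides.clear();
}

const fakeFlags = {
  // Throws on unmocked flags so every test states the flag value it depends on.
  isEnabled(key: string): boolean {
    const value = overrides.get(key);
    if (value === undefined) throw new Error(`Flag "${key}" was not mocked`);
    return value;
  },
};

// Stand-in for the code under test.
function renderCheckout(): string {
  return fakeFlags.isEnabled('new-checkout-flow') ? 'new' : 'legacy';
}
```

Throwing on an unmocked flag is a deliberate choice in this sketch: a silent default of false would let the off path pass by accident without anyone having written a test for it.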

Define a rollout pace

“We’ll do a gradual rollout” is not a plan. Before you flip the flag, decide:

What percentage you’ll start at (common: 1–5%)
What metric signals it’s safe to expand (error rate, p99 latency, specific event count)
How long you’ll hold at each stage before expanding
What triggers a rollback
Write this down somewhere. It doesn’t need to be formal — a comment in the cleanup ticket is enough. The goal is that whoever is on-call at 2am knows the answer to “is this flag supposed to be at 10%?” without having to find the person who created it.

Audit logs are not optional

You will have a production incident caused by a flag change. Someone will toggle something at the wrong time, or a targeting rule will match more users than expected, or the SDK will behave differently on an edge case.

When that happens, the first question is: what changed, when, and who changed it? An audit log that captures every flag toggle, rule edit, and segment update is how you answer that in minutes instead of hours.

This is not a nice-to-have — it’s incident response infrastructure. Flaggy includes audit logging on all plans, including the Free tier (7-day retention). The Team plan extends this to 90 days — worth considering if your incident review windows run longer.

If you're looking for a feature flag tool built around these practices — with automatic key generation from plain-text names, flag lifecycle visibility, and a clean dashboard — check out Flaggy.

Feature flag rollouts: canary-style risk reduction without the infrastructure

Flaggy — Mon, 18 May 2026 11:22:03 +0000

Canary deployments and feature flag rollouts solve the same problem at different layers. Here's how they actually work, when to use each, and how to combine them.

A canary deployment releases a new version of your application to a small slice of users before rolling it out fully. If something breaks, you catch it at 5% of traffic instead of 100%.

Feature flags solve the same problem at a different layer. Understanding the distinction helps you pick the right tool — and combine them when it makes sense.

How traditional canary deployments work

A canary deployment means running two versions of your application simultaneously in production. Version A is your current stable release. Version B is the new version. A routing layer — a load balancer, a service mesh, or your orchestration platform — directs a percentage of traffic to version B while the rest continues to hit version A.

In Kubernetes, the most common implementation uses two Deployments whose pods carry a shared label matched by a single Service’s selector. The Service spreads traffic across all matching pods, so the split is roughly the ratio of replicas:

# stable: 9 replicas
# canary: 1 replica
# → roughly 90% / 10% split
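The arithmetic behind that split is simple enough to write down. The helpers below are illustrative only and assume traffic is spread evenly across pods, which is an approximation in practice.

```typescript
// Approximate share of traffic reaching the canary, assuming even spread across pods.
function expectedCanaryShare(stableReplicas: number, canaryReplicas: number): number {
  const total = stableReplicas + canaryReplicas;
  if (total <= 0) throw new Error('need at least one replica');
  return canaryReplicas / total;
}

// Smallest fleet size at which a single canary pod receives at most `percent` of traffic.
// With one canary out of N pods the share is 1/N, so N must be at least 100 / percent.
function minTotalReplicasForCanaryPercent(percent: number): number {
  if (percent <= 0 || percent > 100) throw new Error('percent must be in (0, 100]');
  return Math.ceil(100 / percent);
}
```

The second function shows why replica-ratio canaries get expensive at small percentages: a 1% canary needs a fleet of 100 pods.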

More sophisticated setups use an ingress controller (like NGINX or Traefik) or a service mesh (like Istio or Linkerd) to control the split with explicit weights, independent of replica count. This gives you precise percentages without scaling your fleet.

The key point: a canary deployment is an infrastructure strategy. Both versions of your code are running in production. The routing layer decides which users hit which version.

How feature flag rollouts work

A feature flag rollout is an application-layer strategy. You deploy a single version of your application to all servers. Inside that code, a flag evaluation controls which code path executes for a given user.

import { flaggy } from '@flaggy.io/sdk-js';

const client = flaggy({ apiKey: process.env.FLAGGY_API_KEY });
await client.initialize();

const useNewAlgorithm = client.isEnabled('new-recommendation-engine', {
  key: user.id,
});

const results = useNewAlgorithm
  ? newRecommendationEngine(user)
  : legacyRecommendationEngine(user);

The key field ensures the same user always gets a consistent experience — the percentage is applied by hashing the key, not by random chance per request.
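Hash-based bucketing can be sketched as follows. The hash (an FNV-1a-style function over UTF-16 code units), the 100-bucket scheme, and the function names are assumptions for illustration; the article does not say which hash Flaggy uses.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units. Illustrative choice only.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Maps a (flag, user) pair to a stable bucket in the range 0 to 99.
// Including the flag key means a user lands in different buckets for different flags.
function bucketFor(flagKey: string, userKey: string): number {
  return fnv1a(`${flagKey}:${userKey}`) % 100;
}

// A user is in the rollout when their bucket falls below the percentage.
function inRollout(flagKey: string, userKey: string, percent: number): boolean {
  return bucketFor(flagKey, userKey) < percent;
}
```

Because the bucket is fixed per user, raising the percentage only adds users: anyone included at 5% is still included at 25%, so nobody flips back and forth as the rollout expands.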

The key point: a feature flag rollout is a code strategy. One binary is deployed. The flag determines which behavior that binary exhibits for each user.

The difference in practice

Both approaches answer the same question: is this change safe at scale before committing fully? But they operate at different layers and have different strengths.

Canary deployments are managed by infrastructure or DevOps teams. Rollback means shifting traffic back to the stable version and eventually tearing down the canary — it’s a deployment operation. All changes in the new version are rolled back together; you can’t independently revert one feature if it shares a deployment with others.

Feature flag rollouts are managed by the team shipping the feature. Rollback means flipping a flag: the toggle takes seconds, requires no deployment, and reaches clients on their next ruleset refresh. Each feature has its own flag, so you can roll back one independently without touching anything else in the same release.

Another difference: canary deployments work at the traffic level, not the user level. You can route based on headers, cookies, or geographic region — but you’re routing requests, not targeting specific users. Feature flags can target individual users, user attributes, cohorts, or segments. You can enable a feature for your beta group specifically, then expand to a percentage of everyone else.

Percentage rollouts with Flaggy

The basic pattern is a flag with a percentage rollout rule. Start at 5%, verify your metrics look clean, expand in stages.

Monitoring during rollout

Before you expand, you need something to watch:

Error rates. Instrument both code paths with error tracking. An increase in exceptions for users in the new path is the clearest signal something is wrong.

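// Wrap the call so a failure in either path is reported with its variant.
// `errorTracker` stands in for your error-tracking client.
function getRecommendations(user, useNewAlgorithm) {
  try {
    return useNewAlgorithm
      ? newRecommendationEngine(user)
      : legacyRecommendationEngine(user);
  } catch (err) {
    // Tag the error so the tracker can compare the new and legacy paths
    errorTracker.capture(err, {
      tags: { flagVariant: useNewAlgorithm ? 'new' : 'legacy' }
    });
    throw err;
  }
}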

Latency. New code paths are often slower. Tag your performance measurements with the flag variant so you can compare p50/p99 between the two groups.
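One way to tag timings is sketched below. It assumes a synchronous code path and a `recordTiming` callback standing in for whatever metrics client you use; both helper names and the metric name are hypothetical.

```javascript
// Time a synchronous call and report the duration tagged with the variant.
// The `finally` block records the timing even when `fn` throws.
function timeVariant(variant, fn, recordTiming) {
  const start = performance.now();
  try {
    return fn();
  } finally {
    recordTiming('recommendations.duration_ms', performance.now() - start, {
      flagVariant: variant,
    });
  }
}

// Nearest-rank percentile over an ascending-sorted array of durations.
function percentile(sortedMs, p) {
  if (sortedMs.length === 0) return undefined;
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.max(rank, 1) - 1];
}

// Usage: collect samples per variant, then compare p50/p99 across groups.
const samples = { new: [], legacy: [] };
const record = (_metric, ms, tags) => samples[tags.flagVariant].push(ms);
timeVariant('new', () => 'new result', record);
timeVariant('legacy', () => 'legacy result', record);
const sortedNew = [...samples.new].sort((a, b) => a - b);
console.log('new p99 (ms):', percentile(sortedNew, 99));
```

In practice your metrics backend usually computes the percentiles for you; the important part is that every measurement carries the variant tag.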

Business metrics. For user-facing features, watch the metrics that matter for the feature itself — conversion rate, click-through rate, session length. These take longer to accumulate signal but catch subtler problems that error rates miss.

Flag evaluation split. Flaggy’s analytics show the true/false breakdown for each flag in real time. If you set a 10% rollout and the split reads 0% or 100%, something is wrong, and you can see it before your error tracker has had time to accumulate signal.
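If you also count evaluations yourself, the same check can be automated. The sketch below is an assumption about how you might do that, not a Flaggy API: `splitLooksWrong`, the tolerance, and the minimum sample size are all hypothetical.

```javascript
// Return true when the observed true/false split is far from the target.
// Small samples are ignored because they are too noisy to judge.
function splitLooksWrong(trueCount, falseCount, targetPercent, tolerancePoints = 5, minSamples = 200) {
  const total = trueCount + falseCount;
  if (total < minSamples) return false;
  const observedPercent = (trueCount / total) * 100;
  return Math.abs(observedPercent - targetPercent) > tolerancePoints;
}

// A 10% rollout where nobody is getting the new path is suspicious.
console.log(splitLooksWrong(0, 1000, 10));
// Roughly 10% true evaluations is what you expect to see.
console.log(splitLooksWrong(100, 900, 10));
```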

The rollout ladder

A sensible default for most features:

You can compress this for low-risk changes or extend it for changes with high blast radius. The goal is giving each stage enough time for errors, latency, and business metrics to show a signal, not following the ladder for its own sake.

Write the ladder down before you start. On-call engineers shouldn’t have to find the person who created the flag to know whether 25% coverage is expected or a bug.
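Writing the ladder down can be as simple as checking a small data structure into the repo next to the flag. The stages below are placeholders chosen for illustration, not a recommendation from this article, and the helper names are hypothetical.

```javascript
// Hypothetical ladder: percentages and hold times are placeholders.
const rolloutLadder = [
  { percent: 5, holdHours: 24 },
  { percent: 10, holdHours: 24 },
  { percent: 25, holdHours: 48 },
  { percent: 50, holdHours: 48 },
  { percent: 100, holdHours: 0 },
];

// Lets on-call answer "is this coverage a planned stage or a bug?"
function isExpectedCoverage(ladder, percent) {
  return ladder.some((stage) => stage.percent === percent);
}

// The next planned stage above the current percentage, or null at the top.
function nextStage(ladder, currentPercent) {
  return ladder.find((stage) => stage.percent > currentPercent) ?? null;
}

console.log(isExpectedCoverage(rolloutLadder, 25));
console.log(nextStage(rolloutLadder, 25));
```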

Staged rollouts: internal team first

Percentage rollouts select users by hashing their key, so the cohort is effectively an arbitrary slice of your user base. For higher-risk changes, you want more control over who gets the new behavior first.

The pattern: enable the flag for internal users or beta testers before opening any percentage rollout. If something breaks, your team finds it — not your customers.

In Flaggy:

1. Create a segment: "users where email ends with @yourcompany.com"
2. Enable the flag for that segment, so your team gets the new behavior
3. Run it internally for a few days, file bugs, fix them
4. Then start the percentage rollout for the broader user base

You can stack these rules: the flag is on for the internal segment and for 5% of everyone else. The two targeting rules work independently.
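Conceptually, stacked rules behave like an OR: a user gets the new behavior if either rule matches. The sketch below mirrors that description rather than Flaggy's internal rule engine; `isInternal`, `evaluateStacked`, and the injected `bucketOf` function are hypothetical.

```javascript
// Segment rule: the user's email ends with the company domain.
function isInternal(user, domain = '@yourcompany.com') {
  return typeof user.email === 'string' && user.email.toLowerCase().endsWith(domain);
}

// Stacked evaluation: internal users always match; everyone else is
// subject to the percentage rule. `bucketOf` maps a user key to 0..99
// and is passed in so any stable bucketing scheme can be used.
function evaluateStacked(user, { percentage, bucketOf }) {
  if (isInternal(user)) return true;
  return bucketOf(user.key) < percentage;
}

// Toy bucketing for the demo: a fixed lookup table instead of a real hash.
const demoBuckets = { 'u-1': 99, 'u-2': 3, 'u-3': 50 };
const rules = { percentage: 5, bucketOf: (key) => demoBuckets[key] };

console.log(evaluateStacked({ key: 'u-1', email: 'dev@yourcompany.com' }, rules)); // true: internal segment
console.log(evaluateStacked({ key: 'u-2', email: 'a@example.com' }, rules)); // true: bucket 3 is below 5
console.log(evaluateStacked({ key: 'u-3', email: 'b@example.com' }, rules)); // false: bucket 50 is not below 5
```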

The Free plan includes 2 segments — enough for an internal team and a beta group. The Team plan has unlimited segments.

When to use infrastructure canary deployments instead

Feature flag rollouts handle application-level changes well. There are cases where you need infrastructure-level canaries:

Database schema migrations. Running two versions of your app simultaneously lets you verify the new schema is compatible before fully migrating. A flag can’t help if the migration breaks the old code path.

Dependency and runtime upgrades. Upgrading a major version of a core library, or moving to a new runtime version, is safer to test on a slice of real traffic. These changes affect behavior throughout the app in ways that are hard to wrap in a flag.

Infrastructure configuration changes. Changes to server tuning, connection pool sizes, caching behavior — these need to run on real infrastructure to measure their effect. A flag in application code can’t surface OS-level or infrastructure-level impacts.

For everything else — new UI, changed algorithms, new API endpoints, updated business logic — a feature flag rollout is faster to set up, easier to operate, and gives you better targeting control than an infrastructure canary.

Rollback

The practical advantage of a feature flag over a deployment rollback is speed and granularity. Rolling back a deployment takes minutes and reverts everything in that release. Flipping a flag takes seconds and affects only that feature.

When something goes wrong at 10% rollout, you open the dashboard and move the slider to 0%. The broken code path stops executing for all users once clients pick up the change on their next background refresh, typically within about a minute. Then you fix the bug and run the rollout again, without touching anything else that shipped in the same deployment.

If you want to skip the infrastructure overhead and roll out features gradually with percentage-based targeting, check out Flaggy.