Feature flag driven development


A workflow where every change ships behind a flag. How feature flag driven development works, why it pairs with trunk-based development, and the habits that keep it from becoming flag debt.

Feature flag driven development is a workflow where new code ships to production behind a flag by default, and the decision to expose it to users is made afterward, from a dashboard, rather than at deploy time. The unit of work isn’t “merge and it’s live” — it’s “merge it off, then turn it on when you’re ready.” Done consistently, it changes how a team thinks about risk, releases, and the difference between shipping code and releasing a feature.

This builds directly on the basics. If you’re not yet sure what a flag is or how one evaluates, start with “What are feature flags” and come back. This article is about the development practice built on top of that primitive.

What “flag driven” actually changes

In a conventional workflow, a feature goes live the moment it is merged to main and deployed. That couples two things that don’t need to be coupled: the act of putting code in production and the act of showing it to users.

Feature flag driven development decouples them. Every non-trivial change lands wrapped in a flag, defaulted off:

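import { flaggy } from '@flaggy.io/sdk-js';

// Create the client once at startup and load the initial ruleset.
const client = flaggy({ apiKey: process.env.FLAGGY_API_KEY });
await client.initialize();

// The flag check sits at the boundary that picks which search to render.
function renderSearch(user) {
  if (client.isEnabled('release-new-search', { key: user.id })) {
    return renderNewSearch();
  }
  return renderLegacySearch();
}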

The code is deployed. It’s running in production. But until someone flips release-new-search on in the dashboard, no user sees it. Release becomes a deliberate, reversible action — a dashboard toggle, picked up by clients on their next background refresh (about a minute) — instead of a deploy.

That one shift cascades into everything else this article covers.

It only works with trunk-based development

Flag driven development and trunk-based development are two halves of the same practice. You can’t really do one without the other.

The problem flags solve is “how do I keep merging to main without releasing half-finished work?” The answer is: merge it behind an off flag. That answer only matters if you’re actually merging to main frequently — which is the definition of trunk-based development. Long-lived feature branches don’t need flags to hide unfinished work; the branch already hides it. The cost is the merge hell you get when that branch is finally integrated.

So the loop is:

Work in small increments on short-lived branches (or directly on main).
Each increment merges behind a flag that’s off in production.
The pipeline deploys every merge continuously.
The feature comes together in main, invisible, until it’s complete and you turn the flag on.

You get the integration benefits of trunk-based development (no divergent branches, no big-bang merges) without exposing work in progress. The flag is what makes “commit incomplete work to main” safe.

The development loop, step by step

Here’s what a single feature looks like under this model.

Create the flag first. Before you write the gated code, create the flag in the dashboard. In Flaggy you enter a plain-text name like “Release new search” and the key release-new-search is generated for you. Naming it by intent — Release, Experiment, Kill switch, Ops — encodes its lifespan up front, a habit covered in feature flag best practices.
Create the cleanup ticket immediately. “Remove release-new-search after full release.” The single most reliable way to avoid flag debt is to write the removal ticket at creation, before the rollout makes everyone forget. The sequence is: create flag → create cleanup ticket → merge code → roll out → remove flag → close ticket.
Wrap the feature at its boundary. Put the flag check at the entry point of the feature — the component or route that decides which version renders — not buried deep in a utility function. A boundary check is easy to find and easy to delete later. A check three layers down in a shared helper is how flags become permanent by accident.
Build both branches, test both branches. Write a test for the on state and the off state. The off path is the one that breaks silently: nobody touches it during development, manual testing skips it, and it only fails when you roll the flag back in production.

describe('search', () => {
  it('renders new search when flag is on', () => {
    mockFlag('release-new-search', true);
    // ...
  });
  it('renders legacy search when flag is off', () => {
    mockFlag('release-new-search', false);
    // ...
  });
});

Merge and deploy off. The code goes to production with the flag off. Nothing changes for users. You can do this dozens of times across a multi-week feature.
Release deliberately. When the feature is complete, turn it on — for internal staff first, then a percentage, then everyone — watching your metrics at each stage. This is a percentage rollout, and it gives you canary-style gradual exposure without touching a load balancer.
Remove the flag. Once it’s fully shipped and stable, delete the flag and the legacy branch. Close the cleanup ticket. If you skip this step often enough, you don’t have flag driven development — you have a codebase full of dead conditionals.
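
The “wrap at the boundary” and “test both branches” steps above can be sketched together without a test framework. This is a minimal illustration, not Flaggy’s testing API: createFakeFlagClient and the injected renderers are hypothetical names, and the fake client implements only the isEnabled call used earlier in this article.

```javascript
// Hypothetical stand-in for a flag client: it implements only isEnabled(),
// the one method the article's example calls.
function createFakeFlagClient(states) {
  return {
    isEnabled(flagKey, _context) {
      // Unknown flags default to off, matching "merge it off".
      return states[flagKey] === true;
    },
  };
}

// The boundary: the single place that decides which search renders.
// Both renderers are passed in so this sketch stays self-contained.
function renderSearch(client, user, { renderNewSearch, renderLegacySearch }) {
  if (client.isEnabled('release-new-search', { key: user.id })) {
    return renderNewSearch();
  }
  return renderLegacySearch();
}

// Exercise both paths with the fake client.
const renderers = {
  renderNewSearch: () => 'new-search',
  renderLegacySearch: () => 'legacy-search',
};
const user = { id: 'user-123' };

const onResult = renderSearch(
  createFakeFlagClient({ 'release-new-search': true }), user, renderers);
const offResult = renderSearch(createFakeFlagClient({}), user, renderers);
```

Because the flag check lives in one function, removing the flag later means deleting the conditional and the legacy renderer in a single place.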

What you get for it

Deploy and release become separate decisions. The pipeline runs constantly; releases happen on a product schedule, by people who aren’t necessarily the ones who deployed. A PM can own the “turn it on” moment without filing a deploy request.

Rollback is a toggle, not a redeploy. A bad release flips off and clients pick up the change on their next refresh — roughly a minute — without rebuilding and redeploying the previous version. For changes you expect might need disabling — a third-party integration, a risky migration path — a permanent kill switch is the same pattern with no intent to remove it.
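
A kill switch around a third-party integration might look like the sketch below. The flag key kill-switch-vendor-recommendations, the getRecommendations function, and the injected fetchFromVendor are all made up for illustration; the only thing assumed about the client is the isEnabled call shown earlier. In this sketch “on” means the integration runs, and flipping the flag off disables it.

```javascript
// Hypothetical kill switch around a third-party call.
// Flag on = integration allowed; flag off = skip the vendor entirely.
async function getRecommendations(client, user, fetchFromVendor) {
  if (!client.isEnabled('kill-switch-vendor-recommendations', { key: user.id })) {
    return []; // degraded but safe fallback
  }
  try {
    return await fetchFromVendor(user.id);
  } catch (err) {
    // A failing vendor degrades the same way as a disabled one.
    return [];
  }
}
```

The fallback is deliberately the same for “switched off” and “vendor failed”, so the off path gets exercised in normal operation rather than only during an incident.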

Testing in production becomes safe. You can enable a feature for your own account, or one friendly customer, and exercise it against real production data and load before anyone else sees it. The flag’s targeting rules decide exactly who’s included.

Incomplete work stops blocking releases. Because everything unfinished is off, you can cut a release at any time. There’s no “wait, don’t deploy, my half-built feature is on main” — it’s on main, but it’s off.

The failure mode: flag debt

The honest downside of putting everything behind a flag is that you generate flags faster than any other workflow, and if you don’t remove them, you drown. After a year of undisciplined flag driven development you have 200 flags, half of them permanently on, nobody sure which ones are load-bearing.

The discipline that prevents this is entirely about lifecycle, and it’s the subject of feature flag management. The short version:

Name by intent and lifespan, so Release flags are visibly different from Ops flags at a glance.
Create the cleanup ticket at creation, every time.
Treat a stale release flag as a bug, not a backlog item. A release flag that’s been at 100% for a month is dead code waiting to be deleted.
Use an audit log so every toggle and rule change is attributable when an incident review asks “what changed?”

Flag driven development without removal discipline isn’t a different practice; it’s the same practice with the bill unpaid.
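
The “stale release flag is a bug” rule is easy to automate once you can export your flags. Here is a rough sketch; the record shape (key, rolloutPercent, lastChangedAt) is an assumption for illustration, not a format any particular tool produces, and the release- prefix relies on the name-by-intent convention described above.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns release flags that have sat at 100% for longer than maxAgeDays.
// The record shape is assumed; map it from whatever your flag tool exports.
function findStaleReleaseFlags(flags, now, maxAgeDays = 30) {
  return flags.filter((flag) => {
    const isRelease = flag.key.startsWith('release-');
    const fullyOn = flag.rolloutPercent === 100;
    const ageDays = (now - new Date(flag.lastChangedAt)) / DAY_MS;
    return isRelease && fullyOn && ageDays > maxAgeDays;
  });
}

const flags = [
  { key: 'release-new-search', rolloutPercent: 100, lastChangedAt: '2024-01-01T00:00:00Z' },
  { key: 'release-new-checkout', rolloutPercent: 25, lastChangedAt: '2024-01-01T00:00:00Z' },
  { key: 'ops-cache-ttl', rolloutPercent: 100, lastChangedAt: '2023-06-01T00:00:00Z' },
  { key: 'release-dark-mode', rolloutPercent: 100, lastChangedAt: '2024-02-25T00:00:00Z' },
];

// Only release-new-search qualifies: a release flag, at 100%, unchanged for 60 days.
const stale = findStaleReleaseFlags(flags, new Date('2024-03-01T00:00:00Z'));
```

Run something like this on a schedule and file the results as bugs, and stale flags stop depending on anyone’s memory.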

A note on what flags can and can’t do

Two things worth being precise about, because they shape how you design the workflow:

Flags are booleans. A flag is on or off. There’s no “variant A / variant B / variant C” multivariate value. For an A/B test you use a boolean plus a percentage rollout — half the users get true, half get false, and you compare a metric across the two. That covers experiment-driven development cleanly; it just means you model variants as separate flags rather than one flag with many values.
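
Comparing a metric across the two sides of a boolean flag can be as simple as the sketch below. The record shape (flagOn, converted) is assumed for illustration, and the function only computes raw conversion rates; deciding whether the difference is statistically meaningful is a separate step it does not attempt.

```javascript
// Each record is one user: which side of the boolean flag they were on,
// and whether they converted. The shape is assumed for illustration.
function conversionByArm(records) {
  const arms = {
    on: { users: 0, conversions: 0 },
    off: { users: 0, conversions: 0 },
  };
  for (const { flagOn, converted } of records) {
    const arm = flagOn ? arms.on : arms.off;
    arm.users += 1;
    if (converted) arm.conversions += 1;
  }
  // Avoid dividing by zero when an arm has no users yet.
  const rate = (arm) => (arm.users === 0 ? 0 : arm.conversions / arm.users);
  return { on: rate(arms.on), off: rate(arms.off) };
}

const result = conversionByArm([
  { flagOn: true, converted: true },
  { flagOn: true, converted: false },
  { flagOn: true, converted: true },
  { flagOn: false, converted: false },
  { flagOn: false, converted: false },
  { flagOn: false, converted: true },
]);
// result.on is 2/3 and result.off is 1/3 for this sample data.
```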

Releases aren’t instant. SDKs evaluate flags locally against a ruleset they refresh by background polling, about every minute by default. So a dashboard toggle reaches users on their next refresh, not the same second you click it. The dashboard action needs no deploy and no code change, which is the real win; just don’t design a workflow that assumes a flag change is visible everywhere within seconds. Evaluation itself is an in-memory lookup, so a flag check adds no network round trip to the request path; see how feature flags work for the mechanics.
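
To make the polling model concrete, here is a stripped-down sketch of a locally evaluating client. It is not the Flaggy SDK’s implementation: PollingFlagClient and the injected fetchRuleset are hypothetical, and the ruleset is simplified to a plain object of booleans, where a real ruleset would also carry targeting rules and percentages.

```javascript
// Sketch of a locally evaluating client. fetchRuleset is injected and is
// assumed to resolve to a plain object like { 'release-new-search': true }.
class PollingFlagClient {
  constructor(fetchRuleset, intervalMs = 60_000) {
    this.fetchRuleset = fetchRuleset;
    this.intervalMs = intervalMs;
    this.ruleset = {}; // empty until the first refresh, so every flag is off
    this.timer = null;
  }

  // Pull the latest ruleset; keep the previous one if the fetch fails.
  async refresh() {
    try {
      this.ruleset = await this.fetchRuleset();
    } catch (err) {
      // Stale rules are preferable to no rules.
    }
  }

  // Load once, then keep refreshing in the background.
  async start() {
    await this.refresh();
    this.timer = setInterval(() => this.refresh(), this.intervalMs);
    // Don't keep a Node.js process alive just for polling.
    if (typeof this.timer.unref === 'function') this.timer.unref();
  }

  stop() {
    clearInterval(this.timer);
    this.timer = null;
  }

  // Pure in-memory lookup: no network call on the hot path.
  isEnabled(flagKey) {
    return this.ruleset[flagKey] === true;
  }
}
```

The gap between a dashboard change and the next refresh() call is exactly the “about a minute” delay described above: isEnabled keeps answering from the old ruleset until the new one arrives.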

Getting started without boiling the ocean

You don’t adopt flag driven development by flagging your entire backlog on day one. Start with the next risky thing you ship:

Pick a feature you’d normally be nervous to deploy.
Put it behind a release flag, off.
Merge and deploy it dark. Confirm nothing changed for users.
Turn it on for yourself, then 5%, then everyone, watching error rates.
When it’s stable, delete the flag.
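
If you’re wondering how “5%, then everyone” can stay consistent per user, a common approach is deterministic bucketing: hash the flag key and user key together into a bucket from 0 to 99, and include the user when the bucket is below the rollout percentage. The sketch below uses an FNV-1a-style hash as an assumption; this article doesn’t describe the algorithm Flaggy’s SDKs actually use, and fnv1a, bucketFor, and inRollout are illustrative names.

```javascript
// FNV-1a-style 32-bit hash over the string's UTF-16 code units:
// small and deterministic, which is all this sketch needs.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Map a user to a stable bucket in [0, 100), independently per flag.
function bucketFor(flagKey, userKey) {
  return fnv1a(`${flagKey}:${userKey}`) % 100;
}

// A user is included when their bucket falls below the rollout percentage.
function inRollout(flagKey, userKey, percent) {
  return bucketFor(flagKey, userKey) < percent;
}
```

Because a user’s bucket never changes, raising the percentage only adds users: anyone included at 5% is still included at 50%, so nobody sees the feature flicker on and off as the rollout widens.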

Do that three or four times and the workflow stops feeling like overhead and starts feeling like the default safe way to ship. From there it generalizes: kill switches around integrations, ops flags for tuning, experiments for product questions. Framework-specific setup is covered in our guides for JavaScript, React, and Angular.

Flaggy is built for this workflow — local-evaluation SDKs that add no latency, targeting and segments, percentage rollouts, analytics, and a full audit log on a flat $99/month Team plan with unlimited seats, so everyone who owns a release can actually flip the flag.

Feature flag driven development FAQ

Is feature flag driven development the same as trunk-based development? They’re complementary, not identical. Trunk-based development is about merging small changes to main frequently; flag driven development is what makes that safe by hiding unfinished work behind off flags. In practice you do both together.
Doesn’t flagging everything create a mess? Only if you don’t remove flags. The practice depends on a removal discipline — cleanup tickets at creation, treating stale release flags as bugs. See feature flag management.
Can I A/B test with this approach? Yes, with a boolean flag and a percentage rollout: half your users get the feature, half don’t, and you compare a metric. There are no multivariate flags, so each variant you test is modeled as its own flag.
How fast does turning a flag on take effect? Clients pick up a change on their next background refresh — about a minute by default — because SDKs poll for the current ruleset rather than holding an open connection. The dashboard change itself is immediate and needs no deploy.
Do I need a vendor, or can I use a config file? For a handful of permanent flags, a config file works. The moment you want targeting, percentage rollouts, an audit trail, and non-engineers flipping flags safely, a dedicated feature flag tool earns its place.