Ian Johnson

Posted on May 25

Feature Flags That Forgot to Leave

#webdev #agents #devops #programming

A feature flag goes into the codebase to make a rollout safer. The new behavior lives behind the flag. The team turns the flag on for one customer, then ten, then everyone. The rollout succeeds.

The flag is still in the code.

It is still in the code six months later. Still in the code a year later. The team that added it has rotated. The flag has been "on" for everyone for so long that nobody remembers the old behavior. The branch behind the false value is unreachable in any environment, and yet the code remains, and every reader has to mentally evaluate both branches every time they encounter the flag.

Feature flag debt is the slowest-moving anti-pattern in most codebases. It does no damage on any given day, and it accumulates anyway. AI coding agents make the accumulation worse.

What flag debt costs

A live feature flag in the code is not free. It is a branch: a real one, in the control flow sense, even if no environment actually traverses both sides.

A reader has to evaluate both branches. A reviewer has to consider whether a change to one branch should also apply to the other. A test suite has to either cover both branches or accept that one of them is untested. A monitoring system that catches errors has to do so for code paths that, in production, might never run.

When the flag was new and the rollout was live, all of this was worth it. After the rollout, none of it is. The cost stays; the value is gone.

Multiply this by every flag your team has ever shipped and never cleaned up. The codebase becomes a thicket of dormant branches, each one a small cognitive tax, none of them individually large enough to be worth a cleanup PR. The team works around them, slowly, paying the tax in attention rather than in time.

How agents make it worse

Agents reason about both branches of a flag. Asked to refactor a function that contains a flag check, the agent will preserve the structure, update both branches, and present a diff that respects the conditional. The agent is doing the right thing. It does not know that one branch is dead. The cost is that every refactor touching flagged code touches dead code, which adds noise to the diff and time to the review.

More subtly, agents will pattern-match against existing flag usage and produce new flag usage. A codebase with twenty stale flags teaches the agent that wrapping new behavior in a flag is the local idiom. The agent helpfully writes more flags. Each new flag has the same lifecycle problem the existing ones did.

The combination is that flag debt does not just stay; it grows. The codebase that has tolerated flag accumulation produces an agent that produces more flag accumulation.

The lifecycle nobody runs

Every feature flag has, in principle, a lifecycle. It is added. It is rolled out. It is fully enabled. It is cleaned up. The cleanup step is the one teams skip.

The skip is structural, not lazy. The team that added the flag has moved on by the time it is fully rolled out. The cleanup is not anybody's current priority. There is no urgency. The system works whether the flag is cleaned up or not. There is no automated reminder, because most flag systems do not have one. So the cleanup sits in a backlog that grows by one row every time a new flag ships, and shrinks by one row almost never.

The fix is not better intentions. It is making the cleanup a step in the rollout, not a follow-up to it. The flag is not "done" when it is fully enabled; it is done when the code is removed.

Tooling helps

Modern feature flag platforms (LaunchDarkly, Statsig, open-source options such as Unleash) increasingly include staleness detection. They report flags that have been at 100% (or 0%) for some period, flags with no usage, and flags that nobody has updated in months. The reports give the team a target list without requiring anybody to remember.
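The staleness rule behind such reports is simple enough to sketch. What follows is a minimal illustration, not any platform's actual API: the `FlagRecord` shape, the `find_stale` function, the sample flags, and the 90-day threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FlagRecord:
    """Hypothetical flag record; real platforms expose different fields."""
    key: str
    rollout_percent: int  # 0..100
    last_changed: date    # last time the flag's targeting was modified


def find_stale(flags, today, max_age=timedelta(days=90)):
    """Return flags pinned at 0% or 100% for at least max_age, oldest first."""
    stale = [
        f for f in flags
        if f.rollout_percent in (0, 100) and today - f.last_changed >= max_age
    ]
    return sorted(stale, key=lambda f: f.last_changed)


# Made-up sample data for illustration only.
flags = [
    FlagRecord("new_checkout", 100, date(2023, 1, 10)),
    FlagRecord("beta_search", 50, date(2023, 2, 1)),
    FlagRecord("legacy_export", 0, date(2023, 6, 1)),
    FlagRecord("fresh_banner", 100, date(2024, 4, 20)),
]

for f in find_stale(flags, today=date(2024, 5, 1)):
    print(f.key, f.rollout_percent, f.last_changed)
```

A flag at 50% is still mid-rollout and a flag changed last week is still in motion, so neither is reported; only flags that are both pinned and old make the list.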

For teams not on a platform, the analog is a script. Walk the codebase, find every reference to a flag-checking function, cross-reference with the flag store. Output a table of flags by age and current value. Run it weekly. Anyone can write this script in an afternoon; the value is in actually looking at the output.
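A rough sketch of such a script follows. It assumes flags are checked through a helper called `is_enabled("flag_key")` and that the flag store can be exported as a dictionary; both are placeholders for whatever your codebase actually uses, and the file suffixes are likewise an assumption.

```python
import re
from datetime import date
from pathlib import Path

# Assumed call shape: is_enabled("flag_key"). Adjust to your own flag helper.
FLAG_CALL = re.compile(r"""is_enabled\(\s*["']([\w.-]+)["']\s*\)""")


def scan_flag_references(root, suffixes=(".py", ".ts", ".js")):
    """Map each flag key to the list of files (relative to root) that reference it."""
    refs = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for key in set(FLAG_CALL.findall(text)):
            refs.setdefault(key, []).append(str(path.relative_to(root)))
    return refs


def flag_report(refs, store, today):
    """Build rows of (key, value, age_days, file_count), oldest first.

    `store` is a hypothetical export of the flag store:
    {key: {"value": bool, "last_changed": date}}.
    Flags referenced in code but absent from the store get the value
    "MISSING" and an age of -1, which sorts them to the bottom.
    """
    rows = []
    for key in sorted(set(refs) | set(store)):
        entry = store.get(key)
        value = entry["value"] if entry else "MISSING"
        age = (today - entry["last_changed"]).days if entry else -1
        rows.append((key, value, age, len(refs.get(key, []))))
    return sorted(rows, key=lambda r: r[2], reverse=True)


def print_report(rows):
    """Print the report as a plain-text table."""
    print(f"{'flag':<20} {'value':<8} {'age_days':>8} {'files':>5}")
    for key, value, age, files in rows:
        print(f"{key:<20} {str(value):<8} {age:>8} {files:>5}")
```

Calling `print_report(flag_report(scan_flag_references("src"), store, date.today()))` from a weekly job would produce the table. Two kinds of row deserve a second look: flags with zero referencing files, which can be deleted from the store, and flags marked MISSING, which are checked in code but no longer defined anywhere.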

The tooling does not delete the flags. It surfaces the ones that can be deleted. The deletion is still a human or agent decision, and it is still a code change. But it is no longer the question "what flags should we clean up?"; it is the question "should we clean up these specific flags this week?" The second question gets answered. The first does not.

What the cleanup looks like

Removing a flag is a small, well-defined refactor. An agent handles it well when given a clear instruction.

Pick the flag. Determine the value it has been pinned to for the long term: usually true, sometimes false. Replace every reference to the flag's check with that value. Simplify the resulting conditionals: if (true) { ... } becomes the body of the if, and any else branch is deleted; if (false) { ... } is deleted, and its else branch, if there is one, becomes the surviving code. Remove the flag's definition from the flag store once no code references it. Run the tests. The diff is mechanical. Most flags can be removed in a single PR by a single contributor in under an hour.
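As a small illustration of that refactor, here is a before-and-after pair. The flag name `new_greeting`, the dictionary-based flag lookup, and both functions are made up for the example; your flag helper will look different.

```python
# Before: the flag check is still in place, although the flag has been
# on for everyone since the rollout finished.
def greeting_before(name, flags):
    if flags.get("new_greeting", False):  # hypothetical flag
        return f"Welcome back, {name}!"
    else:
        return f"Hello {name}"  # old behavior, never reached in practice


# Step 1: replace the check with the pinned value, giving `if True: ... else: ...`.
# Step 2: simplify. The `if True` body stays; the `else` branch is deleted.
def greeting_after(name):
    return f"Welcome back, {name}!"


# With the flag pinned to True, both versions behave identically.
print(greeting_before("Ada", {"new_greeting": True}))
print(greeting_after("Ada"))
```

The `flags` parameter disappears from the signature as well, so callers get simpler too. That ripple is usually the only part of the diff that is not purely local.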

The work scales. A team that removes one flag per week clears about fifty flags in a year. The cumulative effect on readability is meaningful.

First steps

If your codebase has accumulated stale flags:

Inventory them. Pull the list of every flag your team has ever defined. For each, note the current value, when it was last changed, and whether the rollout it was created for has concluded.

Sort by age, descending. The top of the list is your cleanup target. Pick the oldest flag whose rollout is clearly done, one that has been at 100% for longer than anyone remembers, and remove it. One PR. Ship it.

Add a step to your flag-creation process: every new flag has an owner and a target removal date, written into the flag's description in the flag platform. The dates do not have to be precise, but they have to exist. A flag without a removal plan is a flag that will never be removed.

Add a recurring item to your team's weekly or biweekly review: stale flag report. Look at it. Pick one to clean up. Assign it. Move on.

Add a rule to AGENTS.md: "When making changes that touch a feature flag, check whether the flag's rollout has concluded. If yes, propose removal of the flag in the same change. Do not introduce new feature flags without an owner and a target removal date in the flag's description."
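The owner-and-removal-date requirement from the steps above can be checked mechanically, for example in CI or in a pre-merge hook. The sketch below assumes a description convention of `owner: <name>` and `remove-by: YYYY-MM-DD`; that convention and the function name `parse_removal_plan` are inventions for this example, not a feature of any flag platform.

```python
import re
from datetime import date

# Assumed convention inside the flag description:
#   owner: @team-or-person   remove-by: YYYY-MM-DD
OWNER = re.compile(r"owner:\s*([\w@-]+)")
REMOVE_BY = re.compile(r"remove-by:\s*(\d{4}-\d{2}-\d{2})")


def parse_removal_plan(description):
    """Return (owner, remove_by); raise ValueError if either entry is missing."""
    owner = OWNER.search(description)
    remove_by = REMOVE_BY.search(description)
    if not owner or not remove_by:
        raise ValueError("flag description needs 'owner:' and 'remove-by:' entries")
    # date.fromisoformat also raises ValueError for impossible dates.
    return owner.group(1), date.fromisoformat(remove_by.group(1))


description = "Rollout of the new greeting. owner: @checkout-team remove-by: 2024-09-01"
owner, remove_by = parse_removal_plan(description)
print(owner, remove_by)
```

Comparing the parsed `remove_by` against today's date gives the stale flag report a second column: flags that are not just old, but past the date their own authors set.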

Feature flags are useful. The mistake is treating them as permanent fixtures rather than temporary scaffolding. Scaffolding stays up only as long as the construction needs it. After that, it is not scaffolding; it is junk in the yard. The same applies to flags.

The codebase that uses flags well is one that adds them confidently because it trusts itself to remove them when the work is done. Building that trust is a matter of running the cleanup, repeatedly, until it becomes the default rather than the exception.

DEV Community