Recently, I was chatting with some co-workers about ways we could improve our deployment pipeline. One tangent of that conversation was that feature flags could be useful when continuously deploying master, since they would let us feather in multiple features at once.
I've had some experience with flags and wanted to pull together some notes around feature flag usage. Most of this is probably not new information, but I think it is still worthwhile to write down and make a cohesive document.
Let's look at a few scenarios where feature flags are useful.
One of the most common use cases for flagging is to hide some bit of code or feature from users. This helps us keep master continuously deployable (or shippable, if you prefer transportation metaphors). Merging extensive feature work into master is dangerous, and we'd never want to hold a codebase back from deploying because of incomplete work. With flagging, we can incorporate our work into master and still keep it in a good state. Another benefit of a continuously deployable master is the ability to roll back to any previous commit easily. This gets hairy when external dependencies have changed based on master, but it is helpful most of the time.
Feature flags allow QA to test features incrementally in deployed environments, including production. If they can enable a flag for testing, they can look at in-flight cards related to a given feature, even if it is not complete. This prevents the massive deluge of testing right at the end of a feature branch that so often happens to testers.
Feature flags can also be used as a basic A/B testing solution, especially for quick validation and initial feature development. By conditionally toggling a feature on for a certain percentage of users, we can measure behavior to see how well the feature performs. This is especially powerful if you tell users they are experiencing a new feature and ask for their feedback. Usually, users are more than willing to give feedback on a new feature and extend a little more grace if something doesn't work quite right. The feeling of being an early access user also makes them feel special (or it does for me, anyway).
Closely related to A/B testing is the ability to roll out new features slowly. This makes DevOps rejoice and can aid us in the early detection of potential bugs, especially for particularly intricate features that need more attention running in a live environment. For example, think about how we talk to a third-party provider where customers have many different setup configurations. If we could slowly turn that on, we could detect any issues with our configurations' assumptions much sooner.
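A slow rollout like this can be done with a few lines of code. Here's a minimal sketch of deterministic percentage bucketing; the function and feature names are illustrative, not from any particular library. Hashing the user id together with the feature name gives each user a stable bucket, so the same users stay enabled as the percentage grows.

```python
import hashlib

def in_rollout(user_id: str, feature: str, percentage: int) -> bool:
    """Return True if this user falls inside the rollout percentage.

    The hash is stable, so a user's bucket never changes for a given
    feature; widening the percentage only adds users, never removes them.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # a stable number in 0..99
    return bucket < percentage

# Start the third-party sync at 5%, widen to 25% once we trust it.
enabled_for_me = in_rollout("user-42", "new-provider-sync", 5)
```

Because the bucketing is deterministic, bumping the percentage in config is all it takes to widen the rollout; no per-user state needs to be stored.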
So when are some good times to incorporate feature flags into your work? While they can be used at various times, two stick out as best practices that I've seen.
When beginning any substantive feature, the first thing I add to the codebase is a feature flag that changes the codebase in some small way that is still verifiable by developers and testers.
For instance, if we were going to add a new setting to our application, the first card would be a mechanism to create a toggle that can be flipped to enable that setting for a user. If the feature is enabled, we can do something simple like print a message to the console. No other code should be written to accomplish the feature.
This flag will be toggled off by default, so a developer or tester will need to toggle it on. The first verifiable chunk of work for this feature will be that the flag exists, and that behavior changes somehow when it is enabled. In your code, you will need an easy way to read the current state of the flag and alter behavior accordingly.
Make sure you have accounted for the flag being disabled as well. The beauty of flags is that, if built correctly, we can quickly shut off problematic code paths that customers are experiencing in production.
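That first card can be as small as the sketch below. The flag store here is a hypothetical in-memory stand-in (the real backing mechanism comes later); the point is the seam it creates: one function to check, one small, verifiable change in behavior, and the disabled path left untouched.

```python
# Hypothetical in-memory flag store, keyed by feature name.
# In practice this would be backed by the environment, a database, etc.
_flags: dict[str, set[str]] = {}

def enable(feature: str, user_id: str) -> None:
    """Turn a flag on for one user (what a dev or tester would do)."""
    _flags.setdefault(feature, set()).add(user_id)

def is_enabled(feature: str, user_id: str) -> bool:
    """Flags are off by default: unknown flags and users read as False."""
    return user_id in _flags.get(feature, set())

def settings_page(user_id: str) -> str:
    if is_enabled("new-setting", user_id):
        print("new-setting flag is on")  # the small, verifiable change
    return "settings"  # flag off: existing behavior, untouched
```

Everything about the real feature later hangs off `is_enabled`, which is also exactly the switch we'd flip off if the new code path misbehaves in production.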
While discussing the new feature, questions might arise about usability, proper copy usage, desired outcomes, and the like. Instead of just guessing answers or committing to an unknown path, throw in some work items for testing out your assumptions.
I don't want to get too deep into the scientific method here, as I'm sure we all learned how to experiment properly in school. I want to make two call-outs: clearly summarize your theory and expected outcomes, and make the results measurable.
When developing the theory, be clear about what should be tested and the test's intended effect. Let's say we have a button called Sync Now that very clearly syncs now. Some conversation arises around the language; do customers understand what "syncing" means? We want to test the theory that Sync Now is the most appropriate verbiage. Our theory is that Sync Now should result in more clicks than a button that says Send Records. Clicks are our intended outcome, and the button's text is the experimental variable. We now know what to build and what the flag should change.
Finally, the results should be measurable. We need to know how many people saw our different button texts and whether they clicked the button. There is plenty of off-the-shelf software that helps with this (Optimizely, for instance), but I recommend a straightforward database table approach. It does not take much overhead to record the different variables for an experiment in a table, and running SQL against it should immediately answer most questions the business has about the results. As the theories you're testing become more sophisticated, that's when looking into third-party tools becomes more necessary.
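The table approach really is this simple. Here's a sketch using Python's built-in sqlite3 (any database works the same way); the table and column names are illustrative. One row per event is enough to answer the click-through question from the Sync Now example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE experiment_events (
        experiment TEXT NOT NULL,   -- e.g. 'sync-button-copy'
        variant    TEXT NOT NULL,   -- 'Sync Now' or 'Send Records'
        user_id    TEXT NOT NULL,
        event      TEXT NOT NULL    -- 'seen' or 'clicked'
    )
""")

def record(experiment: str, variant: str, user_id: str, event: str) -> None:
    conn.execute("INSERT INTO experiment_events VALUES (?, ?, ?, ?)",
                 (experiment, variant, user_id, event))

record("sync-button-copy", "Sync Now", "u1", "seen")
record("sync-button-copy", "Sync Now", "u1", "clicked")
record("sync-button-copy", "Send Records", "u2", "seen")

# Impressions and clicks per variant: the question the business will ask.
rows = conn.execute("""
    SELECT variant,
           SUM(event = 'seen')    AS seen,
           SUM(event = 'clicked') AS clicked
    FROM experiment_events
    GROUP BY variant
    ORDER BY variant
""").fetchall()
```

When the experiment ends, the flag and the table both get deleted; nothing about this needs to outlive the test.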
Now that we know a few reasons why we're interested in feature flags, and a few points in the software lifecycle where we should consider them, let's chat about a few ways to implement them. As with A/B testing, there are plenty of third-party tools that help with this (Firebase Remote Config, ConfigCat). But because flags are relatively trivial to implement, I recommend one of these simple approaches as we're getting started.
Perhaps the simplest way to implement a flag is to read from a value in environment variables. Every language has a way to get at these, and a simple boolean stored here is relatively straightforward to add through Portainer.
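In Python this is a few lines; other languages look nearly identical. The variable name below is illustrative; the one real decision is how to parse the string into a boolean, and defaulting missing variables to off keeps production safe.

```python
import os

def flag_enabled(name: str, default: bool = False) -> bool:
    """Read a boolean feature flag from an environment variable.

    A missing variable falls back to the default, so environments
    that never set the flag (e.g. production) stay off.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")

# Set per environment (e.g. in the container config):
os.environ["ENABLE_NEW_SYNC"] = "true"
if flag_enabled("ENABLE_NEW_SYNC"):
    print("new sync path is on")
```
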
The main benefit here is that we can disable or enable the feature for entire environments. With a large feature that touches many areas of the codebase, having it always enabled in the local and QA environments is a huge benefit while testing: no one has to worry about turning it on. It's just on, and production can easily be kept off without fear of the flag being flipped accidentally.
There are a few downsides to consider. First, flags in the environment typically require a full restart of the container for changes to take effect. This does not impact lower environments as much, but could potentially be nasty in production. Likewise, if flipping between the enabled and disabled states factors into testing, going this route can cause an enormous headache. Finally, if reading from the environment isn't a paradigm your codebase uses yet, introducing it just for flags could be overkill compared to the next options.
By far, the most common way I've seen feature flags implemented is as flags in the database. A simple boolean can be added to an existing table, or a new table of feature flags can be added for a little bit of future-proofing. It then becomes relatively easy to introduce a set of hidden routes or a secret debug menu on a native app to enable and disable flags for the current user.
A large benefit of this is that multiple users can be in different experiences, and there is some permanence to the enabled features. Compared to the all-or-nothing approach of environment variables, database flags require a little more programming for a lot more flexibility. It's also easier to visualize how many feature flags are currently in use, which encourages cleaning up old flags that no longer need to be around. Finally, if we need to disable a feature for all users, we can write in database kill switches and have them flipped in any environment via a SQL command.
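Here's one possible shape for this, sketched with sqlite3 for brevity (a real schema would live in your migrations; all table and column names are illustrative): a flags table with a global kill switch, plus a per-user enablement table that the hidden routes or debug menu would write to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feature_flags (
        name        TEXT PRIMARY KEY,
        kill_switch INTEGER NOT NULL DEFAULT 0   -- 1 disables for everyone
    );
    CREATE TABLE user_feature_flags (
        user_id TEXT NOT NULL,
        name    TEXT NOT NULL,
        PRIMARY KEY (user_id, name)
    );
    INSERT INTO feature_flags (name) VALUES ('new-sync');
""")

def is_enabled(user_id: str, name: str) -> bool:
    row = conn.execute(
        "SELECT kill_switch FROM feature_flags WHERE name = ?", (name,)
    ).fetchone()
    if row is None or row[0]:          # unknown flag, or killed globally
        return False
    return conn.execute(
        "SELECT 1 FROM user_feature_flags WHERE user_id = ? AND name = ?",
        (user_id, name),
    ).fetchone() is not None

# A tester enables the flag for themselves via the debug mechanism:
conn.execute("INSERT INTO user_feature_flags VALUES ('u1', 'new-sync')")
```

The kill switch is the payoff: `UPDATE feature_flags SET kill_switch = 1 WHERE name = 'new-sync';` run in any environment shuts the feature off for everyone, no deploy required.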
One drawback is the increased time spent in the database for actions that might not have touched it previously. The more feature flag checks that run, the more time is spent hitting the database. While this is not a huge performance concern at our scale, it must be monitored and tuned appropriately. Also, the mechanism for disabling and enabling the flags, while not openly broadcast, is normally accessible from the outside. Be aware of this when deciding how to implement your flag code, and do a risk assessment of a user stumbling upon a flag and enabling it.
One final approach I've seen commonly is to stash a feature flag in the session or another temporary store such as a local device cache, browser cookies, application storage, etc. This is a great approach for quick flags whose permanence doesn't matter past a few interactions that the user has with the app. These flags can be enabled and disabled identically to database flags.
A downside to temporary storage is just that: it's temporary. Any feature that a user would start using and expect to stay around is a poor fit, since the flag can be lost when the cache is cleared or a new session is initiated. Also, most temporary storage solutions are accessible client-side or through network inspection. Keep this in mind with how you serialize your flags for the user.
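A session flag can be as thin as a key in the session object. The sketch below uses a plain dict as a stand-in for whatever session mechanism your framework provides (names are illustrative); the ephemerality is visible in the code: a fresh session means the flag is gone.

```python
# A dict stands in for the framework's session object.
def enable_flag(session: dict, name: str) -> None:
    """Turn a flag on for the remainder of this session."""
    session.setdefault("flags", set()).add(name)

def flag_on(session: dict, name: str) -> bool:
    return name in session.get("flags", set())

session: dict = {}                 # lives only as long as the session
enable_flag(session, "new-sync")
print(flag_on(session, "new-sync"))   # on for this session

session = {}                       # new session: the flag is gone
print(flag_on(session, "new-sync"))
```
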
I'm impressed you made it down here! Now that we know the basics, let's look at a few best practices that I've seen effective teams do related to feature flags.
At a previous company, we made extensive use of feature flagging, to great effect for our data analysts, product owners, developers, and customers. Our Achilles heel, however, was feature flags that in themselves became features.
While flagging is great during initial development and experimenting, the goal is to create readable, well-maintained code that is easy to change in our products. Feature flags usually run counter to this goal and should be ruthlessly eliminated once we're satisfied with the feature. We want all of our customers to have great experiences, not just a subset.
Bias toward eliminating feature flags where possible. Not only does it clean up our codebase, but it can also clean up our databases, client-side footprint, and container environments. If you don't feel like you can eliminate a flag just yet, think about how you can make its scope smaller. Can we put a smaller portion of the feature behind a new flag and turn more of it on for all users?
When introducing any new concept, repetitions are the key to the concept's stickiness. Many times I've begun work on a tiny new feature and immediately regretted not having it behind a flag. I either held up another developer from deployment or introduced unintentional effects that we wished we could have disabled without a code push.
This can seem daunting at first, or even counterproductive: some features might take less time to code than the feature flag itself, especially when a team is brand new to feature flagging and hasn't built out much of the infrastructure. Keep in mind that the impact of work on a customer is measured not by the amount of code necessary, but by the perceived change in the application's behavior.
As teams begin using feature flags more and more, a natural thought process emerges when thinking through proposed work. Some questions that should come up include: How fast can we get this out to customers? What does enabling this look like, and, equally, what would disabling it look like if there were an issue? How are we going to test this? Feature flags can help with all of these questions.
Hopefully, pitches will include the work necessary to experiment and slowly enable features for customers. A personal conviction is that a continuously deployable master that everyone integrates into is always better than long-lived feature branches merging into a stale master. To accomplish this, flagging must be part of any work breakdown from the very beginning, and we must be confident that we can deploy the codebase at any time, regardless of who is working on what feature.
Thank you for sticking through to the end of this article. I'd love to hear how you've used feature flags on your team, as well as any best practices to which you adhere. I'd love for this post to evolve as more techniques are discussed and become a great, bookmark-able resource for all developers.