Feature flags or toggles are often* used in application code to allow for rapid integration of new changes without necessarily enabling them in production.
*I have no real idea how commonly used they really are
Feature flags can prevent the dreaded long-running feature branch: the one that keeps accumulating changes you hold off merging because they aren't ready to go into production in the next release.
The same challenges can apply to your Infrastructure as Code (IaC). Changes to shared resources that you want to test in a lower environment, but aren't ready to promote to production, can be a real problem.
Consider this scenario:
- You're making changes to your network in a cloud environment.
- You've got your changes in a development environment and everything seems to be working, but you can't go to production until the next maintenance window because it's going to cause an outage.
- Then you suddenly have to make a different change to that same IaC template that has to go into production immediately, but you want to test it before you apply it to production.
Well, now you've got a problem because development has your other changes in it. You're going to have to branch your template for production, and then either apply the new change on top of what's already in development or roll back your other changes in order to test it.
🤷 Either way it's not ideal.
IaC tools have conditional logic of varying degrees of sophistication. This might be Conditions in CloudFormation, Conditional Expressions in Terraform or the full power of a programming language using the AWS Cloud Development Kit (CDK).
Once you've got conditional logic, you can turn parts of your infrastructure on or off, or configure them one way or another, based upon some kind of input. The easiest option here is to have a parameter to your template act as your feature flag.
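As a minimal sketch of the idea, the Python below builds a CloudFormation-style template as a plain dictionary, with a boolean input acting as the flag. The function name `build_network_template`, the flag `enable_new_subnet`, and the resource names and CIDR ranges are all made up for illustration; a real template would use whatever conditional mechanism your IaC tool provides.

```python
def build_network_template(enable_new_subnet: bool = False) -> dict:
    """Return a CloudFormation-style template as a plain dict.

    Hypothetical example: resource names and CIDR ranges are placeholders.
    """
    resources = {
        "Vpc": {
            "Type": "AWS::EC2::VPC",
            "Properties": {"CidrBlock": "10.0.0.0/16"},
        },
        "ExistingSubnet": {
            "Type": "AWS::EC2::Subnet",
            "Properties": {"VpcId": {"Ref": "Vpc"}, "CidrBlock": "10.0.0.0/24"},
        },
    }

    # The feature flag: the new subnet only exists when the flag is on.
    if enable_new_subnet:
        resources["NewSubnet"] = {
            "Type": "AWS::EC2::Subnet",
            "Properties": {"VpcId": {"Ref": "Vpc"}, "CidrBlock": "10.0.1.0/24"},
        }

    return {"Resources": resources}


flag_off = build_network_template(enable_new_subnet=False)
flag_on = build_network_template(enable_new_subnet=True)
print(sorted(flag_off["Resources"]))  # ['ExistingSubnet', 'Vpc']
print(sorted(flag_on["Resources"]))   # ['ExistingSubnet', 'NewSubnet', 'Vpc']
```

The flag defaults to off, so merging this change does nothing to an environment until someone explicitly turns it on.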
Your goal with the feature flags is to be able to merge changes and then use the flags to enable the features progressively in environments while still allowing additional changes to be made and tested.
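One way to picture progressive enablement is a small table of flag values per environment. The sketch below is only an assumption about how you might store them (the environment names and the flags `enable_new_subnet` and `enable_urgent_fix` are hypothetical); in practice the values might live in parameter files or pipeline variables instead.

```python
# Hypothetical flag settings per environment. The urgent fix is on everywhere,
# while the new subnet waits for the production maintenance window.
FEATURE_FLAGS = {
    "development": {"enable_new_subnet": True, "enable_urgent_fix": True},
    "production": {"enable_new_subnet": False, "enable_urgent_fix": True},
}


def is_enabled(environment: str, flag: str) -> bool:
    """Return the flag value, defaulting to off for anything unknown."""
    return FEATURE_FLAGS.get(environment, {}).get(flag, False)


for env in FEATURE_FLAGS:
    enabled = sorted(f for f in FEATURE_FLAGS[env] if is_enabled(env, f))
    print(env, enabled)
```

Both changes live in the same merged template; only the flag values differ between environments.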
With feature flags in application software, the new code that is not yet enabled is actually deployed to production; it's just not active. With IaC it's a little different, because your new change may create additional resources (or delete old ones).
There are a few factors to consider when you decide whether your flag should control the existence of resources or just whether they're configured to be used:
- Are you adding new resources with a significant cost before you know when they're going to be used?
- Can the change exist in parallel with existing resources?
- Does pre-creating new resources make the eventual switchover faster and less risky?
In other words, it's going to depend upon the situation.
If the resources are cheap (perhaps even free when not in use) and can co-exist with what's already there, then you probably don't need to bother with a flag at all and can just go ahead and create the new resources.
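To make the difference between "exists" and "is used" concrete, here is a rough sketch with two separate flags: one that pre-creates a replacement NAT gateway and one that points the default route at it. The flag names, resource names, and the `route_target` helper are hypothetical, and the resource properties are trimmed so far that this is not a deployable template.

```python
def build_routing_template(create_new_gateway: bool, use_new_gateway: bool) -> dict:
    """Sketch of a template with separate 'exists' and 'in use' flags."""
    if use_new_gateway and not create_new_gateway:
        raise ValueError("use_new_gateway requires create_new_gateway")

    # Properties omitted for brevity; a real NAT gateway needs more than a Type.
    resources = {"OldNatGateway": {"Type": "AWS::EC2::NatGateway"}}

    # Flag 1 controls existence: pre-create the new gateway next to the old one.
    if create_new_gateway:
        resources["NewNatGateway"] = {"Type": "AWS::EC2::NatGateway"}

    # Flag 2 controls configuration: which gateway the default route points at.
    target = "NewNatGateway" if use_new_gateway else "OldNatGateway"
    resources["DefaultRoute"] = {
        "Type": "AWS::EC2::Route",
        "Properties": {
            "DestinationCidrBlock": "0.0.0.0/0",
            "NatGatewayId": {"Ref": target},
        },
    }
    return {"Resources": resources}


def route_target(template: dict) -> str:
    """Return the logical name of the gateway the default route uses."""
    return template["Resources"]["DefaultRoute"]["Properties"]["NatGatewayId"]["Ref"]


pre_created = build_routing_template(create_new_gateway=True, use_new_gateway=False)
switched = build_routing_template(create_new_gateway=True, use_new_gateway=True)
print(route_target(pre_created))  # OldNatGateway
print(route_target(switched))     # NewNatGateway
```

Splitting the flags like this only pays off when pre-creating the resource makes the switchover quicker or safer; for something expensive that sits idle, a single flag that creates and uses it in one step may be the better trade.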
You can often test your changes by going ahead and deploying them in a non-production environment. Depending upon the change, you may be able to deploy it in parallel with existing resources and not break anything if you get it wrong. But certain changes do affect what's already running (going back to our network changes), and some changes are only applied in production.
Depending upon the IaC tool you are using, there are different levels of testing you can perform without deploying anything.
With CDK you can write regular unit tests that can easily be automated to ensure you create the correct infrastructure, and you can vary your inputs (your feature flags) to ensure they do the right thing when turned on or off. This can give you the confidence to push your new template to production and know that it won't change anything unexpectedly until you toggle your flag 🚩.
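Here is a rough idea of what such tests could look like. To keep it runnable with nothing but the standard library, it tests a hypothetical `build_network_template` function that returns a plain dictionary rather than a real CDK stack; in a CDK project you would assert against the synthesized template instead, but the shape of the tests is the same.

```python
import unittest


def build_network_template(enable_new_subnet: bool = False) -> dict:
    """Hypothetical stand-in for a stack: returns a template-like dict."""
    resources = {
        "ExistingSubnet": {
            "Type": "AWS::EC2::Subnet",
            "Properties": {"CidrBlock": "10.0.0.0/24"},
        },
    }
    if enable_new_subnet:
        resources["NewSubnet"] = {
            "Type": "AWS::EC2::Subnet",
            "Properties": {"CidrBlock": "10.0.1.0/24"},
        }
    return {"Resources": resources}


class FeatureFlagTests(unittest.TestCase):
    def test_flag_defaults_to_off(self):
        self.assertEqual(
            build_network_template(),
            build_network_template(enable_new_subnet=False),
        )

    def test_flag_off_creates_nothing_new(self):
        resources = build_network_template(enable_new_subnet=False)["Resources"]
        self.assertNotIn("NewSubnet", resources)

    def test_flag_on_adds_subnet_and_leaves_existing_alone(self):
        off = build_network_template(enable_new_subnet=False)["Resources"]
        on = build_network_template(enable_new_subnet=True)["Resources"]
        self.assertIn("NewSubnet", on)
        self.assertEqual(on["ExistingSubnet"], off["ExistingSubnet"])


# Run the suite in-process so the outcome can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FeatureFlagTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project you would more likely let `python -m unittest` or your usual test runner discover the test class rather than building the suite by hand.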
As I'm writing this, I'm doing a good job of convincing myself that I should look more seriously into using CDK for new projects.
Your level of confidence and familiarity with techniques like unit testing is going to depend upon your background, but robust testing is just as important for your infrastructure code as it is for your application code.
There is some conditional logic in your IaC that will always be there to handle differences between environments.
Feature flags are different - they should be transient and cleaned up once the flag has been turned on in every environment.
Once again, unit testing is really useful here, as you can ensure that removing the flag won't make unexpected changes.
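A simple check for the cleanup step, sketched under the same kind of assumptions (hypothetical function and resource names, templates modelled as plain dictionaries): the template produced after the flag is removed should be identical to the one produced with the flag turned on.

```python
def build_with_flag(enable_new_subnet: bool) -> dict:
    """The template while the (hypothetical) flag still exists."""
    resources = {"ExistingSubnet": {"Type": "AWS::EC2::Subnet"}}
    if enable_new_subnet:
        resources["NewSubnet"] = {"Type": "AWS::EC2::Subnet"}
    return {"Resources": resources}


def build_after_cleanup() -> dict:
    """The same template once the flag is deleted and the subnet is permanent."""
    return {
        "Resources": {
            "ExistingSubnet": {"Type": "AWS::EC2::Subnet"},
            "NewSubnet": {"Type": "AWS::EC2::Subnet"},
        }
    }


# Removing the flag is safe if it changes nothing compared with "flag on".
cleanup_is_safe = build_after_cleanup() == build_with_flag(enable_new_subnet=True)
print(cleanup_is_safe)  # True
```

If the comparison fails, the cleanup has changed more than just the flag, and you find that out before anything is deployed.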
Using feature flags in your Infrastructure as Code can allow you to merge changes and promote them to production without updating your infrastructure until you are ready. This can reduce the number of conflicting changes in your templates and avoid problems with the timing of when particular changes go live.
I wrote this article because I've been struggling with exactly the problem I described with changes in development that aren't ready to go to production.
It seems to me now like an obvious solution, but I wasn't using it before, and I couldn't find a lot of articles suggesting this approach.
Let me know in the comments if you've already been doing something like this.