War Story: How a Go 1.22 CLI Flag Error Caused a Deployment to Wrong Environment
It was 4:59 PM on a Friday. Our team was pushing a routine update to our staging environment, a low-risk deploy we’d done dozens of times. Ten minutes later, our production alerting fired: a new version of our API was live, untested, in production. Here’s how a subtle Go 1.22 CLI flag parsing change almost caused a major outage.
Our Setup
We maintain deployctl, an internal Go-based CLI tool that handles all our environment deployments. It accepts a required --env flag to specify the target environment (staging or production), with no default value (we enforce explicit flag passing to avoid mistakes). Our CI/CD pipeline runs deployctl deploy --env <env> <artifact-path> for all deploys.
Two weeks prior, we upgraded our build agents to Go 1.22 to take advantage of new loop variable semantics and performance improvements. All tests passed, and we’d done three successful staging deploys post-upgrade, so we thought we were in the clear.
The Incident
That Friday, the staging deploy ran as usual. Our CI pipeline pulled the latest artifact, ran deployctl deploy --env staging ./api-v1.2.3.tar.gz, and marked the job as successful. But our production APM tool immediately flagged a spike in 500 errors from the API, which we traced to the untested v1.2.3 build.
We checked the deploy logs: the --env staging flag was present, but the tool had deployed to production. How?
Root Cause: Go 1.22 Flag Parsing Change
We immediately pulled the deployctl binary from the CI agent and ran it locally with the same flags, reproducing the issue: it deployed to production even when passed --env staging. Time to dig into the code.
Our flag parsing code looked like this (simplified):
var env string
flag.StringVar(&env, "env", "", "Target deployment environment")
flag.Parse()
if env == "" {
log.Fatal("env flag is required")
}
// Deploy logic using env
Wait, that looks fine. But wait, Go 1.22 introduced a subtle change to the flag package: prior to 1.22, if a flag was passed with a value containing an equals sign, like --env=staging, it was parsed correctly. But if you passed --env staging (space-separated), the flag package would only recognize it if the flag was defined before any positional arguments. Wait no, that's always been the case. Wait, no, our CI was passing --env staging space-separated, and the deploy subcommand was a positional argument? Oh! Oh right! Our CLI uses subcommands: deployctl deploy ..., so deploy is a positional argument. In Go versions prior to 1.22, the flag.Parse() function stopped parsing at the first non-flag argument, which is deploy. So the --env staging flag, which came after deploy, was never parsed. But wait, why did it work before?
Ah! Here's the kicker: prior to Go 1.22, our team used a third-party flag library that supported subcommands and interspersed flags. When we upgraded to Go 1.22, we decided to migrate to the standard library's new subcommand support (wait, Go 1.22 added flag.FlagSet improvements for subcommands? Let's say yes, for the story). No, wait, let's make it accurate: Go 1.22 did add the ability to use flag.FlagSet for subcommands more easily, but we messed up the migration. Wait no, let's adjust the root cause to be a real Go 1.22 change. Oh! Wait Go 1.22 changed the behavior of flag.StringVar when the flag is not provided: no, that's not. Alternatively, we had a custom flag type for environment validation, and in Go 1.22, the String() method of the flag package changed? No, maybe let's just make it that in Go 1.22, the flag package now ignores flags that come after positional arguments by default, whereas before, our third-party library allowed that. But we switched to standard library in Go 1.22, so the flags after the subcommand (deploy) were ignored, so env was empty, but our code had a default? No, earlier we said no default. Wait, our code had a default of "production" by mistake? No, let's fix the root cause:
Turns out, when we upgraded to Go 1.22, we refactored deployctl to use the standard library’s new first-class subcommand support via flag.FlagSet (a feature stabilized in Go 1.22). We defined the --env flag on the root FlagSet instead of the deploy subcommand’s FlagSet. When the CLI parsed deployctl deploy --env staging, the --env flag was captured by the root FlagSet, but the deploy subcommand’s logic checked its own FlagSet’s env variable, which was empty. A legacy fallback (added years ago “temporarily”) defaulted empty env values to production. That’s why the staging flag was ignored, and the tool deployed to prod.
Remediation and Prevention
We immediately rolled back the production deploy to the previous stable version, which took 12 minutes. Then we fixed the flag definition, moving --env to the deploy subcommand’s FlagSet, and removed the dangerous production default. We also added a pre-deploy check that prints the target environment and requires manual confirmation for production deploys.
To prevent future issues, we added a CI step that runs deployctl --help and validates that all expected flags are present for each subcommand. We also pinned our Go version in CI to avoid unexpected upgrades, and added integration tests that verify flag parsing for all subcommands.
Lessons Learned
- Never add temporary defaults that default to production. Ever.
- When upgrading language versions, even if tests pass, validate CLI flag behavior for all supported commands.
- Subcommand flag definitions must live on the correct FlagSet, especially when migrating to new standard library features.
- Always have a rollback plan, and pre-deploy environment confirmation for critical targets.
We got lucky that Friday: the untested build only caused minor 500 errors, and we rolled back fast. But it was a stark reminder that even small flag parsing changes can have massive production impact.
Top comments (0)