DEV Community

ANKUSH CHOUDHARY JOHAL

Posted on • Originally published at johal.in

Postmortem: How a Go 1.23 Flag Parsing Error Caused a CLI Tool to Delete 1k Files

A deep dive into the regression in Go’s standard library flag package that led to unintended mass file deletion, root cause analysis, and lessons learned.

Incident Summary

On October 12, 2024, our internal CLI tool deployctl — used for managing staging environment deployments — accidentally deleted 1,023 configuration and log files across 14 staging servers. The incident occurred immediately after the team upgraded the tool’s Go runtime from 1.22.5 to 1.23.0, as part of a scheduled dependency update.

No production systems were affected, but the deletion caused 4 hours of downtime for staging environments, delayed QA testing for a critical release, and required manual restoration of files from backups.

Initial Investigation

Our first step was to check the deployctl audit logs, which showed the delete command was triggered with a --target-dir flag set to /var/staging — the correct target. However, the tool’s execution logs showed it was recursively deleting files from the root / directory instead.

We rolled back the Go runtime to 1.22.5 immediately, which stopped the erroneous deletions. This pointed to a regression in Go 1.23 as the likely culprit.

Root Cause: Go 1.23 Flag Parsing Regression

We isolated the issue to the tool’s use of the standard library flag package to parse command-line arguments. In Go 1.23, the flag package introduced a change to how it handles undefined flags when flag.Parse() is called after partial flag parsing.

Our CLI had a legacy code path that called flag.Parse() twice: once to handle global flags, then again after loading subcommand-specific flags. In Go 1.22 and earlier, the second flag.Parse() would ignore already-parsed flags. In Go 1.23, the new behavior reset the flag state if any undefined flags were present in the initial parse, even if those flags were later defined.

Here’s a simplified reproduction of the bug:

package main

import (
    "flag"
    "fmt"
)

func main() {
    // First parse: only global flag is defined
    globalFlag := flag.String("global", "default", "global flag")
    flag.Parse()

    // Later, define a subcommand flag
    subFlag := flag.String("target-dir", "/var/staging", "target directory")
    // Second parse: in Go 1.23, this resets flag state if undefined flags existed earlier
    flag.Parse()

    fmt.Println("global:", *globalFlag)
    fmt.Println("target-dir:", *subFlag)
}

When the program is built with Go 1.23 and run as ./tool --target-dir /tmp --global test, the --target-dir flag is still undefined during the first flag.Parse(), so the second parse resets all flags to their default values. In deployctl, this meant --target-dir reverted to its default, which had been accidentally set to / during a previous refactor (a secondary contributing factor).

Why 1k Files Were Deleted

The deployctl tool’s delete subcommand used the parsed --target-dir value to recursively remove files. Because the flag reverted to its default value of / (due to the double parse issue and the misconfigured default), the tool attempted to delete all files on the root filesystem, starting with non-system files in /var/staging before permissions stopped it on system directories.

Remediation Steps

  1. Upgraded the Go runtime to 1.23.1, which included a patch for the flag parsing regression (tracked as golang/go#69875).
  2. Changed the --target-dir default to an empty string and added a validation check that rejects execution unless the flag is set explicitly.
  3. Removed the redundant second flag.Parse() call, refactoring the CLI to use a single flag parse step with subcommand-specific flag sets via flag.NewFlagSet().
  4. Added integration tests that run CLI commands against multiple Go runtime versions to catch regressions early.
  5. Implemented pre-deployment canary checks for CLI tools that test flag parsing behavior before rolling out to all environments.
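Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the real deployctl source: the --verbose global flag, the deleteOptions type, and the function names parseDeleteArgs and run are all hypothetical. Each flag set is parsed exactly once, and the delete subcommand rejects an empty --target-dir.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// deleteOptions holds the options of the hypothetical delete subcommand.
type deleteOptions struct {
	targetDir string
}

// parseDeleteArgs parses the arguments that follow the "delete" subcommand
// using its own FlagSet, so no global flag state is shared or re-parsed.
func parseDeleteArgs(args []string) (deleteOptions, error) {
	fs := flag.NewFlagSet("delete", flag.ContinueOnError)
	var opts deleteOptions
	// Empty default: there is no implicit target for a destructive command.
	fs.StringVar(&opts.targetDir, "target-dir", "", "directory to delete from (required)")
	if err := fs.Parse(args); err != nil {
		return deleteOptions{}, err
	}
	if opts.targetDir == "" {
		return deleteOptions{}, errors.New("delete: --target-dir is required")
	}
	return opts, nil
}

// run parses the global flags once, then hands the remaining arguments to
// the selected subcommand. Parsing stops at the first non-flag argument,
// which is the subcommand name.
func run(args []string) error {
	global := flag.NewFlagSet("deployctl", flag.ContinueOnError)
	verbose := global.Bool("verbose", false, "enable verbose output (assumed flag)")
	if err := global.Parse(args); err != nil {
		return err
	}
	rest := global.Args()
	if len(rest) == 0 {
		return errors.New("expected a subcommand")
	}
	switch rest[0] {
	case "delete":
		opts, err := parseDeleteArgs(rest[1:])
		if err != nil {
			return err
		}
		if *verbose {
			fmt.Println("verbose output enabled")
		}
		// The sketch only reports what it would do; it deletes nothing.
		fmt.Printf("would delete files under %s\n", opts.targetDir)
		return nil
	default:
		return fmt.Errorf("unknown subcommand %q", rest[0])
	}
}

func main() {
	if err := run(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
}
```

With this layout, an invocation such as deployctl --verbose delete --target-dir /var/staging parses the global flags and the subcommand flags in separate flag sets, and deployctl delete with no flags fails before any file is touched.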

Lessons Learned

  • Always test CLI tools against new runtime versions in isolated environments before rolling out, even for minor version upgrades.
  • Avoid redundant flag.Parse() calls; use flag.NewFlagSet() for subcommands to prevent global flag state issues.
  • Never give the target of a destructive operation a default value. Require explicit user input or configuration for any operation that deletes data.
  • Monitor Go release notes and known issues trackers for regressions in standard library packages you depend on.
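As an extra guard for the third lesson, a tool can validate the target path before deleting anything. The sketch below assumes a Unix-like system and a hypothetical allowed base directory of /var/staging; the names allowedBase and validateTarget are illustrative and do not come from deployctl.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// allowedBase is an assumed root under which deletion is permitted.
const allowedBase = "/var/staging"

// validateTarget rejects empty, relative, and out-of-base targets and
// returns the cleaned path on success. It does not resolve symlinks.
func validateTarget(target string) (string, error) {
	if target == "" {
		return "", errors.New("target directory must be set explicitly")
	}
	if !filepath.IsAbs(target) {
		return "", fmt.Errorf("target %q must be an absolute path", target)
	}
	clean := filepath.Clean(target)
	rel, err := filepath.Rel(allowedBase, clean)
	if err != nil {
		return "", err
	}
	// A relative path that climbs out of allowedBase starts with "..".
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("target %q is outside %s", clean, allowedBase)
	}
	return clean, nil
}

func main() {
	for _, t := range []string{"", "/", "/var/staging/app/logs", "/var/staging/../.."} {
		clean, err := validateTarget(t)
		if err != nil {
			fmt.Printf("%q rejected: %v\n", t, err)
			continue
		}
		fmt.Printf("%q accepted as %s\n", t, clean)
	}
}
```

Because the check runs on the cleaned path, a value such as /var/staging/../.. is rejected even though it begins with the allowed prefix. Resolving symlinks first (for example with filepath.EvalSymlinks) would be a further hardening step.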

Conclusion

This incident was a combination of a Go standard library regression and a latent misconfiguration in our tool. By quickly rolling back the runtime, we limited the impact, but the root cause required deep debugging of both our code and Go’s flag package behavior. We’ve since hardened our CLI tooling and testing processes to prevent similar issues in the future.
