Build a Dynamic Feature Flag System in Go: Real-Time Control, A/B Testing, and Zero Redeployment

Nithin Bharadwaj


I want to talk about a way to change how your software works without the pain of redeploying it. Imagine you could turn features on and off like lights in a room, test different versions of a button on your website with real users, and do it all instantly, while your application is running. That's what a dynamic feature flag system lets you do.

Let me build one with you in Go, step by step. We'll make it fast, capable of updating in real-time, and smart enough to run experiments. This isn't just a simple on/off switch. It's a full control panel for your application's behavior.

Think of a feature flag as a rulebook. Instead of your code deciding what to do, it asks the flag system: "Should I show the new checkout page to user X?" The system checks the rules—like "Is the user in the US?" or "Are we rolling this out to 10% of users?"—and gives a yes or no answer. The code just follows instructions.

The heart of our system is a manager that coordinates everything. I'll call it FeatureFlagManager. It's like a conductor for an orchestra, making sure all the parts work together.

```go
type FeatureFlagManager struct {
    store       *FlagStore         // Where flag rules live
    evaluator   *FlagEvaluator     // Decides yes/no for a user
    notifier    *UpdateNotifier    // Sends live updates
    experiments *ExperimentTracker // Runs A/B tests
    stats       *FlagStats         // Tracks performance
}
```

First, we need a place to keep all our flag configurations. I'll create a FlagStore. It's a simple in-memory map protected by a mutex for safe concurrent access. When we update a flag, we'll lock the store, change the map, and release the lock.

```go
type FlagStore struct {
    flags    map[string]*FeatureFlag
    versions map[string]int64
    mu       sync.RWMutex
}
```
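The lock-change-release update described above isn't shown explicitly. Here is a minimal sketch of what that method might look like, with a trimmed-down FeatureFlag so the snippet runs on its own (the full struct is defined below):

```go
package main

import (
    "fmt"
    "sync"
    "time"
)

// Trimmed-down flag type for this sketch; the full struct appears later.
type FeatureFlag struct {
    Key       string
    Enabled   bool
    UpdatedAt time.Time
}

type FlagStore struct {
    flags    map[string]*FeatureFlag
    versions map[string]int64
    mu       sync.RWMutex
}

// Update takes the write lock, swaps the flag in, and bumps its version.
func (fs *FlagStore) Update(flag *FeatureFlag) {
    fs.mu.Lock()
    defer fs.mu.Unlock()
    flag.UpdatedAt = time.Now()
    fs.flags[flag.Key] = flag
    fs.versions[flag.Key]++
}

func main() {
    store := &FlagStore{
        flags:    make(map[string]*FeatureFlag),
        versions: make(map[string]int64),
    }
    store.Update(&FeatureFlag{Key: "new_dashboard", Enabled: true})
    store.Update(&FeatureFlag{Key: "new_dashboard", Enabled: false})
    fmt.Println(store.versions["new_dashboard"]) // prints 2
}
```

The versions map is what makes later features, like reconnect catch-up, possible: every write leaves a monotonically increasing marker behind.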

What does a flag look like? It's a struct that defines the rules of the game.

```go
type FeatureFlag struct {
    Key         string
    Description string
    Enabled     bool
    Rules       []TargetingRule    // Specific user targeting
    Rollout     *PercentageRollout // Gradual percentage rollout
    Variants    []Variant          // For A/B testing
    UpdatedAt   time.Time
}
```

Now for the central question: how do we decide whether a flag is on for a specific user? That's the job of the FlagEvaluator. It takes a user's context—like their user ID, country, or subscription plan—and runs through the flag's rules.

The process is straightforward. First, check if the flag exists. Then, see if there's a manual override set by an engineer for debugging. After that, go through each targeting rule in order. If the user matches a rule, return that rule's decision. If no rules match, check if they fall into a percentage rollout. Finally, if it's an A/B test, assign them a variant.

```go
func (ffm *FeatureFlagManager) Evaluate(ctx context.Context, flagKey string, userContext map[string]interface{}) (bool, map[string]interface{}, error) {
    // 1. Get the flag from the store
    ffm.store.mu.RLock()
    flag, exists := ffm.store.flags[flagKey]
    ffm.store.mu.RUnlock()

    if !exists {
        return false, nil, nil // Flag doesn't exist
    }

    // 2. Check for a manual override (useful for debugging)
    if override, hasOverride := ffm.evaluator.overrides[flagKey]; hasOverride {
        return override, nil, nil
    }

    // 3. Evaluate targeting rules in order; the first match wins
    for _, rule := range flag.Rules {
        if ffm.evaluateRule(rule, userContext) {
            return rule.Enabled, map[string]interface{}{"matched_rule": rule.Name}, nil
        }
    }

    // 4. Check percentage rollout
    if flag.Rollout != nil {
        if userID, hasUser := userContext["user_id"].(string); hasUser {
            bucket := ffm.computeBucket(flagKey, userID, flag.Rollout.Salt)
            if bucket <= flag.Rollout.Percentage {
                return true, map[string]interface{}{"rollout_bucket": bucket}, nil
            }
        }
    }

    // 5. If the flag defines variants, this is an A/B test: assign one
    if len(flag.Variants) > 0 {
        if userID, hasUser := userContext["user_id"].(string); hasUser {
            variant := ffm.selectVariant(flagKey, userID, flag.Variants)
            return variant.Enabled, map[string]interface{}{"variant": variant.Key}, nil
        }
    }

    // 6. Default to the flag's global setting
    return flag.Enabled, nil, nil
}
```
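The TargetingRule type and the evaluateRule helper used above are never defined in the article. One plausible shape, assuming a simple attribute-in-set matcher (every field beyond Name and Enabled is my guess), is:

```go
package main

import "fmt"

// One possible shape for a targeting rule: match a context attribute
// against a set of allowed values. Field names beyond Name and Enabled
// are assumptions; the article only shows those two being used.
type TargetingRule struct {
    Name      string
    Attribute string   // e.g. "country"
    Operator  string   // "in" is the only operator this sketch supports
    Values    []string // e.g. []string{"US", "CA"}
    Enabled   bool     // the decision to return when the rule matches
}

// evaluateRule reports whether the user's context satisfies the rule.
func evaluateRule(rule TargetingRule, context map[string]interface{}) bool {
    value, ok := context[rule.Attribute].(string)
    if !ok {
        return false // attribute missing or not a string: no match
    }
    switch rule.Operator {
    case "in":
        for _, v := range rule.Values {
            if v == value {
                return true
            }
        }
    }
    return false
}

func main() {
    rule := TargetingRule{
        Name: "us_only", Attribute: "country",
        Operator: "in", Values: []string{"US"}, Enabled: true,
    }
    fmt.Println(evaluateRule(rule, map[string]interface{}{"country": "US"})) // true
    fmt.Println(evaluateRule(rule, map[string]interface{}{"country": "DE"})) // false
}
```

A production system would add more operators (prefix match, numeric comparison, regex), but the shape stays the same: a rule names an attribute, a test, and the decision to return on a match.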

Let's talk about that computeBucket function. It's how we do percentage rollouts consistently. We need to ensure the same user always gets the same result for the same flag. I use a hash function.

```go
func (ffm *FeatureFlagManager) computeBucket(flagKey, userID, salt string) float64 {
    // Combine flag key, user ID, and salt into one string
    hashInput := fmt.Sprintf("%s:%s:%s", flagKey, userID, salt)
    // Hash it with SHA-256
    hash := sha256.Sum256([]byte(hashInput))

    // Interpret the first 4 bytes as a 32-bit unsigned integer
    bucketNumber := float64(uint32(hash[0])<<24 | uint32(hash[1])<<16 | uint32(hash[2])<<8 | uint32(hash[3]))
    // Scale it down to a percentage between 0 and 100
    bucket := (bucketNumber / math.MaxUint32) * 100

    return bucket
}
```

If the user's bucket number is less than or equal to our rollout percentage (say, 10%), they see the feature. This method is deterministic. User "Alice" will always hash to the same bucket for the "new_dashboard" flag.
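To see the determinism for yourself, here is a standalone copy of the bucketing logic (binary.BigEndian.Uint32 reads the first four hash bytes exactly like the manual shifts above):

```go
package main

import (
    "crypto/sha256"
    "encoding/binary"
    "fmt"
    "math"
)

// Standalone copy of the bucketing logic from the manager method.
func computeBucket(flagKey, userID, salt string) float64 {
    hash := sha256.Sum256([]byte(fmt.Sprintf("%s:%s:%s", flagKey, userID, salt)))
    return float64(binary.BigEndian.Uint32(hash[:4])) / math.MaxUint32 * 100
}

func main() {
    a := computeBucket("new_dashboard", "alice", "v1")
    b := computeBucket("new_dashboard", "alice", "v1")
    fmt.Println(a == b)             // same inputs always hash to the same bucket
    fmt.Println(a >= 0 && a <= 100) // and the bucket is a percentage
}
```

The salt matters: changing it reshuffles every user into a new bucket, which lets you re-randomize a rollout without renaming the flag.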

Now, what if we want to run an A/B test? We need to show 50% of users a blue button (control) and 50% a red button (treatment). For that, we use variants.

```go
type Variant struct {
    Key     string
    Weight  float64 // Percentage of traffic
    Enabled bool
    Payload map[string]interface{} // Extra data, like button color
}
```

The selection logic is similar to the rollout bucket but uses weights.

```go
func (ffm *FeatureFlagManager) selectVariant(flagKey, userID string, variants []Variant) *Variant {
    hashInput := fmt.Sprintf("%s:%s:variants", flagKey, userID)
    hash := sha256.Sum256([]byte(hashInput))
    // A deterministic number in [0, 1] for this user and flag
    selection := float64(uint32(hash[0])<<24|uint32(hash[1])<<16|uint32(hash[2])<<8|uint32(hash[3])) / math.MaxUint32

    cumulativeWeight := 0.0
    for i := range variants {
        cumulativeWeight += variants[i].Weight
        // If our selection falls into this variant's weight range, pick it
        if selection*100 < cumulativeWeight {
            return &variants[i] // index into the slice, not the loop variable
        }
    }
    return &variants[0] // Fallback (weights should sum to 100, so we rarely get here)
}
```

This is the core evaluation logic. But calling this function directly for every user request would be slow. We need to be smart about performance. Let's add caching.

I'll create an EvaluationCache that stores recent decisions. When a user asks about a flag, we first check the cache. The key is a combination of the flag key and the user's context.

```go
type EvaluationCache struct {
    entries map[string]*EvaluationResult
    lru     *LRUList
    maxSize int
    mu      sync.RWMutex
}

func (ec *EvaluationCache) Get(key string) (*EvaluationResult, bool) {
    ec.mu.RLock()
    result, exists := ec.entries[key]
    ec.mu.RUnlock()

    if exists {
        ec.lru.Touch(key) // Mark as recently used (LRUList must do its own locking)
        return result, true
    }
    return nil, false
}
```
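The article doesn't show how that cache key is built from the flag key and the user's context. One workable approach (buildCacheKey is my own name) is to sort the context keys so map iteration order can never produce two different keys for the same context:

```go
package main

import (
    "fmt"
    "sort"
    "strings"
)

// buildCacheKey produces a stable string from a flag key plus the
// user's context. Sorting the keys is essential: Go map iteration
// order is randomized, so an unsorted join would defeat the cache.
func buildCacheKey(flagKey string, context map[string]interface{}) string {
    keys := make([]string, 0, len(context))
    for k := range context {
        keys = append(keys, k)
    }
    sort.Strings(keys)

    var sb strings.Builder
    sb.WriteString(flagKey)
    for _, k := range keys {
        fmt.Fprintf(&sb, "|%s=%v", k, context[k])
    }
    return sb.String()
}

func main() {
    ctx := map[string]interface{}{"user_id": "user_42", "country": "US"}
    fmt.Println(buildCacheKey("new_checkout", ctx))
    // new_checkout|country=US|user_id=user_42
}
```

One caveat: keying on the full context means two requests differing in any attribute miss each other's cache entries. In practice you might key only on the attributes the flag's rules actually reference.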

But there's a problem. What if 1000 requests for the same user and flag arrive at the exact same time before the cache is populated? They would all miss the cache and all start evaluating the flag. That's wasteful.

We can use a pattern called singleflight. It ensures that only one evaluation happens for duplicate simultaneous requests. The others wait for the result.

```go
// Inside the evaluator
type FlagEvaluator struct {
    cache       *EvaluationCache
    flightGroup singleflight.Group // Prevents duplicate work
    overrides   map[string]bool
}

// In the Evaluate method
result, err, _ := ffm.evaluator.flightGroup.Do(cacheKey, func() (interface{}, error) {
    return ffm.evaluateFlag(flagKey, context)
})
```

Now our system is fast, but what about updates? When an engineer changes a flag from 10% to 20% rollout, we need all servers to know immediately. This is where real-time updates come in.

I'll create an UpdateNotifier that uses WebSockets to push changes to connected clients. Each running instance of our application connects to the notifier.

```go
type UpdateNotifier struct {
    clients   map[string]*ClientConnection
    broadcast chan ConfigUpdate
    mu        sync.RWMutex
}

func (ffm *FeatureFlagManager) UpdateFlag(flag *FeatureFlag) error {
    // ... update the store ...

    // Notify all connected clients
    ffm.notifier.broadcast <- ConfigUpdate{
        Type:    UpdateTypeFlagChanged,
        FlagKey: flag.Key,
        Version: ffm.store.versions[flag.Key],
    }

    // Clear any cached decisions for this flag
    ffm.evaluator.cache.InvalidatePrefix(flag.Key)

    return nil
}
```

The notifier has a broadcast channel. When a flag updates, we send a message to this channel. A separate goroutine reads from this channel and sends the update to every connected client.

```go
func (n *UpdateNotifier) StartBroadcasting() {
    for update := range n.broadcast {
        n.mu.RLock()
        msg, _ := json.Marshal(update)
        for _, client := range n.clients {
            select {
            case client.Send <- msg:
                // Message sent
            default:
                // Client's buffer is full; consider disconnecting it
            }
        }
        n.mu.RUnlock()
    }
}
```

Each client connection runs two goroutines: one to read messages from the WebSocket, and one to write messages. The write pump also sends periodic ping messages to keep the connection alive.

Now, let's talk about A/B testing. It's not enough to just show different variants. We need to know which one performs better. We need an ExperimentTracker.

When a user sees a variant, we record an "exposure." When they complete a goal (like making a purchase), we record a "conversion." Later, we can calculate the conversion rate for each variant.

```go
type ExperimentTracker struct {
    exposures   map[string][]*ExposureEvent
    conversions map[string][]*ConversionEvent
    mu          sync.RWMutex
}

func (et *ExperimentTracker) TrackExposure(experimentID string, context map[string]interface{}, variant string) {
    et.mu.Lock()
    defer et.mu.Unlock()

    userID, _ := context["user_id"].(string)
    exposure := &ExposureEvent{
        ExperimentID: experimentID,
        Variant:      variant,
        UserID:       userID,
        Timestamp:    time.Now(),
    }
    et.exposures[experimentID] = append(et.exposures[experimentID], exposure)
}
```

We can call TrackExposure right after we evaluate a flag and determine the user is in an experiment. Then, when the user completes a purchase, we call TrackConversion.
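TrackConversion itself isn't shown. Since the call site later passes no variant, one plausible sketch resolves the user's variant by joining against the exposure log (GetExperimentResults reads conv.Variant, so the event needs to carry it); the struct fields here are my assumptions:

```go
package main

import (
    "fmt"
    "sync"
    "time"
)

type ExposureEvent struct {
    ExperimentID string
    Variant      string
    UserID       string
    Timestamp    time.Time
}

type ConversionEvent struct {
    ExperimentID string
    Goal         string
    Variant      string
    Value        float64
    UserID       string
    Timestamp    time.Time
}

type ExperimentTracker struct {
    exposures   map[string][]*ExposureEvent
    conversions map[string][]*ConversionEvent
    mu          sync.RWMutex
}

// TrackConversion records a goal the user hit. The caller doesn't pass
// a variant, so we look it up from this user's recorded exposure.
func (et *ExperimentTracker) TrackConversion(experimentID, goal string, value float64, context map[string]interface{}) {
    et.mu.Lock()
    defer et.mu.Unlock()

    userID, _ := context["user_id"].(string)
    variant := ""
    for _, exp := range et.exposures[experimentID] {
        if exp.UserID == userID {
            variant = exp.Variant
            break
        }
    }
    et.conversions[experimentID] = append(et.conversions[experimentID], &ConversionEvent{
        ExperimentID: experimentID,
        Goal:         goal,
        Variant:      variant,
        Value:        value,
        UserID:       userID,
        Timestamp:    time.Now(),
    })
}

func main() {
    et := &ExperimentTracker{
        exposures:   map[string][]*ExposureEvent{"new_checkout_experiment": {{Variant: "treatment", UserID: "user_7"}}},
        conversions: make(map[string][]*ConversionEvent),
    }
    et.TrackConversion("new_checkout_experiment", "purchase", 99.99,
        map[string]interface{}{"user_id": "user_7"})
    fmt.Println(et.conversions["new_checkout_experiment"][0].Variant) // treatment
}
```

The linear scan over exposures is fine for a sketch; a real tracker would keep a per-user index or, more likely, ship both event streams to an analytics pipeline and do the join there.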

At any time, we can ask for the results.

```go
func (et *ExperimentTracker) GetExperimentResults(experimentID string) *ExperimentResults {
    et.mu.RLock()
    defer et.mu.RUnlock()

    exposures := et.exposures[experimentID]
    conversions := et.conversions[experimentID]

    results := &ExperimentResults{
        ExperimentID: experimentID,
        Variants:     make(map[string]*VariantStats),
    }

    // Count exposures and conversions per variant
    for _, exp := range exposures {
        if _, exists := results.Variants[exp.Variant]; !exists {
            results.Variants[exp.Variant] = &VariantStats{}
        }
        results.Variants[exp.Variant].Exposures++
    }

    for _, conv := range conversions {
        if stats, exists := results.Variants[conv.Variant]; exists {
            stats.Conversions++
            stats.TotalValue += conv.Value
        }
    }

    // Calculate conversion rates
    for _, stats := range results.Variants {
        if stats.Exposures > 0 {
            stats.ConversionRate = float64(stats.Conversions) / float64(stats.Exposures)
        }
    }

    return results
}
```

Let me put all these pieces together in a complete example. Imagine we're testing a new checkout flow.

```go
func main() {
    // Create the manager
    ffm := NewFeatureFlagManager()

    // Define our A/B test flag
    checkoutFlag := &FeatureFlag{
        Key:     "new_checkout_experiment",
        Enabled: true,
        Variants: []Variant{
            {Key: "control", Weight: 50, Enabled: true},
            {Key: "treatment", Weight: 50, Enabled: true},
        },
        Description: "Test new single-page checkout",
    }

    ffm.UpdateFlag(checkoutFlag)

    // Simulate user requests
    for i := 0; i < 1000; i++ {
        userContext := map[string]interface{}{
            "user_id": fmt.Sprintf("user_%d", i),
            "country": "US",
        }

        // Evaluate the flag
        enabled, metadata, _ := ffm.Evaluate(context.Background(), "new_checkout_experiment", userContext)

        if enabled {
            variant := metadata["variant"].(string)

            // Record that this user saw the experiment
            ffm.experiments.TrackExposure("new_checkout_experiment", userContext, variant)

            // Simulate a purchase (conversion) with some probability
            if (variant == "treatment" && i%4 == 0) || (variant == "control" && i%6 == 0) {
                ffm.experiments.TrackConversion("new_checkout_experiment", "purchase", 99.99, userContext)
            }
        }
    }

    // Get results
    results := ffm.experiments.GetExperimentResults("new_checkout_experiment")
    fmt.Printf("Control: %d exposures, %d purchases (%.1f%%)\n",
        results.Variants["control"].Exposures,
        results.Variants["control"].Conversions,
        results.Variants["control"].ConversionRate*100)

    fmt.Printf("Treatment: %d exposures, %d purchases (%.1f%%)\n",
        results.Variants["treatment"].Exposures,
        results.Variants["treatment"].Conversions,
        results.Variants["treatment"].ConversionRate*100)
}
```

This gives us the basic numbers. In a real system, you'd want statistical significance tests to know if the difference is real or just random chance.

There are several important details to consider for a production system. First, persistence. Our flags are in memory. If the service restarts, we lose them. We need to save them to a database or file.

I can define a FlagPersistence interface.

```go
type FlagPersistence interface {
    Save(flag *FeatureFlag) error
    LoadAll() (map[string]*FeatureFlag, error)
}
```

Then, in our FlagStore, we can have a persistence field. When we update a flag, we save it. When we start the service, we load all flags.

Second, we need to think about security and validation. Not everyone should be able to change flags. We should validate that flag configurations make sense before accepting them.

```go
func (ffm *FeatureFlagManager) validateFlag(flag *FeatureFlag) error {
    if flag.Key == "" {
        return fmt.Errorf("flag key is required")
    }

    // Check that variant weights sum to 100%
    if len(flag.Variants) > 0 {
        totalWeight := 0.0
        for _, v := range flag.Variants {
            totalWeight += v.Weight
        }
        if math.Abs(totalWeight-100.0) > 0.01 {
            return fmt.Errorf("variant weights must sum to 100, got %.2f", totalWeight)
        }
    }

    return nil
}
```

Third, monitoring. We should track how many evaluations we're doing, our cache hit rate, and how long evaluations take.

```go
type FlagStats struct {
    Evaluations   uint64
    CacheHits     uint64
    CacheMisses   uint64
    TotalDuration uint64 // nanoseconds
}

func (fs *FlagStats) CacheHitRate() float64 {
    total := fs.CacheHits + fs.CacheMisses
    if total == 0 {
        return 0
    }
    return float64(fs.CacheHits) / float64(total)
}

func (fs *FlagStats) AverageDuration() time.Duration {
    if fs.Evaluations == 0 {
        return 0
    }
    return time.Duration(fs.TotalDuration / fs.Evaluations)
}
```

We can update these stats atomically during evaluation.

Fourth, we need to consider what happens when the update notification system fails. A client might miss an update. To handle this, each client can track the version of each flag it has. When it reconnects, or periodically, it can ask for any flags that have changed since a certain version.
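One sketch of that catch-up path: the client sends the versions it knows, and the server answers with the keys whose versions have moved since. The ChangedSince helper is my invention, building on the store's versions map:

```go
package main

import (
    "fmt"
    "sync"
)

type FlagStore struct {
    versions map[string]int64
    mu       sync.RWMutex
}

// ChangedSince returns the keys of flags whose version is newer than
// what the client last saw. A reconnecting client sends its known
// versions and then fetches only the flags in the returned list.
func (fs *FlagStore) ChangedSince(known map[string]int64) []string {
    fs.mu.RLock()
    defer fs.mu.RUnlock()

    var stale []string
    for key, version := range fs.versions {
        if version > known[key] { // missing keys read as 0, so brand-new flags are included too
            stale = append(stale, key)
        }
    }
    return stale
}

func main() {
    store := &FlagStore{versions: map[string]int64{"new_checkout": 3, "new_dashboard": 1}}
    client := map[string]int64{"new_checkout": 2, "new_dashboard": 1}
    fmt.Println(store.ChangedSince(client)) // [new_checkout]
}
```

Running this check periodically, not just on reconnect, also covers the case where a broadcast was silently dropped while the connection stayed up.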

The system I've described is a starting point. In practice, you might add more features. For example, you could add rule types that target based on the day of the week, or the user's device type. You could add more complex rollout strategies, like canary releases that start with 1% and increase only if error rates stay low.

You could also add a user interface for non-engineers to manage flags. This UI would talk to an API that uses the same UpdateFlag method we created.

The main goal is to give your team control. Control to release features safely. Control to test ideas with real users. Control to quickly disable something that's causing problems. All without waiting for a full deployment cycle.

Building this in Go gives us performance and simplicity. The concurrency primitives—goroutines, channels, mutexes—make it straightforward to build a system that handles thousands of requests per second while pushing updates in real-time.

Start simple. Implement the basic evaluation first. Add caching when you need performance. Add real-time updates when you have multiple servers. Add A/B testing when you need to make data-driven decisions. Each piece builds on the last, giving you more control over how your software behaves in production.

Remember, the code doesn't decide. The flags decide. Your code just asks questions and follows instructions. This separation is what gives you the power to change your application's behavior without changing its code.
