Daniel Romitelli

Posted on • Originally published at craftedbydaniel.com

I Hardcoded the Kill Switch: Feature Flags as AI Guardrails (Series Part 5)

I knew something was wrong the first time the chat did exactly what I built it to do.

A recruiter asked a simple, forward-moving question—and the system responded with a clarifying question that technically improved precision… while practically blocking the workflow. The user wasn’t confused. The model was being “thorough.” And the whole interaction felt like a car that stops at every green light to re-check the route.

This is Part 5 of my series “How to Architect an Enterprise AI System (And Why the Engineer Still Matters)”. In Part 4, “Corrections as Ground Truth,” I showed how I treat user corrections as canonical signal and feed them back into the system. This time I’m going to the opposite side of the loop: the guardrails that prevent an AI feature from hurting users while you’re still learning.

The core decision is simple:

Feature flags aren’t just for product experiments—they’re safety rails for AI behavior.

And in one case, I didn’t just put up a rail. I made it impossible to re-enable the risky behavior without changing code.

The moment I stopped trusting “we can always flip it off”

The naive rollout pattern is familiar:

  • ship the new behavior behind an environment variable
  • tell yourself you can disable it quickly
  • assume nobody will accidentally (or “helpfully”) re-enable it later

That last bullet is the lie.

In a system with multiple services, multiple deploy pipelines, and multiple environments, “off by default” is not a guarantee. It’s a suggestion—sometimes a temporary one, sometimes a forgotten one. And AI failures don’t behave like typical bugs. When they show up, they’re often behavioral: they change how users feel about the product.

Once I saw autonomous clarification behavior block momentum in a production-like workflow, I stopped treating “off by default” as sufficient. I wanted a guard that made the unsafe path structurally unreachable until I deliberately chose to revive it.

Two kinds of flags: experiments vs. safety

In my platform, feature flags are a control surface with two distinct jobs:

  1. Experimentation flags: things I want to ramp carefully (percentage rollouts, A/B paths, dual data sources).
  2. Safety kill switches: things I never want to reappear without a deliberate code change and a deploy.

The first category belongs in configuration.

The second category belongs in code.

That’s the non-obvious part. Sometimes the correct “feature flag” isn’t a flag you can toggle at runtime. It’s a hardcoded constraint that forces intent.

The guardrail toolbox in app/config/feature_flags.py

I keep flags centralized in app/config/feature_flags.py and treat that module as a contract:

  • flags are documented with their purpose and default behavior
  • values are parsed once with explicit defaults
  • rollouts are stable (a user doesn’t bounce in/out randomly)

Three controls matter for this story:

  • USE_EXTERNAL_DEALS_API — dual-source switch so the legacy database path and the external deals API path can both exist
  • MULTI_AGENT_ROLLOUT_PERCENTAGE — gradual rollout control (0–100)
  • ENABLE_AGENTIC_CHAT — the “welded shut” kill switch (code-level gating + optional env)

The actual flag pattern: env-driven, typed, and defaulted

Most of my flags are environment-driven. That’s intentional: ops needs to change experiment exposure without a code deploy.

Here’s the exact pattern I use (booleans via os.getenv(...).lower() == "true", ints with bounds, and a stable rollout function).

# app/config/feature_flags.py
from __future__ import annotations

import hashlib
import os


def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean feature flag from the environment.

    Convention: only the literal string 'true' (case-insensitive) enables the flag.
    """
    default_str = "true" if default else "false"
    return os.getenv(name, default_str).strip().lower() == "true"


def env_int(name: str, default: int = 0) -> int:
    raw = os.getenv(name)
    if raw is None:
        return default
    raw = raw.strip()
    if raw == "":
        return default
    try:
        return int(raw)
    except ValueError:
        return default


def clamp_int(value: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, value))


# Dual-source support (default is deliberately conservative)
USE_EXTERNAL_DEALS_API: bool = env_bool("USE_EXTERNAL_DEALS_API", default=False)

# Gradual rollout (0..100)
MULTI_AGENT_ROLLOUT_PERCENTAGE: int = clamp_int(
    env_int("MULTI_AGENT_ROLLOUT_PERCENTAGE", default=0),
    0,
    100,
)


# ----
# Safety kill switch
#
# This is the important bit:
# - env can request enablement
# - but code can forbid enablement
#
# If AGENTIC_CHAT_CODE_ALLOWED is False, setting ENABLE_AGENTIC_CHAT=true does nothing.
# ----
AGENTIC_CHAT_CODE_ALLOWED: bool = False
ENABLE_AGENTIC_CHAT: bool = AGENTIC_CHAT_CODE_ALLOWED and env_bool(
    "ENABLE_AGENTIC_CHAT",
    default=False,
)


def in_percentage_rollout(stable_id: str, percentage: int) -> bool:
    """Deterministic, stable bucketing based on stable_id.

    percentage is clamped to [0, 100]. The same stable_id always lands in the same bucket.
    """
    percentage = clamp_int(percentage, 0, 100)
    if percentage <= 0:
        return False
    if percentage >= 100:
        return True

    digest = hashlib.sha256(stable_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < percentage

A few deliberate choices are embedded in that file:

  • Defaults are conservative. Dual-source switches default to the known-good path.
  • Parsing is strict. Only "true" enables a boolean; you don’t get surprised by "1" or "yes" coming from a random deployment template.
  • Rollouts are stable. If a user is in the first 5%, they stay in the first 5%.
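To make the strict-parsing convention concrete, here's a tiny self-contained check (re-declaring the env_bool helper from above so it runs on its own):

```python
import os


def env_bool(name: str, default: bool = False) -> bool:
    # Mirrors the helper above: only the literal string "true" enables a flag.
    default_str = "true" if default else "false"
    return os.getenv(name, default_str).strip().lower() == "true"


os.environ["DEMO_FLAG"] = "1"
assert env_bool("DEMO_FLAG") is False  # "1" does not enable the flag

os.environ["DEMO_FLAG"] = "yes"
assert env_bool("DEMO_FLAG") is False  # neither does "yes"

os.environ["DEMO_FLAG"] = "  TRUE  "
assert env_bool("DEMO_FLAG") is True  # whitespace and case are forgiven

del os.environ["DEMO_FLAG"]
assert env_bool("DEMO_FLAG") is False  # unset falls back to the default
```

The payoff is that a deployment template which helpfully injects "1" or "yes" leaves the flag off instead of silently enabling it.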

1) Dual-source architecture: USE_EXTERNAL_DEALS_API (default: false)

The simplest migration plan is “cut over and pray.” I don’t do that for critical business data.

When I add a new source of truth—especially an external deals system with its own rate limits, permission rules, and occasional oddities—I keep the legacy database path alive long enough to compare behaviors and verify parity.

That’s what USE_EXTERNAL_DEALS_API controls. It’s a single, obvious switch that determines whether a flow reads from the external deals API path or the database-backed path.

The key detail (and the one I corrected after getting burned in earlier projects): the default stays False. You earn your way to True with validation.

Why it matters operationally:

  • If the external API starts returning partial records, the fallback path still exists.
  • If a new permission model accidentally hides fields, the legacy path is your check.
  • If you need to backfill or reconcile, you can run both paths side-by-side.

This isn’t glamorous. It’s how you avoid “we migrated and now the pipeline is lying.”
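One way to run both paths side-by-side for parity checks is a field-level diff. This is a sketch, not my production code: fetch_deal_from_db, fetch_deal_from_api, and parity_diff are hypothetical names, stubbed here with a deliberate discrepancy so the diff has something to report.

```python
from typing import Any, Dict, List


def fetch_deal_from_db(deal_id: str) -> Dict[str, Any]:
    # Hypothetical legacy path; stubbed for illustration.
    return {"id": deal_id, "stage": "negotiation", "amount": 50_000}


def fetch_deal_from_api(deal_id: str) -> Dict[str, Any]:
    # Hypothetical external path; stubbed with a deliberate mismatch.
    return {"id": deal_id, "stage": "negotiation", "amount": 50_500}


def parity_diff(deal_id: str) -> List[str]:
    """Run both sources and report field-level mismatches for reconciliation."""
    db, api = fetch_deal_from_db(deal_id), fetch_deal_from_api(deal_id)
    mismatches = []
    for key in sorted(set(db) | set(api)):
        if db.get(key) != api.get(key):
            mismatches.append(f"{key}: db={db.get(key)!r} api={api.get(key)!r}")
    return mismatches


print(parity_diff("D-42"))  # prints ['amount: db=50000 api=50500']
```

Run this kind of comparison over real traffic for a while and you earn the confidence to flip USE_EXTERNAL_DEALS_API to true.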

2) Gradual rollout: MULTI_AGENT_ROLLOUT_PERCENTAGE

Some changes are worth ramping. Orchestrated multi-step behavior is one of them.

A binary on/off switch is too blunt for AI behavior because failures aren’t evenly distributed:

  • some users ask short questions with obvious answers (low risk)
  • some users ask ambiguous questions where autonomous tool choices can go sideways (high risk)
  • some users have workflows where latency is tolerated
  • some users are in time-sensitive contexts where any extra question is a blocker

So I use a percentage rollout. The important part isn’t that it’s 0–100; it’s that exposure is stable and deterministic.

The bucketing function in in_percentage_rollout() uses a hash of a stable identifier (in my case, tenant + user). That means:

  • a user doesn’t bounce in and out across refreshes
  • support can reproduce behavior (“this user is in the 10% bucket”)
  • you can ramp from 0 → 1 → 5 → 10 → 25 without re-randomizing your sample
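That monotone-ramp property falls directly out of the bucketing math: a user's bucket never changes, so raising the percentage only ever adds users. A quick self-contained check (same sha256 bucketing as feature_flags.py):

```python
import hashlib


def in_percentage_rollout(stable_id: str, percentage: int) -> bool:
    # Same bucketing as in feature_flags.py: deterministic sha256 bucket in 0..99.
    percentage = max(0, min(100, percentage))
    if percentage <= 0:
        return False
    if percentage >= 100:
        return True
    digest = hashlib.sha256(stable_id.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100 < percentage


# Ramping 1 -> 5 -> 10 -> 25 only ever adds users; nobody gets ejected.
users = [f"tenant-a:user-{i}" for i in range(500)]
previous = set()
for pct in (1, 5, 10, 25):
    exposed = {u for u in users if in_percentage_rollout(u, pct)}
    assert previous <= exposed  # the earlier cohort stays in the rollout
    previous = exposed
```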

3) The kill switch: “hardcoded” means env cannot override code

Now for the part that triggered the title.

When autonomous clarification behavior caused UX regressions, I didn’t want an ops toggle that someone could re-enable “just to test something” in the wrong environment.

So I made enabling it require a code edit.

Notice how the kill switch is built:

AGENTIC_CHAT_CODE_ALLOWED: bool = False
ENABLE_AGENTIC_CHAT: bool = AGENTIC_CHAT_CODE_ALLOWED and env_bool(
    "ENABLE_AGENTIC_CHAT",
    default=False,
)

This is the weld.

  • You can set ENABLE_AGENTIC_CHAT=true in the environment.
  • It still won’t turn on.
  • The only way to turn it on is to change AGENTIC_CHAT_CODE_ALLOWED to True and deploy.

This gives me two distinct operational modes:

  • Experiment mode (when I’m ready): flip AGENTIC_CHAT_CODE_ALLOWED to True in code, keep the runtime env flag defaulted to false, and ramp exposure deliberately.
  • Safety mode (when I’m not): keep AGENTIC_CHAT_CODE_ALLOWED set to False so runtime config cannot surprise production.

That friction is intentional. If someone wants the behavior back, they need to:

1) change code,
2) get review,
3) deploy.

That’s not bureaucracy. That’s protecting user trust.

Enforcing the weld at startup: make “unsafe env” fail fast

I don’t rely on “people will notice the constant.” I also enforce it at runtime using environment validation so a misconfigured deployment doesn’t silently drift.

I already have an environment validator module, so I added a hard rule: if someone sets the env flag to true while the code-level allowlist is false, startup fails.

That turns a subtle behavioral regression into a loud configuration error.

# app/config/env_validator.py
from __future__ import annotations

import os

from app.config import feature_flags


class EnvValidationError(RuntimeError):
    pass


def validate_environment() -> None:
    # If the code does not allow agentic chat, reject attempts to force-enable via env.
    requested = os.getenv("ENABLE_AGENTIC_CHAT", "false").strip().lower() == "true"
    if requested and not feature_flags.AGENTIC_CHAT_CODE_ALLOWED:
        raise EnvValidationError(
            "ENABLE_AGENTIC_CHAT=true is not permitted: code-level allowlist is disabled"
        )

That’s the difference between “we have a flag” and “we have a guardrail.”
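Here's what that fail-fast behavior looks like in practice. This is a minimal, self-contained demonstration mirroring the validator above (in the real system, validate_environment runs before the app starts serving):

```python
import os


class EnvValidationError(RuntimeError):
    pass


AGENTIC_CHAT_CODE_ALLOWED = False  # the weld, as in feature_flags.py


def validate_environment() -> None:
    # Mirrors app/config/env_validator.py: an unsafe env must fail loudly at boot.
    requested = os.getenv("ENABLE_AGENTIC_CHAT", "false").strip().lower() == "true"
    if requested and not AGENTIC_CHAT_CODE_ALLOWED:
        raise EnvValidationError(
            "ENABLE_AGENTIC_CHAT=true is not permitted: code-level allowlist is disabled"
        )


os.environ["ENABLE_AGENTIC_CHAT"] = "true"
try:
    validate_environment()
except EnvValidationError as exc:
    # Startup refuses instead of booting into unsafe behavior.
    print(f"startup refused: {exc}")
```

Deploy that misconfiguration and you get a crashed boot plus a clear error message, not a quietly drifting production system.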

How the routing works: orchestrator vs. legacy fallback

The real engineering move isn’t “turn the risky thing off.” The move is: turn it off while keeping the system functional and predictable.

In my chat service, I route between:

  • an orchestrated path (multi-step planning + tool calls)
  • a legacy linear path (known-good, low-surprise)

The orchestrated path is optional by design. If it’s disabled, or if it errors, the user still gets a response.

Here’s the core routing logic as it exists in my service layer (feature flag checks, percentage rollout, and clean fallback). This block is complete, syntactically valid, and you can run it as a standalone module.

from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional

import hashlib
import os


# ---- Feature flags (mirrors app/config/feature_flags.py) ----

def env_bool(name: str, default: bool = False) -> bool:
    default_str = "true" if default else "false"
    return os.getenv(name, default_str).strip().lower() == "true"


def env_int(name: str, default: int = 0) -> int:
    raw = os.getenv(name)
    if raw is None:
        return default
    raw = raw.strip()
    if raw == "":
        return default
    try:
        return int(raw)
    except ValueError:
        return default


def clamp_int(value: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, value))


def in_percentage_rollout(stable_id: str, percentage: int) -> bool:
    percentage = clamp_int(percentage, 0, 100)
    if percentage <= 0:
        return False
    if percentage >= 100:
        return True
    digest = hashlib.sha256(stable_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < percentage


USE_EXTERNAL_DEALS_API: bool = env_bool("USE_EXTERNAL_DEALS_API", default=False)
MULTI_AGENT_ROLLOUT_PERCENTAGE: int = clamp_int(
    env_int("MULTI_AGENT_ROLLOUT_PERCENTAGE", default=0), 0, 100
)

AGENTIC_CHAT_CODE_ALLOWED: bool = False
ENABLE_AGENTIC_CHAT: bool = AGENTIC_CHAT_CODE_ALLOWED and env_bool(
    "ENABLE_AGENTIC_CHAT", default=False
)


# ---- Service ----

@dataclass(frozen=True)
class ConversationContext:
    tenant_id: str
    user_id: str
    thread_id: str


class Telemetry:
    def track_exception(self, name: str, exc: Exception, properties: Optional[Dict[str, str]] = None) -> None:
        # In the real system this reports to our telemetry sink.
        # Kept as a stub here so the module runs.
        _ = (name, exc, properties)


class LLM:
    def complete(self, prompt: str) -> str:
        return f"LLM response: {prompt}"


class VaultConversationService:
    def __init__(self, llm: LLM, telemetry: Telemetry) -> None:
        self.llm = llm
        self.telemetry = telemetry

    def respond(self, ctx: ConversationContext, message: str) -> str:
        stable_id = f"{ctx.tenant_id}:{ctx.user_id}"

        exposure_ok = in_percentage_rollout(stable_id, MULTI_AGENT_ROLLOUT_PERCENTAGE)
        orchestrator_allowed = ENABLE_AGENTIC_CHAT and exposure_ok

        data_source = "external_deals_api" if USE_EXTERNAL_DEALS_API else "database"

        if orchestrator_allowed:
            try:
                return self._run_orchestrator(ctx, message, data_source)
            except Exception as exc:
                self.telemetry.track_exception(
                    "vault_orchestrator_failed",
                    exc,
                    properties={
                        "tenant_id": ctx.tenant_id,
                        "thread_id": ctx.thread_id,
                        "data_source": data_source,
                    },
                )
                return self._run_legacy_flow(ctx, message, data_source)

        return self._run_legacy_flow(ctx, message, data_source)

    def _run_orchestrator(self, ctx: ConversationContext, message: str, data_source: str) -> str:
        # In production this calls the planner + tool router.
        prompt = f"[orchestrator|{data_source}] {ctx.thread_id}: {message}"
        return self.llm.complete(prompt)

    def _run_legacy_flow(self, ctx: ConversationContext, message: str, data_source: str) -> str:
        # In production this is the older, linear prompt + retrieval path.
        prompt = f"[legacy|{data_source}] {ctx.thread_id}: {message}"
        return self.llm.complete(prompt)


if __name__ == "__main__":
    svc = VaultConversationService(llm=LLM(), telemetry=Telemetry())
    ctx = ConversationContext(tenant_id="t1", user_id="u1", thread_id="th1")
    print(svc.respond(ctx, "Draft an outreach note for this candidate."))

Three important properties fall out of this structure:

1) The new path is optional from day one. I can ship the orchestrator without forcing it on every user.

2) Failure is contained. If the orchestrator throws, I catch it, report it to telemetry, and fall back to the legacy path, so the user still gets an answer.

3) The kill switch forces a known-good path. When the code-level allowlist is disabled, the orchestrator is simply not reachable.

This is exactly how I avoid the “either we ship it everywhere or we don’t ship it” trap.

Architecture: feature flags as a control plane

My mental model is that feature flags form a control plane above AI behavior:

  • Dual-source toggles decide where data comes from.
  • Percentage rollouts decide how many users see a behavior.
  • Code-level kill switches decide what behaviors are allowed to exist at all.
```mermaid
flowchart TD
  userQuery[User query] --> flagGate[Feature flags]
  flagGate -->|USE_EXTERNAL_DEALS_API| dataPath[Data source path]
  flagGate -->|MULTI_AGENT_ROLLOUT_PERCENTAGE| rolloutGate[Rollout gate]
  flagGate -->|ENABLE_AGENTIC_CHAT disabled| safeMode[Legacy mode]

  rolloutGate --> orchestratorPath[Orchestrated multi-step flow]
  rolloutGate --> legacyPath[Legacy linear flow]

  dataPath --> orchestratorPath
  dataPath --> legacyPath

  orchestratorPath --> response[Response]
  legacyPath --> response
  safeMode --> legacyPath
```



The important detail is that the kill switch does not “break” the system. It forces the system onto a known-good path.

That’s the difference between a panic button and a guardrail.

## What went wrong: the UX regression that earned a welded shut gate

Autonomous clarification is a tempting capability:

- it reduces ambiguity
- it can improve tool selection
- it makes downstream actions safer

But in my workflow surface, it did something subtle and damaging:

**It competed with the user’s momentum.**

Recruiting workflows often have a “keep moving” cadence:

- confirm a candidate detail
- draft an outreach note
- schedule a meeting
- log an interaction

When the assistant injects a clarifying question at the wrong time, it turns a single-step action into a multi-step negotiation. Even if the question is reasonable, it changes the feel of the product from “helper” to “gatekeeper.”

This was not an accuracy failure. It was a *behavioral* failure:

- users felt blocked
- the interaction took longer
- trust eroded because the assistant wasn’t matching intent

And once users feel blocked, they stop exploring. They stop asking. They stop trusting.

That’s why I treated this differently than, say, a minor ranking regression. A small ranking regression is recoverable; you can improve it in the background without users noticing. A momentum-blocking behavior retrains users immediately.

So I welded it shut.

## The decision rule I use now

I keep the rule simple because I want engineers to apply it quickly during review:

- **If the failure mode is recoverable** (slower response, occasional missing context, imperfect ranking), use a runtime flag and a percentage rollout.
- **If the failure mode damages trust or blocks workflows**, require a code change to re-enable.

Autonomous clarification landed in the second category.

A runtime toggle is great for experiments. It’s terrible for “things that should never surprise a user again.”
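For review discussions, I find it helps to make the rule explicit enough to point at. A trivial illustrative sketch (the enum and helper are mine, not part of the platform):

```python
from enum import Enum


class FailureMode(Enum):
    RECOVERABLE = "recoverable"        # slower response, imperfect ranking
    TRUST_DAMAGING = "trust_damaging"  # blocks workflows, erodes trust


def gate_strategy(mode: FailureMode) -> str:
    # The review-time decision rule from the text, made explicit.
    if mode is FailureMode.RECOVERABLE:
        return "runtime flag + percentage rollout"
    return "code-level kill switch (re-enable requires a deploy)"


assert gate_strategy(FailureMode.TRUST_DAMAGING).startswith("code-level")
```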

## The practical details that make this work in a real team

A kill switch that’s only a constant is better than nothing, but I treat it as part of a larger operating discipline.

### 1) Flags are owned

Every flag has an owner and a purpose. Not a committee—one owner. That doesn’t mean one person touches it forever; it means there’s a clear point of responsibility.

In practice, this prevents the classic problem where:

- a flag exists
- nobody remembers why
- it becomes a permanent source of uncertainty
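One lightweight way to keep ownership honest is a small registry checked in next to the flags. This is a sketch under my own assumptions, not my exact implementation; `FlagSpec` and `REGISTRY` are hypothetical names:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FlagSpec:
    name: str
    owner: str    # one accountable person, not a committee
    purpose: str  # why the flag exists, so it never becomes folklore
    default: bool


# Hypothetical registry entries, mirroring the flags discussed in this post.
REGISTRY: List[FlagSpec] = [
    FlagSpec("USE_EXTERNAL_DEALS_API", "daniel", "dual-source migration switch", False),
    FlagSpec("ENABLE_AGENTIC_CHAT", "daniel", "welded kill switch for agentic chat", False),
]

# A unit test (or startup check) can refuse flags without a named owner.
orphans = [spec.name for spec in REGISTRY if not spec.owner]
assert not orphans, f"flags without an owner: {orphans}"
```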

### 2) Defaults are part of safety, not convenience

The dual-source flag default is conservative (`False`) because defaults have gravity.

If you set a default to `True`, you will eventually end up in an environment where you didn’t mean it. If you set the default to `False`, you have to intentionally opt in.

That’s why the default for the external deals API path stays off until I’ve validated parity and behavior.

### 3) Rollouts are stable and diagnosable

A percentage rollout that changes on every request is worse than no rollout. It becomes impossible to debug because:

- support can’t reproduce
- logs don’t correlate
- user experience fluctuates

Stable bucketing solves that. When someone reports “it asked me weird questions,” I can determine whether they were in the exposure set.

### 4) Fallback is not an afterthought

I design the orchestrated path so it can fail without affecting the ability to answer.

In practice, that means:

- the legacy flow is still maintained
- the legacy flow remains compatible with current prompt formats and retrieval contracts
- orchestration errors are tracked explicitly (so I’m not blind while users silently get legacy behavior)

This turns “new behavior” into a layer rather than a migration cliff.

### 5) Guardrails connect back to the correction loop

Part 4 (“Corrections as Ground Truth”) is the other half of this.

When I ship an experiment behind a rollout gate, I’m watching:

- explicit corrections (“no, I meant X”)
- implicit corrections (users re-asking in different words)
- abandonment (threads that stop immediately after a clarifying question)

A welded kill switch is what I use when that feedback is loud and consistent: the cost of the behavior is higher than the benefit *right now*.
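Those three signals can be classified from a thread's event trail. A simplified sketch; the event labels here are hypothetical stand-ins for what the real telemetry emits:

```python
from typing import List, Optional


def classify_thread(events: List[str]) -> Optional[str]:
    """Classify a thread's feedback signal from a simplified event trail."""
    if "explicit_correction" in events:
        return "explicit"                  # "no, I meant X"
    if events.count("user_message") >= 2 and "rephrase" in events:
        return "implicit"                  # user re-asking in different words
    if events and events[-1] == "clarifying_question":
        return "abandonment"               # thread died right after we asked
    return None


assert classify_thread(["user_message", "clarifying_question"]) == "abandonment"
assert classify_thread(["user_message", "rephrase", "user_message"]) == "implicit"
```

When the "abandonment" bucket dominates, that's the loud, consistent feedback that earns a weld.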

## Closing

Disabling autonomous clarification wasn’t me giving up on orchestration. It was me admitting a truth I’ve seen repeatedly: the most impressive behavior in a demo can be the most irritating behavior in a real workflow.

The “hardcoded kill switch” in my system isn’t a dramatic gesture—it’s a design choice: **some behaviors are too risky to leave as a reversible runtime toggle.** If we want them back, we earn them back with a code change, review, and a deliberate rollout.

In Part 6 of **“How to Architect an Enterprise AI System (And Why the Engineer Still Matters)”**, I’m going to show how I instrument these rollouts so I can tell the difference between “the model was wrong” and “the product was blocked”—and how that telemetry feeds directly into what I decide to weld shut next.

---

🎧 [Listen to the Enterprise AI Architecture audiobook](https://shop.craftedbydaniel.com)
📖 [Read the full 13-part series with an AI assistant](https://craftedbydaniel.com/premium-access)
