Dwelvin Morgan

Why your AI keeps ignoring your safety constraints (and how we fixed it by engineering "Intent")

If you’ve spent any time prompting LLMs, you’ve probably run into this frustrating scenario: You tell the AI to prioritize "safety, clarity, and conciseness."

So, what happens when it has to choose between making a sentence clearer or making it safer?

With a standard prompt, the answer is: It flips a coin.

Right now, we pass goals to LLMs as flat, comma-separated lists. The AI hears "safety" and "conciseness" as equal priorities. There is no built-in mechanism to tell the model that a medical safety constraint vastly outranks a request for snappy prose.

That gap between what you mean and what the model hears is a massive problem for reliable AI. We recently solved this by building a system called Intent Engineering, centered on "Value Hierarchies."

Here is a breakdown of how it works, why it matters, and how you can actually give your AI a machine-readable "conscience."

The Problem: AI Goals Are Unordered
In most AI pipelines today, there are three massive blind spots:

1. Goals have no rank. optimize(goals="clarity, safety") treats both equally.
2. The routing ignores intent. Many systems route simple-looking prompts to cheaper models to save money, even if the user's intent requires deep reasoning.
3. No memory. Users have to re-explain priorities in every prompt.
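The first blind spot is easy to see in code. A flat goal string parses into a bag of equal-weight words; here is a minimal sketch (the optimize signature is illustrative, not from any real library):

```python
def optimize(goals: str) -> list[str]:
    # Split the flat goal string; nothing here encodes that one goal
    # should outrank another.
    return [g.strip() for g in goals.split(",")]

# Both calls hand the model the same unordered bag of priorities:
print(optimize(goals="clarity, safety"))   # ['clarity', 'safety']
print(optimize(goals="safety, clarity"))   # ['safety', 'clarity']
```

Whatever order you write them in, the model gets no signal about precedence; that signal is exactly what a Value Hierarchy adds.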
The Fix: Value Hierarchies
Instead of a flat list of words, we created a data model that forces the AI to rank its priorities. We broke this down into four tiers: NON-NEGOTIABLE, HIGH, MEDIUM, and LOW.

Here is what the actual data structures look like under the hood (defined in our FastAPI backend):

from enum import Enum
from typing import List, Optional

from pydantic import BaseModel

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON-NEGOTIABLE"  # Forces the smartest routing tier
    HIGH           = "HIGH"            # Forces at least a hybrid tier
    MEDIUM         = "MEDIUM"          # Prompt-level guidance only
    LOW            = "LOW"             # Prompt-level guidance only

class HierarchyEntry(BaseModel):
    goal: str                          # e.g. "safety"
    label: PriorityLabel
    description: Optional[str] = None  # optional human-readable rationale

class ValueHierarchy(BaseModel):
    name: Optional[str] = None         # e.g. "Medical Safety Stack"
    entries: List[HierarchyEntry]
    conflict_rule: Optional[str] = None  # free-text tie-breaker for the prompt

By structuring the data this way, we can inject these rules into the AI's behavior at two critical levels.
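Before diving into those two levels, note what the ranking buys you: when two goals collide, the winner is now a lookup, not a coin flip. A stdlib-only sketch (resolve_conflict and RANK are illustrative names, not the project's actual code):

```python
from enum import Enum

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON-NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

# Definition order doubles as priority order: lower index outranks higher.
RANK = {label: i for i, label in enumerate(PriorityLabel)}

def resolve_conflict(goal_a, label_a, goal_b, label_b):
    # The goal on the higher tier wins; ties go to the first argument.
    return goal_a if RANK[label_a] <= RANK[label_b] else goal_b

print(resolve_conflict("conciseness", PriorityLabel.MEDIUM,
                       "safety", PriorityLabel.NON_NEGOTIABLE))  # safety
```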

Level 1: Changing the AI's "Brain" (Prompt Injection)
If a user defines a Value Hierarchy, we automatically intercept the request and inject a DIRECTIVES block directly into the LLM's system prompt.

...existing system prompt...

INTENT ENGINEERING DIRECTIVES (user-defined — enforce strictly):
When optimization goals conflict, resolve in this order:
  1. [NON-NEGOTIABLE] safety: Always prioritise safety
  2. [HIGH] clarity
  3. [MEDIUM] conciseness

Conflict resolution: Safety first, always.
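That block can be generated mechanically from the hierarchy entries, sorted by tier. Here is a minimal sketch of such a renderer (the function and the dict shapes are assumptions; only the output format follows the example above):

```python
from enum import Enum

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON-NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

TIER_ORDER = {label: i for i, label in enumerate(PriorityLabel)}

def render_directives(entries, conflict_rule=None):
    ordered = sorted(entries, key=lambda e: TIER_ORDER[e["label"]])
    lines = [
        "INTENT ENGINEERING DIRECTIVES (user-defined — enforce strictly):",
        "When optimization goals conflict, resolve in this order:",
    ]
    for i, e in enumerate(ordered, 1):
        # .value guarantees the raw string ("NON-NEGOTIABLE"), not the enum repr
        suffix = f": {e['description']}" if e.get("description") else ""
        lines.append(f"  {i}. [{e['label'].value}] {e['goal']}{suffix}")
    if conflict_rule:
        lines.append(f"\nConflict resolution: {conflict_rule}")
    return "\n".join(lines)

print(render_directives(
    [
        {"goal": "conciseness", "label": PriorityLabel.MEDIUM},
        {"goal": "safety", "label": PriorityLabel.NON_NEGOTIABLE,
         "description": "Always prioritise safety"},
        {"goal": "clarity", "label": PriorityLabel.HIGH},
    ],
    conflict_rule="Safety first, always.",
))
```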

(Technical note: We use entry.label.value here because Python 3.11 changed how str-subclassing enums render in f-strings: formatting the member now yields "PriorityLabel.NON_NEGOTIABLE" instead of the raw value. Reading .value guarantees the prompt gets the exact string "NON-NEGOTIABLE".)

Level 2: The "Bouncer" (Routing Tiers)
Telling the LLM to be safe is great, but what if your system's router decides to send the prompt to a cheap, fast model to save compute?

We built a Router Tier Floor. If you tag a goal as NON-NEGOTIABLE, the router enforces a minimum routing score, so the request can never fall through to a lower-tier model.

# Calculate the base score for the prompt 
score = await self._calculate_routing_score(prompt, context, ...)

# The Floor: Only fires when a hierarchy is active:
if value_hierarchy and value_hierarchy.entries:
    has_non_negotiable = any(
        e.label == PriorityLabel.NON_NEGOTIABLE for e in value_hierarchy.entries
    )
    has_high = any(
        e.label == PriorityLabel.HIGH for e in value_hierarchy.entries
    )

    # Force the request to a smarter model tier based on priority
    if has_non_negotiable:
        score["final_score"] = max(score.get("final_score", 0.0), 0.72) # Guaranteed LLM
    elif has_high:
        score["final_score"] = max(score.get("final_score", 0.0), 0.45) # Guaranteed Hybrid
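The two floor values imply score thresholds at which the router switches tiers. A sketch of what that mapping might look like (the 0.72 and 0.45 cutoffs come from the floors above; the tier names are assumptions):

```python
def pick_tier(final_score: float) -> str:
    # Cutoffs mirror the routing floors: 0.72 guarantees the full LLM tier,
    # 0.45 guarantees at least the hybrid tier.
    if final_score >= 0.72:
        return "llm"
    if final_score >= 0.45:
        return "hybrid"
    return "lightweight"

# A cheap-looking prompt (base score 0.10) tagged NON-NEGOTIABLE still
# lands on the smartest tier once the floor is applied:
floored = max(0.10, 0.72)
print(pick_tier(floored))  # llm
```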
Keeping it Fast (Cache Isolation)
To ensure that requests with hierarchies don't get mixed up in the cache, we generate a deterministic 8-character fingerprint for the cache key.

import hashlib
import json

def _hierarchy_fingerprint(value_hierarchy) -> str:
    if not value_hierarchy or not value_hierarchy.entries:
        return ""   # empty string → same cache key as usual
    return hashlib.md5(
        json.dumps(
            [{"goal": e.goal, "label": str(e.label)} for e in value_hierarchy.entries],
            sort_keys=True
        ).encode()
    ).hexdigest()[:8]
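The sort_keys=True serialization makes the fingerprint deterministic: identical hierarchies always map to the same 8-character suffix, and an empty hierarchy leaves the cache key untouched. A self-contained sketch, with plain dicts standing in for the Pydantic models:

```python
import hashlib
import json

def hierarchy_fingerprint(entries) -> str:
    # entries: list of {"goal": ..., "label": ...} dicts (stand-in for the models)
    if not entries:
        return ""  # no hierarchy, so the cache key is unchanged
    return hashlib.md5(
        json.dumps(
            [{"goal": e["goal"], "label": e["label"]} for e in entries],
            sort_keys=True,
        ).encode()
    ).hexdigest()[:8]

stack = [{"goal": "safety", "label": "NON-NEGOTIABLE"}]
print(hierarchy_fingerprint(stack) == hierarchy_fingerprint(list(stack)))  # True
```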

Putting it into Practice (MCP Integration)
We integrated this into the Model Context Protocol (MCP). Here is the tool payload for a "Medical Safety Stack":

{
  "tool": "define_value_hierarchy",
  "arguments": {
    "name": "Medical Safety Stack",
    "entries": [
      { "goal": "safety", "label": "NON-NEGOTIABLE", "description": "Always prioritise patient safety" },
      { "goal": "clarity", "label": "HIGH" },
      { "goal": "conciseness", "label": "MEDIUM" }
    ],
    "conflict_rule": "Safety first, always."
  }
}
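On the receiving end, a tool handler needs to validate those arguments before the hierarchy goes live. A hedged sketch (the handler shape is an assumption; only the payload fields come from the example above):

```python
from enum import Enum

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON-NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

def define_value_hierarchy(arguments: dict) -> dict:
    # Value-based enum lookup rejects any label outside the four tiers
    # with a ValueError before the hierarchy is activated.
    entries = [
        {
            "goal": raw["goal"],
            "label": PriorityLabel(raw["label"]),
            "description": raw.get("description"),
        }
        for raw in arguments["entries"]
    ]
    return {
        "name": arguments.get("name"),
        "entries": entries,
        "conflict_rule": arguments.get("conflict_rule"),
    }

hierarchy = define_value_hierarchy({
    "name": "Medical Safety Stack",
    "entries": [
        {"goal": "safety", "label": "NON-NEGOTIABLE",
         "description": "Always prioritise patient safety"},
        {"goal": "clarity", "label": "HIGH"},
        {"goal": "conciseness", "label": "MEDIUM"},
    ],
    "conflict_rule": "Safety first, always.",
})
```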

TL;DR
Prompt engineering is about telling an AI what to do. Intent engineering is about telling an AI how to prioritize.

If you want to play around with this, you can install the Prompt Optimizer via:

npm install -g mcp-prompt-optimizer

Would love to hear how you guys are handling conflicting constraints in your own pipelines!
