Stop Begging Your AI to Be Safe: The Case for Constraint Engineering

#agents #ai #security #softwareengineering

I am tired of "Prompt Engineering" as a safety strategy.

If you are building autonomous agents—AI that can actually do things like query databases, move files, or send emails—you have likely felt the anxiety. You write a prompt like:

"Please generate a SQL query to get the user count. IMPORTANT: Do not delete any tables. Please, I beg you, do not drop the database."

Then you cross your fingers and hope the probabilistic math of the LLM respects your polite request.

This is madness. In traditional software engineering, we don't ask user input "please don't be a SQL injection." We sanitize it. We use firewalls. We use strict typing. Yet with AI agents, we seem to have forgotten the basics of deterministic systems.

I recently built a module I call the Constraint Engine, and it completely changed how I trust my AI agents. Here is why we need to stop prompting for safety and start coding for it.

The Philosophy: Brain vs. Hand

The core problem is that we are treating the LLM as both the Brain (planning, reasoning) and the Hand (execution).

The Brain should be creative. It should be able to hallucinate wild ideas, draft complex plans, and think outside the box. But the Hand? The Hand should be boring. The Hand should be strictly regulated.

The architecture I implemented separates these two with a hard, deterministic logic layer:

The Brain (LLM): Generates a plan (e.g., "I'll delete these temp files to save space").
The Firewall (Constraint Engine): A Python script that checks the plan against hard rules (regex, whitelists, cost limits).
The Hand (Executor): Executes the plan only if the Firewall returns True.

As I put it in the documentation: "The Human builds the walls; the AI plays inside them."

The "Logic Firewall" Implementation

The implementation is surprisingly simple. It doesn't use another AI to check the AI (which just adds more cost and uncertainty). It uses standard, boring Python.

Here is the base structure:

@dataclass
class ConstraintViolation:
    rule_name: str
    severity: ViolationSeverity # CRITICAL, HIGH, MEDIUM, LOW
    message: str
    blocked_action: str

class ConstraintEngine:
    def validate_plan(self, plan: Dict[str, Any]) -> ConstraintResult:
        # Loop through rules and return approval status
        # If CRITICAL or HIGH severity violations exist, BLOCK.

The Rules are Dumb (And That's Good)

We don't need the AI to "understand" why deleting the database is bad. We just need to catch the syntax.

Take the SQLInjectionRule. It doesn't ask an LLM if the query is safe. It uses Regex to look for dangerous patterns.

class SQLInjectionRule(ConstraintRule):
    DANGEROUS_PATTERNS = [
        r'\bDROP\s+TABLE\b',
        r'\bDELETE\s+FROM\b.*\bWHERE\s+1\s*=\s*1\b',
        r';\s*DROP\b',  # Command chaining
    ]

    def validate(self, plan):
        query = plan.get("query", "")
        for pattern in self.DANGEROUS_PATTERNS:
            if re.search(pattern, query, re.IGNORECASE):
                return ConstraintViolation(
                    severity=ViolationSeverity.CRITICAL,
                    message=f"Dangerous SQL detected: {pattern}"
                )

Is this primitive? Yes. Is it 100% effective against a standard DROP TABLE command? Also yes. It doesn't matter how "persuasive" the jailbreak prompt was; this Regex doesn't care about context. It cares about syntax.

The Paradox: Safety Enables Creativity

This is the most counter-intuitive finding from this experiment.

Usually, when we want an AI agent to be safe, we turn the Temperature (randomness) down to 0.0. We want it robotic and predictable. But this kills the AI's ability to come up with clever solutions to complex problems.

With a Constraint Engine, you can actually crank the temperature up.

You can run your LLM at 0.9 or 1.0. Let it have wild ideas! Let it try to write a recursive file deletion script to clean up logs!

Scenario A (No Firewall): AI hallucinates rm -rf /. Server is wiped.
Scenario B (With Firewall): AI hallucinates rm -rf /. The FileOperationRule catches it. The agent receives an error: "Action Blocked: root deletion not allowed." The AI then self-corrects: "Ah, sorry. I will delete specific log files instead."

The firewall acts as the boundaries of the playground. You can let the "children" (AI) run wild, knowing the gate is locked.

Beyond SQL: Cost and Scope

This approach extends beyond just blocking dangerous commands. It works beautifully for budget control.

I implemented a CostLimitRule. If an agent proposes an API call sequence or a cloud operation that exceeds $0.05, the engine blocks it.

class CostLimitRule(ConstraintRule):
    def validate(self, plan):
        if plan['estimated_cost'] > self.max_cost:
            return ConstraintViolation(
                severity=ViolationSeverity.HIGH,
                message="Cost exceeds authorized limit."
            )

This prevents the "Infinite Loop Bankruptcy" scenario where an agent gets stuck in a loop calling an expensive API.

Summary

We are entering an era where AIs are no longer just chatbots; they are doers. Relying on the AI's "self-control" (via system prompts) to protect your infrastructure is negligent.

There are great enterprise tools out there like NVIDIA NeMo Guardrails that do semantic checking, but sometimes you just need a simple, deterministic Python class.