Mukunda Rao Katta

Posted on May 25

Stop Your Agent from Sneaking in 'I would recommend...' When You Said Only 'escalate', 'resolve', or 'defer'

#hermeschallenge #ai #python #agents

The switch statement that kept breaking

I had a classification agent. Simple job: read a support ticket and return one of three words: escalate, resolve, or defer. Downstream, a switch statement routed the ticket on that word.

The model kept returning things like:

- I would recommend escalating this ticket.
- This should be deferred to the billing team.
- Based on the context, resolving this seems appropriate.

Every one of those broke the switch. The model was technically right. It was also completely wrong for the contract. The strings did not equal "escalate", "resolve", or "defer". They contained those words, buried in a sentence.

I patched the prompt three times. Added "respond with only one word". Added "do not explain". Added a few-shot example. The model complied for a while, then slipped again on edge cases. Longer tickets, ambiguous language, anything that made the model feel like it needed to hedge.

The fix was not another prompt patch. The fix was a validation layer that enforced the contract on every response before the switch ever saw it.

The shape of the fix

Install:

```bash
pip install llm-output-validator
```

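```python
from llm_output_validator import Validator
from llm_output_validator.rules import AllowedPhrases

validator = Validator(rules=[
    AllowedPhrases(["escalate", "resolve", "defer"])
])

response = call_model(ticket_text)  # call_model is your own model wrapper
result = validator.validate(response)

if not result.passed:
    # log it, retry, raise, whatever fits your loop
    raise ValueError(f"Model output failed validation: {result.failures}")

# AllowedPhrases lets trailing punctuation through ("Escalate." passes),
# so trim it before the string reaches the switch.
route(response.strip().rstrip(".!?").lower())  # route is your own dispatcher
```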

For the JSON case, where the model is supposed to return structured output:

```python
from llm_output_validator import Validator
from llm_output_validator.rules import JsonValid, JsonMatchesSchema

schema = {
    "type": "object",
    "properties": {
        "action": {"type": "string", "enum": ["escalate", "resolve", "defer"]},
        "confidence": {"type": "number"}
    },
    "required": ["action", "confidence"]
}

validator = Validator(rules=[
    JsonValid(),
    JsonMatchesSchema(schema)  # requires: pip install "llm-output-validator[jsonschema]"
])

result = validator.validate(response)
```
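To make concrete what the two JSON rules check, here is a dependency-free sketch that validates the same schema by hand. It is not how llm-output-validator implements JsonValid or JsonMatchesSchema; the helper name check_ticket_json is hypothetical, and it only mirrors the schema above (a required "action" from the enum and a required numeric "confidence").

```python
import json
from typing import List

ALLOWED_ACTIONS = {"escalate", "resolve", "defer"}


def check_ticket_json(text: str) -> List[str]:
    """Return failure messages for the ticket schema; an empty list means pass.

    Hypothetical helper that hand-checks the schema shown above.
    """
    # JsonValid-style check: is the string parseable JSON at all?
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]

    # JsonMatchesSchema-style checks, written out by hand.
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    failures = []
    for key in ("action", "confidence"):
        if key not in data:
            failures.append(f"missing required key: {key}")

    action = data.get("action")
    # Non-string values can never be in the enum; checking the type first
    # also avoids a TypeError when the value is an unhashable list or dict.
    if "action" in data and (not isinstance(action, str) or action not in ALLOWED_ACTIONS):
        failures.append(f"action not in enum: {action!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, but JSON Schema does not count
    # true/false as numbers, so reject booleans explicitly.
    if "confidence" in data and (
        isinstance(confidence, bool) or not isinstance(confidence, (int, float))
    ):
        failures.append(f"confidence is not a number: {confidence!r}")

    return failures


print(check_ticket_json('{"action": "escalate", "confidence": 0.92}'))  # []
print(check_ticket_json('{"action": "I would recommend escalating"}'))
```

The boolean case is the kind of edge a hand-rolled check tends to miss, which is one argument for delegating to jsonschema once the schema grows.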

Multiple rules compose. All must pass for result.passed to be True. The result.failures list tells you which rules failed and includes the failing excerpt so you can log it or include it in a retry prompt.
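The composition behavior described above (every rule runs, all must pass, each failure carries an excerpt) can be pictured in a few lines of plain Python. This is a standalone sketch with hypothetical names (CheckResult, Check, run_checks), not the library's internals; it only illustrates the all-must-pass aggregation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    passed: bool
    failures: List[str] = field(default_factory=list)
    excerpt: Optional[str] = None


# A check is any function from the model output to a CheckResult.
Check = Callable[[str], CheckResult]


def run_checks(text: str, checks: List[Check]) -> CheckResult:
    """Run every check (no short-circuit) and pass only if all of them pass."""
    failures: List[str] = []
    for check in checks:
        result = check(text)
        if not result.passed:
            # Keep the failing excerpt next to the message for logs or retry prompts.
            suffix = f" (excerpt: {result.excerpt!r})" if result.excerpt else ""
            failures.extend(msg + suffix for msg in result.failures)
    return CheckResult(passed=not failures, failures=failures)


def not_empty(text: str) -> CheckResult:
    if text.strip():
        return CheckResult(passed=True)
    return CheckResult(passed=False, failures=["output is empty"])


def no_apology_opener(text: str) -> CheckResult:
    lower = text.strip().lower()
    if lower.startswith(("i'm sorry", "i apologize")):
        return CheckResult(passed=False, failures=["output opens with apology"], excerpt=text[:80])
    return CheckResult(passed=True)


result = run_checks("I'm sorry, I can't classify this.", [not_empty, no_apology_opener])
print(result.passed)  # False
print(result.failures)
```

Running every check instead of stopping at the first failure costs a little time but gives a complete failure list, which is what you want when the list is going into a log line or a retry prompt.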

Custom rules are one class:

```python
from llm_output_validator import Rule, ValidationResult

class NoApologyOpener(Rule):
    def validate(self, text: str) -> ValidationResult:
        lower = text.strip().lower()
        if lower.startswith("i'm sorry") or lower.startswith("i apologize"):
            return ValidationResult(passed=False, failures=["output opens with apology"], excerpt=text[:80])
        return ValidationResult(passed=True)

validator = Validator(rules=[NoApologyOpener()])
```

What it does NOT do

- It does not retry the model. That is your loop. Validation tells you the output is bad. What you do next is up to you.
- It does not parse structured output. Use JsonValid to confirm the string is valid JSON, but deserialization is on you.
- It does not score outputs. Scoring (0.0 to 1.0, rubric-based) is a different problem. See prompt-eval-rubric below.
- It does not fix the output. No mutation, no repair. If validation fails, you get a result that tells you why, not a corrected string.
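Since retrying is left to the caller, here is one way such a loop could look. It is a hedged sketch: the model is passed in as a plain function, is_allowed is a deliberately tiny stand-in for the validator (exact match after trimming whitespace, case, and a final period), and the retry-prompt wording is an assumption, not something the library produces.

```python
from typing import Callable, List, Optional, Tuple

ALLOWED = {"escalate", "resolve", "defer"}


def is_allowed(text: str) -> Tuple[bool, List[str]]:
    """Tiny stand-in validator: exact match after trimming whitespace, case, and a final period."""
    normalized = text.strip().rstrip(".").lower()
    if normalized in ALLOWED:
        return True, []
    return False, [f"expected one of {sorted(ALLOWED)}, got {text.strip()!r}"]


def classify_with_retry(
    ticket: str,
    call_model: Callable[[str], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call the model, validate, and re-prompt with the failures until it passes."""
    prompt = f"Classify this ticket as escalate, resolve, or defer.\n\n{ticket}"
    for _ in range(max_attempts):
        output = call_model(prompt)
        ok, failures = is_allowed(output)
        if ok:
            return output.strip().rstrip(".").lower()
        # Feed the failure reasons back so the next attempt can correct itself.
        prompt += "\n\nYour previous answer was rejected: " + "; ".join(failures)
        prompt += "\nReply with exactly one word."
    return None  # out of attempts: the caller decides whether to raise, page a human, etc.


# Hypothetical model that hedges once, then complies.
replies = iter(["I would recommend escalating this ticket.", "Escalate."])
print(classify_with_retry("Customer charged twice", lambda _prompt: next(replies)))  # escalate
```

Returning None instead of raising keeps the policy decision with the caller, which matches the split the library draws between validating and retrying.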

Inside the lib: the AllowedPhrases design

This one takes a minute to explain because it is not the obvious implementation.

The obvious implementation: check that text.strip().lower() is exactly one of the allowed phrases. Reject anything else.

The problem with that: it is too strict in practice. Some prompts return the right short phrase with punctuation attached, like "Escalate." instead of "escalate", which an exact match rejects even after whitespace and case are normalized. Some return a single sentence where the answer is the full sentence, not a single word.

The design here is permissive at the word/phrase level, not the output level. AllowedPhrases(["yes", "no", "maybe"]) does not check that the entire output equals one of those strings. It checks that every substantive segment of the response uses only phrases from the allowed set.

Concretely: "No." passes. "Yes, absolutely." fails because "absolutely" is not in the set. "Maybe, I'm not sure." fails. "no" passes. "No\n" passes.

The goal is to catch the model sneaking in qualifiers, hedges, or full-sentence answers when you specified a closed vocabulary. It does not require you to enumerate every punctuation and casing variant. It does require that the model not add vocabulary outside the set.
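One way to get the behavior described above is to split the output on punctuation and line breaks, then require every non-empty segment to be an allowed phrase. This is a minimal sketch under that assumption; the function name allowed_phrases_ok and the splitting rule are hypothetical, not the library's, and the sample loop uses the examples from the previous paragraph.

```python
import re
from typing import Iterable

# Assumed segment boundaries: sentence punctuation and line breaks.
_SEGMENT_SPLIT = re.compile(r"[.,;:!?\n]+")


def allowed_phrases_ok(text: str, allowed: Iterable[str]) -> bool:
    """True if every non-empty segment of text is one of the allowed phrases.

    Case and surrounding whitespace are ignored; any extra vocabulary fails.
    """
    allowed_set = {phrase.strip().lower() for phrase in allowed}
    segments = [s.strip() for s in _SEGMENT_SPLIT.split(text.lower())]
    segments = [s for s in segments if s]
    return bool(segments) and all(s in allowed_set for s in segments)


vocab = ["yes", "no", "maybe"]
for sample in ["No.", "Yes, absolutely.", "Maybe, I'm not sure.", "no", "No\n"]:
    print(repr(sample), allowed_phrases_ok(sample, vocab))
```

One consequence of checking segment by segment: in this sketch "yes, no" also passes, because both segments are in the set. If your contract needs exactly one answer, the anchored-regex approach is the stricter tool.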

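For truly strict matching (the whole output must be one of the phrases, with nothing before or after it), use MatchesRegex with an anchored pattern:

```python
import re

from llm_output_validator import Validator
from llm_output_validator.rules import MatchesRegex

validator = Validator(rules=[
    # re.IGNORECASE still tolerates case variation; drop it for case-sensitive matching
    MatchesRegex(r"^(escalate|resolve|defer)$", flags=re.IGNORECASE)
])
```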

Both approaches are valid. The choice depends on how much variation you want to tolerate.
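A Python detail worth knowing if you take the regex route: in the re module, $ also matches just before a single trailing newline, so the anchored pattern above accepts "escalate\n". How much that matters depends on how MatchesRegex applies the pattern, which is not assumed here. The plain re sketch below (no library involved) shows the difference; re.fullmatch, or ending the pattern with \Z instead of $, requires the match to cover the whole string.

```python
import re

# The anchored pattern from the MatchesRegex example, applied with plain re.
ANCHORED = re.compile(r"^(escalate|resolve|defer)$", re.IGNORECASE)
# fullmatch must consume the entire string, including any trailing newline.
WHOLE = re.compile(r"(?:escalate|resolve|defer)", re.IGNORECASE)

for sample in ["escalate", "Escalate", "escalate\n", "escalate.", "escalate now"]:
    anchored_ok = ANCHORED.search(sample) is not None
    whole_ok = WHOLE.fullmatch(sample) is not None
    print(repr(sample), anchored_ok, whole_ok)
```

Only "escalate\n" differs between the two: the anchored search accepts it and fullmatch does not.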

When this is useful

Classification agents. Any prompt that says "return one of: X, Y, Z". The model will drift. This catches the drift.

Tool-use orchestrators. If a planning step returns a string that an outer loop parses into a next action, validate the string before parsing.

Content moderation checks. NoForbiddenPhrases(["sorry, I cannot", "as an AI"]) catches canned refusals and disclaimers that should never reach the user.

Length contracts. An agent that summarizes content for a UI field. LengthBetween(0, 280) before the string goes into the database.

JSON extraction. JsonValid() before you call json.loads(). Avoids the JSONDecodeError that happens at 2am.
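The three rule types named above (NoForbiddenPhrases, LengthBetween, JsonValid) boil down to simple checks. The sketch below is a standalone, hypothetical approximation in plain Python, assuming case-insensitive substring matching for forbidden phrases and an inclusive character count for length; the library's exact semantics may differ.

```python
import json
from typing import Iterable, List


def forbidden_phrases(text: str, phrases: Iterable[str]) -> List[str]:
    """Return the forbidden phrases found in text (case-insensitive substring match)."""
    lower = text.lower()
    return [p for p in phrases if p.lower() in lower]


def length_between(text: str, min_len: int, max_len: int) -> bool:
    """True if the character count is within [min_len, max_len], inclusive."""
    return min_len <= len(text) <= max_len


def json_valid(text: str) -> bool:
    """True if json.loads accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


summary = "As an AI, I cannot summarize this ticket."
print(forbidden_phrases(summary, ["sorry, I cannot", "as an AI"]))  # ['as an AI']
print(length_between(summary, 0, 280))  # True
print(json_valid('{"action": "defer"'))  # False
```

Checking JSON validity means parsing it, so in your own code you may prefer to keep the parsed value rather than parse the same string twice.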

When NOT to use this

If the model output is free-form prose and you want to evaluate quality, this is the wrong tool. ValidationResult is binary. It does not score tone, coherence, or helpfulness. Use prompt-eval-rubric for that.

If you need the model to produce structured output reliably and want automatic retry with feedback, agentcast is closer to what you want. It validates, formats the error as a hint, and feeds it back to the model in a retry loop.

If the validation you need is complex enough that it requires calling another model, you are outside the scope of rule-based validation entirely.

Install

```bash
pip install llm-output-validator

# for JsonMatchesSchema support
pip install "llm-output-validator[jsonschema]"
```

Zero required dependencies. jsonschema is the only optional one, and only needed for JsonMatchesSchema.

43 tests. Python 3.9+.

Source: github.com/MukundaKatta/llm-output-validator

Siblings

This is part of a larger set of agent-tooling libraries. Here is where it fits relative to the closest neighbors:

| Lib | Boundary | Repo |
| --- | --- | --- |
| agentcast | Structured output enforce + retry. Feeds validation errors back to model. Different abstraction than validate-and-raise. | MukundaKatta/agentcast |
| llm-pii-redact | Validate that PII was not leaked in the output. Focused on data compliance, not format contracts. | MukundaKatta/llm-pii-redact |
| prompt-eval-rubric | 0.0 to 1.0 scoring rubrics. Complementary: validates quality, not format. | MukundaKatta/prompt-eval-rubric |
| agentvet | Validates tool ARGS before execution. This library validates model OUTPUT after generation. Mirror concerns. | MukundaKatta/agentvet |

What is next

A few things I want to add:

- SentimentBounds(min_score, max_score), using a pluggable scorer so you can gate on tone without a hard model dependency.
- A RetryHint field on ValidationResult so the failure message is pre-formatted for inclusion in a follow-up prompt.
- AllOf / AnyOf rule combinators for more complex logic without subclassing.

If you have a validation rule you keep writing from scratch, open an issue.

Building this

This is entry 29 in a series of agent-tooling libraries I am building for the Hermes Agent Challenge. The constraint is simple: each library solves one narrow problem, ships with real tests, and has zero required dependencies where possible.

The pattern behind the series: agent failures tend to cluster at boundaries. The point where a model output becomes program input. The point where a tool argument becomes a side effect. The point where one agent hands off to another. These are the places where a single-purpose, inspectable library is worth more than a framework.

llm-output-validator lives at the output-to-input boundary. Validate there, and the downstream logic can trust what it receives.

DEV Community