Validate LLM Output Before It Reaches Your Users

#hermeschallenge #ai #python #agents

The City That Was a Phone Number

The field label said "city." The model returned "(555) 234-7890."

It was not the model's fault. The prompt was ambiguous. The user had asked a question that contained a phone number, and the model answered the question instead of extracting the city. The output passed JSON parsing. It passed length checks. It went into the database. Three downstream reports were wrong before anyone noticed.

The bug was not in the model. It was in the assumption that a well-formed string is a valid string. Those are different things.

llm-output-validator is a Python library for defining rules that LLM string output must pass before your code uses it. Rules check length, regex patterns, allowed values, PII presence, and JSON schema. Each rule reports which constraint failed and what the actual value was. You decide what to do with that information.

The Shape of the Fix

from llm_output_validator import OutputValidator, rules

validator = OutputValidator([
    rules.Length(min=2, max=100),
    rules.Regex(r"^[A-Za-z\s\-]+$", description="letters, spaces, and hyphens only"),
    rules.NoPII(),
])

result = validator.validate("Dallas")
if result.ok:
    city = result.value
else:
    print(f"Rule '{result.failed_rule}' failed: {result.detail}")

When the model returns a phone number instead of a city:

result = validator.validate("(555) 234-7890")
# result.ok == False
# result.failed_rule == "Regex"
# result.detail == "value did not match pattern '^[A-Za-z\\s\\-]+$'"

The ValidationResult object carries the original value and, when validation fails, the name of the first failing rule and a human-readable detail string. You can log the detail, retry with a stronger prompt, or return a default.

Here is a more complex example with allowed values and JSON:

from llm_output_validator import OutputValidator, rules

# Classify sentiment as exactly one of three values
sentiment_validator = OutputValidator([
    rules.AllowedValues(["positive", "negative", "neutral"]),
])

result = sentiment_validator.validate("mostly positive")
# Fails: "mostly positive" is not in the allowed set

# Validate a JSON response against a schema
json_validator = OutputValidator([
    rules.Length(max=5000),
    rules.JSONSchema({
        "type": "object",
        "required": ["name", "score"],
        "properties": {
            "name": {"type": "string"},
            "score": {"type": "number", "minimum": 0, "maximum": 1},
        },
    }),
])

The JSONSchema rule uses jsonschema as an optional dependency. If jsonschema is not installed, importing the rule raises ImportError with a clear message telling you what to install.
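Optional dependencies like this are commonly handled with a deferred import that turns a missing package into an actionable error. The following is a minimal sketch of that pattern, not the library's own code; `require_optional` is a hypothetical helper name, and the library's actual mechanism may differ.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message.

    Hypothetical helper for illustration; not part of llm-output-validator.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this rule. "
            f"Install it with: {install_hint}"
        ) from exc


# A standard-library module always imports successfully.
json_module = require_optional("json", "(bundled with Python)")

# A missing package produces the install hint instead of a bare traceback.
try:
    require_optional(
        "surely_not_installed_pkg_xyz",
        "pip install surely_not_installed_pkg_xyz",
    )
except ImportError as err:
    message = str(err)
```

Deferring the import keeps the package at zero required dependencies while still telling the reader exactly what to install.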

What It Does NOT Do

The validator does not call the model again. It validates output after you receive it. Retry logic is your responsibility.

The NoPII rule uses regex patterns for common PII formats: US Social Security numbers, email addresses, phone numbers, and credit card numbers (confirmed with a Luhn check). It is not a comprehensive PII detection system. It will miss PII in non-English languages and unusual formats.
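To show why a Luhn check is paired with a regex, here is a minimal, self-contained sketch of the technique. It is an illustration under assumptions, not the NoPII implementation: the pattern `CARD_CANDIDATE` and the function names are hypothetical, and the real rule may use different patterns.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by
# single spaces or hyphens. (Illustrative pattern, not the library's.)
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    # Walk right to left, doubling every second digit.
    for index, char in enumerate(reversed(digits)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """True if text holds a card-shaped digit run that also passes Luhn."""
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return True
    return False
```

The regex alone would flag any long digit run, such as an order ID. The checksum filters out most of those false positives, at the cost of still missing numbers written in formats the pattern does not anticipate.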

The AllowedValues rule does exact string matching by default. It does no fuzzy matching, and comparison is case-sensitive unless you pass case_sensitive=False. If your model sometimes returns "Positive" instead of "positive," configure the rule accordingly.

Rules are checked in order and the validator stops at the first failure. It does not report all failing rules at once. If you need to know all failures, run each rule independently.
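Running each rule independently can be as simple as a loop that never stops early. The sketch below is self-contained, so it uses a stand-in `Result` dataclass with the attributes this article shows on ValidationResult (`ok`, `value`, `failed_rule`, `detail`) and two toy checks; `collect_failures` is a hypothetical helper, not a library function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Result:
    """Stand-in mirroring the attributes shown on ValidationResult."""
    ok: bool
    value: str
    failed_rule: Optional[str] = None
    detail: Optional[str] = None


Check = Callable[[str], Result]


def collect_failures(value: str, checks: List[Check]) -> List[Result]:
    """Run every check, even after one fails, and return all failures."""
    results = [check(value) for check in checks]
    return [result for result in results if not result.ok]


# Two toy checks standing in for single-rule validators.
def max_len_20(value: str) -> Result:
    if len(value) <= 20:
        return Result(ok=True, value=value)
    return Result(False, value, "Length", f"length {len(value)} exceeds 20")


def letters_only(value: str) -> Result:
    if value.replace(" ", "").isalpha():
        return Result(ok=True, value=value)
    return Result(False, value, "Regex", "value contains non-letter characters")


failures = collect_failures("(555) 234-7890 ext. 12345", [max_len_20, letters_only])
failed_names = [failure.failed_rule for failure in failures]
```

With the real library, the same idea should work by wrapping each rule in its own validator, for example `checks = [OutputValidator([rule]).validate for rule in my_rules]`, where `my_rules` is your own list.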

Inside the Library

Each rule is a class with a single validate(value: str) -> ValidationResult method. Adding a custom rule is straightforward:

from llm_output_validator import BaseRule, ValidationResult

class StartsWithCapital(BaseRule):
    name = "StartsWithCapital"

    def validate(self, value: str) -> ValidationResult:
        if value and value[0].isupper():
            return ValidationResult.ok(value)
        return ValidationResult.fail(
            rule=self.name,
            value=value,
            detail="first character must be uppercase",
        )

The OutputValidator class is a thin list wrapper. It holds rules, calls each in order, and returns the first failure. There is no global registry, no metaclass magic, and no decorator protocol. Rules are plain classes.
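The first-failure loop is small enough to sketch in full. This is an illustration of the design described above, not the library's source: `SketchResult`, `SketchValidator`, `MaxLength`, and `LettersOnly` are hypothetical names, and the constructors are called `success` and `failure` here only to keep them distinct from the `ok` field.

```python
import re
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence


@dataclass(frozen=True)
class SketchResult:
    ok: bool
    value: str
    failed_rule: Optional[str] = None
    detail: Optional[str] = None

    @classmethod
    def success(cls, value: str) -> "SketchResult":
        return cls(ok=True, value=value)

    @classmethod
    def failure(cls, rule: str, value: str, detail: str) -> "SketchResult":
        return cls(ok=False, value=value, failed_rule=rule, detail=detail)


class Rule(Protocol):
    """Anything with a name and a validate method counts as a rule."""
    name: str

    def validate(self, value: str) -> SketchResult: ...


class SketchValidator:
    """Holds rules, calls each in order, returns the first failure."""

    def __init__(self, rules: Sequence[Rule]) -> None:
        self._rules = list(rules)

    def validate(self, value: str) -> SketchResult:
        for rule in self._rules:
            result = rule.validate(value)
            if not result.ok:
                return result  # stop at the first failing rule
        return SketchResult.success(value)


class MaxLength:
    name = "Length"

    def __init__(self, limit: int) -> None:
        self.limit = limit

    def validate(self, value: str) -> SketchResult:
        if len(value) <= self.limit:
            return SketchResult.success(value)
        return SketchResult.failure(
            self.name, value, f"length {len(value)} exceeds {self.limit}"
        )


class LettersOnly:
    name = "Regex"

    def validate(self, value: str) -> SketchResult:
        if re.fullmatch(r"[A-Za-z\s\-]+", value):
            return SketchResult.success(value)
        return SketchResult.failure(
            self.name, value, "expected letters, spaces, and hyphens only"
        )


validator = SketchValidator([MaxLength(100), LettersOnly()])
good = validator.validate("Dallas")
bad = validator.validate("(555) 234-7890")
```

Because the loop returns on the first failure, rule order decides which failure you see: put cheap, coarse checks such as length first and expensive ones last.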

The 43 tests cover every built-in rule with valid inputs, invalid inputs, boundary values, and combinations. There are tests for the optional jsonschema import path and tests that verify behavior when jsonschema is absent. Test isolation is strict. No rule test touches another rule.

The hardest design decision was whether ValidationResult should be a dataclass or a named tuple. Dataclass won because it makes the ok and fail constructors readable and keeps the attributes named rather than positional. A named tuple would be slightly faster for attribute access, but the difference is negligible at the call volumes LLM output validation runs at.

When It Helps and When It Doesn't

It helps most when you are using LLM output to populate a structured data field: city names, sentiment labels, JSON blobs, IDs, codes, or any constrained vocabulary. The model might return something syntactically valid but semantically wrong. A rule catches that before the wrong value propagates.

It helps in extraction pipelines where you are pulling structured facts from unstructured text. A well-placed AllowedValues rule catches the model hallucinating a category that is not in your taxonomy.

It helps less for free-form content generation where correctness is subjective. A regex rule is usually the wrong tool for judging a blog post. Use an evaluation rubric instead.

It does not help if you do not act on validation failures. The library tells you what failed. It does not retry, log, or alert automatically. Build those behaviors around the ValidationResult.
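One way to act on failures is a small wrapper that logs each rejection, retries with the failure detail appended to the prompt, and falls back to a default. The sketch below is self-contained and makes several assumptions: `validated_call`, `fake_model`, and `city_check` are hypothetical names, `Result` is a stand-in with the attributes shown on ValidationResult, and the retry prompt wording is only an example.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("llm_validation")


@dataclass
class Result:
    """Stand-in mirroring the attributes shown on ValidationResult."""
    ok: bool
    value: str
    failed_rule: Optional[str] = None
    detail: Optional[str] = None


def validated_call(
    call_model: Callable[[str], str],
    validate: Callable[[str], Result],
    prompt: str,
    max_attempts: int = 3,
    default: Optional[str] = None,
) -> Optional[str]:
    """Call the model until its output validates, else return default."""
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        result = validate(call_model(current_prompt))
        if result.ok:
            return result.value
        logger.warning(
            "attempt %d failed rule %s: %s",
            attempt, result.failed_rule, result.detail,
        )
        # Feed the failure detail back so the next attempt is more constrained.
        current_prompt = (
            f"{prompt}\n\nYour previous answer was rejected: {result.detail}"
        )
    return default


# Demo with a fake model that gets it wrong once, then right.
replies = iter(["(555) 234-7890", "Dallas"])


def fake_model(prompt: str) -> str:
    return next(replies)


def city_check(value: str) -> Result:
    if value.replace(" ", "").replace("-", "").isalpha():
        return Result(ok=True, value=value)
    return Result(False, value, "Regex", "expected letters, spaces, and hyphens only")


city = validated_call(fake_model, city_check, "Extract the city.", default="UNKNOWN")
```

With the real library, `validate` would presumably be the bound `validate` method of an OutputValidator, and `call_model` your own client function.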

Install

pip install git+https://github.com/MukundaKatta/llm-output-validator

Optional JSON Schema support:

pip install git+https://github.com/MukundaKatta/llm-output-validator jsonschema

Python 3.10+. Zero required dependencies.

Quick start:

from llm_output_validator import OutputValidator, rules

v = OutputValidator([rules.Length(min=1, max=200), rules.NoPII()])
llm_response = "Dallas"  # replace with the string returned by your model call
result = v.validate(llm_response)
if not result.ok:
    raise ValueError(f"Output validation failed: {result.detail}")

Sibling Libraries

Library	What it does
`llm-pii-redact`	Redact PII from prompts before sending
`llm-cost-cap`	Reject calls that exceed a USD budget
`prompt-eval-rubric`	Score free-form output on 0.0-1.0 rubrics
`tool-result-validator`	Validate tool call return values
`agent-guard-rails`	Composable guardrails around agent output

The typical pattern is: redact PII on the way in with llm-pii-redact, then validate structure on the way out with llm-output-validator. Cost and rate limits sit at a higher level, wrapping the whole call.

What's Next

Two rules are under active development. The first is SemanticSimilarity, which checks that the output's cosine similarity to a reference string is above a threshold. That requires an embeddings model, so it will be an optional dependency behind a lazy import.
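For readers unfamiliar with the comparison itself, here is a minimal sketch of a cosine similarity threshold check in plain Python. It assumes the two strings have already been turned into embedding vectors by some model; the function names, the toy vectors, and the 0.8 threshold are illustrative only and say nothing about how the planned rule will be implemented.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def is_similar(
    output_vec: Sequence[float],
    reference_vec: Sequence[float],
    threshold: float = 0.8,
) -> bool:
    """True if the output embedding is close enough to the reference."""
    return cosine_similarity(output_vec, reference_vec) >= threshold


# Toy vectors standing in for real embeddings.
same_direction = is_similar([3.0, 4.0], [6.0, 8.0])
orthogonal = is_similar([1.0, 0.0], [0.0, 1.0])
```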

The second is LanguageCode, which checks that the output is in a specific language. Useful for multilingual pipelines where the model sometimes responds in the wrong language.

The Hermes Agent Challenge sharpened the library's error messages significantly. Early versions returned terse failure codes. The current version returns full English sentences describing what was expected and what was found. When you are debugging a failed validation at 2am, the sentence is worth the extra characters.

Contributions that add new built-in rules are welcome. The rule interface is stable. Write the class, write the tests for valid/invalid/boundary cases, and open a PR.