Introducing correctover-patronus: 6-Dimensional Verification for Patronus AI

#ai #llm #verification #opensource

The Problem

LLM evaluation tools like Patronus AI excel at hallucination detection, toxicity checks, and semantic relevance. But they don't catch structural and operational failures such as:

A JSON response missing required fields
A function call with malformed parameters
Output that violates schema constraints
Latency budget overruns silently degrading UX
Cost explosions from runaway token usage

These aren't hallucinations. They're verification failures.
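
To make the distinction concrete, here is a minimal sketch of what a deterministic check for the first failure (a JSON response missing required fields) could look like. It uses only the Python standard library and is not Correctover's actual rule implementation; the function name `check_required_fields` and the `REQUIRED` schema are hypothetical, chosen for illustration.

```python
import json


def check_required_fields(raw_output: str, required: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        # Structure failure: the output is not valid JSON at all.
        return [f"invalid JSON: {exc.msg}"]

    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for field, expected_type in required.items():
        if field not in data:
            # Schema failure: a required field is absent.
            problems.append(f"missing required field: {field}")
        elif not isinstance(data[field], expected_type):
            # Schema failure: the field exists but has the wrong type.
            problems.append(f"wrong type for field: {field}")
    return problems


# Hypothetical schema for a summarization response (an assumption, not from the SDK).
REQUIRED = {"title": str, "summary": str, "word_count": int}

print(check_required_fields('{"title": "A", "summary": "B"}', REQUIRED))
```

No model call is involved: the same input always yields the same list of problems, which is what separates this kind of check from a model-graded evaluation.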

The Solution

correctover-patronus is an adapter that runs Correctover's 87 deterministic verification rules as native Patronus evaluators. Every verdict comes with a recomputable proof hash, so you can verify the verifier.

pip install correctover-patronus

The 6 Dimensions

Dimension	What It Checks	Example
Structure	Output format validity	JSON parses correctly
Schema	Field presence & types	Required fields exist
Identity	Semantic relevance to input	Response addresses the question
Integrity	Forbidden pattern absence	No Tracebacks or error messages
Latency	Response time budget	Under 30s threshold
Cost	Token usage budget	Under 10k token limit
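
As a rough illustration of the last three rows, the sketch below checks integrity, latency, and cost with plain Python. It is an assumption-laden example, not the SDK's rule set: the `FORBIDDEN` patterns, the `Budget` dataclass, and `check_budgets` are hypothetical names, and the default thresholds simply mirror the 30 s and 10k-token examples in the table.

```python
import re
from dataclasses import dataclass

# Hypothetical forbidden patterns; a real rule set would be far more complete.
FORBIDDEN = [
    re.compile(r"Traceback \(most recent call last\)"),
    re.compile(r"Internal Server Error", re.IGNORECASE),
]


@dataclass(frozen=True)
class Budget:
    max_latency_ms: float = 30_000  # 30 s threshold from the table
    max_tokens: int = 10_000        # 10k token limit from the table


def check_budgets(output: str, latency_ms: float, tokens_used: int,
                  budget: Budget = Budget()) -> dict[str, bool]:
    """Return a pass/fail verdict per dimension (True means pass)."""
    return {
        # Integrity: no forbidden pattern appears anywhere in the output.
        "integrity": not any(p.search(output) for p in FORBIDDEN),
        # Latency: measured response time is at or under the budget.
        "latency": latency_ms <= budget.max_latency_ms,
        # Cost: token usage is at or under the budget.
        "cost": tokens_used <= budget.max_tokens,
    }


print(check_budgets("The article discusses...", latency_ms=1200, tokens_used=800))
```

Each verdict is a pure function of the output text and two measured numbers, so the result does not vary between runs.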

Quick Start

from correctover_patronus import CorrectoverEvaluator, CorrectoverConfig

# Configure the confidence threshold plus the latency and cost budgets.
config = CorrectoverConfig(
    min_confidence=0.7,
    latency_rules={"max_ms": 5000},
    cost_rules={"max_tokens": 4000}
)

# Run the verification rules against a single input/output pair.
evaluator = CorrectoverEvaluator(config=config)
result = evaluator.evaluate(
    task_input="Summarize this article...",
    task_output="The article discusses...",
    task_context={"source": "article", "word_count": 1500}
)

# Report the overall verdict and the proof hash for later re-verification.
print(f"Overall: {result.score:.2f} ({'PASS' if result.pass_ else 'FAIL'})")
print(f"Proof hash: {result.metadata['proof_hash']}")

Recomputable Proof

Every evaluation produces a proof_hash in the metadata. This hash covers:

The input text
The output text
The verification rules applied
The verdict for each dimension

You can re-run the same verification and get the same hash. No black boxes.
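
The exact hash construction is not described here, so the following is only a sketch of how a recomputable hash over those four inputs could work, assuming SHA-256 over a canonical JSON serialization. The function `proof_hash`, the payload layout, and the rule IDs are hypothetical and will not reproduce the hashes the SDK emits.

```python
import hashlib
import json


def proof_hash(task_input: str, task_output: str,
               rule_ids: list[str], verdicts: dict[str, bool]) -> str:
    """Hash the input, output, applied rules, and per-dimension verdicts."""
    payload = {
        "input": task_input,
        "output": task_output,
        # Sort rule IDs so the order they were applied in does not change the hash.
        "rules": sorted(rule_ids),
        "verdicts": verdicts,
    }
    # Canonical form: sorted keys and fixed separators give byte-identical JSON
    # for logically identical payloads.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical rule IDs and verdicts, for illustration only.
first = proof_hash("Summarize this article...", "The article discusses...",
                   ["schema.required", "structure.json"],
                   {"structure": True, "schema": True})
second = proof_hash("Summarize this article...", "The article discusses...",
                    ["structure.json", "schema.required"],
                    {"schema": True, "structure": True})
print(first == second)
```

The canonical serialization is the key design choice: without sorted keys and fixed separators, two logically identical evaluations could serialize differently and produce different hashes.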

Performance

P50 verification latency: 22 μs
Deterministic verification rules: 87
SDK size: 586 KB
Zero external API calls: fully deterministic, local execution
