DEV Community

correctover


Introducing correctover-patronus: 6-Dimensional Verification for Patronus AI

The Problem

LLM evaluation tools like Patronus AI excel at hallucination detection, toxicity checks, and semantic relevance. But they don't catch structural and operational failures:

  • A JSON response missing required fields
  • A function call with malformed parameters
  • Output that violates schema constraints
  • Latency budget overruns silently degrading UX
  • Cost explosions from runaway token usage

These aren't hallucinations. They're verification failures.
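
To make the first few failure modes concrete, here is a minimal sketch of a deterministic check for "JSON response missing required fields". It is an illustration only, not Correctover's implementation: the function name `check_required_fields` and the `REQUIRED` field map are hypothetical.

```python
import json

def check_required_fields(raw: str, required: dict[str, type]) -> list[str]:
    """Return a list of structural problems; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for name, expected in required.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    return problems

# Hypothetical schema: field names and types are made up for this example.
REQUIRED = {"title": str, "score": float}

print(check_required_fields('{"title": "Report"}', REQUIRED))
# → ['missing field: score']
```

The same input always yields the same list of problems, which is the property that separates this kind of check from a model-graded evaluation.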

The Solution

correctover-patronus is an adapter that runs Correctover's 87 deterministic verification rules as native Patronus evaluators. Every verdict comes with a recomputable proof hash — meaning you can verify the verifier.

pip install correctover-patronus

The 6 Dimensions

Dimension | What It Checks | Example
Structure | Output format validity | JSON parses correctly
Schema | Field presence & types | Required fields exist
Identity | Semantic relevance to input | Response addresses the question
Integrity | Forbidden pattern absence | No Tracebacks or error messages
Latency | Response time budget | Under 30s threshold
Cost | Token usage budget | Under 10k token limit
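
As a rough illustration of what the Integrity, Latency, and Cost rows could look like in code, the sketch below implements them as plain functions. Everything here is an assumption made for the example: the `Verdict` type, the function names, and the forbidden patterns are hypothetical, and the thresholds are the example values from the table, not library defaults.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns; the real rule set is not documented in this post.
FORBIDDEN = [
    re.compile(r"Traceback \(most recent call last\)"),
    re.compile(r"\bError:"),
]

@dataclass
class Verdict:
    dimension: str
    passed: bool
    detail: str

def check_integrity(output: str) -> Verdict:
    """Fail if the output contains any forbidden pattern."""
    for pattern in FORBIDDEN:
        if pattern.search(output):
            return Verdict("integrity", False, f"matched {pattern.pattern!r}")
    return Verdict("integrity", True, "no forbidden patterns")

def check_budget(dimension: str, observed: float, limit: float) -> Verdict:
    """Pass when the observed value is within the limit.

    Treating a value equal to the limit as passing is an assumption.
    """
    return Verdict(dimension, observed <= limit, f"{observed} vs limit {limit}")

verdicts = [
    check_integrity("The article discusses..."),
    check_budget("latency", observed=1_200, limit=30_000),  # milliseconds, 30s threshold
    check_budget("cost", observed=12_500, limit=10_000),    # tokens, 10k limit
]

for v in verdicts:
    print(f"{v.dimension}: {'PASS' if v.passed else 'FAIL'} ({v.detail})")
```

In this run the integrity and latency checks pass and the cost check fails, because 12,500 tokens exceeds the 10k limit.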

Quick Start

from correctover_patronus import CorrectoverEvaluator, CorrectoverConfig

# Set the confidence threshold and the latency and cost budgets.
config = CorrectoverConfig(
    min_confidence=0.7,
    latency_rules={"max_ms": 5000},  # latency budget in milliseconds
    cost_rules={"max_tokens": 4000},  # token budget
)

# Evaluate a single input/output pair.
evaluator = CorrectoverEvaluator(config=config)
result = evaluator.evaluate(
    task_input="Summarize this article...",
    task_output="The article discusses...",
    task_context={"source": "article", "word_count": 1500}
)

# Report the overall score, the pass/fail verdict, and the proof hash.
print(f"Overall: {result.score:.2f} ({'PASS' if result.pass_ else 'FAIL'})")
print(f"Proof hash: {result.metadata['proof_hash']}")

Recomputable Proof

Every evaluation produces a proof_hash in the metadata. This hash covers:

  • The input text
  • The output text
  • The verification rules applied
  • The verdict for each dimension

You can re-run the same verification and get the same hash. No black boxes.
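
The post does not spell out how the hash is built, so the following is only a sketch of one way a recomputable hash over those four inputs could work: serialize them to canonical JSON and take a SHA-256 digest. The `proof_hash` function, the rule names, and the choice of SHA-256 are all assumptions for illustration, not the library's actual scheme.

```python
import hashlib
import json

def proof_hash(task_input: str, task_output: str,
               rules: list[str], verdicts: dict[str, bool]) -> str:
    """Hash the inputs, rules, and verdicts in a canonical, order-independent form."""
    payload = {
        "input": task_input,
        "output": task_output,
        "rules": sorted(rules),  # sort so rule order does not affect the hash
        "verdicts": verdicts,
    }
    # sort_keys plus fixed separators gives the same bytes on every run.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical rule names and verdicts.
rules = ["structure.json_parses", "cost.max_tokens"]
verdicts = {"structure": True, "cost": True}

first = proof_hash("Summarize this article...", "The article discusses...", rules, verdicts)
second = proof_hash("Summarize this article...", "The article discusses...", rules, verdicts)
changed = proof_hash("Summarize this article...", "The article discusses...", rules,
                     {"structure": True, "cost": False})

print(first == second)   # → True
print(first == changed)  # → False
```

Canonical serialization is the important design choice: without sorted keys and fixed separators, two runs over identical data could produce different bytes and therefore different hashes.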

Performance

  • P50 verification latency: 22μs
  • Self-healing rules: 87
  • SDK size: 586KB
  • Zero external API calls — fully deterministic, local execution
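
If you want to check a latency figure like this on your own hardware, a median over many timed runs is a simple way to do it. The sketch below times a stand-in check (a JSON parse); the helper `p50_latency_us` and the stand-in `json_parses` are hypothetical and are not part of the SDK, so the number it prints says nothing about Correctover itself.

```python
import json
import statistics
import time

def p50_latency_us(check, payload, runs: int = 10_000) -> float:
    """Return the median (P50) wall-clock time of check(payload) in microseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        check(payload)
        samples.append((time.perf_counter_ns() - start) / 1_000)
    return statistics.median(samples)

def json_parses(payload: str) -> bool:
    """Stand-in for a local, deterministic check."""
    try:
        json.loads(payload)
        return True
    except json.JSONDecodeError:
        return False

SAMPLE = '{"ok": true}'
print(f"P50: {p50_latency_us(json_parses, SAMPLE):.1f} μs")
```

To time the real evaluator, pass a function that wraps the evaluation call in place of `json_parses`. Results will vary with the machine and the Python version.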

Failover ≠ Correctover.
